Switching Database Servers

Posted Sat May 19 @ 10:04:35 AM PDT 2012

I just finished switching over to a new database server, and it went pretty smoothly. Here's how I did it:

I have 3 servers, which I will call "old", "new" and "backup". Old is the current master database server, it is being replicated on "backup". I want to move the database onto new, and replicate it on backup.

First, take a SQL dump of the database on backup (i.e. stop slave, show slave status, mysqldump --all-databases > dump.sql). After the dump is taken, do not start the slave again. Copy the dump to new, and import it into new's mysql.

On new, configure my.cnf so it will work for replication (i.e. enable the binary log, set the serverid). And most importantly, enable log-slave-updates. Record new's log file and position with show master status. You can now start replicating old on new by change master to... using the log file and position you got from backup (when you took the SQL dump).

On backup, use change master to... to start replication of new (using the log file and position you got from show master status).

Now you have replication setup like this:

old -> new -> backup

Now you need to start routing your queries from old to new.

The way I did it was add an entry in my hosts file called "db" that pointed to the current database server. Then I used the hostname "db" whenever I was connecting to the database (instead of "localhost").

When I was ready to make the switch, I simply did this:

old$ service mysql stop old$ sed -i "s/olds_ip_addr db/news_ip_addr db/" /etc/hosts

This takes a second or two to execute and any reads or writes that happen in between will be lost. But you are guaranteed everything will be in sync when you're done.

XML

Posted Mon May 14 @ 08:33:05 PM PDT 2012

I can't stop laughing at this:

XML combines the efficiency of text files with the readability of binary files.

- Unknown

I ♥ JSON.

Fair Warning to Users of MySQL 5.5

Posted Sun May 13 @ 06:55:49 PM PDT 2012

I installed Ubuntu 12.04 LTS on my web/database server. If you apt-get install mysql-client mysql-server, Ubuntu will install version 5.5.22.

That's a pretty up-to-date version of MySQL, which is good. The only problem is that replication on 5.5.22 is completely broken! After spending a week trying to figure out why MySQL kept crashing with log-slave-updates turned on, I discovered this bug. Apparently, a lot of people are having trouble with replication too.

Proper Indices Really Important in InnoDB

Posted Sat Apr 28 @ 03:20:29 PM PDT 2012

I have a table with >485,000 rows in it. One of the columns is a candidate key (it's a URL slug). That slug column does not have an index on it even though I use it in WHERE clauses frequently (I didn't notice it lacked an index until today). The table is using MyISAM.

I switched the storage engine to InnoDB, and to my surprise, simple SELECT * FROM mytable WHERE slug = 'abc123' queries became really slow.

Here are some numbers:

485k rows; No index on slug column

MyISAM

SELECT * FROM mytable WHERE slug = 'abc123'

0.1906 seconds

InnoDB

SELECT * FROM mytable WHERE slug = 'abc123'

0.7147 seconds

Using InnoDB increased execution time by 275%. Ouch.

But if you add a B-tree index to the column (which I should have done in the first place), then the query execution time goes down to practically nothing (0.0003 seconds) for both storage engines.

PHP Strikes Again

Posted Wed Apr 25 @ 07:11:21 PM PDT 2012

I had a really bizarre problem. I spent a good hour trying to debug it, only to find out it is some quirky configuration setting from suhosin.

By default in PHP with suhosin, you can only POST 1000 variables to a page. So if you have a large form, with say, 101 items on it, and each item has 10 fields associated with it, PHP will silently truncate it down to 1000 variables. You'll spend an hour Googling stuff like "PHP not posting all variables", and not get any helpful answers.

Fortunately, it's easy to change. In suhosin.ini (in /etc/php5/apache2/conf.d/suhosin.ini on my system), change:

suhosin.request.max_vars = 1000 suhosin.post.max_vars = 1000

to something that works better for you.

Make sure to reload Apache when you're done (service apache2 reload).

Redirect www to non-www

Posted Sun Apr 22 @ 08:23:48 AM PDT 2012

In a previous post, I mentioned a side effect of visitors moving between the www, and non-www version of your site. Essentially, they can lose their login cookie. To fix it, you just need to tell your application to share the cookie with any subdomain.

But the better solution is to make a canonical version of your website (either www, or non-www), and redirect the user from the non-canonical version if necessary. This solves the cookie problem, and it also prevents Google or other search engines from indexing both versions of your website.

The way I handle the redirect is to put this in my virtual hosts file for the site:

Make sure you reload your apache service for the change to take effect.

Although this extra virtual host only looks like it redirects users from www.example.com/abc123 to example.com, it actually does what you want (redirects to example.com/abc123).

Finding the Connectedness of a 2D Array

Posted Mon Apr 09 @ 09:48:54 AM PDT 2012

For the game I am building for iOS, I needed to determine if a 2D array was completely connected, or had disjoint parts. As a simple example, consider this array:

# # ###### ### # # # # ############ # # # ##### # # #####

Note how all the #s are connected to each other (either on the top, bottom, left or right). I would consider that a connected array. Now to see what a disjoint array would look like:

# # # ###### ### # # # # # # # ### ############ # # # # # ##### # # # #####

See the two islands of #s on the bottom left, and the top right? That means this array has disjoint parts.

I came up with 3 algorithms, all which use O(n) space (where n is rows * cols):

  1. Recursive: Find the first non empty element in the array (i.e. the first #). Mark it as visited in your extra array. Traverse left, right, up and down, and mark the neighboring cell as visited, and from that cell, traverse left, right, up and down. The recursion bottoms out when you reach the edge of the array, or your hit a cell that has already been visited. When it's done, iterate through the array, and see if there are any non marked, non empty cells. If there is, then it is disjoint. The problem with this algorithm is that it could easily create a stack overflow on a small device like the iPad.
  2. Iterative: Find the first non empty element in the array. Mark it as visited. From there, iterate through the rest of the array, and if a cell has a neighbor (left, right, up or down) that is marked, then mark it. Repeat the previous step until no new marks are made (you will have to iterate through the array several times). When it's done, check for non marked, non empty cells. This algorithm runs in O(n*n) worst case time, but it won't blow up the stack.
  3. Graph traversal: Find the first non empty element in the array, mark it as visited, and add that point (row and column) to the "explore" queue. While the explore queue is not empty: pop a point out of the queue, and from that point, traverse the array as far to the left as possible (i.e. until you hit an empty cell, the edge of the array, or an already marked cell), and mark the cells as visited. If a cell is at an intersection (where there is a cell above or below that hasn't been marked [or to the left or right, if you are traversing up and down]), add it to the explore queue. Repeat that for the right, up, and down directions. When you're done, iterate through the array, and see if you can find a non marked, non empty cell. This algorithm runs in linear time, so I ended up using it.

Thinking Challenge

Posted Sat Mar 24 @ 09:33:08 PM PDT 2012

Here is a stupid little problem that took me a good half hour to solve. Given a set of n integers in the range [0, n), and two indices, i and j, laid out in a number line, determine which direction (left or right, with respect to i) will take you from i to j with the minimal distance, given that the number line wraps around.

For example:

i is at 2, and j is at 7. 0 1 2 3 4 5 6 7 8 9 10 11 === ^ ======> ^ <======== i ^ j ^ | | | | | | distance going left from i to j = 7 | | distance going right from i to j = 5 Therefore, go to the right.

And another example:

i is at 7, and j is at 2 0 1 2 3 4 5 6 7 8 9 10 11 ==> ^ <====== ^ ========= j ^ i ^ | | | | | | distance going right from i to j = 7 | | distance going left from i to j = 5. Therefore, go to the left.

Now, take a few minutes (or, if you're like me, half an hour), and try to solve that. In my defense, no one defined the problem like that, or gave me any examples. This type of problem came up in a piece of code I was writing.

The solution I came up with.

if i < j: distance_going_right = j - i distance_going_left = n - (j - i) else: distance_going_left = i - j; distance_going_right = n - (i - j) if distance_going_left < distance_going_right: return left else: return right

Finally Found a Use for the Else Statement on a Loop

Posted Thu Mar 08 @ 12:10:50 PM PDT 2012

Python has a bizarre construct that I find confusing (and up until now) completely useless. You can put an else statement after a loop. The else statement is only executed if the loop terminates normally. But if you break out of the loop, the else statement is not executed.

I was coding up the Miller-Rabin algorithm using the pseudocode in my security textbook. The flow of control was getting a little weird, so I was trying to figure how to elegantly handle it. Python's else statement to the rescue.

for i in range(trials): a = random.randint(2, n - 2) if pow(a, q, n) == 1: continue # still probably prime for j in range(0, k): if pow(a, 2**j * q, n) == n - 1: break # still probably prime else: # OMG!!! Else on a loop!!! return False # definitely not prime return True

Interestingly, the example of its use on the Python website also has to do with prime numbers.

To be Physical or Virtual...That is the Question

Posted Sat Feb 25 @ 07:20:11 PM PDT 2012

My set of websites is starting to overload my little Linode (with just 1 gig of RAM). I'm currently running with 29 megabytes of free memory, and 68 megabytes of used swap space.

Both top and iotop show MySQL doing a lot of heavy lifting and Apache isn't too far behind. mysqldumpslow is full of queries taking over 10 seconds to run. Yikes! What can I do?

Humorous Pictures

Upgrade to a bigger Linode

Pros

  • It's really simple to upgrade (i.e. Linode does it for me)
  • Completely managed hardware
  • Ability to clone the disk easily
  • Recovery shell (in case I lock myself out)
  • Free support (I never use it though)
  • Private networking (if I had any other Linodes in the same datacenter)
  • Sweet DNS
  • Lovely CPU, network and IO graphs
  • Simple backups

Cons

  • It's 2 gigabytes of RAM (my grandma's hearing aid has 2 gigs of RAM)

Put my own server in a datacenter

I can get a ridiculously overpowered, used server for a few hundred bucks. I was looking at this DL360 G5.

Pros

  • Tons of RAM (and I can put plenty more in it)
  • Dedicated CPU time (i.e. I don't have to share the CPU with anyone)
  • It's only $99 to rent space in a DC

Cons

  • I'm in charge of the hardware, which opens a can of worms:
    • When it fails, I have to fix it
    • Takes time to drive to the datacenter => longer downtime
    • If the DC fixes it, it will cost me a lot of money (like $100 an hour for labor alone)
    • It could be hard to hunt down compatible parts (in a timely manner)
  • No cloning of the hard disk (unless I do something like LVM)
  • No recovery shell
  • No simple backups (not a huge con, but Linode makes it too easy)

That's all I can think of now. I'm sure there are pros and cons I'm forgetting about.

I really hate making my users wait for my pages to load. No one has complained about it, but I still think it is unacceptable. I would love to stay with Linode, but there is a point where it stops being economical, and I'm far better off running my own hardware. I just don't know if I have crossed the line yet.

Older Posts