Database Normalization and Denormalization
Posted by Matt Johnson | 2007-12-31 18:08:58
This summer I was working on a large project for a client and it called for a redesign of their database. I was always told to normalize my databases for several reasons:
- No (or low) data redundancy
- Data consistency throughout the entire database
- More flexible if changes need to be made
- I'm sure they're lots more that I can't think off right now
I finished designing the highly normalized database and began programming (I was building some online courseware). About a month later I launched the courseware and everything was working fine. But after a few months, the courseware began to slow down.
Several thousand people used the courseware since I launched it and tables began filling up quickly. A few tables had hundreds of thousands of rows (which I expected). However, queries that ran on some pages that joined with those huge tables were taking longer and longer to execute. The first month most pages loaded in 1/4th of a second. About 4 months in, some pages took over a second to load. It doesn't sound like much, but I knew the load times would keep going up in a fairly steep linear fashion.
What I ended up doing was created another table that acted like a cache. It held statistics for each user, and instead of querying a table with 500,000 rows in it, I could query a table with a few thousand rows (one for each user). I had to rewrite some code to update the big table and the cache table whenever a change was made. But it ended up cutting down quite a bit on the load time. The data is no longer totally normalized, but denormalizing it really sped up the site. Space if cheap, time is not. I'd rather have a database taking up more disk space, than a slow load time for my users.
