Any technical person is interested in solving big challenges. How to scale well (and cheaply) is one of the really big ones, and something that no one really seems to get right. I've read some interesting posts discussing Twitter's problems lately (I'm a little slow to jump on this bandwagon, but like that's going to stop me). I haven't been a Twitter user up until now, but it's hard to miss talk of them in the blogosphere.
While my first, naive, assumption was that Twitter was a simple read/write system that could be knocked up in a week, numerous people have pointed out the many challenges the site faces. TechCrunch has revealed that Twitter handles 3 million messages a day from 200,000 active users. Many have pointed out that some Twitter users have upwards of 30,000 followers, and are themselves following a similar number of people. This led to the general consensus that Twitter is really a giant messaging system, not just your basic CRUD setup.
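A quick back-of-envelope calculation shows why those numbers turn a CRUD app into a messaging problem. The figures below come straight from the post; everything else is a rough assumption.

```python
# Back-of-envelope, using the figures quoted above.
messages_per_day = 3_000_000
seconds_per_day = 24 * 60 * 60          # 86,400
avg_writes_per_sec = messages_per_day / seconds_per_day
print(round(avg_writes_per_sec))        # ~35 incoming messages per second

# The catch is delivery, not ingestion: a single post from a user with
# 30,000 followers implies 30,000 timeline updates, which dwarfs the
# raw incoming write rate.
fanout = 30_000
print(fanout)
```

Thirty-five writes a second is nothing; thirty thousand deliveries per popular post is the actual load.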
I found this conclusion a lot more insightful than the usual "Rails sucks" arguments that are floating around. The view was echoed by a Twitter-obsessed friend the other day who asked my opinion on how a hypothetical Twitter clone could be built. He was also of the opinion that the difficult part would be creating a robust messaging system to distribute messages to storage shards and other distribution systems (SMS and any other "push" delivery mechanisms). Viewed from that perspective, it becomes an interesting engineering challenge. I'm sure that a stable system could be built without too many problems.
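The core of that hypothetical design is fan-out on write: copy each post into every follower's inbox at write time, so reading a timeline is cheap. Here's a minimal in-memory sketch of the idea; the class and method names are mine, and a real system would push these writes onto queues feeding shards and SMS gateways rather than a Python dict.

```python
from collections import defaultdict, deque

class FanoutHub:
    """Toy fan-out-on-write: each post is copied into every
    follower's inbox when it is written, so reads are cheap."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> follower ids
        self.inboxes = defaultdict(deque)   # user -> their timeline

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, message):
        # Write amplification: one post becomes N inbox writes.
        for follower in self.followers[author]:
            self.inboxes[follower].appendleft((author, message))

    def timeline(self, user, limit=20):
        # Reading is just slicing a precomputed list.
        return list(self.inboxes[user])[:limit]

hub = FanoutHub()
hub.follow("alice", "bob")
hub.follow("carol", "bob")
hub.post("bob", "hello world")
print(hub.timeline("alice"))  # [('bob', 'hello world')]
```

The trade-off is exactly the one the follower numbers expose: a post by someone with 30,000 followers costs 30,000 inbox writes, which is why the hard part is the delivery pipeline, not the CRUD layer.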
Based on this line of thought, I was pretty surprised to read today that Twitter is running on a single master MySQL database with just two slaves serving reads. Ouch. If the site ran smoothly there would be no reason to doubt that the company knows what it is doing. However, this is obviously not the case. Even more confusing is an old blog post in which Twitter developers reveal that their biggest problems occur when users with many followers post. I can't really imagine what setup would account for that. It sounds like a denormalized database contained within a single physical server, which just sounds like a lot of effort for little gain.
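For readers unfamiliar with the setup being described: a single write master with read slaves scales reads, but every write still funnels through one machine, and every slave still has to absorb every write via replication. A toy model (all names and the synchronous replication are my simplifications; real MySQL replication is asynchronous) makes the bottleneck obvious.

```python
import itertools

class ReplicatedDB:
    """Toy model of one write master plus N read slaves:
    all writes hit the master; reads round-robin across slaves."""

    def __init__(self, n_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self._rr = itertools.cycle(range(n_slaves))

    def write(self, key, value):
        self.master[key] = value
        self._replicate()   # real MySQL replication is async, not inline

    def _replicate(self):
        # Every slave replays every write -- adding slaves scales
        # read capacity but does nothing for write capacity.
        for slave in self.slaves:
            slave.update(self.master)

    def read(self, key):
        return self.slaves[next(self._rr)].get(key)

db = ReplicatedDB(n_slaves=2)
db.write("tweet:1", "hello")
print(db.read("tweet:1"))  # hello
```

Under a fan-out workload where one popular post means tens of thousands of writes, that lone master is exactly where you'd expect things to fall over.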
For a company that has just secured a second, $15 million round of funding, their performance is ridiculous. Out of over a dozen staff they have only three or four tech guys. If I were investing, I'd be asking what the hell is going on. I'm not trying to pretend that Twitter would be an easy fix; I don't envy the engineers. They have to maintain their current (failing) product while simultaneously pushing to build a completely different system behind the scenes. I've been there, and it's not fun. Apart from having to siphon off precious developer time to keep the old system patched up, they face the uphill task of rebuilding everything. Having a huge user base makes this even more difficult, as they need to get it right on the first try. No one is going to be happy if an updated version falls over, drops features, or is in any way inferior to what they have now. It wouldn't be surprising to see the new version of their back-end systems continuously delayed far into the future.
It should be interesting to see how, and if, Twitter recovers from all of this. The user base seems to have been fairly forgiving up until now. Even I signed up knowing all the problems they have. Having many bloggers shouting their praise won't hurt them much, and even the many complaints are free press, really. I doubt a stable Twitter would have been written up on TechCrunch quite so much. They do face the threat of someone stealing their glory, though. Each day of outages is a gift to the dozens of Twitter clones that are undoubtedly out there. Just think Friendster; I'm sure they'll be stable any time now.