Welcome to the new century, FB/MySQL!
Perhaps Facebook needs a real database to underpin the works. Index creation on a MySQL table requires special tools in order to avoid taking the table down? Can that really be? Dear me.
Facebook has open sourced a new MySQL utility that lets the social networking colossus update its database indexes and juice query times without staging the changes on test servers. With the tool – known as Online Schema Change for MySQL, or OSC – it can update indexes on live servers. In the past, according Facebook MySQL …
My experience with MySQL isn't that extensive but I you can add indices to it with taking a table down. MyASM tables do implement pretty global locks but InnoDB is more traditional in this respect. Adding indices should IMHO lock tables for writing while the index is being written. Of course, a procedure that first checks that the data will be indexable, changes the schema and then updates the database it the way to do it for minimum disruption.
I think the news probably have set the champagne corks popping at Oracle who are probably planning some nice offers for Facebook as they wind down support for non-commercial versions of the MySQL brood. And, to be honest, if running Oracle means they don't have to worry about this kind of thing then it would be the right thing to do™. As it stands, the news is another indication of Facebook's sticking plaster technology. Amazing that they manage to get as much out of their systems as they do.
If we're talking about inserting / modifying data then absolutely this won't be possible during an index build on all DBs since an index build will lock the table exclusively during the time it takes to create it (you don't want data changing while you're creating your index!).
If however MySQL is blocking selects on the table during index creation then that is indeed worrying as it shouldn't be necessary.
Think I might have to install MySQL and make some big tables to test this out.
It's not correct to say that all databases require full table locks to create indexes. PostgreSQL for example, has had concurrent index builds for years using CREATE INDEX CONCURRENTLY. Even the standard CREATE INDEX only locks writers, unlike some databases which apparently also lock out readers.
"maybe, just maybe they should be using Oracle or SQL Server?
But then, I don't believe the app is *that* critical..."
That's a very simplistic viewpoint. Using oracle isn't going to magically make these things better and even if it does it will certainly introduce other problems they do not have now with mysql, like the cash needed for licensing for the thousands of oracle servers they'd need.
On the other hand, they took an existing application and extended it to suit their needs. Can you do that with Oracle?
select count(*) from facebooks_servers
select count(*) from plentyofFish_servers
(maybe 3-5 for webserver and emails)
Maybe you can tell us which system is more expensive??. They must have an army running round in roller blades at facebook HQ. PlentyofFish of course doens't have more users or activity but it's the most populate dating website, they have a few millions. Having a couple of servers to do the job I think tells you few things.
While I'm continually fending off clients saying "shouldn't we open source this, it'll be free?", I'm quite disappointed at the hordes of people rubbishing MySQL and claiming that Facebook would be better off spending millions of dollars on a commercial database, whose ROI they may find difficult to measure.
The fact is, the FOSS movement should be heartwarmed by the success of Facebook. Yes, it's an exercise in datamining the detritus of consciousness, but it's running on an open-source platform. It's an example of how - if you apply the correct development resource - it's a massively scalable, enterprise-ready database solution. The fact that (one of) the world's biggest web site is running on a LAMP stack is great news for people who want to believe that FOSS can be a credible alternative to the Microsoft/Oracle hegemony.
I'm not sure I see what the big deal here is. With any sane setup with load balanced and replicated MySQL, it is pretty trivial to drop batches of machines out of a load balancing pool, and update them on a rolling basis. Replication then makes them catch up with the changes since the index update. All easily automatable.
""" With any sane setup with load balanced and replicated MySQL"""
I don't think it's possibly to have a MySQL setup that loadbalances writes to multiple masters and still remains 'sane.' It's possible to make it work, and make it work reliably, but not without a certain amount of mindbending hackery.
Or did you just need to update indexes on your slaves, never on your write master?
I'm not sure I've understood what you're saying? You're reply seems to disagree with me by saying exactly the same thing I did?
Insert / update / delete won't be possible in a DB due to locking during index creation but there is no reason to stop selects.
Or do you mean that there is some DB out there that allows DML other than a select during an index build? In which case, could you tell me which one it is and how it achieves it safely?
Chris, I tried in SQL Server. I created a new index on 2 million rows table, took 5secs, at the same time successfully run an update which finished before the index. I thought it shouldn't have a problem and it didn't. You get what's called row level locking, so the server decides whether to lock the whole table or lock just few rows. If my update was updating all 2 million rows then I would expect the whole table to lock. Under normal conditions maybe 1-2% of rows will be updated at any working system. Obviously these operations are better left for when the system is less busy. For facebook that's 24 hours a day so small problem, but for SQL Server is not a massive issue, albeit will slow down little bit. If your main tables are partitioned then happens in parallel. Sybase works the same way, not too much experience with Oracle but should work the same way.
PostgreSQL allows concurrent insert/update/delete activity on a table, when creating an index using CREATE INDEX CONCURRENTLY. It does this in a vaguely similar manner to the script the story is about, by keeping track of the tuples that have changed whilst the index is being built, and then updating it with those changes before making it available for use.
The big difference is, that the code to do this is built into the server at a low level, rather than what can only be described as a very scary trigger based hack implemented outside of the database. I suppose that doesn't matter for Facebook though - who cares if the odd status update gets lost?
Biting the hand that feeds IT © 1998–2021