Non-ACID?
I would NEVER touch a database that does not support ACID for anything important. For the rest, we have MySQL.
Database management systems (DBMS) are 20 years out of date and should be completely rewritten to reflect modern use of computers. That's according to a group of academics including DBMS pioneer Mike Stonebraker, founder of Ingres and an architect of Postgres, taking his second controversial outing so far this year. Stonebraker upset …
T-SQL has been the bane of my life. After spending years perfecting skills in functional languages, learning database do's turned my hair grey.
You have one SQL Server. You need to talk to another SQL Server. Let's play a game of matching up the 25,000 single quotes and consulting the Oracle (no pun intended) as to whether the engine is going to try and pull back 8 million records before carrying out the WHERE clause.
Surely the time has come when we can eliminate the need to communicate with database servers through single terminated lines of pseudo-English.
what i hear is just another trendy Web 2.0 pronouncement, like "expertise is dead, long live the wisdom of crowds!" and "we don't need no steenkeeng personal computers, the cloud provides all that we require!" don't even get me started on Sadville.
good to see the old boys trying to keep up, but one of the major virtues of today's enterprise-grade DBMS is that it offers a good balance between data integrity, performance, redundancy, and availability. when H-whatever has the track record to show that it can do as well or better, the time will come to take them more seriously. at the moment, they sound like the semantic-web guys (is web-centric, SaaS vaporware more wispy and nonexistent than conventional vaporware? what is the sound of one hand not giving a sh*t?).
it's 2008, and Web 2.0 solutions are providing new and interesting hacking exploits every week. until the security model is more mature, i'll stick with a DB that exposes no API to anything but the hardened, SELinux-enforcing server in front of it; and since i'm extra-crusty, no PHP, either, you damn kids...now, get off my lawn, before i set the dogs on ya.
If transaction locking and multithreading were removed from databases, many web-based applications would suffer for it. We're developing a service that must have transaction locking: if multiple users try to edit the same record at once, that's a race condition, and all edits except the last one would be silently overwritten (see the sketch below). Sure, we could control this in the application, but it's much more efficient (and cleaner) to do it in the DB, because transaction locking is an optimised, native feature of databases.
Perhaps this guy would consider removing support for nulls too? ;-)
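For what it's worth, here is a minimal sketch of the lost-update scenario mentioned above, done with optimistic concurrency rather than DB-side locks. It uses Python's sqlite3 module; the `documents` table, its `version` column and the `save()` helper are all invented for illustration, not a real schema.

```python
# Optimistic-locking sketch: refuse to overwrite a row that someone else
# changed after we read it. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO documents VALUES (1, 'first draft', 1)")
conn.commit()

def save(conn, doc_id, new_body, expected_version):
    """Update the row only if nobody else changed it since we read it."""
    cur = conn.execute(
        "UPDATE documents SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, doc_id, expected_version),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Someone committed a newer version first; surface the conflict
        # instead of silently letting the last writer win.
        raise RuntimeError("edit conflict: re-read the record and retry")

save(conn, 1, "second draft", expected_version=1)   # succeeds
save(conn, 1, "stale edit",   expected_version=1)   # raises: row is now at version 2
```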
<rant>
I do and I can tell you that one of the worst things I have to deal with is those who think that using a browser front-end is a good idea for a large, multi-session DBMS-based application. Instead of actually looking at whether it enhances the application, they look at the sexy web page and say "ooh yes please", without any regard for its efficiency.
Our once-simple front-end application software (which worked) now requires major rewrites every time the user wants an extra piece of data on screen. Software which was once very flexible has been replaced with 3rd party, Ajax-riddled, IE6-dependent garbage, because someone wanted to look a bit more "with it".
It ain't the database model which is broken. It's the attitude of managers who decide that because something is new, it must somehow be better.
One of our worst issues is performance, not because the database is slow, but because they won't fork out for the hardware required to run their expensive web-based bloatware.
If they think DBMS should be rewritten, they should get on with it, but most DBMS users are not going to fix something which ain't broke, so I hope it's just for fun.
</rant>
These comments are a tad tough. Stonebraker has written three of the popular DBMS engines, so he does have more clue than posters are crediting him with.
Stonebraker's point is that the current DBMSs are designed to have not-bad performance in all applications. If DBMSs are designed for particular types of application they can be up to 100x faster since they can make design decisions which fit the application better.
The paper also points out that DBMSs were designed in an era of differing resource constraints and designing for the current resource constraints can improve performance.
For example, ACID doesn't fit web-generated transactions well, and optimising for the special case of web-transaction consistency allows the DB to be much faster in that application (if much slower in a banking application). Obviously, SQL's language sucks for web transactions (injection attacks, XSS).
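To make the injection point concrete, here is a small sketch using Python's sqlite3 module; the `users` table and the malicious input are made up purely for illustration.

```python
# Why string-built SQL invites injection, and why parameterized queries don't.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

user_input = "nobody' OR '1'='1"

# Unsafe: the quotes in the input rewrite the WHERE clause,
# so this returns every row instead of none.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % user_input).fetchall()

# Safe: the driver passes the value separately from the statement,
# so the input is treated as data and nothing matches.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe)  # [('alice',), ('bob',)]
print(safe)    # []
```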
The authors make the very good point that if SQL is so fantastic, then why do stored procedures exist? They view the heavy use of stored procedures in real DBMS applications as a tool for developing a pseudo-application-specific language. They want to have an explicit tiny language, rather than reinvent the wheel each time.
Those claiming that current DBMSs are the one true way are too young to remember the huge performance boost in web searching that accompanied AltaVista's relaxation of just one aspect of the historical design of DBMSs (that RAM is small and expensive).
It's not as if all current DBMSs are created equal anyway. MySQL seems happy to relax some traditional requirements, and Sleepycat, c-tree and TinySQL all offer alternatives which suit some applications better.
The paper suggests that current DBMSs are too slow, and that to fix this we should chuck out all concurrency control, transactions, redo logs, undo logs, two-phase commits, etc., etc., ad nauseam.
And how do they propose to control the resulting chaos?
"Each class contains transactions with the same SQL statements and program logic, differing in the run-time constants used by individual transactions. Since there are assumed to be no ad-hoc transactions in an OLTP system, this does not appear to be an unreasonable requirement"
Ah yes, assume all requests to this H-S***** system are unique and never overlap!! Great.
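For readers wondering what the quoted "transaction class" passage actually amounts to, here is one plausible reading, sketched in Python with sqlite3: every request is an instance of a pre-declared, parameterized transaction (much like a stored procedure), never ad-hoc SQL. The `accounts` schema and the `transfer()` class below are hypothetical.

```python
# One "transaction class": fixed statements and program logic,
# with only the run-time constants varying per invocation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single unit of work."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

# Individual transactions differ only in their run-time constants:
transfer(conn, 1, 2, 25)
transfer(conn, 2, 1, 10)
```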
Really folks, take a breath. Stonebraker knows his stuff. He invented Ingres and did tons of research on RDBMSs, and was the advisor for many, many PhDs who built Sybase, Oracle, etc.
He's on to something. It's not going to happen quickly, but RDBMSs were designed in the mid-70s for slow machines.
Use technology.
Some of the comments here are pretty clueless.
I talked about this work with Mike for half an hour last week. I'm in email contact with two of the other researchers. These are smart people.
I'll form my own opinion as to whether the assumptions they need actually match real-world applications all the way through, but the basic idea that transactions are SHORT is a sound one.
CAM
I agree with most of your comments, but XSS (cross-site scripting) is hardly ever a problem that involves the database, and it is not a problem of database design. It is a defect of poor coding that allows the server you are currently accessing to deliver remote content (or at least content from an unexpected source).
and its first working implementation was the object manager of Smalltalk...
The idea is simple: use data without transaction control, and apply automatic serialization only when two application-logic threads happen to write to the same object at the same time. The result is a consistent in-RAM object space that can be serialized down to disk when needed. This is the basic idea behind Java, and this is what drives some hard real-time databases used for a few larger MMOs.
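One way to read that idea, sketched in Python: no global transaction machinery, just a lock per object so that two threads writing the same object are serialized, plus an on-demand snapshot of the whole in-RAM space. The class and key names here are invented for illustration.

```python
import pickle
import threading

class ObjectSpace:
    """A toy in-RAM object space with per-object write serialization."""

    def __init__(self):
        self._objects = {}
        self._locks = {}
        self._meta = threading.Lock()

    def _lock_for(self, key):
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key, fn):
        # Writers to the same key are serialized; writers to other
        # keys never block each other.
        with self._lock_for(key):
            self._objects[key] = fn(self._objects.get(key))

    def snapshot(self, path):
        # Serialize the whole object space to disk when needed.
        # (A real implementation would quiesce writers first.)
        with self._meta:
            with open(path, "wb") as f:
                pickle.dump(self._objects, f)

space = ObjectSpace()
space.update("player:1", lambda old: {"hp": 100})
space.update("player:1", lambda old: {**old, "hp": old["hp"] - 7})
space.snapshot("space.bin")
```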
While getting rid of transaction locking and so on isn't immediately intuitive, I'm sure there could be some propeller-head way of working around potential issues there.
What I do object to is getting rid of nice simple SQL statements! I'm not a goddamned programmer (I'm not a DBA either), and I hardly see the need for having to learn a whole layer of obscure syntax in order to carry out database procedures. So are we now going to get programmers maintaining databases?
A DBA (and some of us pathetic sysadmins) has portable skills - learning the ins and outs of a new database engine is reasonably straightforward (although there will always be Oracle and *SQL gurus). I really don't like the idea of yer average programmer being in charge of DB *maintenance*, at least for anything larger than a SQLite or MSDE instance.
I've read enough of this claptrap now. When the man starts tearing down normalization and then justifies it by saying that data warehouses consist of snowflake structures, I have to hit the stop button and call BS. First of all, the only data warehouse projects I've ever been part of that used a snowflake structure were utter failures. Those that made use of normalization to create an underlying warehouse that then fed targeted data marts - which are typically implemented as stars - not only had terrific performance, but were well understood, documented and testable.
Tossing these concepts out the window and suggesting replacing them with entity relationships sounds like the usual academic abstraction crap dressed up for DBMS work. For a start, entity relationships are part of the development and analysis that leads to the design of the normalized form and the data marts. Secondly, the entity relationships do not completely describe the database.
I love this bit "The couplings between a programming language and a data sub-language that our community has designed are ugly beyond belief...". Care to justify such a sweeping statement much?
At the end of the day data manipulation has to be driven by commands. Whether they are embedded SQL or native language, the database doesn't care. Throwing stones at the entire concept of an RDBMS because your procedural language implementation stinks just doesn't add up.
No matter how you cut it, you will always have a strict subset of commands that are used for DML. Whether you choose to embed those commands in explicit calls or integrate the subset into your procedural language is a design choice that is entirely separate from the DBMS itself. No matter how it's implemented linguistically, databases will still have to perform queries, process transactions, and handle commits and rollbacks. These are fundamental tasks whether your data is hierarchical or normalized. How your client does what it does matters not at all to the DBMS; all that matters is that it gets queries or transactions and is told when to commit a change and when to roll that change back.
Some of what is advocated here seems to be a common theme of current language design: pushing as much complexity as possible back into the runtime environment in order to make client programming much simpler. If all the complexity of the DBMS is hidden inside the runtime environment, then the programmer doesn't have to deal with it. Great, except I have never seen such an environment where the increased complexity within the runtime didn't result in (a) more bugs in the runtime, and (b) performance problems from a resource-heavy runtime.
I dislike this approach because it breeds programmers and analysts who don't really understand the impact of what they design/implement.
Hang on. Whilst I'm a dyed-in-the-wool DBA, even I accept that you need to tune and even reconfigure database servers for different jobs. Take data warehouses, for example: bulk-loaded, and therefore no need for full or even partial redo logs. Is what Stonebraker is talking about so different? Maybe the existing RDBMS systems need to be more configurable, so that DBAs and developers have more options for their particular application? As for alternative query languages, there's already MDX, and soon LINQ.
Since (at least) the early 90s the research community has been looking at breaking out of the paradigm of the traditional DBMS. The way we look at data has changed. With the Internet, data is out there, and the DBMS community is struggling to work out how to cope with it. So there are researchers out there trying new things, such as data appliances, P2P DBMSs, MapReduce, etc. Some of these things will live and some will die.
Unless you are at the bleeding edge I wouldn't worry about it until it comes out of a shrink-wrapped box.
Once again the "I've been using See-eek-well Suy-vur for over THREE YEAHS" weeners jump in with their willy-waving comments. Sure, when Berkeley DB (from which Ingres, Postgres, SQL Server, Oracle, and almost all the others draw a direct code lineage) came along it was a breath of fresh air, but that air is now stale. The trouble is, SQL is such a hugely arcane and convoluted language to learn that many who SELECT to master the shouty capitals tend to fetishise them. It was a good idea in the 1970s; so was Draylon - get over it. Personally, I'm tired of shouting at some junky old database; if someone can come up with something genuinely more flexible then I'm happy to see it.
Good article, as is anything that makes people think (and/or Rant).
@David, "Yuk!" - I agree, when and if someone comes up with something better. BTW don't diss my sofa ...
@Trix "(untitled)" - Problem : if it's simple enough for sysadmins to use then it's also simple enough for accountants and engineers to use (from Excel and Access). Therein lies another can of worms ...
@ AC "Maybe it is time ..." - DL/1 ? Pfahhh, surely you mean Pick or even Mapper ?
As usual, what this guy writes makes an awful lot of sense.
Nearly all of the features listed are needed by the industry. I'll take an example: data stored in a B-tree distributed across multiple nodes.
Currently all the applications (e.g. memcache) in this space use a DHT because the technology is well understood and simple.
One problem with a DHT: you cannot do range queries. Whilst range-based queries can to some extent be avoided, this is not always the case (see the sketch below).
This is where the future lies (and I'm not going to disagree with anyone who says the SQLXML crowd are a bunch of idiots, probably the same idiots who thought EJB was a good idea).
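To illustrate the range-query limitation mentioned above, here is a toy comparison in Python: a hash table (standing in for a DHT) gives fast point lookups but has no key order, so a range scan means touching everything; an ordered structure (a sorted key list standing in for a B-tree's leaf level) answers the same query by seeking to the start of the range. The key format is invented.

```python
import bisect

rows = {f"user:{i:04d}": {"id": i} for i in range(10_000)}

# DHT-style: the only way to answer a key range is a full scan.
dht_range = [k for k in rows if "user:0100" <= k <= "user:0199"]

# B-tree-style: keep keys sorted, seek to the start, then walk the range.
sorted_keys = sorted(rows)
lo = bisect.bisect_left(sorted_keys, "user:0100")
hi = bisect.bisect_right(sorted_keys, "user:0199")
btree_range = sorted_keys[lo:hi]

assert sorted(dht_range) == btree_range  # same answer, very different cost
```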
> How do you keep data safe when dealing with system
> reboots or server downtime?
> Nothing is ever up 100% of the time.
Easy. For system reboots, simply copy the databases to disk before rebooting and load them into RAM again on startup (this makes reboots a bit longer, but on a production server they shouldn't happen that often). Or even send them to another machine's RAM over the network.
For unplanned downtime, crashes, etc., work out how much data you are willing to lose. Is the past hour's worth OK to lose for an average of one crash a year? If so, then back up the RAM to disk each hour. This is a task that can run in the background while your box continues its job.
Most DB accesses are reads, not writes, so optimise for those. As for crashes losing the contents of RAM: You don't need to write the DB to disk, just the transaction log. Just replay the log to recover in the rare event of a crash.
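A minimal sketch of that log-not-data idea in Python: keep the table in RAM, append every write to a log file, and rebuild the in-RAM state by replaying the log after a crash. The file name and record format are invented; a real engine would also checkpoint and truncate the log.

```python
import json
import os

LOG = "txn.log"

def apply(state, op):
    """Apply one logged operation to the in-RAM state."""
    if op["kind"] == "put":
        state[op["key"]] = op["value"]
    elif op["kind"] == "delete":
        state.pop(op["key"], None)

def write(state, op):
    # Durable first: append (and fsync) the log record, then touch RAM.
    with open(LOG, "a") as f:
        f.write(json.dumps(op) + "\n")
        f.flush()
        os.fsync(f.fileno())
    apply(state, op)

def recover():
    """Rebuild the in-RAM state by replaying the log from the start."""
    state = {}
    if os.path.exists(LOG):
        with open(LOG) as f:
            for line in f:
                apply(state, json.loads(line))
    return state

db = recover()
write(db, {"kind": "put", "key": "answer", "value": 42})
assert recover() == {"answer": 42}   # after a 'crash', replay rebuilds RAM
```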
Secondly, the RDBMS didn't take over the market because the relational model was superior to the alternatives; it was just that SQL was vastly superior to the other options in the '80s and '90s as a means of getting value out of the data you had accumulated. The problem with the relational model is that it doesn't reflect a real-world abstraction. Sure, any problem space can be mapped to a relational solution, but often that mapping is convoluted (hence the object-relational problem that still hasn't been solved).
Network-hierarchical databases are pretty damn close to the ideal, as long as the tools to get at your data are good enough.
XML - sigh
I read Oracle's early papers on XML DB - "we store the schema and the data separately, with pointers into the data"
Codasyl in disguise. The same with the o-o databases, hierarchical databases - none of this is new at all.
Personally I can write SQL, I understand the syntax and have been using it for over 20 years. Once you realise you're really working with sets it's pretty easy to understand.
RDBMS is now just another commodity. I'm happy to use whatever comes next; I currently develop in Ruby on Rails / MySQL and hardly ever have to see the database directly. Interestingly, the people who designed Active Record made 90% of the stuff really easy and then let you drop into raw SQL for the complex cases, instead of retrofitting the kitchen sink.
I'll use whatever comes down the pipe, as long as it meets my needs - that's it.
...and how it invalidates the assumptions underlying earlier architectures.
Stonebraker is pitching the idea of "one thread, one core" and seeing how far he can go with it. (We all know how far the "one man, one computer" idea went...) If it takes the market a few years to provide enough cores per server to make it really take off, what of it?
It's practically a foregone conclusion that each core will have its own large, dedicated RAM to sidestep the symmetric multiprocessing shared-RAM access bottleneck.
Nothing new under the sun, at least nothing that would surprise a Tandem programmer. The Tandem designers figured out back in the seventies that loosely-coupled, independent processors were the best way to meet their self-imposed fault tolerance and linear scalability requirements. Although multicores on a single die won't be quite as fault tolerant due to physical and electronic proximity, they open up vistas of parallelism and linear scalability unachievable with SMP. The cores will probably use a single-threaded, message-passing microkernel instead of a more general-purpose, multi-tasking message-passing microkernel like Tandem did, and I think Stonebraker has assumed as much.
With continuing obscene increases in RAM, a single core could be tasked with handling standalone, in-memory manipulations of a subset of your tables, the centerpiece of Stonebraker's argument and H-Store design.
The late Jim Gray gets a lot of credit for bringing the sacred fire of true database religion from UC Berkeley to Tandem, where Stonebraker's H-Store has already been prefigured in the Tandem Disk Process. The Disk Process handles SQL queries meeting certain simplicity criteria for tables on the disk that it manages. The substitution of RAM for disk is already commonplace, so where the H-Store breaks new ground is in giving each H-Store its own processor (roughly as sketched below). The infrastructure will need just enough kernel to manage memory and message passing (the H-Store application being single-threaded), with the result that it will fit onto a single core of a multicore chip, buffed out with enough RAM.
This is why most of the paper seems to me like a respec'ing of the Tandem Disk Process for multicores, with a rethinking and reallocation of responsibility for fault tolerance, not an abandonment of the same. Recovery becomes more coarse-grained simply because the computing elements are so much more powerful than they used to be.
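As a toy rendering of that "one single-threaded engine per core, each owning its own slice of the data" idea, here is a Python sketch using multiprocessing as the message-passing layer. Everything in it (partitioning by key hash, the command format, the key names) is invented for illustration, not H-Store's or Tandem's actual design.

```python
import multiprocessing as mp

def partition_worker(inbox, outbox):
    """Owns one partition; runs single-threaded, so no locks are needed."""
    store = {}
    for cmd, key, value in iter(inbox.get, None):
        if cmd == "put":
            store[key] = value
            outbox.put(("ok", key))
        elif cmd == "get":
            outbox.put(("value", store.get(key)))

if __name__ == "__main__":
    n_parts = 4
    inboxes = [mp.Queue() for _ in range(n_parts)]
    outbox = mp.Queue()
    workers = [mp.Process(target=partition_worker, args=(q, outbox), daemon=True)
               for q in inboxes]
    for w in workers:
        w.start()

    def route(key):
        # Each key lives in exactly one partition, so all access to it
        # is serialized by that partition's single thread.
        return inboxes[hash(key) % n_parts]

    route("account:7").put(("put", "account:7", 1200))
    print(outbox.get())                      # ('ok', 'account:7')
    route("account:7").put(("get", "account:7", None))
    print(outbox.get())                      # ('value', 1200)

    for q in inboxes:
        q.put(None)                          # shut the workers down
```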