Debates still rage...
almost to the point of dogmatism.
Many of us still take pride in being able to tune up a database solution, and aspire to join the 1TB club...
For those of us who were around in the industry during the mid-to-late 80s, it is interesting to think back to a time when vendors of relational database management systems (RDBMS) were struggling to be taken seriously. The line-up then was DB2, Ingres, Informix, Oracle and Sybase, and back in those days RDBMS specialists …
The author seems to remember only Sybase.
But the real technology leader was Informix.
Starting with cooked files, SE or Standard Engine was the tank of all databases. It ran and ran well. Sure, it had its limitations, but they were mainly OS limitations on how large a file could get.
Yet the databases morphed and exceeded those limitations. (Turbo brought the use of raw devices.) Online was more stable. Then Informix rethought the engine in Online 7, which is still in use today, a testimony to its engineering.
Of course, when you think Informix, you have to think of Phil White and the "cooking of the books". (Blame that on the Illustra deal.) But Phil was a visionary and saw that the future was OR (Object Relational), or extensibility.
It took a while, but even after the IBM acquisition, you now have IDS 10 going on 11, with new features and better performance.
The point is that if you follow the evolution of IDS and Informix, you will see that the database is not a commodity. How you access a database may make you think it's a commodity, but then again, that would be like saying that driving a car makes a car a commodity. (Lotus vs Yugo? Or a semi truck/lorry vs a VW Bug?)
The evolution exists because as hardware platforms evolve, so too will the requirements on the database.
If you want to compare Informix's evolution to Sybase, you can look at how Sybase attempted to provide some of the same features in their Adaptive Server. Only they failed: if a poor excuse for a programmer wrote inefficient code, it would have a negative impact on the entire engine.
As to Oracle... some of their garbage works well enough to satisfy the marketing droids... ;-)
But back to the story. ALL of the major manufacturers evolved. That evolution either continues or the product becomes extinct. Anybody remember Progress? They still exist because they morphed.
To try and compare MySQL to these engines is like comparing a Ruger 10/22 (a .22 cal rifle) to a Springfield Arms M1A1 (.308 cal). Both do the same thing (put a piece of "lead" down range), but the M1A1 will do it farther and more accurately, with more down-range energy.
Bill Gates and Larry Ellison wanted the world to believe that the RDBMS was a commodity because it gave their products a chance to compete. And there is some truth to that. All of the tier one and tier two players can perform well enough to handle the bulk of the tasks asked by businesses. But they fall short of the leading edge requirements.
And kudos to RichardB's comment. Those of us who still support multiple databases will always have our favorite. ;-)
-G
A couple of random thoughts:
1) Databases aren't interchangeable, as many programmers tend to use every last feature in the book - and the vendors want you to do that, as it creates lock-in. So moving from one database to another can, for many applications, be a painful experience and involves a lot of retesting (see the sketch after this list).
2) Resilience and 24/7 operations just aren't there yet. We've seen the "unbreakable" just lock up for no reason, or corrupt its own data pages.
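As a rough illustration of point 1: even something as basic as "give me the first ten rows" is spelled differently across the major engines, so any application that builds its own SQL ends up carrying dialect-specific code. The dialect syntax below is standard; the helper function itself is just a hypothetical sketch.

```python
# Hypothetical sketch: the same "first N rows" request written for
# three common dialect families. Each branch uses well-known syntax,
# but the helper itself is invented for illustration.
def first_n_rows(table: str, n: int, dialect: str) -> str:
    if dialect in ("mysql", "postgresql", "sqlite"):
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    if dialect == "oracle":  # classic pre-12c style
        return f"SELECT * FROM {table} WHERE ROWNUM <= {n}"
    raise ValueError(f"unknown dialect: {dialect}")

for d in ("mysql", "sqlserver", "oracle"):
    print(d, "->", first_n_rows("orders", 10, d))
```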
That said, for many users and many applications, databases are fast becoming a commodity. Download MySQL and you can be up and running pretty quickly.
Alan.
IMHO one of the biggest problems today for RDBMSs is the fact that many people still view them as a black box and believe that there is no need to understand the underlying engine - they do treat it as a commodity.
Usually this comes with the misguided notion that they will be able to write once, deploy anywhere, regardless of the underlying RDBMS. This is generally, however, a recipe for disaster. There is an interesting blog post on this sort of subject from Tom Kyte, a prominent member of the Oracle community, here:
http://tkyte.blogspot.com/2007/02/how-to-scale.html
The other misconception is actively avoiding or re-inventing features available in a given RDBMS - this is generally a worthless exercise, as you will end up with a system that will not scale. I would argue that in general it doesn't so much matter which database you are using, but you must know how to get the best out of it. So whilst I work with Oracle, I don't presuppose that I know how SQL Server/DB2 does the same thing. That is not to say that they are inferior (they may or may not be - I don't know), just that each does it slightly differently, and it is easier/better to work with a particular implementation rather than re-invent the wheel.
So is the database a commodity? Yes and no, as you say in your article there are some very capable "free" databases (not just MySQL, but also from Oracle with Oracle XE). In the case of XE this IS Oracle 10g, with some of the features pared down and some restrictions on sizing, for free. So you can become productive and achieve results in a commodity fashion.
However, to get true scalability, RDBMSs simply aren't a commodity; it takes skill and practice to get the best out of them.
Regards,
Gareth.
This is largely application dependent. The web age has ushered in a whole new group of developers, users and applications on the back of the LAMP stack, happy to come up with quick and dirty solutions and solve the attendant problems as they arise. Unfortunately the combination of PHP and MySQL (MyISAM tables) leads to some very unintelligent programming: MySQL does not manage relations and does not enforce data integrity, and PHP's database API is extremely DB-specific, with additional libraries required for DB abstraction. Passing statements and parameters separately from one another has only recently been discovered.
Aside from those issues, many applications have benefited directly from the incredible increases in performance that cheaper and faster memory has given us, and have as such not had to address underlying architectural issues. This is by and large good, as applications should be optimised for performance after the modelling has been done, and a good developer should be able to decide when optimisations will do the trick or when new hardware is required.
But could you please also mention PostgreSQL as the real heavyweight free RDBMS - the BSD licence makes it even more attractive for large-scale developments, and it is often significantly faster than MySQL (MyISAM) on queries involving several JOINs, where table-locking can quickly become a factor. Fabian Pascal's website, http://www.dbdebunk.org, is always worth a look to see how far current developer perceptions differ from accepted good practice.
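On the point about passing statements and parameters separately: a minimal sketch, using Python's stdlib sqlite3 module as a stand-in for a MySQL driver (the table and data are invented for illustration), of why parameterised statements beat string concatenation.

```python
# Minimal sketch of "passing statements and parameters separately";
# sqlite3 stands in for MySQL here, but the idea is the same for any
# parameterised database API.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("O'Brien",))

# Risky: building the SQL by string concatenation breaks on the quote
# in the name and invites SQL injection.
#   conn.execute("SELECT * FROM users WHERE name = '" + name + "'")

# Safe: the statement and its parameters travel separately.
name = "O'Brien"
rows = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
print(rows)
```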
Aside from that, one major issue preventing commoditisation is SQL itself as an intermediary between programming concepts (usually objects) and storage. Not only are there the problems of the various dialects to consider, but the very idea of generating SQL for fairly simple mathematical concepts, as opposed to a standard RDBMS algebra, encourages DB-specific customisations or optimisations: vendors encouraging the use of non-portable "stored procedures" for performance, but also for lock-in. Obviously with text-indexing and geospatial stuff one is outwith the commodity. But for a system that allows the transparent storage of objects, accessible either directly as such or via SQL when mere mortals need to write their own queries, FirstSQL/J is worth mentioning.
The dramatic drop in price of server-class hardware has forced Oracle, IBM and Microsoft to address the developer market differently and provide their castrated products for free, in the hope that once scalability is required developers will be willing and able to pay for it, or in selling services and know-how to the high-performance crowd. This is probably good for the market as a whole, as it enables developers to make a more informed choice about development before they start work (rather than using the cheapest available and selling prototypes as finished products).
Unfortunately the industry's continuing need to differentiate by (useless) features is a more or less blatant attempt to hold off commodification. XML storage is probably the worst offender here: why anybody would wish to embed hierarchical data structures, written in an extremely sluggish data-exchange format, on the wrong side of the data abstraction is simply beyond me. The real differentiation that they have discovered is the whole software stack and attendant infrastructure: OS, RDBMS, application server, clustering and storage solution. Nice to see Sun adopting PostgreSQL in there!
I would have to disagree with your first point.
Many larger ERP app producers refused to take advantage of advanced features that would only work on one RDBMS platform.
They believed that this would increase the cost of their support since they'd have to account for it and write special releases.
Your second point? There's a very, very large US retailer that runs IBM's IDS (Informix) with only a handful of DBAs and support staff.
Also there's rumored to be a contest to see who's got the longest uptime of an IDS engine running.
So I would beg to differ as to the "lights out" issue.
And going back to Informix's Online 5 and Standard Engine: there are a couple of shops still running them without problems.
This is mostly due to the simplicity of the engine as well as the database needs of the application.
You can download MySQL, but it's not really free. Just try putting it into a production environment.
You want a free database? Look at Cloudscape/Derby/JavaDB (all the same product, just named differently under IBM/Apache/Sun).
Not to be an Informix bigot (I do support Derby, Sybase and Oracle too), but Informix has a loyal following because of its incredible uptime and ease of support.
Yes, you are correct.
Application developers want to treat the RDBMS as a black box. Write the application and treat the database as a persistence object store. (Something we've heard many times before.) And yes, this attitude doesn't scale.
Application developers need to also become Logical DBAs to get the most out of the engine.
As to "free" Oracle, you get what you pay for. And like their Unbreakable ads, I'd caution anyone who migrates to Oracle because of the "free" clause, to read the fine print.
And just a note: IBM has a "free" version of DB2. But they don't have one for IDS. Now why is that? ;-)
(Methinks that true DBAs, and shops that understand the true TCO, are willing to pay for a product that will make their lives easier and reduce their actual TCO.)
But hey, what do I know? I support them all, because the real money is in making the application fly regardless of the underlying database. ;-)
-G
First question is "what is a commodity?". A price of zero for some versions is a good hint, but the other requirement is that alternatives can be substituted and provide the same functionality.
Basic SQL compatibility has come a long way, so with care, and for simple applications, yes, the core RDBMS is now a commodity. For power users it's not the case, though.
Ideally in tech development, today's value-add is tomorrow's commodity. But vendors want to differentiate themselves in the marketplace and to lock lucrative customers into their platform, while power users have demands for performance and complexity, so we get feature sprawl.
Vendors are motivated to innovate, great, but also to differentiate contrary to commoditisation. The market may demand interoperability, but it'll be slow in coming, and with each industry standard adopted expect yet another set of vendor-specific features.
What gets my goat is where the attempt to add value encourages bad practice. Stored procedures with prefab execution plans reduce CO2 emissions, spiffing, let me feel cool bark pressing against my cheek (matron!), but don't overdo application code in the data storage layer.
And xml in clobs? Afraid I'm with dbdebunk on this, the proven relational paradigm should ideally be implemented in documents, or at least have proper translation at the app code layer for transmission and representation, but don't drag the rdbms back to hierarchical just because the W3C have a hierarchical document world view.
At least one good thing will come of the fiasco, xml document standards will eventually yield equivalent relational standards in the rdbms, then the value add will be relational documents avoiding the hierarchical overhead. Maybe after that and with standard SP languages we could expect full commoditisation with services being the value add. Tweak my cluster Larry-'san, here's a kimono for your trouble.
Databases have been a commodity for about 10 years now - anyone who wants one can pick one up for a song - or less.
That being said, there's still a lot of differentiation at the upper levels, and aside from a few basic standards - SQL, BerkeleyDB, LDAP - there is a surprising lack of consistency for an area otherwise as mature as this one.
Of course, not all databases are created equal: there are things that hierarchical databases can do that relational databases can't, things that relational databases can do that hierarchical ones can't, and things that object oriented databases can do that neither hierarchical nor relational databases can do. And I'm sure there are things that hierarchical or relational databases can do easily that object oriented databases either cannot do at all, or cannot do easily or efficiently.
Of course, this also depends on what one means by 'commodity'. To me, it means one can easily acquire the item fairly cheaply, because competition has driven most of the profit out of the business - at least for the basic product. It does not mean that one can simply switch out a different product and get the same results - that can only be had by standardization, a completely separate maturity process. It also does not mean there are no premium versions available - machine screws are clearly a commodity, but titanium-plated machine screws were not, last I checked (of course, it's been a while since I've checked).
One of the best treatments on the subject.
http://www.houseabsolute.com/presentations/sql-is-not-relational/slide16.html
None of the products listed in this discussion implement the relational model as laid out by Codd in his seminal paper "A Relational Model of Data for Large Shared Data Banks". If we are going to discuss 'relational databases', we'll have to stop talking about SQL databases.
The answer is - yes, and no.
The features and flaws of each database product are truly significant in my view. But the vendors have another problem.
The average developer knows very little about the features. It has got to a point where existing functions can be rebranded and re-released to the developer community - somewhat in the manner of old Disney films.
In the dark days of client/server, when the protocol between client and server was SQL, developers had to have at least a grounding in SQL and some idea of the consequences of their queries. I have noticed that many of today's developers have less than a basic knowledge of the database - more than once I've seen result sets pulled into memory and then sorted, or pulled into memory so that the program can iterate through and discard the records (objects) that are not wanted. So while today's SQL engines may be more powerful than ever, they are frequently used in only the most naive way, and unfortunately, when an application has performance problems, the first places to look are the database and the queries. On the upside, the presence of JavaDB/Derby/Cloudscape in the JDK may lead to developers getting more awareness of the niceties of data access - or is that a triumph of hope over experience?
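A small sketch of that anti-pattern, using Python's sqlite3 and an invented orders table, contrasting "pull everything back and filter in application code" with letting the engine do the work:

```python
# Hedged sketch: the table, columns and data below are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, "open" if i % 3 else "closed", i * 1.5) for i in range(1000)])

# Naive: fetch every row, then discard and sort in memory.
rows = conn.execute("SELECT id, status, total FROM orders").fetchall()
open_orders = sorted((r for r in rows if r[1] == "open"),
                     key=lambda r: r[2], reverse=True)[:10]

# Letting the database do it: only ten rows ever leave the engine.
top_open = conn.execute(
    "SELECT id, status, total FROM orders "
    "WHERE status = 'open' ORDER BY total DESC LIMIT 10").fetchall()

assert open_orders == top_open
```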
As far as I'm concerned, the DB layer has become a commodity. When I develop I use MySQL to store my objects, and when I deploy to production I might use Oracle or DB2 or whatever is around and supported by the ORM tool I happen to use.
And don't tell me I have to know what lies underneath because it's like telling me I need to know Ethernet to build a network. It's not true. All I need is a compatible network adapter and hub.
It's the same with the DB. As far as I'm concerned, all I need is a DB that is supported. The ORM Tool will do the rest.
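For what it's worth, a rough sketch of that workflow using SQLAlchemy as a stand-in for whichever ORM tool happens to be in use (it requires SQLAlchemy installed, and the production URL is a made-up example): only the connection string changes between development and deployment.

```python
# Rough sketch, assuming SQLAlchemy is available; only the URL changes
# between environments, the application code stays the same.
from sqlalchemy import create_engine, text

DEV_URL = "sqlite:///:memory:"                         # local development
PROD_URL = "oracle+cx_oracle://user:pw@prod-host/SID"  # hypothetical production DSN

engine = create_engine(DEV_URL)  # swap in PROD_URL when deploying
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```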
Now I agree that if you want the solution to scale you need specialized knowledge, and performance tuning does not come free either. Ever tried to build a network for 1,000 users? I haven't, and I wouldn't try, because I lack the knowledge to build the infrastructure. But as far as I know, network connectivity has become a commodity.
To make the point one more time:
it's like pizza. Making pizza for the family is easy. Try making pizza for a crowd of 1,000 people...
Cheers, Micha
As Micha Roon says... "Making pizza for the family is easy. Try making pizza for a crowd of 1,000 people..."
You don't have to go that far.
I have a friend who is a chef and owns a restaurant. Cooking for family is easy. Try cooking in a restaurant, even just for 16 people. (You're not cooking the same thing at the same time.)
And that's the thing. Anyone can write an application and tie it to the database.
It may run ok, and the database may be free for personal use.
Now ramp it up for production and you'll have to consider licensing issues, how much horsepower it takes, and how far you can grow before you run out of resources. Simple things like indexes, which a DBA groks, programmers don't.
Sure, you can have Hibernate. But what happens when your main table contains more than 200,000 rows and you don't have an index? Or you have too many indexes and the query optimizer chooses the wrong one? (Provided that your database *has* query optimization.)
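To make that concrete, a small sketch with Python's sqlite3 standing in for a bigger engine (the table and column names are invented): the query plan flips from a full table scan to an index search once the index exists.

```python
# Hedged illustration: 200,000 rows, then the same query before and
# after an index is created, showing the plan the engine chooses.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO main_table VALUES (?, ?, ?)",
                 ((i, i % 5000, i * 0.1) for i in range(200_000)))

query = "SELECT * FROM main_table WHERE customer_id = 42"

# Without an index the engine has no choice but a full table scan.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

conn.execute("CREATE INDEX idx_customer ON main_table (customer_id)")
# With the index, the plan switches to an index search.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```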
TANSTAAFL applies.
Databases are not a commodity.
Just ask a certain large retailer why they chose their platform....
I entirely agree with Micha.
But this all depends on the application being developed. I get the impression that nearly all of the DBAs discussing the merits of one DB or another are missing the point in terms of most database-driven applications developed today. That is, I would estimate that at least 90% of such applications will never need any specialist feature from the big DB manufacturers, because they will never experience such huge demand. I'm talking about small agencies developing small, low-volume websites for many different clients. Such clients couldn't care less about the DB. For such developers, systems such as Hibernate, Ruby on Rails or SubSonic are a godsend precisely because they cut out the entire DB layer. Sure, Yahoo or Amazon will never be able to run their systems using off-the-shelf ORM tools, but then most applications are not Yahoo or Amazon.
Where I work I am the only developer, which means I have to do everything from front-end HTML and JavaScript to DB administration. I am not an expert in any of these but am sufficiently capable to work well in all of these areas. We use MySQL and it is perfect for our needs because a) it is cheap and b) it is quick (or at least quick enough). We may soon be moving to MS SQL Server because this integrates better with the .Net framework we use. Three years ago we simply, honestly, didn't have the cash to spend 5k on a DB. Now we do. What I am trying to illustrate here is that the needs and possibilities of the developer and his client are the biggest factor in making these decisions.
If you have the cash to employ a full time DBA then obviously you'd better make sure you are using the most appropriate DB. Until such a time the question is not relevant in my view. And if you run into DB optimisation issues before you can employ a full time DBA then there is something wrong with your business plan!
Robin
MIT professor Michael Stonebraker (who architected the INGRES relational DBMS and the object-relational DBMS POSTGRES) believes that DBMS users with modest requirements are likely well served by using one of the popular open source DBMSs. Hence, at the low end, he expects open-source DBMSs to take over.
But he notes that there are several markets where DBMS requirements are going up faster than hardware is getting cheap. Hence, the DBMS problem in these markets is getting harder, not easier, over time.
For more on this topic, see his paper “One Size Fits All: Part 2 – Benchmarking Results." You can find this paper here: http://www.cidrdb.org/cidr2007/papers/cidr07p20.pdf
One example discussed at length in the paper is data warehousing. He notes that business analysts have a seemingly inexhaustible thirst for making business decisions based on more and more data. So warehouse size is going up much faster than disks are getting cheaper.
Also, query complexity appears to go up as the square of warehouse size. Hence, warehouse administrators are in more pain over time, not less.
There are a variety of warehouse-specific code lines that offer one to two orders of magnitude better performance than the "one size fits all" systems from the open source vendors.
Other markets where a specialized architecture will yield comparable advantage include streaming data, scientific and intelligence applications, and text.
Professor Stonebraker will present a paper at the VLDB 2007 conference in September that will argue the same point for the OLTP market.
Hence, whenever a user cares about 1-2 orders of magnitude in performance, he will not be a candidate for an open source product.