* Posts by PeterCorless

2 publicly visible posts • joined 23 Jul 2020

Database consolidation is a server gain. Storage vendors should butt out


Database consolidation? More like a Cambrian explosion.

I'll agree with Mike the FlyingRat that "I don't know if the question is being framed properly." The whole question seemed fuzzy enough that "it depends" is the only right answer.

For specialized processing of data you are seeing a practical Cambrian explosion of different databases. Sure, you have your standard RDBMS for backoffice and operations work (ERP). But aside from that, a number of different SQL and/or NoSQL systems are being spun out into a constellation of special-purpose transactional and analytics platforms.

Just an example:

1. An IIoT NoSQL time series database (OLTP) that tracks all of a company's products in the field. You have that just because the RDBMS cannot deal with the direct rate of ingestion.

2. An Apache Spark analytics cluster that draws quality assurance results (OLAP) from the above time series database.

Now, this second database *might* be consolidated with #1 above if you use something like ScyllaDB, which allows workload prioritization. It requires a slightly larger consolidated cluster. It also means that you have to use the right tools to draw information and inferences out of your NoSQL database. Many data scientists would still rather get the data from a Spark cluster than do CQL queries. It's just more natural for them.

Then again, rather than Spark, you might have an Elastic cluster to do free-text searches. What *sort* of queries are you trying to do? Set, repeatable batch analytics, streaming analytics, or free text ad hoc queries?

Lacking methods for consolidation, you need to fall back on lambda architectures where you have a speed layer for OLTP (#1 above) and a serving layer for OLAP.
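To make the lambda idea concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions: the class name `LambdaSketch`, the per-device counting workload, and the in-memory structures are all hypothetical stand-ins for real systems (a time series database for ingestion, a Spark job for the batch recompute, and so on).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LambdaSketch:
    """Toy lambda architecture counting readings per device (hypothetical workload)."""

    # Append-only log of raw events: the source of truth for batch recomputes.
    master_log: list = field(default_factory=list)
    # Precomputed view, rebuilt from the full log by the batch layer.
    batch_view: dict = field(default_factory=dict)
    # Incremental view covering only events since the last batch run.
    speed_view: dict = field(default_factory=lambda: defaultdict(int))

    def ingest(self, device_id: str) -> None:
        """Speed layer: record the event and update the real-time view at once."""
        self.master_log.append(device_id)
        self.speed_view[device_id] += 1

    def run_batch(self) -> None:
        """Batch layer: recompute the view from scratch, then reset the speed view."""
        counts = defaultdict(int)
        for device_id in self.master_log:
            counts[device_id] += 1
        self.batch_view = dict(counts)
        self.speed_view.clear()

    def query(self, device_id: str) -> int:
        """Serving layer: merge the batch view with whatever the speed layer has seen since."""
        return self.batch_view.get(device_id, 0) + self.speed_view.get(device_id, 0)


pipeline = LambdaSketch()
for device in ["pump-1", "pump-1", "valve-7"]:
    pipeline.ingest(device)
before_batch = pipeline.query("pump-1")   # answered by the speed view alone
pipeline.run_batch()
pipeline.ingest("pump-1")
after_batch = pipeline.query("pump-1")    # batch view plus one fresh event
```

The point of the split is that the ingest path never waits on the heavy recompute, and the analytics side never has to hit the ingest store directly. You pay for it by running (at least) two systems.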

And also, as a reminder, there's no way your standard ERP system is keeping up with the raw rate of ingestion and analytics of IIoT. And no way the CFO is going to let quarter close be impacted because someone's trying to run an ad hoc data query on the ERP system. It is going to be a second and maybe a third or even fourth system, though information can flow between them all if and as needed.

There can and will be multiple other systems out there. Real-time adtech/martech systems. These might have their own RDBMS, separate from your central corporate ERP, with a subset of user data.

Another part might be a graph data model, totally orthogonal to your RDBMS, to track 360° total touch of users across multiple social networks with your brand. The size of that graph alone would make your typical ERP DBAs grimace if you suggested trying to store those nodes and edges in your ERP. ("How?!?" they'll ask.) Hint: SQL ≠ Gremlin/TinkerPop.

Yet another system might be a shopping cart system for your website. Sure, it might tie in closely to your ERP system, but for the purposes of keeping timeouts and abandonment low, it's built on a separate system to remain closer to your website than your back office. It will, of course, need to be able to traverse the firewall to get those orders into your ERP system for fulfillment. It may be SQL or NoSQL, depending on your use case.

Another database, or even a cluster of databases, may be used specifically for your R&D processes that are *not* the same as your corporate ERP system. There is NO WAY engineering wants to have to rely on a "please-mother-may-I" game with IT. They're going to do their own. You cannot stop them. The rest of the corporation won't even know what the acronyms and code words for these servers stand for. ("Good," the engineers will say.)

Other database(s) will track your own internal infrastructure. Every server and network device on your net will consolidate its logs for your CIO's needs. (Think Kafka feeding a NoSQL key-value store for log consolidation. For real-time uptime, latencies and throughput, think Prometheus or Datadog for metrics, etc.)

And so on.

All of this is just your *on-prem* infrastructure. But what percentage of your infrastructure is now on the cloud? How much do you really know about what back end you are running on when you are getting the equivalent of a monthly time-share in a virtual server, or you are running APIs that are completely serverless anyway?

Every time someone says that there will be "one database to rule them all," I keep thinking, "Okay Sauron. That's just not how corporations work. Not even in 2000. Certainly not 2020." If anything, there's a reason for the explosion of database types and classes, and they all run on their own hardware because trying to make them all "one thing" is utterly absurd. Each has its own design patterns for consistency vs. availability, data distribution vs. consolidation, and so on.

The good news is that each of the database systems your corporation does run on will get faster and, as hardware generations roll out, cheaper overall to run the same workload. But don't be fooled entirely. You may save more money if you are being licensed "per server," but frankly many licenses are calculated "per core," in which case your higher densities will actually cost *more.*
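A quick back-of-the-envelope in Python shows how the licensing model flips the result. Every number here is made up for illustration (fleet sizes, core counts, and prices are assumptions, not any vendor's price list), and it assumes the denser fleet ends up with more total cores than the one it replaces.

```python
def license_cost(servers: int, cores_per_server: int, unit_price: float, model: str) -> float:
    """Total license cost under a 'per_server' or 'per_core' model."""
    if model == "per_server":
        return servers * unit_price
    if model == "per_core":
        return servers * cores_per_server * unit_price
    raise ValueError(f"unknown licensing model: {model}")


# Hypothetical fleets running the same workload before and after consolidation.
old_fleet = {"servers": 10, "cores_per_server": 16}   # 160 cores in total
new_fleet = {"servers": 4, "cores_per_server": 64}    # 256 cores in total

PER_SERVER_PRICE = 10_000   # assumed price per licensed server
PER_CORE_PRICE = 500        # assumed price per licensed core

per_server_old = license_cost(**old_fleet, unit_price=PER_SERVER_PRICE, model="per_server")
per_server_new = license_cost(**new_fleet, unit_price=PER_SERVER_PRICE, model="per_server")
per_core_old = license_cost(**old_fleet, unit_price=PER_CORE_PRICE, model="per_core")
per_core_new = license_cost(**new_fleet, unit_price=PER_CORE_PRICE, model="per_core")

print(f"per server: {per_server_old:,} -> {per_server_new:,}")
print(f"per core:   {per_core_old:,} -> {per_core_new:,}")
```

With these assumed figures, the per-server bill drops from 100,000 to 40,000 while the per-core bill rises from 80,000 to 128,000. Same hardware refresh, opposite outcome.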

Though, of course, if your database software is open source, you will not have to care one whit about your database cost as you scale.

Lastly, churning your on-prem servers just to get to higher densities has a capital expenditure cost, which is never free. You might need to wait until you've reached the depreciation schedule on your current servers before funding opens up for new on-prem hardware. And then your boss might ask, with a hairy eyeball and a big spreadsheet, "So how does this compare to a public cloud, in terms of monthly spend?"
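One rough way to answer the boss's spreadsheet question is to spread the hardware purchase over its depreciation schedule and compare that to a cloud bill. The sketch below assumes simple straight-line depreciation, and every figure and name in it (`on_prem_monthly`, the prices, the cloud quote) is hypothetical. A real comparison would also need power, space, staffing, and egress costs.

```python
def on_prem_monthly(purchase_price: float, salvage_value: float,
                    depreciation_years: int, monthly_opex: float) -> float:
    """Straight-line monthly cost: depreciable amount spread evenly, plus running costs."""
    months = depreciation_years * 12
    return (purchase_price - salvage_value) / months + monthly_opex


# Assumed figures for illustration only.
on_prem = on_prem_monthly(
    purchase_price=360_000,   # new higher-density servers
    salvage_value=0,          # assume the kit is worthless at end of schedule
    depreciation_years=5,
    monthly_opex=2_000,       # power, cooling, support contracts
)
cloud_monthly = 9_500         # hypothetical quote for equivalent capacity

cheaper = "on-prem" if on_prem < cloud_monthly else "cloud"
print(f"on-prem: {on_prem:,.0f}/month, cloud: {cloud_monthly:,.0f}/month -> {cheaper}")
```

Change the depreciation schedule from five years to three and the answer can flip, which is exactly why the spreadsheet gets big.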

Disclosure: I work for ScyllaDB (scylladb.com). We make a monstrously fast NoSQL database, but we have to play well with an infinite number of other great systems.

NoSQL Cassandra developer community sets sights on JDK 11 as sped-up 4.0 beta finally hits the streets


Might want to check into the Urban Dictionary definitions of "muppet."

"A person who is ignorant and generally has no idea about anything."

"An alternative term for idiot or moron. Usually used in the UK to describe someone who is incompetant or gormless. Taken from the name for the popular Jim Henson Puppets."

"a person who defies explanation with regard to common sense and logic, exhubing [sic] an air of confidence that is mutually exclusive to that of their accomplishments or ability"

And those were the shareable, SFW ones.


Otherwise, congrats on getting 4.0 to beta.