Are you kidding me?
This is about five years behind the times; Fusion-io, OCZ, SanDisk and others did this years ago - talk about playing catch-up.
Multi-Version Concurrency Control (MVCC) is indeed implemented without any locking: it uses compare-and-swap (CAS) operations and time-based versioning, and versions remain in memory for as long as an existing transaction (in snapshot isolation) requires them.
SQL Server's implementation is indeed different - look up the Bw-Tree.
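Whatever the engine internals, the version-retention behaviour is visible from plain T-SQL - a minimal sketch against SQL Server's row versioning, where the database and table names are invented for illustration:

    -- Allow snapshot isolation so readers work from row versions rather than locks.
    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
    GO
    -- A snapshot reader keeps its consistent view (and therefore the versions it
    -- depends on) alive until its transaction ends; concurrent writers are not blocked.
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRAN;
    SELECT COUNT(*) FROM dbo.orders;  -- sees data as of the start of the snapshot
    COMMIT;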
The thing you are missing is that SQL Server is moving to the cloud, and the cloud - AWS and Azure certainly - is not built on SANs; it's built on commodity servers with commodity storage, with software doing the data replication for fault tolerance and distribution.
In the cloud space, buffer pool extension and in-memory OLTP can be a real help in mitigating the latencies of commodity spindle storage.
The industry is moving away from SANs, certainly in the SQL Server space, and that move is only going to accelerate in the years to come; look at what Violin have done - the embedded version of SQL Server runs as an appliance on their flash solution.
Not surprised the "I love my product" trolls are coming out.
Anyway, some facts around this: SQL Server has always had a buffer pool; data is read off storage into the buffer pool (in RAM). Buffer pool extension is simply a way of tiering data access performance within the confines of the server box, i.e. you can add an SSD (no, it doesn't need to be a PCIe flash card - I can demo it on my laptop), and it works in Standard Edition of SQL Server as well.
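For anyone who wants to try it, enabling the extension is a one-liner of T-SQL - a minimal sketch, assuming SQL Server 2014, where the file path and the 32 GB size are illustrative values rather than recommendations:

    -- Enable buffer pool extension onto a local SSD (SQL Server 2014).
    -- The path and the 32 GB size below are illustrative, not recommendations.
    ALTER SERVER CONFIGURATION
    SET BUFFER POOL EXTENSION ON
        (FILENAME = 'S:\SSDCACHE\BufferPoolExt.BPE', SIZE = 32 GB);

    -- Confirm it took effect.
    SELECT path, current_size_in_kb, state_description
    FROM sys.dm_os_buffer_pool_extension_configuration;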
The "in-memory" comes in two flavours - OLAP (column store indexing) and OLTP which is the new stuff implemented with hash and range indexing. SQL Server is playing catch up, however, unlike other it's built into the one product, so I can in one product, in one SQL statement join a normal storage based table with an in-memory table (e.g select count(*) from mynormaltable a inner join myinmemtable b on b.key = a.key. Also, the in-memory is not a separate database, it sits within the normal databases we have today but as a separate file group.
We aren't limited to "integer" keys - you can have an index on varchar columns with the new in-memory stuff; the only restriction is that the column needs a specific (BIN2) collation.
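To make that concrete, here's a minimal SQL Server 2014 sketch - the database, table and column names are invented for illustration, and the NONCLUSTERED index shown is the Bw-Tree based range index mentioned earlier:

    -- In-memory OLTP sits inside the existing database, in a memory-optimized filegroup.
    ALTER DATABASE MyDb ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE MyDb ADD FILE (NAME = imoltp_dir, FILENAME = 'S:\Data\imoltp_dir')
        TO FILEGROUP imoltp_fg;
    GO

    -- A memory-optimized table with a hash index on the key and a range (Bw-Tree)
    -- index on a varchar column; note the BIN2 collation required on indexed
    -- character columns in SQL Server 2014.
    CREATE TABLE dbo.myinmemtable
    (
        [key]    int NOT NULL
                 PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
        customer varchar(50) COLLATE Latin1_General_100_BIN2 NOT NULL
                 INDEX ix_customer NONCLUSTERED
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
    GO

    -- And the point from above: joining it to an ordinary disk-based table in one statement.
    SELECT COUNT(*)
    FROM dbo.mynormaltable a
    INNER JOIN dbo.myinmemtable b ON b.[key] = a.[key];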
With SQL Server 2014 I can go round with my Acer V5 laptop (come and see me present in Bristol next Wed :)) that cost £500 and demo buffer pool extension and the in-memory stuff; that differs entirely from other vendors, whose in-memory databases are entirely separate products, some of which only work on specialist (expensive) hardware.
Probably worth updating the article to point to this link, Chris: http://blogs.msdn.com/b/windows-embedded/archive/2014/03/18/microsoft-sql-server-2014-now-available-to-direct-oems.aspx
The SQL Server community in the UK is massive, with SQL Relay, SQL Bits, SQL Saturday and dozens of regional user groups; the reason I mention it is that there's nothing new here - Fusion-io, Violin and other storage vendors have been showing off these capabilities for the past couple of years through talks, sponsorship etc.
The real story here is that SQL Server 2014 embedded is available for OEMs; we've had SQL Server appliances for SQL Server 2012 for a while now, but built around DASD made up of spindles - the flash move is something new. Buffer pool extension has been introduced in SQL Server as a cost-effective way of sidestepping the significant cost of delivering the IOps required from a SAN architecture.
Roll on the day, eh - we are all on commodity storage in the cloud, or on embedded SQL Server in an appliance, so we no longer have to move our data across 8 or 16 Gbit connections from the SAN into the server (move the program to the data rather than the data to the program).
Age-old comment - business is NOT there for IT, it's the other way round; a lot of people in IT forget that.
Business cannot do without IT, but it can do without internal IT if that internal IT wraps itself up in empire building, politics and procrastination born of self-preservation in specific technology areas.
Ever wondered why money isn't the only reason to outsource IT?
Lol - expecting a lot of flame for the above, but I'll not bite.
Yep. Consistency is indeed done via time, which requires extreme accuracy. Atomic clocks are involved, but it does depend on what you are doing.
The other thing is that it's semi-hierarchical rather than truly relational; that helps because they can group data, which eases sync issues on updates.
Absolutely spot on; if you look at what the main cloud providers - Microsoft and Amazon - do in terms of their storage, that is exactly the approach they have taken.
Microsoft Azure is not SAN based, nor is AWS.
Hadoop is successful not just because it handles unstructured data via MapReduce, but because you can chop the data up, distribute it and have the processing run where the data is (locally), which is the exact opposite of the SAN approach, where you move the data off the SAN to the server.
It's going to take another few years, but the needs SANs meet can be more easily and cost-effectively met with software and commodity kit.
I guess once we are all on the cloud the SAN v DAS argument will be moot anyway - because the major cloud vendors ain't SAN based! :)
T
Yes - SAN has had its place, and certainly within the database area it has not always been successful because of the black-box approach that frustrates the hell out of the folk trying to manage the performance of said database - the DBAs. You don't see a SAN in the cloud, thankfully, just commodity kit and DAS - at last!
Software has evolved to provide reliable, performant distributed data processing - Hadoop, Cassandra and VoltDB to mention just three.
Hardware has evolved too - it's great to see even PCIe-based SSDs being threatened by the true end goal of persistent memory; we are now able to put terabytes of flash directly into memory sockets.
It's going to be an interesting few years to come, and hopefully that will be without SANs!
I know they have commodity SATA 3 SSDs (just ordered one this morning), but we need them in the enterprise space to drive those ridiculously high enterprise SSD prices down. Hopefully they'll pick up where OCZ hasn't really succeeded - the enterprise PCIe flash space.
In the database space IOps are key; to get those IOps I physically need many, many more hard drives than I need SSDs for decent semi-random IO at a decent latency (< 5 ms).
So for my 1 TB database I only need to buy a 1 TB PCIe card; to get the same level of IOps from hard disks I'd need dozens of them.
So comparing storage shipped between SSDs and hard drives is just wrong - it should be a mix.
T
Not sure you are agreeing with me, but my point is this - why go to the trouble of a SAN when you can achieve the goal more easily and more cheaply with PCIe-connected flash?
You don't need to worry about switches, multiple HBAs or controllers to get your throughput up.
You don't need to worry about latency or IOps.
Redundancy is easy.
Remember, as I said, I write within the context of the database space. We need raw throughput because in BI we may be processing hundreds of GiBs of data in a single query - that data is spread across the storage, which is where disk geometry starts to have an effect on latency per IO.
I don't believe the answer is SSD (SAS or FC connected); it's flash-based PCIe. However, I don't believe that will easily become a reality until commoditised distributed database platforms take hold - something we are already seeing with Hadoop to help deal with "big data".
T
As a database professional I need to quantify two things - how much data I need to store, and how many IOps the IO subsystem needs to be capable of at a realistic latency of, say, < 5 ms per IO.
As a raw comparison, say I want a 900 GiB database at 10K IOps with < 5 ms per IO; I can get a single PCIe flash card to do that (easily, and for less than £5K), and if I want redundancy I buy another (£10K in total).
What about hard drives? With 2.5" 15K drives the maximum is 300 GiB per drive and realistically around 300 IOps per drive - how many drives do I need to buy to get the IOps and redundancy I need? A significant number, so many in fact that I'd need an external array for a start - more cost, more controllers etc.
In the real world this rubbish about SSDs being more expensive per GB is just wrong; we require IOps, and in the real world the two go together. Remember, I need 900 GiB of storage space, but I'd probably need 30 drives to hit 10K IOps with RAID 1+0 at a latency of < 5 ms per IO - that is about £9K just for the disks themselves, before the two additional storage arrays to hold them, the dual controllers required and then the ongoing power and cooling....
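Putting rough arithmetic behind that - a back-of-envelope sketch where the 300 IOps per spindle comes from above, but the 30% write share is my own illustrative assumption rather than a measurement:

    -- Back-of-envelope spindle count for the 900 GiB / 10K IOps example above.
    -- 300 IOps per 15K spindle is from the post; the 30% write share is an
    -- illustrative assumption (RAID 1+0 doubles the physical cost of each write).
    DECLARE @target_iops   decimal(10,2) = 10000;
    DECLARE @iops_per_disk decimal(10,2) = 300;
    DECLARE @write_share   decimal(5,2)  = 0.30;

    SELECT CEILING(@target_iops / @iops_per_disk)                      AS spindles_read_only_bound,    -- 34
           CEILING(@target_iops * (1 + @write_share) / @iops_per_disk) AS spindles_with_write_penalty; -- 44

Either way you land in the same "dozens of spindles plus an external array" territory, versus two PCIe cards.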
T
A lot of the time people just completely miss the point about PCIe-connected flash - a single card can easily operate at 1.9 GiB/sec (ref: the OCZ RevoDrive 3 Max IOPS in the machine I'm writing this on), whereas per channel SAS 600 can only cope with around 500 MiB/sec, and what you actually get through the "interface" is dramatically less. So if you want to achieve 1.9 GiB/sec from a SAN, how do you go about it? With complexity, cost and the hope that the latency will be realistic.
[written in the context of database storage]
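To put a rough number on that aggregation problem - a sketch using only the two figures above; real arrays lose more again to controller, switch and RAID overheads:

    -- Best-case number of 500 MiB/sec SAS channels needed to match one
    -- 1.9 GiB/sec PCIe flash card (illustrative arithmetic only).
    DECLARE @card_mib_per_sec    decimal(10,2) = 1.9 * 1024;  -- 1.9 GiB/sec
    DECLARE @channel_mib_per_sec decimal(10,2) = 500;

    SELECT CEILING(@card_mib_per_sec / @channel_mib_per_sec) AS sas_channels_needed;  -- 4, flat out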
Anyway, I wonder if this WD guy was talking about growth over the past six months, which I'd expect given their factories were flooded; it only stands to reason that now they are manufacturing again there is a shortfall to fill :)
T
You are partly right, but in terms of infrastructure the principle is that the servers in the cluster are fully redundant - shared nothing, hence DAS. Connecting the machines in the cluster to a SAN, which is basically what this article is suggesting, goes completely against scale-out.
Commodity storage in the cluster nodes can be achieved with commodity SSDs, SATA or PCIe based, to get the local IOps while still being cheap enough to be disposable.
There is definitely an upsell into this space, but I'd expect that because, let's face it, we have a lot of folk with vested interests in keeping SAN technology alive.
15K rpm disks have been around for over a decade and we are still only on capacities of 300 GB for a 2.5" drive and 900 GB for a 3.5" drive; you need at least a dozen 15K rpm drives to compete with a typical SSD on a 50/50 random read/write workload, and even then the disks just can't get the data off quickly enough.
Phase Change Memory is my bet.
T
Certainly within the database space IOps at low latency matter more and more, and with a typical SAN-based disk architecture they cost more and more because of the inherently poor latency of a random workload on a 15K rpm drive.
Flash hooked into PCIe is the way to go, with resilience through software block replication to remote servers.
Roll on another five years!
T
Writing at those speeds is great and I can see lots of applications for it in my realm - the transaction log of a database, for instance, or web logs etc.
However, what about a semi- or fully-random workload - will there be latency like we already have on spinning media? Has anybody read the paper yet? http://www.nature.com/ncomms/journal/v3/n2/full/ncomms1666.html
It will be great if they can build the technology without spinning platters and head movements - basically, make it like flash :)
T
OK, you got me - 19 years of MS SQL Server experience and 13 MVP awards aside, I've tried googling the spec of the Intel SSD you've specified and it doesn't appear to exist, even on Intel's own site.
A quick check of the P410i specs shows that SAS-connected drives operate at up to 6 Gbit/sec (per channel, or per drive in plain English); however, SATA-connected drives, which is probably what you are testing with (please show me the spec otherwise), are connected at 3 Gbit/sec, which falls significantly short (max 264 MiB/sec) of what a standard SSD is capable of - the majority are now SATA 3 (6 Gbit/sec), hence the 530 MiB/sec throughput they can manage (per drive).
I hope you enjoyed my paper!
Anyway, if you would like to show us all just how almighty and experienced you are then why not blog the perfmon results and a proper spec including the correct model numbers of the drives you are using!
Tony.
@Philip Lewis: I've done a ton of testing on the OCZ Agility and IBIS drives and your numbers just don't stack up. Some figures can be found in my paper at http://www.reportingbrick.com.
I'd be interested in you posting the hardware specifications: how you are attaching the SAS disks and how the SSDs are attached - note, the SSDs will likely be SATA connected and, on say an HP controller, limited to 1.5 Gbit/sec because of the SATA revision it uses, so the overall transfer speed will be significantly less than the SSD can achieve.
Back to the article, those figures are way behind what OCZ have had out for a couple of years now.
Tony.
I did an interesting experiment on a DL360 - the disk slots will take SATA drives (regardless of protocol). I have four 10K 300 GB SAS disks in RAID 0; in the spare slot I put an OCZ Agility 3 60 GB drive in a caddy, slotted it into the server and created it as a logical disk etc. OK, it's only SATA 1, so 1.5 Gbit/sec, but it still put the RAID array to shame on physical IOps.
What I'm saying is that we have the ability to use SSDs and get the performance from them now, using existing protocols; random IOps at sub-millisecond latency is what I'm after as a database specialist.
Is it a conspiracy that the controller is limited to 1.5 Gbit/sec for SATA drives? If it were SATA 2 I'd have 300 MB/sec per drive with sub-millisecond IOps, and with SATA 3 600 MB/sec... all on one drive! That negates the need for so many 15K disks, so the vendors start losing money; ever wondered why the enterprise SSDs to go in your kit are 5x+ the price of the commodity ones that actually outperform them?
T
Certainly within the database space (Microsoft SQL Server) I'm seeing more and more businesses using PCIe-based NAND cards like Fusion-io and the OCZ VeloDrive; the cost per IO is substantially lower than the SAN-delivered equivalent (IOs per second at a latency realistic for database use, i.e. <= 3 ms per IO).
Sounds like the storage vendors are fighting back and simply turning their smart "storage" arrays into "a server with dasd" :)
T
Had a sudden drop from 1 ms per IO to 11 ms per IO (60 MB/sec to 5.5 MB/sec); another 24 hours later and it's 15 ms per IO (4 MB/sec).
Going to keep it running - it lasted longer than I thought - but it shows that in a database setting with lots of write activity wear rate is an issue. Anyway, I'll blog sometime next week once I've got all the data collated.
Tony.
Using RAID 1 isn't going to double the life: RAID 1 is mirroring, so both drives will wear (sorry, ware :)) out equally. Perhaps you were thinking of RAID 0, but with that there is no redundancy.
I've a test running now that I've stopped the drive timeouts I was getting; I'll leave it for a few weeks and see what the results are. I am capturing all the logical disk counters through perfmon, so I should see any degradation.
Also - it was two entirely different systems I was looking at, neither doing enterprise-level writes, which was my point!
I've kicked off a test using IOMeter against a 54 GB file: 8 KB 100% sequential writes with 8 outstanding IOs on an OCZ Agility 3 60 GB. It's on Windows 2008 R2 and I'm logging the logical disk counters each minute so I can see any degradation over time. I'll leave it running, come back next weekend and let you know the results.
Yer yer, my spelling is crap, but I more than make up technically for that in SQL Server ;)
Does anybody have any figures in this area? I'm actually looking at using commodity SSDs for my master's BI project.
T
It would be good if that is the case.
Just checked one of my clients' servers - a call centre, just a medium-sized business - and one of their databases has done 4 TB of writes since the middle of April (5 months). Another has done 14.5 TB since the middle of May (3 months) - and those aren't enterprise installations either. Another, 17 TB since the end of June....
So, my point is that in a database environment wear is a consideration; the biggest worry I have is that there are absolutely no publicised tests in this area to see how long they live.
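As a rough illustration of why that worries me - a sketch using just the figures above, with each month taken as roughly 30 days:

    -- Rough daily write rates implied by the figures above.
    SELECT 4.0  * 1024 / (5 * 30) AS gb_per_day_first_db,    -- ~27 GB/day
           14.5 * 1024 / (3 * 30) AS gb_per_day_second_db;   -- ~165 GB/day

Even the lower of those rates is getting on for 10 TB of writes a year, against endurance figures the consumer drive vendors rarely publish - which is exactly why tests in this area would help.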
T
What's wrong with the 4- or 8-way RAID 0? If you were doing this properly you'd buy two cards and use software mirroring.
I've also done a lot of benchmarking on 2 x 240 GB IBIS cards in that configuration and they work brilliantly; on my simple database workload I can very easily overload the cores so they can't keep up (see my blog for that and the other tests I've done so far on them: http://tinyurl.com/3d8ygeu).
I think the majority of IBIS problems come from motherboard incompatibilities caused by the lack of available BIOS memory.
T