Oracle... Memory? Servers?
Jack of all trades... master of none.
HP is using flash from Violin Memory to create Oracle Exadata-killing high-performance database systems. This follows on from the fiasco of Oracle allegedly preventing HP from submitting a good TPC-C benchmark based on an HP DL980 server enhanced with Violin Memory's flash storage. As you can see in the 62-slide deck1 HP's …
What a piece of junk. Oracle admitted that they need scale-up nodes in Exadata, so they introduced an 8-socket box in addition to their 2-socket. This just shows how f'd up Oracle is when it comes to hardware. Nehalem EX was made for a 4-socket configuration. Each chip has 4 QPI links, so it can use one to talk to I/O and the other 3 to talk to the other 3 chips in a four-socket box.
Can you put a Nehalem EX in an 8-socket configuration? Yes, but the performance is horrible. In order for chip 1 to talk to chip 8, it must use chip 3 as the communication path. Unfortunately for Oracle, Sun never invested in a "glue chip" to scale beyond the 4-socket Nehalem.
HP has "Cross-Network Connectors" (XNCs) to connect the chips in an 8-socket box, and IBM has its eX5 chips.
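The hop-count point above can be sketched with a little breadth-first search. The 4-socket wiring is straight from the comment (every chip reaches every peer directly over QPI); the 8-socket wiring below is purely illustrative — a 3-link-per-socket cube, since the exact glueless topology isn't given here:

```python
from collections import deque

def worst_case_hops(links):
    """BFS from every socket; return the worst-case hop count (graph diameter)."""
    worst = 0
    for start in links:
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in links[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

# 4-socket glueless: each chip spends 3 QPI links reaching every peer directly.
four_socket = {s: [t for t in range(4) if t != s] for s in range(4)}

# One plausible glueless 8-socket wiring (an assumption, not the real board):
# sockets sit at cube corners, linked when their numbers differ in one bit.
eight_socket = {s: [s ^ (1 << b) for b in range(3)] for s in range(8)}

print(worst_case_hops(four_socket))   # 1 hop: every peer is adjacent
print(worst_case_hops(eight_socket))  # 3 hops worst case via intermediate chips
```

With a glue chip (XNC, eX5) acting as a crossbar, every socket is again one hop from the fabric, which is the whole argument for it.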
I wonder if Larry has gotten past 100 Exadatas sold yet. He keeps talking about $2 BILLION of pipeline, but when will an analyst get the ____ to get a solid ship number? Gartner? IDC? Where are your facts?
I hear Sun SPARC is Dead on Arrival anyways.
Wow, envious much? IDC has stated the numbers are actually better than what Oracle is saying, and they expect over $4 billion by 2013 from Exadata alone.
When will Oracle's "competition" stop FUDing Exadata and start actually making a competitive product? HP just adds a bunch of SSDs, and IBM buys a low-end solution that doesn't even use their "flagship" DB2...
Show up or shut up Allison. Your FUD is very transparent and unflattering.
"SPARC is Dead on arrival" -- what does that even mean?
By my assessment the HPDBS (DL980 + Violin solution) is likely not positioned as an Exadata killer for bandwidth-sensitive DW/BI workloads; it simply doesn't have enough high-bandwidth storage plumbing. On the other hand, a single-rack Exadata only supports a scalable read:write ratio of 40:1 (their data sheet lists 1,000,000 RIOPS against 50,000 WIOPS). Actually, that 50,000 WIOPS is a gross number accounting neither for redundant writes (ASM redundancy) nor for the larger sequential writes that a transaction processing system must concurrently sustain. In other words, mileage varies (downward trend).
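The arithmetic behind that ratio is worth spelling out. The IOPS figures are the data-sheet numbers quoted above; the only assumption added here is that ASM normal redundancy keeps two copies of each block, so every host write costs two physical writes:

```python
# Single-rack Exadata data-sheet figures quoted in the comment above.
read_iops  = 1_000_000   # RIOPS
write_iops = 50_000      # gross WIOPS

print(read_iops // write_iops)  # 20:1 gross read:write ratio

# Assumption: ASM normal redundancy doubles every write (two block copies).
asm_copies = 2
effective_write_iops = write_iops // asm_copies
print(read_iops // effective_write_iops)  # 40:1 once redundant writes count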
What exactly is exadata?
I get more and more confused each day. 8 Node Database Cluster, with Spinning Media, and Flash.
I am not discounting the Exadata Software, just looking at what the elements are.
You go to Oracle's website and you buy all of these items, and you build this hardware mirror of the DB Machine. Your clustering skills are a bit rusty, so you hit the RAC SIG to get the latest tweaks.
Is this Exadata?
You say no, you have to have the Software, ok you get the software and install it.
What do you have?
As the owner of the report, the website, the application, the business unit, et al., I just see a database: not the bandwidth, not the latency, not the cluster cost or parallel execution, smart scan offloading, etc.
I just notice that what took an hour now takes less time, maybe 2x or 10x faster, etc.
Do I really care what delivered it faster?
Take a look at the HP gear: it is not reinventing the storage tier, it is simply offering something that goes faster, that delivers that report, that web session, that response faster.
Hilarious! "The infamous write cliff?" What are they talking about - consumer SSD drives from three years ago? It's such an infamous thing that they just made it up?
Maybe other people have that problem but we certainly don't. Off the top of my head I can think of one of my customers pushing 2TB/day through a 1.5TB appliance with no "write cliff".
I swear, the bigger they are, the DUMBER they are. How any of these multi-billion dollar IT firms have any credibility is beyond me...
This paper was presented in New Orleans at the Supercomputing show late last year. The results track with what my group knows to be true (and current).
So your "3 years ago" comments don't stand up, unless you think the DOE guys don't know what they are doing?
You can find their actual presentation at the SC 2010 website.
That is funny: you have 1.5TB addressable plus an additional 20% per card for write blocks, so 2TB/day is a system running at IDLE.
If you look at the other slide from NERSC, they are running a stress test that fills up the card in 10 minutes. At that point the "write cliff" is hit: write/delete I/Os start blocking other I/O operations.
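To put numbers on the "IDLE" claim, here is the back-of-envelope arithmetic, assuming decimal terabytes for both figures quoted in this exchange (2TB/day sustained vs. filling a 1.5TB card in 10 minutes):

```python
TB = 1_000_000_000_000  # decimal terabyte; an assumption about the quoted figures

daily_load  = 2 * TB / 86_400        # "2TB/day" as a continuous rate
stress_rate = 1.5 * TB / (10 * 60)   # filling a 1.5 TB card in 10 minutes

print(f"{daily_load / 1e6:.0f} MB/s sustained")  # ~23 MB/s
print(f"{stress_rate / 1e9:.1f} GB/s stress")    # 2.5 GB/s
print(f"{stress_rate / daily_load:.0f}x apart")  # ~108x
```

A workload two orders of magnitude below the stress test would indeed never reach the cliff, which is the point both sides are talking past each other on.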
I am not sure how you can fail to understand the fundamentals of NAND flash and still presume to have any credibility with which to comment on technology.
At 2TB/day, you should feel guilty that you oversold the client on gear that is not even being utilized.
These folks seem to know what they are doing and understand Flash technology. This was presented late fall 2010. They run supercomputer labs and don't seem to have any ax to grind - and they clearly found this issue common to all SSD technology.
Looks like almost all PCIe and SSD technology suffers some type of degradation under load. "Write cliff" is just what it looks like graphed.
So I'm curious about who the "DUMBER" folks are??? Data seems to be recent and fairly clear.
As for credibility..... Bring on the data.
Sent to me by TMS:
There are a few things that are not presented clearly in the write performance comparison from the slide deck that you uncovered, and I thought it might be helpful to shed some light on this. Flash chips need to have entire blocks (256 KB) written and erased at the same time. Since applications write in smaller chunks than this, there is some over-provisioned capacity, and between the host and the flash chips sits something called a Flash Translation Layer. I'll try to summarise the details of dealing with writes here. For more information, SNIA's Solid State Storage Initiative's performance test specification goes into quite a bit of depth: http://www.snia.org/forums/sssi/knowledge/education/SSS_PTS_Whitepaper_Nov2010.pdf
The Flash Translation Layer keeps an index mapping the logical address presented to the host to where a block is physically written. After a continuous small-block random write workload, eventually there are no pre-erased blocks left to write to. The flash controller then has to perform "moves" of data from a few blocks that hold a mix of valid and stale data. These moves produce a few completely full blocks and a new empty block to write to. A move operation ties up the chip and is just a little less work than a write. The amount of over-provisioned capacity determines how many background moves you must perform for each incoming write under the worst possible scenario. On the RamSan-20 datasheet http://www.ramsan.com/files/download/589 we list this number for our random writes: 50,000 IOPS. Outside of this worst case, we can achieve 160,000 write IOPS.
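A toy model (not TMS's actual controller math) shows how over-provisioning drives the move count. The 20% spare fraction and the assumption that a reclaimed block is still full of valid pages up to the spare fraction are illustrative; real controllers with smarter victim selection do better, which is consistent with TMS's 160k-peak/50k-sustained figures:

```python
def worst_case_moves_per_write(spare_fraction):
    """Toy FTL model: under steady random writes with greedy garbage
    collection, a reclaimed block still holds ~(1 - spare) valid pages,
    all of which must be moved to free up 'spare' pages of new space."""
    return (1 - spare_fraction) / spare_fraction

# Hypothetical card with 20% over-provisioned capacity:
moves = worst_case_moves_per_write(0.20)
print(moves)  # 4.0 background moves per incoming host write

# If a move costs about as much as a write, sustained throughput is roughly
# peak / (1 + moves); e.g. a 160k IOPS peak would sustain ~32k IOPS here.
print(round(160_000 / (1 + moves)))  # 32000
```

That the real sustained figure (50k) beats this pessimistic sketch matches the letter's point that a move is "a little less work than a write" and that victims are rarely worst-case full.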
In many ways the sustained number is not terribly important to real-world applications, as any time you are not doing 100% random writes a good flash controller will perform background move operations to defragment the valid and stale data. To get to the worst case, you have to randomly write across all of the capacity, and more, without ever stopping to read. While some logging applications have this type of constant write workload, they are never random.
So that brings us to the interesting question: can a flash controller be designed that doesn't have this drop-off in performance? Absolutely: you merely need to have more chips on the back end than the front end can drive. That means that for most workloads you have idle chips. Notice that in the comparison from the presentation, a 10 TB flash chassis (8 TB after over-provisioning) is being compared to the performance of single PCIe cards with 450 GB or less. If you multiply the performance of any one of those cards by the ratio of 8 TB to its capacity, every single one beats the 1.4 GB/s that is provided by Violin.
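The capacity-normalization step looks like this. The Violin figures are the ones quoted in the letter; the PCIe card's bandwidth here is a made-up placeholder (the real per-card numbers are in the SC10 slides):

```python
violin_bandwidth = 1.4   # GB/s for the 8 TB usable chassis, per the letter
violin_capacity_tb = 8.0

# Hypothetical PCIe card, purely for illustration:
card_capacity_tb = 0.45  # a 450 GB card
card_bandwidth   = 0.6   # GB/s, assumed

# Scale the card up to the chassis's capacity and compare bandwidth.
scaled = card_bandwidth * (violin_capacity_tb / card_capacity_tb)
print(f"{scaled:.1f} GB/s at equal capacity")  # ~10.7 GB/s vs Violin's 1.4
```

The argument, in other words, is that an aggregate of ~18 such cards would be the fair unit of comparison against an 8 TB chassis.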
Texas Memory Systems had to ensure we had a fast enough front end on our high-capacity RamSan-630 system to drive all of the chips that are available in 10 TB of flash (after over-provisioning). That is why we support ten QDR InfiniBand (40 Gbps each) or ten 8 Gbps FC ports and can supply over 10 GB/s of bandwidth from a single 3U system. Flash is still quite a bit more expensive than disk, so adding capacity without adding front-end performance doesn't serve customers well.
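A quick sanity check of those front-end numbers, assuming 8b/10b line encoding on QDR InfiniBand (a standard assumption for that generation, not something the letter states):

```python
def gbit_to_gbyte(gbps, encoding=8 / 10):
    """Convert a raw line rate in Gbps to usable GB/s, assuming 8b/10b encoding."""
    return gbps * encoding / 8

ib_ports, ib_rate = 10, 40  # ten QDR InfiniBand ports, 40 Gbps raw each
print(ib_ports * gbit_to_gbyte(ib_rate))  # 40.0 GB/s of aggregate port capacity
```

Forty GB/s of port capacity against a 10 GB/s flash backend leaves the front end comfortably out of the bottleneck, which is the design point the letter is making.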