Tinfoil hats on...
...said the drone operator :)
"Invista was out of the data path" - nope, not true. This product is long gone, so discussion is purely academic though.
http://en.wikipedia.org/wiki/EMC_Invista :
"Invista relies on Split-Path Architecture (SPA) to manage data flows. The intelligent switch redirects all control commands (e.g. SCSI Inquiry, Read Capacity, or Report LUNs) to a pair of redundant external controllers called Control Path Cluster (CPC), while the stream of I/O transactions flows directly from the host to the appropriate physical storage array via the intelligent switch hardware"
StorAge (also dead) was out of the data path, as the splitting of I/O was done in the host, via a special driver
"Blade enclosures ran hot because they were the wrong shape, and the fact that by simply reorienting the parts you can get the machines to have the same computing capacity in the same form factor just goes to show you that the world still need engineers."
Seriously?? Where did you get that from? Huawei marketing materials? Every Intel chip draws X watts & you can deliver n*X watts of power / cooling to a cabinet. In all modern blade designs (including Cisco & HP) this is the limiting factor for cabinet density, not how cleverly you bend the metal parts ;)
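To make the point concrete, a back-of-the-envelope check - all numbers below are illustrative assumptions, not vendor specs:

```python
# Back-of-the-envelope rack density check. All figures are made-up
# assumptions for illustration, not vendor data.

cpu_tdp_watts = 130               # assumed per-socket draw ("X watts")
sockets_per_blade = 2
rack_power_budget_watts = 20_000  # assumed deliverable power/cooling per cabinet

watts_per_blade = cpu_tdp_watts * sockets_per_blade  # ignores RAM, fans, PSU losses
max_blades = rack_power_budget_watts // watts_per_blade

print(f"Power-limited blade count per cabinet: {max_blades}")
# The cap comes from n * X watts vs. the cabinet budget, regardless
# of how cleverly the sheet metal is bent.
```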
Interesting stuff. I looked a while ago at Stratus vs. VMware. The latter offers zero downtime via the Fault Tolerance feature, which AFAIK is still limited to single-vCPU machines (any improvement in vSphere 5.1??). The former is expensive, but offers bullet-proof HA, indeed.
Hi Chris,
I will specifically pick on the P4000 idea, as I have bashed it in the past already - in a different context, admittedly, but the points raised below are perfectly applicable here:
"The HP P4000 product range utilises industry standard servers, rather than specialised storage devices. This in turn means that each storage node has a number of single points of failure (SPOFs) in the form of e.g. server motherboard, CPU and/or memory subsystem. As each server has access to its internal (or directly attached) storage only, there is no other way to deliver local HA than to mirror the data across two (or more) local storage nodes. Because storage nodes are stretched across the network, maintaining a quorum is required to issue an automated node fail-over (to avoid so called split-brain scenario where two nodes becomes active due to network isolation). So just two storage nodes are not enough to accomplish this goal – a small virtual machine called Failover Manager (FOM) has to be up and running somewhere on the local subnet for automated fail-over purposes. This obviously increases the complexity of the solution which looks simple only in marketing materials."
Regards,
Radek
If this rumour proves to be true, then NetApp in my view will bag another 'psychological' win over EMC. The unified storage message tempted EMC a while ago & now they are 'unified too' in the form of the VNX. Now it turns out that the concept of a reference architecture, i.e. FlexPod, can be more tempting than a product, i.e. Vblock. That's just speculation, though - shortly we will see what EMC is really up to!
He is talking specifically about NetApp:
“This limits the number of VMs you can provision on 31xx and 60xx filers to around 300-500 before the CPU in the filers get really hot” – that’s factually incorrect: http://www.vmware.com/files/pdf/VMware-View-50kSeatDeployment-WP-EN.pdf (50,000 VMs on ten FAS3170 clusters; each cluster is a two-controller HA pair, so 50,000 / 20 = 2,500 VMs per storage controller)
“and limits the performance of the VMs themselves due the 5-10ms latency of a typical storage request incurred by the network stack” – it varies depending on what networking gear is used, and in most mass-deployment scenarios (VDI for a typical office worker) it is irrelevant.
That being said:
- NetApp has their own server-side caching project: http://www.theregister.co.uk/2011/06/17/netapp_project_mercury/
- VMware View 4.5 (& XenDesktop 5 as well) can utilise ‘standard’ local SSD drive for caching purposes: http://www.vmware.com/go/statelessvirtualdesktopsRA; funnily enough, they have used NetApp FAS2050 for testing :)
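For the curious, the essence of such host-side SSD caching is a simple read-through cache in front of the networked filer. This Python sketch is purely illustrative (all names are made up):

```python
# Minimal read-through cache sketch: the idea behind host-side SSD
# caching for VDI. The dict stands in for the local SSD; the backend
# stands in for the filer reached over the network.

class FilerBackend:
    """Stands in for e.g. an NFS datastore on the filer."""
    def read(self, block_id):
        return f"data-{block_id}"  # pretend network read

class ReadThroughCache:
    def __init__(self, backend):
        self.backend = backend   # networked storage
        self.local_ssd = {}      # stands in for the host-local SSD

    def read(self, block_id):
        if block_id in self.local_ssd:
            return self.local_ssd[block_id]   # local hit: no network round trip
        data = self.backend.read(block_id)    # miss: fetch over the network
        self.local_ssd[block_id] = data       # populate the SSD for next time
        return data

cache = ReadThroughCache(FilerBackend())
cache.read(42)   # miss: goes over the network
cache.read(42)   # hit: served from the local SSD
```

The reason this works so well for VDI is that cloned desktops share a common base image, so the local hit rate on those shared blocks is very high.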
"the key difference to focus on is whether or not you need the top performance and reliability of a SAN and are prepared to pay the premium. If not, you need a NAS"
Really??? The author has probably missed the fact that some NAS devices can be much beefier (& much more expensive) than a mediocre SAN. A few examples can be found here:
http://www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner4.html
Hmm, interestingly enough, Oracle internally uses petabytes of NetApp storage - more precisely in their Austin DC, where they use NetApp "NAS boxes" to e.g. host Oracle On Demand.
Saying that NetApp is "just a NAS" is a plain lie (they can actually do both NAS and SAN), but interestingly enough, Oracle uses NFS on NetApp as its storage protocol of choice, because it works *best* for them.
EMC may have a lot of fancy replication tools (with even fancier acronyms), yet the plain fact is: they can't match NetApp's robust snapshotting technology, which addresses most day-to-day backup & recovery needs (including the example we are talking about here).
This is my take on this:
I am not going to blame Oracle for this misery - quite the opposite, actually.
The story unfolds:
"El Reg is told that Oracle support staff pointed the finger of blame at an EMC SAN controller but that was given the all-clear on Monday night."
However:
"Monash subsequently posted that the outage was caused by corruption in an Oracle database which stored user profiles. Four files in the database were awry and this corruption was *replicated* in the hot backup."
Sweet, isn’t it? And this is what differentiates a decent snapshot from replication – it will not inherit corruption which has just hit the primary data, because it’s read-only, full stop.
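A toy illustration of the difference (made-up code, obviously - the point is that the snapshot is immutable, while the mirror faithfully copies every write, including the bad one):

```python
# Replication mirrors every write (including a corrupting one), while a
# point-in-time snapshot is read-only and keeps the old state.

import copy

primary = {"profiles": "good data"}

mirror = copy.deepcopy(primary)    # hot backup via replication
snapshot = copy.deepcopy(primary)  # read-only point-in-time copy

def write(key, value):
    primary[key] = value
    mirror[key] = value            # replication faithfully copies it
    # the snapshot is deliberately left alone: it is immutable

write("profiles", "CORRUPTED")     # corruption hits the primary...

print(mirror["profiles"])    # 'CORRUPTED'  -> the hot backup inherited it
print(snapshot["profiles"])  # 'good data'  -> the recovery point survives
```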
And the rest is obvious:
"Recovery was accomplished by restoring the database from a Saturday night backup, and then by reapplying *874,000* transactions during the Tuesday."
So my bottom line is:
Had they had properly implemented snapshot protection in place (e.g. on NetApp storage), the extent of this outage would quite likely have been substantially smaller, as recovery to, say, an hour-old snapshot would mean replaying only a handful of logs compared with the 874,000 collected between the Saturday backup and the Monday crash...