Migrating data from storage arrays
Do you agree with Chris? Are there any other challenges or implications from shifting data off a live array?
Lots of interestingly-speculative calculations about how long it would take to move oodles of data across several interfaces locally.
But no calculation of how long it might take to move, say, 15PB of data into the cloud across the average ADSL2+ (or even fibre) link? Is it in the order of years? Lifetimes? Millennia?
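For what it's worth, the missing sum is easy to sketch. The snippet below is a back-of-the-envelope estimate, not a measurement: it assumes decimal petabytes, a link that runs flat out around the clock, and nominal rates that are my own assumptions (about 1.4 Mbit/s upstream and 24 Mbit/s downstream for ADSL2+, and 1 Gbit/s for fibre). The function name `transfer_years` and the `efficiency` parameter are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def transfer_years(data_bytes, link_bits_per_sec, efficiency=1.0):
    """Years needed to move data_bytes over a link at a sustained rate.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means no protocol overhead, contention or downtime).
    """
    if link_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / SECONDS_PER_YEAR


DATA_BYTES = 15 * 10**15  # 15 PB, using decimal petabytes

# Assumed nominal rates, not measurements. An upload uses the upstream rate.
LINKS_BITS_PER_SEC = {
    "ADSL2+ upstream (~1.4 Mbit/s)": 1.4e6,
    "ADSL2+ downstream (~24 Mbit/s)": 24e6,
    "fibre (1 Gbit/s)": 1e9,
}

for name, rate in LINKS_BITS_PER_SEC.items():
    print(f"{name}: {transfer_years(DATA_BYTES, rate):,.1f} years")
```

Under those assumptions the answer is millennia on ADSL2+ upstream and a few years on a dedicated gigabit link, before allowing for any overhead.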
Possibly the author got bored with the subject in the last few paragraphs, and thought he'd try to sneak in a bit of nonsense?
With you up to here:
"Perhaps the long-term answer is to move the data into the cloud,"
Are you not then tied to your cloud provider instead? Even tighter, in fact - moving data between big arrays in your own data centre is at least possible to do yourself, even if it takes a couple of weeks.
Depends on what the data is, but the usual approach of an initial bulk copy, followed by a final sync of the differences, works quite well in my experience. The bulk copy can take as long as it likes as it isn't final (and you can throttle it if needed to avoid affecting live services), and the diff is very fast as it has relatively little to do.
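A minimal sketch of that bulk-copy-then-diff pattern at file level, using only the Python standard library. This is an illustration rather than a migration tool: the names `needs_copy` and `sync_tree` are hypothetical, changes are detected by size and modification time only, the throttle is deliberately crude, and files deleted at the source or modified mid-copy are not handled.

```python
import os
import shutil
import time


def needs_copy(src_file, dst_file):
    """True if dst_file is missing or differs in size or modification time."""
    if not os.path.exists(dst_file):
        return True
    src_stat, dst_stat = os.stat(src_file), os.stat(dst_file)
    return (src_stat.st_size != dst_stat.st_size
            or int(src_stat.st_mtime) != int(dst_stat.st_mtime))


def sync_tree(src_root, dst_root, max_bytes_per_sec=None):
    """Copy new or changed files from src_root to dst_root; return the count."""
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for filename in filenames:
            src_file = os.path.join(dirpath, filename)
            dst_file = os.path.join(dst_dir, filename)
            if not needs_copy(src_file, dst_file):
                continue
            shutil.copy2(src_file, dst_file)  # copy2 also copies the mtime
            copied += 1
            if max_bytes_per_sec:
                # Crude throttle: pause in proportion to the bytes just copied.
                time.sleep(os.path.getsize(src_file) / max_bytes_per_sec)
    return copied
```

The idea is to call `sync_tree` once with a throttle while services are live, then again without one during the cutover window. The second pass only touches whatever changed in between, which is why it is quick.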
This has always been the most painful thing that any company puts itself through. Every 3-5 years an epic migration must take place involving all teams. It’s not the vendor’s fault... it’s just that the amount of data being stored today means that any migration is going to take a long time. You just have to accept:
1.) Migration is painful and it’s going to take a lot of time.
2.) Point 1 is going to be a lot worse if you are changing vendors as well as migrating.
Sure, storage virtualization looks great on paper, but there’s a strong degree of vendor lock-in and I don’t trust them. I’ve worked at a company that had a pricing agreement with a company providing such virtualization – all good, right? Nope. “Oh, we’ve changed the product, it’s had a few different things added to it and a name change, so it’s not covered under that pricing agreement. Please pay us $900k more than you were expecting for exactly the same licenses.”
A lot of vendors have started to give away their migration tools free with the array. This is a good move in the right direction.
Isilon is easy to migrate. Just add the new nodes to the cluster and flush data from the old to the new nodes via the backend. Literally a 5-minute job to kick off... however, see point 2: it’s only easy if you’re staying on Isilon.
Tier 1 needs to follow this route. Add an engine and new disk to a VMAX, flush the old disk and data via the backend, remove the old disk and engine. No migration teams, no data going out of the array. Why isn’t this approach more prevalent?!? The chances of you staying with a particular vendor are heavily increased the more risk and effort there is to move away.
1) Bandwidth is usually not the limiting factor, you didn't even mention IOPS!
2) Just because you don't understand networking, doesn't make it OK to make wild assumptions.
3) Who copies the ENTIRE SAN to the new storage in one go?! Very few people need to migrate a petabyte of data in one go. Generally they will migrate 1TB blocks (VMware LUNs, for instance).
4) With a virtualised infrastructure, storage vMotion can be used to control migrations.
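To put rough numbers on points 1 and 3, here is a sketch that estimates the copy time for a single 1TB LUN as the lower of two ceilings: the link rate, and spare IOPS multiplied by I/O size. The model is a simplification (it ignores caching, queueing and read/write mix), the function name `lun_copy_estimate` is hypothetical, and every figure in the example call is an assumption chosen for illustration.

```python
def lun_copy_estimate(lun_bytes, link_bits_per_sec, spare_iops, io_size_bytes):
    """Return (seconds, bottleneck) for copying one LUN.

    Throughput is modelled as the lower of the link rate and
    spare_iops * io_size_bytes, which ignores caching and queueing.
    """
    link_bytes_per_sec = link_bits_per_sec / 8
    iops_bytes_per_sec = spare_iops * io_size_bytes
    if min(link_bytes_per_sec, iops_bytes_per_sec) <= 0:
        raise ValueError("link rate, IOPS and I/O size must all be positive")
    if iops_bytes_per_sec < link_bytes_per_sec:
        return lun_bytes / iops_bytes_per_sec, "iops"
    return lun_bytes / link_bytes_per_sec, "bandwidth"


# Hypothetical figures: 1 TB LUN, 8 Gbit/s link, 5,000 spare IOPS, 64 KiB I/Os.
seconds, bottleneck = lun_copy_estimate(
    lun_bytes=10**12,
    link_bits_per_sec=8e9,
    spare_iops=5000,
    io_size_bytes=64 * 1024,
)
print(f"1 TB LUN: {seconds / 60:.0f} minutes, limited by {bottleneck}")
```

With those made-up inputs the array's spare IOPS, not the link, sets the pace, which is the point being made above. Different assumptions can flip the result.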
I agree with Lusty: typically you don't migrate an entire large array's worth of data all at once, unless it is all related to a single application (e.g., SAP). The length of time grows steeply with the number of servers/applications attached. Usually it takes some months, with the first step involving triage to prioritize servers based on:
- how easy it is to do
- how easy it is to get the approvals from the business owners, DBAs and OS admins
- how many roadblocks you get based on "these servers should be excluded because they will be replaced soon"
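The three criteria above can be turned into a simple ordering. This is only a sketch: the `Server` fields, the 1-to-5 scores and the sample inventory are all hypothetical, and real triage involves a lot more judgement than a sort key.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    ease: int           # 1 (hard) to 5 (easy) to migrate technically
    approval_ease: int  # 1 (hard) to 5 (easy) to get sign-off
    roadblocks: int     # open objections, e.g. "being replaced soon"


def triage(servers):
    """Order servers so unblocked, easy, easily approved ones go first."""
    return sorted(
        servers,
        key=lambda s: (s.roadblocks, -(s.ease + s.approval_ease), s.name),
    )


# Hypothetical inventory for illustration only.
inventory = [
    Server("erp-db", ease=2, approval_ease=1, roadblocks=2),
    Server("file-01", ease=5, approval_ease=4, roadblocks=0),
    Server("web-03", ease=4, approval_ease=4, roadblocks=0),
    Server("legacy-app", ease=3, approval_ease=3, roadblocks=1),
]

for server in triage(inventory):
    print(server.name)
```

Sorting on roadblocks first means the contested servers drift to the end of the schedule, which matches what tends to happen anyway.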
After that you have a schedule that already runs longer than you would like, and you see further delays based on how many planned migrations get rescheduled one or more times due to anything from something unrelated (there are performance issues with this application right now and we don't want to add more variables while we're trying to track it down) to something completely unrelated (it's the end of the quarter so we're instituting a company-wide change freeze for two weeks because some bozo screwed up a routine VPN change last quarter which caused the CFO's email to go down the night before the earnings announcement).
Eventually you may have that giant storage system running with just a couple of attached servers, but business owners who are politically powerful are able to keep roadblocking the storage migration because they don't see it benefitting them personally and thus don't want to take any risk, even if it is essentially zero risk. If you're lucky, you can produce figures for upper management showing how much keeping that old array on lease/support is costing each month, and that convinces them to move. If you're unlucky, you end up with that almost-but-not-quite-migrated array hanging on for a year or two longer. (I've seen this happen, with big EMC arrays that undoubtedly cost a fortune in monthly fees and were less than 5% utilized after the almost-migration.)
"Cloud" is a marketing term, primarily used by people who don't understand that it's a catch-all for the phrase "connected via a network". I suspect it derived from the '80s and early '90s textbooks describing packet switching, which were trying to teach upper-level protocols (OSI's so-called "presentation layer", mainly), without confusing people with info about the lower layers that actually transport the data.
In other words, some middle manager/marketer "took a class", was convinced he knew it all, and somehow managed to perpetrate this current meme known as "cloud computing".
Those of us who have been around for a while recognize it as it really is ... Basically, it's centralized computing, with higher bandwidth allowing GUI instead of text terminals. Mainframe technology with a glittery interface ... but mainframe technology nonetheless. It's an all-singing, all-dancing, brightly colo(u)red, dinosaur.
Barney, in other words. And targeting the same mental age group, computing-wise.
(First posted in late 2009 here)
Avere Systems takes the pain out of data migrations in NAS environments.
First, we virtualize any NAS system from any NAS vendor (e.g. NetApp, EMC/Isilon, Oracle/Sun ZFS) and join all systems into a single, global namespace.
Second, we offer a FlashMove(TM) feature that transparently moves data between NAS systems *while* the data is being accessed *without* disturbing the clients or application servers.
Third, we provide read & write caching and clustering to deliver whatever level of performance is needed by the clients and application servers. In this way, we offload the storage and hide the performance impact of the data migration activity.
For more information see: http://www.averesystems.com/Products_Software.aspx