Re: Been there, done this for both styles of migration. They all suck
l8gravely,
You concisely summarize a lot of the issues this industry has faced, but a large part of the problem is that the architectures you describe were built 15+ years ago, when a terabyte sounded big and client-server designs were prevalent, and there has been little to no innovation in this area for over a decade. I would encourage you to look at solutions one more time, as things have vastly improved. Komprise, for instance, does many of the things you yourself highlight as solutions in your comment, and addresses the scalability issues you ran into:
i) Your comment: Do updates in the background, or even drop them if the system is busy. That's what late night scanning during quiet times are for.
This is exactly what Komprise does: it adaptively analyzes and manages data in the background, staying out of the metadata and data paths, and throttles itself down as needed to be non-intrusive.
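The general pattern here is a scanner that watches how busy the storage system is and backs off when foreground I/O needs the bandwidth. Komprise's actual implementation is proprietary; the following is only a minimal sketch of the idea, where `probe` is an assumed caller-supplied function returning a recent latency sample:

```python
import os
import time


def adaptive_scan(root, probe, busy_threshold=0.05, backoff=2.0):
    """Walk a file tree, throttling when the storage system looks busy.

    probe() returns a recent latency sample in seconds; when it exceeds
    busy_threshold the scanner sleeps between files, backing off
    exponentially so foreground I/O always wins, and speeds back up
    when the system goes quiet. (Illustrative only -- not Komprise's
    actual code.)
    """
    delay = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if probe() > busy_threshold:
                delay = max(delay, 0.01) * backoff   # busy: back off
            else:
                delay /= backoff                     # quiet: speed up
            if delay > 0.001:
                time.sleep(delay)
            yield os.path.join(dirpath, name)
```

Run during quiet hours the delay stays near zero and the walk proceeds at full speed; under load it degrades gracefully instead of competing with users.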
ii) Your comment: the key is to make restores *trivial* and *painless*. Otherwise it's not worth the hassle. And it needs to be transparent to the end users, without random stubs or broken links showing up.
Our point exactly. This is why we don't do what you said some of the prior solutions did (e.g. use proprietary interfaces to the storage systems, or proprietary stubs that create a single point of failure). Instead we use standard file-system protocols, and our links are dynamic, so they are not a single point of failure. With us, your data can move multiple times and the link does not change; if a user accidentally deletes a link, we can replace it without losing any state. You can move from one file system to another without having to port over stubs or links, and there are no databases to manage.
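The difference between a static stub and a dynamic link can be illustrated with a toy model: a stub bakes the target path in at creation time, while a dynamic link stores only a stable file identifier that is resolved to the current location at access time. This is an illustrative sketch of the general pattern, not Komprise's actual design; the names and the dict-based resolver are invented for the example:

```python
class DynamicLink:
    """Stores only a stable file id; the current location is looked up
    at access time, so the link survives moves and can be recreated
    from the id alone. (Toy model, not Komprise's implementation.)"""

    def __init__(self, file_id, resolver):
        self.file_id = file_id
        self.resolver = resolver

    def open_target(self):
        return self.resolver(self.file_id)   # resolve current location


class StaticStub:
    """Bakes in the target path at creation time -- goes stale the
    moment the data moves again."""

    def __init__(self, target_path):
        self.target_path = target_path

    def open_target(self):
        return self.target_path


# The resolver's view changes when data moves; existing links keep working.
locations = {"file-42": "/tier2/archive/report.pdf"}
link = DynamicLink("file-42", resolver=locations.get)
stub = StaticStub("/tier2/archive/report.pdf")
locations["file-42"] = "/tier3/cold/report.pdf"   # data moves a second time
```

After the move, `link.open_target()` follows the data to `/tier3/cold/report.pdf`, while the stub still points at the old tier-2 path.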
iii) You mentioned that traditional solutions did not scale well - either for stub access or for scanning.
The traditional architecture is client-server: a central database holds the metadata, and that database becomes the scaling bottleneck. As you point out, today's data routinely spans millions to billions of files scattered across different shares, which is why a distributed architecture is required to manage data at this scale. Komprise uses a lightweight, distributed architecture of virtual machines: you simply add Komprise Observer VMs to an environment, and they dynamically rebalance and distribute the analysis and data-management workload among themselves. Just as Google search scales horizontally, Komprise scales without a central database, without in-memory state that is costly to maintain and recreate, and without a master-slave client-server architecture. This approach lets us scale seamlessly to hundreds of petabytes of data and more, and going from POC to production simply involves adding more Observers.
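One standard way to distribute work across a pool of nodes without a central database is consistent hashing: each share's owner is computed from a hash, so any node can answer "who handles this share?" independently, and adding a node remaps only a fraction of the shares. This sketch shows the general technique only; it is not Komprise's actual rebalancing algorithm, and the class and observer names are invented:

```python
import hashlib
from bisect import bisect


class ObserverRing:
    """Toy consistent-hash ring: shares are assigned to observer VMs by
    hash position. Assignment is a pure function of the member list, so
    no central database tracks it, and adding an observer moves only a
    fraction of the shares. (Illustrative pattern only.)"""

    def __init__(self, observers, vnodes=64):
        # Place several virtual nodes per observer for even spread.
        self.ring = sorted(
            (self._h(f"{o}:{i}"), o)
            for o in observers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def owner(self, share):
        # First ring position at or after the share's hash, wrapping around.
        i = bisect(self.keys, self._h(share)) % len(self.ring)
        return self.ring[i][1]
```

With two observers, `ObserverRing(["obs1", "obs2"]).owner("/corp/eng/home")` deterministically picks one of them; rebuilding the ring with a third observer reassigns roughly a third of the shares and leaves the rest untouched.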