* Posts by komprise

6 publicly visible posts • joined 17 May 2017

IT admins hate this one trick: 'Having something look like it’s on storage, when it is not'

komprise

Re: Been there, done this for both styles of migration. They all suck

l8gravely,

You concisely summarize many of the issues this industry has faced, but a large part of the problem is that the architectures you describe were built 15+ years ago, when a terabyte sounded big and client-server designs were prevalent, and there has been little to no innovation in this area for over a decade. I would encourage you to look at solutions one more time, as things have vastly improved. Komprise, for instance, does many of the things you yourself highlight as solutions in your comment, and solves the other scalability issues you ran into:

i) Your comment: Do updates in the background, or even drop them if the system is busy. That's what late-night scanning during quiet times is for.

This is exactly what Komprise does - it adaptively analyzes and manages data in the background without getting in front of the metadata or data paths. It throttles down as needed to be non-intrusive.

ii) Your comment: the key is to make restores *trivial* and *painless*. Otherwise it's not worth the hassle. And it needs to be transparent to the end users, without random stubs or broken links showing up.

Our point exactly. This is why we don't do what you said some of the prior solutions did (e.g. use proprietary interfaces to the storage systems, or use stubs that are proprietary and create a single point of failure). Instead we use standard file system protocols and our links are dynamic so they are not a single point of failure. With us, your data can move multiple times and the link does not change. The link can be deleted by a user accidentally and we can replace it without losing any state. You can move from one file system to another without having to port over stubs or links. There are no databases to manage.

iii) You mentioned traditional solutions did not scale well - either with stub access or scanning.

The traditional architecture is client-server: a central database holds the metadata, and this becomes a scaling bottleneck. As you point out, today's data routinely spans millions to billions of files scattered across different shares, which is why a distributed architecture is required to manage data at this scale. Komprise uses a lightweight, virtual-machine-based distributed architecture: you simply add Komprise Observer virtual machines to an environment, and they dynamically rebalance and distribute the analysis and data management workload among themselves. Just as Google search scales horizontally, Komprise scales without a central database, without any in-memory state that is costly to manage and recreate, and without a master-slave client-server architecture. This approach lets us scale seamlessly to hundreds of petabytes of data; going from POC to production simply involves adding more Observers.
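To illustrate the general pattern (not Komprise's actual internals, which are proprietary): rendezvous hashing is one well-known way to spread work across peer nodes with no central database, so that adding a node rebalances only the work it takes over. The observer names and share paths below are purely hypothetical.

```python
import hashlib

def assign_share(share: str, observers: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: each share goes to the
    observer with the highest hash score. Every node can compute this
    independently, so no central metadata database is needed."""
    def score(obs: str) -> int:
        return int(hashlib.sha256(f"{obs}:{share}".encode()).hexdigest(), 16)
    return max(observers, key=score)

# Hypothetical shares and observers for illustration.
shares = [f"//nas{i}/share{j}" for i in range(3) for j in range(4)]
two_observers = ["observer-a", "observer-b"]
three_observers = two_observers + ["observer-c"]

before = {s: assign_share(s, two_observers) for s in shares}
after = {s: assign_share(s, three_observers) for s in shares}

# Adding an observer moves only the shares the new node wins; all other
# assignments are untouched, so scaling out does not reshuffle everything.
moved = [s for s in shares if before[s] != after[s]]
```

With this scheme, going from two observers to three touches only the shares reassigned to the new node, which is why scale-out stays cheap.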

komprise

Re: Great article - thanks

You bring up an excellent point of why customers shy away from storage virtualization or network virtualization solutions that front all the data. Komprise creates no lock-in - all the metadata is stored at the sources or the targets, and Komprise itself is mostly stateless. It can be removed from the picture at any point and all you have lost is some aggregate analytics.

komprise

Re: Run in the background

Aitor 1, Komprise uses the spare cycles on the NAS and typically runs at under 3.5% of the load on the NAS because of its adaptive architecture. We are in several multi-petabyte environments with nearly a hundred thousand file shares across multiple servers and heterogeneous NAS environments, and customers have never had to set QoS policies for Komprise. Most environments have some spare cycles, but finding and managing them manually is hard. Running non-disruptively in the background takes advantage of those spare cycles without human intervention.
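A minimal sketch of how a background job can cap its duty cycle at a target fraction of load. This is illustrative only: the 3.5% figure comes from the comment above, but the pacing formula is an assumption, not Komprise's actual algorithm.

```python
class AdaptiveThrottle:
    """Pace background work so busy time stays at or below a target
    fraction of total time (a simple duty-cycle limiter)."""

    def __init__(self, target_fraction: float = 0.035):
        self.target = target_fraction

    def pause_for(self, busy_seconds: float) -> float:
        # If a batch consumed `busy_seconds` of NAS time, idle long enough
        # that busy / (busy + idle) == target.
        return busy_seconds * (1 - self.target) / self.target

throttle = AdaptiveThrottle()
# A batch that took 0.1s of NAS time is followed by ~2.76s of idle,
# holding the duty cycle at 3.5%.
idle = throttle.pause_for(0.1)
duty_cycle = 0.1 / (0.1 + idle)
```

In practice a scheduler like this would also watch observed NAS latency and back off further during busy periods, which is the "adaptive" part described above.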

Newbies Komprise hope to krush data sprawl

komprise

Re: I don't understand why?

Dear Shung,

How are you managing your large number of files today? Are they all on a single storage system or multiple systems? How are you getting a view across all of the systems? Our customers are struggling with these problems - today they have no easy way of getting a single view of what is happening to their data across storage. Most know how much they spend on storage but they really don't know why. They want to leverage the cloud, flash, and other new options without disrupting existing users, applications or processes. They want to leverage multi-vendor storage options without getting locked in.

These are the problems we solve for our customers.

We provide a single view into how data is growing and being used across your storage. We move data transparently by policy without using stubs. We provide native access to data at the sources and targets without lock-in. We work across open protocols such as NFS, SMB/CIFS, REST/S3 without inserting any proprietary interfaces or agents.

Snapshots on your source work exactly as before and are not useless. When you restore from a snapshot after we have moved files, you perform the restore exactly as before and the files we moved are transparently restored. You don't have to log differences and manage any changes.

We are essentially reducing primary storage spend and reducing the cost of secondary data by transparently moving, replicating and protecting data on cheaper secondary storage of the customer's choice. We are providing a single virtual namespace across storage to easily find and search for data. And we are providing visibility across storage for intelligent data management.

If you are interested in getting more specifics, please email info@komprise.com and we would be happy to talk with you.

Krishna

komprise

The product is very simple to set up and scale - in 15 minutes, you can be up and running. Just set up the Komprise Observer virtual machine, point it at storage, and within minutes you will start seeing analysis, even on petabytes of data. Komprise is adaptive and scale-out: just add more Observers to scale the environment. All the management is centralized. Customers can sign up to try the product for free (https://www.komprise.com/try-it-now/) and assess the simplicity and savings in their own environment.

komprise

Dear anonymous poster,

The cost table shows the before and after fixed costs of each terabyte of cold data moved, over a 3-year model. Since the data is cold and has been moved out of primary storage by Komprise, it no longer carries the cost of primary storage in the After column. Instead, we count the cost of the secondary storage (e.g. AWS S3 IA) over 3 years. We agree that even though cloud or object storage is resilient and keeps multiple copies internally, you may still want a DR copy, which is why the DR cost is kept in the After column. And because there are already multiple copies on the object or cloud storage, and the data is unchanging, there is no need to repeatedly back it up. By default, the Komprise recall policy is set to move the data back if it changes - but this can be modified.
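The shape of that 3-year per-terabyte comparison can be sketched in a few lines. All prices below are illustrative assumptions, not figures from the article's cost table.

```python
# Hedged sketch of a 3-year per-TB cost model for cold data.
# Every $/TB-month figure here is an assumed placeholder.
MONTHS = 36

primary_per_tb_month = 50.0   # assumed primary NAS cost
backup_per_tb_month = 20.0    # assumed repeated-backup cost
secondary_per_tb_month = 12.8  # assumed secondary (e.g. object/cloud) cost
dr_per_tb_month = 12.8         # assumed DR copy on similar secondary storage

# Before: cold data sits on primary storage and is backed up repeatedly.
before_cost = MONTHS * (primary_per_tb_month + backup_per_tb_month)

# After: cold data lives on secondary storage plus a DR copy; because the
# data is unchanging, the repeated-backup line item is dropped.
after_cost = MONTHS * (secondary_per_tb_month + dr_per_tb_month)

savings_per_tb = before_cost - after_cost
```

The model's structure, not the placeholder prices, is the point: the After column swaps primary plus recurring backup for secondary plus a one-time-style DR copy.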

Also, these costs don't include soft dollar operational savings. Our customers have realized similar savings in their environments.

Anonymous poster, I hope this helps.

Krishna