The LUN must DIE. Are you with me, storage bods?

I know people think that storagebods are often backward thinking and hidebound by tradition and there is some truth in that. But the reality is that we can’t afford to carry on like this. Demands are such that we should grasp anything which makes our lives easier. However, we need some help with both tools and education. In …


This topic is closed for new posts.
  1. Nate Amsden

    already possible on 3PAR today

    From what I gather... I for one haven't come across anyone requesting multiple small LUNs since I worked at a company with EMC/HDS storage (which was eight years ago; obviously storage in general came in much smaller unit sizes anyway, and thin provisioning wasn't available, at least on the EMC/HDS platforms my company had). Pretty much ever since, it's been all 3PAR.

    But more to the point, you have the ability with 3PAR to do basically what you're asking -- which is provision a LUN based on performance metrics. To do it you need a software add-on called Priority Optimization, which allows you to specify IOPS and throughput limits on volumes/LUNs (or a collection of volumes). The system is supposed to respond within a second or so. There is an extra step or two to configure the QoS settings, but I figure it's still far easier than on most competing platforms.
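
To make the idea of a per-volume IOPS ceiling concrete, here is a minimal token-bucket sketch. It is an illustration only: the class name `IopsLimiter` and its parameters are hypothetical, and nothing here is claimed to reflect how Priority Optimization is implemented inside the array.

```python
class IopsLimiter:
    """Hypothetical per-volume IOPS cap using a token bucket."""

    def __init__(self, iops_limit, burst=None):
        self.rate = float(iops_limit)  # tokens added per second
        self.capacity = float(burst if burst is not None else iops_limit)
        self.tokens = self.capacity    # start with a full bucket
        self.last = 0.0                # timestamp of the previous request

    def allow(self, now):
        """Return True if an I/O arriving at time `now` (seconds) may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = IopsLimiter(iops_limit=100)
# 250 requests arriving in the same instant: only the first 100 get through.
print(sum(limiter.allow(0.0) for _ in range(250)))  # 100
# Half a second later the bucket has refilled by 50 tokens.
print(sum(limiter.allow(0.5) for _ in range(100)))  # 50
```

Time is passed in explicitly rather than read from a clock so the behaviour is easy to reason about; a real limiter would also need to handle throughput (MB/s) limits and queued rather than rejected I/O.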

    I suppose you could throw in the sub LUN auto tiering as well with adaptive optimization to make more efficient use of flash. Priority Optimization went GA in June, though was announced almost a year ago. Then you could take it further if you wanted with peer motion (which still needs more automation hooks, right now it's all manual I believe) and move workloads between arrays.

    These software add ons are not free though.

    Last I checked volume sizes were "limited" to 16T on 3PAR (used to be 2T up until a few years ago - though my vSphere 4.x hosts are still limited to 2T). I don't think that 16T is going to increase any time soon (I think HDS boasts something like 64T).

    So really the only time you need to provision multiple LUNs for a single application is if you're manually splitting things between tiers (never done this myself; all of my 3PAR arrays have always been a single tier of spindle type, though always running multiple RAID levels on the same disks) or you're facing size constraints.

  2. Jim O'Reilly


    Doesn't 9GB stem from a comment by Teradata's CTO in the mid 90's that 9GB drives were best for databases? How that turned into 9GB LUNs escapes logic, though I suspect it was a lack of real thinking.

    Explains why object stores and SSD escape some data center architects completely!

    1. Destroy All Monsters

      I don't get it.

      Googling for "9GB LUN" or "9GB Logical Disk" (the latter is probably the correct term) yields nothing.

      What am I doing wrong?

      1. Matt Bryant

        Re: Destroyed All Braincells Re: I don't get it.

        It used to be a constant war with Oracle DBAs, though I must admit they seem to have dropped that habit over the last few years. But it was actually only PostgreSQL DBAs with which I ran into the "9GB disks/LUNs" mantra. The only time I've heard it recently was when we had one of those "we last upgraded this app for Y2K, can we port it onto a new server, please, with the same database layout" nightmares.

        1. Destroy All Monsters

          Re: Destroyed All Braincells I don't get it.

          Thank you, oh knowledgeable Mutt Brilliant.

    2. Storage_Person

      Re: 9GB?

      Many, many years ago (late 90s) Solaris could only have a maximum of 5 partitions per disk (actually 7 but two were reserved). And Sybase could only handle a maximum of 2GB per partition. This resulted in a 9GB LUN being perfect: 4x2GB partitions for data and 1x1GB partition for binaries, logs and the like. The world has moved on, but many DBAs haven't.
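
The arithmetic behind that 9GB figure can be written out as a short sketch. The constants are taken from the comment above; the function name `classic_layout` is just an illustrative label.

```python
# Limits as described in the comment above (late-90s Solaris + Sybase).
USABLE_PARTITIONS = 5   # 7 partitions minus the 2 treated as reserved
MAX_PARTITION_GB = 2    # per-partition ceiling attributed to Sybase


def classic_layout(data_partitions=4, misc_gb=1):
    """Return partition sizes in GB: full-size data partitions plus one misc partition."""
    if data_partitions + 1 > USABLE_PARTITIONS:
        raise ValueError("layout needs more partitions than one disk allows")
    if misc_gb > MAX_PARTITION_GB:
        raise ValueError("misc partition exceeds the per-partition ceiling")
    return [MAX_PARTITION_GB] * data_partitions + [misc_gb]


layout = classic_layout()
lun_size_gb = sum(layout)
print(layout, lun_size_gb)  # [2, 2, 2, 2, 1] 9
```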

      1. Anonymous Coward

        Re: 9GB?

        Ermmmm - no they weren't reserved. It was a long-standing practice but it could easily be overridden.

  3. Wanda Lust

    Not Storagebods, DBbods

    It's DB bods who are the problem, persisting with that claptrap they got on their DBA education where the elements of a database need to be partitioned across different sizes and protection/performance compromises in storage. Many I have encountered, even recently, still don't get the virtualisation that a LUN brought 10+ yrs ago, never mind the virtualisation that automated storage tiering, etc, brings to the game now.


      Re: Not Storagebods, DBbods

      > Many I have encountered, even recently, still don't get the virtualisation that a LUN brought 10+ yrs ago nevermind the virtualisation that automated storage tiering, etc, brings to the game now.

      Unacceptably high latency?

    2. Lost In Clouds of Data

      Re: Not Storagebods, DBbods

      We're not all idiots - some of us have progressed from the days of going down to the stripe # of a Unix disk (ah, distant horrific memories of the early 90's).

      Right now my issue isn't disk contention (my FusionIO drive hums at a nice speed these days), it's resources forced down from the bloody host that keep killing my shit.

      Virtualization is good, I'll grant you, but it's not the most effulgent of solutions for everything when the masters dictate what resources you might be able to share with the boss's pet project.

    3. Anonymous Coward

      Re: Not Storagebods, DBbods

      Poacher turned gamekeeper here (DBA to Storage Engineer). The amount of times I have to drum this into DBAs and Server Admins (virtual or otherwise)... to use a car analogy, at the LUN level it's like trying to balance wheel bearings to control a skid - just the other day I heard a DBA reasoning for it as "we get more spindles" (you don't and in fact are making the array work double-time in the case of sub-LUN tiering or thin provisioning), whereas instead the storage array has all manner of traction control (provisioning that stripes the data across spindles anyway), stability control (tiering) and other gubbins to take care of things.

      As for the 'High Latency' comment, I countermand that with 'Bad Planning' - you're trying to override old limitations that just aren't there anymore on modern arrays.

      Incidentally, one of the first Storagebod articles I've agreed with for once.

      1. Anonymous Coward

        Re: Not Storagebods, DBbods

        I thought the Unacceptably high latency comment was the 10+ years it's taken the penny to drop...but maybe that's just my warped sense of humour.

  4. M. B.

    Can do that on Compellent as well...

    ...using Storage Profiles to determine where the data and snapshot data is stored on a per-volume basis (which tiers the data and its snapshots should reside on). It's not as polished as the 3PAR method of actually assigning thresholds but will work well for most companies.

    Alternately you can use the defaults and have all your writes come in to tier 1 RAID10 and let Data Progression re-stripe/move the data to where it needs to go over time.

    It's a bunch of disks in a pool, sorted by tier. The system writes data at the RAID level you specify to the tiers you specify on a per-volume basis, then takes a snapshot of the data and converts that on the fly as well to whatever RAID level you specify and moves the data up or down through the tiers depending on how frequently it's accessed.

    And they don't call the storage units "LUNs" either, if that makes you feel any better.

  5. Daniel B.

    Maybe it's the filesystem?

    I suspect part of the problem is also related to filesystems and how they manage large spaces. Could be an issue with some of 'em? ZFS might fix that; just get a large enough LUN and put ZFS on that. Want to "partition" for your DBA? Just make the virtual partitions on ZFS and tell them it's done. Everyone's happy, and you only get one LUN assigned by storage!
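
As a rough sketch of what those "virtual partitions" might look like, the snippet below only builds the `zfs create` command lines (with a `quota` property per dataset) and prints them; it does not run anything. The pool name `tank`, the parent dataset `db`, and the dataset names and sizes are all made-up placeholders.

```python
import shlex


def dataset_commands(pool, parent, quotas):
    """Build `zfs create` commands: one parent dataset, then one child per quota entry."""
    commands = [["zfs", "create", f"{pool}/{parent}"]]
    for name, quota in quotas.items():
        # Each child dataset gets its own quota, so it looks like a fixed-size "partition".
        commands.append(
            ["zfs", "create", "-o", f"quota={quota}", f"{pool}/{parent}/{name}"]
        )
    return commands


# Hypothetical layout a DBA might ask for.
for cmd in dataset_commands("tank", "db", {"data1": "200G", "data2": "200G", "logs": "50G"}):
    print(shlex.join(cmd))
```

Because the datasets all draw from one pool, the quotas can be changed later with `zfs set quota=...` without asking the storage team for another LUN.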

    1. Nate Amsden

      Re: Maybe it's the filesystem?

      If you're running on Solaris, sure.... Can't imagine any DBA in their right mind wanting to use ZFS on any other platform for a production DB (supportability reasons alone).

      For MySQL on Linux I saw this earlier in the year

      "[..]So I trashed the raid-10 arrays, configure JBODs and gave all those drives to ZFS (30 mirrors + spares + OS partition mirror) and I limited the ARC size to 4GB. I don’t want to start a war but ZFS performance level was less than half of xfs for the tpcc test and that’s maybe just normal."

      1. Paul Crawford

        Re: "ZFS performance level was less than half of xfs"

        There is inevitably a performance hit going to ZFS if all other things are equal due to the block checksums, etc, that it uses to guarantee higher integrity.

        However, you can often get a major boost if using SSD for the ZIL (ZFS Intent Log) as that provides fast confirmation of data commitment (so your application 'knows' that the data is saved) while also allowing ZFS to schedule the stripe write over the main storage HDD in a more efficient manner.

        Enough RAM (about 1GB per TB of storage is the rule-of-thumb) is, obviously, also an advantage. But make that ECC memory, as there is little point in accepting ZFS's slower but higher-integrity handling of the storage devices if the data can be (and occasionally is) corrupted in memory.

        Finally, be sure to run ZFS as a kernel-mode driver, not as normally done for licensing reasons as a user-space loop-back device, otherwise performance takes a major hit (one of the reasons NTFS on Linux is not so fast).

      2. Gordan

        Re: Maybe it's the filesystem?

        If you are getting hung up about performance when assessing ZFS you are completely missing the most important point. For small systems handling data that isn't particularly valuable, and where you are cash-strapped to the point where you absolutely cannot afford any more hardware, you wouldn't be using ZFS.

        If your data's correctness is of significant value, however, alternatives simply do not cut it. I suggest you read this.

        The particularly relevant quote is: "Now for comparison, take a look at, say, Greenplum’s database software, which is based on Solaris and ZFS. Greenplum has created a data-warehousing appliance consisting of a rack of 10 Thumpers (SunFire x4500s). They can scan data at a rate of one terabyte per minute. That’s a whole different deal. Now if you’re getting an uncorrectable error occurring once every 10 to 20 terabytes, that’s once every 10 to 20 minutes—which is pretty bad, actually."

      3. Anonymous Coward

        Re: Maybe it's the filesystem?

        ERmmm - perhaps you should have a look at the Oracle ZFS Storage Appliances. You don't have to be running Solaris to use them :-)

    2. Paul Crawford

      Re: "just get a large enough LUN and put ZFS on that"

      Is it not much better to do it the other way round, to use ZFS to combine bulk HDD storage and SSD write-intent log drives into a high integrity array, then use iSCSI to export a 'block device' to any application that is incapable of using a standard file-system?

      I am no expert in storage systems, but from my perspective shouldn't we be moving away from applications needing block devices (presumably an approach dating back to horribly inefficient FAT systems and the like) and using network file systems, so user+application data is stored as files, but on centrally managed and backed-up machines?

  6. Anonymous Coward

    I couldn't agree less with the entire premise of this article.

    1. Anonymous Coward

      Well there's a well-reasoned argument. Well done.

  7. Chad Steele

    150TB Luns no problem except....

    There isn't a file system out there that can handle a file system error, try doing a fsck on a 150TB volume, it takes 6 months. Storage has far far outpaced the capabilities of programmers to safely utilize it.

    1. Martin Glassborow

      Re: 150TB Luns no problem except....

      Really....we have much larger file-systems than that...and yes, on occasion we have to fsck; it generally takes no longer than a few minutes, if that. Of course, we cheat; we run clustered file-systems.

  8. Anonymous Coward

    loads of small luns

    Had a contract for a while at a large, profitable, American investment bank who buy in storage frames, give them to the storage team who would divide them into 16GB luns as part of commissioning and then force the internal customers to order storage in multiples of that. Insanity - we had servers with literally thousands of luns presented to them. Then some mgmt twat would ask me to do meaningful performance analysis on them. iostat output was epic....

    1. Anonymous Coward

      Re: loads of small luns

      > buy in storage frames, give them to the storage team who would divide them into 16GB luns as part of commissioning and then force the internal customers to order storage in multiples of that.

      Depending upon what you want to achieve that can be a very good way of increasing performance.

      I know of a bank who create RAID0 LUNs (like that) on multiple different storage arrays and then stripe the data across them at the host level (effectively RAID00), giving them amazingly fast virtual disks.

      The same kind of setup can be used to implement RAID01/10 or RAID50, RAID60 etc. Imagine spreading the IOPS through multiple high speed adapters, across multiple fabrics, through multiple storage array caches to thousands of physical disks. Replicate that kind of performance inside of a storage array (which can be managed by the array software) and you might be able to get rid of LUNs; until then they're a very useful tool for managing capacity/performance.

      LUNs provide the ability to manage storage capacity/performance beyond the storage array.
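
The host-level striping described in this comment boils down to a simple address mapping. The sketch below assumes a plain round-robin RAID0-style layout; the function name `locate_block` and the stripe size are illustrative, not taken from any particular volume manager.

```python
from collections import Counter


def locate_block(logical_block, num_luns, stripe_blocks):
    """Map a logical block to (lun_index, block_within_lun) for round-robin striping."""
    stripe_number, offset_in_stripe = divmod(logical_block, stripe_blocks)
    lun_index = stripe_number % num_luns   # stripes rotate across the LUNs
    row = stripe_number // num_luns        # how many full rotations came before
    return lun_index, row * stripe_blocks + offset_in_stripe


# Four LUNs (ideally on different arrays), 8 blocks per stripe unit.
print(locate_block(0, 4, 8))    # (0, 0)
print(locate_block(8, 4, 8))    # (1, 0)
print(locate_block(35, 4, 8))   # (0, 11)

# A sequential run of 64 blocks lands evenly on all four LUNs.
print(Counter(locate_block(b, 4, 8)[0] for b in range(64)))
```

The even spread is what lets the I/O fan out across adapters, fabrics and array caches; the same mapping also shows why losing any one LUN in a pure stripe loses the whole virtual disk.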

  9. Mostor Astrakan

    Ye gods...

    Oracle bods here are still asking for two equal copies of their transaction log archives so they can mirror them from Orrible. And of course they need to be on "Separate disks in case one of them breaks", and can I *guarantee* that?

    Dude. This is the 2010s. Disks don't break anymore.

    1. Daniel B.

      Re: Ye gods...

      > Dude. This is the 2010s. Disks don't break anymore.

      Au contraire, mon ami! Haven't you seen the awful failure rates of the newest HDDs? It is now more likely than not that you will have at least one disk failure when buying a batch of WDs or Seagates!

      1. Anonymous Coward

        Re: Ye gods...

        Well in 2009 I had the unenviable task of swapping out 1386 disks due to an increased failure rate caused by head actuator "issues", so yes - they absolutely do break! And with monotonous regularity when you have a very large collection of them.

      2. Adam White

        Re: Ye gods...

        Physical hard disk drives break. LUNs, which are now a software construct, do not.

  10. Gordan

    Layout Optimization - Humans vs. Automated Process

    "It also encourages people to believe that they can do a better job of laying out an array than an automated process."

    You can bitch about this all you like, but the fact is that no technology is available where an automated process does a better job than a good sysadmin who knows what they are doing.

    That is not to say that the automated process cannot be made to be as good - it is merely to say that storage vendors haven't created one yet. They have a vested interest in doing a crap job here - storage is generally priced per GB, so they just want to sell you more storage to make up for a shortfall in IOPS. They are also geared toward creating kit that panders to the ignorant sysadmin who doesn't understand how the storage stack works (everything between the application and the spinning rust or NAND) - and there's a reason for that, too; they advertise to the PHBs as having a product that even an idiot can administer, therefore there is no need to hire expensive people who actually know what they are doing.

    The LUNs are there for a reason. In traditional RAID, they allow you to partition your disks into RAID arrays and ensure that the expected workload's blocks (both file system and application) align appropriately with the RAID blocks. Doing your homework on this can make a very substantial difference to performance (in some cases multiples). With more advanced *AID technologies (such as ZFS) this changes somewhat, but there are still plenty of low level aspects to consider that will affect the performance.
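
To illustrate the alignment point, here is a small sketch that checks whether a partition's starting offset lines up with the RAID full stripe, and counts how many chunks a single write touches. The 64 KiB chunk size and the 4-data-disk layout are assumptions chosen for the example; the 63-sector start is the familiar legacy MBR partition offset.

```python
SECTOR = 512
KIB = 1024


def full_stripe_bytes(chunk_bytes, data_disks):
    """Size of one full stripe across the data disks (parity excluded)."""
    return chunk_bytes * data_disks


def is_aligned(partition_offset_bytes, chunk_bytes, data_disks):
    """True if the partition starts exactly on a full-stripe boundary."""
    return partition_offset_bytes % full_stripe_bytes(chunk_bytes, data_disks) == 0


def chunks_touched(start_byte, length, chunk_bytes):
    """Number of RAID chunks a write of `length` bytes at `start_byte` spans."""
    first = start_byte // chunk_bytes
    last = (start_byte + length - 1) // chunk_bytes
    return last - first + 1


chunk, disks = 64 * KIB, 4
legacy_offset = 63 * SECTOR    # 32256 bytes
modern_offset = 1024 * KIB     # 1 MiB

print(is_aligned(legacy_offset, chunk, disks))       # False
print(is_aligned(modern_offset, chunk, disks))       # True
# A chunk-sized write at the very start of the partition:
print(chunks_touched(legacy_offset, chunk, chunk))   # 2
print(chunks_touched(modern_offset, chunk, chunk))   # 1
```

With the misaligned start, every chunk-sized write straddles two chunks, which is one way a layout that looks fine on paper can cost a large share of the available performance.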

    Unfortunately, we are still a long, long way from the point where any cheap, poorly optimized system can provide any required performance, no matter how poorly managed.

  11. Dave Nicholson

    First step? Get random!

    XtremIO is built to leverage random access media. (Flash just happens to be the current shipping version of this today.) When you break free of the bondage that is "the RAID stripe", all sorts of cool things are possible.

  12. JohnMartin

    I couldnt agree more

    -Disclosure NetApp Employee-

    Oddly enough, I published a blog on Computerworld on exactly this same subject about a day before this came out ... I love zeitgeist

    It's not just the space fragmentation and management overhead that LUNs cause, but also things like queue depth management, and a stack of other stuff that most people simply can't manage effectively at large scale.

    As a case in point ..

    Q. what is the first thing you do after you carve up a LUN ? ...

    A. Put a filesystem on it

    OK, arguably you turn it into a more useful construct first by using a volume manager before you put on a filesystem, nonetheless LUNs aren't particularly useful or elegant storage containers. We've had better ways of managing and provisioning storage for quite some time now (pretty much since SCSI and LUNs were invented), but for some insane reason, we decided to repeat and persist in repeating the things that Mainframes gave up as a bad idea ages ago.


    John Martin

    1. DavidRa

      Re: I couldnt agree more

      OK I hear what you're saying, and yes:


      > Q. what is the first thing you do after you carve up a LUN ? ...

      > A. Put a filesystem on it

      True enough, but I have a followup question. Which one?

      Because the one thing I don't see addressed here is heterogeneous environments. Which filesystem can my Win2012 box running Exchange share with a Win2012 box running SQL and a RHEL box running Oracle?

      Or is everyone assuming the storage should be presented to a set of hosts running an intermediary layer and carved up there?

      Funnily enough, though, most of my arguments with storage engineers have revolved around them wanting to give me little 3 and 5 disk RAID 5 sets for each Exchange database, or insisting that RAID 5 is better in every way than RAID 10 (spindle utilisation yes, performance .... not always despite what three-letter-vendor claimed). Not sure the LUN argument has even reached those cretins yet.

      1. JohnMartin

        Re: I couldnt agree more

        Ok, at the risk of looking like I'm just here to spruik my employer's wares, my answer to your question "Which Filesystem" would of course be WAFL :-).

        Of course I'd avoid the LUN question altogether and run SQL Server, and virtualised Exchange over SMB3, and Oracle using their Direct NFS implementation. You could run all of them into the same flexible volume, but you'd probably be better advised to use different flexvols for each application. (I'm waiting for MS to certify Exchange on SMB3 directly the way they do with SQL; technically it shouldn't be a problem AFAIK).

        How would this be different than LUNs? Well, to start off with, you've avoided the steps associated with LUN and volume management, so you've saved some time there. Then you've got the advantage that you don't have to guess about how much space you need to pre-allocate, the volumes can automatically grow and shrink without any host side changes, backup is trivially easy, and there are no queue depth issues or SCSI Reserve/Release issues to worry about if you want to implement some form of clustered database tech on top of it. There's a lot more, but I don't want to sound too much like a commercial.

        FlexVols and WAFL may not be exclusive in these benefits, but they do demonstrate that there is a neat kind of architectural elegance you get when you use a storage container that wasn't designed to mimic a physical device (which incidentally also includes things like VMDKs and VHDXs but I'm getting ahead of myself).

        1. Anonymous Coward

          Re: I couldnt agree more

          Appreciate some of the simplicity arguments, but much of this applies only versus very old architectures where you do have to pre-allocate storage, which then makes it very difficult to reclaim unused or freed space. Anytime you create a fixed size container (aggregate, volume, LUN etc), regardless of the protocol you're serving it out over, you're effectively committing resource upfront; it's just the scale that differs.

          How about things like multi-tenancy on a shared volume: how do you handle QoS to ensure one application can't monopolize the I/O? What makes backup trivial? If you want consistency and granular recovery, your new backup is likely to look like your old backup; just the routing has changed. How about tiering? Maybe Exchange won't benefit as much as Oracle; how do I separate them without more upfront allocations? Etc. etc.

          Personally I think committing your applications (not home directories) to a shared file system from any storage vendor, is a recipe for lock in.

  13. Dolapevich

    Lun resizing?

    Sorry if I am missing the point, but I think we can safely provision a single lun of any size, and resize it when needed. Even Windows supports that.


    1. The Storage Guy

      Re: Lun resizing?

      This is not true. Although most/all storage vendors allow you to grow a LUN (within reason), most/all host OSes have limitations on how big a LUN can be. Plus you need to throw in the fact that without some host-side software the OS will not know that you have grown/shrunk the LUN (shrinking a LUN adds another conversation completely). Because a LUN (in most cases) is a set of contiguous blocks on disk and is "pre-allocated" (depending on thin-provisioning), shrinking a LUN in some cases is not only difficult but could be impossible (EMC). You would need to create a new "smaller" LUN, copy all data to it and destroy the original. This is because most systems do not have a way to differentiate 1's and 0's inside the LUN as "usable data" and/or "sparse data".

      1. Anonymous Coward

        Re: Lun resizing?

        Not so much these days. Windows, for instance, can have huuuuge LUNs if you're silly enough to want to create them. You can also detect the change and extend them online without any need for host-side software. But yes, shrinking a LUN can be risky, as you first have to compact the file system, shrink it and then slice that part of the LUN away, which is a pretty risky procedure; get it wrong and your file system and dependent data are gone.

        The simplest and least risky method is to have the array shrink the back end disk allocation without affecting the overlying LUN and file system, e.g. 3PAR does this by compacting unused or deleted blocks and then shuffling them back into a global pool for reuse elsewhere.

        That way the file system sees a consistent view of the LUN but the back end disk consumption equals only what's actually been written into the file system plus a bit of buffer. The process is repeated automatically to ensure the back end allocation stays as efficient as possible without impacting either LUNs or file systems.

        Not a LUN shrink as such, but in many ways much better since it's risk free and requires no user intervention.
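
The behaviour described above can be modelled with a toy thin-provisioning sketch: the size the host sees stays fixed while the back end allocation follows what has been written and later freed. The classes `ThinPool` and `ThinLun` are hypothetical and are not a description of how 3PAR (or any other array) implements reclaim internally.

```python
class ThinPool:
    """Shared back end capacity, counted in blocks."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks


class ThinLun:
    """A LUN whose presented size is fixed but whose backing space is allocated on write."""

    def __init__(self, pool, presented_blocks):
        self.pool = pool
        self.presented_blocks = presented_blocks  # what the host sees; never changes
        self.allocated = set()                    # logical blocks backed by pool space

    def write(self, block):
        if not 0 <= block < self.presented_blocks:
            raise IndexError("write beyond the end of the LUN")
        if block not in self.allocated:
            if self.pool.free_blocks == 0:
                raise RuntimeError("pool exhausted")
            self.pool.free_blocks -= 1
            self.allocated.add(block)

    def unmap(self, block):
        """Return a freed block's backing space to the shared pool."""
        if block in self.allocated:
            self.allocated.remove(block)
            self.pool.free_blocks += 1


pool = ThinPool(total_blocks=100)
lun = ThinLun(pool, presented_blocks=1000)   # host sees 1000 blocks on a 100-block pool
for b in range(10):
    lun.write(b)
print(pool.free_blocks)                      # 90
for b in range(5):
    lun.unmap(b)                             # file system deleted some data
print(pool.free_blocks, lun.presented_blocks, len(lun.allocated))  # 95 1000 5
```

The host-visible size never moves, so there is no file system shrink to get wrong; only the pool accounting changes.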



Biting the hand that feeds IT © 1998–2022