IBM parks parallel file system on Big Data's lawn

The IT universe is seeing a massive collision as the worlds of high-performance computing, big data and warehousing intermingle. IBM is pushing its General Parallel File System (GPFS) further to broaden its footprint in this space, with the 3.5 release adding big data and async replication features as well as …

COMMENTS

  1. Graham Wilson

    IBM's realised file systems are key.

    Whilst Microsoft continues to get bogged down wasting time with its Metro UI, others are concerning themselves with what matters: filing user data.

    I've said for years that the filing systems on PCs are primitive and obsolete, and that they need to take a leaf from mainframe database file systems (remember, M$ promised WinFS and never delivered).

    With huge and increasing amounts of data, the simple filing systems of Linux/Mac/Windows are no longer adequate. Advanced database filing systems where files have extended metadata for both the user and the file data are sorely needed.
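
    As a minimal sketch of what such per-file extended metadata can look like even today, Linux exposes extended attributes (xattrs); the attribute names below are hypothetical, made up purely for illustration:

        # Requires Linux (os.setxattr/getxattr/listxattr are Linux-only
        # in Python) and a filesystem mounted with user_xattr support.
        import os

        path = "report.pdf"
        open(path, "wb").close()   # create an empty file to tag

        # "user."-namespace attribute names are free-form; these are
        # invented examples of metadata a database-like FS could index.
        os.setxattr(path, "user.project", b"Q3-budget")
        os.setxattr(path, "user.reviewed_by", b"gwilson")

        for attr in os.listxattr(path):
            print(attr, os.getxattr(path, attr))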

    1. Kebabbert

      Re: IBM's realised file systems are key.

      "...With huge and increasing amounts of data, the simple filing systems of Linux/Mac/Windows are no longer adequate. Advanced database filing systems where files have extended metadata for both the user and the file data are sorely needed...."

      OK, and what advantage does the metadata you talk of give us? Why do we need it? You only say that it is needed, but not why. Please enlighten us? :o)


      How does this GPFS compare to Lustre + ZFS? The upcoming IBM supercomputer at 20 petaflops will use Lustre + ZFS. It will have 55 petabytes of storage and 0.5-1 TB/s of bandwidth. They are porting ZFS to Lustre on Linux, because ext does not scale.

      http://zfsonlinux.org/docs/LUG11_ZFS_on_Linux_for_Lustre.pdf

      http://www.youtube.com/watch?v=c5ASf53v4lI

      Nor does ext protect against data corruption. All hard disks have checksums to detect and correct data corruption, but that is not enough. Every once in a while a disk will encounter random bit flips that its checksums cannot correct. Even worse, some of the errors are not even detectable by the disk. The only solution is to use end-to-end checksums, which ZFS does. Thus ZFS detects and protects against data corruption. There is research on data corruption in ext/NTFS/XFS/JFS etc. here (the research shows that ZFS does protect your data):

      http://en.wikipedia.org/wiki/ZFS#Data_Integrity
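
      To make the end-to-end idea concrete, here is a minimal sketch (not how ZFS actually lays things out; ZFS stores each block's checksum in its parent block pointer, while this toy keeps checksum and data side by side):

          import hashlib

          def write_block(store, key, data):
              # The checksum is computed above the disk, controller and
              # cabling, so corruption anywhere below this layer is
              # detectable on read.
              store[key] = (hashlib.sha256(data).digest(), data)

          def read_block(store, key):
              checksum, data = store[key]
              if hashlib.sha256(data).digest() != checksum:
                  raise IOError("silent corruption detected in %r" % key)
              return data

          store = {}
          write_block(store, "block0", b"important data")

          # Simulate a bit flip that the disk's own ECC missed:
          checksum, data = store["block0"]
          store["block0"] = (checksum, b"imp0rtant data")
          read_block(store, "block0")   # raises IOError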

      1. Hard_Facts

        Re: IBM's realised file systems are key.

        Another product riding the IT buzzword frenzy, trying to capitalize on "Big Data".

        Yes, there is data growth and there are big-data challenges to handle, but the answer is not simply bigger storage and larger file systems...

        What is needed are efficient algorithms that store more data in less space and speed up access to it.

        We need IBM, "the most valuable brand", to bring value differently: to solve these problems and make "A Smarter Planet".

        1. Kebabbert

          Re: IBM's realised file systems are key.

          "...What is needed is efficient algorithm that stores more data,..."

          What do you mean? Can you explain?

          ZFS is a 128-bit file system; a single pool can address 2^128 bytes. That is more data than mankind has produced or will ever produce. Storing that much data would require moving so many electrons that the energy needed would boil away all the water on Earth. If you tried to store 2^128 bits on 3 TB disks, you would need a pile of disks weighing as much as 10 moons. And the moon weighs a lot.
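
          A back-of-the-envelope sketch of the scale (the 3 TB disk size is the only figure reused from above; the rest is plain arithmetic):

              # Rough scale of storing 2**128 bits on 3 TB disks.
              bits = 2 ** 128
              data_bytes = bits // 8         # ~4.3e37 bytes
              disk_bytes = 3 * 10 ** 12      # one 3 TB disk

              disks = data_bytes // disk_bytes
              print("3 TB disks needed: %.1e" % disks)   # ~1.4e+25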

          So why do we need to store more data? Maybe I misunderstood you?

This topic is closed for new posts.
