So much data, so little time: How to not flip your wig processing it

Working with data can be a pain in the butt. You do it because you need to, and because there's value in it – data-driven enterprises thrive on being able to eke as much concrete information as possible out of the stuff in order to maximise efficiency and attack the market share of the competition. But data is complicated and …

COMMENTS

This topic is closed for new posts.
  1. Korev Silver badge

    Extra software licences for reporting

    A cynic would suggest that some of the vendors in this space would love to flog you licences for a very expensive reporting module and would be unwilling to let you do this kind of thing. This kind of thing, err, saps your motivation to do projects like this.

    1. Khaptain Silver badge

      Re: Extra software licences for reporting

      That same cynic would probably also suggest that IT budgets, time frames and lack of staff/resources are common elements that MOST of us have to endure on a daily basis.

      The perfect IT department is a myth, otherwise we wouldn't need all these "miraculously expensive" products...

    2. Hans Neeson-Bumpsadese Silver badge

      Re: Extra software licences for reporting

      Indeed.

      On a recent project I was working on, we considered having a separate database to handle out-of-band reporting. But the database platform was Oracle, so costs were prohibitive.

  2. Anonymous Coward

    For example, anyone fresh out of university ... normalise everything to death

    I'd kick those "anyones" in the nuts/ovaries very hard.

    When designing anything, but especially when designing software, you need to ask WHY to get to the HOW. But that's why most students (and other teachers) avoid me. I ask them to think about what they're doing and why*.

    *I'm not all bad, I often accept answers like "because I want to see what happens" -- completely acceptable for an educational environment.

    Now get off my lawn and take your second-hand copy of SQL for Dummies with you.

    1. Ian Michael Gumby

      Re: For example, anyone fresh out of university ... normalise everything to death

      There's more to this...

      Many want to re-purpose or re-use existing RDBMS tech in the big data space.

      So that's why there's still a drive to normalize the data.

  3. Ian Michael Gumby
    Flame

    Not worth reading...

    The author has no clue about what he attempts to talk about.

    So many errors that it would take a rebuttal article to correct the original.

    Free clue. Batch processing for summaries has to occur at some point when the data is stable. You can do this periodically throughout the day or at night. Note that if you're a global company night is relative.

    Also, if you aren't going to use the data, it makes no sense to spin cycles computing averages or KPIs for no reason.

    The real issue is how the data is delivered. Some data comes in throughout the day in flat files, so you have to wait until 'end of day', or until all sources have delivered the data. Then you have data which is streamed. This data can be used to generate running totals/averages or other KPIs.
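
    The streamed case above can be sketched roughly as follows. This is only an illustration, not anything from the article: the names `RunningStats` and `summarise` are made up, and the stream is assumed to be a plain iterable of numbers. The point is that a running total or average needs only a count and a sum, so it can be updated as each value arrives rather than waiting for end of day.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Keeps a running count and total without storing the stream itself."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        # Update the aggregates in constant time per value.
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # The mean is undefined before the first value; report 0.0 in that case.
        return self.total / self.count if self.count else 0.0


def summarise(stream):
    """Yield (running_total, running_mean) after each streamed value."""
    stats = RunningStats()
    for value in stream:
        stats.add(value)
        yield stats.total, stats.mean
```

    Flat-file sources would still be summarised once per delivery; only the streamed sources benefit from this kind of incremental update.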

    But the author is correct in that most people today only know RDBMSs. We've since flipped back to hierarchical structures, and unless you're old enough to have been taught COBOL, or have worked with a Pick system (Revelation, U2, etc.), or have converted IMS, you really don't know much about it.

    Or have spent time working with the newer tools, where you have field and record separators in Hive.
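
    For anyone who hasn't met separator-delimited records, here is a minimal sketch of the idea. The function name `parse_records` and the sample data are hypothetical. The defaults are meant to mirror what are understood to be Hive's default text-format delimiters (Ctrl-A between fields, Ctrl-B between items inside a field, newline between records); treat those as an assumption and check your table definition.

```python
FIELD_SEP = "\x01"   # Ctrl-A: assumed separator between fields
ITEM_SEP = "\x02"    # Ctrl-B: assumed separator between items nested in a field
RECORD_SEP = "\n"    # newline: assumed separator between records


def parse_records(text, field_sep=FIELD_SEP, item_sep=ITEM_SEP,
                  record_sep=RECORD_SEP):
    """Split delimited text into records, fields and nested items."""
    records = []
    for line in text.split(record_sep):
        if not line:
            continue  # skip the empty chunk after a trailing record separator
        fields = []
        for field in line.split(field_sep):
            # A field that contains the item separator becomes a nested list.
            fields.append(field.split(item_sep) if item_sep in field else field)
        records.append(fields)
    return records
```

    A record can therefore carry a repeating group inside one field, which is the hierarchical shape that a normalised RDBMS design would push into a child table.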

    But I digress. Maybe the author should learn something before he writes an article?

  4. Anonymous Coward

    Not sure I agree with not doing overnight jobs.

    I've built data warehouses in my last three jobs for finance reporting, joining the data from the finance application and CRM systems, and set them up to run first thing in the morning. All the data is available and I can answer any question on the data within minutes; summaries are also a doddle, as cutting the data is a simple query. I'm sure there are examples where you would need "live" data, but a snapshot at the end of the previous day is usually enough for most reporting requirements. I just don't see the benefit of direct connections where using an output and loading that in works just as well. This is just my experience, so others may have a different opinion.
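
    The snapshot approach described above might look something like this in miniature. Everything here is hypothetical: the table names `invoices` and `customers`, their columns, and the functions `build_snapshot` and `revenue_by_region` are invented for illustration, and SQLite stands in for whatever warehouse platform is actually used. The sketch reloads the previous day's extracts and then answers a summary question with one query.

```python
import sqlite3


def build_snapshot(conn, invoices, customers):
    """Replace the reporting tables with the latest extracts (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices (customer_id INTEGER, amount REAL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, region TEXT)")
    # Full reload: clear yesterday's snapshot, then load the new extracts.
    conn.execute("DELETE FROM invoices")
    conn.execute("DELETE FROM customers")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)", invoices)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.commit()


def revenue_by_region(conn):
    """Join finance data to CRM data and total the invoice amounts per region."""
    rows = conn.execute(
        "SELECT c.region, SUM(i.amount) "
        "FROM invoices i JOIN customers c ON c.customer_id = i.customer_id "
        "GROUP BY c.region ORDER BY c.region")
    return rows.fetchall()
```

    Run from a morning scheduler, `build_snapshot` would be fed the exported files; the source systems are never queried directly by the reports.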

    1. IamStillIan

      I concur; use case is everything.

      Additional reporting servers are a cost; not all systems need to be so timely; why make things more expensive when there's no benefit?

  5. Version 1.0 Silver badge

    Dev nul:

    It's always handy to have one of these when working with large datasets.

  6. FozzyBear
    Facepalm

    Mirror copy of all my source systems just for reporting. Is the author serious?!

    Seven figures to implement the hardware to host all that, not to mention the ongoing costs involved. Plus the fact it gets you no closer to a solution that the business requires. All for the sake of holding raw transaction data from multiple source systems that may or may not, at some time in the future, be needed by the business.

    Or carefully implement an overnight batch process that retrieves the data the business "NEEDS" to see in their reports and analysis, processed, summarised where needed and ready for viewing the next day.
