CompSci boffins claim they can recreate missing lines in log files

CompSci boffins think they've come up with a novel way to recreate missing entries in log files. In a paper titled Bagging Recurrent Event Imputation for Repair of Imperfect Event Log with Missing Categorical Events, Dr Sunghyun Sim and Professor Hyerim Bae (both from Pusan National University in South Korea), and Professor …

  1. Disgusted Of Tunbridge Wells Silver badge
    Holmes

    Alternative headline: CompSci boffins find that logs are overly verbose and duplicated.

    1. Wellyboot Silver badge

      and can be edited.

      1. W.S.Gosset Silver badge

        and can NOT be edited, Your Honour.

        FTFY^H^H^H^H

  2. Wally Dug
    WTF?

    System Error

    "restoring missing event values... which can overcome human or system error."

    What exactly is a "system error"? Is a cronjob that wasn't run due to <<whatever>> a system error and, if so, could an entry be inserted into the log file for that "missing" run when in actual fact we need the entry to be not there? The last sentence is completely true: "...imputed logs clearly have potential to make life interesting for digital forensics practitioners." Perhaps for admins too.

    Maybe I'm missing the point, but surely if a log file is that critical, there will already be security in place?

    1. ThatOne Silver badge
      Facepalm

      Re: System Error

      Hear that distant rumbling? That's hackers shivering with anticipated pleasure! Just delete any incriminating log files about your activities, and the victim's storyteller will create new innocent ones to replace them...
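Wally Dug's cron example points at the alternative to imputation: report the gap and let a human decide what it means. Below is a minimal Python sketch, assuming the job's log lines have already been parsed into `datetime` objects and the job's nominal interval is known. `find_missing_runs` is a hypothetical helper for illustration, not part of any tool mentioned in the article.

```python
from datetime import datetime, timedelta


def find_missing_runs(timestamps, interval):
    """Report gaps where a periodic job (e.g. a cron job) left no log entry.

    Hypothetical helper: `timestamps` are already-parsed run times and
    `interval` is the job's nominal period. Gaps are rounded to the nearest
    whole interval to absorb small scheduling jitter. Nothing is written back
    to the log; gaps are only reported for a human to investigate.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        # How many whole periods separate two consecutive logged runs.
        periods = round((curr - prev) / interval)
        missed = periods - 1
        if missed > 0:
            gaps.append((prev, curr, missed))
    return gaps


if __name__ == "__main__":
    runs = [
        datetime(2021, 12, 15, 0, 0),
        datetime(2021, 12, 15, 1, 0, 30),
        datetime(2021, 12, 15, 4, 0),  # the 02:00 and 03:00 runs never logged
    ]
    for prev, curr, missed in find_missing_runs(runs, timedelta(hours=1)):
        print(f"{missed} expected run(s) missing between {prev} and {curr}")
```

Rounding to the nearest whole interval tolerates a few seconds of jitter, but a run that is late by about half an interval would be misread as a miss, so the output is a list of places to look rather than a verdict.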

  3. Loyal Commenter Silver badge

    Using "AI" to make guesses

    So, what they are doing is creating log entries that the software deems *should have* been there, with time-stamps that it reckons are about right.

    Thus rendering one of the main purposes of a log prone to error. If I want to read through a log file (and want is probably a bit of a strong word there), I will almost certainly want to know the exact sequence of events, which are likely to have occurred in close succession.

    Given the nature of modern multi-threaded and asynchronous programming, the timing and sequence of events can be very important in tracking down and diagnosing issues. If some "AI" has come along and inserted entries into that log file with "best guess" timing / sequence / content, it is going to be actively counter-productive.

    I'd be focusing instead on why some of your log entries aren't getting recorded accurately in the first place, because this sounds like a "clever" solution for an imagined problem. I can't say I've ever experienced this sort of thing happening with any of the logging frameworks I've ever used.

    1. Neil Barnes Silver badge

      Re: Using "AI" to make guesses

      Thank you - you saved me saying exactly that.

      If there's nothing in the logs after an event, there is absolutely no benefit in imagining something that might perhaps fill the slot; it tells you exactly nothing.

      1. Steve K Silver badge

        Re: Using "AI" to make guesses

        It's precisely wrong...

        1. jake Silver badge

          Re: Using "AI" to make guesses

          It's actually not even wrong.

          It's a fabrication, a guess, a story, and has no place in (especially!) forensics.

      2. ThatOne Silver badge
        Devil

        Re: Using "AI" to make guesses

        > there is absolutely no benefit in imagining something that might perhaps fill the slot; it tells you exactly nothing

        Yes, but it's neater... And since quite often the letter of the rule is way more important than the spirit, you need to have clean, neat, complete logs, no matter what's in them.

        [15/Dec/2021:08:02:21] Lorem ipsum dolor sit amet, consectetur adipiscing elit

        [15/Dec/2021:08:02:58] sed do eiusmod tempor incididunt ut labore et dolore magna aliqua

        (and so on)

      3. amanfromMars 1 Silver badge

        Re: Using "AI" to make guesses

        If there's nothing in the logs after an event, there is absolutely no benefit in imagining something that might perhaps fill the slot; it tells you exactly nothing. .... Neil Barnes

        Surely one cannot be serious and actually believe any or even all of that, Neil?

        Such would virtually tell anyone with an earnest honest interest practically everything needed to be known and not done with regard to the event.

      4. JDX Gold badge

        Re: Using "AI" to make guesses

        I'm going to make a wild 'guess' that the people who worked on this probably know a little bit more about it than random IT people (like me) who lack the context. Otherwise they wouldn't bother doing it.

        1. Loyal Commenter Silver badge

          Re: Using "AI" to make guesses

          I'm going to make a wild guess that they are actually academics who don't have the multi-decade real-world experience of a lot of the commenters here, so exactly the opposite of that is true.

          It's probably someone's thesis topic. "Pick an interesting problem to work on. No, it doesn't matter if that problem doesn't really exist, Knuth solved all the real ones decades ago."

          1. Wellyboot Silver badge

            Re: Using "AI" to make guesses

            Two things spring to mind.

            Having a log just stop gives an easily spotted point of FUBAR to work back from; not many of us will appreciate having to hunt for the last real entry just to start the process.

            If the overall system is running well enough to produce made-up log entries, it can B****y well punt a live message to someone saying 'X' has just packed in logging the events we were expecting.

        2. jake Silver badge

          Re: Using "AI" to make guesses

          I'm going to make a wild guess that the people who worked on this probably know a little bit more about how to justify receiving grant money than people (like me) who got our degrees, and then got out of Uni and entered the real world. Otherwise they wouldn't bother doing it.

    2. iron Silver badge

      Re: Using "AI" to make guesses

      Agreed. If an AI fills in the lines that should have been there then I won't find the error I'm looking for, making those logs totally worthless.

      1. Brewster's Angle Grinder Silver badge

        Synthesizing a haystack without a needle will not help you find the missing needle

        What you're looking for in a log is the exception to the rule - not the humdrum pattern.

    3. Loyal Commenter Silver badge

      Re: Using "AI" to make guesses

      As it happens, one of my many and varied dumpster fires that needed putting out today involved doing exactly this: unpicking log files from two different sources, one of which logs things happening in parallel in multiple threads, line by line, to work out the sequence of events, determine at exactly which point an API returned an internal error, and try to infer why.

      If any of those log entries had been "filled in", either with "expected content" or with a timestamp from somewhere else (hint: not all server clocks are synchronised to the fraction of a millisecond, but the times in these logs are accurate to that degree, and entries are always written in the order that they are logged, even if they have the same timestamp), then this could very well have led me to the wrong conclusion, which, thankfully, was of the "it's someone else's problem" variety.

      On the other hand, if a log is so regular that you can easily infer the order that entries should occur, even if such entries are missing, then you're not really writing a useful log. You're either writing an audit, or wasting disk space. You probably don't want to be writing software that automatically falsifies audits for you.

    4. Zippy´s Sausage Factory
      Devil

      Re: Using "AI" to make guesses

      What's the betting this gets used in court? Someone says "someone deleted from our log file... must be hackers" and uses the AI to "rebuild" the log files. Those get submitted in court, and of course because it's AI and it's a computer it "never makes a mistake".

      I mean OK this is a bit of slippery slopeism and it probably says more about my cynical worldview than anything, but as usual we have to be careful with AI and remember that it isn't really intelligent, it just pretends to be.

      Anyway, I'm off to go and hide in my cupboard. Might do a bit of moaning and wailing later, if I feel in the mood. (Gnashing of teeth is a luxury I reserve for the weekends).

      1. Pascal Monett Silver badge

        AI doesn't pretend anything

        It's only marketing and overzealous presenters who prance about using the term and, like Tesla's "autopilot", pretend that it is something other than what it is: a statistical analysis machine.

      2. oiseau Silver badge
        Facepalm

        Re: Using "AI" to make guesses

        ... remember that it isn't really intelligent, it just pretends to be.

        Hmmm ...

        Like some boffins researching solutions for non-existent problems?

        O.

    5. oiseau Silver badge
      Facepalm

      Re: Using "AI" to make guesses

      ... inserted entries into that log file with "best guess" timing / sequence / content, it is going to be actively counter-productive.

      You are too kind.

      The net result will be rubbish.

      ... sounds like a "clever" solution for an imagined problem.

      Quite so.

      But there's been a lot of that going around in the past few years.

      Think Poettering's systemd for a good example of that.

      O.
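The kind of unpicking Loyal Commenter describes can be done without filling in anything: put the entries from each source onto one timeline and rely on a stable sort so that entries sharing a timestamp keep the order they were written in. Below is a minimal Python sketch, assuming every line begins with an ISO-8601 timestamp followed by a space. `parse`, `merge_logs`, the per-source `offsets` and the sample lines are all hypothetical. As that comment points out, ordering across sources is only as good as the clocks behind them, so any offset has to come from a measurement you trust.

```python
from datetime import datetime, timedelta


def parse(line):
    """Split a line of the assumed form '<ISO-8601 timestamp> <message>'."""
    stamp, _, message = line.partition(" ")
    return datetime.fromisoformat(stamp), message


def merge_logs(sources, offsets=None):
    """Interleave several logs into one timeline without inventing entries.

    `sources` maps a source name to its lines in the order they were written.
    `offsets` optionally maps a source name to a clock correction (a
    timedelta) that you have measured by some other means.
    """
    offsets = offsets or {}
    entries = []
    for name, lines in sources.items():
        shift = offsets.get(name, timedelta(0))
        for line in lines:
            stamp, message = parse(line)
            entries.append((stamp + shift, name, message))
    # list.sort is stable: entries with equal timestamps keep their original
    # write order, and cross-source ties follow the order of `sources`.
    entries.sort(key=lambda entry: entry[0])
    return entries


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    api = [
        "2021-12-15T08:02:21.120 request received",
        "2021-12-15T08:02:21.125 internal error returned",
    ]
    worker = [
        "2021-12-15T08:02:21.121 thread 3: lock acquired",
        "2021-12-15T08:02:21.121 thread 7: waiting for lock",
    ]
    for stamp, source, message in merge_logs({"api": api, "worker": worker}):
        print(stamp.isoformat(timespec="milliseconds"), source, message)
```

A gap in the merged timeline stays a gap; the point is to make it visible next to whatever the other source was doing at the time.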

  4. andy 103
    Stop

    correlates data *from other sources*??

    "Recreating" lines isn't really accurate then. All they're doing is getting data that has already been recorded from other sources and then trying to work out where it fits into a file with "missing" data.

    Why is time, energy and effort being spent on these bullshit activities?

    The three authors couldn't find a tool to recreate missing events. So they built one that correlates data from other relevant sources.

    If the data is already there, then the actual real problem is that some people don't know where it is.

    I can't envisage this being used in any serious or critical application. Imagine if flight data recorders worked on this premise. We'll just try and guess the sequence of events so we can put everything into 1 convenient file, rather than having the prerequisite knowledge to determine them accurately... Fuck off.

    1. JDX Gold badge

      Re: correlates data *from other sources*??

      The fact you can't understand it doesn't make it useless. It just means you don't understand it.

      You've missed the point.

      1. Loyal Commenter Silver badge

        Re: correlates data *from other sources*??

        Just because you *can* do something, does not mean that you *should*.

        I may be missing the real-world use case for this, but it sure sounds like it's an "academic problem".

        Bemoaning that others "don't understand" is more than a mite condescending. I'm pretty sure most people have understood what they are doing, we just don't know why you would want to do that.

        To most analytical minds, the absence of an entry in a log file tells us more than having a guessed-at entry filled in, in its place. Especially so, if that entry is being constructed from other data, because it's pretty obvious that if it is missing from the log file, then we should go looking to see where it actually is, and in doing so, hopefully gain an understanding of the sequence of events that has led to that situation.

        Log analysis is an art more than a science. More often than not it will consist of a process of filling in gaps ourselves to determine the sequence of events that could have happened. In doing so, this may raise other questions, which may well lead us towards a real underlying problem that needs solving. If you hand this process over to an "AI" to make a best-guess at everything, then this problem-solving process never happens.

      2. andy 103
        WTF?

        Re: correlates data *from other sources*??

        @JDX interestingly you weren't able to elaborate on what the point of it actually is. Followed by a comment from yourself 2 hours later "Struggling to see quite how this works."

        Oh dear.

    2. oiseau Silver badge
      Facepalm

      Re: correlates data *from other sources*??

      Why is time, energy and effort being spent on these bullshit activities?

      Well ...

      Maybe there's good money to be made doing that?

      There's an audience for anything.

      O.

  5. Pirate Dave Silver badge
    Pirate

    BOFH

    Eh, did they ever think that maybe some log entries are missing for a reason? Sheesh...

    1. Antonius_Prime
      Devil

      Re: BOFH

      It's OK.

      Repeat in front of a mirror until you can say it with the most shaken, heartbroken expression you can manage (and not giggle):

      "There's been a terrible accident..."

      Apropos of nothing, anyone seen my bag of quicklime and my roll of carpet? I put them down when I went to get my printout of poorly surveilled woodland sites and building sites with deep concrete pours occurring soon...

  6. Doctor Syntax Silver badge

    It will be added to systemd in the next release.

  7. Peter Galbavy

    Use AI guesses to train another AI and lie about "evidence". Nice. Just what some politicians need.

    Event logs are very often used as evidence - not necessarily the legal kind - to establish the sequence and timing of events, who/what was involved and responsible. Tampering with those event logs is just like any other record tampering, even if it's tied up in a nice red bow and a gift tag that says "With Love from your favourite AI".

    Then the side note about logs being used to train AIs is itself suspicious. If you use fake records to train an AI then all you are doing is reinforcing whatever bias you decided was important to you.

    Is there a rotting fish icon?

    1. amanfromMars 1 Silver badge

      Misinformation is not a Great 0Sum Game

      Event logs are very often used as evidence - not necessarily the legal kind - to establish the sequence and timing of events, who/what was involved and responsible. Tampering with those event logs is just like any other record tampering, even if it's tied up in a nice red bow and a gift tag that says "With Love from your favourite AI”. ..... Peter Galbavy

      Tampered event logs in the West are invariably default tagged and gifted as if “From Russia with Love”

      Can you imagine the insight/foresight such a gross mischaracterisation delivers to those in the East? It tells them practically all that they need to know about the weaknesses being attacked and defeated in the West.

    2. Anonymous Coward
      Anonymous Coward

      @Peter

      just love ppl signing their thoughts with their own name, tippin my hat

  8. Scott Broukell

    But, did the events actually take place or not, were they totally imagined or virtual and, more importantly, were they socially distanced events?

    1. Anonymous Coward
      Anonymous Coward

      Iteration 1. There was no event.

      Iteration 2. There was an event, but we joined the event database and the rule database, so the event must have obeyed the rules.

      Iteration 3. There was an event that broke the rules but we weren't there.

      Iteration 4. I join your denial to the 'they all lie, all the time' axiom and hey presto: truth!

      1. Adrian 4 Silver badge

        What you need is to log the logging events so you can investigate why they weren't logged.

        1. jake Silver badge

          Quis custodiet ipsos data-commentariis?

    2. PerlyKing Silver badge
      Go

      Re: did the events actually take place or not

      You're making it sound like the perfect application of this would be in quantum computing.

      1. Antonius_Prime
        Trollface

        Re: did the events actually take place or not

        Up until the logs get observed. Until then, they're in a state of superposition and we can't know the contents...

        1. Anonymous Coward
          Anonymous Coward

          Re: did the events actually take place or not

          So... if an event occurs, and there's no logger to record it, does the admin make a sound?

          1. Gene Cash Silver badge

            Re: did the events actually take place or not

            Yes, he says "I need another drink"
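Adrian 4's suggestion, together with Wellyboot's point that a system able to invent entries could just as well raise an alarm, amounts to a watchdog on the logs themselves. Below is a minimal Python sketch, assuming a file's modification time is a good enough proxy for "last entry written" (it will not be for logs that get touched without being appended to). `log_is_stale`, `check_logs` and the per-log thresholds are hypothetical, and the alert is a plain `logging.warning` standing in for whatever paging hook you actually use.

```python
import logging
import os
import time


def log_is_stale(path, max_silence_seconds, now=None):
    """Return True if `path` has not been written for longer than allowed.

    The threshold is an assumption you would tune per log: a busy access log
    might be allowed seconds of silence, a nightly job's log a day or more.
    A missing file also counts as stale, since that deserves a look too.
    """
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(path)
    except FileNotFoundError:
        return True
    return now - last_write > max_silence_seconds


def check_logs(thresholds):
    """Warn about every log that has gone quiet, rather than papering over it.

    `thresholds` maps a log path to its allowed silence in seconds. The
    warning is a stand-in for a real alerting hook (not shown).
    """
    stale = [path for path, limit in thresholds.items() if log_is_stale(path, limit)]
    for path in stale:
        logging.warning("%s has not been written to within its expected interval", path)
    return stale
```

Run periodically from whatever scheduler is already in place, something like `check_logs({"/var/log/app.log": 300})` (a hypothetical path and threshold) would flag that log after five minutes of silence, leaving the last real entry exactly where it was.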

  9. JDX Gold badge

    Example?

    Struggling to see quite how this works. A visual example would be really helpful.

    1. diodesign (Written by Reg staff) Silver badge

      Re: Example?

      I've added an infographic and a link to a summary of the study by one of the universities. As I understand it, it works by figuring out what data from various sources is needed to create a log's entries, and then automating the process of generating missing entries from that data.

      C.

      1. Doctor Syntax Silver badge

        Re: Example?

        It's still no clearer exactly what they're doing because it's just a pile of jargon. It's the "recurrent event imputation" that concerns me. The nearest I can make of it is "There's usually an event of type X here but there isn't in this case so let's add one." Possibly it means something different and got lost in translation from the Korean.

        1. Polleke
          Holmes

          Re: Example?

          And this is exactly what you don't want because a missing log entry sometimes means a lot more than all the log entries together.

  10. Anonymous Coward
    Anonymous Coward

    Depends on how you're logging...

    ... but round here the *absence* of a log entry is a pretty good indication that a major, utterly disastrous, problem occurred between the last log entry and where we should see the missing log.

  11. Pascal Monett Silver badge
    Thumb Down

    "what the log entry should have been"

    That is nothing more than rewriting history.

    The absence of information can be just as significant as its presence. If something exists in one source and is absent from another, that means that there is a process that failed and could not write a log entry. For debugging purposes, that is literally more important than the pseudo re-creation of log data.

    1. PRR

      Re: "what the log entry should have been"

      > That is nothing more than rewriting history.

      Nice work if you can get it.

      And you can get it, if you lie.

  12. jake Silver badge

    I can't tell you how many times ...

    ... that the lack of a log entry pointed out the exact issue I was tracking down.

    Sometimes what isn't there is the important thing ... this, if implemented, will be nuke on sight on any system I admin ... just like any other form of malware.

    1. amanfromMars 1 Silver badge

      What I can tell y'all about these times ...

      Sometimes what isn't there is the important thing ... this, if implemented, will be nuke on sight on any system I admin ... just like any other form of malware. ..... jake

      Things have moved on by quantum leaps and bounds, jake, into new fields of terror and/or excitement with the realisation of an enigmatic achievement which can neither be effectively attacked and gratuitously assaulted nor ever physically damaged and virtually defeated.

      Always sometimes what isn't revealed there is the important thing ... for that, whenever correctly configured and implemented, cannot fail to nuke on sight any systems administration like no other form of unknown malware or known software empowering hardware and vapourware/ponziware/zombieware

      To some who would be many is that a Doomsday 0Bug to Fear and Server, to A.N.Others and a Few a Heavenly Delight to Diabolically Savour and Favour ......... and a Present Code Red Conditioning Event to Deny is on ACTive Mission PACT Manouevres ‽ .

      And for those who may need to know* what they are trying to deny is a current situation ....say hello and welcome to Advanced Cyber Threats and Persistent ACTive CyberIntelAIgent Treats and all possible variations and reverse engineerings of those themes and memes.

      * ....Royal Chartered, £2.6bn granted, UKGBNI Cyber Security Council ??? :-) It just wouldn’t be fair on them, would it, for them to be able to plead complete ignorance of such an affair hence their being specifically singled out and highlighted in this post although one doesn’t have to be a genius or an Einstein to realise there be at least a few others worthy of mention who might wish to more fully avail themselves of such novel info and disruptive intel in order to take overwhelming advantage of its many benefits and massively utilise its myriad pitfalls/exploitable 0day vulnerabilities.

      ????? Surely you do not expect the Future to be anything like the Past and bear any responsibility for continuing woes in the Present.????? That would be to suggest madness rules, progress has not been made and evolution is halted..... which is clearly preposterous and evidently ridiculous.


Biting the hand that feeds IT © 1998–2022