
Alternative headline: CompSci boffins find that logs are overly verbose and duplicated.
CompSci boffins think they've come up with a novel way to recreate missing entries in log files. In a paper titled Bagging Recurrent Event Imputation for Repair of Imperfect Event Log with Missing Categorical Events, Dr Sunghyun Sim and Professor Hyerim Bae (both from Pusan National University in South Korea), and Professor …
"restoring missing event values... which can overcome human or system error."
What exactly is a "system error"? Is a cronjob that wasn't run due to <<whatever>> a system error and, if so, could an entry be inserted into the log file for that "missing" run when in actual fact we need the entry not to be there? The last sentence is completely true: "...imputed logs clearly have potential to make life interesting for digital forensics practitioners." Perhaps for admins too.
Maybe I'm missing the point, but surely if a log file is that critical, there will already be security in place?
So, what they are doing is creating log entries that the software deems *should have* been there, with time-stamps that it reckons are about right.
Thus rendering one of the main purposes of a log prone to error. If I want to read through a log file (and want is probably a bit of a strong word there), I will almost certainly want to know the exact sequence of events, which are likely to have occurred in close succession.
Given the nature of modern multi-threaded and asynchronous programming, the timing and sequence of events can be very important in tracking down and diagnosing issues. If some "AI" has come along and inserted entries into that log file with "best guess" timing / sequence / content, it is going to be actively counter-productive.
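To make that concrete, here's a minimal Python sketch (emphatically not the paper's method; every timestamp and message below is made up) of how one imputed entry with a plausible-looking timestamp invents a causal order that nobody observed:

    # Minimal sketch: one imputed entry with a guessed timestamp is
    # enough to invent a causal order nobody observed.
    real_log = [
        ("2021-12-15 08:02:21.000", "request received"),
        ("2021-12-15 08:02:21.003", "worker thread started"),
        # a "lock acquired" entry is genuinely missing somewhere here
        ("2021-12-15 08:02:21.007", "API returned 500"),
    ]

    # The imputer can only interpolate a plausible time; the real event,
    # if it happened at all, may have come after the failure.
    imputed = ("2021-12-15 08:02:21.005", "lock acquired [IMPUTED]")

    for ts, msg in sorted(real_log + [imputed]):
        print(ts, msg)
    # The merged log now "shows" the lock held before the 500 error -
    # a sequence that was inferred, not observed.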
I'd be focusing instead on why some of your log entries aren't getting recorded accurately in the first place, because this sounds like a "clever" solution for an imagined problem. I can't say I've ever experienced this sort of thing happening with any of the logging frameworks I've ever used.
> there is absolutely no benefit in imagining something that might perhaps fill the slot; it tells you exactly nothing
Yes, but it's neater... And since quite often the letter of the rule is way more important than the spirit, you need to have clean, neat, complete logs, no matter what's in them.
[15/Dec/2021:08:02:21] Lorem ipsum dolor sit amet, consectetur adipiscing elit
[15/Dec/2021:08:02:58] sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
(and so on)
If there's nothing in the logs after an event, there is absolutely no benefit in imagining something that might perhaps fill the slot; it tells you exactly nothing. .... Neil Barnes
Surely one cannot be serious and actually believe any or even all of that, Neil?
Such would virtually tell anyone with an earnest honest interest practically everything needed to be known and not done with regard to the event.
I'm going to make a wild guess that they are actually academics who don't have the multi-decade real-world experience of a lot of the commenters here, so exactly the opposite of that is true.
It's probably someone's thesis topic. "Pick an interesting problem to work on. No, it doesn't matter if that problem doesn't really exist, Knuth solved all the real ones decades ago."
Two things spring to mind.
Having a log just stop gives an easily spotted point of FUBAR to work back from; not many of us will appreciate trying to find the last real entry just to start the process.
If the overall system is running well enough to be able to produce made up log entries it can B****y well punt a live message to someone saying 'X' has just packed in logging the events we were expecting.
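That second one is maybe ten lines of work. A rough sketch, assuming each log line starts with an ISO-8601 timestamp; the silence threshold and the alert text are placeholders:

    from datetime import datetime, timedelta

    MAX_SILENCE = timedelta(minutes=5)  # tune per log source

    def check_for_silence(lines, now):
        """Return an alert if the newest entry is suspiciously old."""
        if not lines:
            return "ALERT: no log entries at all"
        last_ts = datetime.fromisoformat(lines[-1].split(" ", 1)[0])
        if now - last_ts > MAX_SILENCE:
            return f"ALERT: 'X' packed in logging; last entry {last_ts}"
        return None

    lines = [
        "2021-12-15T08:02:21 job started",
        "2021-12-15T08:02:58 job done",
    ]
    print(check_for_silence(lines, datetime(2021, 12, 15, 9, 0)))

A watchdog that shouts about silence is cheap, and it doesn't falsify anything.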
I'm going to make a wild guess that the people who worked on this probably know a little bit more about how to justify receiving grant money than people (like me) who got our degrees, and then got out of Uni and entered the real world. Otherwise they wouldn't bother doing it.
As it happens, one of my many and varied dumpster fires that needed putting out today involved doing exactly this: unpicking log files from two different sources, one of which logs things happening in parallel in multiple threads, line by line, to work out the sequence of events and determine at exactly which point an API returned an internal error, to try and infer why.
If any of those log entries had been "filled in", either with "expected content", or a timestamp from somewhere else (hint: not all server clocks are synchronised to the fraction of a millisecond, but the times in these logs are precise to that degree, and entries are always written in the order that they are logged, even if they have the same time stamp), then this could very well have led me to the wrong conclusion, which, thankfully, was of the "it's someone else's problem" variety.
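For the curious, the mechanical part of that job looks something like this sketch (field layout, timestamps and messages all invented for illustration): merge the sources on timestamp, lean on stable ordering for ties, and never invent a line for either side.

    import heapq

    # Each source is already in write order, which is the property we
    # trust; heapq.merge preserves it and never fabricates a line.
    api_log = [
        ("08:02:21.0071", "api", "handling request"),
        ("08:02:21.0075", "api", "internal error: upstream timeout"),
    ]
    worker_log = [
        ("08:02:21.0071", "worker-3", "acquiring connection"),
        ("08:02:21.0073", "worker-3", "connection pool exhausted"),
    ]

    for ts, src, msg in heapq.merge(api_log, worker_log, key=lambda e: e[0]):
        print(ts, src, msg)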
On the other hand, if a log is so regular that you can easily infer the order that entries should occur, even if such entries are missing, then you're not really writing a useful log. You're either writing an audit, or wasting disk space. You probably don't want to be writing software that automatically falsifies audits for you.
What's the betting this gets used in court? Someone says "someone deleted entries from our log file... must be hackers" and uses the AI to "rebuild" the log files. Those get submitted in court, and of course because it's AI and it's a computer it "never makes a mistake".
I mean OK this is a bit of slippery slopeism and it probably says more about my cynical worldview than anything, but as usual we have to be careful with AI and remember that it isn't really intelligent, it just pretends to be.
Anyway, I'm off to go and hide in my cupboard. Might do a bit of moaning and wailing later, if I feel in the mood. (Gnashing of teeth is a luxury I reserve for the weekends).
... inserted entries into that log file with "best guess" timing / sequence / content, it is going to be actively counter-productive.
You are too kind.
The net result will be rubbish.
... sounds like a "clever" solution for an imagined problem.
Quite so.
But there's been a lot of that going around in the past few years.
Think Poettering's systemd for a good example of that.
O.
"Recreating" lines isn't really accurate then. All they're doing is getting data that has already been recorded from other sources and then trying to work out where it fits into a file with "missing" data.
Why is time, energy and effort being spent on these bullshit activities?
The three authors couldn't find a tool to recreate missing events. So they built one that correlates data from other relevant sources.
If the data is already there, then the actual real problem is that some people don't know where it is.
I can't envisage this being used in any serious or critical application. Imagine if flight data recorders worked on this premise. We'll just try and guess the sequence of events so we can put everything into one convenient file, rather than having the requisite knowledge to determine them accurately... Fuck off.
Just because you *can* do something, does not mean that you *should*.
I may be missing the real-world use case for this, but it sure sounds like it's an "academic problem".
Bemoaning that others "don't understand" is more than a mite condescending. I'm pretty sure most people have understood what they are doing, we just don't know why you would want to do that.
To most analytical minds, the absence of an entry in a log file tells us more than having a guessed-at entry filled in, in its place. Especially so, if that entry is being constructed from other data, because it's pretty obvious that if it is missing from the log file, then we should go looking to see where it actually is, and in doing so, hopefully gain an understanding of the sequence of events that has led to that situation.
Log analysis is an art more than a science. More often than not it will consist of a process of filling in gaps ourselves to determine the sequence of events that could have happened. In doing so, this may raise other questions, which may well lead us towards a real underlying problem that needs solving. If you hand this process over to an "AI" to make a best-guess at everything, then this problem-solving process never happens.
It's OK.
Repeat in front of a mirror until you can say it with the most shaken, heartbroken expression you can manage (and not giggle):
"There's been a terrible accident..."
Apropos of nothing, anyone seen my bag of quicklime and my roll of carpet? I put them down when I went to get my printout of poorly surveilled woodland sites and building sites with deep concrete pours occurring soon...
Use an AI guessing to train another AI and lie about "evidence". Nice. Just what some politicians need.
Event logs are very often used as evidence - not necessarily the legal kind - to establish the sequence and timing of events, who/what was involved and responsible. Tampering with those event logs is just like any other record tampering, even if it's tied up in a nice red bow and a gift tag that says "With Love from your favourite AI".
Then the side note about logs being used to train AIs is in itself suspicious. If you use fake records to train an AI then all you are doing is reinforcing whatever bias you decided was important to you.
Is there a rotting fish icon?
Event logs are very often used as evidence - not necessarily the legal kind - to establish the sequence and timing of events, who/what was involved and responsible. Tampering with those event logs is just like any other record tampering, even if it's tied up in a nice red bow and a gift tag that says "With Love from your favourite AI". ..... Peter Galbavy
Tampered event logs in the West are invariably default tagged and gifted as if “From Russia with Love”
Can you imagine the insight/foresight such a gross mischaracterisation delivers to those in the East? It tells them practically all that they need to know about the weaknesses being attacked and defeated in the West.
Iteration 1. There was no event.
Iteration 2. There was an event, but we joined the event database and the rule database, so the event must have obeyed the rules.
Iteration 3. There was an event that broke the rules but we weren't there.
Iteration 4. I join your denial to the 'they all lie, all the time' axiom and hey presto: truth!
I've added an infographic and a link to a summary of the study by one of the universities. As far as I can tell, it works by figuring out what data from various sources is needed to create a log's entries, and then automating the process of generating missing entries from that data.
C.
It's still no clearer exactly what they're doing because it's just a pile of jargon. It's the "recurrent event imputation" that concerns me. The nearest I can make of it is "There's usually an event of type X here but there isn't in this case so let's add one." Possibly it means something different and got lost in translation from the Korean.
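If that reading is right, the naive version is easy enough to write down. A deliberately crude Python sketch of my interpretation - the paper itself uses bagged recurrent neural networks, so this is a guess at the idea, not their algorithm:

    from collections import Counter

    # Count which event usually follows which in past traces.
    history = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"]]
    follows = Counter()
    for trace in history:
        for cur, nxt in zip(trace, trace[1:]):
            follows[(cur, nxt)] += 1

    def impute(trace):
        """Insert the usual successor wherever it appears to be missing."""
        repaired = []
        for cur, nxt in zip(trace, trace[1:] + [None]):
            repaired.append(cur)
            guesses = [(n, c) for (p, n), c in follows.items() if p == cur]
            if guesses:
                best = max(guesses, key=lambda g: g[1])[0]
                if best != nxt:
                    repaired.append(best + "?")  # flagged as a guess
        return repaired

    print(impute(["A", "C"]))  # ['A', 'B?', 'C'] - 'B' is pure inference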
That is nothing more than rewriting history.
The absence of information can be just as significant as its presence. If something exists in one source and is absent from another, that means that there is a process that failed and could not write a log entry. For debugging purposes, that is literally more important than the pseudo re-creation of log data.
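In code terms, the useful operation there is a diff, not a patch. A trivial sketch with made-up request IDs:

    # Diff the sources instead of patching them; the gap is the finding.
    upstream = {"req-101", "req-102", "req-103"}   # what the gateway saw
    app_log = {"req-101", "req-103"}               # what the app logged

    missing = upstream - app_log
    print("process failed to log:", sorted(missing))  # ['req-102']

Papering over that empty slot would erase exactly the evidence you need.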
Sometimes what isn't there is the important thing ... this, if implemented, will be nuke on sight on any system I admin ... just like any other form of malware. ..... jake
Things have moved on by quantum leaps and bounds, jake, into new fields of terror and/or excitement with the realisation of an enigmatic achievement which can neither be effectively attacked and gratuitously assaulted nor ever physically damaged and virtually defeated.
Always sometimes what isn't revealed there is the important thing ... for that, whenever correctly configured and implemented, cannot fail to nuke on sight any systems administration like no other form of unknown malware or known software empowering hardware and vapourware/ponziware/zombieware
To some who would be many is that a Doomsday 0Bug to Fear and Server, to A.N.Others and a Few a Heavenly Delight to Diabolically Savour and Favour ......... and a Present Code Red Conditioning Event to Deny is on ACTive Mission PACT Manouevres ‽ .
And for those who may need to know* what they are trying to deny is a current situation ....say hello and welcome to Advanced Cyber Threats and Persistent ACTive CyberIntelAIgent Treats and all possible variations and reverse engineerings of those themes and memes.
* ....Royal Chartered, £2.6bn granted, UKGBNI Cyber Security Council ??? :-) It just wouldn’t be fair on them, would it, for them to be able to plead complete ignorance of such an affair hence their being specifically singled out and highlighted in this post although one doesn’t have to be a genius or an Einstein to realise there be at least a few others worthy of mention who might wish to more fully avail themselves of such novel info and disruptive intel in order to take overwhelming advantage of its many benefits and massively utilise its myriad pitfalls/exploitable 0day vulnerabilities.
????? Surely you do not expect the Future to be anything like the Past and bear any responsibility for continuing woes in the Present.????? That would be to suggest madness rules, progress has not been made and evolution is halted..... which is clearly preposterous and evidently ridiculous.