Including external code provided by someone, somewhere without checking it meticulously is a big, big mistake.
Of course, it makes projects easier and cheaper, but also nastier and potentially dangerous. It's a bad practice.
Machine learning models are unreliable but that doesn't prevent them from also being useful at times. Several months ago, Socket, which makes a freemium security scanner for JavaScript and Python projects, connected OpenAI's ChatGPT model (and more recently its GPT-4 model) to its internal threat feed. The results, according …
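Conceptually (and this is purely an illustration, not Socket's actual pipeline: the prompt, model name and output format below are all assumptions), the idea looks something like this:

```python
# Illustrative only: "point an LLM at a package and ask it to flag suspicious
# behaviour". Not Socket's real tooling; prompt and model name are placeholders.
# Uses the openai Python SDK (>=1.0) and expects OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "You are reviewing code from a public package registry. "
    "List any behaviour that looks malicious (exfiltration, install-time "
    "network calls, obfuscated payloads, credential theft) and rate the risk 0 to 10."
)

def triage(package_name: str, source: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder; use whatever model you actually have access to
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": f"Package: {package_name}\n\n{source}"},
        ],
    )
    return resp.choices[0].message.content

# e.g. triage("some-npm-or-pypi-package", open("setup.py").read())
```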
Oh, I had a PR a few years ago which removed a bit of hand-crafted HTTP code (about 10 lines in all), to be replaced with whatever the library of the moment for doing HTTP stuff is these days. (Because another abstraction atop the system-provided libraries is a good idea.)
When I queried it, as I thought the code worked just fine and didn't see the point of adding another dependency, I was told that there wasn't a bug; it was just that someone didn't understand the code and "this library is the industry standard".
3 months later, a quirk of our production setup brought the entire website to a halt.... the reason... this HTTP library.
We make requests to a plurality of back ends, and the retry/back-off policy implemented by this library applied to the web server as a whole... so failed requests to one back end would cause every request to any back end to have an increasing delay applied.
It was a trivial fix, but it really hammered home the "if it's not broken, don't fix it" approach.
(The code that got replaced did not suffer from this issue).
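For what it's worth, the general fix is to keep retry/back-off state scoped to each back end rather than shared across the whole server. A minimal sketch in Python, using requests/urllib3 purely for illustration (the library in the story isn't named):

```python
# Illustration only: the point is per-back-end isolation of retry/back-off state,
# so a failing back end can't slow requests to the healthy ones.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session() -> requests.Session:
    """One session, and therefore one retry/back-off policy, per back end."""
    retry = Retry(total=3, backoff_factor=0.5, status_forcelist=(502, 503, 504))
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

# Keyed by back end, not shared across the whole web server.
backends = {name: make_session() for name in ("orders", "inventory", "billing")}

def call(backend: str, url: str) -> requests.Response:
    return backends[backend].get(url, timeout=5)
```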
It is, but tell that to project managers who pressure devs into banging out a project in a matter of weeks.
I've been bumped off several projects for daring to suggest that using a full blown framework like Zend to build a simple API is a dumb idea.
Unfortunately, the reality is that a significantly large portion of software developers are talentless, snippet-shovelling wankers who can't build anything without at least 8 billion layers of abstraction as training wheels.
It's insane, for example, how many people call themselves JavaScript developers that can't write vanilla JS.
Fortunately, you can tell who they are because they typically use a sticker-bombed MacBook Air.
They also get really pissy when you show them that their entire "project" can be done with a Perl one liner from 30 years ago.
A recent example I can cite was with a guy that used fucking Blazor (he's a former VBA 'dev'; Blazor seems to be the current safe haven for these types) to write a web crawler. I replicated the functionality in 30 minutes with a tiny bit of pure PHP and a bash wrapper...and it performed 200% better.
"Ah but it's not multithreaded" he said.
So I opened up 16 windows, assigned a core affinity to each then ran the script 16 times from a single command.
"There, now it is"
His jaw dropped.
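For anyone wondering, the "poor man's multithreading" trick translates to a few lines of anything. A rough Python equivalent (the crawler.php script and shard flags are placeholders, and the affinity call is Linux-only; on Windows you'd set it per window, as in the anecdote):

```python
# Rough sketch: run the same script once per core, each copy pinned to its own core.
# "crawler.php" and the --shard flags are hypothetical stand-ins for whatever you run.
import os
import subprocess

CORES = os.cpu_count() or 16
procs = []
for core in range(CORES):
    procs.append(subprocess.Popen(
        ["php", "crawler.php", f"--shard={core}", f"--shards={CORES}"],
        # Pin the child to a single core before it execs (Linux only).
        preexec_fn=lambda c=core: os.sched_setaffinity(0, {c}),
    ))

for p in procs:
    p.wait()
```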
Seriously, the perception of complexity is mind-bogglingly warped. People seem to think that because someone has written a library or wrapper for something, it must be hard.
It usually isn't.
Yes, there is an argument for usability where my hacky solution was concerned, but FFS, it's a web crawler...nobody is going to sit there and "use" it...it's automated and goes off to do its thing.
You have of course reviewed every line of code in your system personally? You've validated the compilers, reviewed and recompiled the firmware, confirmed that the silicon matches the BIOS? You never deploy a patch without decompiling it first?
Commendable.
And once you've confirmed all that, we can begin to consider the OS. And then the actual software that does stuff.
That's a childish response. There always has to be some level of trust, which you apportion according to the level of risk and the reputation of the developer. For all their failings (including greed), at least you know that the big name companies have security teams and processes in place because they want to protect their reputation. You don't know what the situation will be for most individuals contributing to open-source projects and libraries, at least at first, so you should evaluate risk and act accordingly.
Tell me you've never worked on a secure system without telling me you've never worked on a secure system.
In environments that carry extremely sensitive and secret information, this is common practice. As far as the OS is concerned, for "top secret" stuff, there are hardened kernels available that have been subject to scrutiny by experts that audit them (yes that includes Windows kernels) as well as custom security policy apparatus that can be deployed.
The way it works in, say, a military system is that your server will begin in a super locked down, hardened state...absolutely nothing will run, only the OS will boot...you can't even open Notepad. The process from there is to analyse each application that may be required to exist on there using various tools, such as the Sysinternals suite (Procmon, Regmon, etc.), to identify each and every component required for that app to just launch. You then repeat this process for each feature of the application that is required, gradually opening up the system to accommodate each one. This stripping back goes all the way down to the extra fucking wallpapers, icons and fonts. If there is no use case for Wingdings and clouds.bmp on the secure system...it's gone.
You never start from a completely open system and then lock it down; you start from a locked down system and gradually open it up. It's far less time-consuming this way and you achieve the result that you scoffed at above.
This way you don't really need to worry as much about rogue lines of code existing in an application, because you've analysed each feature that you require and you know exactly what each feature is going to call, be it IP addresses, remote servers, registry keys, file paths, etc. If there is some sort of rogue backdoor in a product you've installed and you somehow haven't spotted it during your analysis, it can't possibly do anything, because you haven't allowed anything beyond the bare minimum the application needs to do its job.
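As a flavour of what that analysis step produces, here's a minimal sketch (in Python) that chews through a Process Monitor CSV export and lists every file path and registry key a given process successfully touched, which is the starting point for the allowlist. The column names below match Procmon's default CSV export as I remember it, but check your own export before trusting them:

```python
# Minimal sketch: build a draft allowlist from a Procmon CSV export.
# Column names ("Process Name", "Operation", "Path", "Result") are assumed to
# match Procmon's default export; verify against your own capture.
import csv
from collections import defaultdict

def collect_allowlist(csv_path: str, process_name: str) -> dict[str, set[str]]:
    touched: dict[str, set[str]] = defaultdict(set)
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            if row["Process Name"].lower() != process_name.lower():
                continue
            if row["Result"] != "SUCCESS":
                continue
            # Procmon registry operations are named RegOpenKey, RegQueryValue, etc.
            kind = "registry" if row["Operation"].startswith("Reg") else "file"
            touched[kind].add(row["Path"])
    return touched

if __name__ == "__main__":
    for kind, paths in collect_allowlist("procmon_export.csv", "myapp.exe").items():
        print(kind, len(paths), "entries")
```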
As for how applications/solutions that rely on tons of libraries, frameworks, etc. are tested...they simply aren't...they just won't be used, as certifying them would be far too time-consuming. There may be libraries that do get regularly tested, like OpenSSL, curl, etc., but that one library you used to speed up interacting with that one API that you use occasionally...no...it just won't be allowed.
Where patching is concerned, this is usually achieved in layers. You have a second, identical setup that is airgapped and miles away from production that you can install a patch on, then test on. You run through the same process as above: you check all the reg keys that get accessed, file paths, Procmon output, etc....if the patch requires no new paths, keys, etc. to be opened and it doesn't affect functionality or performance at all...job's a good 'un, you can roll that patch out. If something changes, then those changes are reviewed, scrutinised, etc. before anything is rolled out.
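The patch check is then just the same data, diffed. Continuing the sketch above (this reuses the hypothetical collect_allowlist from that snippet): capture before the patch, capture again on the airgapped twin after it, and anything new gets reviewed before rollout.

```python
# Continuation of the sketch above; relies on collect_allowlist defined there.
# File names are placeholders for your own Procmon captures.
before = collect_allowlist("procmon_before_patch.csv", "myapp.exe")
after = collect_allowlist("procmon_after_patch.csv", "myapp.exe")

for kind in ("file", "registry"):
    new_entries = after.get(kind, set()) - before.get(kind, set())
    if new_entries:
        print(f"Patch introduces new {kind} access. Review before rollout:")
        for path in sorted(new_entries):
            print("  ", path)
```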
Once again though, the patching for these types of systems is not parallel to the retail patching systems...your "top secret" Windows machine isn't connecting to the public "Windows Update" servers.
Where device firmware is concerned...it's much the same thing...except with a twist...firmware for network devices etc. is usually a stripped-back version of what you might find on the "open market" versions of the hardware. Absolutely nothing superfluous exists in these firmware images. Don't need DHCP on that router? It won't even be included as a feature...you couldn't enable it even if you wanted to, because it isn't there. And in these sorts of environments, you don't typically troubleshoot a device that is having issues...you swap it out to keep everything running, and then and only then do you begin troubleshooting (if it's even appropriate); if the device is mission critical, it's most likely that it will never be repaired, it will just go off to be shredded.
Configs are usually read only (and flashed to a ROM chip), it's highly unlikely that a device will need a change while it is operating...if a change is required, it is made to another image, flashed to another device, then the device with the change is swapped in. This way, the only attack vector for "fucking with" your routers, switches etc is to get physical access to the devices and use a chip flashing tool to install your "hacked" config. Good luck with that, getting past the 4 or 5 checkpoints that exist between outside and the room that holds the "spare" kit and the additional 2 checkpoints that exist to get into the comms room...in a building which is manned 24/7 by several teams that constantly check each other.
The best kind of security for this sort of thing is to ensure that absolutely nobody can change the config of a running device. Think about it...why would you need to change something on a network that has been heavily planned out that will only be used for performing specific tasks that has been tested to death? The answer is, you don't. If in the unlikely event a change does need to be made because a mistake has occurred or the scope of the system has changed...you wouldn't ever do it on a live system...you'd test the fuck out of it on a closed non-production system first before deploying it. There is no banging away at a config until it works on a system like this.
There is this romantic notion that in a high security network, everything has to be incredibly high tech to fend off all the skids, Trrrsts and spies, but the reality is that the solutions are incredibly low tech, and looking after these networks is usually very mundane and incredibly boring. Which is what it should be.
Good security is not defined by one's ability to lock something down; it is defined by one's ability to identify what you don't need and get rid of it.
I have facepalmed my own head into the shape of a fucking spoon reading countless posts by sysadmins recounting the methods they have used to prevent certain attacks. My favourite one is sysadmins proudly using hot glue to cover over the front I/O on a machine to prevent "Bad USB" hacks. This is duuuuuuuumb. The approach you would take, if you knew anything about security at all, is to remove the front I/O entirely...in fact you'd probably ensure that the machines you purchase never had front I/O in the first place and that it is impossible to add it. You see where I'm coming from? You can't attack what isn't there.
I've seen amateur cybersecurity folks warble on about the chicken wire fence that they have around their setups to prevent radio based attacks...my god this is fucking stupid. If you're in a Faraday cage, you have absolutely no requirement for any radios whatsoever...so you'd be better off just getting rid of any devices that have a radio in them. If you're a full blown tinfoil hatter and you're worried that one of your teeth is broadcasting to a local spy station, then just be underground...live in a basement in a concrete building...you don't need the fucking chicken wire...it doesn't really work anyway! To cover every possible frequency and wavelength, you'd ultimately need to be in a fucking lead coffin.
Out in the wild, if you want to achieve a decent level of security when you decide to deploy that new trendy app you've built...rather than spin up a full-fat Ubuntu 22.04 VPS with all of its pre-installed packages and the baggage that comes with it, consider using a stripped-back OS and only add the bits that you need...use a distro like Alpine. When it comes to your codebase, use as few libraries as you can, and for those you can't live without, if they're open source, strip out the bits you aren't using. Yeah, it's time-consuming, but at least you'll be able to say with some degree of confidence that you have achieved some level of security, and worst case scenario, you will have so little to monitor and manage that any kind of attack will stand out like a sore dick and will be much easier to rectify.
Is it harder to set it up? Yes. Is it more secure because it comes with only the bare minimum of packages to boot up? Yes. Does it take longer to get that sweet, sweet app live...Yes. Is it worth it? Fuck Yes!!
Ignore the urge to get your shit launched as fast as possible, step back, breathe and think...simplify and avoid bloat. Your life will be all the better for it. You will ultimately save time, money, effort and you'll be able to sleep confidently at night knowing that you did everything you could to prevent some sort of breach and when you tell your customers that the hack was the result of an "unforeseen" situation, you'll be telling the truth and you'll be able to nail down the vulnerability a lot faster.
As soon as the miscreants and criminals who write malicious code get their hands on equivalent AI, I think this is just going to escalate. Malicious actors will just use AI to obfuscate code really well. Mind you, highly obfuscated code should just be a red flag anyway.
But my concern still stands: it'll become an arms race, and I, for one, do not welcome our robot overlords.
The problem is that you can use a GAN approach: one LLM acting as a detector, the other as a generator. You just invert the objective function so the generator refines its output until the detector fails to detect the malware. It needn't be obfuscated in the sense you're using the term, just too complex for the detector to ferret out.
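Schematically the loop is trivial: the only hard part is the models themselves. A toy sketch in Python, where both "models" are stand-in functions rather than real LLM calls, and the "payload" is just a placeholder string:

```python
# Toy sketch of the generator/detector loop described above. Both "models" are
# placeholder functions; in practice each would be an LLM call on real code.
import random

def detector_flags(code: str) -> bool:
    """Stand-in for the detector model: True means "this looks malicious"."""
    return "suspicious" in code

def generator_rewrite(code: str) -> str:
    """Stand-in for the generator model: rework the payload so it keeps its
    behaviour (not modelled here) but stops tripping the detector."""
    return code.replace("suspicious", random.choice(["routine", "mundane"]))

payload = "suspicious_helper()  # pretend this is the malicious snippet"
for _ in range(10):
    if not detector_flags(payload):
        break  # objective inverted: stop once the detector no longer fires
    payload = generator_rewrite(payload)

print(payload)
```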
"Applying human analysis to the entire corpus of a package registry (~1.3 million for npm and ~450,000 for PyPI) just isn't feasible"
So the only thing the miscreants really need to learn is how to stay under a pseudo-AI's radar.
Simple? Maybe not, but when they do get there, there will be plenty of time for mayhem before somebody twigs what's going on and solves the problem. Then the miscreants will find another way.
Sword and shield, always.
Whilst a miscreant might find a way of staying under the radar for a particular iteration of an AI system, once the AI has been updated to detect the malicious code it can process the whole software base much more efficiently than a system that relies on human expertise.
This still leaves the challenge of initially detecting the "new" malware technique and training the AI system - which is also the case with human expertise based systems.
The first part of that challenge is not necessarily entirely one-sided. To exploit the malware the malicious actors need to get it into target systems and interacting with their command, control, and data collection systems. The big operators (major APTs) are subject to a great deal of attention from the cyber-security industry (as well as government organizations and university research centres) so there are multiple ways their activities might be discovered - and once that is done, the process of determining how they have breached security should be greatly aided by automated techniques that can analyze large code bases efficiently.
Training AI systems is ongoing research - not all aspects are good news (we have seen articles hereabouts about methods of maliciously subverting training models in undetectable ways), but this is adding to our knowledge.
AI systems could (and probably will) be used by malicious actors, and this will probably advance the script kiddies' technical sophistication - but they, and the more technically competent APTs, will also face the challenge of keeping their AI system training current with a potentially fast-changing battlefield.
We have also seen articles about ChatGPT and how aspects of its creative output can seem impressive but also at times very dumb. It still seems that using AI to be creative is much more challenging than using it in a data processing role - suggesting that AI technology favours the defence team rather than the offence.
Garbage in
Garbage out
Still holds true 50 years after it was coined
We had a package that only showed what it was really doing behind the scenes when the network went out and it started to report that it could not reach the Russian server.
It tried at random times, but never more than once a day.
Tricky little git
Some people worry "AI" might develop to the point it destroys humanity.
I don't think we need worry about that.
The massive rush to embrace "AI" with all its associated uptick in energy use (the "new generation" of "AI" tools are all energy hungry at many points in their lifecycle, from building all the extra kit needed through to running it, which will help accelerate global climate change) will increase the chance we hit a particularly nasty climate tipping point well before the terminators are unleashed (or whatever similarly outlandish "AI will destroy humanity" scenario that Effective Altruism fans & others seem so keen on believing*)
Though at least the eye-watering cost of some of the best "AI" systems is deterring heavy usage, which is helping keep energy use down to relatively manageable levels currently
* I worry more about short termism & "I'm alright Jack" mentality of people screwing over Homo (not very) Sapiens rather than "AI"