What!?
Not Wordle!
There’ll be hell to pay if you stuff up my Wordle!
A 4chan user has leaked 270GB of internal New York Times data, including what's said to be source code and other web assets, via the notorious image board. According to the unnamed netizen, the information includes "basically all source code belonging to The New York Times Company," amounting to roughly 5,000 repositories and 3 …
Hilarious that they gave themselves away because the e-mail was too cheery for that editor, a grumpy old bear no doubt.
BTW it's time companies started putting their stuff in long-term storage (i.e. tape, either on-premise or in Cloud providers, I know AWS offers this). That'll make it a lot harder for miscreants to nab it. Most of the stuff isn't needed online anyway.
> it's time companies started putting their stuff in long-term storage (i.e. tape, either on-premise or in Cloud providers, I know AWS offers this).
And they charge for it as well, so be sure that what you put there is worth it (just shoving everything up "just in case" will get pricey)
> That'll make it a lot harder for miscreants to nab it.
Slower, that is for sure, but if you have leaky security for credentials that allowed access to live storage you may also have leaked credentials that allowed the miscreants the ability to ask for "this tape to be mounted, please" - and what are the chances people are paying less attention to what is going on with "the dusty old stuff" than the live system?
BUT companies shouldn't be thinking about using long term storage, don't let me put you off, for backup it is a Good Idea.
> Most of the stuff isn't needed online anyway.
Ah, tricky - and pricey, again.
Before relying on long-term storage as your only copy (with a backup tape, of course) you have to be able to *prove* that "this whole directory tree is not going to be needed for X months" (where the cost of live storage is greater than long-term-but-readily-accessible storage). Which means you have to have all your data properly organised - as in, *properly*, not just "I can do a quick file search to find it" - and are absolutely certain that the JavaScript sitting here isn't going to be needed on a customer's browser tomorrow ("we stopped generating pages that use that last month, it can go" "Big Name just died, put his pre-written obituary up *now*!" "What do you mean, it can't find its script?").
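For what it's worth, the "has this tree been touched in X months" question can be roughed out in a few lines. This is only a sketch under stated assumptions: the names `newest_mtime` and `archive_candidates` are made up for illustration, and it looks at modification times only, which says nothing about whether a file is still being *read* (the obituary-script problem above), so treat the output as a list of things to investigate rather than a list of things to shelve.

```python
import os
import time

SECONDS_PER_DAY = 86400


def newest_mtime(tree):
    """Return the latest modification time of any file under tree (0.0 if none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            newest = max(newest, mtime)
    return newest


def archive_candidates(root, days, now=None):
    """Names of immediate subdirectories of root with no file modified in `days` days.

    Hypothetical helper: empty directories also count as candidates,
    because their newest modification time is reported as 0.0.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    candidates = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False) and newest_mtime(entry.path) < cutoff:
            candidates.append(entry.name)
    return candidates
```

Calling `archive_candidates("/srv/www", 365)` (the path is a placeholder) would list the top-level trees nobody has modified in a year. Whether anything still links to them is the part that needs the archivist.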
What you need is a trained Digital Archivist, to help you organise and learn how to determine what to keep live, what to put on the shelf - and what you can just get rid of completely, reducing your storage costs.
Oh, you have thought about all this and are just going to buy another hard drive rack to go into the server room, it is easy, quick and cheap enough, besides who has time to think about these things, we have a newspaper to get out. Ok. COPY!
> Tape deteriorates...
Didn't we go over this recently? Oh yes: Tape is so dead, 152.9 EB of LTO media shipped last year
The conclusion there seems to be that *every* form of digital storage has a limited lifespan and if you want to keep it you are on an endless treadmill of copying to new devices as the years go by. Even if you choose a robust, long-life, medium (e.g. parchment) you still have to be sure to maintain the scanner that lets you read it back into memory.
And it gets worse: spinning rust can be taken out of the case (Maplins used to sell caddies for this) and will be readable in a few years, but make sure you power up your SSD-based external USB drive more frequently.
> BUT companies shouldn't be thinking about using long term storage, don't let me put you off, for backup it is a Good Idea
Whoops. Let's just correct that line, so it doesn't contradict itself:
BUT companies should be thinking about using long term storage, don't let me put you off, for backup it is a Good Idea
"BTW it's time companies started putting their stuff in long-term storage (i.e. tape, [...] That'll make it a lot harder for miscreants to nab it. Most of the stuff isn't needed online anyway."
They should consider using that for backup, but it does little to help with security. Most of that stuff is needed online, or so much of it that they probably need a hot copy. Your repos are much less useful if programmers and sysadmins have to keep asking to get tapes mounted so they can work with the thing. The only thing you can remove from hot storage is the stuff you're pretty sure you won't need soon, and I would define soon as "any time in the next year, from even one person".
They need to modify parts of their code. They need to deploy it to new systems. For both reasons, their code is not something they can just store on tape. Doing that is useful if their hot storage gets damaged and they need to recover, but trying to do it without the hot storage at all will only lead to frustration.
> BTW it's time companies started putting their stuff in long-term storage (i.e. tape, either on-premise or in Cloud providers, I know AWS offers this). That'll make it a lot harder for miscreants to nab it. Most of the stuff isn't needed online anyway.
Largely irrelevant. What is not being said is where the source code was extracted from; it could have been from a cloud-based developer platform …
Remember backups are about recovery, archives are for long-term storage. But neither is an appropriate repository for code that is in production and hence either being worked on or needs to be accessed at short notice.
Please, you are The Reg, it is too obvious that what has been leaked/stolen is not source code but site content: articles, photos, videos, etc. 270GB of actual program source is Windows and Office sized material, not a newspaper site no matter how big it is.
I'd guess more likely a dump of the webserver root, containing both source code and content such as images etc. Possibly not including the actual article texts as that would usually be in a separate database. Although credentials to access said database could be in the "source code"
Ah, come on, give us some credit!
The words source code are in scare quotes ('source code') because that's how the leaker described it. In the article we call it internal data and assets. When you see 'source code', that's the claim: the article refers to what's actually been allegedly leaked.
C.
> not source code but site content
Pull up the NYT home page and do a View Source on it.
There's a lot more 'gibberish' than plaintext. Like 830k total characters and 13k of visible text. Word 'SCRIPT' appears over and over. Yes, links, formatting, boilerplate..... but 64X is a lot of overhead.
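If anyone wants to repeat the measurement on a saved copy of a page, here is a rough sketch using only Python's standard `html.parser`. The class name `VisibleText` and the function `overhead` are made up for illustration, and "visible" is approximated as any text outside `<script>` and `<style>`, which will not match exactly what a browser renders.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # >0 while inside a skipped element
        self.chunks = []  # text fragments counted as visible

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def overhead(html):
    """Return (total characters, visible characters) for an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not count as text.
    visible = " ".join("".join(parser.chunks).split())
    return len(html), len(visible)
```

Feed it the saved page source and divide the first number by the second. With the figures quoted above, 830k over 13k comes out at roughly 64.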
I don't recall the comic book Perry White as being particularly grumpy but cheery definitely not.
Just imagining what an editor has to endure - from barely literate copy from writers who for the most part could be beneficially replaced by chatgpt, to manglement and C-suite crap from above - they should be extended honorary life membership to the worshipful guild of defenestrators.
> the giveaway was that the message was far too cheery for that editor to be real.
I love it as a security protocol. How to scale it up... it's tying electronic communications to things only known to the humans in meatbag-space:
"Mike, you ugly bastard, I need the updated figures for last month ASAFP. Oh, that tie you were wearing last week looked like someone vomited carrots on a pair of skid-marked Y-fronts"
The challenge is adapting it for the customer relations team.
> The challenge is adapting it for the customer relations team
Challenge - accepted!
The trick is to come up with novel variants of the old "id-ten-t" and "pebcak", one per customer and get 'em to quote it back to you as their customer id.
"Hi, yes, last week you guys said I was a Flash[1] customer."
I shall pass the baton on to the rest of you lot, confident that you can come up with a plentiful supply of new and useful phrases to suit the expected customer base.
[1] Flatulent, Longwinded And Shiny Headed
Back in the 1960s or 1970s, the NYT embarked on a huge, specific-to-them, office automation project. It was alluded to in The Mythical Man-Month, by Frederick P. Brooks. The project was one of the first few to use the "Chief Programmer Team" methodology, and achieved an amazingly low bug count, given its tens of thousands of lines of source code.
I'd like to see that code.
"But you'll have to download all 69 RAR chunks to decrypt it and find out."
If they RARed a big ZIP file or other archive, maybe. Otherwise:
rar x -kb filename.rar (or .r00 or whatever numbering system they used)
on the first one or two parts. I'm not sure if pointy clicky GUI archivers can do that since I find them mostly too cumbersome to work with.
Where the comments are turned off. Pristine articles on their own domains with none of our deranged rantings. Wouldn't want to track mud on their shoes and it lets the authors have much cozier relations with industry side and still pretend to stay on brand.
The commentards here show engagement, but moderating us is a money pit. And leaving us one corner of the pig pen to wallow in leaves us the illusion of free will and makes it less likely for the spleen venters to do so on another platform people will notice. As long as we are griping to each other HERE the straight world isn't paying much attention. At least for now.
Working for a different employer a number of years back, one with a flagship endpoint security product... NYT was a customer. They insisted on running an old unsupported version of the product configured as 'antivirus only'. We had documented multiple calls and emails recommending they upgrade to a newer version and deploy the full security stack.
Yup, they had an outbreak. Yup, they went public. Yup, they blamed us. Nope, we couldn't say a darn thing about it.
I learned something that day.... Don't believe the news, even the sacred ones who claim the high ground as the nation's "Newspaper of Record". Sure, we all know the news is politically biased. But this was outright omitting the facts to suit their own interests.
Applying that lesson to this latest security problem... We cannot trust a single word the NYT is saying about the incident.