New York Times source code leaks online via 4chan

A 4chan user has leaked 270GB of internal New York Times data, including what's said to be source code and other web assets, via the notorious image board. According to the unnamed netizen, the information includes "basically all source code belonging to The New York Time Company," amounting to roughly 5,000 repositories and 3 …

  1. Anonymous Coward

    What!?

    Not Wordle!

    There’ll be hell to pay if you stuff up my Wordle!

  2. prh99

    An anonymous 4chan poster was responsible for the "Gigaleak" leaks from Nintendo, so could very well be.

    1. Benegesserict Cumbersomberbatch Silver badge

      Anonymous 4chan posters are responsible for all manner of shite, so could very well not be.

  3. StrangerHereMyself Silver badge

    Cheery

    Hilarious that they gave themselves away because the e-mail was too cheery for that editor, a grumpy old bear no doubt.

    BTW it's time companies started putting their stuff in long-term storage (i.e. tape, either on-premise or in Cloud providers, I know AWS offers this). That'll make it a lot harder for miscreants to nab it. Most of the stuff isn't needed online anyway.

    1. PRR Silver badge

      Re: Cheery

      > ...the message was far too cheery for that editor to be real.

      As a son of an editor, I get it.

    2. Anonymous Coward

      Re: Cheery

      Or, as Gibbs (NCIS) demonstrated once, just pull the friggin' network cable.

      1. Drakon

        Re: Cheery

        Two idiots, one keyboard.

    3. that one in the corner Silver badge

      Re: Cheery

      > it's time companies started putting their stuff in long-term storage (i.e. tape, either on-premise or in Cloud providers, I know AWS offers this).

      And they charge for it as well, so be sure that what you put there is worth it (just shoving everything up "just in case" will get pricey)

      > That'll make it a lot harder for miscreants to nab it.

      Slower, that is for sure, but if you have leaky security for credentials that allowed access to live storage you may also have leaked credentials that allowed the miscreants the ability to ask for "this tape to be mounted, please" - and what are the chances people are paying less attention to what is going on with "the dusty old stuff" than the live system?

      BUT companies shouldn't be thinking about using long term storage, don't let me put you off, for backup it is a Good Idea.

      > Most of the stuff isn't needed online anyway.

      Ah, tricky - and pricey, again.

      Before relying on long-term storage as your only copy (with a backup tape, of course) you have to be able to *prove* that "this whole directory tree is not going to be needed for X months" (where the cost of live storage is greater than long-term-but-readily-accessible storage). Which means you have to have all your data properly organised - as in, *properly*, not just "I can do a quick file search to find it" - and be absolutely certain that the Javascript sitting here isn't going to be needed on a customer's browser tomorrow ("we stopped generating pages that use that last month, it can go" "Big Name just died, put his pre-written obituary up *now*!" "What do you mean, it can't find its script?").

      What you need is a trained Digital Archivist, to help you organise and learn how to determine what to keep live, what to put in the shelf - and what you can just get rid of completely, reducing your storage costs.

      Oh, you have thought about all this and are just going to buy another hard drive rack to go into the server room, it is easy, quick and cheap enough, besides who has time to think about these things, we have a newspaper to get out. Ok. COPY!

      1. PB90210 Bronze badge

        Re: Cheery

        The problem then becomes the format of that backup. Tape deteriorates, hardware becomes unavailable, formats become unsupported, labels fall off, tapes get overwritten to save money...

        1. that one in the corner Silver badge

          Re: Cheery

          > Tape deteriorates...

          Didn't we go over this recently? Oh yes: Tape is so dead, 152.9 EB of LTO media shipped last year

          The conclusion there seems to be that *every* form of digital storage has a limited lifespan and if you want to keep it you are on an endless treadmill of copying to new devices as the years go by. Even if you choose a robust, long-life, medium (e.g. parchment) you still have to be sure to maintain the scanner that lets you read it back into memory.

          And it gets worse: spinning rust can be taken out of the case (Maplins used to sell caddies for this) and will be readable in a few years, but make sure you power up your SSD-based external USB drive more frequently.

          1. Anonymous Coward

            Re: Cheery

            As NASA found out, don't put your tapes on the bottom shelf where the big floor buffer can erase them.

          2. Anonymous Coward

            I prefer papyrus

            My mummy is pretty old and never had a failure, and still lighter than a clay tablet.

      2. that one in the corner Silver badge

        Re: Cheery

        > BUT companies shouldn't be thinking about using long term storage, don't let me put you off, for backup it is a Good Idea

        Whoops. Let's just correct that line, so it doesn't contradict itself:

        BUT companies should be thinking about using long term storage, don't let me put you off, for backup it is a Good Idea

    4. LordZot
      Happy

      Re: Cheery

      Grumpy editors are a security measure? Got it.

      LOL. That's so hilarious!

    5. spireite Silver badge
      Trollface

      Re: Cheery

      I'm of the opinion that grumpy members of staff should be put in long-term storage

      1. Fruit and Nutcase Silver badge
        Mushroom

        Re: Cheery

        Certain companies have a term for that.

        e.g. Resource Action

    6. doublelayer Silver badge

      Re: Cheery

      "BTW it's time companies started putting their stuff in long-term storage (i.e. tape, [...] That'll make it a lot harder for miscreants to nab it. Most of the stuff isn't needed online anyway."

      They should consider using that for backup, but it does little to help with security. Most of that stuff is needed online, or so much of it that they probably need a hot copy. Your repos are much less useful if programmers and sysadmins have to keep asking to get tapes mounted so they can work with the thing. The only thing you can remove from hot storage is the stuff you're pretty sure you won't need soon, and I would define soon as "any time in the next year, from even one person".

      They need to modify parts of their code. They need to deploy it to new systems. For both reasons, their code is not something they can just store on tape. Doing that is useful if their hot storage gets damaged and they need to recover, but trying to do it without the hot storage at all will only lead to frustration.

    7. Roland6 Silver badge

      Re: Cheery

      >” BTW it's time companies started putting their stuff in long-term storage (i.e. tape, either on-premise or in Cloud providers, I know AWS offers this). That'll make it a lot harder for miscreants to nab it. Most of the stuff isn't needed online anyway.”

      Largely irrelevant. What is not being said is where the source code was extracted from; it could have been from a cloud-based developer platform …

      Remember backups are about recovery, archives are for long-term storage. But neither is an appropriate repository for code that is in production and hence is either being worked on or needs to be accessed at short notice.

  4. SecretSonOfHG

    “Source code” does not make 270GB

    Please, you are The Reg, it is too obvious that what has been leaked/stolen is not source code but site content: articles, photos, videos, etc. 270GB of actual program source is Windows and Office sized material, not a newspaper site no matter how big it is.

    1. cyberdemon Silver badge

      Re: “Source code” does not make 270GB

      I'd guess more likely a dump of the webserver root, containing both source code and content such as images etc. Possibly not including the actual article texts as that would usually be in a separate database. Although credentials to access said database could be in the "source code"

    2. IGotOut Silver badge
      FAIL

      Re: “Source code” does not make 270GB

      And yet again some smart arse that fails to read the first paragraph..

      " 270GB of internal New York Times data, including source code, "

      Note the word "INCLUDING".

      1. Anonymous Coward

        Re: “Source code” does not make 270GB

        But reading the article is hard. And giving my opinion is so easy.

    3. diodesign (Written by Reg staff) Silver badge

      Re: “Source code” does not make 270GB

      Ah, come on, give us some credit!

      The words source code are in scare quotes ('source code') because that's how the leaker described it. In the article we call it internal data and assets. When you see 'source code', that's the claim: the article refers to what's actually been allegedly leaked.

      C.

    4. PRR Silver badge

      Re: “Source code” does not make 270GB

      > not source code but site content

      Pull up the NYT home page and do a View Source on it.

      There's a lot more 'gibberish' than plaintext. Like 830k total characters and 13k of visible text. Word 'SCRIPT' appears over and over. Yes, links, formatting, boilerplate..... but 64X is a lot of overhead.
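A rough way to repeat that measurement on any saved page, offered as a sketch rather than the method used above: count every character in the HTML, count only the text that sits outside `<script>` and `<style>` elements, and divide. The class name `VisibleTextCounter` and the function `overhead_ratio` are made up for illustration, and "visible text" here is an approximation (it ignores CSS-hidden elements, for instance).

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Count characters of text that lie outside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0      # > 0 while inside a script/style element
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader could plausibly see.
        if not self._skip_depth:
            self.visible_chars += len(data.strip())


def overhead_ratio(html: str) -> float:
    """Total characters in the page divided by visible-text characters."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return len(html) / max(counter.visible_chars, 1)


if __name__ == "__main__":
    sample = "<html><script>var x=1;</script><p>Hello</p></html>"
    print(overhead_ratio(sample))
    # The figures quoted above: 830k total characters, 13k visible.
    print(round(830_000 / 13_000))
```

Feeding it the text of a page saved from View Source gives a ratio comparable to the 830k / 13k figure quoted above, which works out to roughly 64.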

  5. Bebu
    Windows

    Great Caesar's ghost - a cheery editor!

    I don't recall the comic book Perry White as being particularly grumpy but cheery definitely not.

    Just imagining what an editor has to endure - from barely literate copy from writers who for the most part could be beneficially replaced by chatgpt, to manglement and C-suite crap from above - they should be extended honorary life membership to the worshipful guild of defenestrators.

    1. Dan 55 Silver badge

      Re: Great Caesar's ghost - a cheery editor!

      Perhaps he's more like J Jonah Jameson than Perry White.

    2. that one in the corner Silver badge

      Re: Great Caesar's ghost - a cheery editor!

      > I don't recall the comic book Perry White as being particularly grumpy

      Clearly you never tried to call him "Chief"!

      That young Jimmy, he never would learn.

  6. Dave 126 Silver badge

    > the giveaway was that the message was far too cheery for that editor to be real.

    I love it as a security protocol. How to scale it up... it's tying electronic communications to things only known to the humans in meatbag-space:

    "Mike, you ugly bastard, I need the updated figures for last month ASAFP. Oh, that tie you were wearing last week looked like someone vomited carrots on a pair of skid-marked Y-fronts"

    The challenge is adapting it for the customer relations team.

    1. that one in the corner Silver badge

      > The challenge is adapting it for the customer relations team

      Challenge - accepted!

      The trick is to come up with novel variants of the old "id-ten-t" and "pebcak", one per customer and get 'em to quote it back to you as their customer id.

      "Hi, yes, last week you guys said I was a Flash[1] customer."

      I shall pass the baton on to the rest of you lot, confident that you can come up with a plentiful supply of new and useful phrases to suit the expected customer base.

      [1] Flatulent, Longwinded And Shiny Headed

      1. An_Old_Dog Silver badge

        An Alternative Interpretation

        When I read "Flash[1] customer", I initially thought you were referring to buyers of the security-buggy software product known as Adobe (nee Macromedia) Flash!

  7. Tron Silver badge

    Possibility.

    Software vendors have 'source code'. The NYT just use the stuff. Said unnamed bro may have bought a used laptop, recovered the contents of the HDD and found that it was previously owned by a NYT exec.

    1. An_Old_Dog Silver badge
      Boffin

      Re: Possibility.

      Back in the 1960s or 1970s, the NYT embarked on a huge, specific-to-them, office automation project. It was alluded to in The Mythical Man-Month, by Frederick P. Brooks. The project was one of the first few to use the "Chief Programmer Team" methodology, and achieved an amazingly low bug count, given its tens of thousands of lines of source code.

      I'd like to see that code.

    2. John Brown (no body) Silver badge

      Re: Possibility.

      Any code run via an interpreter is, by definition "source code" even while being run in production. Javascript, Python etc., all the usual suspects.

  8. mostly average

    It's 4chan

    There's just as much chance it's just a huge archive of Shrek erotic fan fiction. But you'll have to download all 69 RAR chunks to decrypt it and find out.

    1. John Brown (no body) Silver badge

      Re: It's 4chan

      "But you'll have to download all 69 RAR chunks to decrypt it and find out."

      If they RARed a big ZIP file or other archive, maybe. Otherwise:

      rar x -kb filename.rar (or .r00 or whatever numbering system they used)

      on the first one or two parts. I'm not sure if pointy clicky GUI archivers can do that since I find them mostly too cumbersome to work with.

  9. xyz123 Silver badge

    Here it is, in all its New York Times glory:

    If $story == Bullshit then Print.

  10. benderama

    If they’re using AI it could be LLM training data. Maybe?

  11. Pascal Monett Silver badge
    Trollface

    Wait, 4chan is an image board now ?

    I thought it was a cesspile of foul language, threats to my Mother and endless "I did not - Yes you did" competitions.

    When did they upgrade ?

  12. T. F. M. Reader

    So is it OK to train LLMs on it now?

    Asking for a friend.

  13. Raphael

    PS: Subhead was inspired by Lester's burning Burning Man man headline.
    I miss Lester Haines.

    1. Antron Argaiv Silver badge

      As do I.

      And I miss the Reg's tagline being prominently displayed. At least they haven't shut off commenting in the name of corporate sanitization.

      1. Anonymous Coward

        No, they just post more stuff on their sub brand sites

        Where the comments are turned off. Pristine articles on their own domains with none of our deranged rantings. Wouldn't want to track mud on their shoes and it lets the authors have much cozier relations with industry side and still pretend to stay on brand.

        The commentards here show engagement, but moderating us is a money pit. And leaving us one corner of the pig pen to wallow in leaves us the illusion of free will and makes it less likely for the spleen venters to do so on another platform people will notice. As long as we are griping to each other HERE the straight world isn't paying much attention. At least for now.

  14. Anonymous Coward

    NY Times are security idiots

    Working for a different employer a number of years back, one with a flagship endpoint security product... NYT was a customer. They insisted on running an old unsupported version of the product configured as 'antivirus only'. We had documented multiple calls and emails recommending they upgrade to a newer version and deploy the full security stack.

    Yup, they had an outbreak. Yup, they went public. Yup, they blamed us. Nope, we couldn't say a darn thing about it.

    I learned something that day.... Don't believe the news, even the sacred ones who claim the high ground as the nation's "Newspaper of Record". Sure, we all know the news is politically biased. But this was outright omitting the facts to suit their own interests.

    Applying that lesson to this latest security problem... We cannot trust a single word the NYT is saying about the incident.
