FYI: Data from deleted GitHub repos may not actually be deleted

Researchers at Truffle Security have found, or arguably rediscovered, that data from deleted GitHub repositories (public or private) and from deleted copies (forks) of repositories isn't necessarily deleted. Joe Leon, a security researcher with the outfit, said in an advisory on Wednesday that being able to access deleted repo …

  1. Martin M

    I'm with GitHub on this one

    If you commit a secret to a public repo, it is broadly equivalent to pasting it on the service formerly known as Twitter. Deleting the Xeet may make you feel a bit better, but you don't know who's already copied it. Deleting a repo is even more pointless with a distributed source control system that actively encourages the wide and automated distribution of commits. In both cases, you just have to treat the secret as hopelessly compromised.

    Instead of whining, what you need to be focusing on is a/ rotating your secrets ASAP and b/ stopping people doing it again with secret scanning.

    1. Richard 12 Silver badge

      Yes and no

      It's reasonable to expect that deleting a commit tree that nobody else has yet accessed will prevent it from being accessed in future.

      Yet GitHub doesn't do that. It continues to make them available to all forks after the deletion, and even actively assists in finding them.

      That part is the issue.

      Sure, you need to try to prevent keys from being committed by mistake with gitignore and pre-commit scanning, change the keys etc, but one expects to be able to delete to limit the damage when the protections fail.
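
      As a rough illustration of the pre-commit scanning idea, here is a minimal sketch. The two patterns are assumptions chosen for illustration (the `AKIA...` shape commonly cited for AWS access key IDs, and a PEM private-key header), and `find_secrets` is a hypothetical helper, not part of any real scanner.

```python
import re

# Hypothetical rule set for illustration; real scanners ship many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Pretend this is the content about to be committed.
staged = 'region = "eu-west-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_secrets(staged))  # prints [(2, 'aws_access_key_id')]
```

      A hook built on something like this would refuse the commit when the list is non-empty. It only catches secrets with a recognisable shape, so it complements rotation rather than replacing it.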

      1. Martin M

        Re: Yes and no

        I’m not convinced that deleting the repo meaningfully limits the damage.

        Just because the commit isn’t in a fork yet doesn’t mean the secret hasn’t been accessed. If one expects that published information can be magically made private again, one’s expectations probably need adjusting. Panicking and deleting repos will probably just get in the way of effective response.

        The thing that *actually* limits the damage is rotating the secret, then it doesn’t matter who’s got the old one. You should be able to do this quickly anyway, so deleting the repository shouldn’t even buy much time advantage. If you can’t - well, that’s another area to take a look at.

        1. Joe W Silver badge

          Re: Yes and no

          OK, here's the situation.

          you fork an upstream repo, your fork is private

          you commit something there that should not see daylight (keys to the Lamborghini or whatever)

          you delete that commit to hide your sins

          and now that commit is apparently still easily accessible from upstream.

          1. Martin M

            Re: Yes and no

            The commit is to a *public* upstream, not a private repo.

            “The key had been publicly committed to a GitHub repository. Upon learning of the blunder, the tech biz nuked the repo thinking that would take care of the leak.”

            The video also makes this crystal clear.

            1. Anonymous Coward

              Re: Yes and no

              > The commit is to a *public* upstream, not a private repo.

              That seems to be the issue tho - Joe W's example is that the deleted commit *is* in the private repo, but it's still accessible from the forked parent. (TBH that scenario brings up the possibility of a "private" fork effectively being public, depending on how that link back to the parent is set up, but as I read it the issue is only with deleted [or "deleted"] entities.)

          2. Martin M

            Re: Yes and no

            Also - as far as I can see it's not actually possible to create private native GitHub forks of public repos in GitHub, so your example appears to be impossible.

            You can of course push and pull commits between repos on GitHub via a local repo (https://stackoverflow.com/questions/10065526/github-how-to-make-a-fork-of-public-repository-private). But at that point, GitHub doesn't know they're at all related and if you push anything to the public upstream, that is something you explicitly asked to do.

            1. robinsonb5

              Re: Yes and no

              > Also - as far as I can see it's not actually possible to create private native GitHub forks of public repos in GitHub

              The example they cite is when you have a private repo that will eventually become public, fork it to make a permanently-private fork and then later make the original repo public. Anything committed to the still-private repo up until the point the first repo is made public can be accessed from the now-public repo. (As long as you know the commit hashes, that is - but unfortunately they're easily discoverable.)

        2. Anonymous Coward

          Re: Yes and no

          I suggest you re-read the article. The main issue isn't about trying to delete/hide something you made public, it's about things that were never public being accessible.

          1. Martin M

            Re: Yes and no

            I have reread the article. If you actually look at the videos and read the concrete examples, they talk entirely about commits of sensitive information to public repos. There's not a single example of commits to private repos becoming accessible from public ones.

            1. robinsonb5

              Re: Yes and no

              The one case they talk about where the contents of private repos become publicly viewable is when a formerly-private repo with a private fork is made public.

              In that case, any commits made to the still-private fork up until the time the parent repo became public are accessible from the public repo. That runs counter to users' expectations.

              (Any commits made to the private repo *after* the parent one has gone public will remain private, however.)

              The issue is that people are mentally modelling forks as "that's my copy of the repo, completely separate from the original" whereas in reality the fork is just a different interface to the same pool of blobs. Furthermore, while you wouldn't be able to access commits from another fork in the same pool of blobs unless you know the commit hash, github makes those commit hashes discoverable.
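
              The "same pool of blobs" model can be sketched as a toy. This is only an illustration of the behaviour described in the thread, not GitHub's actual implementation; `ForkNetwork` and its methods are hypothetical names.

```python
import hashlib


class ForkNetwork:
    """Toy model: every repo in a fork network shares one object pool."""

    def __init__(self):
        self.objects = {}  # commit hash -> content, shared by all repos
        self.refs = {}     # repo name -> hashes reachable from its branches

    def add_repo(self, name):
        self.refs[name] = set()

    def commit(self, repo, content):
        digest = hashlib.sha1(content.encode()).hexdigest()
        self.objects[digest] = content
        self.refs[repo].add(digest)
        return digest

    def delete_repo(self, name):
        # Only the repo's refs go away; the pooled objects are untouched.
        del self.refs[name]

    def fetch_by_hash(self, repo, digest):
        # Any surviving repo in the network can resolve any pooled hash.
        if repo not in self.refs:
            raise KeyError(repo)
        return self.objects.get(digest)


net = ForkNetwork()
net.add_repo("upstream")
net.add_repo("fork")
leaked = net.commit("fork", "API_KEY=hunter2")
net.delete_repo("fork")
print(net.fetch_by_hash("upstream", leaked))  # prints API_KEY=hunter2
```

              In this toy, deleting the fork removes its name and refs, but a lookup by hash through the surviving repo still succeeds, which is the gap between the "my separate copy" mental model and the shared pool.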

        3. lglethal Silver badge
          Stop

          Re: Yes and no

          If you accidentally leave your password lying visible on the street, and a dozen people walk past and see it, yep, you're in trouble. But if you then remove it quickly before more people can see it, then the chances are low that those dozen people are going to cause you problems. They might pass it on to one or more other people, but chances are on your side that nothing will happen, and you will have time to fix your security problems. If you are unlucky, one of those people will post it somewhere else and yep, then you're screwed, but the quicker you remove the password, the lower the chances that you're going to be targeted straightaway.

          Just leaving your password in sight and saying "Oh well, too bad!" means you will be targeted straightaway, and you have no time to sort your defences first...

          1. Martin M

            Re: Yes and no

            I am 100% not suggesting someone should say "Oh well, too bad". I'm suggesting they rotate their keys, with extreme urgency.

            Passwords dropped in the street are very unlikely to be seen and used by even an opportunistic attacker. Credentials pushed to public repos (at least, those that are recognisable as a credential) are going to be automatically scraped within seconds or minutes by bad actors around the world. Their day job is either exploiting them directly or selling them to someone who will.

            By the time the errant developer notices and tells someone who has repo deletion rights, it is, with near 100% certainty, too late to recover them. It may - if you're lucky, and the attack is not fully automated - be possible to prevent them being used and a foothold established in your network.

            1. Zibob Silver badge

              Re: Yes and no

              I'm ignorant of how this all works, first off.

              But would rotating the keys actually do anything as the old data is still there?

              1. doublelayer Silver badge

                Re: Yes and no

                Yes, rotating the key is equivalent to changing the password. You can even leave it public. Anyone on the planet can know that "a83dc027b9a62170" used to unlock something, but it doesn't now. The data is now worthless as long as you can make sure it is no longer usable.

                If you can't or don't choose to do that, then deleting the repo looks like a second-best solution. The problem is that it's far too weak and people who think that's good enough are failing a necessary security step. Once it's been committed to the public once, there is a chance that someone has seen it and you can neither reliably detect whether they have nor prevent them from having done so. At that point, you have a risk. Deleting a repo does not eliminate that risk. That credential is compromised and absolutely must be revoked as soon as possible. One of the examples from the article was from a user that wanted to keep using the credential after it had been posted, and that is not a good idea.
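
                To make the "changing the password" point concrete, here is a minimal sketch of a credential store in which only the current key is accepted. `KeyStore` is a hypothetical toy, not any real service's API; the point is only that rotation makes the leaked value useless regardless of who still holds a copy.

```python
import secrets


class KeyStore:
    """Toy credential store: only the current key for a name is accepted."""

    def __init__(self):
        self._current = {}

    def issue(self, name):
        key = secrets.token_hex(8)
        self._current[name] = key
        return key

    def rotate(self, name):
        # Issuing a replacement implicitly revokes whatever was there before.
        return self.issue(name)

    def is_valid(self, name, key):
        expected = self._current.get(name)
        return expected is not None and secrets.compare_digest(expected, key)


store = KeyStore()
old_key = store.issue("deploy")   # imagine this one got pushed to a public repo
new_key = store.rotate("deploy")  # rotation: the leaked value stops working
print(store.is_valid("deploy", old_key), store.is_valid("deploy", new_key))
```

                After the rotation the old key should be rejected and the new one accepted, even though the old one is still sitting in every clone and scrape.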

        4. Martin M

          Re: Yes and no

          For the people downvoting me - you are aware how quickly AWS credentials accidentally exposed on GitHub are found and abused by attackers? Honeypot tests suggest 1 minute.

          https://www.comparitech.com/blog/information-security/github-honeypot/

          Note that at no point in the "what to do if you've exposed credentials" section does it say "delete the public repo in the hope that this will alter the past". Magical thinking.

          Having played around with this on GitHub, I will say that the message on trying to delete a repo isn't explicit enough about the unexpected (if documented) behaviour. It really ought to have a disclaimer that says "If you're trying to delete commits you wish you hadn't pushed everywhere, this won't achieve it", and a link to a page describing what will actually help.

          1. Michael Wojcik Silver badge

            Re: Yes and no

            Yes, we have plenty of evidence that automated scanners harvest secrets accidentally committed to GitHub within seconds. I agree on that point, and that remediation has to include invalidating those secrets as soon as the mistake is discovered. (And pre-commit secret scanning, better development practices, etc.)

            But I think the interaction between dangling commits and forks in public, centralized git servers is also a problem. In my opinion, that's because public, centralized git servers are an inherently broken design. It's not how git was designed to be used, it discards the major advantage of a fully-distributed change-management system, and it's just stupid. The result of so much development moving to GitHub is a vast number of developers misusing git with no idea how it works. It's an abysmal way to work and typical of what's wrong with software development.

            Regardless of the problem, GitHub is the wrong solution.

      2. tekHedd

        What happens in Repo Stays In Repo

        "It's reasonable to expect that deleting a commit tree that nobody else has yet accessed will prevent it from being accessed in future."

        No it isn't. It's supposed to be a history. In a code repo, the ability to permanently delete past changeset data should be considered a bug or design flaw. The inability to lose history is the whole point.

        Source code has no right to be forgotten, when it's in an SCM, because the point of the SCM is to remember.

        1. robinsonb5

          Re: What happens in Repo Stays In Repo

          > No it isn't. It's supposed to be a history. In a code repo, the ability to permanently delete past changeset data should be considered a bug or design flaw. The inability to lose history is the whole point.

          While that's true, when a user deletes a fork, their expectation is that the fork is a separate repo which can be deleted in its entirety, history and all. The "obvious" mental model of what's happening is "that's 'my' copy, separate from the original." - but that's not how it's actually implemented.

          1. Anonymous Coward

            Re: What happens in Repo Stays In Repo

            If their expectation is that a fork is private, they are wrong. The root cause is probably why they are accidentally leaking secrets in the first place...

      3. Ideasource

        Re: Yes and no

        It is also reasonable to expect that using a third-party platform will eventually leak your important data.

        Reasonably, no one should be surprised, and everyone should have been anticipating such things and working them into their business strategies. Reasonably, that should be the risk borne by those who choose to trust a third party.

        It is unreasonable to expect any real degree of control over anything out of your own personal hands.

        My point is that what is reasonable was sacrificed as a compromise for promoting commercial activity over public networks a long time ago. It's a little late for reasonable now. We sent reasonable to the gallows a long time ago. And it was quite profitable.

        We now have a system of economic AND industry dependencies on a framework of specific exceptions to what is reasonable.

        Trying to bring reasonable back into the mix just might crash the whole system.

        Which I'm fine with.

        But I suspect many others are not and are overwhelmed by the implications to cognitive dissonance.

        1. Sandgrounder

          Re: Yes and no

          It is also equally reasonable to assume that every business that looks after its own systems will at some point have some employees that make incompetent actions resulting in data breaches or data loss. It is also likely that a not insignificant number of companies will suffer the actions of malicious employees.

          There is no perfect solution, no 100% guarantees.

      4. MachDiamond Silver badge

        Re: Yes and no

        "but one expects to be able to delete to limit the damage when the protections fail."

        Why do you expect that? Why would you expect any privacy at all after you have read and understood the EULA?

        The aerospace company I worked for had 3 in-house repos. One in LA, one in NY and one in Atlanta, all at company-owned facilities. Nothing was ever deleted, but could be archived off leaving only the current files generally accessible. There was no point in using an outsourced service partly due to cost and partly due to the ITAR implications should the data go walkabout. If there were ever any issues, we could have a full replacement of the data via overnight shipping in the worst case. It could have been conceivably worth the cost to have a person at one of the other locations book the first available flight and courier out some drives in their carry-on to have it in hours. None of that would happen with a third party service.

      5. Anonymous Coward

        Re: Yes and no

        Literally the definition of source control is being able to retrieve something if it gets deleted. Also GitHub will be storing this stuff on tape, and who knows where else.

        An expectation that you can delete is flawed.

  2. Pascal Monett Silver badge
    Stop

    "this is expected and documented behavior inherent to how fork networks work"

    Wrong.

    Delete means delete, just as no means no.

    It doesn't matter if the data has already been accessed or not. A deletion order should mean that it should no longer be accessible, period.

    1. Yorick Hunt Silver badge
      Thumb Down

      Re: "this is expected and documented behavior inherent to how fork networks work"

      Send an e-mail, then delete it - will that prevent the recipient from seeing it?

      This is known, documented and intended behaviour; complaining about it because you're too lazy/impatient isn't the way to go.

      1. Joe W Silver badge

        Re: "this is expected and documented behavior inherent to how fork networks work"

        I'd guess that this can open a can of wriggling regulatory proverbials. There must be a way to delete stuff, as far as I understand the rules.

        From an architectural point of view this can be difficult. GIT does not like it if you delete a commit that lies embedded in a chain of commits.
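
        The difficulty comes from each commit's ID depending on its parent's ID. Here is a toy sketch of that chaining, assuming a simplified ID made from the parent ID plus a content string (real git commit objects hash more than this, including a tree and author data); `commit_id` and `build_history` are hypothetical helpers.

```python
import hashlib


def commit_id(parent_id, content):
    """Toy commit ID: a hash over the parent's ID plus this commit's content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode()).hexdigest()


def build_history(contents):
    """Return the list of commit IDs for a linear history of contents."""
    ids, parent = [], ""
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


original = build_history(["init", "add secret", "fix typo", "release"])
rewritten = build_history(["init", "fix typo", "release"])

# The first commit is untouched; everything after the removed one gets a new ID.
print(original[0] == rewritten[0])   # prints True
print(original[3] == rewritten[2])   # prints False
```

        So cutting one commit out of the middle means rewriting every descendant, and anyone holding the old IDs still refers to the old chain.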

        1. Martin M

          Re: "this is expected and documented behavior inherent to how fork networks work"

          The real architectural problem lies in deleting the commit from the hard disks of the attackers who immediately scraped it, which renders the other ones a bit moot.

          1. Snake Silver badge

            Re: hard disks

            I'm wondering about the storage space. If dangling commits never really get deleted that means hundreds of thousands, or maybe millions, of dangling commits are still in the system. Exactly how much storage space is this wasting?! Considering that people who 'delete' things actually expect the data to, you know, cease to exist, GitHub must be wasting big money as their repo storage continues to grow with nothing actually getting "deleted" - it's the Almost Eternal Commit History.

            1. Michael Wojcik Silver badge

              Re: hard disks

              Dangling commits are just references to existing data, so all that's being created is metadata. My guess is that only a tiny fraction of the actual data that's committed to GitHub is ever "deleted" anyway, so the overhead of not actually ever deleting any data is proportionally tiny.

              Any decent change-management system, including git, is mostly storing deltas, so in effect it's de-duplicating within a repository. If GitHub has de-duplication technology running across repositories (for all the cases where someone copies a file from repo X to repo Y, or commits a bunch of standard headers or boilerplate code or whatever), then they potentially save on a huge amount of storage there too. And most of what's on GitHub is source code, which compresses well.
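
              One concrete piece of the de-duplication story is that git names file contents by a hash of the content itself, so identical files map to the same object. The sketch below reproduces the blob ID for SHA-1 repositories (a `blob <size>` header, a NUL byte, then the data); `git_blob_id` is a hypothetical helper name, and whether GitHub shares objects across unrelated repositories is not something this shows.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the object ID git assigns to file content in a SHA-1 repo."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


licence_in_repo_x = b"Permission is hereby granted...\n"
licence_in_repo_y = b"Permission is hereby granted...\n"

# Same bytes give the same ID, so a store keyed by ID keeps one copy.
print(git_blob_id(licence_in_repo_x) == git_blob_id(licence_in_repo_y))  # prints True
print(git_blob_id(b""))  # prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

              The second line is the well-known ID of the empty blob, which can be checked against `git hash-object` on an empty file.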

              Sure, GitHub needs a lot of storage by some measures, but compared to other giant applications hosted by Microsoft (e.g. Bing) or other firms, it's really not very big at all.

      2. Mike007 Silver badge

        Re: "this is expected and documented behavior inherent to how fork networks work"

        There is a big difference between "documented" and "known"...

        I would put money on the fact that you haven't read the complete documentation for every product you use. I would also put money on the fact that if you were to read the documentation you would find at least 1 thing that makes you go "wtf?".

        And regarding your example, outlook having that "recall email" button for external emails is atrocious - misleading every non-technical user who sees it. When I was on the helpdesk I came across numerous examples of users who "knew" they had recalled an email straight away so the recipient won't have had a chance to read it...

        1. Michael Wojcik Silver badge

          Re: "this is expected and documented behavior inherent to how fork networks work"

          This is one of the reasons why GitHub is a Bad Thing; the vast majority of its users have no idea how either git or GitHub work, both at the feature level and at the implementation level.

          1. sammystag

            Re: "this is expected and documented behavior inherent to how fork networks work"

            You've asserted twice that the "vast majority" or "vast numbers" of git users have "no idea" how it works. What are you basing that on? I'm a daily user of git, so are all my colleagues, we know how it works or we couldn't use it effectively. Who are the users that don't know how it works? I would expect most git users to be developers whose job it is to understand this stuff.

            1. Julian Bradfield

              Re: "this is expected and documented behavior inherent to how fork networks work"

              There are plenty of people who use git without knowing how it works. We use it to store teaching materials, and although all of us are quite bright and also have all the training required to understand git in intimate detail, most of us can't be bothered. I know "git pull", "git add", and "git commit", which is all I need to get on with my actual job of teaching the students. In previous years, we even used it to collect student exercises, though thankfully that's gone, and I guarantee less than 10% of the students had any idea of what was going on. Git is used by inexpert git-users any time somebody in a position of either power or enthusiasm decides to use git for a project/job.

            2. doublelayer Silver badge

              Re: "this is expected and documented behavior inherent to how fork networks work"

              I don't know how many people this applies to, but there are many who don't know much about Git and use it anyway. When someone first picks up the tool, it looks pretty easy. You add a file, you commit the changes, you push. The code goes up. When someone else has made changes, you pull. Great, I understand Git.

              They think that right up until their first merge conflict. Oh, I can't just push while someone else might be doing the same thing. So they learn branches. I push to my branch, you push to your branch, then we merge them. Great, I understand Git.

              They think that until they need to get code back. How do I find the code after someone's merged over it? None of my commands do that. So they learn some other ones that work with the history, and they learn some blunt tools for returning to the head. Great, I understand Git.

              They think that until a branch merge conflict. Okay, it's time to learn rebase, and rebase isn't a simple command. But they read about it and do some experiments, but now, they know they don't understand Git, at least not fully.

              I can't say I do either. I have a relatively good understanding of some of the internals. I know enough to know that you can't simply delete a commit and expect it to become invisible. I can describe some of the internal structures accurately, and I can sound confident when I do it as long as people only ask about the ones I've actually looked at. But since I have not written code inside Git, nor have I memorized every manual page in it, I do not know everything there is to know about Git. Nor am I the least knowledgeable person on my team. We know enough about Git that we get what we need and don't break things. That doesn't make us experts. And we're professional users. There are lots of beginners who know less because they've used it less.

    2. Ideasource

      Re: "this is expected and documented behavior inherent to how fork networks work"

      Delete has always meant removing obvious reference.

      Reality continues unabated to demand and enforce that you cannot destroy information. You can merely obscure it or transform it, but it will still exist somehow in some form to be potentially uncovered and understood.

      It's a fundamental aspect of reality: information cannot be agreed into nonexistence. It cannot be litigated into nonexistence.

      All these attempts actually accomplish is tricking humans into ignoring reality, with spotty results across individuals.

  3. Ayemooth

    Other cloudy git providers?

    Any word on whether a similar exploit is possible with other similar services (Gitlab, Bitbucket, etc)? Or did they consider this possibility when designing their own fork features?

    It seems reasonable to me that a fork should only include non-dangling commits. I wonder what a git clone does... I'll give it a try when I'm at my desk.

    J

    1. Joe W Silver badge

      Re: Other cloudy git providers?

      "Clone" should get you the whole upstream repo, unless you do something specific to fetch only the "main" branch. And if you then change "remote" to a new remote repo and commit stuff there, upstream cannot see it.

      The problem is that upstream can access all commits of your private fork.

  4. Anonymous Coward

    Meaning Of The Word "Deleted".........

    (1) Saved something somewhere

    (2) "Somewhere" has a backup procedure

    (3) "Somewhere" is public......like FaceBook

    (4) Deleted my stuff in the original "somewhere"

    ......but my stuff got restored..............................

    ......people out there (FaceBook?) had saved my stuff....................................

    ......Wayback Machine (see www.wayback.archive.org)..................................

    Yup.....I don't think that people actually understand the meaning of the word "deleted".........................................

    1. tekHedd

      Re: Meaning Of The Word "History".........

      SCMs save the history of all changes to a code base, for repeatable builds, blame, and just generally because that's its job. You can always go back to any of those points in time, because that's why it's there.

      Seems like an awful lot of people don't know why git exists in the first place. The ability to remove things from history invalidates the premise.

      1. Roland6 Silver badge

        Re: Meaning Of The Word "History".........

        > The ability to remove things from history invalidates the premise.

        I suspect many are uncertain of whether git differentiates between ‘save’ and “commit”.

        In my book, I should be able to save and delete whatever I want without it being recorded in “history”. However, once I’ve committed a piece of work to publication different rules may apply and git history’s WORM characteristics come into play. However, nothing in my private fork should be visible outside of my fork’s usergroup unless I explicitly permit it.

  5. Rich 2 Silver badge

    GitHub

    GitHub is a (largely? Completely?) free service and a public one at that

    If I used it then I would have zero expectation of security. I would also have zero expectation of it operating in a particular fashion

    You get what you pay for. If security and certainty are what you want then bring your version control system in-house

    Complaining about this is like complaining about your completely free email service going down

    1. sedregj Bronze badge
      Big Brother

      Re: GitHub

      "GitHub is a (largely? Completely?) free service and a public one at that"

      No it has a "free" offering and a subscription service with extra facilities. In return for you posting your code on the site, they get to read it and all that entails.

  6. jcc5169

    Be Smart

    If you have intellectual property in the form of source code, don't store it in a public repository. You are inviting theft.

  7. Bebu
    Windows

    I am guessing ...

    if you clone an existing repository to a local machine, create another repo in github, change the remotes in the local config file to the new repo and then push your local copy this problem isn't going to arise?

    I have done this a couple of times when I couldn't be arsed to find out how to fork a public repo from the web interface.
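
    The clone, re-point and push steps described above can be sketched as a command plan. The URLs and directory are placeholders, `detached_copy_commands` is a hypothetical helper, and `git remote set-url` stands in for editing the local config file by hand. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def detached_copy_commands(source_url, new_url, workdir):
    """Return the git commands that copy a repo without using the fork button."""
    return [
        ["git", "clone", source_url, workdir],
        ["git", "-C", workdir, "remote", "set-url", "origin", new_url],
        ["git", "-C", workdir, "push", "--all", "origin"],
    ]


# Placeholder locations, assumed for illustration only.
commands = detached_copy_commands(
    "https://example.com/someone/project.git",
    "https://example.com/me/project-copy.git",
    "project-copy",
)
for argv in commands:
    print(shlex.join(argv))
```

    Note that `--all` pushes local branches only, and a plain clone creates just the default branch locally, so other upstream branches would need checking out first.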

    Although if github were to deduplicate commits based on their hashes that might not be the case. I don't know how git actually works but if it all works by references (to prior and successor commits, and to the actual commit itself) then it gets interesting.

    References might be said to be the root of all programming evil? I was always a fan of value-result. :)

    I imagine if each commit has an access control list (ACL) attached and when a repo is deleted the particular access control entries (ACE) it received (or inherited) from the now deleted repo were also removed this problem could be mitigated.
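
    A toy version of that ACL idea might look like the following. It is purely a sketch of the proposal in this comment, not how GitHub or git works; `AclStore` and its methods are hypothetical.

```python
class AclStore:
    """Toy object pool where each commit carries a set of repos granting access."""

    def __init__(self):
        self.commits = {}  # commit hash -> content
        self.acl = {}      # commit hash -> set of repo names (the ACEs)

    def add(self, repo, digest, content):
        self.commits[digest] = content
        self.acl.setdefault(digest, set()).add(repo)

    def share(self, digest, repo):
        # A fork inherits an entry for each commit it receives from its parent.
        self.acl[digest].add(repo)

    def delete_repo(self, repo):
        # Drop the deleted repo's entries; purge commits nobody can reach.
        for digest in list(self.acl):
            self.acl[digest].discard(repo)
            if not self.acl[digest]:
                del self.acl[digest]
                del self.commits[digest]

    def read(self, repo, digest):
        if repo in self.acl.get(digest, set()):
            return self.commits[digest]
        return None


store = AclStore()
store.add("upstream", "c1", "README")
store.share("c1", "fork")
store.add("fork", "c2", "API_KEY=hunter2")
store.delete_repo("fork")
print(store.read("upstream", "c1"), store.read("upstream", "c2"))  # prints README None
```

    Here the commit that only ever lived in the deleted fork becomes unreachable, while the shared one survives through the upstream entry.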

    As I only store a piddling amount of CC-0 code that I have knocked together for past BOFH activities and a few other programming trivialities I am definitely not going to lose any sleep.

    If I had serious stuff I would host locally and archive my repos, compress, encrypt and then perhaps store the result on github as a backup but even then I probably wouldn't risk it.

  8. herman Silver badge

    Schrodinger

    That is what happens when you have a cat in a box logo.

  9. DaemonProcess

    Secure boot bypass

    So the secure boot bypass vuln just released had test private keys which are still trusted held in github for a while...

  10. Henry Wertz 1 Gold badge

    scrub?

    Maybe they should have a separate 'scrub' action or the like... if git makes it that hard to truly delete something it could at least let someone nuke out the actual API keys or whatever bad thing they sent to github.

    Personally I don't even have anything hosted on github, but I thought it was well known that git doesn't erase anything, that delete just makes it no longer appear but the data is still there. I mean that's the point of a revision control system.

  11. Anonymous Coward

    > Data from deleted GitHub repos may not actually be deleted

    Why did people think it would be?

    I think some people need to take some time to reflect more upon tech and the companies surrounding it...
