GitLab scans its customers' source code, finds it's as fragile as you'd expect

GitLab, a rival to Microsoft's hosted git service GitHub, has for the second time tested the security of customers' hosted software projects... and found them wanting. The code storage and automation biz initially scanned hosted code for security issues in April. Having just reprised its examination, the outfit has found …

  1. druck Silver badge
    Holmes

    Don't build on sand

    It's either because of my background as an embedded C developer or because I've got old*, but the thought of using layers of libraries which you aren't in control of, fills me with horror. It took until C++11 before I was happy using the STL, and I still think Boost is often a step too far, no matter how useful. But then again, when I'm doing Python I'm happy to use anything that's in pip, as long as it has good documentation.

    However, if I ever start using JavaScript, dragging in god knows what libraries from npm while cutting and pasting code I don't understand from Stack Overflow, I hope someone will put me out of my misery.

    * I think you've already decided which.

    1. Anonymous Coward
      Anonymous Coward

      Re: Don't build on sand

      *I think you've already decided which.

      Perhaps it's that oh-so-rare quality: competence?

      This makes me think of a thing I read a while ago about John Carmack's way of working which really resonated with me:

      Carmack's code at the time was kind of amazing. In the most complimentary way possible, I call Carmack a "coding insect." Like how a bee knows how to build a hive, Carmack codes with a complete picture in his head of what parts he needs to make a whole. Back then with every generation game engine he'd start over from scratch—I mean really from scratch, not namby-pamby "I rewrote some of the code and called it scratch." Since his engines ran on a variety of machines and OSes, he wrote every damn function himself. Carmack needs to log something? Carmack writes a logging function. New generation of engine? New logging function. EVERYTHING from SCRATCH.

      Because I was young, super-anal, and wasn't on SSRIs back then, I once asked Carmack why he didn't use libraries for common functions that he could share between engine revisions. Carmack's a super-nice guy, but on this one instance he used the "Well, I think my methods work pretty well..." defense. I never suggested coding style changes again.

      But, really, for him it made no sense to share code, because, like a bee, it was just as fast to write new code. The template was in his head, he types really REALLY fast—why bother importing something?

      I have been accused of NIH in the past, but in the same way as Carmack, it's not (exactly) NIH, it's that I can write something and have complete understanding and control in a similar timeframe to importing something and learning the API, and where I'm at the whims of someone else to have things like bugs addressed or features added.

      ...or, at least, that's what I tell myself. It's also possible that I'm just old too... ;)

      1. Randy Hudson

        Re: Don't build on sand

        In the mid 90s I had a buddy like that in college. In his AI class, they had to write a program that navigated a 2D maze. When he presented his program, it was a first person view of some creature running through a 3D world similar to DOOM. When the teacher asked what software packages he had used to achieve visuals above and beyond what even the professor considered within the reach of undergrads, his reply was, "a C compiler".

    2. Kevin McMurtrie Silver badge

      Re: Don't build on sand

      It shocks me too. It's top secret or it's full of banking data so the back-end is locked down tight with reviews, vulnerability scanners, and digital signatures following every step of deployment. Then everyone's browser grabs whatever JavaScript from 10000 people who are surely all honest and blessed with top-notch security skills. I've worked for at least three companies that had browser pages hacked via 3rd party JavaScript. It wasn't direct, but a long chain of delegation that ended up in a bad place. Only one of those companies moved to locally hosted JavaScript.

    3. msobkow Bronze badge

      Re: Don't build on sand

      I largely agree. Unless libraries are provided by *very* well known sources like Apache or the operating system, or have a proper *vendor* maintaining and enhancing them, I *only* rely on the libraries provided by the *language* for the sake of portability.

      With Java 1.0, that wasn't really possible, nor with the C++ of the era. But I've found the pre-release GNU versions of the upcoming C++20 standard to be very full featured, and have only *rarely* had to resort to direct Linux API calls, and Java has been there since around JDK 11. JDK 9 got close, but JDK 11 achieved it. :)

      Python "portability"? Please. That thing is such a nightmare of bodge I wouldn't deploy a lemonade stand advertising app with it. :(

      1. fnusnu

        Re: Don't build on sand

        Like struts?

        1. Charlie Clark Silver badge

          Re: Don't build on sand

          Or POI. Provenance, particularly from Apache, is no guarantee of quality.

    4. Filippo Silver badge

      Re: Don't build on sand

      I work the same way, when I can. In my work outside of JS, mostly C# and C++, it's reasonably easy: if you really need something, chances are the developer of the library isn't using anything except for system libraries and their own code. You need to trust him, but that's it. Currently, in my main product (hundreds of thousands of lines), I use one datagrid component, and some implementations of well-defined standards (e.g. a modbus library); the rest is system SDK only. If one of those components publishes a fix, I upgrade and rebuild and that's it.

      But in JS? The idea of minimizing dependencies just isn't there; it's not in the culture at all. On the contrary, people will use a library just to avoid writing a one-liner. As soon as you import anything at all, which you'll have to because there are no "system libraries" worthy of the name, you find you have literally dozens of libraries in your dependency chain, by as many different developers, and a jungle of interlocked version requirements that prevents you from easily upgrading any of them, should a critical fix be published. At that point, you can't trust; you can only hope.

      1. Randy Hudson

        Re: Don't build on sand

        "people will use a library just to avoid writing a one-liner"

        Perhaps some thought this was an exaggeration. Check out the "has" package. It gets 14m downloads a week.

        https://github.com/tarruda/has/blob/master/src/index.js
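To illustrate how little code a check like this needs, here is a minimal sketch of an own-property test. It assumes, going by the package's name, that this is the job the package does. The name `hasOwn` and the sample objects are hypothetical, and this is not the package's actual source.

```typescript
// Hypothetical helper, written for illustration only.
// Reports whether `key` is defined directly on `obj`, ignoring the prototype chain.
function hasOwn(obj: object, key: PropertyKey): boolean {
  // Call through Object.prototype so the check still works on objects that
  // shadow hasOwnProperty or were created without a prototype.
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Sample data (assumed, not from the thread).
const config = { port: 8080 };
const bare: object = Object.create(null); // no prototype at all

const ownPort = hasOwn(config, "port");          // defined on the object itself
const ownToString = hasOwn(config, "toString");  // only inherited
const barePort = hasOwn(bare, "port");           // safe even without a prototype
```

Runtimes that implement ES2022 also ship `Object.hasOwn` as a built-in, so on those even this helper may be unnecessary.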

    5. bombastic bob Silver badge
      Coffee/keyboard

      Re: Don't build on sand

      the thought of using layers of libraries which you aren't in control of, fills me with horror

      it fills me with nausea. yeah, same idea.

      "What's up? My LUNCH, that's what!"

      icon, because, that.

      On a related note, looks like they were only scanning container thingies. So I guess all of us C and C++ devs are left out of their security scans...

    6. DrXym Silver badge

      Re: Don't build on sand

      C++ is no more exempt from this than any other language. If I want to produce an HTTP server in C++ then my choices are:

      1. Write one myself and good luck

      2. Pull in an existing implementation, e.g. Beast, Mongoose, libwebsockets, whatever

      In either case I expose myself to potential bugs and exploits of those packages.

    7. Brewster's Angle Grinder Silver badge

      Re: Don't build on sand

      I propose we give this a name - metal programming. As in, you like to program as close to the "metal" as possible. Where the metal is considered to be the "standard environment" of the language.

    8. Charlie Clark Silver badge

      Re: Don't build on sand

      The problem with NIH (not invented here) is that there is no reason to think that your own code will be any better or safer. While it's naive to assume that that package you depend upon is bug free and secure, it might well be better tested (both functionally and securitywise) than your own.

      Python seems to do well, possibly because many of the potential problem areas are handled by extremely robust parts of the standard library. It might also help that so many pentest tools are written using those selfsame parts of the standard library. Unit testing, which as any fule no, doesn't prevent any bugs or exploits but can be mighty useful when fixing them, is also pretty pervasive, and this, together with a long-standing open source culture and liberal licensing, also means that many key libraries are routinely scanned. This also helps in situations where code audits are required.

  2. RM Myers Silver badge
    FAIL

    Fragile source code

    But is it "Agile"? Remember folks, the goals are to move fast and break things. The consistent accomplishment of one of these goals has been the crowning achievement of 21st century IT.

    1. Steve Todd

      Re: Fragile source code

      If you think Agile means that, you’re doing it wrong. It’s perfectly possible to be Agile and still produce code tested to the n-th degree. What it defines is an iterative approach where small batches of work are completed, tested and demonstrated to the users. Their feedback then goes into future cycles. Individual sprints may not be perfect, but they shouldn’t be released to the users.

      In my experience very few companies use the methodology correctly, but stick the name on some bastardised version that looks closer to Waterfall.

      1. bombastic bob Silver badge
        Devil

        Re: Fragile source code

        Agile sounds like it has too much bureaucracy in it, and an oxymoronic name.

        What you really need is a bunch of mad scientist types (like me), a decent engineer as a manager, a clear goal, and a sufficient slot of time. Generally can get it all done under budget that way with as much as 10:1 productivity (or even better, depending), so long as you have a competent engineer dividing up the tasks in a sane manner [and as few meetings as possible].

        You know, OLD school! And NO rapid/radical direction changes. Those go into "rev 2".

        1. Mike 137 Silver badge

          Re: Fragile source code

          "OLD school! And NO rapid/radical direction changes"

          You also need a clear vision of what the code should not do as well as what it should. Testing for boundary conditions is fundamental to creating robust code, and you need to define those boundary conditions up front so you know what to expect and how to address it.

          I suspect that the majority of testing now done is merely driven by valid inputs - "it works! Amazing! Get it out the door!" so these crashingly elementary bugs:

          CWE-20: Improper input validation, which enables injection attacks.

          CWE-787: Out of bounds write of intended buffer, which enables remote code execution.

          CWE-400: Uncontrolled resource consumption, which enables denial of service attacks.

          never get found, which is why they're still the top three after all this time.
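A minimal sketch of what defining the boundaries up front might look like for a single numeric text field. The limits, the function name `parseQuantity` and the test inputs are all assumptions chosen for illustration. The point is that the list of cases includes inputs that must be refused, not just ones that work.

```typescript
// Hypothetical limits, chosen only for illustration.
const MAX_INPUT_LENGTH = 6;
const MIN_QUANTITY = 1;
const MAX_QUANTITY = 1000;

// Returns the parsed quantity, or null when the input falls outside the
// boundaries defined above. Anything not explicitly allowed is rejected.
function parseQuantity(input: string): number | null {
  // Cap the length first so oversized input is refused cheaply.
  if (input.length === 0 || input.length > MAX_INPUT_LENGTH) return null;
  // Accept only plain decimal digits: no signs, spaces, exponents or markup.
  if (!/^[0-9]+$/.test(input)) return null;
  const value = Number(input);
  // Enforce the numeric range, bounds inclusive.
  if (value < MIN_QUANTITY || value > MAX_QUANTITY) return null;
  return value;
}

// Boundary cases written down before the code ships, invalid ones included.
const cases: Array<[string, number | null]> = [
  ["1", 1],           // lower bound
  ["1000", 1000],     // upper bound
  ["0", null],        // just below the range
  ["1001", null],     // just above the range
  ["", null],         // empty input
  ["-5", null],       // sign not allowed
  ["1e3", null],      // exponent notation not allowed
  ["0000001", null],  // longer than MAX_INPUT_LENGTH
  ["<script>", null], // markup
];

// Any entry left in this array is a boundary the code handles wrongly.
const failures = cases.filter(([input, expected]) => parseQuantity(input) !== expected);
```

Rejecting by default, with an explicit pattern and range, means an unanticipated input fails closed instead of being passed along to whatever consumes the value.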

        2. sabroni Silver badge
          Thumb Up

          Re: And NO rapid/radical direction changes.

          Smashing! Good to see competitors spend months delivering something that we've already tried and found the customers don't like.

        3. This post has been deleted by its author

      2. a_yank_lurker Silver badge

        Re: Fragile source code

        Agreed that what is called 'Agile' is not really agile. The original idea was to break the project down into well defined pieces that could be done relatively quickly and presented to the end user for review and comment. The idea is to get feedback from the user as the project matures to keep it on the straight-and-narrow. The concept is sound: break the project down into more manageable pieces and get frequent and regular review from the user throughout. It was never about getting code out the door faster, though that might be a byproduct of the cycle as issues are more likely to be identified earlier in the cycle and fixed quickly, but about getting well-designed and well-written code out the door that does what is expected and needed.

  3. petef

    Public?

    I do hope that they were only scanning public repos and not private.

    1. Anonymous Coward
      WTF?

      Re: Public?

      Why would you hope that? It's not like gitlab is releasing links to private repos with vulnerabilities.

      You shouldn't be using the fact something is in a private repo as a security feature. If you're storing passwords in a private repo, for example, you're still doing it wrong, because a) any gitlab employee who wants them now has all your passwords, and b) what if gitlab is hacked or has some disgruntled employee nerf their system to make all private repos visible?

      As for vulnerabilities in your code, why would you not want to know about it? The only time it wouldn't make any difference is if it's code that isn't public-facing, and even then I think it's a good thing to at least be aware of potential issues, even if they have no impact and you don't plan to do anything about them.

      1. InsaneGeek

        Re: Public?

        Gitlab says they use encryption at rest. If they are able to do private scans of your code that would mean that they have a way to access your unencrypted source code... all it would then take is finding an employee that would take some cash or getting hired under false pretenses to perform some rather insidious industrial espionage

        1. Anonymous Coward
          Anonymous Coward

          Re: Public?

          All that means is that their disks are encrypted and that some random guy in the datacenter can't abscond with a disk and sell your secrets, it doesn't mean that gitlab don't have the keys, or that your secrets are safe from gitlab employees. In fact, since they're serving up your unencrypted code via http and ssh interfaces, they must have access to your code. This all seems pretty obvious and straightforward to me. But maybe I'm missing something?

          Even if they did claim to not have access to your private repos, if you are relying on a third party to encrypt things like business-critical private keys for you, you're still doing it wrong. If you must store that sensitive stuff on a server you don't control, then it should be encrypted at your end before it ever leaves your premises. And in that case, they won't have any ability to scan it.

          Of course, that begs the question as to where you'd store the key for the encrypted blob you uploaded to gitlab...

          ...pastebin, maybe?

          1. whaber

            Re: Public?

            A great place to store secrets is Hashicorp Vault which is integrated into GitLab: https://docs.gitlab.com/ee/ci/examples/authenticating-with-hashicorp-vault/

            There are other good places too of course.

        2. whaber

          Re: Public?

          Security scans in GitLab are configured by the project maintainers on a per-project basis. https://docs.gitlab.com/ee/user/application_security/

          The security scanning features are (for the most part) free for use by open-source projects (and a paid feature for private and customer self-hosted projects).

    2. whaber

      Re: Public?

      "Data sources

      The trends report's underlying data is sourced from projects hosted on GitLab.com and does not include data from our self-managed customers. It is comprised of medium or higher severity vulnerabilities appearing in five or more projects that occurred between September 2019 and October 2020. All project-specific data was anonymized."

  4. HAL-9000

    Shocker

    There's a lot of java/node gobbledygook there. Just saying ;)

  5. Ashto5

    Cook your own

    I had to show an Angular dev how I would have coded a small piece of a webpage in JS,

    “why would you do that you can use a library” was the reply.

    I asked him “what does the library do?”

    “It validates the input in the text box with simple markup”

    I then asked

    “what else does it do?”

    his reply I think shocked himself

    “I don’t know”

    And the penny dropped ....

    1. sabroni Silver badge
      Thumb Up

      Re: Cook your own

      Cool story bro!

    2. DrXym Silver badge

      Re: Cook your own

      The VanillaJS lib is frequently sufficient for a lot of web pages. However I think that as the page becomes less of a page and more of an application then the likes of Angular / React become a necessity.

  6. Duncan Macdonald Silver badge
    FAIL

    The worst bit about the Javascript mess

    Is that a lot of the rubbish executes in the users' browsers. Faulty code executing in a server only harms the business that owns the server - faulty code executing in a user's browser can do (and has done) damage to innocent people.

    In my opinion Javascript execution in browsers was a huge mistake - worse than including Adobe Flash.

    Of necessity I use NoScript wherever possible to limit exposure - unfortunately too many critical sites require Javascript to operate. For those sites all I can do is hope that there is no critical bug that is going to harm my system.

    My opinion of Javascript =========================>

    1. bombastic bob Silver badge
      Devil

      Re: The worst bit about the Javascript mess

      I would guess that MOST of what is done using Javascript is COMPLETELY unnecessary. I generally avoid it, unless I'm coding a UI for an embedded system that uses a web-based UI. In such a case, to get the kinds of performance you might need, resorting to javascript becomes the easier solution (like maybe coding a popup detail editing thing that acts like a dialog box, by unhiding nested 'div' sections to display it, and re-hiding when you press 'ok' or 'cancel' to make it go away). Otherwise, it's pure HTML and CSS, with all of the work done server side whenever possible. (yeah I do UIs too, when I have to. I prefer device control, but one-man-banding it means doing the UI so...)

      I really hate working with scripty HTML devs, though, primarily because I will probably end up cleaning their mess (using LOTS of profanity more often than not while doing so), and THEM locking things into a particular monolithic library (or style sheet from hell) just makes it worse. Such "developers" need to be "educated" properly, often with a Cat-5-o-nine-tails, clue-bat, or rubber chicken. Just kidding. (no I'm not)

  7. Anonymous Coward
    Anonymous Coward

    So did they supply that report to the developers to resolve those security defects?

    1. whaber

      TLDR: Yes

      The maintainers of the projects that were scanned set up the scanning themselves. The vulnerabilities can be viewed by the developers in the security dashboards: https://docs.gitlab.com/ee/user/application_security/security_dashboard/

  8. Anonymous Coward
    Anonymous Coward

    Code written by millennials is generally not very good. Perhaps the beard oil fumes impair thinking.

    Just a theory.

    1. Anonymous Coward
      Anonymous Coward

      Ooooh, ground breaking!

      People new to job not as good as seasoned veterans. Pictures at 10!

      1. Anonymous Coward
        Anonymous Coward

        Re: Ooooh, ground breaking!

        The differences were that we wanted to learn rather than appear cool, we listened to our mentors rather than assume that we were 'special' and therefore knew it all, and there was no GitHub for us to spew our malformed creations into so that others would use them without any care or attention whatsoever.

        GitHub should force committers to add their verified age to their project home pages, so we could avoid anything written by wispy-bearded man-children.

  9. sabroni Silver badge
    Boffin

    But still not as bad as the way most websites run

    I can put together a node app using npm and I have to depend on other people's code. Like I do when I work in any language. I have a tool that pulls the dependencies in and reports known vulnerabilities to me. It's my problem, I control the payload that's delivered to the customer and can audit as I see fit.

    When I visit theregister.com, for example, it tries to pull in scripts from theregister.com, doubleclick.net, google-analytics.com and jwplayer.com. (I have a sneaking suspicion that if I allowed those I'd get some more domains listed but the site works fine without them so they don't get run on my machine.)

    How does a site developer take responsibility for the scripts delivered by other domains? I can audit the code I pulled from npm but I have no control over what a third party domain serves. How could I?

    So the much more fundamental issue with modern JS development isn't that we build using code from lots of people we can't trust, it's that we build services that pull code from domains we don't control. In that situation we can never audit the code and be confident it is secure.

  10. zapgadget
    Pint

    Or, just an idea...

    The "security professionals" could do the world a favour and fix the bugs instead of just exposing them?

Biting the hand that feeds IT © 1998–2021