How does Monzo keep 1,600 microservices spinning? Go, clean code, and a strong team

Software engineers from digital bank Monzo told developers at the QCon event in London how and why the bank runs its banking systems on 1,600 microservices. Monzo's session at QCon was in stark contrast to Monday's presentation, in which Sam Newman warned that a microservices architecture is a "last resort". Senior engineers Matt …

  1. Warm Braw

    You don't need to know how 1,600 services work

    I would be astonished if, starting with a clean slate, Monzo hadn't come up with an architecture that looks coherent, at least on paper.

    The interesting thing will be what it looks like in 10 years' time: it's the accumulation of cruft that makes traditional banking systems difficult to maintain. I hope by then there's still some clear specification of what each of those 1,600 services (or whatever number is then required) does, and that the dependency graph doesn't look like my spare cable drawer. If so, well done, but we'll only know in hindsight.

    1. sabroni Silver badge

      Re: You don't need to know how 1,600 services work

      The alternative is 10 years of cruft in a single monolithic service that encapsulates all 1,600 micro-services in one: all the same code, but without being forced to create discrete interfaces or separate concerns properly. The monolith would be in a worse state after 10 years.

      1. Tom 7

        Re: You don't need to know how 1,600 services work

        But given that the 1,600 micro-services will largely be interdependent, the APIs for them will very closely resemble the functions in a monolith, so there is no reason for the monolith to be any worse than the interdependent services. The problems arrive with complexity and the ability of those in charge of the thing to understand what's going on. The architecture is largely irrelevant if the people in charge haven't got a clue or haven't got the actual power to do the right thing.

        And surely each micro-service has a discrete interface, or it's not going to be used except in a monolithic way?

        1. Charlie Clark Silver badge

          Re: You don't need to know how 1,600 services work

          Microservices are a great way to throw away really useful stuff like IPC and burn resources. And, at the end of the day, you can easily end up with the same kind of mess (each service will be great on its own) while requiring more resources.

          1. Anonymous Coward
            Anonymous Coward

            Re: You don't need to know how 1,600 services work

            Indeed, who needs shared memory and fast threaded processes when you can package everything up in JSON or XML, send it down HTTP(S), then unpackage it at the other end? Those old idiots in charge of legacy monolithic software didn't have a clue what they were doing.
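
            For illustration, a minimal Go sketch of the round trip being lampooned; the Payment type and the endpoint URL are hypothetical:

            ```go
            package main

            import (
                "bytes"
                "encoding/json"
                "net/http"
            )

            // Payment is a hypothetical message type. In-process it would be
            // passed as a pointer; between services it has to be marshalled,
            // sent over HTTP(S), and unmarshalled on the other side.
            type Payment struct {
                AccountID string `json:"account_id"`
                Pence     int64  `json:"pence"`
            }

            func send(p Payment) error {
                body, err := json.Marshal(p) // struct -> JSON text
                if err != nil {
                    return err
                }
                // A network hop where a function call used to be.
                resp, err := http.Post("http://payments.internal/pay",
                    "application/json", bytes.NewReader(body))
                if err != nil {
                    return err
                }
                return resp.Body.Close()
            }

            func main() { _ = send(Payment{AccountID: "acc_123", Pence: 999}) }
            ```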

            1. Morten Bjoernsvik

              Re: You don't need to know how 1,600 services work

              REST microservices scale linearly, both horizontally and vertically; even though they are far slower than IPC and direct sockets, you can scale them across tens of pods. Build in service and probe endpoints and you practically have a self-healing infrastructure, as long as you have a backend database that scales.
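
              A minimal sketch of such probe endpoints in Go; the /healthz and /readyz paths follow the common Kubernetes liveness/readiness convention, and the readiness check is a hypothetical stub:

              ```go
              package main

              import "net/http"

              func main() {
                  // Liveness probe: the process is up and able to serve requests.
                  http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
                      w.WriteHeader(http.StatusOK)
                  })

                  // Readiness probe: dependencies (e.g. the database) are reachable,
                  // so the orchestrator may route traffic here.
                  http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
                      if !databaseReachable() { // hypothetical dependency check
                          w.WriteHeader(http.StatusServiceUnavailable)
                          return
                      }
                      w.WriteHeader(http.StatusOK)
                  })

                  http.ListenAndServe(":8080", nil)
              }

              // databaseReachable is a stub standing in for a real connectivity check.
              func databaseReachable() bool { return true }
              ```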

        2. richardcox13

          Re: You don't need to know how 1,600 services work

          Exactly.

          While each micro-service may be simple, all that has happened is that the complexity has been pushed up into the interactions between those micro-services (into some form of orchestration layer).

          Maybe that allows Monzo, at least for now, to manage that complexity more easily. Or maybe greenfield development has simply avoided the legacy complexity, because Monzo don't have that much IT history.

          Only time will tell.

      2. DJV Silver badge

        @sabroni

        "The alternative is 10 years of cruft in a single monolithic service that encapsulates all 1,600 mico-services into one. All the same code but without being forced to create discreet interfaces or separate concerns properly. The monolith would be in a worse state after 10 years."

        Hmm, that could be systemd's future then (shudders).

    2. Charlie Clark Silver badge
      Pint

      Re: You don't need to know how 1,600 services work

      I know it's early but you're spot on.

    3. I Am Spartacus
      Pint

      Re: You don't need to know how 1,600 services work

      With respect to the cruft that develops over time, Monzo say that they include metrics with each of these microservices. This means that they can detect what is cruft by monitoring what doesn't get used. They can then decommission those services that fall out of use as business models change.

      Furthermore, if they record which services get used a lot, and which are slow to deliver, then they know where to spend engineering effort to optimize.
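
      Monzo haven't published the mechanism, but the idea is cheap to sketch with the Prometheus Go client; the metric and handler names below are hypothetical:

      ```go
      package main

      import (
          "net/http"

          "github.com/prometheus/client_golang/prometheus"
          "github.com/prometheus/client_golang/prometheus/promauto"
          "github.com/prometheus/client_golang/prometheus/promhttp"
      )

      // requests counts calls per handler; a counter that stays flat for
      // months marks its handler (or service) as a decommissioning candidate.
      var requests = promauto.NewCounterVec(prometheus.CounterOpts{
          Name: "service_requests_total", // hypothetical metric name
          Help: "Requests served, labelled by handler.",
      }, []string{"handler"})

      func main() {
          http.HandleFunc("/quote", func(w http.ResponseWriter, r *http.Request) {
              requests.WithLabelValues("quote").Inc()
              w.Write([]byte("ok"))
          })
          // Prometheus scrapes this endpoint; dashboards and alerts do the rest.
          http.Handle("/metrics", promhttp.Handler())
          http.ListenAndServe(":8080", nil)
      }
      ```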

      I actually think this is very smart indeed.

      Hat tip and a virtual pint to Monzo.

      1. Sammy Smalls

        Re: You don't need to know how 1,600 services work

        Nothing that couldn't have been done in a monolith, of course. Perhaps the real change is the understanding of the detailed metrics and reporting required to maintain something like a banking system over a long period of time.

        1. hittitezombie

          Re: You don't need to know how 1,600 services work

          It's much harder to debug why a single thread is running slowly in a monolithic app, especially when you have thousands of other threads interfering.

          I don't like the amount of resources wasted, but it's a positive move when it comes to debugging and analysis.

          1. Bronek Kozicki

            Re: You don't need to know how 1,600 services work

            As for the resources wasted, it really depends what they use for serdes and how much of it they do. The slide mentions RPC, so hopefully they are not wasting CPU time on number-to-text-to-number conversions; in the architectures I know, that is probably the biggest resource sink when operating on floating-point numbers. As for asynchronous communication over the network, if the volume is not excessively large you mostly pay in latency rather than in other resources.
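
            To make the number-to-text-to-number point concrete, an illustrative Go comparison: text serdes of a float costs a format plus a parse, while a binary wire format is a fixed-width copy:

            ```go
            package main

            import (
                "encoding/binary"
                "fmt"
                "math"
                "strconv"
            )

            func main() {
                x := 1234.5678

                // Text round trip (what JSON does): format, then parse.
                s := strconv.FormatFloat(x, 'g', -1, 64)
                y, _ := strconv.ParseFloat(s, 64)

                // Binary round trip (what a binary RPC encoding does):
                // eight bytes copied each way, no parsing at all.
                var buf [8]byte
                binary.LittleEndian.PutUint64(buf[:], math.Float64bits(x))
                z := math.Float64frombits(binary.LittleEndian.Uint64(buf[:]))

                fmt.Println(y == x, z == x) // true true
            }
            ```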

            The nice side of a well-designed microservice architecture is that you can tell which services are little used and, because each is relatively simple, you can retire them and/or move their functions elsewhere without much wider impact.

        2. Loyal Commenter Silver badge

          Re: You don't need to know how 1,600 services work

          I think the most valuable thing here is that the microservice architecture enforces encapsulation. Monolithic software doesn't, unless you do so by design. As someone who has to maintain software that began its life in the '80s, in a language that doesn't even really have a concept of encapsulation, I can't overstate how important encapsulation is for maintainability.
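
          As a small illustration of encapsulation "by design" in a monolith: in Go the unit of encapsulation is the package, and unexported fields can only be touched through the exported API. The balance package here is hypothetical:

          ```go
          // Package balance is a hypothetical example: the pence field is
          // unexported, so code outside the package can only change it through
          // Credit, which enforces the invariant.
          package balance

          import "errors"

          type Account struct {
              pence int64 // invisible outside this package
          }

          func (a *Account) Credit(p int64) error {
              if p < 0 {
                  return errors.New("credit must be non-negative")
              }
              a.pence += p
              return nil
          }

          func (a *Account) Pence() int64 { return a.pence }
          ```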

          It also sounds like it makes scalability and resilience easier. Good to see also that Monzo recognise the adage that premature optimisation is the root of all evil.

    4. Ken 16 Silver badge

      Re: You don't need to know how 1,600 services work

      10 years? Give it at least 40 before you compare it with other banking systems. You need a few changes of technology management, generations of tech and trends in architecture and deployment for real entertainment value.

  2. John H Woods Silver badge

    Optimise for readability

    "We optimise code for readability. One of our engineering principles is not to optimise [performance] unless it is a bottleneck."

    As a one-time user and creator of intense analytics, and a sometime performance engineer, I cannot commend this approach enough.

    1. Giovani Tapini

      Re: Optimise for readability

      And as an apparently old-school developer, I also agree. Writing for maintenance and future change usually beats writing for absolute performance.

      I have experimented with the reverse on one of my teams, and they had literally no idea how my code worked to produce the same results.

      Absolute performance is only necessary in the systems space; in most cases it doesn't matter at the application level.

      1. John Sager

        Re: Optimise for readability

        Or heavy-duty maths libraries; think NumPy.

    2. Gordon 10

      Re: Optimise for readability

      I agree. Or, as I would put it, essentially optimising against the prima-donna effect. We've all met that guy (and it's usually a guy) who thinks the most obscure code with a fractional performance advantage is the dog's nuts.

      See also optimising for the future engineer who doesn't have the N weeks/months of sprint/planning context needed to support/extend the current solution.

      1. sabroni Silver badge
        Thumb Up

        Re: Optimise for readability

        Loving the single downvote from an aggrieved prima-donna.

    3. T. F. M. Reader

      Re: Optimise for readability

      "...our engineering principles" ?

      Hmm...

      “<...> [P]rogrammers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”

      - Donald Knuth, "Structured Programming with go to Statements" (1974)

      1. Loyal Commenter Silver badge

        Re: Optimise for readability

        Quite frankly, if they're basing their principles on something they read in Knuth, they are probably doing it right. The fact that those principles are half a century old and counting only highlights the foolhardiness of those who choose to ignore them, the ignorance of those who never learned them, and the arrogance of those who never taught them.

  3. Anonymous Coward
    Anonymous Coward

    The cathedral and the bazaar

    Anyone else remember that?

  4. jonha

    > "Note, though, that Monzo uses a lot of custom, in-house tools and libraries that are not easy to replicate."

    That's the key sentence, IMHO. Yeah, this means more work and slower deployment in the beginning... but it's an approach that, in the long run, means you know exactly what's actually running on your servers and you understand how it's operating.

  5. Christian Berger

    Banking isn't really a highly computational process

    For example, on average there are only around 1,300 credit card transactions per second in the US. While this may sound like a lot, it's probably less computation than playing an MP3 file takes.

    Of course there is _way_ more database activity, but we live in an age where storing your database in RAM or on fast flash memory is feasible.

    To put this into context, every fixed-line call in Germany has to go through a complete lookup of the number portability database: a database listing every number that has ever been ported, millions of entries. The lookup is done by a simple, barely optimized program which rarely takes more than a millisecond to find an entry, even on a very modest computer.

    1. <script>alert('the register');</script>

      Re: Banking isn't really a highly computational process

      This comment shows ignorance. There is so much more that goes into processing a card payment, and it has to be done in fractions of a second.

      1. Anonymous Coward
        Anonymous Coward

        Re: Banking isn't really a highly computational process

        This isn't quite true: a lot can go into processing a card payment, but a card issuer on the Mastercard network often has multiple seconds to make a decision and respond to a card payment request.

    2. Zilla

      Re: Banking isn't really a highly computational process

      You're many orders of magnitude out in your assessment of the computing power required.

      Payment systems are incredibly complex and there are many layers: the merchant, the terminal, the merchant acquirer, the scheme provider, and so on.

      Those 1,300 card transactions per second probably equate to over 50,000 transactions at the data/message/API layers, and that's just at the scheme end; and because it's payments, everything is logged up the ying-yang.

      Source: I build platforms for payment providers.

      1. NeilPost Silver badge

        Re: Banking isn't really a highly computational process

        Though Monzo will be at the very end of that chain, as it is not a terminal provider, merchant acquirer, etc.

        From the other end, as a POS solution provider, I do agree that real-time payment auth is essential.

        I'm also a Monzo customer, and the app is very fresh compared with my HSBC primary account... yes, 'monolith'... you frustratingly need to revert to Internet Banking to do loads of stuff. HSBC PITA.

        1. Happy_Jack

          Re: Banking isn't really a highly computational process

          To be fair, HSBC and First Direct are at the absolute opposite end of the technical-debt scale from Monzo, Starling, etc.

      2. Giovani Tapini

        Re: Banking isn't really a highly computational process

        Agreed, payments is one of the few areas where real-world performance is a critical issue.

        1. Loyal Commenter Silver badge

          Re: Banking isn't really a highly computational process

          FWIW, I use Monzo for day-to-day banking. Transfers to and from my other bank account are near-instantaneous, as are notifications of card spending in the app, and things like bill-splitting.

          I think the poster above is possibly confusing computational expense with complexity. I've never worked on payment processing systems, but I suspect the larger part of payment processing is not the computation involved but the communication between the different layers and providers. In a 1,600-node microservice architecture, I'd be very surprised if a single payment involved more than a handful of those nodes; the clever bits, I suspect, are in the routing. The same logic would be required whether it's messages passing between microservices or data passing between classes in a monolith.

    3. Lee D Silver badge

      Re: Banking isn't really a highly computational process

      "To put this into context, every fixed line call in Germany has to go through a complete lookup of the portability database. That's a database listing every number that has ever been ported. That's millions of datasets. The lookup works with a simple barely optimized program which rarely takes more than a millisecond to look up a dataset, even on a very modest computer."

      I should damn well hope so. Sorry, but it's 2020, and you're doing a lookup from a list of, say, millions of numbers to retrieve a small set of data associated with each?

      There's no way it's searching one by one... it's hashing prefixes and following trees. If it touches 13 or 14 entries for comparison, I'll be amazed. And at 3GHz that's literally fractions of a millisecond, even if it takes hundreds of thousands of instructions on an in-memory lookup (a million data rows is NOTHING to keep in main memory).
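
      A sketch of that kind of lookup in Go: longest-prefix match over a hash map, so a query touches at most as many entries as the number has digits. The data below is made up:

      ```go
      package main

      import "fmt"

      // ported maps number prefixes to the carrier they were ported to.
      // The data is invented; real portability databases key on full numbers
      // or number blocks, but the principle is the same: hash probes, not scans.
      var ported = map[string]string{
          "49301234":   "carrier-A",
          "4930123456": "carrier-B",
      }

      // lookup tries the longest prefix first, so it performs at most
      // len(number) map probes, each of them O(1).
      func lookup(number string) (string, bool) {
          for i := len(number); i > 0; i-- {
              if carrier, ok := ported[number[:i]]; ok {
                  return carrier, true
              }
          }
          return "", false
      }

      func main() {
          fmt.Println(lookup("49301234567")) // carrier-B true
      }
      ```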

  6. SVV

    It's certainly bold, but is it naive?

    I'm certainly not going to rule out the possibility that, by going so different with the base technologies used for banking, this could turn out to be a better way of doing things. But it smells a little dangerously hipsterish to me, and the full range of real-life drawbacks of some of these technologies may not yet be fully known (there are always drawbacks). Not to mention that banks need IT people with experience in banking as well as in IT, and going this way closes the door on people without the new hipster skills; they may end up having to recruit from the wreckage of failed startups up the road in Shoreditch instead.

    What set off my doubt radar is the rather wide-eyed enthusiasm with which they announced the discovery of things that should be bleeding obvious to most people, like breaking down big things into little things when designing functions and services. And your greenfield newborn system will look great now, as they usually do, but what will the consequences be when Mr Hack moves in and starts leaving unseen booby traps all over the place, as he always does?

    As a final thought, I remember 20 years ago, when I worked in banking, the new systems were being written using a newly hyped tech that everybody had got very excited about, which also broke down services into small independent components that could be scaled up easily and were deployed in containers. It was called Enterprise Java Beans, and it caused no end of massive problems after a while.

    1. Brewster's Angle Grinder Silver badge

      Re: It's certainly bold, but is it naive?

      "...going this way closes the door on people without the new hipster skills..."

      How's the recruitment for COBOL programmers going?

      I've not programmed Go, but it looks a fairly straightforward language to pick up. Programming is programming. I agree with the rest of your point, though.

    2. sabroni Silver badge

      Re: going this way closes the door on people without the new hipster skills

      So literally all it takes for upvotes is to invoke the eternal enemy of the nerd: the hipster.

      Microservices are for hipsters, you say? Well, that technology must be total shit then.

      It always surprises me how averse this tech forum is to new things. We work in IT, ffs!

      1. Peter2 Silver badge

        Re: going this way closes the door on people without the new hipster skills

        "It always surprises me how averse this tech forum is to new things. We work in IT, ffs!"

        Upvoted, but of course the reason that people are averse to new things is, well...

        We have an existing system with reliability best compared to granite, where every failure case has been tested the hard way and dealt with by putting measures in place to eliminate it.

        Then a new system comes along that's trendy and cool, has none of that old reliability stuff, and has the flexibility (and probably the structural rigidity) of jelly. There are only so many such changes and resultant disasters that one can mentally deal with before turning into a gibbering wreck, and many of us have largely passed that point. Many of us cope by screaming some variation of "**** off!" when offered something trendy in place of something that has been around for a long time, is well understood, and has equally well-understood reliability.

        1. Ken 16 Silver badge
          Facepalm

          The first question has to be "Why?"

          I actively have to damp down my own love of new and shiny tech and techniques, and remind myself what we're actually trying to do. Why is the new approach better than using something off the shelf? Why choose a technology where you have to pay a premium for niche skills rather than a premium for experience? If you can't answer that "why" to the satisfaction of all your stakeholders, then stop.

          Otherwise you're just blockchain.

  7. Anonymous Coward
    Anonymous Coward

    Wow

    It just seems so evanescent. I mean, they run 1,600 little things that contribute to a bigger thing, but there's still no real "thing" there at the center. It's like a cloud of bubbles stuck together: pop enough bubbles and the whole thing disappears.

    Not a criticism, per se, more an observation.

    1. localzuk Silver badge

      Re: Wow

      Does that not apply to pretty much everything? A human body is just a collection of cells with no "thing" at the center. Pop enough of those cells, and the "human" disappears.

      1. Anonymous Coward
        Anonymous Coward

        Re: Wow

        Yeah, but it used to be that under all the bubbles was an AS/400 or S/36, or something from Sun/Compaq/etc. This is just bubbles all the way down.

        1. localzuk Silver badge

          Re: Wow

          Great, so you used to have one giant single point of failure... and we know how well that worked out for banks like NatWest...

    2. Lee D Silver badge

      Re: Wow

      You'd rather have one big bubble, where any little spike inside it can pop the entire thing?

    3. erikscott

      Re: Wow

      The glue that holds it all together? The Database. Banks are just databases with some retail branch offices. :-)

      I used to work in a US environment regulated by the FDIC (our bank regulator) - we were technically a bank, and when I left we were adding some obviously "retail banking" kinds of things: checking accounts and mortgages, for instance. I got out ahead of the mortgage-backed debacle, but it was pure luck.

      We were moving to a sort-of microservices model, but mostly using MQSeries Transaction Integrator (formerly NEON, now who knows what). Systems were 90+% MVS (IMS and CICS + DB2).

      Also, am I the only one who saw "Cassandra" and immediately thought "Free money for everyone!"?

      A quick look at their website and Wikipedia suggests they're basically a prepaid debit card business, maybe even just a reseller, that recently added checking accounts. In the US you could outsource all of this to First Data Corp. Many, many small banks do just this: "The Bank of Southeastern Crud County" (over 500 customers!) provides a building, tellers, and a few million in startup capital, and First Data does the whole back office, even printing the junk mail if you want them to. It's conceivable that Monzo's entire system is just/mostly interfaces to vendors.

      So, yeah, if they're 95+% virtual, this would work. FDC (just as an example; I don't endorse them) would provide the database of record, and even Cassandra would then be survivable. They already mentioned the big card providers, plus Apple Pay and a few others, so perhaps many of their 1,600 systems are marketing and other ancillary activities. Evidently it works, but given how much money they've had to raise, I doubt it works cheaply.

      1. Bronek Kozicki

        Re: Wow

        Getting a banking licence is only that easy in the US. On this side of the world things are very different; for one, there are no "mom & pop" banks. Also, a large chunk of a bank's systems is integration, and it is not considered ancillary.

  8. RB_

    1600 single-points-of-failure you say ?

    Fetch me my sales trousers at once! I sense an urgent need for some DR "innovation"...

    1. sabroni Silver badge

      Re: 1600 single-points-of-failure you say ?

      1,600 fault-tolerant nodes and a detailed audit trail of who called what, when?

      No, I want one stack trace that's a million lines long, thanks!

    2. Bronek Kozicki

      Re: 1600 single-points-of-failure you say ?

      Quite the opposite, I think. 1,600 services is what you might get from a generous application of chaos engineering; I bet there is lots of data redundancy built in.

  9. Steve Channell
    Happy

    Loosely coupled, strongly cohesive

    Old lessons still apply: rigorously define the data model and encapsulate the transport. gRPC is a special case because, with the right network fabric, it can be faster than IPC.
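
    In Go terms, "encapsulate the transport" might look like the sketch below: callers depend on an interface, and whether it is backed by gRPC, HTTP, or an in-process call becomes a wiring decision. All names are hypothetical:

    ```go
    package payments

    import "context"

    // Quote is the rigorously defined data model.
    type Quote struct {
        AccountID string
        Pence     int64
    }

    // Quoter is the contract callers program against; the transport
    // behind it (gRPC, HTTP, in-process) is an implementation detail.
    type Quoter interface {
        GetQuote(ctx context.Context, accountID string) (Quote, error)
    }
    ```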

  10. usariocalve

    Really, the development methodology you pick depends on your set of requirements.

    Monolithic applications are more difficult to change and are generally more fragile.

    Microservice-based applications have their own challenges, but because the APIs are published, their contracts are much easier to enforce, and side-effects are essentially non-existent... or at least easily traceable.

    You can sort of get modularity in a monolithic app, but you start having versioning problems once you get past a certain point. You can design around that problem in a microservice architecture, since it's the behaviour that matters, not the implementation.

    In the end, microservices will probably win out, because they reduce the cognitive load on the individual developer. They require more work up front, but in the long run developer brainpower is the most expensive resource you have, so anything that makes development easier will win.

    1. Loyal Commenter Silver badge

      "Microservice-based applications have their own challenges, but because the APIs are published, their contracts are much easier to enforce, and side-effects are essentially non-existent... or at least easily traceable."

      If you write your unit tests properly, and make sure they pass, then the only sort of bugs you should get are ones that come from a faulty specification.
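
      As a sketch of what "properly" might mean, a table-driven Go test pinning the contract of a hypothetical bill-splitting helper, so a faulty implementation fails loudly:

      ```go
      package payments

      import "testing"

      // Split divides an amount in pence between n people, handing the
      // remainder to the first payers so that no penny is lost.
      func Split(pence, n int64) []int64 {
          shares := make([]int64, n)
          for i := range shares {
              shares[i] = pence / n
              if int64(i) < pence%n {
                  shares[i]++
              }
          }
          return shares
      }

      func TestSplit(t *testing.T) {
          for _, tc := range []struct {
              pence, n int64
              want     []int64
          }{
              {100, 3, []int64{34, 33, 33}},
              {90, 2, []int64{45, 45}},
          } {
              got := Split(tc.pence, tc.n)
              for i, w := range tc.want {
                  if got[i] != w {
                      t.Errorf("Split(%d, %d)[%d] = %d, want %d",
                          tc.pence, tc.n, i, got[i], w)
                  }
              }
          }
      }
      ```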

      It goes without saying that it is easier to specify a single function than it is a whole piece of software, so the devil here is in how the functions interact.

      My feeling is that this approach lends itself to much higher software reliability.

  11. Robert Grant

    "...meaning you can simply add more hardware to scale, rather than having to migrate to a bigger system"

    This is a weird clarification in an article full of much more advanced terminology.

  12. Sil

    Off-topic, kind of

    Regarding trends on the bleeding edge, did you read Facebook's explanation of the full rewrite of Messenger for iOS?

    To me, after years of building hipster JavaScript framework upon JavaScript framework, it seems Facebook devs (re)discovered what older devs already knew, such as SQL stored procedures and the like.

    https://engineering.fb.com/data-infrastructure/messenger/

  13. gatestone

    A bit more info:

    Building a Bank with Kubernetes by Oliver Beattie, Monzo

    https://www.youtube.com/watch?v=YkOY7DgXKyw

    Banking on Go - Matt Heath - SF Docker + Go Meetup

    https://www.youtube.com/watch?v=iRNwLjKeVRE

  14. Jhelberg

    Why Docker?

    As Go delivers almost-static binaries, where does the use of Docker come from? Using Go, one can do without Docker.
