I just finished watching “Why Google Stores Billions of Lines of Code in a Single Repository” and honestly, while it looks intriguing, it also looks horrible.
Have you run into issues? Did you love it? How was it?
You usually run into issues if you are trying to use off-the-shelf tools and git providers. IMO GitHub and GitHub Actions suck hard for monorepos. The fact that all workflows have to live in a single directory, for example, is almost certainly an unmanageable rat’s nest waiting to happen at any sufficiently large business with a sufficiently complex product or set of products.
This is why companies like Google run their own forms of git with custom wrappers that let you do things like pull a segment of a terabyte-sized repo or run partial builds with tooling that basically runs a dependency graph against the changes. Bazel, for example, had to be invented to help solve that problem at Google, and Pants similarly for Twitter (which also has a monorepo).
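To make the “graph against the changes” idea concrete, here is a toy Python sketch (the target names and the graph are invented for illustration; real tools like Bazel work from a much richer build graph plus caching): compute the reverse-dependency closure of whatever changed and rebuild only that.

```python
# Toy sketch of "run a graph against the changes" -- not how Bazel actually
# works internally, just the core idea with a hypothetical target graph.
from collections import defaultdict

# target -> targets it depends on (made-up example graph)
deps = {
    "//app/frontend": ["//libs/ui", "//libs/net"],
    "//app/backend": ["//libs/net", "//libs/storage"],
    "//libs/ui": ["//libs/net"],
    "//libs/net": [],
    "//libs/storage": [],
}

# Invert the graph: target -> targets that depend on it.
rdeps = defaultdict(set)
for target, its_deps in deps.items():
    for dep in its_deps:
        rdeps[dep].add(target)

def affected(changed):
    """Return every target that (transitively) depends on a changed target."""
    to_visit, seen = list(changed), set(changed)
    while to_visit:
        current = to_visit.pop()
        for dependent in rdeps[current]:
            if dependent not in seen:
                seen.add(dependent)
                to_visit.append(dependent)
    return seen

# A change to //libs/net forces rebuilds of the UI lib and both apps,
# while //libs/storage is left alone.
print(sorted(affected({"//libs/net"})))
```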
If you are willing to invest in tools like Bazel and own building all these complex wrappers, then it can be fine. But if you want to use off-the-shelf GitLab or GitHub Actions and your IDE’s built-in git tooling, it’s not going to be for you. That’s the difference between what’s possible or a good idea at a medium-sized shop vs. a company with 40k engineers.
In my experience at a company that just moved away from a monorepo, half the off-the-shelf vendors and FOSS tools out there balk at you if you expect monorepo support. We moved away specifically because, at our current company size, it is more tolerable to keep our different products separate and eat the occasional pain of mass pattern adjustments across the repos than to build out a team to manage the custom tooling required for a gigabyte-plus-sized monorepo.
Plus, even Google doesn’t have a true monorepo. Chrome and Android are not in the same repo as Search, for example. Find your seams and manage them appropriately.
+1 about not having a true monorepo. Meta doesn’t have one either, despite how much we like to talk about it. So there’s still friction when you need to “canary” a change from one repo to another
Thanks for the insight. Are there any tools you used at your company that you’d recommend? Did you encounter any open-source CI/CD for monorepos that worked?
I discovered JOSH, which looked intriguing to put in front of existing source forges, but I don’t know of any source forges that support monorepos by design. GitHub and GitLab are multi-repo for sure, and shoehorning a monorepo into them, like Nix did with nixpkgs, is cumbersome.
We use them at Meta. It’s easier to interact with other parts of the codebase, but it doesn’t play well with libraries so you end up redoing a lot of stuff in-house.
I would only recommend a monorepo if you’re a company with at least 5,000 engineers and can dedicate significant time to internal infra.
I would only recommend a monorepo if you’re a company with at least 5,000 engineers and can dedicate significant time to internal infra.
It’s funny because at least one FAANG does not use monorepos and does just fine without them, despite being at the same scale as, or perhaps even larger than, Facebook.
I wonder why anyone would feel compelled to suggest adopting a monorepo in a setting that makes it far harder to use and maintain.
Is it Amazon because they did a really good job at keeping teams separate (via APIs)?
I don’t think they did an exceptional job keeping teams separated. In fact, I think monorepos only end up artificially tying teams down with an arbitrary and completely unnecessary constraint.
Also, not all work is services.
it doesn’t play well with libraries
What do you mean by that? Is it that versioning of libraries isn’t possible, meaning an update to the interface requires updating all dependent apps/libs?
Updating a library in a monorepo means the change rolls out everywhere at once, and you’re hoping the update didn’t break someone else’s code. Whereas updating a normally versioned library would never break anything immediately, and you can let people update on their own cadence.
I set up a monorepo that had a library used by several different projects. It was my first foray into DevOps and we had this problem.
I decided to version and release the library whenever a change to it was merged to trunk. Other projects would depend on one of those versions and could be updated at their own pace. There was a lot of hidden complexity and many gotchas, so we needed some rules to make it functional. It worked well once those were sorted out.
One rule we needed was that changes to the library had to be merged and released prior to any downstream project that relied on those changes. This made a lot of sense from certain perspectives, but it annoyed developers: they couldn’t simply open a single PR containing both changes. This had a huge positive impact on the codebase over time IMO, but that’s a different story.
How is it done at Meta? Always compile and depend on latest? Is the library copied into different projects, or did you just mean you had to update several projects whenever the library’s interfaces changed?
At Meta, if it’s an internal library, the team that maintains it updates all the code to use the latest version (that’s the advantage of a monorepo). As an aside, if your project broke because someone else touched your code, that’s on you for not writing better tests.
If it’s an external library, it either has a team responsible for it that does the above, or it probably hasn’t been updated since the day it was added.
Thank you!
I think it mostly has to do with how coupled your code modules are. If you have a lot of tightly coupled modules/libraries/apps/etc, then it makes sense to put them in the same repo so that changes that ultimately have a large blast radius can be handled within a single repo instead of spanning many repos.
And that’s just a judgement call based on code organization and team organization.
I’m inclined to interpret monorepos as an anti-pattern intended to mask fundamental problems in the way an organization structures its releases and dependency management.
It all boils down to being an artificial versioning constraint at the expense of autonomy and developer experience.
Huge multinationals don’t have a problem organizing all their projects as independent (and sometimes multiple) source code repositories per project. What’s wrong with these small one-bus software shops that fail to do that when they operate at a scale that’s orders of magnitude smaller?
I’ve been a big fan of monorepos because they lead to more consistent style and coding across the whole company. They make the code more transparent, so you can see what’s going on with the rest of the company, too, which helps reduce code islands and duplicated work. They let me build everything from source, which helps catch bugs that would only show up in prod due to version drift. They also mean I can do massive refactorings across the company without breaking anything.
That said, tooling is slowly improving for decentralized repos, so some of these may be doable on git now/soon.
(…) you can see what’s going on with the rest of the company, too.
That’s a huge security problem.
Edit: for those who are downvoting this post, please explain why you believe that granting anyone in the organization full access to all the projects across the whole organization does not represent a security problem.
Because security through obscurity is not security at all.
They work great when you have many teams working alongside each other within the same product.
It helps immensely with having consistent quality, structure, shared code, review practices, CI/CD, etc.
The downside is that you essentially need an entire platform engineering team just to set up and maintain the monorepo, tooling, custom scripts, custom workflows, etc. that support all the additional needs a monorepo and its users have. Something that would never be a problem in an ordinary single-project repository, like the list of pull requests, may need custom processes and workflows in a monorepo due to the sheer volume of changes.
(Of course, small monorepos don’t require you to have a full team doing maintenance and platform engineering. But often you’ll still find yourself dedicating an entire FTE’s worth of time to it.)
It’s similar to microservices in that a monorepo is a solution to scaling an organizational problem, not a solution to scaling a technology problem. It will create new problems to solve that you would not have had before, and that solution requires additional work to be effective and ergonomic. If those ergonomic and consistency issues aren’t being solved, it will just devolve over time into a mess.
Most companies will never have a monorepo on the scale of these bigger companies, so I personally don’t think most people need to worry about the limitations of GitHub/GitLab as platforms.
However if you happen to be having those kinds of issues, I think looking at what the big companies are doing and/or starting to split things up makes sense.
There are also alternatives with custom CI jobs outside GitHub/GitLab within the git universe that may help with those sorts of operations. I know Actions still feels very beta in some toolsets, so it may be easier/more useful to run your own setup. I’ve been enjoying Forgejo/Gitea, for example, but it’s not like you can’t do the same with GitLab runners or GitHub Enterprise. Depends on the use case.
There are also alternatives with custom CI jobs outside GitHub/GitLab within the git universe that may help with those sorts of operations.
Why would anyone subject themselves to exploring nonstandard, improvised solutions just to shoehorn a use case into a tool that was not designed to support it?
Do people enjoy creating their own problems just to complain about them?
We use a monorepo for a new cloud-based solution. So far it’s been really great.
The shared projects are all in one place so we don’t have to kick things out to a package manager just to pull them back in.
We use path filters in Azure Pipelines so things only get built if they or the projects they depend on change.
It makes big changes that span multiple projects effortless to implement.
Also, running a local deployment is as easy as hitting run in the IDE.
So far no problems at all.
We use path filters in Azure Pipelines so things only get built if they or the projects they depend on change.
Any guides on how to do this? I know about filtering triggers by where changes happen, but how do dependent projects get triggered? Is that a manually maintained list or something automatic? I mostly use GitLab, but am curious how Azure Pipelines would do it.
You have a list of filters like “src/libs/whatever/*”; if there is a change under one of those paths, the pipeline runs.
I wrote a tool that automatically updates these based on recursive project references (C#).
So if any project referenced by the service (or recursively referenced by its dependencies) changes, the service is rebuilt.
I see. OK. I thought that was built into Azure Pipelines.
Pretty cool tool you built 👍 Is it language-agnostic?
No, it relies on the C# project files. It looks for all ProjectReference tags in the project files, recursively grabs all of them, and turns them into filters.
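The actual tool described here is C#; as a rough, hypothetical sketch of the same idea in Python (the file paths are made up), you could walk the ProjectReference entries recursively and emit one path filter per project directory:

```python
# Sketch only: follow <ProjectReference Include="..."> links in .csproj files
# recursively and print Azure Pipelines-style path filters for each project
# directory the service depends on.
import xml.etree.ElementTree as ET
from pathlib import Path

def project_references(csproj: Path) -> list[Path]:
    """Return the .csproj files referenced by this project file."""
    root = ET.parse(csproj).getroot()
    refs = []
    for elem in root.iter():
        # Matches the tag for both legacy (namespaced) and SDK-style projects.
        if elem.tag.endswith("ProjectReference"):
            include = elem.attrib.get("Include")
            if include:
                refs.append((csproj.parent / include.replace("\\", "/")).resolve())
    return refs

def path_filters(entry_csproj: Path, repo_root: Path) -> list[str]:
    """Recursively collect referenced projects and emit one filter per directory."""
    seen: set[Path] = set()
    stack = [entry_csproj.resolve()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(project_references(current))
    return sorted(f"{p.parent.relative_to(repo_root).as_posix()}/*" for p in seen)

if __name__ == "__main__":
    repo_root = Path.cwd().resolve()
    # Hypothetical entry point; in practice you'd run this once per service.
    entry = repo_root / "src/services/Orders/Orders.csproj"
    for f in path_filters(entry, repo_root):
        print(f)
```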
Like all other patterns, it can be done well or done poorly. I’ve experienced both with monorepos. When it’s done poorly, the pain is that much greater. But if the contribution, build, and release procedures are well designed and clearly documented, it can also be nice.
From personal experience:
I see some benefits, but it will make your developer life a nightmare… it is like trying to focus on a single problem in a zoo that’s on fire with all the gates open…
I have no experience using them across an entire company, or even an entire team. So far, it always seemed good enough to keep each project in its own repo. But yeah, for larger projects consisting of multiple applications, I would not want to work without a monorepo.
Many of the benefits from the video still apply; mainly, the consistency in code changes was always useful. You can check out any commit and you’ve got the exact state of how everything worked together at the time. No wrangling different versions, no inconsistencies between APIs.
In our build process, we include the Git commit in the applications so it gets logged on start-up; when we get an error report plus logs, we can always easily look at the corresponding code. But whether this works well does depend on your build tooling. JVM languages with e.g. Gradle’s multi-project builds are great. Rust’s workspaces are a treat. Python is fucking atrocious with everything we’ve tried (pipenv, poetry, lots of custom scripts + symlinks).
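For what it’s worth, the basic pattern is small regardless of ecosystem. A minimal Python sketch with made-up file names: a build step bakes the current commit into a generated module, and the application logs it on start-up.

```python
# bake_version.py (hypothetical name): run as a build step to write the
# current git commit into a small module the application can import.
import subprocess
from pathlib import Path

commit = subprocess.run(
    ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
).stdout.strip()
Path("app/_build_info.py").write_text(f'GIT_COMMIT = "{commit}"\n')
```

```python
# app/main.py (hypothetical): log the commit on start-up so an error
# report plus logs can be matched to the exact source state.
import logging
from _build_info import GIT_COMMIT  # generated by the build step above

logging.basicConfig(level=logging.INFO)
logging.getLogger(__name__).info("starting up at commit %s", GIT_COMMIT)
```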
All the tech companies I’ve worked at have used monorepos, so I don’t know any other way.
How do they handle updates to common code, especially breaking changes to the public API?
Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo