I recently left Google — after an 8.5 year career, excited to adopt Git and Github for my new company DeepFraud AI. I am shocked at how awful Git is.
Let’s take a quick look at Google’s repository
- over a billion files
- millions of projects
- over forty thousands engineers
- tens of thousands of commits daily
- many millions of commits annually
At Google we do everything the open source community says not to
- a mono repository (open source prefers multi repo)
- a centralized source control system (open source prefers decentralized)
- all engineers work in the master branch, for the most part (open source prefers everyone works in their own branch)
Yet.. things at Google are going just fine. How could it be that arguably the most advanced technical organization with the most collaborative work force is using the most deprecated strategy?
The bad things about Git are really, really bad.
Let’s look at some big problems with Git and the popular Feature Branch Workflow in particular.
1. Log spam
A successful push generates an absurd amount of log spam.
This is madness! Why do we need 15 lines of debug code for a successful commit?
What other tool in the world has such a lack of self respect as to generate this kind of crap? Imagine if “cp” generated 10 lines of status messages every time you copied a file. I want a clean console with informative messages only. This is log spam!
Pull Requests and Merging is language is absurdly confusing
Pull Request (PR) is an overloaded, confusing term. A pull request is a merge, it’s as simple as that. But for historical reasons, Git and Github call it a Pull Request. But not always. Let’s consider these facts:
- git pull already exists. And it has nothing to do with merging branches or Pull Requests
- git merge is the command to initiate a pull request. The help text states “git-merge — Join two or more development histories together”
The web tools and desktop tools now all use the Pull Request language instead of calling it Merging. We are doomed! For a newcomer, this is incredibly frustrating.
Feature branches are totally flawed and cause work in isolation
The idea of working in a personal feature branch, then committing to master is seems great in theory, but consider this.
The reality is awful. Engineers work in a complete silo, isolated from the rest of the team. The modus operandi is that nobody reviews their code until weeks have potentially passed, and now the changes are so wide spread, the code review will be mediocre at best. Nobody wants to spend 8hours reviewing someone else’s 2 weeks old code and 40 changed files.
At Google we work in the master branch. We’re encouraged to decomposition changes into the smallest change possible (no more than 2 or 3 files) and submit regularly, multiple times per day is fine. Each submit must be reviewed and approved by peers. This process raises the quality bar at an extraordinary rate and encourages and enforces daily interaction between engineers.
The “advantage” of working offline is.. frankly stupid
The connectivity of the world looks something like this.
The idea of being able to work offline, and later commit to a remote depo is frankly stupid.
- Wifi and 3G is everywhere. Even on airplanes.
- Many of us work in cloud VPN’s now, making a fast connection mandatory
- Google has become the de-facto documentation tool, without it, there is paralysis.
We pay a huge cost of increased complexity managing a local and remote repo, and gain, basically, nothing.
Unstaged, Staged, Commit
Managing unstaged -> staged -> commit takes time out of a programmer’s busy day. Files should either be in committed or not committed state. Unstaged files concept encourages keeping lots of unnecessary experimental files around (aka crap), and a “git add .” will inevitably leak those into the repo at some point in the future.
You might be asking.. Why not just undo that commit? (good luck with that one, undoing commits cleanly requires a PHD-or-equivalent in Gitsciencetology)
Working on multiple projects requires multiple clones
To work on two projects simultaneously, you have choices:
- Commit all changes in one project before changing branches
- Stash all the changes
- Clone a new repository
Stashing is another living nightmare. It’s basically another layer of local repository added on top of a local->remote->branch->master layer cake.
We are now managing a branch with a local and remote repository, committed, uncommitted changes which may or may not live in the remote repo yet, staged and unstaged files, and now these magically stashed files.
Remote vs. Local Repository makes time travel look like child’s play
Within a developer’s environment, there is:
- a master branch (and development and testing)
- their personal development branch
- a remote and local version of each branch
- a version that the remote HEAD is actually at..
At any given time, the developer is managing at least 4 distinct and potentially fatally conflicting states in time, multiplied by the number of branches, multiplied by the number of cloned repos, multiple by the number of repos. Are you fucking kidding me?
Git/Github developers are living in a constant state of uncertainty, never quite sure when their next quantum leap (a Pull Request) will cause a catastrophic collapse of the universe.
What we need is Back to the Future, what we get is Inception.
Undoing commits is a nightmare
This one deserves its own special section. There are more ways to undo changes than there are hours in the day. Each more confusing and complex than the previous.
You can git revert, git reset, do it in a soft, hard, or default way, and if you’ve ever ended up in this horrible howto file clicking on links scratching your head, wondering “how the hell did I get here?”, now you know exactly how. You’re a git developer.
Removing an entire commit requires a special award. The paragraph below is the “simplified 101 version”. lol.
The rabbit hole goes deep
What I described so far are surface level beginner problems. If you’ve ever found yourself staring at a reflog, you know how much deeper the dagger can penetrate.
Measure the productivity loss and make immediate changes
Count the number of minutes your engineers spend each day manipulating git and translate those into lines of code not written, or meetings not taken. The cost is very material.
As an intermediate step, you can do these simple things:
- Work in the master branch. Engineers submit directly to master branch. Add a pre-production branch and production branch to push stable changes into. Allow personal branches only for very special purposes (typically for code that is intended to be thrown away).
Caveat: we tried this, and you lose the ability to do code reviews, which require branches and pull requests. We tried to encourage “review the commit after the fact” but this inevitably failed.
- Monorepo. Stop creating repositories for every project, prefer to have all code in one place. This has worked well for us.
These should both translate to immediate productivity gain, but you can’t do away with the other evils aforementioned. Happy coding.
Update: We evaluated Phabricator and Gerrit but found the solutions were too kludgy since they are effectively adding even more complexity on top of an already complex ecosystem.