Slaying dragons with git, bash, and ruby
Sep 07, 2010
An often-overlooked feature of git is the set of hooks it makes available. They cover pre-applypatch, post-update, and anything between or beyond. I suspect a lot of people were first introduced to them when integrating with a Continuous Integration server, as a means of telling it to test a new build, but they work equally well as a hidden monkey saving you from showing the world some of your more embarrassing mistakes.
Getting started with git hooks
Within your cloned git repository you're most likely aware of the .git/ directory. Inside it is another directory called hooks/ which, surprise surprise, is where your git hooks live. You'll probably have a bunch of existing hooks in there with .sample as the extension to stop them being executed. It's worth taking a look at them to get an overview of the various hooks and what is possible.
To get a hook to fire you need a file with the appropriate name (remove the .sample extension on each file you want to run), and it needs to be granted execute permissions:
$ chmod +x .git/hooks/hook-name-here
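If you'd rather script those two steps, something like the following might do. It's only a rough sketch: install_hook is a name I've made up for illustration, and it assumes the repository keeps its hooks in the default .git/hooks/ location.

```ruby
require "fileutils"

# Hypothetical helper (not part of git): writes `body` to
# .git/hooks/<name> inside `repo_dir` and marks the file executable.
def install_hook(repo_dir, name, body)
  hooks_dir = File.join(repo_dir, ".git", "hooks")
  FileUtils.mkdir_p(hooks_dir)
  path = File.join(hooks_dir, name)
  File.write(path, body)
  File.chmod(0755, path) # rwx for the owner, r-x for everyone else
  path
end
```

Calling install_hook(".", "pre-commit", script_text) from the root of a repository would then drop the script into place, where script_text is whatever hook body you want to run.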
Joining forces for real ultimate power
Coding in ruby most of the day makes it the quickest language for me to throw together a script in. Thankfully you can write your hooks in ruby, or just about any language really; just change the shebang line accordingly:
#!/usr/bin/env ruby
However there are lots of things that are much easier to do from a command line than they are in a ruby script, and so we will stand on the shoulders of giants and use the underlying *nix tools to do what they're best at, and use ruby to keep things re-usable and readable.
Catching out bad habits
One thing I've been guilty of in the past is hastily trying to fix a bug, and then accidentally leaving a debug breakpoint in the committed code. If that ever made it onto a production system it would leave it hanging and unresponsive. Even on other developer machines it causes enough confusion. So to make me look much more reliable than I really am, enter the git pre-commit
hook:
#!/usr/bin/env ruby
if `grep -rls "require 'ruby-debug'; debugger" *` != ""
puts "You twit, you've left a debugger in!"
exit(1)
end
Now whenever I try to commit code, it will first run a recursive grep
over the codebase to ensure I've not left my debug statement in (I can be sure it always looks like "require 'ruby-debug'; debugger" as I have it bound to a shortcut).
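One thing to be aware of is that the recursive grep looks at everything in the working tree, not just what is about to be committed. If that bothers you, here is a rough sketch of the same check done in ruby against an explicit list of files. The names DEBUGGER_LINE and files_with_debugger are my own inventions for illustration.

```ruby
# The exact line my editor shortcut inserts.
DEBUGGER_LINE = "require 'ruby-debug'; debugger"

# Returns the subset of `paths` whose contents include the debugger line.
# Paths that are missing or are not regular files are skipped.
def files_with_debugger(paths)
  paths.select do |path|
    File.file?(path) && File.binread(path).include?(DEBUGGER_LINE)
  end
end
```

In a hook, the list of paths could come from splitting the output of git diff --cached --name-only, which prints the names of the staged files. Bear in mind this reads the copy in the working tree, which may differ from what is actually staged.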
Stopping an incomplete merge
There have been occasions where a particularly large rebase or merge created a lot of conflicts in a file, and one of them snuck through: rather than being fixed, the inline diff was actually committed. Time to add another check to pre-commit, using egrep to scan recursively for the 3 different line markers that git uses to indicate a merge conflict:
#!/usr/bin/env ruby
if `egrep -rls "^<<<<<<< |^>>>>>>> |^=======$" *` != ""
puts "Dang, looks like you screwed the merge!"
exit(1)
end
If you try this though you'll probably discover that it doesn't quite work as expected, because there are some binary files that happen to include these characters. More shell scripting to the rescue then: we pipe the results into a couple of other commands to filter them out. First it goes via xargs, which takes the list of filenames from STDIN and passes them as arguments to file to find out what type of file we are dealing with. We then pipe that into egrep again to select only the script and text files:
#!/usr/bin/env ruby
if `egrep -rls "^<<<<<<< |^>>>>>>> |^=======$" * | xargs file | egrep 'script|text'` != ""
puts "Dang, looks like you screwed the merge!"
exit(1)
end
It would be nice at this point to actually know what files have been affected, without needing to commit the above series of commands to memory, so we can output it again this time passing the result into awk
to strip out just the filename:
#!/usr/bin/env ruby
if `egrep -rls "^<<<<<<< |^>>>>>>> |^=======$" * | xargs file | egrep 'script|text'` != ""
puts "Dang, looks like you screwed the merge!"
puts `egrep -rls "^<<<<<<< |^>>>>>>> |^=======$" * | xargs file | egrep 'script|text' | awk -F: '{print $1}'`
exit(1)
end
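The hook above runs the whole pipeline twice, once to test and once to print. As a rough alternative, the last two stages could be done in ruby so the shell only runs once. This is only a sketch: text_paths is a name I've made up, and it assumes file prints one "path: description" line per file. Unlike the egrep version, it only looks for "script" or "text" in the description, not in the filename.

```ruby
# Takes the text printed by `file` (one "path: description" line per file)
# and returns the paths whose description mentions "script" or "text".
# Like the awk version, it will be confused by paths containing a colon.
def text_paths(file_output)
  file_output.each_line.map do |line|
    path, description = line.chomp.split(": ", 2)
    path if description.to_s.match?(/script|text/)
  end.compact
end
```

The hook could then capture the output of the egrep and xargs file stages in a variable, pass it to text_paths, and both test and print the resulting array.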
Helping your workflow
I'm a big fan of committing regularly in manageable amounts, but I want to ensure each commit is self-contained and has all the tests passing. I don't want to be in a state where I revert a commit and end up with a broken app. However, there are times when I'll be spiking something or refactoring a class and I'd like a temporary save point in case I make a mess of things and want to step back. To do that, I typically commit with a message like "WiP: Got Foo working, about to fix Bar." with the intention of coming back when it's complete and amending that commit to include the additional changes and have a more meaningful message. Sometimes I forget to use --amend though, and things don't go to plan. That's another one that is easy to avoid:
#!/usr/bin/env ruby
if `git log --oneline --author=\`git config --get-all user.email | sed s/@.*//g\` -n 5 | grep -i wip` != ""
puts "You've left a WiP commit message behind"
end
You might need to do a little tweaking on that one depending on your setup, so I'll break it out in the order the commands are executed to help you adapt it to your needs. First, I use git-config to return the email address of the current user:
$ git config --get-all user.email
I then pipe that into sed
to return just the bit before the @ sign:
$ sed s/@.*//g
That all runs in a sub-process via command substitution (the back tick characters at each end of "git config --get-all user.email | sed s/@.*//g" are backslash escaped because the whole command already sits inside ruby's back ticks). The result of that command is passed into git-log to return the last 5 commits for that author:
$ git log --oneline --author=username_here -n 5
And finally, grep
is called on the result to ensure I haven't left the string "wip" in any of the commits:
$ grep -i wip
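If the nested back ticks make your eyes water, the sed and grep steps are easy enough to do in ruby instead. Here's a rough sketch. Both method names are made up for illustration, and they work on plain strings so you can try them without a repository.

```ruby
# Mirrors `sed s/@.*//g`: keeps only the part of the address before the "@".
def author_from_email(email)
  email.strip.sub(/@.*/, "")
end

# Mirrors `grep -i wip`: keeps the log lines that mention "wip" in any case.
def wip_commits(log_output)
  log_output.each_line.map(&:chomp).grep(/wip/i)
end
```

In the hook you would feed author_from_email the output of git config, use the result for --author, and pass the git log output to wip_commits. A nice side effect is that you get the offending lines back, so the warning can say which commits it is complaining about.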
Ensuring you don't break the build
The hook that kicked it all off for me was to ensure that I didn't break the build, mostly as an attempt to claim moral superiority over anyone else who was found guilty of doing it themselves. Little did they know I had a secret weapon to protect my perfect performance ;)
#!/usr/bin/env ruby
puts "Running tests..."
`rake test > /dev/null 2>&1 && bundle exec cucumber features > /dev/null 2>&1`
if $? != 0
puts "Tests failed"
exit(1)
end
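A variation on the same idea, in case you'd rather not rely on $? and shell redirection: ruby's system returns true only when the command exits with status 0, and it accepts redirection options directly. This is a sketch rather than what I actually run, and all_pass? is a made-up name.

```ruby
# Runs each command in order with its output discarded, stopping at the
# first failure. Returns true only if every command exited with status 0.
# Each command is an array of program name plus arguments.
def all_pass?(commands)
  commands.all? do |command|
    system(*command, out: File::NULL, err: File::NULL)
  end
end

# Possible use in the hook (commented out so this file has no side effects):
#   unless all_pass?([%w[rake test], %w[bundle exec cucumber features]])
#     puts "Tests failed"
#     exit(1)
#   end
```

Because all? stops at the first falsy result, cucumber is skipped when rake test fails, which matches the && in the shell version.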
Making it more self-aware
This approach worked great for a couple of days, but I quickly got frustrated because I'd have to add the --no-verify
parameter to commits quite regularly. I really only wanted to run all the tests when I was committing on master before I pushed changes upstream to everyone else. The other problem was that my "WiP" workflow meant I'd have to use --no-verify
whenever I was amending a commit and it struck me the script should be intelligent enough to know I was trying to do the right thing.
Detecting master
Determining if the current branch was master was relatively straightforward:
`git symbolic-ref HEAD | grep master` != ""
So just wrap that as part of the if statements you only want to be executed when you're on the master branch.
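One caveat: grep master will also match branches such as master-hotfix or remaster. If that matters to you, an exact comparison is a small change. A minimal sketch, with master_ref? being a name I've made up:

```ruby
# `ref` is the text printed by `git symbolic-ref HEAD`,
# for example "refs/heads/master\n".
def master_ref?(ref)
  ref.strip == "refs/heads/master"
end
```

In the hook that would be called as master_ref?(`git symbolic-ref HEAD`).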
Detecting commit amend
Working out if you are amending a commit is a little trickier. The options passed to commit
aren't passed through to your script, so it requires a bit of process hackery in both ruby and bash to find out if --amend
was used. First we use the built in $$
variable in ruby to return the process ID of the ruby process, and use it with ps
and grep
to return all matching processes:
`ps -f | grep #{$$}`
We then pass that into awk
to extract the parent's process ID, and make an assumption that the first line is the parent:
`ps -f | grep #{$$} | awk '{print $3}' | head -n 1`
Now that we have the parent's process ID, we go back into ps and grep to return the full command and options that were passed to git-commit, and then grep once more to see if --amend was passed in:
`ps | grep \`ps -f | grep #{$$} | awk '{print $3}' | head -n 1\` | grep -e "--amend"`
Phew!
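For what it's worth, ruby can hand you the parent's process ID directly via Process.ppid, which avoids grepping ps output for a number that might match some unrelated line. Here's a rough sketch of the same check. Both method names are made up, and it makes the same assumption as the pipeline above: that the hook's direct parent is the git commit process.

```ruby
# True if the command line contains --amend as a separate argument.
# Note this is a plain whitespace split, so a commit message that
# happens to contain the word --amend would also match.
def amend?(command_line)
  command_line.split.include?("--amend")
end

# Assumption: the hook's direct parent is the `git commit` process.
# `ps -o args= -p PID` prints only the command line of that process.
def parent_command_line
  `ps -o args= -p #{Process.ppid}`.strip
end
```

The hook would then test amend?(parent_command_line) wherever it needs to know.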
Wrapping it all up
All that would create a mess of if statements and duplication throughout your git pre-commit hook, and any other hook you might want to apply this logic to, so I've bundled it all up in a reusable class that I include in any project. I'll keep updating it as my needs develop; feel free to fork it and add features for other languages and frameworks.