Yesterday I had the sheepish* experience of spending a couple of hours trying to work out why code I had committed in a working state had suddenly stopped working. So I thought I’d write a blog post about how to debug this mysterious situation! While some of these scenarios are less likely to happen in local development, they can definitely happen in production, so if, like me, you believe in “You build it, you run it”, this can help with debugging the live site as well!
What happened to me?
I was writing code to manipulate MS Word documents. I had automated tests that checked the XML of the MS Word document, but for sanity, I also wrote the file to a local output so I could open it and visually check it. I committed my changes in a working state, went to lunch, and came back and…the tests were all green but the output file was missing the changes. I tried a bunch of things that had previously fixed weirdness with MS Word (hint: MS Word is really tricky to work with), ranging from invalidating my IDE’s cache, killing all the gradle processes and wiping the output folder to a lot of swearing. I eventually realised the problem was that I was using parameterised tests, but not parameterising the NAME of the file I was outputting…so it was overwritten by each parameterised case, and the last case had an empty input which (correctly) resulted in no change to the MS Word document. Cue more swearing.
So let’s talk about times when things mysteriously don’t work, and what some of the causes can be.
Cause 1 – external files
If you are working with input or output files, here are some questions you can ask:
– If reading from an input file, has the file been saved before being read?
– How is the output file being named?
– Is the output file path exactly what I am expecting it to be?
– Can the data in this output file be modified by another process (check out concurrency control)?
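To make the "how is the output file being named?" question concrete, here is a minimal sketch of giving every parameterised case its own output file. The helper names (output_path_for, write_outputs), the "output-" prefix and the case names are all made up for illustration; this is not the code from my project.

```python
import re
import tempfile
from pathlib import Path


def output_path_for(output_dir: Path, case_name: str) -> Path:
    """Build a distinct output path for one parameterised test case (hypothetical helper)."""
    # Replace anything awkward in a file name with an underscore.
    safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", case_name).strip("_")
    return output_dir / f"output-{safe_name or 'unnamed'}.docx"


def write_outputs(output_dir: Path, cases: dict[str, bytes]) -> list[Path]:
    """Write one file per case so a later case cannot overwrite an earlier one."""
    written = []
    for case_name, content in cases.items():
        path = output_path_for(output_dir, case_name)
        path.write_bytes(content)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder bytes stand in for real documents.
        cases = {"with changes": b"changed", "empty input": b""}
        for path in write_outputs(Path(tmp), cases):
            print(path.name, path.stat().st_size)
```

One caveat: two case names that differ only in punctuation would still map to the same file name, so the case names themselves need to be distinct after sanitising.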
Cause 2 – software versions
I once worked in a company with a couple of different codebases, each of which used a different Node version. If you forgot to switch the Node version, you’d get weird errors (hint: there is a way to automatically switch versions on *nix – check out .nvmrc). Some questions you can ask:
– Could switching directories cause my current software version to be incorrect?
– Does this code depend on an external dependency which may have changed its version?
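One way to catch a wrong version early is to compare the version you pinned with the version that is actually running before doing anything else. The sketch below only does the comparison: parse_version and version_satisfies are hypothetical helpers, and it deliberately does not resolve aliases such as "lts/hydrogen".

```python
import re
from typing import Optional


def parse_version(text: str) -> Optional[tuple[int, ...]]:
    """Parse strings such as 'v18.17.0' or '18' into a tuple of ints.

    Returns None for anything else (for example an alias), which this sketch
    does not try to resolve.
    """
    match = re.fullmatch(r"v?(\d+(?:\.\d+){0,2})", text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def version_satisfies(pinned: str, actual: str) -> bool:
    """True when `actual` agrees with every component given in `pinned`."""
    pinned_parts = parse_version(pinned)
    actual_parts = parse_version(actual)
    if pinned_parts is None or actual_parts is None:
        return False
    # A pin of '18' accepts any 18.x.y; a pin of '18.17.0' must match exactly.
    return actual_parts[: len(pinned_parts)] == pinned_parts


if __name__ == "__main__":
    print(version_satisfies("18", "v18.17.0"))
    print(version_satisfies("v18.17.0", "v20.1.0"))
```

In practice the pinned string could come from the contents of a .nvmrc file and the actual string from the output of `node --version`; how you wire that in depends on your setup.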
Cause 3 – build issues
Your code probably relies on some built dependencies. Some questions you can ask:
– Are all the dependencies built and where I expect them to be?
– Can I verify that each step of the build process is working as expected (implicit: do I know each step of the build process)?
– Is something interfering with my build process?
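For the "are all the dependencies built and where I expect them to be?" question, a small check that runs before the tests can save a lot of guessing. This is only a sketch: missing_artifacts is a hypothetical helper, and the artifact paths are invented for the example.

```python
import tempfile
from pathlib import Path


def missing_artifacts(build_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected relative paths that are absent or empty under build_dir."""
    missing = []
    for relative in expected:
        candidate = build_dir / relative
        # An empty file is treated as missing, since a failed step may leave one behind.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(relative)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_dir = Path(tmp)
        (build_dir / "libs").mkdir()
        (build_dir / "libs" / "app.jar").write_bytes(b"placeholder")
        # Both paths are made-up examples of what a build might produce.
        print(missing_artifacts(build_dir, ["libs/app.jar", "libs/other.jar"]))
```

Listing the expected artifacts explicitly also forces you to write down what each build step is supposed to produce, which is half the battle.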
Cause 4 – asynchronous issues
I recently wrote a test that reliably failed every other time it was run. The problem? Part of the test relied on an unzipping process that was a bit slow – so by the time the assertions were run, the unzipping hadn’t finished yet and the input files weren’t present. I ran a cleanup process to delete the files, but the files weren’t present by the time that was run either – so they hung around for the next test run. Some questions you can ask yourself:
– Is there any part of this process that will take longer than other parts?
– Could the capability of my operating system / hardware be slowed by some other process?
– If my tests rely on an external connection, is the network down?
– Is a remote server down?
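For the slow-unzip kind of problem, one option is to wait explicitly for the files, with a timeout, instead of assuming they are already there. Here is a minimal polling sketch; wait_for_files and its default timings are my own assumptions, not something from the test I described.

```python
import tempfile
import threading
import time
from pathlib import Path


def wait_for_files(paths: list[Path], timeout_seconds: float = 5.0,
                   poll_interval: float = 0.05) -> bool:
    """Poll until every path exists; return False if the timeout passes first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if all(path.exists() for path in paths):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "unzipped.txt"
        # Simulate a slow background process that creates the file a bit later.
        slow_writer = threading.Timer(0.2, lambda: target.write_text("done"))
        slow_writer.start()
        print(wait_for_files([target]))
        slow_writer.join()
```

Note that a file existing does not guarantee it has been fully written, so if you control the slow process, waiting for it to finish (or for a marker file it writes last) is more robust. The same wait belongs before any cleanup step, otherwise the cleanup can run too early as well.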
And when none of these questions leads you anywhere, I have also found copious amounts of swearing usually do the trick…as long as I leave it for a few hours and come back to it.
*Extending the definition of sheepish to mean “something which causes me to feel sheepish” as “sheepifying” just feels so wrong.