Recently I have been lucky enough to talk to several people who are new to the software development field and to do some mentoring. And first off, I’m the lucky one here, as there are few things better than meeting people who are new to this industry and getting to engage with their ideas. If it isn’t something you do regularly, you should start.
But one thing that has become very apparent to me is just how little time is spent actually teaching how to debug. I saw this when I was teaching: there’s a tendency by many in academia to show students how to code and, when they run into errors, show them how to fix them. At its core that sounds like “Yes, Kevin, that’s what teachers do…” but I would argue it is a fundamentally flawed approach. Reading error messages and fixing things that are broken is a pretty large part of being a developer, and by giving junior developers the answer, we are doing the proverbial “giving them a fish, rather than teaching them to fish.”
To that end, I wanted to at least start the conversation on a mindset for debugging, and how to figure out what to do when you encounter an error. Obviously I can’t cover everything, but I wanted to give some key tips on how to approach debugging when you have an error message.
Honestly, debugging is a lot like a police procedural, and it’s a good way to remember the steps, so hang with me through the metaphor.
Tip #1 – Start at the Scene of the Crime – The Error Message
Let’s be honest: I know this sounds basic, but you would be surprised how often even senior devs skip this step. Take the time to stop and really read the error message. What I mean by that is asking the following:
- What does the error message tell you?
- Can you find where the error is occurring?
- Is there a stack trace?
- What component or microservice is throwing the error?
- What is the error type?
Looking at an error message is not just reading the words of the error. There are usually other clues that can help you solve the mystery: things such as the exception type, or a stack trace that points to the exact line of code, are going to be critical.
Honestly, most people just read the words and then start making assumptions about where an error occurred. And this can be dangerous right out of the gate.
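To make that concrete, here is a minimal Python sketch of pulling the clues out of an error instead of just reading its message. The `parse_quantity` function is a made-up stand-in for real application code, and `describe_error` is a hypothetical helper of my own, not part of any library.

```python
import traceback


def describe_error(exc: BaseException) -> dict:
    """Collect the clues an exception carries beyond its message text."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None  # innermost frame: where it was raised
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "function": last.name if last else None,
        "line": last.lineno if last else None,
    }


def parse_quantity(raw: str) -> int:
    # Hypothetical function standing in for real application code.
    return int(raw)


try:
    parse_quantity("12 apples")
except ValueError as exc:
    clues = describe_error(exc)
    print(f"{clues['type']} in {clues['function']}() at line {clues['line']}")
    print(clues["message"])
```

The point is that the error type, the function, and the line number answer most of the questions in the list above before you form a single theory.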
Tip #2 – Look for Witnesses – Digging through logs
Now, in my experience an error message is only one piece of the puzzle, so the next step is to look for more information. If you think about a police procedural on TV, they start at the crime scene, but what do they do next? They talk to witnesses!
Now, in terms of debugging we have the added benefit of being able to refer to logs. Most applications have some form of logging, even if it’s just outputting messages to a console window, and that information can be very valuable in determining an error message’s meaning.
Start looking for logs that were captured around the same time, specifically looking for:
- What was occurring right before the error?
- What data was being moved through the solution?
- What was the request volume that the system was handling?
- Were there any other errors around the same time?
Any information you can find in the logs is critical to identifying and fixing the issue.
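As a rough illustration, here is a small Python sketch of pulling the “witnesses” out of a log: every entry within a time window of the error. The log lines and their format (ISO timestamp, level, message) are invented for this example, so adjust the parsing to whatever your logs actually look like.

```python
from datetime import datetime, timedelta

# Made-up log lines in an assumed "timestamp level message" format.
LOG_LINES = [
    "2024-01-15T10:01:10 INFO Health check ok",
    "2024-01-15T10:04:58 INFO Received order 4411 with 3 items",
    "2024-01-15T10:05:00 WARN Inventory service slow (2100 ms)",
    "2024-01-15T10:05:01 ERROR Timeout while reserving stock",
    "2024-01-15T10:05:02 ERROR Order 4411 failed",
    "2024-01-15T10:09:30 INFO Health check ok",
]


def parse_line(line: str):
    """Split one log line into (timestamp, level, message)."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), level, message


def witnesses(lines, error_time: datetime, window: timedelta):
    """Return the parsed entries logged within `window` of the error."""
    nearby = []
    for line in lines:
        stamp, level, message = parse_line(line)
        if abs(stamp - error_time) <= window:
            nearby.append((stamp, level, message))
    return nearby


error_time = datetime.fromisoformat("2024-01-15T10:05:02")
nearby = witnesses(LOG_LINES, error_time, timedelta(seconds=30))
for stamp, level, message in nearby:
    print(stamp.time(), level, message)
```

In this made-up case the window shows a slow dependency and an earlier timeout right before the failure, which is a very different story than the final error message tells on its own.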
Tip #3 – Deal only in facts
Now this next one is absolutely critical, and all too commonly overlooked. Many developers will start making assumptions at this point, immediately announce “I know what it is,” and start changing things. Resist this urge, no matter what.
Now, I’m not going to lie, some errors are easy, and with a little bit of searching the cause becomes obvious and you can address it. If you are 100% sure, go ahead. But from the TV procedural perspective, this is the difference between the rookie and the veteran. If you are new to this field, resist the urge to jump to an answer and only deal in facts.
What I mean by this is don’t let jumping to conclusions cloud the story you are building of what occurred and why.
Tip #4 – Keep a running log of findings and things you tried
This is something I started doing, and it pays dividends. Just like the cops in a police procedural make a case file as soon as they capture their original findings, you should too. Keep a running document, either in Word or, as I do, in OneNote. I copy all of my findings into that document:
- Error Messages
- Relevant Logs
- Configuration Information
- Dates / times of the errors occurring
- Links to documentation
I add anything I find, and I keep appending new information to the document as I find it.
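If you would rather keep the case file next to your code than in a notes app, a tiny helper can do the appending for you. This is only a sketch of one way to do it: `record_finding` and the file name are hypothetical, and the demo writes to a temporary folder so it leaves nothing behind.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def record_finding(case_file: Path, kind: str, detail: str,
                   when: Optional[datetime] = None) -> str:
    """Append one timestamped finding to the case file and return the entry."""
    when = when or datetime.now()
    entry = f"[{when:%Y-%m-%d %H:%M}] {kind}: {detail}\n"
    with case_file.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


# Demo in a temporary directory; a real case file would live somewhere permanent.
with tempfile.TemporaryDirectory() as folder:
    case_file = Path(folder) / "case-file.txt"
    record_finding(case_file, "ERROR", "TimeoutError in reserve_stock()")
    record_finding(case_file, "TRIED", "Raised timeout to 10s, still failing")
    case_notes = case_file.read_text(encoding="utf-8")

print(case_notes)
```

Because entries are only ever appended, the file keeps the order in which you learned things, which matters when you later try to explain the investigation to someone else.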
Tip #5 – Look for changes
The other key piece of evidence most people overlook is the obvious question of “What changed?” Code is static and does not degrade on its own over time. If it was working before and isn’t anymore, something changed. Look for what might have changed in the solution:
- Was code updated?
- Were packages or libraries updated?
- Was a dependency updated?
- Was there a hardware change?
All of this is valuable evidence for finding the cause.
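For packages and dependencies, one simple way to answer “What changed?” is to compare a snapshot of versions from the last known-good deployment with the current one. The sketch below assumes you already have both snapshots as dictionaries; the package names and versions are invented for illustration.

```python
def diff_versions(before: dict, after: dict) -> dict:
    """Return {package: (old_version, new_version)} for everything that differs."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)  # None means "not installed"
    return changes


# Hypothetical snapshots: last known-good deployment vs. the failing one.
last_good = {"http-client": "2.3.0", "json-tools": "1.4.1", "orm-lib": "5.0.2"}
current = {"http-client": "3.0.0", "json-tools": "1.4.1", "orm-lib": "5.0.2",
           "retry-helper": "0.9.0"}

changes = diff_versions(last_good, current)
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Here a major version bump and a brand-new package stand out immediately, and both go into the case file as suspects to verify rather than as conclusions.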
Tip #6 – Check documentation
A good next step is to check any documentation, and what I mean by this is look to any reference material that could explain to you how the code is supposed to work. This can include the following:
- Documentation on libraries and packages
- ReadMe / GitHub issues / System Docs
- Code Comments
Any of these can help you better understand how the code is supposed to behave, so you can compare that against what it is actually doing.
Tip #7 – Trust Nothing – Especially your own code
At this stage, again people like to make assumptions, and I can’t tell you the number of times I have done this personally, but you stare at code and say it doesn’t make sense. I know X, and Y, and Z are correct, so why is it failing? Only to find out one of your assumptions about X, Y, or Z was false. You need to throw all assumptions out the window and if necessary go and manually verify everything you can. This will help you identify the underlying problem in the end.
Also at this stage I see the other common mistake. Keep your ego out of debugging. Many developers will look at the code they’ve built and they trust it because they built it. But this bias is usually the most damaging to your investigation.
Similar to the running joke of “The husband always did it…” I recommend adopting the philosophy of “Guilty until proven innocent” when it comes to any code you write. Assume that something in your code is broken, and until you can prove it isn’t, don’t start looking elsewhere. This will help in the long run.
Let me give an example. Let’s say I am building code that hits an API. I write my code, it looks good to me, I run it, and I get back a 404 Not Found error. I’ve all too often seen devs who would then ping the API team to see if their service is down, or networking to see if something is blocking the traffic, all before even looking to see “Did I get the endpoint right?”
Doing this makes you look foolish, and wastes people’s time. It’s better to verify that your code is working properly, and then that will empower you to have that conversation with networking as:
You: “I think it’s a networking issue.”
Network Engineer: “Why do you think that?”
You: “I’ve done the following to rule out anything else…so I think it could be ________________”
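Here is a small Python sketch of what “Did I get the endpoint right?” can look like in practice: print the URL your code actually builds and compare it to the documented path before pinging anyone. The base URL and path are hypothetical. The `urljoin` behavior shown is real, though, and it is a classic source of 404s: without a trailing slash on the base, the last path segment gets replaced.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical values, as if copied from the API's documentation and my config.
DOCUMENTED_PATH = "/v1/orders"
base_url = "https://api.example.com/v1"  # note: no trailing slash


def endpoint_matches_docs(url: str, documented_path: str) -> bool:
    """Check that the URL we are about to call uses the documented path."""
    return urlsplit(url).path.rstrip("/") == documented_path.rstrip("/")


built = urljoin(base_url, "orders")
print(built)  # https://api.example.com/orders  ("v1" was silently dropped)
print(endpoint_matches_docs(built, DOCUMENTED_PATH))  # False

fixed = urljoin(base_url + "/", "orders")
print(fixed)  # https://api.example.com/v1/orders
print(endpoint_matches_docs(fixed, DOCUMENTED_PATH))  # True
```

A check like this takes a minute, and it is exactly the kind of evidence that lets you fill in the blank in the conversation above.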
Tip #8 – Try to reproduce in isolation / Don’t Make it a hatchet job!
If you get stuck at this point, a good trick I find is to try to reproduce the error in isolation. Especially in a microservice architecture, there can be a lot of moving parts, so it can be helpful to recreate the error away from the existing code base by isolating components. This makes it easier to gather evidence, and not unlike a police procedural where they re-enact the events of a theory, it can be a great way to pin down a problem.
The one thing to try really hard to avoid is taking a hatchet to the code. All too many times I’ve seen people fall into this pattern to solve a problem:
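As a sketch of what isolation can look like, the Python below calls one suspect function directly with the input captured from the logs, and swaps its external dependency for a stub so that dependency can’t be the variable. Everything here (`order_total`, the tax lookup, the payload) is hypothetical.

```python
def order_total(items, tax_rate_for):
    """Hypothetical code under investigation: sums the items and applies tax."""
    subtotal = sum(item["price"] * item["qty"] for item in items)
    region = items[0]["region"]  # assumes at least one item
    return round(subtotal * (1 + tax_rate_for(region)), 2)


def fixed_tax_rate(region):
    # Stub replacing the real tax service, so the service is ruled out.
    return 0.20


# Hypothetical payload copied from the logs of the failing request.
captured_items = []

try:
    order_total(captured_items, fixed_tax_rate)
    reproduced = False
except IndexError as exc:
    reproduced = True
    print("Reproduced without the tax service:", repr(exc))
```

If the error still shows up with the stub in place, the other service is off the suspect list and the problem is in this function’s handling of the input. If it doesn’t, that is evidence too, and it goes in the case file.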
- I’m going to try this…
- Run Code
- Still Broken…
- Change this…
- Run Code
- Still Broken…
You are actually making your life harder by not being methodical. Now, I’m not saying don’t try things, but try to be more deliberate, and make sure you take time to log your thoughts and attempts in your running log. This can be critical to keeping things logical and methodical and not spinning your wheels.
Tip #9 – When you find the answer, write it down
When you finally find the answer, there is this tendency to celebrate, push that commit, cut that PR, and be done. But really you’re not doing yourself any favors if you stop there. I find it helpful to take the time to answer the following:
- Do I fully understand why this occurred?
- Can I document and explain this?
- Am I convinced this is the best fix for this problem?
Really you want to make sure you have a full understanding and complete your running log by documenting the findings so that you can refer to them in the future.
Tip #10 – Make it easier and test in the future
The other thing that is largely overlooked and skipped due to the “Fix Celebration” is the debrief on the issue. All too often we stop and assume that we are done because we made the fix. But really we should be asking the following:
- Is there an automated way I can test for this bug?
- How will I monitor to make sure my fix worked?
- Does this hot fix require further work down the line?
- Does this fix introduce any technical debt?
- What can I do to make this type of error easier to debug in the future?
- What parts of the debug and testing cycle made it hard to identify this error?
- What could I have done differently to make this go faster?
- What does this experience teach me?
These kinds of questions are critical to ongoing success in your software development career and the health of your project longer term.
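For the first question on that list, the usual answer is a regression test that pins the exact input that triggered the bug. Here is a minimal Python sketch written as plain test functions (the style a runner such as pytest collects). The `average_latency` function and the bug it had are made up for illustration.

```python
def average_latency(samples):
    """Hypothetical fixed function: it used to divide by zero on no samples."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


def test_average_latency_with_no_samples():
    # Regression test: the exact input from the original bug report.
    assert average_latency([]) == 0.0


def test_average_latency_still_averages():
    # Guard against the fix breaking the normal case.
    assert average_latency([100, 200, 300]) == 200.0


if __name__ == "__main__":
    test_average_latency_with_no_samples()
    test_average_latency_still_averages()
    print("regression tests passed")
```

If someone later “simplifies” the function and removes the empty check, the first test fails right away, and nobody has to rediscover this bug from an error message.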
I hope you found these 10 tips helpful!