Rami Hikmat

How to Approach Debugging Bugs

Debugging can be cumbersome at times. Here is a quick list of steps to follow when solving bugs.

More often than not, you find yourself debugging. Most bugs can be fixed by locating the responsible code, debugging it, and writing the fix. This article covers how to approach debugging a bug and fixing it. It doesn’t cover intermittent bugs. Those are more involved and can be caused by any external factor: a solar flare could cause an intermittent bug that can’t be reproduced until the next solar flare, or ever.

First, let’s establish the terminology. A bug is a logic mistake, i.e. a set of inputs produces the wrong outputs. A bug is usually found by someone QAing a feature, or by a user who expects something else to happen when interacting with the software. Either way, the bug gets reported and eventually lands with you, the engineer, to handle.

The first tip when working on a bug is to read the description carefully and know what the expected output is. This solidifies in your mind what the bug is and what the fix should be. It also helps you tell it apart from other bugs you may find along the way, so you don’t end up fixing the wrong bug and having the stakeholder come back to tell you it isn’t fixed while you’re under the impression that it is.

After understanding what the bug is and what the correct logic should be, try reproducing the bug. This is crucial, and many people forget to do it. Bugs are generally not handled as soon as they are reported, and months may pass before someone picks one up. In the meantime, the bug may already have been fixed by code refactoring, other bug fixes, dependency changes, configuration changes, and so on. If the bug is not reproducible after multiple tries, communicate that to the stakeholder and share the steps you followed so they can confirm them. Some bugs are only reproducible with the exact input used by the person who discovered the bug.
For that reason, consider asking for the exact inputs, where possible, in every bug report. They can reveal edge cases in your code. If caching is involved where the bug occurs, try disabling the relevant caching and see if the bug still happens. Caching often causes fake bugs because cache entries may live for a long time while holding stale state.

Once you have reproduced the bug, try locating the code responsible. This is not always easy: you may know the general area of the code but not the exact location of the bug. This is where adding logs helps, and the more logs the better. A log entry should contain a unique name for the place in the code, the input data, the current state, and a timestamp. The timestamp is useful for seeing whether the bug may be caused by something taking a long time. Now trace the logs while running the software and reproducing the bug. Comparing the logged state to the expected state will often point you to the fix.

Logs can tell you more as well. Some bugs happen due to race conditions. A race condition occurs when two or more competing code paths update the same memory at around the same time, so the updates may be applied in the wrong order, and that wrong order is the bug. The best way to reduce race conditions is to reduce the use of global or shared resources.

What if the bug is not in code you own, i.e. it is in an external dependency’s code? This is where the complexity goes through the roof. The bug may have been fixed in a later version of the dependency, but the upgrade may be expensive and may require testing every code path that uses the dependency. The best approach here is to let other engineers know about the bug and ask when an upgrade is due. If an upgrade is not easy to do, consider using a different dependency or, better, adding a special condition that applies your own fix. This is a chance to introduce technical debt, so be mindful of that.
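
Returning to the logging advice above, here is a minimal sketch of a log entry that carries a unique name, the input data, the current state, and a timestamp. The helper `log_checkpoint`, the logger name `bug-hunt`, and the function `apply_discount` are all hypothetical names made up for illustration. Only Python's standard `logging`, `json`, and `time` modules are assumed.

```python
import json
import logging
import time


def log_checkpoint(logger, name, inputs, state):
    """Log one debugging checkpoint as a single JSON line."""
    entry = {
        "checkpoint": name,        # unique name for this place in the code
        "inputs": inputs,          # the data this code received
        "state": state,            # the relevant state at this point
        "timestamp": time.time(),  # seconds since the epoch, to spot slow steps
    }
    logger.debug(json.dumps(entry, sort_keys=True, default=str))
    return entry


def make_debug_logger(stream):
    """Build a logger (hypothetical name) that writes DEBUG lines to `stream`."""
    logger = logging.getLogger("bug-hunt")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False
    return logger


def apply_discount(price, percent, logger):
    """Hypothetical function under investigation, with checkpoints added."""
    inputs = {"price": price, "percent": percent}
    log_checkpoint(logger, "apply_discount:start", inputs, {})
    discounted = price - price * percent / 100
    log_checkpoint(logger, "apply_discount:end", inputs, {"discounted": discounted})
    return discounted
```

Calling `apply_discount(200, 10, make_debug_logger(sys.stderr))` (after `import sys`) would write two JSON lines, one per checkpoint. Comparing the `state` in each line against what you expected at that point, and the gap between timestamps, is one way to narrow down where things go wrong.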

If your software can use breakpoints for debugging, even better: add breakpoints near the suspect code blocks and inspect the state. After locating the code, learn a bit more about its context. See how the code is used and what conditions must hold to reach the execution path that causes the bug. This helps you avoid introducing another bug while fixing the current one, though that still sometimes happens.

If the bug is a UI bug, make sure the interface looks as expected in multiple cases, not just the one described in the bug report, and make sure it is fixed in other views as well. Usually, though, the bug is a logic error. Check whether the code responsible is generic. If it is, fix it in a generic way when that is not too hard. If it is too hard, make the fix specific to the instance that causes the bug. This reduces the blast radius so the fix does not affect other code.

If the code you’re fixing is covered by tests, consider creating a new unit test specifically for the bug fix. Note in the test that it exists for a bug fix, so other engineers know this when reading the test code. If there are no unit tests covering the code where the bug is and you have time to improve the code, consider refactoring it to support unit testing and then adding unit tests.

Now suppose the bug is fixed. You know what to do: submit your code, get it reviewed, ask a QA person to QA it if possible, and push it. Verify that it is fixed in production, then notify the stakeholder and let them check it themselves to confirm.

Solving bugs isn’t always pretty. Bug fixes often introduce technical debt. If the fix took some time and you learned something new from it, consider writing it up and sharing it so that others can easily find it.
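
The advice above about writing a unit test specifically for a bug fix could look something like the following sketch. The function `paginate` and the bug it describes are hypothetical, invented only to show the shape of such a test. It uses Python's standard `unittest` module.

```python
import unittest


def paginate(items, page, page_size):
    """Hypothetical function after the fix: return the items on a 1-based page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return items[start:start + page_size]


class PaginateRegressionTest(unittest.TestCase):
    def test_bugfix_last_partial_page_is_returned(self):
        # Regression test for a (hypothetical) reported bug: the last page came
        # back empty when the item count was not a multiple of page_size.
        # The input mirrors the one from the bug report.
        items = ["a", "b", "c", "d", "e"]
        self.assertEqual(paginate(items, page=3, page_size=2), ["e"])

    def test_page_past_the_end_is_empty(self):
        # Neighbouring case, to check the fix did not change other behaviour.
        self.assertEqual(paginate(["a", "b", "c"], page=3, page_size=2), [])
```

Saved in a test module, this can be run with `python -m unittest`. The comment naming the bug is the important part: it tells the next engineer why the test exists and what input triggered the original report.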

To reduce the number of bugs that reach production, consider adding unit tests to new code and to code you’re refactoring. Unit testing is crucial because a machine runs the tests constantly, instead of a person testing a piece of code manually. Make sure to always test behaviour and not the implementation.

Generally, the more bugs you solve, the better you will know how to fix them the next time around, and the more you will develop your own methods of bug fixing. The method outlined here works well for me, but it will keep evolving as I fix more bugs in new domains.