Crash bugs are generally spectacular; your application just—poof—vanishes! Or it shows error messages from the OS in foreign languages or with some indecipherable codes. Or maybe it generates 17 entries in the Windows application log. In the worst cases, it might even cause a blue screen and a spontaneous reboot.
But usually, crash bugs are the easiest to fix. Just load the offending application or script in a debugger, run it, repeat the steps required to make it go boom, and you are right at the spot where it happens. Moreover, inspecting the code at the crash location—including the variables at the time of the crash—often leads to some revelations:
- “Oh, I should check the parameter for not being NULL.”
- “Why did I not initialize this variable?”
- “Must have been 5:00 pm when I wrote this.”
Then you fix whatever was wrong and go on your merry way. Sometimes, however, the scenario does not work out that way. Let’s examine some simplified sample code:
734 Start-Process ./ExportImportantData.exe ./Outputfile.csv
735 $Data = Import-Csv -Path ./Outputfile.csv
736 Do-Something $Data
I know that the eagle-eyed among you have already spotted the problem; remember, this is only an example. On most machines, but not all, this will blow up. The error message likely points to $Data being empty or null, so you set a breakpoint at line 734 and start the debugger. Then you single-step and verify that the process started and produced the output file. You single-step again and verify that the CSV has loaded correctly; still no crash.
Now you remove the breakpoint and just run the script in the debugger, hoping it will crash and that the debugger will show you where. No such luck; it all works fine, with no crash. You then run the script outside the debugger on your machine, and it works five times out of ten; the other five times, it crashes. So what is happening here? It’s a timing issue.
Timing
Line 734 starts a process and does not wait for it to finish. Sometimes the export completes on your fast developer machine before the next line is executed; sometimes it does not. On your server, which is under load from many other processes, it never completes in time.
By setting a breakpoint in the debugger, you give the process time to finish before the next statement is allowed to execute. Even without a breakpoint, a script running in a debugger executes a lot slower than it would under normal circumstances, so if the data export is quick enough, the script will not fail.
The lesson here is that a debugger inevitably changes the timing of your code: everything is slowed down. Also, your computer may be much faster or much slower than the machine where the problem was first observed. Keeping both in mind makes timing-related problems easier to find.
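One way to remove the race in line 734 is to make the script wait for the exporter before touching its output. The following is a minimal sketch, not the only possible fix: `Start-Process` with `-Wait` blocks until the child process has exited, and `-PassThru` returns the process object so its exit code can be read. The function name `Invoke-Exporter` and its parameter names are made up for this illustration.

```powershell
# Hypothetical helper: runs an external exporter and blocks until it has exited.
function Invoke-Exporter {
    param(
        [Parameter(Mandatory)] [string]   $FilePath,      # e.g. ./ExportImportantData.exe
        [Parameter(Mandatory)] [string[]] $ArgumentList   # e.g. ./Outputfile.csv
    )

    # -Wait:     do not return until the child process has finished.
    # -PassThru: return the process object so the exit code can be read.
    $process = Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -Wait -PassThru

    # Hand the exit code back to the caller.
    return $process.ExitCode
}
```

With that in place, line 734 could become `$ExitCode = Invoke-Exporter -FilePath ./ExportImportantData.exe -ArgumentList ./Outputfile.csv`, and `Import-Csv` on line 735 would only run once the exporter is done. The same effect is available inline by adding `-Wait` to the original `Start-Process` call.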
Defensive Coding
The best defense against these types of problems is to write defensive code:
- If you run another thread or process, always check for a defined state before proceeding.
- Check if your process or thread creation was successful.
- If you load or generate data objects, always confirm that the operation succeeded.
- Never trust parameters; many developers fail to check parameters for out-of-bounds values. You may be the only one using that function today, and you know what to pass to it, but what about the next guy?
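The data-loading and parameter checks from the list above might look like the following in PowerShell. This is a sketch under stated assumptions, not a complete recipe: `Import-ExportedData`, `$MinimumRows`, and the upper limit of 1,000,000 are invented for illustration, and it assumes that an export with no rows should count as a failure.

```powershell
# Hypothetical helper: load exporter output and fail loudly if anything is off.
function Import-ExportedData {
    [CmdletBinding()]
    param(
        # Never trust parameters: reject null or empty paths up front.
        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [string] $Path,

        # Out-of-bounds values are rejected before the function body runs.
        # The bounds here are illustrative assumptions.
        [ValidateRange(1, 1000000)]
        [int] $MinimumRows = 1
    )

    # Check for a defined state: the file must exist before it is read.
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "Export file '$Path' was not found."
    }

    # Confirm the load succeeded; @() makes a single row count as 1.
    $rows = @(Import-Csv -LiteralPath $Path -ErrorAction Stop)
    if ($rows.Count -lt $MinimumRows) {
        throw "Export file '$Path' has $($rows.Count) row(s); expected at least $MinimumRows."
    }

    return $rows
}
```

Line 735 could then read `$Data = Import-ExportedData -Path ./Outputfile.csv`, so a missing or empty export stops the script with a clear message instead of surfacing later as a null `$Data`.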
Feedback
Comments? Stories of hard-to-find bugs? Suggestions for more topics? Use the comments section!