Sunday, February 14, 2016

Debug Your Code Like Sherlock Holmes


As a child I was an advanced reader.  A lack of athletic interest and a kindergarten teacher interested in early reading development contributed to what turned into a rather odd situation.  The small-town school I attended was K-12, resulting in a very limited, combined library which was segmented into "elementary reading" and "upper reading."  In elementary school, during our library trips, I found myself begging for permission to visit the "High School" side to find books that would challenge me a bit more than Dr. Seuss and "Choose Your Own Adventure."  I never quite understood why the dowdy, grumpy librarian who ruled that universe was so reluctant to give me access, but I was relentless and swayed her in the end.

It was on one of my forays into these wonderfully mysterious shelves of the library, stuffed with thick, musty smelling volumes bound in cardboard and canvas, that I stumbled upon Arthur Conan Doyle's Sherlock Holmes.  I knew about Holmes already, of course, and had even read a few children's books based on some of the stories.  I'd probably even seen a movie or two, but I hadn't read the originals.  I tumbled into these books like Holmes and Moriarty going over Reichenbach Falls and never quite crawled back out of them.  They had a profound effect on my view of the world and the way I interacted with it.

Probably the most influential concept I found in Sherlock Holmes' worldview was the simple Victorian belief that anything in the universe could be understood given a proper application of the human mind and senses.  If you couldn't understand something, it was because you were approaching it from the wrong angle or you didn't have enough information.  This belief, simultaneously deterministic and optimistic, meant that I could achieve anything, solve any mystery, if I just focused hard enough and applied the right methods and didn't give up.

This rather lengthy preface is a lead-in to what is, for me, only a recent revelation:  I probably owe Arthur Conan Doyle quite a bit of credit for preparing me for the world of software development, specifically for the arduous task of tracking down bugs in what is often someone else's code.

While I'm completely inadequate for the task of summarizing the skills that Sherlock Holmes can bring to bear on a case, I have managed to identify four main things you can do to help boost your own code-oriented investigations:
  • Work backwards from the scene of the crime
  • Use a magnifying glass
  • Separate the wheat from the chaff
  • Bring in a sidekick
Each of these deserves its own explanation and a quote from the legendary sleuth himself, so let's start with the first.

Backwards from the Scene of the Crime

In solving a problem of this sort, the grand thing is to be able to reason backwards. That is a very useful accomplishment, and a very easy one, but people do not practice it much. In the everyday affairs of life, it is more useful to reason forwards, and so the other comes to be neglected. There are fifty who can reason synthetically for one who can reason analytically.
Sherlock Holmes - A Study in Scarlet

As Holmes states, it's a rare ability to work backwards from the end result of something.  This is never more evident than when you're staring at the typical stacktrace a crashed application vomits onto your console.  Most coders, when faced with such a mess, reach immediately for the debugger and try to re-run everything in an attempt to get it to give them the answers.  It's worth taking a moment first, however, to think about what could have led to the situation.  It just might save you significant time and trouble.

The ability to "think like" the compiler or the interpreter is an invaluable skill that for the most part can only be acquired through practice.  I picked up this ability by naively stumbling into software development as a tinkerer in the mid-90s, when IDEs were expensive and rare.  Plus, since I was building web sites, Perl in a Unix environment was the language of choice, and there were few dev tools for that combination.  It was just me, my text editor, and the Perl runtime binary.

With the web still in its infancy, and having no money for books, I essentially learned Perl from the "man" pages found in Unix and a few poorly written example scripts I found on FTP sites.  This was an extremely frustrating way to learn a programming language, and consisted of me writing a few lines of code, attempting to run it with the Perl runtime binary, and examining the output of the syntax checker to see what I'd done wrong.  Slowly and painfully, I began to understand how things worked, what was expected by the compiler, what would actually execute.

Going through this pain helped me to learn to trace through my code piece by piece, mutation by mutation, always keeping in mind what the compiler was doing in response to my code.  Nowadays, this is rarely necessary, as the IDE will immediately show you any syntax error, and there are probably a dozen analyzers you could use that will tell you why that "for" loop is better off as a LINQ query or why the FramJib library's flibbertyjibbet() call is deprecated.

This is why I believe pausing to mentally trace your code's execution is more important than ever for gaining the ability to troubleshoot and debug.  If we don't disengage from our development tools' assistance every once in a while, our minds go soft.  It's like going for a run instead of a drive to keep your muscles from going flabby.  Those analyzers will only go so far, and if that isn't far enough to solve your bug, you are on your own, buddy.

Once you've got your debugging muscle flexed and ready to go, how do you proceed?  Start from the scene of the crime:  Find the line of code that experienced the error and work your way backwards.  What went wrong?  Is it a simple memory overflow or null pointer exception?  Once you have that in mind, try to imagine what sequence of events could have led to the error.  If it's not immediately obvious, you may need to invoke the next step of Holmesian debugging....
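To make "start from the scene of the crime" a little more concrete, here is a minimal sketch in Python (my choice for illustration; nothing above depends on it).  The three functions and the form dictionary are made up for the example.  The only real API involved is the standard library's traceback module, which I use to list the frames of a crash starting with the innermost one, the line that actually failed, and then stepping outward through its callers.

```python
import traceback


def parse_quantity(raw):
    # Hypothetical helper: this is the "scene of the crime" when raw is None.
    return int(raw)


def build_order(form):
    # Hypothetical caller: passes along whatever the form contained.
    return {"quantity": parse_quantity(form.get("quantity"))}


def handle_request(form):
    # Hypothetical entry point, standing in for a web request handler.
    return build_order(form)


def frames_innermost_first(exc):
    """Return (function name, source line) pairs, starting where the error was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    return [(frame.name, frame.line) for frame in reversed(frames)]


try:
    handle_request({})  # no "quantity" key, so int(None) raises TypeError
except TypeError as exc:
    trail = frames_innermost_first(exc)
    for name, line in trail:
        print(f"{name}: {line}")
```

Read top to bottom, the output walks from the failing line back toward the request that started it all, which is the order in which you want to ask "how could this value have ended up here?"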

Where's My Magnifying Glass?

"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay."
Sherlock Holmes - The Adventure of the Copper Beeches

If simply viewing the scene of the crime itself does not reveal any immediate leads, you may need to look more closely at the surrounding area.  For Sherlock Holmes, this meant whipping out the old magnifying glass and gathering clues, poring over every inch of the crime scene.  In programming, this often means using the debugger of your IDE to check the values of the various variables and object properties that were in effect at the time the error occurred.  Unfortunately, if we are avoiding the crutch of an IDE debugger, or if we are troubleshooting a production-level runtime error where IDEs cannot be used, we may not have that luxury.  In this instance, we may need to fall back on more primitive, but still effective, techniques.

In the days of Perl CGI programming, the "print" statement was a common debugging technique.  You would print the value of whatever variable you were interested in to the "console" and it would appear in the output of your program.  Et voila, poor man's debugger.  Nowadays, runtime logging has been raised to an art form, and there is almost certainly a logging library or ten that you can implement in your program to get realtime logging output.  While this takes time and careful implementation, the benefits of being able to "turn on the firehose" of data in a runtime environment when you want it, and turn it off when you don't, can prove invaluable in ways that an IDE debugger just can't.
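As one possible illustration of a firehose you can turn on and off, here is a small sketch using Python's standard logging module.  The environment variable name APP_LOG_LEVEL and the logger name "orders" are my own inventions for the example; any logging library with configurable levels would let you do something similar.

```python
import logging
import os

# Level names this sketch accepts; anything else falls back to WARNING.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(env=os.environ):
    """Set the log level from the (hypothetical) APP_LOG_LEVEL setting."""
    level_name = env.get("APP_LOG_LEVEL", "WARNING").upper()
    logger = logging.getLogger("orders")
    logger.setLevel(LEVELS.get(level_name, logging.WARNING))
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Firehose on: debug messages are emitted with a timestamp.
logger = configure_logging({"APP_LOG_LEVEL": "DEBUG"})
logger.debug("quantity=%r customer_id=%r", None, 42)
```

The debug calls stay in the code permanently.  With the setting absent they cost almost nothing, and when a bug shows up in production you can raise the level without touching the code itself.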

There are many sources for clues.  Some of these are offered by the environment you are working in and may go unnoticed.  For example, if you are tracking down a bug in a web application, how often have you gone to the actual web server's HTTP logs and analyzed the traffic going back and forth?  Tools like Wireshark and Fiddler can provide a dynamic view of this same information as it happens and more, but if they weren't running at the time of the exception, the web server logs can provide some possibly crucial insight.  Cross-referencing the times of the log entries there with the times of the information in your debug log can be very enlightening.

In a more complex situation, the server's main logs may also hold some nuggets of information:  the syslog file on Linux, or the Event Viewer on Windows.  Again, cross-referencing the times with the other data you have helps you put together a picture of what was going on at the time of the exception.
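Cross-referencing by time is easy to sketch once the log lines have been parsed into timestamps.  The entries below are made up, and I'm assuming each log has already been reduced to (timestamp, source, message) tuples; real logs would need their own parsing first.  The idea is simply to pull everything from every source that happened within a few seconds of the crash and read it as a single timeline.

```python
from datetime import datetime, timedelta


def entries_near(entries, moment, window_seconds=5):
    """Return the entries within the window around moment, oldest first."""
    window = timedelta(seconds=window_seconds)
    nearby = [entry for entry in entries if abs(entry[0] - moment) <= window]
    return sorted(nearby, key=lambda entry: entry[0])


# Made-up entries standing in for lines parsed from two different logs.
web_log = [
    (datetime(2016, 2, 14, 10, 15, 1), "web", "POST /orders 500"),
    (datetime(2016, 2, 14, 10, 2, 0), "web", "GET /orders 200"),
]
app_log = [
    (datetime(2016, 2, 14, 10, 14, 59), "app", "config reloaded"),
    (datetime(2016, 2, 14, 10, 15, 1), "app", "TypeError in build_order"),
]

crash_time = datetime(2016, 2, 14, 10, 15, 1)
timeline = entries_near(web_log + app_log, crash_time)
for stamp, source, message in timeline:
    print(stamp.isoformat(), source, message)
```

In this invented timeline, a configuration reload two seconds before the failed request is exactly the kind of detail that is invisible in either log on its own.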

Other sources of data include:  The sysadmins ("What was changed recently on the server?"), the user who experienced the error ("What had you done just before the error?  Any recent changes on your computer?"), and your source control system (you do use source control, right?) to find out what recently changed in your own code.

Perhaps in gathering all this data, you may identify something that looks out of place.  A missing value where one was expected.  A string that looks a little too long.  An object whose properties are not fully populated.  A line of code that was recently changed for what seems to be no good reason.  These are the "suspects" you can approach first.  If the clues fit into one of the potential scenarios you concocted while working backwards from the scene of the crime, you may have just found your culprit, or at least have narrowed down your search.

If instead you end up with a huge load of data and no clear leads, you may need to try another of Holmes's sleuthing techniques....

Separating the Wheat from the Chaff

"How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?"
Sherlock Holmes - The Sign of Four

I often annoy my co-workers when they present me with a coding issue and a potential cause for the issue and I say "That's impossible."  I'm not purposefully trying to irritate them when I say this (although I should try to kick the habit of just saying it outright).  Rather, I'm trying to state that unless I am mistaken in some very basic understanding of the execution environment (at a level which would probably require me to relearn my job from scratch) what they believe is happening is literally not possible.  I'll give an example:

Once one of my co-workers came to me with a very bothersome problem.  He was changing some JavaScript on a page and loading it in his web browser, but the effects he wanted from his code were not appearing on the page.  He had been changing code for several minutes, but was seeing no effect.

He showed me some of his code, which involved dynamically modifying the display of the web page based on the properties of an object using KnockoutJS (an excellent library for this kind of thing).

I could see a place where the value of the object's property named "Type" was to be displayed.  When we viewed the page, a value appeared.  Then I saw in the JavaScript that the object's property name was "DocumentType" and no "Type" property was visible.  I was immediately skeptical.

"The web page you are showing me could not have been created with this code," I said.  After recovering from his justified annoyance at my statement, my colleague took another look and found that, yes indeed, his web server was pointed at a different version of the files in question and none of his modifications were being used.  Once we fixed the configuration issue, he quickly corrected the code.

Alan Watts, a philosopher who gained popularity in the 70s, liked to say "Problems which remain persistently insoluble should always be suspected as questions asked in the wrong way."  I try to keep this in mind when I find myself hitting a brick wall when debugging something.  I go back to "first principles" and build from there:  

  • Is the event handler responding to the user's button click actually the one I think it is?
  • Is my browser executing my code and not a cached version?  
  • Is the web server running my code and not something else or a cached version of it?  
  • Am I connecting to the right database and not some other copy of it?  
I work my way through the stream of execution, from whatever the user is doing to start the chain of events (a button click, etc.) all the way to the rendering of the final result (assuming we got that far before the crash).
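Some of these first-principles questions can be answered mechanically.  As a rough sketch, here is one way to check whether the file a server hands back is really the file you have been editing, by comparing hashes.  The file name and contents are invented, and I'm assuming you have already fetched the served bytes by whatever means you prefer (a browser's "save as" or an HTTP client, for instance); the sketch only does the comparison.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_serving_my_file(local_path, served_bytes):
    """True when the bytes the server returned match the file on disk."""
    return digest(Path(local_path).read_bytes()) == digest(served_bytes)


# Demonstration with an invented script file and a stale "served" copy.
with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "page.js"
    script.write_bytes(b"show(doc.DocumentType);")
    stale_response = b"show(doc.Type);"
    if not is_serving_my_file(script, stale_response):
        print("The server is not running the code you are editing.")
```

A mismatch doesn't tell you why the server is handing out something else (a cache, a wrong directory, a failed deployment), but it does tell you to stop editing code and go look at the configuration.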

Often this exercise will lead to insight or I may just stumble over the culprit.  There's something to be said for plain old dumb luck, and believe me, I've been its beneficiary more times than I can count.

If, however, all this detective work still leaves you without a final answer, you can try one additional trick that Holmes employed constantly.  It's one which helped him far more than he might have wanted to admit....

Bring in a Sidekick

"Come, Watson, come!" he cried. "The game is afoot. Not a word! Into your clothes and come!"
Sherlock Holmes - The Adventure of the Abbey Grange

There is a reason that Sherlock craves Watson's participation in his adventures despite the pointed jabs he delivers regarding Watson's lack of deductive abilities.  As hard as it may be for an introvert like me to admit it, there is in my view nothing more helpful to the aspiring detective (or troubleshooter) than having someone to talk to.  Quite often, a member of our team will pull me aside to show me a problem they are struggling with and in the mere act of describing it, they discover the problem.  What's more, sharing the mystery with someone else brings in new perspective and fresh ideas.  Pair programming is in many ways a tacit confirmation that this approach is effective.  New ideas and angles can power you forward to a final answer.  If nothing else, at least you'll have affirmation that you're not crazy and this problem really is a hard one.  Mystery loves company. (Sorry, I couldn't resist)

The next time a particularly criminal bug gives you the slip, leaving only a stacktrace as its fingerprint, see if these techniques, espoused by the most famous detective of all time, help you track down the scoundrel.  And if they do, why not post a comment here to let us other Above Average Programmers know how it worked for you?  As Holmes himself said, "Nothing clears up a case so much as stating it to another person."
