Wednesday, June 13, 2007

Fault finding

Suddenly, out of the blue, I had a problem. With Windows XP. With Visual Studio 2005. Whatever, really. My system behaved badly. And it did so in a really strange fashion.

Here's what happened. I was putting the finishing touches on the Sloth v0.1 beta, and I was going to do a test run. No, not run the unit tests, just run Sloth and check on a tiny bug reported by an alpha tester. I verify that it builds error-free and then run it. Windows hangs. Not blue-screen hangs, not computer-dies hangs, it just sits there. I can move the mouse around, but no applications react. I can press Ctrl-Alt-Del and get the little Windows Security box, and here I can click buttons. However, they do nothing. Alt-Tab does nothing. Eventually, I bite the bullet and do the physical-reset thing. The computer comes back up, so without doing anything else, I load up Visual Studio and press F5. The same thing happens. This time the frustration hits me like a bag of wet onions, so I go apeshit on the keyboard for a while. This actually leads to something - a push of the "Log off" button after Ctrl-Alt-Del, followed by a never-ending series of Ctrl-Alt-Del presses, gets Visual Studio to the point where it complains about a cross-thread exception, which is bullshit - this happens in the startup piece of code that has always run perfectly, way before any additional threads are started, or even the first form has been loaded.

The third time around I figure out something else as well - on the next run (with or without debugging) after the initial one that hangs the system, everything works fine. I draw the conclusion that my "Log off" during the frantic fight to get control back has shut down some application that is creating hell for me. How can I be sure of that? Well, at this point, I can't, so I get out the old pen and paper and create a function table of when stuff works and when it doesn't.

This table takes a lot of things into consideration: Did I kill all the non-essential crap in the systray? Did I kill all the non-essential processes I can find in Task Manager? Did I rebuild the whole project before running it or not? Does it happen with other solutions in Visual Studio?

I did a pretty decent fault-finding job, the kind that would make a tech support manager (such as myself) proud, and in the end, after about 20 reboots, I had a table showing one single fact or correlation or whatever statistics people call it - the only time that Sloth would ALWAYS work on the first run attempt after a boot was if I first opened Task Manager and went Rambo on whatever I could find there. Ach so, the culprit is a piece of running software. A virus. Malware. Whatever. I felt uplifted by this; it meant that I could probably dodge the bullet and avoid a reinstall of Visual Studio, or even Windows XP. I just needed to find the bastard.
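For what it's worth, the pen-and-paper table boils down to something a few lines of code could do. What follows is only a minimal sketch in Python, not what I actually ran: the condition names, the trial data and the helper `always_works_when` are all made up for illustration. Each trial records which conditions held after a boot and whether the first run worked, and the helper picks out the conditions that always went with a working first run.

```python
from typing import Dict, List, Tuple

# One trial = (conditions that held after the boot, did the first run work?)
Trial = Tuple[Dict[str, bool], bool]


def always_works_when(trials: List[Trial]) -> List[str]:
    """Return conditions where every trial with the condition on succeeded
    and at least one trial with the condition off failed."""
    names = sorted({name for conditions, _ in trials for name in conditions})
    result = []
    for name in names:
        on = [ok for conditions, ok in trials if conditions.get(name, False)]
        off = [ok for conditions, ok in trials if not conditions.get(name, False)]
        if on and all(on) and off and not all(off):
            result.append(name)
    return result


# Hypothetical data, invented for this example (not my real 20 reboots).
trials: List[Trial] = [
    ({"killed_systray": True, "killed_processes": False, "full_rebuild": False}, False),
    ({"killed_systray": False, "killed_processes": True, "full_rebuild": False}, True),
    ({"killed_systray": False, "killed_processes": False, "full_rebuild": True}, False),
    ({"killed_systray": True, "killed_processes": True, "full_rebuild": True}, True),
    ({"killed_systray": False, "killed_processes": False, "full_rebuild": False}, False),
]

print(always_works_when(trials))  # prints ['killed_processes']
```

With this invented data, only killing processes in Task Manager lines up with every successful first run, which is the same kind of single correlation the paper table gave me.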

Up until this point, after each reboot I had been running a minimalist thingy - no fancy stuff loaded, except for Visual Studio, Firefox (for finding out what all those dubious .exe files running on startup were), regedit, and My Former Virus Killer. And yep, you are about to find out why I call it that. Suffice it to say that, right now, I am not naming the software because I cannot afford any multi-million dollar lawsuits so close to my vacation.

So I write down all the processes I routinely kill on each iteration, divide them into groups, and start off killing only one group at a time, when something jumps off the screen: isn't that firefox process awfully small, memory-wise? I love Firefox, I really do, and yes, it is a memory hog (yeah, ok, I run it all the time with a million tabs open, but still), but I have never seen a running Firefox using only 4MB of RAM...! And hey... there are two of them!
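The thing that jumped off the screen can be written down as a rule of thumb: the same process name showing up more than once, with one copy far smaller than its sibling. Here is a rough sketch of that check in Python. The snapshot is hand-typed, hypothetical data (apart from the 4MB firefox I actually saw), and both the helper `suspicious_duplicates` and its 10% cut-off are assumptions of mine, not anything Task Manager provides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def suspicious_duplicates(
    snapshot: List[Tuple[str, float]], small_ratio: float = 0.1
) -> Dict[str, List[float]]:
    """Flag process names that appear more than once where some copy uses
    less than small_ratio times the memory of the largest copy."""
    by_name: Dict[str, List[float]] = defaultdict(list)
    for name, mem_mb in snapshot:
        by_name[name.lower()].append(mem_mb)

    flagged: Dict[str, List[float]] = {}
    for name, sizes in by_name.items():
        if len(sizes) < 2:
            continue  # a single instance is not a duplicate
        largest = max(sizes)
        small = sorted(s for s in sizes if s < largest * small_ratio)
        if small:
            flagged[name] = small
    return flagged


# Hypothetical snapshot of (process name, memory in MB), typed in by hand.
snapshot = [
    ("firefox.exe", 312.0),
    ("firefox.exe", 4.0),
    ("devenv.exe", 180.0),
    ("explorer.exe", 22.0),
    ("svchost.exe", 5.0),
    ("svchost.exe", 18.0),
]

print(suspicious_duplicates(snapshot))  # prints {'firefox.exe': [4.0]}
```

The two svchost entries are left alone because they are of comparable size, while the tiny second firefox stands out. It is a heuristic for deciding where to look first, nothing more.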

Needless to say, the problem resolved itself from there. I couldn't actually find a second firefox.exe on my computer, but I guess processes can easily change their names. I did find someone else who had a similar problem, and part of the solution someone had recommended was Avira AntiVir (the free PersonalEdition). A quick install and update of this software, and a night of sleep later (all of this occurred in the hours before and after midnight), my system was once again relatively uninfected.

My former virus killer is now in the bit bucket for not detecting anything, even after bothering me with those really annoying update popups all the time, and AntiVir has the job until it somehow shows that it doesn't have the antivirus balls required.

How did I get infected? I don't know. I consider myself a relatively careful user, with firewalls enabled and running in nazi mode, using only Firefox and Thunderbird for web and mail, and scheduled tasks running scans of lots of stuff, but I guess one virus slipped under the radar somehow and got in. It's dead now, but we must never forget. You can. You probably didn't even read this far, and I don't blame you.
