The PC that would never turn off and how I saved the world

Tharic-Nar

Senior Editor
Staff member
Moderator
Before I get into the nitty-gritty of the title, I'll give you a little background. I currently work for a place that repairs electronics and electro-mechanical assemblies, and my spot is in the PC and PCB area. Today, a computer graced my desk with a peculiar and, at first, unknown problem. A spot inspection showed that it had blown a multitude of its capacitors (these are old systems, 2.4GHz Pentium 4s). Recapping a board is fairly straightforward, if a little tedious. After the initial repair, it booted up fine, everything was working, and I put it through the usual assortment of stress tests. It passed everything without a hitch... until it came time to shut it down.

Part of the testing process is to boot the unit from both network images and a local test hard drive with good old Win XP loaded. I booted the system into XP, ran some additional tests, then shut it down so I could boot it from the network. While it was shutting down, I went to check on another unit that was benching, turned back, and the unit I thought I had shut down was still in Windows. So I went to shut it down again, this time hanging around and waiting for it to complete. I saw the screen go blank and the fans stop spinning, and I thought all was well... until 5 seconds later, when it hit the POST screen again. I looked on, puzzled.

While it's halfway through POST, I hit the power button again to turn it off. The fans spin down and the unit goes dead... for 5 seconds, then begins booting again. This is when I start going from my 'Puzzled' face to my 'WTF' face. I catch the POST startup and check the BIOS to see if there are any abnormalities in the power settings, such as wake on LAN, wake on keyboard, last power-on state, etc. Nothing out of the ordinary at all. I disconnect both the keyboard and LAN, so that the only things connected to the PC are the monitor and power. It still boots itself up.

It's at this point that I got really concerned and went to a colleague for help. I explained the situation, he pulled a funny face, we walked over to the unit, and we both looked on dumbstruck as this PC kept turning itself on, regardless of the hardware configuration. We even removed the front-panel switch (to see if it was shorting) and it was still booting! It's at this point that the glasses are donned and the beard stroking begins.

The first and most obvious thing to check (for us) was whether we (I) had done something wrong while soldering all those new capacitors: reversed polarity, solder splatter shorting something out, a nicked component, that kind of thing. It was through this careful analysis (about 30 seconds!) that my colleague noticed something missing from the board: there was no clock crystal. I quickly scurried over, picked up another board lying around, compared the two, and sure enough, the crystal was missing. This left me even more puzzled, since the caps I had replaced were nowhere near the crystal, and those things are secured in place with foam and all sorts. Checking the workbench showed that nothing had come off. The board came to us without a crystal!

This is when things get baffling. The system passed every test we threw at it, including a network-matched timing test of the RTC (which normally runs off that very crystal), with no problems. The system remembered its BIOS settings, kept time, and communicated fine over serial, parallel, network and USB. Memory was fine, the hard drive and optical drive were fine, graphics were fine. Everything was fine except this never-shut-down issue. So I marked out where the crystal should go, fired up the soldering iron, fitted one, put everything back together, booted it up, shut it all down, and... it stayed shut down!
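For anyone curious what a "network-matched timing test" of the RTC might boil down to, here's a rough sketch. It's an assumption on my part, not the actual procedure we run: measure how far the local clock advances over an interval, compare that against a network time reference over the same interval, and express the difference in parts per million (ppm). The watch crystal an RTC normally runs from is 32.768 kHz, which is 2^15 Hz, so fifteen divide-by-two stages give exactly one tick per second. The function names and the ±20 ppm pass limit below are hypothetical, picked purely for illustration.

```python
CRYSTAL_HZ = 32_768  # 2**15 Hz, the standard RTC watch-crystal frequency


def divider_output_hz(crystal_hz: int = CRYSTAL_HZ, stages: int = 15) -> float:
    """Frequency left after a chain of divide-by-2 stages (a binary counter)."""
    return crystal_hz / (2 ** stages)


def drift_ppm(local_elapsed_s: float, reference_elapsed_s: float) -> float:
    """Clock error in parts per million, relative to a trusted reference clock."""
    if reference_elapsed_s <= 0:
        raise ValueError("reference interval must be positive")
    return (local_elapsed_s - reference_elapsed_s) / reference_elapsed_s * 1e6


def rtc_passes(local_elapsed_s: float, reference_elapsed_s: float,
               limit_ppm: float = 20.0) -> bool:
    """Hypothetical pass/fail rule: local clock within +/- limit_ppm of reference."""
    return abs(drift_ppm(local_elapsed_s, reference_elapsed_s)) <= limit_ppm


if __name__ == "__main__":
    print(divider_output_hz())  # 1.0 Hz, one tick per second
    # Over one hour (3600 s), gaining 0.036 s works out to 10 ppm fast.
    print(round(drift_ppm(3600.036, 3600.0), 3))
    print(rtc_passes(3600.036, 3600.0))
```

The point of working in ppm rather than raw seconds is that it doesn't matter how long the test runs: a clock that's 10 ppm off is 10 ppm off over a minute or a day.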

So there you go, people. A PC with a life of its own, ready to take over the world with malicious intent, powering itself back on just as you turn your back on the office to go home after a long day's work. You can all rest easy now, knowing that I stopped its evil and vindictive plans, all because of a one-cent watch crystal.

... though I would like to know WHY that worked. Seriously.

Also, I saw my first PC today with RDRAM (that's Rambus, if you don't know). While the stuff was used in consoles like the N64 and PS2, its PC adoption was limited, and it was quickly ousted by the now-ubiquitous DDR. And if you're wondering, yes, businesses still want PCs repaired that are equipped with the latest and greatest bus standard: ISA.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Wow, that is mad! I once had a PC with an Intel board that had a similar issue, although it didn't reboot 100% of the time when you chose to shut down in the OS. I can't imagine it was the same issue, though, since that PC was never worked on, and I doubt a component just fell off. Talk about a maddening experience.

And RDRAM... I almost hate to admit that I didn't even know that was available on some PCs. Still cherish that 4MB upgrade for my own N64.
 

Tharic-Nar

Senior Editor
Staff member
Moderator
Today I had another two units come through with the same issue (they're multiplying!) and I did the same thing, replacing the crystal. It didn't work! They're adapting! Panic-stricken, I remembered something else my colleague had mentioned about the Super I/O IC on the board, since that handles the front panel as well as comms. I took the board out, reflowed the chip, plugged everything back in, and it worked! Until it stopped working again after another benchmark later on. This time around, I brought out the big guns: I took a soldering iron to that bastard chip...

Funnily enough, Wikipedia has a picture of a near-identical chip... http://en.wikipedia.org/wiki/File:Smsc_superIO_on_IBM.JPG

Soldering one of those chips is... not easy. The first time I tried, there were a couple of small bridges between the pins, which made the serial and parallel comms fail. I had to use a mix of the iron, wick, flux and hot air to get it done, but it eventually worked. With the chip firmly soldered in place, all comms worked and the system stopped turning itself on.

This saving the planet from the robot apocalypse stuff is hard going.
 

Kougar

Techgage Staff
Staff member
Strange, I have no idea how that's possible either to be honest.

My old X58 motherboard did that all the time after ~2 years of heavy use; I'm not sure exactly when it started. I could shut down via Windows or the power button, and it would auto-restart after 5 seconds. Never figured out why, but I ruled out the wake-on events, shorts, and the case itself. I'm so glad to be rid of that motherboard now :D
 

Tharic-Nar

Senior Editor
Staff member
Moderator
It appears to be a problem with (shock, horror) the lead-free, RoHS-compliant solder that's used. These older boards, especially the Pentium 4s, got really hot. Add a daily duty cycle of heating up during the day and cooling down at night, and the solder just cracks. Reflowing either the specific chip or the whole board appears to be the only real long-term fix.

Since posting this, I've had to reflow another 5 boards to stop this behaviour. Only one specific model needs this so far, but it's annoying nonetheless. As for that X58 board with the same issue, I think it would have to be a southbridge problem, since Super I/O chips are rarely used now. In that case, you'd need some specialist kit to handle that kind of job (an IR board preheater, and a hot-air gun with a wide-area nozzle and mount). You could just bung the whole board in an oven (like people do with graphics cards) and hope for the best, but it ain't exactly a healthy method.

It's funny really, the volume of weird and quirky issues I've come across fixing these PCs. We go through bins full of hard drives, since our failure tolerance is very low: a single remapped sector under test fails the drive. I've personally come across dozens of computers that fail to boot simply because of bad RAM. Power supplies are the bane of nearly every PC I come across, and I've exploded 4 so far. I'll tell you now: when a varistor pops, you giggle; when a filter cap blows, you're startled; when a 450V 100µF mains input cap blows... you fucking shit yourself (they're the size of a D cell battery).

You'd be surprised how many issues resetting and shorting out the RTC CMOS and BIOS can fix. Disconnect from the mains and allow the system to discharge fully, remove the battery, remove the jumper, short out the two battery contacts and press/hold the reset button (if it has one) for 5 seconds. Put the battery back in and turn the system back on. Often, I power the system back on with the jumper still removed (though some systems fail to boot if you do). Jump into the BIOS when prompted and then put the jumper back in. Load defaults, save and reboot. This can fix a wide variety of issues, like mismatched memory, failed BIOS updates (where the system cannot update at all, not where an update completes but leaves it broken), a system that won't POST, random motherboard issues, serial COM port problems... it's a fairly standard cure-all really.

I've heard PCs that sound like car engines, humming away due to an imbalanced fan. I've had a power spike rev up one of those blower-type fans (hamster wheels), and break some of the fins off.

Fixing a bent pin on a 478 is tricky, but on an LGA775, it's brutal.

When it comes to cleaning... ye gods... Let's just say that I fully support Apple refusing warranty service on a smoker's computer.

Anyway, there is one issue that has me somewhat baffled: integrated Network Interface Cards (NICs) that 'forget' their MAC address. Work that one out, peeps. We get so many motherboards in that have no MAC address, it's not even funny. Trying to fix them is extremely difficult, though, since we don't know where the issue originates. Often, the MAC is stored in the EEPROM next to the NIC, so reflashing the EEPROM can technically fix it... but how is it happening in the first place? If the MAC is built directly into the NIC chip, well... you kind of need a new chip, which is pretty much impossible, so you replace the whole motherboard.
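As a rough illustration of what a "forgotten" MAC tends to look like from the software side, here's a minimal Python sketch that flags suspicious addresses. It rests on a few assumptions of mine, not anything established above: that a blank or erased EEPROM usually reads back as all 0xFF bytes, that an all-zero address means nothing was ever loaded, and that an address with the multicast (group) bit set can't be a valid NIC address. The function name classify_mac is hypothetical; you'd feed it whatever address your OS or diagnostic tool reports.

```python
import re

# Six hex octets, all separated by the same character (':' or '-').
_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}([:-])[0-9A-Fa-f]{2}(?:\1[0-9A-Fa-f]{2}){4}")


def classify_mac(mac: str) -> str:
    """Return 'ok' or a short reason why the MAC address looks invalid."""
    if not _MAC_RE.fullmatch(mac):
        return "malformed"
    octets = bytes(int(part, 16) for part in re.split(r"[:-]", mac))
    if octets == b"\x00" * 6:
        return "all zeros (address never loaded)"
    if octets == b"\xff" * 6:
        # Checked before the multicast test, since 0xFF also has the group bit set.
        return "all ones (blank or erased EEPROM?)"
    if octets[0] & 0x01:
        return "multicast bit set (not a valid NIC address)"
    return "ok"


if __name__ == "__main__":
    for addr in ["00:1A:2B:3C:4D:5E", "FF:FF:FF:FF:FF:FF",
                 "00:00:00:00:00:00", "01:00:5E:00:00:01"]:
        print(addr, "->", classify_mac(addr))
```

This only tells you that the address is bad, not why; it can't distinguish a corrupted EEPROM from a NIC chip that's stopped reading it, which is exactly the hard part described above.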

SoCs are going to be the bane of modern computers. If something goes wrong... you need a whole new unit. Disposable electronics at its best, I guess.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
"I've personally come across dozens of computers that fail to boot, simply because of bad RAM."

It's not that related, but last night I had a RAM module that wasn't seated 100%, and the PC turned on, but just for half a second. I was a little worried at first... but it's clear that RAM definitely is a fickle beast.

"You'll be surprised to know and just how many issues resetting and shorting out the RTC CMOS and BIOS can fix. Disconnect from the mains and allow the system to discharge fully, remove the battery, remove the jumper, short out the two battery contacts and press/hold the reset button (if it has one) for 5 seconds. Put the battery back in and turn the system back on."

Sheesh, that seems like a lot of work. On modern boards you usually just have to move the jumper a couple of pins over in order to reset the CMOS. Or leave the battery out for 15 minutes, but obviously that's not ideal for what you do.

"Fixing a bent pin on a 478 is tricky, but on an LGA775, it's brutal."

PGA CPUs can go... well, you know. I've had a couple of nightmares dropping AMD chips onto carpet... and those were mild cases :S
 

Tharic-Nar

Senior Editor
Staff member
Moderator
Removing the battery and waiting 15 minutes is the equivalent of shorting the pins out with the method I described, it's just that one takes 15 minutes, the other takes 30 seconds. Might sound complicated, but it's actually quite simple - plus after you do it a hundred times... lol.
 