Thursday, June 14, 2012

HP ProLiant DL165 G5 BIOS Salvage

Hi everyone,

here's a story that is a little more "production" than "private". Still, I think it was fun to do the work, and it might be worth sharing.

The admins of a customer recently decided to upgrade their DL165 G5 hosting server from 4x 500GB hard disks to 4x 2TB. However, the server would not recognize the new disks because the controller could not handle drives of that size.
HP provides a BIOS update for this, and somewhere in the description they mention that all previously released BIOS updates need to be installed first, in the correct order. The admins skipped this step and flashed the newest BIOS version right away. The software did not indicate any problems, but after the seemingly successful flash process the server was bricked. It would power up, but no boot logo would appear and the fans would stay at maximum RPM. Even after hours, nothing would change.
They had moved the hosted systems to an equally equipped machine before the BIOS update took place, so at least those were easily recovered. But the machine they were moved to had other tasks of its own that no server would take care of now. A replacement was needed.

So the choices were:

  • dump the entire server and buy a new one
  • buy a new mainboard from HP for 900 € (this is the only support HP is capable of)
  • buy a used mainboard with no guarantee for 500 €

As there was nothing to lose, I offered to try to desolder the SMD EPROM chip from the board, re-flash it with a working firmware, and then put a PLCC32 socket where the EPROM had been located to ensure that another mis-flash would not brick the server again. At least that would make it easier to replace the chip when needed.
After that was agreed with the customer, I set out to buy the materials required for the operation:

  • Hot air reworking station for SMD soldering / desoldering (with some PLCC-shaped caps that put hot air out of four longish jets for heating all four edges of the chip simultaneously)
  • EPROM programmer - I picked a Chinese model first but found that the software runs only under Windows 2000 and, at best, under Windows XP 32-bit, plus it was not LPC capable so there was no chance to get anywhere with the SST 49LF080A Flash EPROM
  • DIL-to-PLCC32 adapter
  • next EPROM programmer - Batronix Barlino II, a really nice German product. Solid and beautiful metal case, solely USB-powered (competitors require an external power supply in addition), works with all existing Windows flavors and comes with a software suite that is really easy to understand and handle
  • Gas-driven pen soldering iron
  • Flux dispenser pen
  • Solder wick
  • Multiple PLCC32 sockets to be on the safe side in case I'd burn the first ones
  • Testboy

Total cost: about 300 €. I could have bought a spare SST 49LF080A chip in case the original did not survive the desoldering, but as time was rather short and suppliers did not have one available right away, I skipped that and hoped for the best.

Desoldering the EPROM

Fortunately, HP's engineers did not put too many components around the BIOS EPROM. Otherwise, one small wrong move could have had fatal consequences. The EPROM came loose nicely after about 20 seconds at 325°C. The PCB traces were not too happy, though: some came loose a bit while I removed solder remains with the solder wick, so extreme care was needed from this point on. Actually, I would not recommend using solder wick at all, as it tends to fuse with the PCB soldering pads and one easily tears them off. Next time I think I'd rather use a desoldering pump instead.
Plus, I'd probably remove the BIOS backup battery before starting, to avoid shorting it.

Preparations for the PLCC32 Socket

Then I put some flux around the pads to ensure that no bridges would appear between two pins later.
Fixing the socket was not easy as the hot air gun did not allow for very precise movement, so I tacked down two pins with the gas soldering iron first, hoping that they would keep the socket in place while I used the hot air gun to finish the other pins.
Of course, the cheap plastic frame on the bottom of the socket melted before any pin got a hold on the PCB, so I had to take the next socket and cut the bottom spacer out. That would leave some more space for the hot air to reach the soldering pads, and not much to melt on the way. No idea how soldering these sockets is supposed to work in practice. They were designed to withstand a maximum temperature of about 130°C - not really a temperature range where soldering can be done at all. But maybe I misunderstood something here?
Well, with the bottom spacer removed from the socket, there is a slight risk that the chip gets pushed too deep into the socket, so that it gets stuck and can only be removed with some violence.
I decided to add a piece of cardboard beneath the chip later which would make up enough space at the bottom to insert a chip pull tool.

Testing the Works

After fixing the socket, testing time came. I tested each contact on the top side of the socket against the respective pin below to see whether all contacts were connected - bingo, seemed okay! Checking for bridges between neighboring pins did not turn up any problems either. Phew! I'd expected much more trouble in my first steps of SMD soldering.
Even though the frame of the socket had suffered a bit from the heat, it had kept its shape and a short attempt to insert the EPROM did not show any mechanical trouble.
So I let it all cool down for a few minutes, then turned my attention to the EPROM.

EPROM Maintenance

As EPROM programming devices usually have DIL sockets, an adapter is needed to provide a PLCC32 socket. It connects each DIL pin to the respective PLCC32 pin. No further remapping is needed because the pinout of the 49LF080A is compatible with standard DIL EPROM pinouts. An additional check against the data sheets indicated that this was the best way to go.
The EPROM programming software was able to read data from the EPROM, and I kept a copy of it to make sure that I'd have a path back in case everything else should fail.
Writing the new image worked like a charm, too, and after roughly 75 seconds, the EPROM was equipped with a BIOS image considerably older than the one that caused the breakdown. It is a rather generic image that can be used as a starting point to apply later updates sequentially as HP demands.
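For anyone repeating this, here is a minimal sketch of the kind of sanity check one could run on the image files before and after programming. It is not what the Batronix software does internally, just an independent check in Python. It assumes the dumps are raw binary files; the function names are made up for illustration. The only hard fact used is the chip capacity: the 49LF080A is an 8 Mbit part, so a full image should be exactly 1 MiB.

```python
import hashlib

# Capacity of the SST 49LF080A: 8 Mbit = 1 MiB.
FLASH_SIZE_BYTES = 8 * 1024 * 1024 // 8


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a raw image."""
    return hashlib.sha256(data).hexdigest()


def check_image(image: bytes, flash_size: int = FLASH_SIZE_BYTES) -> list:
    """Return a list of problems; an empty list means the image looks plausible."""
    problems = []
    if len(image) != flash_size:
        problems.append(f"size {len(image)} does not match chip size {flash_size}")
    # A dump consisting of one repeated byte (e.g. all 0xFF) usually means
    # a blank chip or a failed read, not a real BIOS image.
    if image and len(set(image)) == 1:
        problems.append("image is a single repeated byte (blank chip or bad read?)")
    return problems


def verify_readback(written: bytes, readback: bytes) -> bool:
    """True if the data read back from the chip matches what was written."""
    return sha256_of(written) == sha256_of(readback)


if __name__ == "__main__":
    # Demo with synthetic data instead of real dump files.
    fake_image = bytes(range(256)) * 4096      # 1 MiB of varied content
    blank_dump = b"\xff" * FLASH_SIZE_BYTES    # what an erased chip reads as
    print("fake image problems:", check_image(fake_image))
    print("blank dump problems:", check_image(blank_dump))
    print("readback matches:", verify_readback(fake_image, fake_image))
```

With real files, one would read the backup dump and the new image with `open(path, "rb").read()` and pass the bytes to the same functions. Keeping the hash of the original dump next to the backup also makes it easy to tell later whether the file has been altered.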

Final Test

I inserted the EPROM back into the new socket of the server and prepared for the frightful power-on moment. These machines make one hell of a noise once powered on. About 15 small fans at the highest possible RPM - I don't even want to imagine what happens if you stick a finger in there by accident.
Well, so far, that was a known sight. Black screen, jumbo jet noise, but not much more. Hey, some action in the area of the RAID LED indicators on the PCB, and after about 15 seconds, the HP boot logo would appear! Yoohoo! That looked like something.
I did not know the server's behavior from before, but it seems to be typical that power-up takes a long time before any action can be perceived.
The unit did not recognize its hard disk drives. But that was expected because there was a mix of 500GB and 2TB drives connected, and probably none of them formatted to offer anything close to a bootable OS. At least the disks were properly identified. That was proof enough for me that the unit had its life back and the admin people at the customer site could now use it again.



Conclusion

It was quite an investment at first, but in retrospect it was fun to revive the unit and find that SMD soldering is not as impossible as I had anticipated.
The DL165 appeared pretty low-end though. While it is clear that this is one of the more affordable server units, I mean, why solder an EPROM chip right to the PCB? Was that cost saving or rather intentionally forcing BIOS update victims into buying newer machines? Production would surely not have become much more expensive with BIOS EPROMs in sockets.
And while the 19-inch single-height rack housing looks pretty impressive at first, that form factor comes at a price. That many small-diameter fans to keep the unit from burning itself up, no more than four hard disks (which are not accessible from outside via hot-plug mounting frames but require shutting down and opening the unit whenever there is any maintenance to do). No thumb screws, you need a screwdriver for everything. And, of course, no blade-style connector at the back but everything just like with a desktop PC. Separate mouse, keyboard, VGA, LAN, USB etc. connectors. A lot of unwiring is needed to get the unit out of its rack (the reverse way is equally painful).

Well, I hope I can get hold of another bricked DL165 and offer my BIOS rescue services. Obviously not many people dare to go that far.

So if there is anybody having a similar problem (BIOS accidents may happen with usual PC mainboards, too), just write me an e-mail and we'll see what can be done.

Thanks for reading!

Greets,
Joe