Thursday, June 14, 2012

HP ProLiant DL165 G5 BIOS Salvage

Hi everyone,

here's a story that is a little more "production" than "private". Still, the work was fun to do, and I think it is worth sharing.

The admins of a customer recently decided to upgrade their DL165 G5 hosting server from 4x 500GB hard disks to 4x 2TB. However, the server would not recognize the new disks because the controller could not handle this volume size.
HP provides a BIOS update for this, and somewhere in its description they mention that all previously released BIOS updates need to be installed first, in the correct order. The admins skipped this step and flashed the newest BIOS version right away. The software did not indicate any problems, but after the apparently successful flash process the server was bricked. It would power up, but no boot logo would appear and the fans would stay on maximum RPM. Even after hours, nothing would change.
They had moved the hosted systems to an equally-equipped machine before the BIOS update took place, so at least those were easily recovered. But the machine that took over had other tasks that no server would take care of now. A replacement was needed.

So the choices were:

  • dump the entire server and buy a new one
  • buy a new mainboard from HP for 900 € (this is the only support HP is capable of)
  • buy a used mainboard with no guarantee for 500 €

As there was nothing to lose, I offered to try and unsolder the SMD EPROM chip from the board, re-flash it with a working firmware, and then put a PLCC32 socket where the EPROM was located so that another mis-flash would not brick the server for good. At least that would make it easier to replace the chip when needed.
After that was agreed with the customer, I set out to buy the materials required for the operation:

  • Hot air reworking station for SMD soldering / desoldering (with some PLCC-shaped caps that put hot air out of four longish jets for heating all four edges of the chip simultaneously)
  • EPROM programmer - I picked a Chinese model first but found that the software runs only under Windows 2000 and, at best, under Windows XP 32-bit, plus it was not LPC capable so there was no chance to get anywhere with the SST 49LF080A Flash EPROM
  • DIL-to-PLCC32 adapter
  • next EPROM programmer - Batronix Barlino II, a really nice German product. Solid and beautiful metal case, solely USB-powered (competitors require an external power supply in addition), works with all existing Windows flavors and comes with a software suite that is really easy to understand and handle
  • Gas-driven pen soldering iron
  • Flux dispenser pen
  • Solder wick
  • Multiple PLCC32 sockets to be on the safe side in case I'd burn the first ones
  • Testboy

Total cost: about 300 EUR. I could have bought an SST 49LF080A chip for the unlucky case that it would not survive the desoldering but as time was rather short and suppliers did not have one right away, I skipped that and hoped for the best.

Desoldering the EPROM

Fortunately HP engineers have not put too many components around the BIOS EPROM. Otherwise, one small wrong move could have had fatal consequences. The EPROM came loose nicely after about 20 seconds at 325°C. The PCB traces were less happy though: some came loose a bit while I removed solder remains with the solder wick, so extreme care was needed from this point on. Actually I would not recommend using solder wick at all as it tends to fuse with the PCB soldering pads and one easily tears them off. Next time I think I'd rather use a desoldering pump instead.
Plus, I'd probably remove the BIOS backup battery before starting to avoid shorting it.

Preparations for the PLCC32 Socket

Then I put some flux around the pads to ensure that no bridges would appear between two pins later.
Fixing the socket was not quite easy as the hot air gun did not allow for too precise movement, so I fixed two pins with the gas soldering iron first, hoping that they would keep the socket in place while using the hot air gun to finish the other pins.
Of course, the cheap plastic frame on the bottom of the socket melted before any pin got a hold on the PCB so I had to get the next socket and cut the bottom spacer out. That would leave some more space for the hot air to reach the soldering pads, and not much to melt on the way. No idea how that should work in practice. These things were designed to withstand a maximum temperature of about 130°C - not really a temperature range where soldering can be done at all. But maybe I misunderstood something here?
Well, after removing the bottom spacer from the socket, there is a slight risk that the chip gets pushed too deep into the socket, so that it gets stuck and can only be removed with some violence.
I decided to add a piece of cardboard beneath the chip later which would make up enough space at the bottom to insert a chip pull tool.

Testing the Works

After fixing the socket, testing time came. I tested the contacts on the top side of the socket against the respective pin below to see if all contacts are connected - bingo, seemed okay! Also checking for bridges between neighbor pins did not give any negative result. Phew! I'd expected much more trouble in my first steps of SMD soldering.
Even though the frame of the socket had suffered a bit from the heat, it had kept its shape and a short attempt to insert the EPROM did not show any mechanical trouble.
So I let it all cool down for a few minutes, then turned my attention to the EPROM.

EPROM Maintenance

As EPROM programming devices usually have DIL sockets, an adapter is needed to provide a PLCC32 socket. It connects each DIL pin to the respective PLCC32 pin. No further remapping is needed because the pinout of the 49LF080A is compatible with standard DIL EPROM pinouts. An additional check against the data sheets confirmed that this was the way to go.
The EPROM programming software was able to read data from the EPROM, and I kept a copy of it to make sure that I'd have a path back in case everything else should fail.
Writing the new image worked like a charm, too, and after roughly 75 seconds, the EPROM was equipped with a BIOS image considerably older than the one that caused the breakdown. It is a rather generic image that can be used as a starting point to apply later updates sequentially as HP demands.
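
For anyone who wants to double-check a dump outside the programmer software, here is a minimal sketch of the two sanity checks I would consider: whether the dump covers the whole chip, and whether a read-back matches the image that was written. It assumes the 49LF080A is an 8 Mbit part, so a full dump is 1 MiB; the names kExpectedImageSize, hasFullChipSize and firstMismatch are made up for this illustration and are not part of any programmer software.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the SST 49LF080A holds 8 Mbit, so a full dump is 1 MiB.
constexpr std::size_t kExpectedImageSize = 1024u * 1024u;

// Returns true if the dump has exactly the size of the whole chip.
bool hasFullChipSize(const std::vector<std::uint8_t>& image) {
    return image.size() == kExpectedImageSize;
}

// Returns the offset of the first differing byte, or -1 if both buffers
// are identical. A length mismatch counts as a difference at the end of
// the shorter buffer.
long firstMismatch(const std::vector<std::uint8_t>& a,
                   const std::vector<std::uint8_t>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return static_cast<long>(i);
    }
    return a.size() == b.size() ? -1L : static_cast<long>(n);
}
```

Loading the two files into byte vectors is left out on purpose. The idea is simply to compare the read-back against the image before the chip goes back into the board.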

Final Test

I inserted the EPROM back into the new socket of the server and prepared for the frightful power-on moment. These machines create one hell of a noise once powered on. About 15 small fans at the highest possible RPMs - don't even want to imagine what happens if you stick a finger in there by accident.
Well, so far, that was a known sight. Black screen, jumbo jet noise, but not much more. Hey, some action in the area of the RAID LED indicators on the PCB, and after about 15 seconds, the HP boot logo would appear! Yoohoo! That looked like something.
I did not know how the server behaved before the incident, but it seems to be typical that power-up takes a long time until any activity can be perceived.
The unit did not recognize its hard disk drives. But that was expected because there was a mix of 500GB and 2TB drives connected, and probably none of them formatted to offer anything close to a bootable OS. At least the disks were properly identified. That was enough of a proof for me that the unit had its life back and admin people at the customer site could now use it again.



Conclusion

It was quite an investment at first, but in retrospect it was fun to revive the unit and find that SMD soldering is not as impossible as I had anticipated.
The DL165 appeared pretty low-end though. While it is clear that this is one of the more affordable server units, I mean, why solder an EPROM chip right to the PCB? Was that cost saving or rather intentionally forcing BIOS update victims into buying newer machines? Production would surely not have become much more expensive with BIOS EPROMs in sockets.
And while the 19-inch single-height rack housing looks pretty impressive at first, that form factor comes at a price. That many small-diameter fans to keep the unit from burning itself, no more than four hard disks (which are not accessible from outside in hot-plug mounting frames but require shutting down and opening the unit whenever there is any maintenance to do). No thumb screws, so you need a screwdriver for everything. And, of course, no blade-style connector at the back but everything just like with a desktop PC. Separate mouse, keyboard, VGA, LAN, USB etc. connectors. A lot of unwiring is needed to get the unit out of its rack (the reverse way is equally painful).

Well, I hope I can get hold of another bricked DL165 and offer my BIOS rescue services. Obviously not many people dare to go that far.

So if there is anybody having a similar problem (BIOS accidents may happen with usual PC mainboards, too), just write me an e-mail and we'll see what can be done.

Thanks for reading!

Greets,
Joe

Tuesday, March 6, 2012

Reverse-Engineering a Rotary Encoder from a Panasonic VTR

Hi Folks,

recycling time - after buying a broken Panasonic NV-F75 VHS tape recorder from eBay, and no way to rescue the appliance as a whole, I stripped the valuable parts from it and now it's time to take a closer look at that Jog/Shuttle wheel that the more exclusive Panasonic models featured since the 90s. It actually consists of two nested axes, the outer one having fixed limits to the left and to the right and a center notch (shuttle wheel) while the jog dial in the center spins endlessly in clockwise or counterclockwise direction, with approximately eight "stops" per rotation as a feedback for the user.
I always loved that feature because it gave you a great way of controlling playback, allowed cueing forwards and backwards in three different speeds, and with the center wheel you could also navigate to one specific frame if you wanted, e.g. to continue recording a movie after the commercial break with frame exactness (well, nearly...)
Plus it added a professional touch to the whole device without giving it the monstrous appearance of typical studio equipment. Great thing! That would make a nice two-in-one control after all, and it appears a nice exercise to get somewhere with the Arduino microcontroller.

So here it is:





That C-shaped ring is where the shuttle ring is mounted. The center pin holds the jog dial that moves freely around its axis inside the shuttle ring.
View from the back - the two latches left and right of the center are not actually contacts with electrical meaning. I assume they mainly serve for fixing the unit properly on the PCB where it is mounted.




The 8-pin connector that I soldered to the unit adapts the encoder's custom pin spacing to standard breadboard spacing. Figuring out the pinout from the outside seemed a little hard, so I decided to open the unit and take a close look inside.
It is quite easy to open it. Just pry open the four clips on the back as indicated here and the unit comes apart - the two latches left and right from the center may be a little tricky though:




It consists of 5 pieces in total:




From left to right:

  1. backplate and center axis
  2. rotary encoder for jog dial
  3. interconnect plate with 3 conductors directed to the bottom for the jog dial and 6 pointing to the top for the shuttle wheel
  4. shuttle wheel ground plate (contacts on the backside, see below)
  5. top ring to hold it all together and limit the range of motion for the shuttle wheel

Here are some closeups:










There is a lubricant across all contacts that I didn't want to remove to keep the unit in good shape. What we see here is part of the jog dial encoder. All golden surfaces are interconnected. Still there are three tracks. The center track is ground, and as the two other (partially covered) tracks are running parallel, the encoding trick must take place on the interconnect plate.



The interconnect plate. On the left, we see six contacts that face to the front (to touch the bottom of the shuttle wheel - we'll see that later). The right side offers three contacts. Remember the layout of the encoding wheel? We can see here that the left (ground) and middle contact are on the same level while the right contact is a little longer. That means that in comparison to the middle contact, the right one will connect to ground either later (in a clockwise rotation) or earlier (in counterclockwise rotation) than the middle contact. That gives a decent way of figuring out which direction the jog wheel turns. I'll discuss that later in detail.





Bottom side of the shuttle wheel. That looks like a binary encoder indeed. The inner ring seems to be the most significant bit (MSB) as it has only two different states from one end to the other (left: off, right: on). The further outside the rings are, the more frequently they change their on/off states.

Guessing from the layout of the traces on the interconnect plate, the left four or five contacts (seen from the top of the unit) probably deal with the shuttle wheel whereas at least two of the right contacts deal with the jog dial. One pin is probably the common ground line.
Using a multimeter, I found out that the pinout seen from the direction shown in the photo is (from left to right):

Shuttle Bit 3 (MSB)
Shuttle Bit 2
Shuttle Bit 1
Shuttle Bit 0 (LSB)
Common Ground
Jog Dial Middle Contact (I'll call that A from now on)
Jog Dial Right Contact (calling this one B from now on)

Examining the Jog Dial




This closeup shows the encoding magic. The left of the three connectors is ground, and the middle one is alternating between ground and NC as the dial is turned. Same goes for the outer pin but that one catches the ground / NC status a little before or after the center pin. This relation in timing lets us determine the rotary direction eventually:
Turning the jog dial clockwise would cause the center pin to change first from NC to ground or vice versa. As the outer pin is a bit longer, it takes more time for it to reach the same connection state. Likewise, when turning counterclockwise, the outer pin reaches ground first, and the center pin shortly after that.
We can draw two bits from there. When any change is detected, we need to compare the previous set of bits to the current set. Depending on the direction, the sequence of values will give us a clue. I'll discuss that in detail further below.

Examining the Shuttle Wheel

Now that the ground wire and the four bits for the shuttle wheel data are known in the pinout, it is time to connect the whole thing and link up some LEDs to determine the order of values we get for each position of the wheel:



I prepared a breadboard with six LEDs and set up the pins to match the ones on the encoder unit.




The left four LEDs are the bits from the shuttle wheel (left = MSB, then bits 2, 1, 0). The other two LEDs are what I have called A and B earlier, and they reflect the current jog dial activity.
One might have derived the values from the encoder backplate, but I preferred to try to get the most accurate results. Why waste time guessing?
This is what I found out:
 


See how nicely this pattern repeats on the backplate of the shuttle wheel? Only one odd thing I found: the shuttle is supposed to have a center position and the same number of positions to the left and to the right, but that would require an odd number of states. In fact, I found 16 different states, which means that the center position is not actually the center. There should be two values indicating center, but the center notch is clearly linked to the value 12. Well, the value 4 appears only a very short time when turning right from the center notch, so it would probably be good to interpret both 12 and 4 as "centered".

Logic Behind the Jog Dial

We still have to figure out how to detect clockwise / counterclockwise motion on the jog dial. The sequence of bits is like this (as measured on the breadboard):




So rotating clockwise will give this sequence: 2, 3, 1, 0, 2, 3, 1, 0 etc.
Counterclockwise results in: 1, 3, 2, 0, 1, 3, 2, 0 etc.
The orange arrows just indicate where the pattern repeats. This is to cover the sequence "0, 2" for clockwise and "0, 1" for counterclockwise rotation that one might otherwise miss.

Now we don't want to sample four values until we are sure about the direction, so let's take a look at all possible combinations of one value followed by the next, because in most cases just two samples are enough to give a clue.
For instance, if we look at the clockwise sequence above once more, there are four possible pairs contained:

  • 2, 3
  • 3, 1
  • 1, 0
  • 0, 2

Same for the counterclockwise list:

  • 1, 3
  • 3, 2
  • 2, 0
  • 0, 1

So from the twelve possible sets, we can at least identify these eight doubtlessly:



Value1 always precedes Value2 in time, so in order to measure a movement, we need to have a "previous" value buffered to compare with the current measurement. We have four states that clearly indicate a clockwise motion because these four sequences appear only as a subset of the clockwise sequence shown above. Same goes for the counterclockwise sequences, of which we also have four. Finally, there are four states that are indeterminate because these sequences appear neither in the clockwise nor in the counterclockwise original sequence. If we measure something like this, it should just cause a repetition of the last clearly identified motion.
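
The pair classification described above can also be derived mechanically from the two measured sequences. Here is a minimal sketch in plain C++ (so it can be tried on a PC), assuming a table indexed by previous code * 4 + current code. The function name buildJogTable and the ROT_* numbering are choices made for this illustration; everything not written by the loop stays "not available", which covers the four indeterminate pairs and the four "no change" entries.

```cpp
#include <array>
#include <cstdint>

enum Rotation : std::uint8_t { ROT_NA = 0, ROT_CW = 1, ROT_CCW = 2 };

// Builds a 16-entry table indexed by (previous code * 4 + current code).
std::array<std::uint8_t, 16> buildJogTable() {
    const int cw[4]  = {2, 3, 1, 0};  // clockwise sequence from the text
    const int ccw[4] = {1, 3, 2, 0};  // counterclockwise sequence from the text
    std::array<std::uint8_t, 16> table{};  // zero-initialised = ROT_NA everywhere
    for (int i = 0; i < 4; ++i) {
        const int next = (i + 1) % 4;  // wrap around to cover "0, 2" and "0, 1"
        table[cw[i] * 4 + cw[next]]   = ROT_CW;
        table[ccw[i] * 4 + ccw[next]] = ROT_CCW;
    }
    return table;
}
```

Looking up table[previous * 4 + current] then gives the direction for any pair of samples, or ROT_NA if the pair does not say anything.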
I think the easiest way to manage values from the jog dial is a single value that either decreases (when rotating counterclockwise) or increases, like a slider control in current computer operating systems.
The following "jog states" we might detect (again, this presumes we have a "previous" state = State1 and a current one, State2):



Read the table like this:

  • for 1st line: if there was no known previous state, and the current state indicates clockwise movement, then increase the value
  • for 3rd line: if the previous state was a clockwise rotation, and the current state is indeterminate, just act like the jog dial is still turned clockwise, and increase the value
  • for 5th line: if the previous state was clockwise rotation, but the current one is counterclockwise, decrease the value
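
The three cases above boil down to one rule: an indeterminate reading repeats the last clearly identified direction. A minimal sketch of that rule in plain C++ follows. The JogTracker struct and its member names are hypothetical, and leaving the value untouched when neither the previous nor the current state is known is an assumption on my side.

```cpp
#include <cstdint>

enum Rotation : std::uint8_t { ROT_NA = 0, ROT_CW = 1, ROT_CCW = 2 };

struct JogTracker {
    Rotation lastKnown = ROT_NA;  // last clearly identified direction
    int value = 0;                // slider-like counter

    // Feeds one detected state into the tracker and returns the direction
    // that was actually applied to the counter.
    Rotation update(Rotation current) {
        // Indeterminate reading: repeat the last clearly identified motion.
        const Rotation effective = (current == ROT_NA) ? lastKnown : current;
        if (effective == ROT_CW)  ++value;
        if (effective == ROT_CCW) --value;
        lastKnown = effective;
        return effective;
    }
};
```

Feeding it clockwise, indeterminate, counterclockwise in that order moves the value to 1, 2 and back to 1, which corresponds to the 1st, 3rd and 5th line described above.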

I have made some notes during my breadboard experiments. Most of it is covered above in nice Excel tables, but this also shows the controller pinout. As you can see here, there are only seven positions associated with the "left" side of the shuttle wheel (i.e. if it is turned far right, that means it has position 8), while there are eight for the "right" side, leaving one that clearly indicates center position (12). I'll interpret the value 4 as "center", too, because it is very small compared to the range of motion, and this balances the number of values on both sides.
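
Treating both 12 and 4 as "center" could look like the following sketch in plain C++. It assumes the wheel order of the raw codes as listed in the shuttleValues table of the Arduino sketch in the follow-up below. The function name shuttleDeflection, the choice of sign and the fallback for invalid input are assumptions made for this illustration.

```cpp
#include <cstdint>

// Raw 4-bit codes in wheel order, copied from the shuttleValues table of the
// follow-up sketch; index 0 is the end that sketch reports as position 15.
const std::uint8_t kWheelOrder[16] = {8, 9, 11, 10, 14, 15, 13, 12,
                                      4, 5, 7, 6, 2, 3, 1, 0};

// Returns a signed deflection in the range -7..+7, with 0 meaning "centered".
// Raw codes 12 (the notch) and 4 (its short-lived neighbour) both map to 0.
// Inputs above 15 also return 0 (hypothetical fallback).
int shuttleDeflection(std::uint8_t rawCode) {
    for (int i = 0; i < 16; ++i) {
        if (kWheelOrder[i] == rawCode) {
            const int position = 15 - i;           // 0..15, as in the sketch
            return position >= 8 ? position - 8    // 8..15 -> 0..+7
                                 : position - 7;   // 0..7  -> -7..0
        }
    }
    return 0;
}
```

With this mapping, both sides of the wheel get seven non-zero steps, which is the balance mentioned above.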


That's it for now. Soon I will try to use this in combination with an LCD display on the Arduino Duemilanove. Might take a while because that C language keeps driving me nuts. Once there is something to be shown, I will post it here of course!

Thanks for reading! CU soon!



Follow-up: Arduino Sketch


As requested by Fentronics, here is the Arduino sketch I used to read both components of the device, the rotary encoder (jog wheel) and the Gray-encoded shuttle wheel:

// pinout:
const int LED = 13;
const int JOG0 = 7; // digital 7
const int JOG1 = 6; // digital 6
const int SHUTTLE0 = 2; // digital 2
const int SHUTTLE1 = 3; // digital 3
const int SHUTTLE2 = 4; // digital 4
const int SHUTTLE3 = 5; // digital 5

const int ROT_CW = 1;
const int ROT_CCW = 2;
const int ROT_NA = 0;

byte prevJogCode = 0;
byte prevJogState = ROT_NA;
int jogValue = 0; // value of the jog wheel

// jog state to direction translation
const byte jogStates[16] = { ROT_NA, ROT_CCW, ROT_CW, ROT_NA,
                             ROT_CW, ROT_NA, ROT_NA, ROT_CCW,
                             ROT_CCW, ROT_NA, ROT_NA, ROT_CW,
                             ROT_NA, ROT_CW, ROT_CCW, ROT_NA };

byte shuttleValue = 0; // value of the shuttle wheel
byte prevShuttleValue = 0;

// gray to binary translation
const byte shuttleValues[16] = { 8, 9, 11, 10, 14, 15, 13, 12, 4, 5, 7, 6, 2, 3, 1, 0 };

void setup()
{
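  pinMode(LED, OUTPUT);  // the debug LED flashes whenever a change is reported

  // inputs with internal pull-ups enabled: the contacts switch to ground,
  // so a closed contact reads LOW
  pinMode(JOG0, INPUT);
  pinMode(JOG1, INPUT);
  digitalWrite(JOG0, HIGH);
  digitalWrite(JOG1, HIGH);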
 
  pinMode(SHUTTLE0, INPUT);
  pinMode(SHUTTLE1, INPUT);
  pinMode(SHUTTLE2, INPUT);
  pinMode(SHUTTLE3, INPUT);
  digitalWrite(SHUTTLE0, HIGH);
  digitalWrite(SHUTTLE1, HIGH);
  digitalWrite(SHUTTLE2, HIGH);
  digitalWrite(SHUTTLE3, HIGH);

  Serial.begin(9600);
}

void loop()
{
  // read jog rotary encoder (2 bits)
  byte thisJogCode = (digitalRead(JOG1) == LOW ? 2 : 0) +
                     (digitalRead(JOG0) == LOW ? 1 : 0);

  // compare to previous
  bool jogChange = (thisJogCode != prevJogCode);
  if (jogChange)
  {
    // determine current direction from the states table
    // using the previous and current value
    byte jogState = jogStates[prevJogCode * 4 + thisJogCode];
  
    // if the new state is NA, continue into the same direction
    // as last time
    if (jogState == ROT_NA) jogState = prevJogState;
  
    // increase/decrease counter
    switch (jogState)
    {
      case ROT_CCW: jogValue--; break;
      case ROT_CW: jogValue++; break;
    }
    prevJogCode = thisJogCode;
    prevJogState = jogState;
  }
 
  // read shuttle bits
  byte gray = (digitalRead(SHUTTLE3) == LOW ? 8 : 0) +
              (digitalRead(SHUTTLE2) == LOW ? 4 : 0) +
              (digitalRead(SHUTTLE1) == LOW ? 2 : 0) +
              (digitalRead(SHUTTLE0) == LOW ? 1 : 0);

  // convert to binary (0 = far left, 15 = far right)
  shuttleValue = grayToBinary(gray);  // update the global instead of shadowing it
  bool shuttleChange = (shuttleValue != prevShuttleValue);
  prevShuttleValue = shuttleValue;
 
  // announce to serial if there are any changes
  if (jogChange || shuttleChange)
  {
    digitalWrite(LED, HIGH);
    Serial.print("Shuttle: ");
    Serial.print(shuttleValue);
    Serial.print(" Jog: ");
    Serial.println(jogValue);
    digitalWrite(LED, LOW);
  }
}

byte grayToBinary(byte grayCode)
{
  for (int i = 0; i <= 15; i++)
  {
    if (shuttleValues[i] == grayCode) return 15 - i;
  }
  return 0;
}


The code is quite primitive. It would be much cleaner to use interrupts so the processor only has something to do on any bit change instead of testing each bit thousands of times per second. Well, this is only for testing so I left it at this.
The code reads all pins and derives the values from their states. Whenever a state change is detected in either component, a line reflecting both values is written to the serial interface, and the LED (pin 13) flashes briefly.
The jogStates and shuttleValues tables are directly reused from the tables shown in the original post.
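
A side note on the shuttleValues table: read from the last entry to the first, it is the standard reflected Gray code for 0 to 15, so the lookup could probably be replaced by the usual XOR-and-shift decoding. A minimal sketch in plain C++ that puts both variants side by side follows. The names grayToBinaryLookup and grayToBinaryXor are made up for this comparison, and I have only checked the equivalence for the 16 codes in the table.

```cpp
#include <cstdint>

// Table copied from the sketch above.
const std::uint8_t shuttleValues[16] = {8, 9, 11, 10, 14, 15, 13, 12,
                                        4, 5, 7, 6, 2, 3, 1, 0};

// Lookup as done in the sketch: find the code, return the mirrored index.
std::uint8_t grayToBinaryLookup(std::uint8_t grayCode) {
    for (int i = 0; i <= 15; ++i) {
        if (shuttleValues[i] == grayCode) return static_cast<std::uint8_t>(15 - i);
    }
    return 0;
}

// Table-free decoding of a 4-bit reflected Gray code using XOR and shifts.
std::uint8_t grayToBinaryXor(std::uint8_t grayCode) {
    std::uint8_t value = grayCode & 0x0F;  // only the lower four bits are used
    value ^= value >> 2;
    value ^= value >> 1;
    return value;
}
```

The table version is easier to adapt if a wheel turns out to use a different track layout, while the XOR version saves the loop.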
Feel free to ask if you have any questions, and have a lot of fun :)

This is how to wire it up - couldn't be easier, really!