I remember the days when CPUs were stuck in a rut. They were barely hitting 1 GHz. Networks were running at 1 Gb/s and beyond, and CPUs and storage just could not keep up. Clients wanted redundant, failover-capable servers that could handle 600 clients, but SGI was running out of ways to do that. We couldn’t make the bus any wider (128-bit computers?) and we couldn’t make the CPUs any faster. What should we do?
One answer was to link many systems together using NUMA (non-uniform memory access). This would let many systems (that would normally be a cluster) act as if they were one system. The problem with a setup like this is speed: accessing remote memory is slow. We had to find a way to speed up access to memory.
SGI invented lots of cool stuff to do this.
One of the new things was the CPOP connector. This connector was made up of many fuzzy little pads. The fuzzy pads would be compressed together, allowing a much higher-frequency connection than gold pins would normally allow.
The problem with delicate things like this is that they are far more sensitive to installation mistakes. Each connector needed to be torqued down to the right pressure so that the signals made it across cleanly. Install them too loosely and you’re going to see connectivity errors.
So cut to one of our advanced training courses, where we taught field engineers how to replace boards. The instructor explained how each hex-head screw on the CPU cards needed to be torqued down to an exact specification. This is where one of the helpful field guys, who had clearly done this before, piped up and explained that you know the boards are properly seated when you tighten the hex screw down and hear three “clicks.”
The instructor and I looked at each other. We didn’t remember there being three clicks. We normally used torque drivers to accurately measure the torque and there were never any clicks.
After some investigation and a quick examination of the board our helpful field guy had just installed, we discovered the source of the “three clicks.” They were the sound of the very expensive backplane cracking as the hex screw penetrated the various layers of plastic… OUCH.
From that point on, correct torque drivers were provided to all field personnel.