
Scaling IT

Just about everyone can have a free web page. You get one free when you open a cloud account or purchase internet service. This has led to a proliferation of cat pictures on the Internet.

Back in the 90s, when it cost a little more to get on the Internet, the idea of personal web pages was just beginning. One very large ISP (Internet Service Provider) that used SGI systems wanted to sell personal websites. They felt SGI's Challenge S system was the perfect solution. They would line up hundreds of these systems, and each system could handle several sites. SGI had indeed set several website access records hosting the site for "Showgirls," which, as you can imagine, had a racy website.

Fast forward a few months and there are 200 systems lined up in racks handling personal web pages. Then I start getting phone calls.

"Hey Steve. These guys are filing cases about two or three times a week to get memory replaced. We're getting parity errors that cause panics about two or three times a week."

I fly out and start looking carefully at the machines. The customer had decided to purchase third-party memory (to save money) so they could max out the memory in each system. Each machine had 256MB of RAM, which was a lot at the time. This was parity memory, which means that each 8 bits of data has an extra parity bit that is used like a cheap "double check" to make sure the value stored is correct. The parity bit is set to 1 or 0 so that each group of 9 bits (the 8 data bits plus the parity bit) always has an even number of 1s in it. If the system sees an odd number of 1s, it knows there's a memory error.
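The parity check described above can be sketched in a few lines of Python. This is only an illustration of even parity over a single byte; the function names are made up for this example, and real parity memory does the check in hardware rather than in software.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the byte has an odd number of 1s."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in 8 bits")
    return bin(byte).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True when the 8 data bits plus the parity bit hold an even number of 1s."""
    return (bin(byte).count("1") + stored_parity) % 2 == 0


value = 0b10110010                  # four 1s, so the parity bit is 0
stored = parity_bit(value)

one_flip = value ^ 0b00000100       # a single bit flipped in memory
two_flips = value ^ 0b00000110      # two bits flipped in memory

print(parity_ok(value, stored))     # clean read
print(parity_ok(one_flip, stored))  # single-bit error is caught
print(parity_ok(two_flips, stored)) # double-bit error slips through
```

Note that parity only detects an odd number of flipped bits and cannot say which bit is wrong, which is why a failed check on data that cannot be re-read ends in a panic rather than a correction.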

I looked at each slot. I looked at ambient temperature. I made sure the machines were ventilated properly (including making the customer cover all the floppy disk holes since they did not have floppies installed, but had neglected to install the dummy bezel). No change. Parity errors continued and clearly there was an issue.

Going back to the memory vendor and the specs on the chips, we started doing the math.
The vendor claimed that due to environmental issues (space radiation, etc.) one should expect a single-bit parity error about once every 2000 hours of uptime for each 32MB of memory. Half of these errors should be "recoverable" (i.e., the data is being read and can be read again just to be sure), but the other half will lead to a panic. These errors do not mean the memory is broken, and on any one machine they should be rare.

So let's do the math: 256MB/machine (so that's 8 x 32MB).
Hours of uptime per year? (These machines are always up): 8760 hours
How many total parity errors? 8760 / 2000 x 8 = 35 per system, per year, with half of them being "fatal." So, that's 17 panics per system per year. They had 200 systems. That's 3400 panics a year in that group of systems, or roughly 65 per week (about 9 a day)?!
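The same arithmetic can be written as a small Python sketch, which makes it easy to plug in other fleet sizes. The function and parameter names are invented for this example; the only inputs taken from the story are the vendor's figure of one error per 2000 hours per 32MB, the half-fatal split, 256MB per machine, and 200 machines.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assuming the machines are never powered down


def fleet_panics_per_year(machines: int,
                          mb_per_machine: float,
                          hours_per_error_per_32mb: float = 2000.0,
                          fatal_fraction: float = 0.5) -> float:
    """Expected panics per year across a fleet, from the vendor's error rate."""
    units_of_32mb = mb_per_machine / 32
    errors_per_machine = HOURS_PER_YEAR / hours_per_error_per_32mb * units_of_32mb
    return machines * errors_per_machine * fatal_fraction


per_year = fleet_panics_per_year(machines=200, mb_per_machine=256)
print(f"panics per year: {per_year:.0f}")
print(f"panics per week: {per_year / 52:.1f}")
print(f"panics per day:  {per_year / 365:.1f}")
```

Without rounding 17.5 down to 17 along the way, the result lands at about 3500 panics a year, which is between nine and ten a day.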

Consider this when you start to scale up your IT systems. How many machines do you have to put in a room together before a "once a year" event becomes a "once a day" event?

Posted by: Steve Modica on Oct 10, 2012 at 3:56:04 am
Comments (4)
storage, networking


Re: Scaling IT
by Mike Cohen
since we're sharing, we had a Crimson Reality Engine - weighed about 100 pounds and probably about the processing power of an iPad but in 1994 it was incredible!!
Re: Scaling IT
by Steve Modica
I had one of those at home for a while. R3000 MIPS chip. If you could find "The Magic Garden" (a kernel internals book) it gives lots of nice assembly language and stack trace examples specifically for that chip.

Steve Modica
CTO, Small Tree Communications
Re: Scaling IT
by Matt Geier
I'm surprised Steve....
You only had (1) of them? lol.

I had 3 of them at home. One desk for each monitor too! Two of them had Elan Graphics Boards.

I also worked on an O2 and an Indy in the office when I was doing Irix support.

I never knew anyone personally to turn any of the machines into a refrigerator though. I wished I had. I might have one myself today.

This might bring back some memories for you then Steve;
Something like this;

Good stuff!

Matt Geier
(Video Networking Solutions Expert)
(Creative Design Workflow Consultant)
(Social Media Networks Consultant)
(Technical Video Industry Sales Consultant)
Re: Scaling IT
by Mike Cohen
our first web server was an Indigo SGI box. I don't know what the specs were but probably less powerful than a flip phone by today's standards.

Interesting story.

Mike Cohen