| | | | Apple has always been on the leading edge of connectivity for their systems.
Back in 2003, before we had formed Small Tree as a company, I can recall drooling over a PowerBook laptop with an integrated Gigabit port. That was a crazy thing to have on a laptop at the time. Gigabit was still a little weird, very expensive, and not common as a drop at anyone’s desk. Yet here Apple was putting it on a laptop.
Thunderbolt is a similarly aggressive move. It puts a great deal of IO horsepower on some very small systems.
Firstly, let’s consider what Thunderbolt is. Thunderbolt is a 4X (4 lane) PCIE 2.0 bus. It’s equal in performance (and protocol) to the top two slots of a traditional tower Mac Pro. Along with that 4X pipe, there’s a graphics output pipe for a monitor. These pipes are not shared! So using a daisy-chained monitor will not impinge on any attached IO devices.
Thunderbolt is capable of moving data at 10Gbits/sec FULL DUPLEX, meaning data can move in two directions at the same time, giving the pipe a total bandwidth of 20Gbits/sec.
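As a quick sanity check on those numbers, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and the conversion ignores protocol overhead, so real-world throughput will be somewhat lower.

```python
def aggregate_gbits(per_direction_gbits: float, full_duplex: bool = True) -> float:
    """Total link bandwidth: both directions count when the link is full duplex."""
    return per_direction_gbits * (2 if full_duplex else 1)


def gbits_to_mbytes(gbits: float) -> float:
    """Convert Gbits/sec to MBytes/sec (decimal units, 8 bits per byte)."""
    return gbits * 1000 / 8


if __name__ == "__main__":
    per_direction = 10.0  # Thunderbolt, one direction, from the text above
    print(aggregate_gbits(per_direction))  # 20.0 Gbits/sec total
    print(gbits_to_mbytes(per_direction))  # 1250.0 MBytes/sec each way
```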
As I read through the forums and opinion articles on Thunderbolt, one of the themes that pops up is “It’s Apple proprietary and expensive. Just use USB 3.0.” This is a reasonable point. USB 3.0 is capable of 5Gbits/sec (about half the speed of Thunderbolt). Further, there are plans to speed up USB 3.0 to 10Gbits/sec to match Thunderbolt. So given these factors (and the low cost of most USB devices), it seems like an obvious choice.
However, there are some reasons that Thunderbolt may win the day for external high-speed connectivity (and relegate USB to its traditional low-end role).
First of all, most IO chips (Ethernet, SATA, SAS) are manufactured with a native PCIE backend. The chips are natively built to sit on a PCIE bus. So not only will you save the overhead of an additional protocol, the guys writing the code to support these devices only need to write one driver (PCIE). It just works whether the device is on a card or in a Thunderbolt toaster.
Another advantage of Thunderbolt is its power budget. Often, devices are powered by the port itself (very common with USB). USB can provide 4.5Watts of power to attached devices, whereas Thunderbolt offers a full 10Watts of power.
Lastly (and this is probably the most interesting thing about PCIE and Thunderbolt), Thunderbolt is a switched/negotiated protocol that is extremely flexible. Cards that want a 16X slot can work in a 4X slot. PCIE switches can (and do) exist to allow multiple machines to talk to one PCIE based device (like a RAID). So imagine a time in the future when devices can be connected to a “switch” in a back room and multiple systems can see them. Imagine those systems can have multiple connections to boost their bandwidth.
Thunderbolt may not be everywhere yet, but it’s really the first imaginings of a new way to handle IO outside of the “tower” type machines. I think it is easily the best choice for Mac users and will likely offer some amazing benefits in the next generation.
| | | | |
| | | | I’m getting ready to head out to Las Vegas this weekend to get the Small Tree booth all set up (SL6005) and I’m really excited.
First off, we have a brand new version of our Titanium platform coming out called “Titanium Z”. The Z platform is AWESOME and the folks here at Small Tree (including The Duffy) are very excited to start telling people about it.
In keeping with our history of bringing really high-tech functionality (like real time video editing) down into the commodity price space, we are now bringing down Storage Virtualization.
To offer Virtualization, we had to migrate Titanium to a new OS based on FreeBSD. In doing this, we were able to pull in ZFS technology. This gives us the ability to stripe RAID sets together, migrate data around, and add new RAID sets to existing volumes without rebuilding.
We’ve also updated all the hardware, increased performance 25% and kept our same great low price model. You get more for your money.
The Titanium 4 has also been extensively improved based on customer feedback. ZFS performance is so good, we ditched the need for a RAID controller in the new T5. At the same time, we added a 5th drive (more storage, more performance) and allowed for the addition of a dual port 10GbaseT card. So now, not only is the device mobile, fast and inexpensive, it also supports direct attaching with 10Gb Ethernet! You can bring along one of our ThunderNET boxes on your shoot and have your laptop editing over 10Gb Ethernet right out in the field.
Lastly, I’ve had tons of people bugging me about SSDs and 10Gb. I demoed a super fast box at the Atlanta Cutters called “Titanium Extreme” and we showed off real time video playing to my laptop (over Dual 10Gb ports) going 1.2GBytes/sec (not a benchmark; real video). We’ll have this guy along as well.
So if you want to stop by and visit us and see all this cool stuff, swing down to South Lower (SL6005). You can’t miss us. We will have a giant round screen hanging above us with all sorts of amazing stuff flying by put together by Walter Biscardi. We’d love to see you.
| | | | |
| | | | Every year as NAB approaches, the marketing once again begins. Oh the marketing....
As NAB approaches, I'd like to take a moment to remind people in the market for storage that Gigabytes/second is not what makes video play smoothly.
Vendors with no Computer Engineers on staff will pull together monstrous conglomerations of SSDs and RAID cards, run a few benchmarks (probably four or five different ones until they find one they like) and then claim they've hit some huge number of Gigabytes per second.
Small Tree has been supporting Server based video editing longer than anyone in the market. We were supporting Avid when they used SGIs 10 years ago (and they were SGI's largest customer). We know how things work. We helped develop them.
Playing video requires a RAID configuration that can handle multiple, clocked streams. Benchmarks, on the other hand, tend to use a single stream, reading sequentially as fast as they can.
What's the difference you ask? Well, in the sequential case, the RAID controller gets to use lots of tricks to avoid the hard work of seeking around disks and reordering commands. The next block to be requested is probably the next block on disk, so things like "read ahead" work wonderfully. Don't just read the next 128k, read the next 1MB! It'll all be read next anyhow. It makes it very easy for sequential benchmarks to look good. In the Supercomputing world, meaningless TeraFLOP marketing numbers were referred to as "MachoFLOPS". We knew they meant nothing when vendors could spin assembly instructions in a tight loop and claim 1.5PetaFLOPS.
Small Tree's testing and development involves looking carefully at how the Video Editing Programs themselves read so we can carefully mimic that traffic during testing. This lets us be sure our equipment doesn't rely on sequential tricks to deliver real, multi-stream performance.
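To make the difference between the two access patterns concrete, here is a rough Python sketch. It is not our test tooling; the function names, block size and stream spacing are all assumptions chosen for illustration. It generates the order of read offsets for a single sequential reader and for several streams taking turns, then counts how many requests do not pick up where the previous one left off (the ones read ahead cannot help with).

```python
def sequential_offsets(n_requests, block):
    """One benchmark-style reader: every request follows the previous one."""
    return [i * block for i in range(n_requests)]


def multistream_offsets(n_streams, requests_per_stream, block, stream_spacing):
    """Several clocked streams taking turns, each reading its own region."""
    offsets = []
    for i in range(requests_per_stream):
        for s in range(n_streams):
            offsets.append(s * stream_spacing + i * block)
    return offsets


def count_seeks(offsets, block):
    """Count requests that do not start where the previous request ended."""
    return sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur != prev + block)


if __name__ == "__main__":
    block = 128 * 1024  # 128k requests, as in the read ahead example
    single = sequential_offsets(32, block)
    multi = multistream_offsets(4, 8, block, stream_spacing=1024 ** 3)
    print(count_seeks(single, block))  # 0
    print(count_seeks(multi, block))   # 31
```

Both runs issue the same 32 requests. The sequential one never has to seek, while the four interleaved streams force a jump on every single request.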
So when you walk up to a vendor at NAB and they start telling you about their MachoGigabytes per second, make sure you ask them about their sustained latency numbers. Small Tree knows all about latency and we back it up every day with our products. | | | | |
| | | | Very recently, Small Tree had the opportunity to go down to Atlanta and visit Walter Biscardi and upgrade his data center and edit suites. In conjunction with this trip, we also did a presentation on the upgrade for the Atlanta Cutters and showed off a new SSD based Titanium shared storage system we put together. This new Titanium SSD was able to move 1.2GB/sec of *realtime* video to Adobe Premiere with no dropped frames. This is faster than you can go with 8Gb Fibre Channel and the fastest realtime video I've ever seen displayed live without a net!
The upgrade involved pulling out Walter's existing SFP+ 10Gb switch, which had a mix of Gigabit SFP modules for his suites and 10Gb SFP+ modules for his server, and replacing it with a 10GbaseT switch from Small Tree that had 4 SFP+ ports (for the server) and 24 10GbaseT ports for the new Titanium and some of his edit suites.
Before we dived right into putting in the new switch and adding the Titanium 8, we spent a lot of time talking about power. Walter didn't want to spend $1000 for an expensive UPS, but he wanted a good UPS that could handle the new load and not break the bank. We settled on an Ultra Xfinity that offered 1200W of load. This allowed for plenty of overhead for the 660W Titanium and kept the loading on the UPS to well under the recommended 80%.
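The UPS sizing above is simple arithmetic, sketched here in Python. The helper names are hypothetical; the 660W, 1200W and 80% figures come straight from the paragraph above.

```python
def ups_load_fraction(load_watts: float, ups_capacity_watts: float) -> float:
    """Fraction of the UPS's rated capacity consumed by the attached load."""
    if ups_capacity_watts <= 0:
        raise ValueError("UPS capacity must be positive")
    return load_watts / ups_capacity_watts


def within_recommended_load(load_watts, ups_capacity_watts, limit=0.80):
    """True when the load stays at or under the recommended fraction."""
    return ups_load_fraction(load_watts, ups_capacity_watts) <= limit


if __name__ == "__main__":
    fraction = ups_load_fraction(660, 1200)
    print(f"{fraction:.0%}")                   # 55%
    print(within_recommended_load(660, 1200))  # True
```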
After installing the new switch, we moved all the cables over. One of the wonderful aspects of 10GbaseT is that we didn't have to do anything special when replacing ports that used to be Gigabit. 10GbaseT clocks down to Gigabit and even 100Mbit. So there was no trouble with legacy equipment or special adapters.
Once the switch was in, we turned to the Titanium 8. We installed it and plugged it into its new UPS and cabled it into the switch. We bonded the two 10GbaseT ports coming from the Titanium so it would load balance all the incoming clients.
Once that was done, it was time to upgrade some of the more important edit suites to 10GbaseT. What good is having all that 10Gb goodness in the lab when you can't feel the power all the way to the desktop? We upgraded both of Walter's iMac systems to 10Gb (via ThunderNET boxes) and added another 10Gb card to his fastest Mac Pro in Suite 1.
The result was a cool 300MB/sec writing from his iMac and 600MB/sec reading using the AJA System Test. As I tell people, this isn't the best way to measure NAS bandwidth because applications like Final Cut and Adobe use different APIs to read their media files.
With the NAB Show approaching, I hope many of you that are planning to attend will be able to swing by Small Tree’s booth (SL6005) to learn more about this recent install directly from Walter, as he’ll be on-hand. While you’re there, feel free to ask about the SSD based Titanium shared storage solution we’re “going plaid” with.
If you’d rather not wait until NAB to learn more, contact me at modica at small-tree.com | | | | |
| | | | If you use Adobe Premiere for post-production projects, then you may have come across the problem where the application re-conforms and recreates peak files for projects every time the project opens or when changing back to Adobe from using another application, i.e. hide/show.
If you see this problem occurring, check that all systems are time synched with each other (i.e., NTP is enabled in the Date & Time preference panel). If you do not have internet access on the systems, make sure all the systems are at least locked to the server time.
The cause of this problem is that Adobe is creating new files on the server. If the server has a time that is "in the future" - relative to your client - Adobe Premiere will see these future files and decide that the project is out of date. Then it will want to rebuild all the files. When using any application - including Final Cut, Adobe Premiere or Avid across a NAS or SAN - it is imperative that all of the machines agree on the time and date so the applications can tell when a project is up to date and when it needs to be updated.
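One way to confirm you are hitting this is to look for files on the share whose timestamps are ahead of the client's clock. The following Python sketch does that. It is only an illustration: the function name and the two second tolerance are my own assumptions, and it has nothing to do with how Adobe Premiere itself checks files.

```python
import os
import time


def files_from_the_future(root, now=None, tolerance_seconds=2.0):
    """Return (path, seconds_ahead) for files stamped later than the local clock."""
    now = time.time() if now is None else now
    ahead = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime - now > tolerance_seconds:
                ahead.append((path, mtime - now))
    return ahead
```

Run it from a client against the mounted project folder. If it returns anything, the server's clock is ahead of that client's and the two need to be synchronized.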
The issue above will also occur with other applications in a network based client-server environment. Using the Small Tree Titanium Server as an example, one can set up NTP so that it can synchronize its clock the same way and to the same NTP server as the Mac clients do.
To enable NTP client code on Titanium Server:
- Log in to the Titanium GUI.
- Ensure that the Titanium has a Network Interface defined on your local network or intranet, with an IP address that is either statically defined or obtained via DHCP.
- Define a default router and DNS server using the SETUP -> network -> interfaces GUI area. If you need to add a default router, make sure you apply the IP address of the router to the appropriate network interface.
- Go to HARDWARE -> setup, find the "Current server time" area and set/click on "Use ntp", then define the NTP server (time.apple.com, for example) and "apply".
- If you have the network IP address, default route and DNS set up properly, it will come back with the correct Date/Time. If not, you will need to revisit your network settings and correct them.
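If you want to check from a client that the NTP server is reachable and see roughly how far off the local clock is, a bare-bones SNTP query can be sketched in Python as follows. This is a minimal sketch, not part of the Titanium software: the function names are made up, and the offset ignores network round-trip delay, so treat the result as approximate.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def build_sntp_request():
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp (bytes 40-47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2 ** 32


def clock_offset(server, timeout=5.0):
    """Rough offset (server minus local clock) in seconds; ignores network delay."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _addr = sock.recvfrom(512)
    return parse_transmit_time(packet) - time.time()
```

Calling `clock_offset("time.apple.com")` on each client should give values close to zero once everything is synchronized. A timeout usually points back at the router or DNS settings described in the steps above.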
If you’re having workflow issues with Adobe Premiere and your server, contact me at info@small-tree.com.
| | | | |
| | | | Storage is a tough market and customers are always willing to pay a little less to get a little less. My takeaway is this: In the war between Ethernet and EtherNOT based storage, such as Fibre Channel, the one that delivers the best value for the lowest price is going to win. As Warren Buffett likes to say, "In the short term, the market is a popularity contest. In the long term, it's a weighing machine." People need to buy based on value over time.
Fibre Channel has been hamstrung for a long time by its need for custom ASICs (chips used to implement the protocol in hardware). Fibre Channel wanted to overcome all of the limitations of Ethernet and so they invented a protocol that did just that. The problem of course is that those custom ASICs are not on motherboards. You don't get FC chips built into your Dell server (unless you order a special card or riser). You don't see Apple putting FC chips on Mac Pros (even though they sold Xsan and XRAID for so long).
What's the result? Expensive chips. It's expensive to fab them and expensive to fix them. FC stuff is expensive. Vendors may find ways to lower the entry point, but somewhere or other, either via support, licensing or upgrades, the cost will be high.
Ethernet certainly has ASICs as well. There are network processors, MAC (media access control) chips and PHY chips (the chips that implement the physical layer). They can be incredibly expensive. The first 10Gb cards Small Tree sold were $4770 list price! But here's the thing...a 10Gb card today is $1000 or less. The chips are everywhere and they are rapidly going onto motherboards. Ethernet is truly ubiquitous and will continue to be for server and storage technologies.
If you'd like to discuss or debate Ethernet vs EtherNOT, send me an email at info@small-tree.com or hit me up on Twitter @svmodica. | | | | |
| | | | Not too long ago, I was asked to write up my predictions on storage and networking technology for the coming year. One of those predictions was the rise of new, combined file system/logical volume managers like ZFS and Btrfs.
These file systems don’t rely on RAID cards to handle things like parity calculations. They also don’t “hide” the underlying drives from the operating system. The entire IO subsystem - drives and controllers - is available to the operating system and data is laid out across the devices as necessary for best performance.
As we’ve begun experimenting ourselves with these technologies, we’ve seen a lot of very promising results.
First and foremost, I think it’s important to note that Small Tree engineers mostly came from SGI and Cray. While working there, most of our time in support was spent “tuning.” People wouldn’t buy SGIs or Crays simply to run a file server. Invariably, they were doing something new and different like simulating a jet fighter or rendering huge 3D databases to a screen in real-time. There would always be some little tweak required to the OS to make it all work smoothly. Maybe they didn’t have enough disk buffers or disk buffer headers. Maybe they couldn’t create enough shared memory segments.
Small Tree (www.small-tree.com) has always brought this same skill set down to commodity hardware like SATA drives and RAID controllers, Ethernet networks and Intel CPUs. These days, all of this stuff has the capability to handle shared video editing, but quite often the systems aren’t tuned to support it.
I think ZFS is the next big step in moving very high-end distributed storage down into the commodity space.
Consider this: A typical RAID card is really an ASIC (Application Specific Integrated Circuit). Essentially, some really smart engineering guys write hardware code (Verilog, VHDL) and create a chip that someone “prints” for them. SGI had to do this with their special IO chips and HUB chips to build huge computers like the Columbia system. Doing this is incredibly expensive and risky. If the chip doesn’t work right in its first run, you have to respin and spend millions to do it again. It takes months.
A software based file system can be modified on the fly to quickly fix problems. It can evolve over time and integrate new OS features immediately, with little change to the underlying technology.
What excites me most about ZFS is we can now consider the idea of trading a very fast - and expensive - hardware ASIC for a distributed file system that uses more CPU cores, more PCIE lanes and more system memory to achieve similar results. To date, with only very basic tuning and system configuration changes, we’ve been able to achieve Titanium level performance using very similar hardware, but no RAID controller.
So does this mean we’re ready to roll out tomorrow without a RAID controller?
No. There’s still a lot of work to do. How does it handle fragmentation? How does it handle mixed loads (read and write)? How does it handle different codecs that might require hundreds of streams (like H.264) or huge codecs that require very fast streams (like 4K uncompressed)? We still have a lot of work to do to make sure ZFS is production ready, but our current experience is exciting and bodes well for the technology.
If you’d like to chat further about combined file system/logical volume managers, other storage/networking trends, or have questions regarding your workflow, contact info@small-tree.com.
| | | | |
| | | | When I used to work at SGI, I would often wonder what "C" level officers did. I once got to ask Ed McCracken what he spent most of his time doing day-to-day. At the time, he was CEO of SGI.
His answer was that he was currently spending a lot of time talking to Congressmen trying to convince them to stop propping up Cray as a national asset. In hindsight, perhaps buying Cray was not the best idea.
As the Chief Technical Officer of Small Tree, which is a much smaller company, I have to wear a lot more hats. I thought I might include a list of the things I've been up to over the last month.
Deer hunting (actually, just watching this year)
Grocery shopping
Barbecuing
Evaluating Titanium follow on chassis designs
Helping select next generation Software Defined Radio development platforms for the Army
Working on Adobe performance issues
Evaluating a new Avid sharing product (that works great!) called Strawberry
Evaluating a new Digital Asset Manager (that also works great) called Axle
Discussing our new high performance iSCSI products with partners
Fixing the phone system
Testing Thursby's Dave software with Avid
Helping customers with Small Tree products
Running barefoot (I run barefoot and in Vibrams.... a lot)
Working on a new voice router design for the US Rangers
Helping my kids with math homework
Processing firewood for the winter
Breaking up the recyclable cardboard boxes
Writing up an NAB presentation proposal
Prepping for a visit from the Soldier Warrior team of the US Army
Small Tree Board of Directors meeting
Christmas shopping
There’s never a dull moment.
| | | | |
| | | | Not the Star Wars kind tho...
Back when cell phones were new, a number of vendors had "clone" problems. People were cloning phone serial numbers so they could get free cell service.
To combat this problem, the cellular companies built up "Clone Detector" systems. These were massive database servers that had to be extremely fast. They would monitor all in-process calls looking for two that had the same serial number. If they found a match, that phone was cloned and both were taken out of service.
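Stripped of the database and the scale, the core check is just "does any serial number show up in more than one call right now?" Here is a toy Python version of that idea. The data layout and names are invented for illustration and bear no relation to how the real systems were built.

```python
from collections import defaultdict


def find_cloned_serials(active_calls):
    """Return serial numbers that appear in more than one in-process call.

    active_calls is an iterable of (call_id, serial_number) pairs.
    """
    calls_by_serial = defaultdict(list)
    for call_id, serial in active_calls:
        calls_by_serial[serial].append(call_id)
    return {serial: ids for serial, ids in calls_by_serial.items() if len(ids) > 1}


if __name__ == "__main__":
    calls = [("call-1", "A100"), ("call-2", "B200"), ("call-3", "A100")]
    print(find_cloned_serials(calls))  # {'A100': ['call-1', 'call-3']}
```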
SGI's systems were uniquely qualified to handle this work. The company had some stellar Oracle and Sybase numbers and offered these vendors a 10X speed up in clone detection.
The phone call came in from Florida during the test phase of the new system. The sysadmin called me up and told me that when she dumped a 25% load on the system, it slowed down very quickly. If she put a full load on the system, it stopped.
This was puzzling. I'm not a database expert, so I spent time looking at the normal performance metrics. How busy are the CPUs? Not very. How busy are the (massive) RAID arrays? Not very. How much memory is in use? Not much. Nothing was adding up.
I started watching the machine’s disk activity during the 25% load. I noticed one disk was very busy, but it was not the RAID and shouldn't have been slowing the machine down. I asked the sysadmin about it. She said it was the disk with her home directory on it and it shouldn't be interfering with the machine’s database performance. That answer nagged at me, but she was right. If the database wasn't touching the disk, why should it matter? But how come it was so busy? There was a queue of pending IOs for Pete's sake! Was she downloading files or something?
I asked her if I could take a look at the index files. Index files are used by a database to keep track of where stuff is. Imagine a large address book. I wanted to see if the index files were corrupted or "strange" in any way. I thought maybe I could audit accesses to the index files and spot some rogue process or a corrupt file.
What I found were soft links instead of "real" files. For you Windows people, they are like Shortcuts. On a Mac you might call them Aliases. She had the index files "elsewhere" and had these aliases in place to point to them. She told me "Yeah. I do this on purpose so I can keep a close eye on the index files. I keep them.... in... my... home... oh!"
So her ginormous SGI system with hundreds of CPUs and monstrous RAIDs was twiddling its thumbs waiting for her poor home directory disk to respond to the millions and millions of index lookups it could generate a second. It fought heroically, but alas, could not keep up.
Some quick copies to put the index files where they belonged and we had one smoking clone detector system.
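The audit that would have caught this in seconds is a check for soft links that resolve outside the directory they live in. A small Python sketch of that check follows. The function name is hypothetical, and this is not what I ran at the time.

```python
import os


def misplaced_index_files(index_dir):
    """List (link, real_path) for entries in index_dir that are soft links
    resolving to somewhere outside index_dir."""
    base = os.path.realpath(index_dir)
    found = []
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if os.path.islink(path):
            target = os.path.realpath(path)
            # commonpath equals base only when target lives under index_dir
            if os.path.commonpath([base, target]) != base:
                found.append((path, target))
    return found
```

Anything it reports is a file the database thinks is local but that actually lives on some other disk, such as a home directory.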
| | | | |
| | | | Many years ago when I was a "smoke jumper" support guy for SGI, I got to see some of the strangest problems on the planet. Mind you, these were not "normal" problems that you and I might have at home. These were systems that were already bleeding edge and being pushed to the max doing odd things in odd places. Further, before I ever saw the problem, lots of guys had already had a shot. So reinstalling, rebooting, looking at the logs, etc., had all been tried.
One of my favorite cases was a large Challenge XL at a printing plant. It was a large fileserver and was used for storing tons and tons of print files. These files were printed out, boxed and shipped out on their raised dock.
Each night, the machine would panic. The panics would happen in the evening. The machine was not heavily loaded, but the second shift was getting pissed. They were losing work and losing time. The panics were all over the place - Memory, CPU, IO boards. By this time, SGI had replaced everything but the backplane and nothing had even touched the problem. The panics continued.
Finally, in desperation, we sent a guy onsite. He would sit there with the machine until the witching hour to see what was going on. Maybe a floor cleaner was hitting the machine or there were brownouts going on. We felt that if we had eyes and ears nearby, it would become obvious.
Around 8pm that night, after the first shift was gone and things were quiet, our SSE got tired of sitting in the computer room and walked over to the dock. He was a smoker and he wanted to get one more in before the long night ahead. The sun was going down, making for a nice sunset as he stood out there under the glow of the bug zapper (this happened in the south where the bugs can be nasty).
As he watched, a fairly large moth came flitting along and orbited the bug zapper a few times before *BZZZZZZT* he ceased to exist in a dazzling light display. It was at that moment when the Sys Admin (who was keeping an eye on the machine during our SSE's smoke break) yelled over to him "HEY! The machine just went down again".
Yes folks, the bug zapper was sharing a circuit with the SGI machine. One large insect was enough to sag the circuit long enough to take the machine right down. Go figure.
| | | | |
| | | | I remember the days when CPUs were stuck in a rut. They were barely hitting 1GHz. Networks were running at 1Gb and beyond and CPUs and storage just could not keep up. Clients wanted redundant, failover capable servers that could handle 600 clients, but SGI was running out of ways to do that. We couldn’t make the bus any wider (128bit computers?) and we couldn’t make the CPUs any faster. What should we do?
One answer was to network many systems together over NUMA (non-uniform memory access). This would let many systems (that would normally be a cluster) act as if they were one system. The problem with a setup like this is speed. Systems accessing remote memory are slow. We had to find a way to speed up access to memory.
SGI invented lots of cool stuff to do this.
One of the new things was the CPOP connector. This connector was made up of many fuzzy little pads. The fuzzy pads would be compressed together and allow for a much higher frequency connection than normally would be allowed with gold pins.
The problem with delicate things like this is that they are far more sensitive to installation mistakes. Each connector needed to be torqued down to the right pressure so that the signals made it across cleanly. Install them too loosely and you’re going to see connectivity errors.
So cut to one of our advanced training courses where we taught field engineers how to replace boards. The instructor explained how each HEX head screw on the CPU cards needs to be torqued down to an exact specification. This is where one of the helpful field guys, who had clearly done this before, piped up and explained that you know the boards are properly seated when you tighten the HEX screw down and hear three “clicks.”
The instructor and I looked at each other. We didn’t remember there being three clicks. We normally used torque drivers to accurately measure the torque and there were never any clicks.
After some investigation and a quick examination of the board our helpful field guy had just installed, we discovered the source of the “three clicks.” They were the sound of the very expensive backplane cracking as the HEX screw penetrated the various layers of plastic…. OUCH.
From that point on, correct torque drivers were provided to all field personnel.
| | | | |
| | | | Ever since Apple blew up the happy world of FCP 7, we've been running into more and more people moving to Adobe and Avid.
Adobe's been pretty good. I like them a lot and their support guys (I'm talking to you Bruce) have been awesome.
Avid, on the other hand, is tough. Shared spaces cause reindexing and external projects won't save natively to shared spaces. Our customers have mostly worked around this by storing projects locally, using multiple external volumes for media, or using AMA volumes.
I finally had a chance to explore this External project save issue in great detail today.
It turns out there's nothing specific Avid's doing that would prevent them from saving a project externally. They stat the file a few times and give up. It appears as if they are simply not allowing saves to shared protocol volumes (Samba and AFP).
My solution was simple: I created a sparse disk image on the shared storage, then mounted it locally.
This worked great. I could point my external project to it and it would save correctly. I could link in my AMA files and use them, and I could trust that OS X wasn't going to let anyone else mount that volume while I was using it! When I'm done, I exit and unmount the volume. Now anyone else on the network can mount it and use my project (you can even have multiple users by doing read only mounts with hdiutil).
Further, this method can be adapted to just about any granularity you'd like.
For example, if your users hate the idea of creating disk images for every project, just create one very large (up to 2TB) disk image for their entire project library. You can automount that and just let them use it all the time. You could also have per project, per user or per customer images as well.
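For anyone who wants to script the disk image trick, here is a rough Python sketch that builds the hdiutil commands involved. The helper names, image path, size and volume name are placeholders I made up; adjust them for your own shares. The sketch only assembles the command lines, and `run()` is there if you want to execute them on a Mac.

```python
import subprocess


def create_sparse_image_cmd(image_path, size, volume_name):
    """hdiutil command to create a sparse, journaled HFS+ image up to `size`."""
    return ["hdiutil", "create", "-size", size, "-type", "SPARSE",
            "-fs", "HFS+J", "-volname", volume_name, image_path]


def attach_cmd(image_path, read_only=False):
    """hdiutil command to mount the image, optionally read only."""
    cmd = ["hdiutil", "attach"]
    if read_only:
        cmd.append("-readonly")
    cmd.append(image_path)
    return cmd


def detach_cmd(mount_point):
    """hdiutil command to unmount the image when the editor is done."""
    return ["hdiutil", "detach", mount_point]


def run(cmd):
    """Execute one of the commands above; raises if hdiutil reports an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path on a mounted share; nothing is executed here.
    image = "/Volumes/Shared/Projects/ShowA.sparseimage"
    print(" ".join(create_sparse_image_cmd(image, "500g", "ShowA")))
    print(" ".join(attach_cmd(image)))
    print(" ".join(attach_cmd(image, read_only=True)))
    print(" ".join(detach_cmd("/Volumes/ShowA")))
```

A sparse image only takes up as much space on the share as the data written into it, so picking a generous maximum size costs nothing up front.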
Let's face it. Storage is expensive and we all know what disks and motherboards cost. No one wants to pay three or four times what this stuff costs for a vendor specific feature. Hopefully, this trick makes it easier for you to integrate Avid workflows into your shop if the need arises.
Steve | | | | |
| | | | The worst possible answer to a customer problem is that it’s a hardware bug. Hardware bugs are expensive to fix. You not only have to replace the hardware, you may also have to replace everything you’ve got on the shelves. You can’t do this until you’ve “fixed” the problem, which might cost millions of dollars and take months. Hardware problems suck.
This reminds me of a specific problem from long ago dealing with locking.
Locking is what programs do to avoid stepping on each other. It’s very similar to locking the bathroom door. When the bathroom is in use, the door is locked. Others wanting to use the bathroom will try the door, see that it’s locked, and try again later.
Our problem was that applications and even the kernel were crashing with what appeared to be two threads using the same resource. Thread B would have the lock, but thread A was in the same code acting as if it also held the lock. This normally is not possible. Thread A could not have gotten where it was without having the lock, and Thread B should have stopped when it hit the lock being held by A. There were no obvious errors in the code that could have allowed this.
Ultimately, we discovered the answer hinged on two assembly instructions and a new feature in the CPU called “speculative execution.”
The instructions were LL, SC (load linked, store conditional). LL, SC is a neat concept that makes locking very fast. You “read” the lock in (LL) and “store” your lock value if the lock is free (store conditionally). If the lock is not free, the store does not happen and your code vectors back around to try again later.
Speculative execution is a feature that allows a CPU to execute ahead of itself. The CPU will pull in instructions that are coming up (and assume any if/then/else branches) and sort out what will probably happen in the next 32 instructions. In this way, the CPU can have all these values calculated and queued up for rapid and efficient execution.
So what was the problem? Turns out that if the CPU was in the middle of an LL/SC lock and happened to speculate over another LL/SC lock, the state of the first lock was overwritten by the second. So speculating over an unlocked lock while checking a locked one led you to continue on as if your lock was “unlocked”. The “SC” instruction would succeed when it shouldn’t have.
This is really a hardware issue. The hardware shouldn’t do this. However, replacing every CPU would be expensive and “hard.” The solution? It’s a compiler bug. The compilers were changed so that when applications and even the OS were compiled, 32 “no op” instructions were inserted after each and every LL/SC occurrence. This made sure that any speculative execution would never hit a second LL/SC combo since the CPU never went out past 32 instructions. Problem solved…. I guess.
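To make the failure easier to picture, here is a toy Python model of the behavior as described above. It is deliberately simplified and is not a model of real MIPS LL/SC semantics or of the actual silicon: the class, its single "saw the lock free" flag and the `speculated_lock` parameter are all assumptions made for illustration.

```python
class ToyCPU:
    """Toy CPU that remembers only ONE "was the lock free?" result."""

    def __init__(self, buggy_speculation):
        self.buggy_speculation = buggy_speculation
        self.saw_free = False  # result of the most recent LL the CPU remembers

    def load_linked(self, locks, name, speculative=False):
        """Read a lock word (0 = free, 1 = held) and remember if it was free."""
        if speculative and not self.buggy_speculation:
            return locks[name]  # a correct CPU keeps the earlier LL state intact
        self.saw_free = locks[name] == 0
        return locks[name]

    def store_conditional(self, locks, name):
        """Take the lock only if the remembered LL saw it free."""
        if not self.saw_free:
            return False
        locks[name] = 1
        self.saw_free = False
        return True


def try_acquire(cpu, locks, name, speculated_lock=None):
    """LL our lock, optionally speculate over another lock's LL, then SC."""
    cpu.load_linked(locks, name)
    if speculated_lock is not None:
        cpu.load_linked(locks, speculated_lock, speculative=True)
    return cpu.store_conditional(locks, name)


if __name__ == "__main__":
    locks = {"A": 1, "B": 0}  # A is held by another thread, B is free
    print(try_acquire(ToyCPU(True), dict(locks), "A", speculated_lock="B"))   # True
    print(try_acquire(ToyCPU(False), dict(locks), "A", speculated_lock="B"))  # False
    print(try_acquire(ToyCPU(True), dict(locks), "A"))                        # False
```

The first call is the bug: the store succeeds on a lock that is already held. The last call is the compiler workaround in miniature: with the padding in place the CPU never speculates into a second LL, so even the buggy CPU behaves.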
| | | | |
| | | | Military Muscle
Working with the military can be a lot of fun. It can be exhilarating. It can also be incredibly frustrating and boring.
Recently, we had a problem with a large server system being used by a military contractor at a very sensitive base. The server was periodically losing its boot drive. The drive would spin down and stop (hanging the machine). They would reboot and the system would run for a while, then spin down again and stop. Eventually, the drive would stop working altogether. It would not spin up any more.
The solution is simple, right? Just get the drive and send it to the manufacturer - they will figure out what went wrong. Sorry…no can do. These drives are super secret and they must be shredded. They cannot be returned.
Management is up in arms. "Steve" they said, "you need to take your SCSI analyzer down there and plug it in to this system and see what's going on."
I had three good reasons not to like this idea:
1. Clearly, this is a hardware problem. Software does not "fry" drives so they will not come up again. Were they thinking the drive was just demoralized from bad SCSI commands? It was broken!
2. Connecting my SCSI analyzer to a super secret system is simple enough, but the military isn't going to let me take it home again! Once it's been plugged in, it's theirs. It took me a long time to justify getting that thing and I really didn't want to give it to this military contractor.
3. Whatever was blowing up their boot disks was just as likely to blow up my SCSI analyzer! That's not a very good idea either.
Not sure of the best way to approach this, I walked over to another well-known smoke jumper guy. This guy fixed hardware. He fixed it the way only a former military guy could. He took it apart down to every nut and bolt, he smelled cables for ozone (indicating something burned) and he examined boards with a portable microscope. He also had one other thing that is required for hardware debugging…a sixth sense that just told him what was wrong.
Me: "Phil, I've got this system that's blowing up drives. They've replaced everything and they want me to put my analyzer on there, but I think this is hardware. You have any suggestions?" Phil: “It's a burned cap on the midplane, have them replace it.”
The SSE was quick to shoot this down when we talked: "No way! We've replaced the midplane twice. It can't be the midplane."
This is when "smoke jumping" happens. Phil gets on a plane and flies down.
What does he discover? It is indeed a burned cap on the midplane. The cause? The foil shielding behind the midplane is not mounted properly and is "angled" – resulting in the foil touching a 5V pin. As soon as they powered on the machine with the new midplane, “ZAP,” another midplane is fried.
They fixed the foil, put in a new midplane, and me and my SCSI analyzer lived to fight another day.
| | | | |
| | | | Just about everyone can have a free web page. You get them free when you open cloud accounts or purchase internet service. This has led to a proliferation of cat pictures on the Internet.
Back in the 90s, when it cost a little more to get on the Internet, the idea of personal web pages was just beginning. One very large ISP (Internet Service Provider) that used SGI systems wanted to sell personal websites. They felt SGI's Challenge S system was the perfect solution. They would line up hundreds of these systems, and each system could handle several sites. SGI did indeed set several website access records for handling the website for "Showgirls,” which, as you can imagine, had a racy website.
Fast forward a few months and there are 200 systems lined up in racks handling personal web pages. Then I start getting phone calls.
"Hey Steve. These guys are filing cases about two or three times a week to get memory replaced. We're getting parity errors that cause panics about two or three times a week."
I fly out and start looking carefully at the machines. The customer had decided to purchase third party memory (to save money) so they could max out the memory in each system. Each machine had 256MB of RAM, which was a lot at the time. This was parity memory, which means that each 8 bits has a parity bit that is used like a cheap "double check" to make sure the value stored is correct. The parity bit is set to a 1 or 0 so that each 8 bits, together with its parity bit, always has an even number of 1s in it. If the system sees an odd number of 1s, it knows there's a memory error.
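The parity check described above can be sketched in a few lines of Python. This is only an illustration of even parity on one byte; the function names are made up, and real memory controllers do this in hardware.

```python
def even_parity_bit(byte):
    """Return the parity bit that makes the 9 bits hold an even number of 1s."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in 8 bits")
    return bin(byte).count("1") % 2


def parity_ok(byte, parity_bit):
    """True if the stored byte plus its parity bit still has an even number of 1s."""
    return (bin(byte).count("1") + parity_bit) % 2 == 0


stored = 0b10110010                  # four 1s, so the parity bit is 0
parity = even_parity_bit(stored)

one_flip = stored ^ 0b00000100       # a single bit flips in memory
two_flips = stored ^ 0b00000110      # two bits flip in memory

single_error_caught = not parity_ok(one_flip, parity)
double_error_missed = parity_ok(two_flips, parity)
```

A single flipped bit is caught, but two flipped bits in the same byte cancel out and slip through, which is why parity is only a cheap double check: it detects an error but cannot say which bit is wrong or fix it.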
I looked at each slot. I looked at ambient temperature. I made sure the machines were ventilated properly (including making the customer cover all the floppy disk holes since they did not have floppies installed, but had neglected to install the dummy bezel). No change. Parity errors continued and clearly there was an issue.
Going back to the memory vendor and the specs on the chips, we started doing the math.
The vendor claimed that due to environmental issues (space radiation etc) one should expect a single bit parity error about once every 2000 hours of uptime for each 32MB of memory. Half of these errors should be "recoverable" (i.e., the data is being read and can be read again just to be sure), but the other half will lead to a panic. They do not mean the memory is broken, but the errors should be rare.
So let's do the math: 256MB/machine (so that's 8 x 32MB).
Hours of uptime? (These machines are always up): 8,760 hours per year
How many total parity errors? 8 x 8,760 / 2,000, or about 35 per system, per year, with half of them being "fatal." So, that’s 17 panics per system per year. They had 200 systems. That's 3,400 panics a year in that group of systems, or roughly 65 per week. That is more than 9 every day?!
Consider this when you start to scale up your IT systems. How many machines do you have to put in a room together before "once a year" activity becomes "once a day"?
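The same arithmetic as a small Python sketch, so you can plug in your own fleet size. The function and parameter names are made up for illustration; the rates are the vendor's figures quoted above, and nothing here is rounded along the way.

```python
HOURS_PER_YEAR = 8760


def fleet_panics_per_year(systems, ram_mb,
                          hours_per_error=2000,   # vendor: one error per 2000 hours...
                          mb_per_error_unit=32,   # ...for each 32MB of memory
                          fatal_fraction=0.5):    # vendor: half are unrecoverable
    """Expected parity-error panics per year across a fleet of identical machines."""
    units_per_system = ram_mb / mb_per_error_unit
    errors_per_system = units_per_system * HOURS_PER_YEAR / hours_per_error
    return systems * errors_per_system * fatal_fraction


panics_per_year = fleet_panics_per_year(systems=200, ram_mb=256)
panics_per_week = panics_per_year / 52
panics_per_day = panics_per_year / 365
```

Without rounding, 200 machines with 256MB each come out to about 3,500 expected panics a year. That is on the order of 65 to 70 a week, or 9 to 10 every single day.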
| | | | |
| | | | You've all read that 10GBASE-T is on the way. It's true. Very soon, you will be able to plug standard RJ45 connectors (just like on your MacBook Pro) into your 10Gb Ethernet cards and switches. You'll be able to run CAT6A cable 100m (assuming the runs are clean runs) and have tons and tons of bandwidth between servers and clients. Who needs Fibre Channel anymore?!
But with the widespread migration to 10Gb, you may have a plumbing problem, my friend.
Many years ago, I had the privilege of supporting three of the large animation studios in LA that were trying to use their new RAID5 arrays and run OC-3 and OC-12 right to their desktops. These two ATM standards were capable of 155Mbits/sec and 622Mbits/sec, respectively (this was before the days of Gigabit Ethernet). Everyone expected nirvana.
They didn't get nirvana. In fact, they found out right away that three clients ingesting media could very quickly "hang" their server. Within about 30 minutes it would slow to a crawl and sit there. They could not shut it down. Shutdown would hang. What was really happening? The machine had used all of its RAM collecting data and was unable to flush it quickly enough to their RAID. The machine was out of IO buffers and almost completely out of kernel memory. The "hang" was simply the machine doing everything it could to finish flushing all this unwritten data. We had to wait (and wait and wait).
Further, we discovered that with only three clients we could quickly start generating dropped packets. ATM had no flow control and so too many packets at once would result in dropped packets. Since the clients were very fast relative to the server, it didn't take more than a few to overwhelm it.
Similarly, as we all start to salivate over 10Gb to our MacBooks, iMacs and refrigerators, we should consider how we're going to deal with this massive plumbing problem.
First, you *will* need some form of back pressure. The server must be able to pause clients (and vice versa) or these new 300MB/sec flows are going to overwhelm all sorts of resources on the destination system.
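Here is a rough sketch of why back pressure matters, written as a tiny tick-by-tick simulation in Python. Everything in it is an assumption made for illustration: the function name, the rates (a sender pushing 300 units per tick into a server that can only flush 100), and the 600-unit buffer. It does not model any real Ethernet pause mechanism, only the idea of the receiver telling the sender to wait.

```python
def simulate(ticks, send_rate, drain_rate, buffer_size, pause_enabled):
    """Push data at a slow receiver, with or without a pause signal."""
    buffered = dropped = delivered = paused_ticks = 0
    for _ in range(ticks):
        free = buffer_size - buffered
        if pause_enabled and free < send_rate:
            # Back pressure: the receiver has no room, so the sender sits out a tick.
            paused_ticks += 1
            arriving = 0
        else:
            arriving = send_rate
        accepted = min(arriving, free)
        dropped += arriving - accepted      # whatever does not fit is lost
        buffered += accepted
        drained = min(buffered, drain_rate)  # the disk flushes what it can
        buffered -= drained
        delivered += drained
    return {"delivered": delivered, "dropped": dropped,
            "buffered": buffered, "paused_ticks": paused_ticks}


no_pause = simulate(ticks=10, send_rate=300, drain_rate=100,
                    buffer_size=600, pause_enabled=False)
with_pause = simulate(ticks=10, send_rate=300, drain_rate=100,
                      buffer_size=600, pause_enabled=True)
```

In this toy run both cases deliver the same amount to disk, because the disk is the bottleneck either way. The difference is that without the pause half of what the sender pushed is dropped on the floor, while with the pause nothing is dropped and the sender simply spends half its time waiting.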
Second, just because the network got faster doesn't mean the disks did. In fact, now your users will have ample opportunity to do simple things like "drag and drop copies" that will use up a great deal of the resources on the server. A simple file copy over 10Gb at 300MB/sec bidirectional could overwhelm the real-time capabilities of a normal RAID. The solution lies in faster RAIDs, SSDs and perhaps even 40Gb FCoE RAIDs for the servers. (That's right, 40Gb FCoE RAIDs.)
So as you consider your 10Gb infrastructure upgrades, make sure you're working with an experienced vendor that knows about the pitfalls of "plumbing problems" and gets you set up with something that will work reliably and efficiently.
| | | | |
|
|