|We got RSS working on our 10Gb cards a few days ago. This is a feature that splits up data coming into the card into multiple queues. Then we can let different CPUS handle pulling in that data and passing it up the stack. We found what we figured we'd find: When we setup multiple streams, we see data in multiple queues. We see more cpus involved in the work, and things go a lot faster.|
What surprised us was how great this made SAMBA. When we tested SMB3 with Yosemite, we were able to hit line rate (10Gb/sec) between two systems! This is due to SMB Multichannel. It's amazing. Soon, we should be able to extend this across adapters as well (we actually can, but not to the same share). This will let us do things like FC and iSCSI do today, but with a NAS. We'll be able to stripe bandwidth.
|I have to admit, as an old time UNIX guy that's been around inodes, fsck and corrupted filesystems all my life, snapshots sounded a little too good to be true. |
The word was long known to me. Customers would say, "I took a snapshot of that disk so I could upgrade it and revert if I screwed something up." It's just that imaging a disk would take hours. You'd start the copy and go home for the night.
These new snapshots (like those supported by ZFS) were instantaneous. One click and you would “instantly” have a new copy of your data. How? That's not even possible. To make it even weirder, the new copy takes up no space!? Now it's starting to sound like perpetual motion.
The actual explanation is a lot simpler. Every filesystem is composed of data (your file data) and metadata (the name of the file, permissions, location of blocks, inode number, etc.). All this metadata is what organizes your data. You have what's called an "inode table" where all that stuff lives, and it "points to" the actual data you wrote. It might be video data, or your mom's apple pie recipe.
When you create a snapshot, you are instantly making a copy of that inode table. You now have two. All these inodes point to the same data. So the data was not copied.
Now the magic happens. When a user deletes a file from the original data, the inode for that file is removed, but the snapshot inode remains. ZFS will keep the data around as long as there's an inode in some snapshot somewhere pointing to it. The same is true if you edit a file. The old data is saved, but the new data gets written.
All this old stuff (old data) essentially becomes part of the snapshot. As more things change, the snapshot grows larger. If you were to delete “all” the data on the original filesystem, the snapshot would essentially grow to the size of the original filesystem. (The original filesystem would drop to 0.)
In some ways, it's a little like a trashcan. When you delete something, it doesn't really go away. It goes into the trash. If you wanted to, you could drag it out of the trash.
There's a similar way of recovering snapshots. You simply "clone" (or mount) them. When you do this, the snapshot inode table is mounted and it still points to all the old data. That file you deleted yesterday? If you mount yesterday's snapshot, it's right back where it was. Simply drag it back out.
Obviously, while snapshots make for a great method of saving previous images of a set of data, they are not a backup solution. If your RAID dies and can't be recovered, your snapshots die too! So for true backup protection, consider rsync or some other method of moving your data to another system.
Small Tree's TitaniumZ servers support snapshots and rsync and we have a very nice graphical interface so you can manage it all yourself. If you have any questions about snapshots or a backup solution that’s right for your editing team, don’t hesitate to contact me at firstname.lastname@example.org.
|1. Shared storage is becoming the norm. It's not a "hack" anymore that's used to skirt licenses or the need for more disks. Vendors are beginning to embrace it more and more, and the storage software and protocols are adapting. There's never been a better time to implement a shared storage solution.|
2. 10Gb is being adopted very quickly. Small Tree has 10Gb ports built into its TitaniumZ systems and vendors are releasing inexpensive 10GbaseT Thunderbolt PODS now. So it's time to get up to speed with 4K codecs and start using 10Gb Ethernet.
3. Don't skimp on storage space. The storage you use for every day editing needs to be kept below 80% full to avoid fragmentation. Over-provision your editing space and plan on having some sort of archive space as well. Small Tree has TitaniumZ archive options that are very inexpensive that let you store twice as much stuff for half the price.
4. Small Tree's new TitaniumZ operating system (ZenOS 10) uses a balanced storage allocation strategy so your performance remains constant as the disk begins to fill up. So you get performance and efficiency across the entire array, which also helps to mitigate any fragmentation issues.
5. Shared NAS storage like Small Tree's is easy to setup and manage. You don't need meta-data servers, licenses, or expensive fibre channel networks. You just rack it up, plug it in and go!
|We’ve been working pretty hard on Thunderbolt products over the last few weeks and I thought I’d write up some of the interesting things we’ve implemented.|
I’m sure most of you are aware that Thunderbolt is an external, hotplug/unplug version of PCIE. Thunderbolt 1 provided a 4X PCIE bus along with an equivalent bus for graphics only. Thunderbolt 2 allows you to trunk those two busses for 8X PCIE performance.
This is a new feature of Thunderbolt designed to deal with the uncertainty of what a user may plug in.
Normally, when a system boots up, all of the PCIE cards are in place. The system sorts out address space for each card and each driver is then able to map its hardware and initialize everything.
In the Thunderbolt world, we can never be sure what’s going to be there. At any time, a user could plug in not just one device, but maybe five! They could all be sitting on the users desk, daisy-chained, simply waiting for a single cable to install.
When this happens, the operating system needs the capability to reassign some of the address space and lanes so other devices can initialize and begin working.
This is where PCIE Pause comes into play. PCIE Pause allows the system to put Thunderbolt devices into a pseudo sleep mode (no driver activity) while bus address space is reassigned. Then devices are re-awakened and can restart operations. What’s important to note is that the hardware is not reset. So barring the odd timing issue causing a dropped frame, a PCIE Pause shouldn’t even reset a network mount on a Small Tree device.
Wake On Lan
We’ve been working hard on a Wake On Lan feature. This allows us to wake a machine from a sleep state in order to continue offering a service (like File sharing, ssh remote login or Screen sharing). This may be important for customers wanting to use a Mac Pro as a server via Thunderbolt RAID and Network devices.
The way it works is that you send a “magic” packet via a tool like “WakeonMac” from another system. This tells the port to power up the system far enough to start responding to services like AFP.
What’s interesting about the chip Small Tree uses (Intel x540) is that it requires power in order to watch for the “magic” wake up packet. Thunderbolt wants all power cut to the bus when the machine goes to sleep. So there’s a bit of a conflict here. Does a manufacturer violate the spec by continuing to power the device, or do they not support WOL?
This is most definitely true for the early Thunderbolt/PCIE card cage devices. They were all very careful to follow the Thunderbolt specification (required for certification and branding) and this leaves them missing this “powered while sleeping” capability.
Interested in learning more about how you could be using Thunderbolt? Contact me at
|1. The non-linear editing market (FCP, Avid etc) is changing rapidly. Avid was delisted, FCP supports NFS natively, Adobe is adding tons of new features (and a subscription model). More than ever, editors need to see what's out there and how people are using it.|
2. Storage is changing rapidly. SSDs are becoming more and more common (and less and less pricy) and spinning disk vendors are consolidating.
3. Thunderbolt is here (and it appears that it's here to stay) and it offers new methods for connecting high bandwidth IO and video devices. Should you go big and buy a Mac Pro with 6 Tbolt ports? Or can you go small and buy an iMac with 2 Tbolt ports and just hot plug? Are the devices too loud to be in your edit suite? Now's the time to come and see.
4. There are many new cameras and codecs. They are have different methods of access to systems. It's good to hear from each storage and/or camera vendors how that will work.
5. New technology announcements. With all these changes coming, vendors are constantly looking for new ways to make better, faster and cheaper. Many of these revolutionary ideas are announced at NAB. I think it's helpful to be there and see “in person” the sort of reaction different products get.
6. Who's living and who's dying? Every vendor paints a happy face on their business and their products. It's always good to see that translated into booth traffic. It should be interesting to see which edit software vendors are getting visited this year.
|Small Tree has been working closely with Adobe to make sure our shared editing storage and networking products work reliably and smoothly with Adobe’s suite of content creation software.|
Since NAB 2013, we’ve worked closely with Adobe to improve interoperability and performance, and test new features to give our customers a better experience.
Most recently, I had the chance to test out Adobe Anywhere in our shop in Minnesota.
Adobe Anywhere is designed to let users edit content that might be stored in a high bandwidth codec, over a much slower connection link. Imagine having HD or 4K footage back at the ranch, while you’re in the field accessing the media via your LTE phone and a VPN connection.
The way it works is that there’s an Adobe Anywhere server sitting on your network that you connect to with Adobe Premiere and this server compresses and shrinks the data “on the fly” so it can be fed to your machine much like a YouTube video. Except you are scrubbing, editing, cutting, dubbing and all of the other things you might need to do during an edit session.
This real-time compression/transcoding happens because the Adobe Anywhere system is taking advantage of the amazing power of GPUs. Except rather than displaying the video to a screen, the video is being pushed into a network stream that’s fed to your client.
I tested my system out with some Pro Res promotional videos we’ve used at trade shows in the past, and did my editing over Wi-Fi.
What I found was that the system worked very well. I could see that the Adobe Anywhere system was reading the video from Small Tree’s shared storage at full rate, then pushing it to my system at a greatly reduced rate. I had no trouble playing, editing and managing the video over my Wi-Fi connection (although Adobe recommends 1Gb Ethernet as the minimum connectivity for clients today).
This type of architecture is very new and there are caveats. For example, if you are very far from the server system or running over a very slow link (like a vpn connection), latency can make certain actions take a very long time (like loading an entire project, or using Adobe’s Titler app which requires interactivity). Adobe cautions that latencies of 200msecs or more will lead to a very poor customer experience.
Additionally, just because the feed to the clients is much lower bandwidth (to accommodate slower links), the original video data still needs to be read in real-time at full speed. So there are no shortcuts there. You still need high quality, low latency storage to allow people to edit video from it. You just have a new tool to push that data via real-time proxies over longer and slower links.
All in all, I found the technology to be very smooth and it worked well with Small Tree’s shared network storage. I’m excited to see the reach of Small Tree shared storage extended out to a much larger group of potential users.
For a demonstration of Adobe Anywhere over Small Tree shared storage, visit us at the
NAB Show in Las Vegas this April (Booth SL11105).
|One day, when we're sitting in our rocking chairs recounting our past IT glories ("Why, when I was a young man, computers had ‘wires’”), we'll invariably start talking about our storage war stories. There will be so many. We'll talk of frisbee tossing stuck disks or putting bad drives in the freezer. We'll recount how we saved a company’s entire financial history by recovering an alternate superblock or fixing a byte swapping error on a tape with the "dd" command. I'm sure our children will be transfixed.|
No…no, they won't be transfixed, any more than we would be listening to someone telling us about how their grandpa's secret pot roast recipe starts with "Get a woodchuck...skin it." You simply have to be in an anthropological state of mind to listen to something like that. More likely, they walked into the room to ask you your wifi password (Of course, only us old folk will have wifi. Your kids are just visiting. At home they use something far more modern and futuristic. It'll probably be called iXifi or something).
Unfortunately for us, many of these war story issues remain serious problems today. Disks “do” get stuck and they “do” often get better and work for a while if you freeze them. It's a great way to get your data back when you've been a little lazy with backups.
Another problem is fragmentation. This is what I wanted to focus on today.
Disks today are still spinning platters with rings of "blocks" on them, where each block is typically 512 bytes. Ideally, as you write files to your disk, those bytes are written around the rings so you can read and write the blocks in sequence. The head doesn't have to move. Each new block spins underneath it.
Fragmentation occurs because we don't just leave files sitting on our disk forever. We delete them. We delete emails, log files, temp files, render files, and old projects we don't care about anymore. When we do this, those files leave "holes" in our filesystems. The OS wants to use these holes. (Indeed, SGI used to have a real-time filesystem that never left holes. All data was written at the end. I had to handle a few cases where people called asking why they never got their free space back when they deleted files. The answer was "we don't ever use old holes in the filesystem. That would slow us down!")
To use these holes, most operating systems use a "best fit" algorithm. They look at what you are trying to write, and try to find a hole where that write will fit. In this way, they can use old space. When you're writing something extremely large, the OS just sticks it into the free space at the end.
The problem occurs when you let things start to fill up. Now the OS can't always find a place to put your large writes. If it can't, it may have to break that large block of data into several smaller ones. A file that may have been written in one contiguous chunk may get broken into 11 or 12 pieces. This not only slows down your write performance, it will also slow down your reads when you go to read the file back.
To make matters worse, this file will remain fragmented even if you free more space up later. The OS does not go back and clean it up. So it's a good idea not to let your filesystems drop below 20% free space. If this happens and performance suffers, you're going to need to look into a defragmentation tool.
Soon, this issue won't matter to many of us. SSDs (Solid State Disks) fragment just like spinning disks, but it doesn't matter near as much. SSDs are more like Random Access Memory in that data blocks can be read in any order, equally as fast. So even though your OS might have to issue a few more reads to pull in a file (and there will be a slight performance hit), it won't be near as bad as what a spinning disk would experience. Hence, we'll tell our fragmentation war stories one day and get blank looks from our grandkids (What do you mean "spinning disk?" The disk was “moving??”).
Personally, I long for the days when disk drives were so large, they would vibrate the floor. I liked discovering that the night time tape drive operator was getting hand lotion on the reel to reel tape heads when she put the next backup tape on for the overnight runs. It was like CSI. I'm going to miss those days. Soon, everything will be like an iPhone and we'll just throw it away, get a new one, and sync it with the cloud. Man that sucks.
Follow Steve Modica and Small Tree on Twitter @smalltreecomm. Have a question? Contact Small Tree at 1-866-782-4622.
|Now that we’re several months removed from Apple’s introduction of Mavericks for OSX and we've all tested the waters a little, I wanted to talk about video editing software and how the various versions play with NAS storage like we use at Small Tree.|
Avid has long since released Media Composer 7, and from what I've seen, their AMA support (support for non-Avid shared storage), continues to improve. There are certainly complaints about the performance not matching native MXF workflows, but now that they've added read/write support, it's clear they are moving in a more NAS friendly direction. With some of the confusion going on in the edit system space, we're seeing more and more interested in MC 7.
Adobe has moved to their Creative Cloud model and I've noticed that it made it much easier to keep my system up to date. All of my test systems are either up to date, or telling me they need and update, so I can be fairly certainly I'm working with the latest release. That's really important when dealing with a product as large and integrated as the Adobe Suite of products. You certainly don't want to mix and match product revisions when trying to move data between After Effects and Premiere.
Another thing I've really grown to like about Adobe is their willingness to work with third party vendors (like Small Tree) to help correct problems that impact all of our customers. One great example is that Adobe worked around serious file size limitations present in Apple's QuickTime libraries. Basically, any time an application would attempt to generate a large QuickTime file (larger than 2GB), there was a chance the file would stop encoding at the 2GB mark. Adobe dived into the problem, understood it, and worked around it in their applications. This makes them one of the first to avoid this problem and certainly the most NAS friendly of all the video editing applications out there.
Lastly, I've seen some great things come out of FCP X in recent days. One workflow I'm very excited about involves using "Add SAN Location" (the built in support for SAN Volumes) and NFS (Network File Sharing). It turns out, if you mount your storage as NFS and create "Final Cut Projects" and "Final Cut Events" within project directories inside that volume, FCP X will let you "add" them as SAN locations. This lets you use very inexpensive NAS storage in lieu of a much more expensive Fibre Channel solution. For shops that find FCP X fits their workflow, they'll find that NFS NAS systems definitely fit their pocket books.
So as you move forward with your Mac platforms into Mavericks and beyond, consider taking a second look at your NLE (Non-Linear Editor) of choice. You may find that other workflow options are opening up.
|With the New Year festivities well behind us, today seems like as good a time as any to chat about where video editing storage is (or should be) headed in 2014. |
First, I’m really excited about FCoE. FCoE is great technology. It's built into our (Small Tree) cards, so we get super fast offloads. It uses the Fibre Channel protocol, so it's compatible with legacy Fibre Channel. You can buy one set of switches and do everything: Fibre Channel, 10Gb and FCoE (and even iSCSI if you want).
Are there any issues to be concerned about with FCoE? One problem is that the switches are too darn expensive! I've been waiting for someone to release an inexpensive switch and it just hasn't happened. Without that, I'm afraid the protocol will take a long time to come to market.
Second, I'm quite sure SSDs are the way of the future. I'm also quite sure SSDs will be cheaper and easier to fabricate than complex spinning disks. So why aren’t SSDs ubiquitous yet? Where are the 2 and 4 TB SSD drives that fit a 3.5" form factor? Why aren't we rapidly replacing our spinning disks with SSDs as they fail?
Unfortunately, we're constrained by the number of factories that can crank out the NAND flash chips. Even worse, there are so many things that need them, including smartphones, desktop devices, SATA disks, SAS disks, PCIE disks. With all of these things clawing at the market for chips, it's no wonder they are a little hard to come by. I'm not sure things will settle down until things "settle down" (i.e., a certain form factor becomes dominant).
Looking back at 2013, there were several key improvements that will have a positive impact on shared storage in 2014. One is Thunderbolt. Small Tree spent a lot of time updating its drivers to match the new spec. Once this work was done, we had some wonderful new features. Our cards can now seamlessly hotplug and unplug from a system. So customers can walk in, plug in, connect up and go. Similarly, when it’s time to go home, they unplug, drop their laptop in their backpack, and go home. I think this opens the door to allowing a lot more 10Gb Ethernet use among laptop and iMac users.
Apple’s new SMB implementation in 2013 was also critical for improvements in video editing workflow. Apple’s moving away from AFP as their primary form of sharing storage between Macs, and the upshot for us has been a much better SMB experience for our customers. It’s faster and friendlier to heterogeneous environments. I look forward to seeing more customers moving to an open SMB environment from a more restrictive (and harder to performance tune) AFP environment.
So as your editing team seeks to simplify its workflow to maximize its productivity in 2014, keep these new or improved technological enhancements in mind. If you have any questions about your shared storage solution, don’t hesitate to contact me at email@example.com.
|Back in 2003, on September 24th, I drove up to St Paul, Minnesota and filed paperwork to make Small Tree a Minnesota LLC. |
When we started, there were 6 of us and I'm not sure we knew exactly what our plan was. I knew we wanted to write kernel drivers for "high end" networking stuff (like Intel's brand new 10Gb Ethernet cards). I didn't think much beyond that. I figured if we built it "they would come" (as the saying goes). I also needed a good excuse to buy one of the new G5 Power Macs (and make it tax deductible).
Since then, Small Tree's written drivers for 8 different Intel Ethernet chips, LACP drivers (for 10.3, back before apple had their own), iSCSI, AoE and FCoE drivers. We've also done a lot of work on storage integration and kernel performance improvements.
I'm very proud of all the hard work all of the people at Small Tree have put in and of all the great things we've accomplished. It's been quite a roller coaster ride over the years. Looking back, I'd absolutely do it again. We've got some great people and some great customers and I still love working here, even after 10 years :)
|Visiting Australia is a once in a lifetime journey for many of us in North America. It's a 14 hour (very expensive) flight from LAX and because of the timezone shift, if you leave on Saturday, you arrive on Monday. You can forget Sunday ever existed. (Although on the way back, you arrive in LA before you even left Australia).|
I had the great pleasure of visiting Sydney Australia twice this year. First, I was able to fly down and do some sales training with Adimex and Digistor in March, and then last week to help Adimex and Digistor at their big SMPTE trade show at the Sydney Convention Center. I spoke with customers, gave a presentation every day about realtime storage, and demoed the Small Tree TitaniumZ Graphical User Interface for interested customers.
First, I have to say that Adimex and Digistor are great companies with a great bunch of professionals. They know their products and treat their customers well. Second, I think the show was a wonderful opportunity – for me personally and professionally. I didn't realize that such a large show took place in Australia. We had customers from China, New Zealand, Japan, Singapore and even Vanuatu (not to mention from all over Australia). People traveled a very long way to visit this show.
Customers were very interested in Small Tree's TitaniumZ product. I think the two most important features people asked about were ZFS expansion, which allows you to add storage without rebuilding your raids, and the new, integrated version of Mint from Flavoursys. Mint allows customers to do Avid project sharing and bin locking. So multiple Avid users can now seamlessly work together using Avid MC 6 or 7 on their Small Tree storage array. I definitely sensed a lot of interest in Avid now that MC 7 has released and Adobe has announced their shift to the cloud subscription model.
While I was running around at the conference answering questions, I also had the opportunity to do a lot of running in Sydney when I wasn’t at the trade show booth. I made sure to see the Sydney Opera House as well as the Botanical Gardens. Even in Winter, Sydney is one of the most beautiful cities you can visit anywhere in the world. If you get a chance to visit, make sure you take it!
If you’d like more information on what Small Tree was showing at the SMPTE conference – its TITANIUMZ line of products – visit us at www.Small-Tree.com.
|1. When you have lots of media coming in from various cameras to your shared storage, make sure you are ingesting that media using appropriate software.|
We have seen a few cases where people are dragging files in from the camera using the Finder, rather than the camera vendors import software.
When you do this, the media can sometimes have the "User Immutable" flag set. This flag prevents users from accidentally deleting files, even if they have appropriate permissions. You can see this flag via Right Click->get info. It's the "Locked" flag.
While this makes sense if the media is on the camera (where they expect you to do all deleting with the camera software interface), it does not make sense on your storage. However the flag is persistent and will also be set on any copies you make and any copies you make of those copies. It will also prevent the original files from being deleted when you "move" a large batch of material from one volume to another!
Obviously this will waste a lot of space and be very frustrating down the line when you have thousands of media files you can't delete. You'll also find that unsetting the Lock bit via "get info" is way to cumbersome for 10,000 files.
One simple answer is the command line. Apple has a command (as does FreeBSD) called "chflags". If you can handle using the "cd" command (Change Directory) to navigate to where all your Locked file are, you can run:
chflags -R nouchg *
This will iterate through all the directories and files (starting from whatever directory you're in) and clean off all the "Locked" bits.
2. Edit project files from your local machine, rather than shared storage.
There are a number of reasons to do this, and as time goes on, I seem to find more.
First, it's just safer. Not all apps lock project files. So it's possible that if you have enough editors all sharing the same space and everyone is very busy and the environment is hectic, someone could come along and open a project you already have open. If they "save" their copy after you save yours, your changes will be lost. It would be no different if it was a Word Document or Excel Spreadsheet. When multiple people save the same file, the last guy to save wipes out the first guy. (This is not a problem for shared media like clips and audio since those files are not being written, just pulled into projects).
Second, apps like Avid and FCP 7 all have foibles with saving to remote storage. Avid doesn't like to save "project settings" over Samba or AFP (although NFS and "Dave" from Thursby work fine). FCP seems to mess up its extended attributes when it saves, leading to "unknown file" errors and other strange behavior. (When this happens, you can easily fix it. See Small Tree Knowledge Base solution here: http://www.small-tree.com/kb_results.asp?ID=43).
Lastly, you may have different versions of apps on different machines. I recently had a customer that was using FCP 7.0 and attempting to open files written by FCP 7.0.3. The older app was unhappy with the newer format files and it created some strange error messages. While this would have been a problem no matter how the files were accessed (locally or over the network), the network share made it more confusing since it was not clear that the files came from another system. Had the user received the projects on a stick or via email, the incompatibility would have been much more obvious from the start.
If you have any questions regarding shared storage and improving your workflow, do not hesitate to contact me at firstname.lastname@example.org.
|Apple has always been on the leading edge of connectivity for their systems.|
Back in 2003, before we had formed Small Tree as a company, I can recall drooling over a Power Book laptop with an integrated Gigabit port. That was a crazy thing to have on a laptop at the time. Gigabit was still a little weird, very expensive, and not common as a drop at anyone’s desk. Yet here apple was putting it on a laptop.
Thunderbolt is a similarly aggressive move. It puts a great deal of IO horsepower on some very small systems.
Firstly, let’s consider what Thunderbolt is. Thunderbolt is a 4X (4 lane) PCIE 2.0 bus. It’s equal in performance (and protocol) to the top two slots of a traditional tower Mac Pro. Along with that 4X pipe, there’s a graphics output pipe for a monitor. These pipes are not shared! So using a daisy-chained monitor will not impinge on any attached IO devices.
Thunderbolt is capable of moving data at 10Gbits/sec FULL DUPLEX, meaning data can move in two directions at the same time, giving the pipe a total bandwidth of 20Gbits/sec.
As I read through the forums and opinion articles on Thunderbolt, one of the themes that pops up is “It’s Apple proprietary and expensive. Just use USB 3.0.” This is a reasonable point. USB 3.0 is capable of 4.8Gbits/sec (about half of the speed of Thunderbolt). Further, there are plans to speed up USB 3.0 to 10Gbits/sec to match Thunderbolt. So given these factors (and the low cost of most USB devices), it seems like an obvious choice.
However, there are some reasons that Thunderbolt may win the day for external high-speed connectivity (and relegate USB to it’s traditional low-end role).
First of all, most IO chips (Ethernet, SATA, SAS) are manufactured with a native PCIE backend. The chips are natively built to sit on a PCIE bus. So not only will you save the overhead of an additional protocol, the guys writing the code to support these devices only need to write one driver (PCIE). It just works whether the device is on a card or in a Thunderbolt toaster.
Another advantage of Thunderbolt is its power budget. Often, devices are powered by the port itself (very common with USB). USB can provide 4.5Watts of power to attached devices, whereas Thunderbolt offers a full 10Watts of power.
Lastly (and this is probably the most interesting thing about PCIE and Thunderbolt) is that Thunderbolt is a switched/negotiated protocol that is extremely flexible. Cards that want a 16X slot can work in a 4X slot. PCIE switches can (and do) exist to allow multiple machines to talk to one PCIE based device (like a RAID). So imagine a time in the future when devices can be connected to a “switch” in a back room and multiple systems can see them. Imagine those systems can have multiple connections to boost their bandwidth.
Thunderbolt may not be everywhere yet, but it’s really the first imaginings of a new way to handle IO outside of the “tower” type machines. I think it is easily the best choice for Mac users and will likely offer some amazing benefits in the next generation.
|I’m getting ready to head out to Las Vegas this weekend to get the Small Tree booth all setup (SL6005) and I’m really excited.|
First off, we have a brand new version of our Titanium platform coming out called “Titanium Z”. The Z platform is AWESOME and the folks here at Small Tree (including The Duffy) are very excited to start telling people about it.
First of all, in keeping with our history of bringing really high-tech functionality (like real time video editing) down into the commodity price space, we are now bringing down Storage Virtualization.
To offer Virtualization, we had to migrate Titanium to a new OS based on FreeBSD. In doing this, we were able to pull in ZFS technology. This gives us the ability to stripe RAID sets together, migrate data around, and add new RAID sets to existing volumes without rebuilding.
We’ve also updated all the hardware, increased performance 25% and kept our same great low price model. You get more for your money.
The Titanium 4 has also been extensively improved based on customer feedback. ZFS performance is so good, we ditched the need for a RAID controller in the new T5. At the same time, we added a 5th drive (more storage, more performance) and allowed for the addition of a dual port 10GbaseT card. So now, not only is the device mobile, fast and inexpensive, it also supports direct attaching with 10Gb Ethernet! You can bring along one of our ThunderNET boxes on your shoot and have your laptop editing over 10Gb Ethernet right out in the field.
Lastly, I’ve had tons of people bugging me about SSDs and 10Gb. I demoed a super fast box at the Atlanta Cutters called “Titanium Extreme” and we showed off real time video playing to my laptop (over Dual 10Gb ports) going 1.2GBytes/sec. (not a benchmark. Real video). We’ll have this guy along as well.
So if you want to stop by and visit us and see all this cool stuff, swing down the South Lower (6005). You can’t miss us. We will have a giant round screen hanging above us with all sorts of amazing stuff flying by put together by Walter Biscardi. We’d love to see you.
|Every year as NAB approaches, the marketing once again begins. Oh the marketing....|
As NAB approaches, I'd like to take a moment to remind people in the market for storage that Gigabytes/second is not what makes video play smoothly.
Vendors with no Computer Engineers on staff will pull together monstrous conglomerations of SSDs and RAID cards, run a few benchmarks (probably four or five different ones until they find one they like) and then claim they've hit some huge number of Gigabytes per second.
Small Tree has been supporting Server based video editing longer than anyone in the market. We were supporting Avid when they used SGI's 10 years ago (and they were SGI's largest customer). We know how things work. We helped develop them.
Playing video requires a RAID configuration that can handle multiple, clocked streams. Benchmarks on the other hand, tend to use a single stream, reading sequentially as fast as they can.
What's the difference you ask? Well, in the sequential case, the RAID controller gets to use lots of tricks to avoid the hard work of seeking around disks and reordering commands. The next block to be read is probably the next block, so things like "read ahead" work wonderfully. Don't just read the next 128k, read the next 1MB! It'll all be read next anyhow. It makes it very easy for sequential benchmarks to look good. In the Supercomputing world, meaningless TeraFLOP marketing numbers were referred to as "MachoFLOPS". We knew they meant nothing when vendors could spin assembly instructions in a tight loop and claim 1.5PetaFLOPS.
Small Tree's testing and development involves looking carefully at how the Video Editing Programs themselves read so we can carefully mimic that traffic during testing. This lets us be sure our equipment doesn't rely on sequential tricks to deliver real, multi-stream performance.
So when you walk up to a vendor at NAB and they start telling you about their MachoGigabytes per second, make sure you ask them about their sustained latency numbers. Small Tree knows all about latency and we back it up, every day with our products.
|Very recently, Small Tree had the opportunity to go down to Atlanta and visit Walter Biscardi and upgrade his data center and edit suites. In conjunction with this trip, we also did a presentation on the upgrade for the Atlanta Cutters and showed off a new SSD based Titanium shared storage system we put together. This new Titanium SSD was able to move 1.2GB/sec of *realtime* video to Adobe Premiere with no dropped frames. This is faster than you can go with 8Gb Fibre Channel and the fastest realtime video I've ever seen displayed live without a net!|
The upgrade involved pulling out Walter's existing SFP+ 10Gb switch, which had a mix of Gigabit SFP modules for his suites and 10Gb SFP+ modules for his server, and replacing it with a 10GbaseT switch from Small Tree that had 4 SFP+ ports (for the server) and 24 10GbaseT ports for the new Titanium and some of his edit suites.
Before we dived right into putting in the new switch and adding the Titanium 8, we spent a lot of time talking about power. Walter didn't want to spend $1000 for an expensive UPS, but he wanted a good UPS that could handle the new load and not break the bank. We settled on an Ultra Xfinity that offered 1200W of load. This allowed for plenty of overhead for the 660W titanium and kept the loading on the UPS to well under the recommended 80%.
After installing the new switch, we moved all the cables over. One of the wonderful aspects of 10GbaseT is that we didn't have to do anything special when replacing ports that used to be Gigabit. 10GbaseT clocks down to Gigabit and even 100Mbit. So there was no trouble with legacy equipment or special adapters.
Once the switch was in, we turned to the Titanium 8. We installed it and plugged it into its new UPS and cabled it into the switch. We bonded the two 10GbaseT ports coming from the Titanium so it would load balance all the incoming clients.
Once that was done, it was time to upgrade some of the more important edit suites to 10GbaseT. What good is having all that 10Gb goodness in the lab when you can't feel the power all the way to the desktop? We upgraded both of Walter's iMac systems to 10Gb (via ThunderNET boxes) and added another 10Gb card to his fastest Mac Pro in Suite 1.
The result was a cool 300MB/sec writing from his iMac and 600MB/sec reading using the Aja System test. As I tell people, this isn't the best way to measure NAS bandwidth because applications like Final Cut and Adobe use different APIs to read their media files.
With the NAB Show approaching, I hope many of you that are planning to attend will be able to swing by Small Tree’s booth (SL6005) to learn more about this recent install directly from Walter, as he’ll be on-hand. While you’re there, feel free to ask about the SSD based Titanium shared storage solution we’re “going plaid” with.
If you’d rather not wait until NAB to learn more, contact me at modica at small-tree.com
|If you use Adobe Premiere for post-production projects, then you may have come across the problem where the application re-conforms and recreates peak files for projects every time the project opens or when changing back to Adobe from using another application, i.e. hide/show.|
If you see this problem occurring, check that all systems are time synched with each other (IE NTP is enabled in the Time and Date preference panel). If you do not have internet access on the systems, make sure all the systems are at least locked to the server time.
The cause of this problem is that Adobe is creating new files on the server. If the server has a time that is "in the future" - relative to your client - Adobe Premiere will see these future files and decide that the project is out of date. Then it will want to rebuild all the files. When using any application - including Final Cut, Adobe Premiere or Avid across a NAS or SAN - it is imperative that all of the machines agree on the time and date so the applications can tell when a project is up to date and when it needs to be updated.
The issue above will also occur with other applications in a network based client-server environment. Using the Small Tree Titanium Server as an example, one can setup NTP so that it can synchronize its clock the same way and to the same NTP server as the Mac clients do.
To enable NTP client code on Titanium Server:
-Login to the Titanium GUI.
-Ensure that the Titanium has a Network Interface defined on your local network,
intranet, such that you statically define or use DHCP to obtain an IP address.
-Define a default router and DNS server using the
SETUP -> network -> interfaces GUI area. If you need to add a default router, make sure you apply the IP address of the router to the appropriate network interface.
HARDWARE -> setup
find the "Current server time" area and set/click on "Use ntp",
then define the NTP server, i.e. time.apple.com, as an example and
-If you have the network IP address, default route and DNS setup
properly, it will come back with the correct Date/Time. If not,
you will need to revisit your network settings to correct.
If you’re having workflow issues with Adobe Premiere and your server, contact me at email@example.com.
|Storage is a tough market and customers are always willing to pay a little less to get a little less. My take away is this: In the war between Ethernet and EtherNOT based storage, such as Fibre Channel, the one that delivers the best value for the lowest price is going to win. As Warren Buffet likes to say, "In the short term, the market is a popularity contest. In the long term, it's a weighing machine." People need to buy based on value over time.|
Fibre Channel has been hamstrung for a long time by its need for custom ASICs (chips used to implement the protocol in hardware). Fibre Channel wanted to overcome all of the limitations of Ethernet and so they invented a protocol that did just that. The problem of course is that those custom ASICS are not on motherboards. You don't get FC chips built into your DELL server (unless you order a special card or riser). You don't see Apple putting FC chips on Mac Pros (even tho they sold Xsan and XRAID for so long).
What's the result? Expensive chips. It's expensive to fab them and expensive to fix them. FC stuff is expensive. Vendors may find ways to lower the entry point, but somewhere or other, either via support, licensing or upgrades, the cost will be expensive.
Ethernet certainly has ASICS as well. There are network processors, MAC (media access control chips) and PHY chips (the chips that implement the physical layer). They can be incredibly expensive. The first 10Gb cards Small Tree sold were $4770 list price! But here's the thing...a 10Gb card today is $1000 or less. The chips are everywhere and they are rapidly going onto motherboards. Ethernet is truly ubiquitous and will continue to be for server and storage technologies.
If you'd like to discuss or debate Ethernet vs EtherNOT, send me an email at firstname.lastname@example.org or hit me up on Twitter @svmodica.
|Not too long ago, I was asked to write up my predictions on storage and networking technology for the coming year. One of those predictions was the rise of new, combined file system/logical volume managers like ZFS and BtrFS. |
These file systems don’t rely on RAID cards to handle things like parity calculations. They also don’t “hide” the underlying drives from the operating system. The entire IO subsystem - drives and controllers - is available to the operating system and data is laid out across the devices as necessary for best performance.
As we’ve begun experimenting ourselves with these technologies, we’ve seen a lot of very promising results.
First and foremost, I think it’s important to note that Small Tree engineers mostly came from SGI and Cray. While working there, most of our time in support was spent “tuning.” People wouldn’t buy SGIs or Crays simply to run a file server. Invariably, they were doing something new and different like simulating a jet fighter or rendering huge 3D databases to a screen in real-time. There would always be some little tweak required to the OS to make it all work smoothly. Maybe they didn’t have enough disk buffers or disk buffer headers. Maybe they couldn’t create enough shared memory segments.
Small Tree (www.small-tree.com) has always brought this same skill set down to commodity hardware like SATA drives and RAID controllers, Ethernet networks and Intel CPUS. These days, all of this stuff has the capability to handle shared video editing, but quite often the systems aren’t tuned to support it.
I think ZFS is the next big step in moving very high-end distributed storage down into the commodity space.
Consider this: A typical RAID card is really an ASIC (Application Specific Integrated Circuit). Essentially, some really smart engineering guys write hardware code (Verilog, VHDL) and create a chip that someone “prints” for them. SGI had to do this with their special IO chips and HUB chips to build huge computers like the Columbia system. Doing this is incredibly expensive and risky. If the chip doesn’t work right in its first run, you have to respin and spend millions to do it again. It takes months.
A software based file system can be modified on the fly to quickly fix problems. It can evolve over time and integrate new OS features immediately, with little change to the underlying technology.
What excites me most about ZFS is we can now consider the idea of trading a very fast - and expensive - hardware ASIC for a distributed file system that uses more CPU cores, more PCIE lanes and more system memory to achieve similar results. To date, with only very basic tuning and system configuration changes, we’ve been able to achieve Titanium level performance using very similar hardware, but no RAID controller.
So does this mean we’re ready to roll out tomorrow without a RAID controller?
No. There’s still a lot of work to do. How does it handle fragmentation? How does it handle mixed loads (read and write)? How does it handle different codecs that might require hundreds of streams (like H.264) or huge codecs that require very fast streams (like 4K uncompressed)? We still have a lot of work to do to make sure ZFS is production ready, but our current experience is exciting and bodes well for the technology.
If you’d like to chat further about combined file system/logical volume managers, other storage/networking trends, or have questions regarding your workflow, contact email@example.com.
|When I used to work at SGI, I would often wonder what "C" level officers did. I once got to ask Ed McCracken what he spent most of his time doing day-to-day. At the time, he was CEO of SGI.|
His answer was that he was currently spending a lot of time talking to Congressmen trying to convince them to stop propping up Cray as a national asset. In hindsight, perhaps buying Cray was not the best idea.
As the Chief Technical Officer of Small Tree, which is a much smaller company, I have to wear a lot more hats. I thought I might include a list of the things I've been up to over the last month.
Deer hunting (actually, just watching this year)
Evaluating Titanium follow on chassis designs
Helping select next generation Software Defined Radio development platforms for the Army
Working on Adobe performance issues
Evaluating a new Avid sharing product (that works great!) called Strawberry
Evaluating a new Digital Asset Manager (that also works great) called Axle
Discussing our new high performance iSCSI products with partners
Fixing the phone system
Testing Thursby's Dave software with Avid
Helping customers with Small Tree products
Running barefoot (I run barefoot and in Vibrams.... a lot)
Working on a new voice router design for the US Rangers
Helping my kids with math homework
Processing firewood for the winter
Breaking up the recyclable cardboard boxes
Writing up an NAB presentation proposal
Prepping for a visit from the Soldier Warrior team of the US Army
Small Tree Board of Directors meeting
There’s never a dull moment.
|Not the Star Wars kind tho... |
Back when cell phones were new, a number of vendors had "clone" problems. People were cloning phone serial numbers so they could get free cell service.
To combat this problem, the cellular companies built up "Clone Detector" systems. These were massive database servers that had to be extremely fast. They would monitor all in process calls looking for two that had the same serial number. If they found a match, that phone was cloned and both were taken out of service.
SGI's systems were uniquely qualified to handle this work. The company had some stellar Oracle and Sybase numbers and offered these vendors a 10X speed up in clone detection.
The phone call came in from Florida during the test phase of the new system. The sysadmin called me up and told me that when she dumped a 25% load on the system, it slowed down very quickly. If she put a full load on the system, it stopped.
This was puzzling. I'm not a database expert, so I spent time looking at the normal performance metrics. How busy are the CPUs? Not very. How busy are the (massive) RAID arrays? Not very. How much memory is in use? Not much. Nothing was adding up.
I started watching the machine’s disk activity during the 25% load. I noticed one disk was very busy, but it was not the RAID and shouldn't have been slowing the machine down. I asked the sysadmin about it. She said it was the disk with her home directory on it and it shouldn't be interfering with the machine’s database performance. That answer nagged at me, but she was right. If the database wasn't touching the disk, why should it matter? But how come it was so busy? There was a queue of pending IOs for Pete's sake! Was she downloading files or something?
I asked her if I could take a look at the index files. Index files are used by a database to keep track of where stuff is. Imagine a large address book. I wanted to see if the index files were corrupted or "strange" in any way. I thought maybe I could audit accesses to the index files and spot some rogue process or a corrupt file.
What I found were soft links instead of "real" files. For you Windows people, they are like Short Cuts. On a Mac you might call them Aliases. She had the index files "elsewhere" and had these aliases in place to point to them. She told me "Yeah. I do this on purpose so I can keep a close eye on the index files. I keep them.... in... my... home... oh!"
So her ginormous SGI system with hundreds of CPUs and monstrous RAIDs was twiddling its thumbs waiting for her poor home directory disk to respond to the millions and millions of index lookups it could generate a second. It fought heroically, but alas, could not keep up.
Some quick copies to put the index files where they belonged and we had one smoking clone detector system.
|Many years ago when I was a "smoke jumper" support guy for SGI, I got to see some of the strangest problems on the planet. Mind you, these were not "normal" problems that you and I might have at home. These were systems that were already bleeding edge and being pushed to the max doing odd things in odd places. Further, before I ever saw the problem, lots of guys had already had a shot. So reinstalling, rebooting, looking at the logs, etc., had all been tried. |
One of my favorite cases was a large Challenge XL at a printing plant. It was a large fileserver and was used for storing tons and tons of print files. These files were printed out, boxed and shipped out on their raised dock.
Each night, the machine would panic. The panics would happen in the evening. The machine was not heavily loaded, but the second shift was getting pissed. They were losing work and losing time. The panics were all over the place - Memory, CPU, IO boards. By this time, SGI had replaced everything but the backplane and nothing had even touched the problem. The panics continued.
Finally, in desperation, we sent a guy onsite. He would sit there with the machine until the witching hour to see what was going on. Maybe a floor cleaner was hitting the machine or there were brown outs going on. We felt that if we had eyes and ears nearby, it would become obvious.
Around 8pm that night, after the first shift was gone and things were quiet, our SSE got tired of sitting in the computer room and walked over to the dock. He was a smoker and he wanted to get one more in before the long night ahead. The sun was going down, making for a nice sunset as he stood out there under the glow of the bug zapper (this happened in the south where the bugs can be nasty).
As he watched, a fairly large moth came flitting along and orbited the bug zapper a few times before *BZZZZZZT* he ceased to exist in a dazzling light display. It was at that moment when the Sys Admin (who was keeping an eye on the machine during our SSE's smoke break) yelled over to him "HEY! The machine just went down again".
Yes folks, the bug zapper was sharing a circuit with the SGI machine. One large insect was enough to sag the circuit long enough to take the machine right down. Go figure.
|I remember the days when CPUs were stuck in a rut. They were barely hitting 1Ghz. Networks were running at 1Gb and beyond and CPUs and storage just could not keep up. Clients wanted redundant, failover capable servers that could handle 600 clients, but SGI was running out of ways to do that. We couldn’t make the bus any wider (128bit computers?) and we couldn’t make the CPUs any faster. What should we do?|
One answer was to network many systems together over NUMA (non-uniform memory access). This would let many systems (that would normally be a cluster) act as if they were one system. The problem with a setup like this is speed. Systems accessing remote memory are slow. We had to find a way to speed up access to memory.
SGI invented lots of cool stuff to do this.
One of the new things was the CPOP connector. This connector was made up of many fuzzy little pads. The fuzzy pads would be compressed together and allow for a much higher frequency connection than normally would be allowed with gold pins.
The problem with delicate things like this is that they are far more sensitive to installation mistakes. Each connector needed to be torqued down to the right pressure so that the signals made it across cleanly. Install them too loosely and you’re going to see connectivity errors.
So cut to one of our advanced training courses where we taught field engineers how to replace boards. The instructor explained how each HEX head screw on the CPU cards needs to be torqued down to an exact specification. This is where one of the helpful field guys, who had clearly done this before, piped up and explained that you know the boards are properly seated when you tighten the HEX screw down and hear three “clicks.”
The instructor and I looked at each other. We didn’t remember there being three clicks. We normally used torque drivers to accurately measure the torque and there were never any clicks.
After some investigation and a quick examination of the board our helpful field guy had just installed, we discovered the source of the “three clicks.” They were the sound of the very expensive backplane cracking as the HEX screw penetrated the various layers of plastic…. OUCH.
From that point on, correct torque drivers were provided to all field personnel.
|Ever since Apple blew up the happy world of FCP 7, we've been running into more and more people moving to Adobe and Avid.|
Adobe's been pretty good. I like them a lot and their support guys (I'm talking to you Bruce) have been awesome.
Avid on the other hand, is tough. Shared spaces cause reindexing and external projects won't save natively to shares spaces. Our customers have mostly worked around this by storing projects locally, using multiple external volumes for media, or using AMA volumes.
I finally had a chance to explore this External project save issue in great detail today.
It turns out there's nothing specific Avid's doing that would prevent them from saving a project externally. They stat the file a few times and give up. It appears as if they are simply not allowing saves to shared protocol volumes (samba and afp).
My solution was simple: I created a sparse disk image on the shared storage, then mounted it locally.
This worked great. I could point my external project to it and it would save correctly. I could link in my AMA files and use them and I can trust that OS X isn't going to let anyone else mount that volume while I'm using it! When I'm done, I exit and unmount the volume. Now anyone else on the network can mount it and use my project (you can even have multiple users by doing read only mounts with hdiutil)
Further, this method can be adapted to just about any granularity you'd like.
For example, if your users hate the idea of creating disk images for every project, just create one very large (up to 2TB) disk image for their entire project library. You can automount that and just let them use it all the time. You could also have per project, per user or per customer images as well.
Let's face it. Storage is expensive and we all know what disks and motherboards cost. No one wants to pay three or four times what this stuff costs for a vendor specific feature. Hopefully, this trick makes it easier for you to integrate Avid workflows into your shop if the need arises.
|The worst possible answer to a customer problem is that it’s a hardware bug. Hardware bugs are expensive to fix. You not only have to replace the hardware, you may also have to replace everything you’ve got on the shelves. You can’t do this until you’ve “fixed” the problem, which might cost millions of dollars and take months. Hardware problems suck.|
This reminds me of a specific problem from long ago dealing with locking.
Locking is what programs do to avoid stepping on each other. It’s very similar to locking the bathroom door. When the bathroom is in use, the door is locked. Others wanting to use the bathroom will try the door, see that it’s locked, and try again later.
Our problem was that applications and even the kernel were crashing with what appeared to be two threads using the same resource. Thread B would have the lock, but thread A was in the same code acting as if it also held the lock. This normally is not possible. Thread A could not have gotten where it was without having the lock, and Thread B should have stopped when it hit the lock being held by A. There were no obvious errors in the code that could have allowed this.
Ultimately, we discovered the answer hinged on two assembly instructions and a new feature in the CPU called “speculative execution.”
The instructions were LL, SC (load link, store conditional). LL, SC is a neat concept that makes locking very fast. You “read” the lock in (LL) and “store” your lock value if the lock is free (store conditionally). If the lock is not free, the store does not happen and your code vectors back around to try again later.
Speculative execution is a feature that allows a CPU to execute ahead of itself. The CPU will pull in instructions that are coming up (and assume any if/then/else branches) and sort out what will probably happen in the next 32 instructions. In this way, the CPU can have all these values calculated and cued up for rapid and efficient execution.
So what was the problem? Turns out that if the CPU was in the middle of an LL/SC lock and happened to speculate over another LL/SC lock, the state of the first lock was overwritten by the second. So speculating over an unlocked lock while checking a locked one led you to continue on as if your lock was “unlocked”. The “SC” instruction would succeed when it shouldn’t have.
This is really a hardware issue. The hardware shouldn’t do this. However, replacing every CPU would be expensive and “hard.” The solution? It’s a compiler bug. The compilers were changed so that when applications and even the OS were compiled, 32 “no op” instructions were inserted after each and every LL/SC occurrence. This made sure that any speculative execution would never hit a second LL/SC combo since the CPU never went out past 32 instructions. Problem solved…. I guess.
Working with the military can be a lot of fun. It can be exhilarating. It can also be incredibly frustrating and boring.
Recently, we had a problem with a large server system being used by a military contractor at a very sensitive base. The server was periodically losing its boot drive. The drive would spin down and stop (hanging the machine). They would reboot and the system would run for a while, then spin down again and stop. Eventually, the drive would stop working all together. It would not spin up any more.
The solution is simple, right? Just get the drive and send it to the manufacturer - they will figure out what went wrong. Sorry…no can do. These drives are super secret and they much be shredded. They cannot be returned.
Management is up in arms. "Steve" they said, "you need to take your SCSI analyzer down there and plug it in to this system and see what's going on."
I had three good reasons not to like this idea:
1. Clearly, this is a hardware problem. Software does not "fry" drives so they will not come up again. Were they thinking the drive was just demoralized from bad SCSI commands? It was broken!
2. Connecting my SCSI analyzer to a super secret system is simple enough, but the military isn't going to let me take it home again! Once it's been plugged in, it's theirs. It took me a long time to justify getting that thing and I really didn't want to give it to this military contractor
3. Whatever was blowing up their boot disks was just as likely to blow up my SCSI analyzer! That's not a very good idea either.
Not sure of the best way to approach this, I walked over to another well known smoke jumper guy. This guy fixed hardware. He fixed it the way only a former military guy could. He took it apart down to every nut and bolt, he smelled cables for ozone (indicating something burned) and he examined boards with a portable microscope. He also had one other thing that is required for hardware debugging…a 6th sense that just told him what was wrong.
Me: "Phil, I've got this system that's blowing up drives. They've replaced everything and they want me to put my analyzer on there, but I think this is hardware. You have any suggestions?" Phil: “It's a burned cap on the midplane, have them replace it.”
The SSE was quick to shoot this down when we talked "No way! We've replaced the midplane twice. It can't be the midplane".
This is when "smoke jumping" happens. Phil gets on a plane and flies down.
What does he discover? It is indeed a burned cap on the midplane. The cause? The foil shielding behind the midplane is not mounted properly and is "angled" – resulting in the foil touching a 5V pin. As soon as they powered on the machine with the new midplane, “ZAP,” another midplane is fried.
They fixed the foil, put in a new midplane, and me and my SCSI analyzer lived to fight another day.
|Just about everyone can have a free web page. You get them free when you open cloud accounts or purchase internet service. This has lead to a proliferation of cat pictures on the Internet. |
Back in the 90s, when it cost a little more to get on the Internet, the idea of personal web pages was just beginning. One very large ISP (Internet Service Provider) that used SGI systems wanted to sell personal websites. They felt SGI's Challenge S system was the perfect solution. They would line up hundreds of these systems, and each system could handle several sites. SGI did indeed set several website access records for handling the website for "Showgirls,” which, as you can imagine, had a racy website.
Fast forward a few months and there are 200 systems lined up in racks handling personal web pages. Then I start getting phone calls.
"Hey Steve. These guys are filing cases about two or three times a week to get memory replaced. We're getting parity errors that cause panics about two or three times a week."
I fly out and start looking carefully at the machines. The customer had decided to purchase third party memory (to save money) so they could max out the memory in each system. Each machine had 256MB of RAM, which was a lot at the time. This was parity memory, which means that each 8 bits has a parity bit that is used like a cheap "double check" to make sure the value stored is correct. The parity bit is flipped to a 1 or 0 so that each 8 bits always has an even number of 1s in it. If the system sees an odd number of 1s, it knows there's a memory error.
I looked at each slot. I looked at ambient temperature. I made sure the machines were ventilated properly (including making the customer cover all the floppy disk holes since they did not have floppies installed, but had neglected to install the dummy bezel). No change. Parity errors continued and clearly there was an issue.
Going back to the memory vendor and the specs on the chips, we started doing the math.
The vendor claimed that due to environmental issues (space radiation etc) one should expect a single bit parity error about once every 2000 hours of uptime for each 32MB of memory. Half of these errors should be "recoverable" (i.e., the data is being read and can be read again just to be sure), but the other half will lead to a panic. They do not mean the memory is broken, but the errors should be rare.
So let's do the math: 256MB/machine (so that's 8X 32MB).
Hours of uptime? (These machines are always up): 8760hours
How many total parity errors: 35 per system, per year, with half of them being "fatal." So, that’s 17 panics per system per year. They had 200 systems. That's 3400 panics a year in that group of systems or roughly 10 per week?!
Consider this when you start to scale up your IT systems. How many machines do you have to put in a room together before "once a year" activity becomes "once a day?”
|You've all read that 10GbaseT is on the way. It's true. Very soon, you will be able to plug standard RJ45 connectors (just like on your Mac Book Pro) into your 10Gb Ethernet cards and switches. You'll be able to run CAT6A cable 100m (assuming the runs are clean runs) and have tons and tons of bandwidth between servers and clients. Who needs Fibre Channel anymore?!|
But with the widespread migration to 10Gb, you may have a plumbing problem my friend.
Many years ago, I had the privilege of supporting three of the large animation studios in LA that were trying to use their new RAID5 arrays and run OC-3 and OC-12 right to their desktops. These two ATM standards were capable of 155Mbits and 622Mbits, respectively (this was before the days of Gigabit Ethernet). Everyone expected nirvana.
They didn't get nirvana. In fact, they found out right away that three clients ingesting media could very quickly "hang" their server. Within about 30 minutes it would slow to a crawl and sit there. They could not shut it down. Shutdown would hang. What was really happening? The machine had used all of its RAM collecting data and was unable to flush it quickly enough to their RAID. The machine was out of IO buffers and almost completely out of kernel memory. The "hang" was simply the machine doing everything it could to finish flushing all this unwritten data. We had to wait (and wait and wait).
Further, we discovered that with only three clients we could quickly start generating dropped packets. ATM had no flow control and so too many packets at once would result in dropped packets. Since the clients were very fast relative to the server, it didn't take more than a few to overwhelm it.
Similarly, as we all start to salivate over 10Gb to our Mac Books, iMacs and refrigerators, we should consider how we're going to deal with this massive plumbing problem.
First, you *will* need some form of back pressure. The server must be able to pause clients (and vice versa) or these new 300MB/sec flows are going to overwhelm all sorts of resources on the destination system.
Second, just because the network got faster, doesn't mean the disks did. In fact, now your users will have ample opportunity to do simple things like "drag and drop copies" that will use up a great deal of the resources on the server. A simple file copy over 10Gb at 300MB/sec bidirectional could overwhelm the real-time capabilities of a normal RAID. The solution lies in faster raids, SSDs and perhaps even 40Gb FCOE raids for the servers. (That's right, 40Gb FCOE raids)
So as you consider your 10Gb infrastructure upgrades, make sure you're working with an experienced vendor that knows about the pitfalls of "plumbing problems" and gets you setup with something that will work reliably and efficiently.