Science & Engineering Node Services — UB Engineering / Natural Sciences & Mathematics — University at Buffalo
SENS Unix Server Fault Tolerance
Introduction
We were recently asked if our email server is fault-tolerant. The answer is "yes", and we accomplish this without expensive, custom hardware and software. Everything we do is built on off-the-shelf, standard system configurations. Because of this, spare parts are easy to obtain, and we can draw on conventional wisdom, widely available expertise, and the lessons others have learned from their mistakes to avoid problems and to solve them quickly if and when they arise.
Examples
For example, "Fate", our main email and network services server for the School of Engineering, has four disks: two 2.1 GB disks, which hold mirrored copies of the operating system (we even mirror the swap space!), and two 72 GB disks, which hold the email inboxes. Each pair is mirrored, so disk writes go to both disks simultaneously, and each runs a journaled file system, which guards against file system corruption after a crash and avoids lengthy file system checks. The mirroring software (Solstice DiskSuite) also reads from both mirrors, which makes reads more efficient. We have tested this, and the failover works beautifully; if one disk dies, the other keeps going. When the bad disk is replaced, the data is synchronized from the good copy to the empty disk (the software is smart enough not to do it the other way around), and the new disk is integrated back into the mirror when the process is complete.
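The mirroring behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Solstice DiskSuite is implemented: the `Mirror` class, its method names, the dictionary standing in for a disk, and the round-robin read policy are all hypothetical and chosen for illustration.

```python
class Mirror:
    """Toy two-way mirror; each 'disk' is a dict of block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]
        self.healthy = [True, True]
        self._next = 0  # which disk the next read tries first

    def write(self, block, data):
        # Every write goes to all healthy halves of the mirror.
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                disk[block] = data

    def read(self, block):
        # Alternate reads between healthy halves to spread the load
        # (an assumed policy for this sketch).
        for _ in range(2):
            i = self._next
            self._next = (self._next + 1) % 2
            if self.healthy[i]:
                return self.disks[i][block]
        raise IOError("both halves of the mirror have failed")

    def fail(self, i):
        # Simulate a dead disk: it no longer receives reads or writes.
        self.healthy[i] = False

    def replace(self, i):
        # Resync copies from the surviving disk to the new, empty one,
        # never the other way around.
        good = 1 - i
        if not self.healthy[good]:
            raise IOError("no good copy to resynchronize from")
        self.disks[i] = dict(self.disks[good])
        self.healthy[i] = True


mirror = Mirror()
mirror.write(0, b"inbox")
mirror.fail(0)                    # one disk dies...
survivor_copy = mirror.read(0)    # ...and the other keeps serving data
mirror.write(1, b"new mail")      # writes continue on the survivor
mirror.replace(0)                 # new disk is populated from the good copy
```

After `replace(0)`, both dictionaries hold the same blocks, including the one written while the mirror was running on a single disk.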
Our main Engineering disk server, "Providence", is configured in much the same way. User home directories are held on a software-based RAID-5 (Redundant Array of Inexpensive Disks, Level 5) volume, which is composed of five 9 GB disks. Data and parity information are striped across all five disks, so that if any one disk fails, the other four can reconstruct its contents "on the fly", and nothing is lost; not only that, the system keeps running as though nothing happened! We keep a sixth disk online as a "hot spare", and in the event of a disk failure the software will automatically bring it into service and populate it with the data that was held on the failed disk. We also keep an inventory of spare disks so that we can replace failed ones quickly and conveniently.
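RAID-5 recovers a failed disk's contents using parity, which for a single stripe is the XOR of the data blocks. The following is a minimal sketch of that idea for one stripe on a five-disk set, not the actual RAID software: the block contents and the helper names `xor_blocks` and `rebuild` are made up for illustration, and a real RAID-5 layout also rotates the parity block from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks plus one parity block, one block per disk.
data_blocks = [b"mail", b"home", b"user", b"docs"]
parity = xor_blocks(data_blocks)
stripe = data_blocks + [parity]


def rebuild(stripe, failed_index):
    """Reconstruct the block on the failed disk from the four survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Losing disk 2 (which held b"user") loses no data.
recovered = rebuild(stripe, 2)
```

Because XOR is its own inverse, the same `rebuild` call works no matter which of the five disks fails, including the one holding parity. Populating a hot spare amounts to running this reconstruction for every stripe and writing the results to the new disk.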
We do not currently have a mirroring system for research disks, but we do have a very good automated backup system, so that if they do crash, we can recover them quickly. Keeping no more than 2 GB on any one research disk ensures that only a small subset of the user community would be affected in such a case, and, again, that the data can be recovered quickly.
We have an identical setup for the Natural Sciences side of the Node, using two servers known as "Pinky" and "TheBrain". We could have built one set of servers to handle the needs of the entire Node, but we believe that smaller is better, and that one system going out of service shouldn't affect everybody. We also have a "hot spare", an identical machine, from which we can cannibalize parts should any of our main servers fail. It's in use as the timeshare known as "joker", but can be taken offline at a moment's notice.
So, in a nutshell, you can see how our philosophies of "smaller is better" and "off-the-shelf components are better" work in our Node. If you have any questions about our setup, feel free to direct them to us.
University at Buffalo - State University of New York

