I’ve been dealing with servers – web servers, file servers, rented servers and servers I’ve had ultimate control (and responsibility) over – for more years than I really care to admit (15? *sigh*). One thing I’ve learned in those years is that servers can and do go boom. They get hacked, a hard drive fails, a lightning strike hits a bit too close; whatever happens, the fact is that servers will fail – spectacularly – if you use them long enough.
Here at the library, we’ve been hacked, we’ve been hit by lightning and we’ve had hard drive failures. Each one required a great deal of scrambling to recover from, and each one taught me something about managing servers.
Of course, the most important part of server management is your backup strategy, and the most important part of *that* is your testing strategy. Do you regularly go into your server’s backup software and try to recover individual files from past backups? If not, all the careful configuration in the world won’t save you when a backup has been quietly failing and you don’t find out until you need it. I try to test a single file on a single server each month: I go in, recover a file, confirm that it opens and isn’t corrupted, and then delete the recovered copy from the server. That server gets marked off the list, and the next month I do the same for the next server in line. I only have a few servers, so every one gets tested about once a quarter. If you have more servers, you may want to test two or more each month. It rarely takes long (usually about 15 minutes), but it can save hours of work.
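If you want to keep the rotation honest, a small script can tell you whose turn it is and check a recovered file against a known-good copy. This is a minimal Python sketch, not a feature of any particular backup package: the server names are hypothetical placeholders, and the hash comparison assumes you have a reference copy of the file that hasn’t changed since the backup was taken (a file edited since then will legitimately differ). A matching hash shows the recovered file is intact; you still want to open it to confirm it is usable.

```python
import hashlib
from datetime import date
from pathlib import Path

# Hypothetical server names; replace with your own list.
SERVERS = ["file-server", "web-server", "ad-server"]


def server_for_month(day: date, servers: list[str] = SERVERS) -> str:
    """Pick one server per calendar month, cycling through the list in order."""
    month_number = day.year * 12 + (day.month - 1)
    return servers[month_number % len(servers)]


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(restored: Path, reference: Path) -> bool:
    """True if the recovered file is byte-for-byte identical to the reference copy."""
    return sha256_of(restored) == sha256_of(reference)


if __name__ == "__main__":
    print("This month's restore test:", server_for_month(date.today()))
```

With three servers, January 2024 lands on `file-server`, February on `web-server`, March on `ad-server`, and April comes back around to `file-server`, which is the roughly quarterly cycle described above.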
The next important part of server management is security. There are whole areas of the IT landscape dedicated to security professionals, and I’m not one of them. I can, however, do some basic things to keep my servers secure and outsource the rest to the real professionals. Each morning when I come in, I compulsively check the logs (I’m hoping to consolidate that into a check of a single combined log, but there are more things to do than time to do them…). I set reasonable policies that account for security while still letting librarians do their actual work without tripping a bunch of security wires. And I train the staff on security issues.
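For the combined log I’m hoping for, one approach is to merge each server’s log into a single stream ordered by time. This Python sketch rests on an important assumption: every log is plain text, already in time order, and each line begins with an ISO 8601 timestamp (for example `2024-03-01T07:58:12`), so comparing the raw strings compares the times. Many real logs (classic syslog lines, Windows event logs) don’t meet that assumption and would need parsing or exporting first. The function names are hypothetical.

```python
import heapq
import sys
from collections.abc import Iterable, Iterator
from pathlib import Path


def read_log(path: Path) -> Iterator[str]:
    """Yield the lines of a text log without trailing newlines."""
    with path.open(encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")


def _tag(name: str, lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Pair each line with the name of the log it came from."""
    for line in lines:
        yield line, name


def merge_logs(sources: dict[str, Iterable[str]]) -> Iterator[str]:
    """Merge several time-ordered logs into one time-ordered stream.

    Assumes each line starts with an ISO 8601 timestamp, so sorting the
    raw line text sorts by time.
    """
    streams = [_tag(name, lines) for name, lines in sources.items()]
    # heapq.merge only interleaves correctly if each input is already sorted,
    # which holds for append-only logs written in time order.
    for line, name in heapq.merge(*streams, key=lambda pair: pair[0]):
        yield f"[{name}] {line}"


if __name__ == "__main__":
    paths = [Path(arg) for arg in sys.argv[1:]]
    for entry in merge_logs({p.stem: read_log(p) for p in paths}):
        print(entry)
```

Run it with the log files as arguments (for example `python merge_logs.py web.log files.log`) and each entry comes out tagged with the name of the file it came from, so one morning read covers every server.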
No, my library’s staff never touches the servers – directly. Except for the file server, where they store documents that may or may not be riddled with viruses. Or the web server, where they create and maintain content. Or the Active Directory server, when they set passwords that are (hopefully) strong enough to be secure yet easy enough to remember. OK, so they do touch the servers, just in ways that aren’t obvious at first glance, which makes training mega-important. If your staff can sniff out a phishing email a mile away, you have one less vector through which viruses can get in.
Finally, applying patches and updates, learning how your servers and the network they live on work, and keeping up with the hardware status of your machines will head off a lot of problems as well. Nothing will keep a direct lightning strike from doing some damage to your infrastructure; even the most robust surge protection equipment can fail or be overwhelmed by a big enough strike. But keeping backups that work, security policies that are effective and an attitude of lifelong learning about all the new things that can go wrong on your network goes a long way toward making a server that goes boom a small inconvenience rather than a big problem.