Small Business I.T. Best Practices: 2008

At some point, you'll end up buying or assembling a server that provides some critical service that your whole business relies on. For a retailer, the Point Of Sale system going out would mean they couldn't (easily) sell merchandise until it was fixed. For a manufacturer or distributor, the inventory and purchasing system going down means no one can record what they are doing. For a payroll company, when the server running the payroll and accounting software fails, they can't calculate or write checks. Similarly, the media or file server for an advertising or architecture firm has to be up and running or their staff can't _do_ anything for their customers.

All of these scenarios point out how important this server is to your business. You may not plan it that way: one particular server may simply grow into a critical role, or you might intentionally load a server with mission-critical software. Either way, the one thing to remember is that you must protect your critical server(s).

Just like a car, these servers require regular maintenance to continue running reliably and insurance to cover the times when they crash or fail. You also have to build a plan for continuing to do business even when the server fails.

Here at Allegro Consultants, we provide hosting, monitoring and maintenance for companies that have mission-critical servers but don't want the costs of a full IT staff to maintain them. Here is a short list of what should be done for these important servers, whether YOU do it or whether you have a professional support firm like us do it for you:

Firewalls
Your mission-critical server should be protected from outside attacks by a perimeter firewall. This is a firewall between your internal company network and the company (ISP) who provides your connectivity to the Internet. This is a very basic kind of protection.

This server should also be protected from internal attacks, those coming from a PC or server inside your company, so it can survive an attack even after one of your other internal machines is hacked into.

Power
You need to keep the mission-critical servers running even when power goes out. Many offices have some kind of generator that kicks in after a minute or so of power loss. While it's good to have a generator, that minute of power loss before it kicks in will crash your server.

You'll need a good UPS (uninterruptible power supply) connected to the server to keep it running until the generator has a chance to start providing power. UPSes also condition the incoming electricity so spikes, brownouts and switchovers don't hurt the server.

Spikes and surges
Each line connected to the server (the power line, the network cables, the phone line for the fax or modem, the serial cables to dumb terminals or old printers, etc.) can be the path that an electrical surge or spike uses to reach your precious server.

Each of these lines must have a surge suppressor that shunts the spike of voltage to ground. To make this work you must use three-prong plugs everywhere. Never use a two-prong to three-prong adapter, because it removes the ground connection, the "safety valve" a surge suppressor needs to protect your equipment.

Hacks and viruses
You will need to protect your server from viruses, hackers and disgruntled employees. You'll need a server-level anti-virus system, an intrusion detection system that watches for and notifies you if someone tries to break in, and some kind of VPN so people outside your office who are allowed to use this server can't have their communication "snooped" by people watching their connection.

Patches
The operating system, each application you run on the server, and every add-on (like anti-virus software) will have patches released to close security holes and add new features, from the day you install the software to the last day you use it.

Applying these patches can be good sometimes and bad other times. You'll need an automated tool to tell you when important patches are available and a strategy for deciding whether to apply them.

For many mission-critical servers, you only apply security patches and save all new-feature-only patches for planned downtime weekends. It's counterproductive to apply a new-feature patch to a working server only to have that patch crash the server and halt production.
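The policy above can be written down as a simple rule. Here is a minimal sketch in Python, assuming you already know whether a given patch closes a security hole; the `Patch` class, the `schedule` function and the patch names are all made up for illustration and don't come from any real patch tool.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    name: str                   # hypothetical patch name
    fixes_security_hole: bool   # you decide this from the vendor's notes


def schedule(patch: Patch) -> str:
    """Say when a patch gets applied under the policy described above."""
    if patch.fixes_security_hole:
        # Security patches are the only ones applied outside planned downtime.
        return "apply at next opportunity"
    # New-feature-only patches wait, so they can't halt production.
    return "hold for planned downtime weekend"


# Made-up examples of pending patches.
pending = [
    Patch("os-security-rollup", fixes_security_hole=True),
    Patch("accounting-new-reports", fixes_security_hole=False),
]

for patch in pending:
    print(f"{patch.name}: {schedule(patch)}")
```

The point is not the code itself but having the rule written down before a patch shows up, so nobody has to decide on the spot.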

Backups
You should have nightly backups, held offsite, and you should try to provide some kind of regular, frequent transaction backups during the working day. In the Progress, Oracle, and MS-SQL worlds, these are called “roll forward logs”. When you institute roll forward logging and transfer those logs, every 15 minutes or so, off site, they can be combined with your last full backup to provide for very, very little loss of data even under the worst disaster.
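To see why this loses so little data, it helps to run the numbers. The sketch below is a simplified model in Python, assuming the logs ship exactly on schedule starting from the time of the full backup; the function name and the dates and times are made up for illustration.

```python
from datetime import datetime, timedelta


def last_shipped_log(full_backup: datetime, failure: datetime,
                     interval: timedelta) -> datetime:
    """Time of the most recent log that reached the off-site copy.

    Assumes one log ships every `interval`, counted from the full backup.
    """
    shipped_count = (failure - full_backup) // interval  # whole intervals only
    return full_backup + shipped_count * interval


# Made-up scenario: nightly backup at 1:00 AM, server dies at 2:07 PM.
full_backup = datetime(2008, 5, 27, 1, 0)
failure = datetime(2008, 5, 27, 14, 7)
interval = timedelta(minutes=15)

recover_to = last_shipped_log(full_backup, failure, interval)
lost = failure - recover_to

print(f"Recover to {recover_to:%H:%M}, losing {lost} of work")
```

In this model the loss can never exceed one shipping interval. With only the nightly backup, the same failure would cost you everything entered since 1:00 AM.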

You should test your backup and roll forward procedures, and the validity of the backups themselves, twice a year.

A virtual image of the server, a cold metal backup plus exact hardware, or the exact steps and media needed to perform a cold recovery should be held somewhere geographically separate from your primary server. You should create and test a DR (disaster recovery) plan once a year and “operate” that equipment to prove it works.

Physical server management
All enterprise-class x86 servers from “name brands” such as Dell, IBM and HP come with management software. Once installed and configured, this software watches the physical hardware and alerts someone via SMS text messaging or email that a problem is about to occur or has occurred.

For all mission-critical servers, you should use name-brand, enterprise-class servers and install the server management software that comes with them. This may seem like more cost than the whitebox you can buy cheaper from the PC shop around the corner, but that won't help you when it's 3 AM on a holiday the day before all your payroll clients expect checks and your server has crashed with no more explanation than a blinking red light on the faceplate.

Remote control
All enterprise-class servers have an option for out-of-band server management. This is typically a piece of hardware or an add-in card that allows you to diagnose, reboot and “watch the console” even when the machine is not powered or not fully booted yet. For servers that are not 100% off-the-shelf proven designs (hardware, OS and application) and are hidden away in a lights-out data center, these remote management cards can mean the difference between a 15-minute recovery and a 4-hour recovery.

Hardware warranties and support
You should leverage the hardware warranties and support plans available when you purchase these servers. The 24x7, 4-hour onsite hardware diagnosis and repair plan is a very cheap insurance policy, and no one but the vendor can supply it as well or within the warranty terms.

Server lifespan
You should expect the lifespan of each x86 server you buy to be only as long as the original manufacturer’s warranty. If Dell only offers 5 years of protection, plan to replace that server with a new one, under a new warranty, before those 5 years run out.

Local monitoring
You should also install OS-level monitoring that proactively watches for problems and alerts you before they occur. We use Nagios for customers who buy monitoring from us and it warns us about low disk space, too much use of a CPU, running out of memory, network cards starting to fail, etc. Our Nagios management server, once it gets those “cries for help” or “warnings”, sends us SMS text messages and we ask you what you’d like to do. This usually happens early enough that maintenance to correct the problem can happen well in advance of any failure and with planned downtime.
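To give a feel for what one of these checks does, here is a minimal sketch in Python of a low-disk-space check. It is not our Nagios configuration: the 20% and 10% thresholds and the function names are assumptions chosen for illustration. The numeric status codes follow the convention Nagios plugins use for their exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import shutil

# Status codes matching the exit codes Nagios plugins use.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_disk(percent_free: float,
                  warn_below: float = 20.0,   # assumed threshold
                  crit_below: float = 10.0):  # assumed threshold
    """Turn a free-space percentage into a (status, message) pair."""
    if percent_free < crit_below:
        return CRITICAL, f"CRITICAL - only {percent_free:.1f}% free"
    if percent_free < warn_below:
        return WARNING, f"WARNING - {percent_free:.1f}% free"
    return OK, f"OK - {percent_free:.1f}% free"


def check_disk(path: str = "/"):
    """Measure free space on the disk holding `path` and classify it."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return UNKNOWN, "UNKNOWN - could not determine disk size"
    return classify_disk(usage.free / usage.total * 100)


status, message = check_disk("/")
# A real plugin would finish with sys.exit(status) so the monitor sees it.
print(message)
```

The warning level is what buys you time: it fires while there is still room to schedule a cleanup or a bigger disk, well before the critical level means trouble.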

Remote monitoring
Lastly, you want some simple checking, from the outside world, that the server is still accessible. This is a service we can offer and it uses various mechanisms to verify the server is still reachable. This detects problems like the network has gone down, the firewall has stopped allowing packets through, your software has stopped running, etc.
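One of the simplest outside checks is just trying to open a connection to the server. Below is a minimal sketch in Python, assuming your application answers on a TCP port; the function name `is_reachable` and the 5 second timeout are choices made for illustration, not a description of the mechanisms our service uses.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    A failure can mean the network is down, a firewall is dropping
    packets, or the software on that port has stopped running.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False
```

You would run something like `is_reachable("server.example.com", 443)` (a placeholder host and port) every few minutes from a machine outside your office, and send yourself an alert when it returns False. Running it from inside your own network defeats the purpose, because it would never notice a dead Internet line or a misbehaving firewall.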

So, there's a lot to know about "running" a critical server. Bigger companies have dedicated IT staff who are trained and experienced in these areas. You simply count on them to provide steady, reliable service of the applications and they run the servers.

When it's just YOU as the Chief Everything Officer, you may find all the above too much to handle. But someone has to handle it, as your business relies, literally, on this server working properly.

You can call a managed service provider like Allegro Consultants and have them do all this for you, or you can follow the recommendations above and do most of it on your own.

If you have users, then you've got a stack of problems. Every time you give an employee a computer, an application or even just a way to accomplish something, you'll end up with the need to support that employee. When they encounter a problem, like a stuck printer or a lost password, you'll get called and be expected to fix something.

You'll need something that helps you accomplish this task, especially if, like most small business I.T. shops, computer support isn't your primary job. To manage this extra duty, do a good job of it, and not let it consume every waking hour, you need:

A way for users to ask you to fix a problem that doesn't require you, personally, to write down the problem
A mechanism for listing the outstanding problems and prioritizing them
Some kind of report that shows a user the status of their problem
A way to allocate budget dollars to solving problems or at least reporting how your I.T. money was spent

These are the basics that will keep you sane as well as demonstrate to the users you are indeed working on their problem. This is a very standard I.T. problem, one that all businesses have, and one that many, many companies have created solutions for.

So, how do you do this and how do you get them to use it?

The solution you're looking for is often called a "trouble ticket" system or a "help desk" system. There are lots of them. Fundamentally, such systems allow a user to submit a new problem or "ticket", the ticket is assigned to an I.T. queue, someone in I.T. gets notified there is a new issue, they work on the issue, and, once it is complete, the trouble ticket system lets the original user know it has been fixed.
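The life of a ticket described above can be sketched in a few lines of Python. This is only an illustration of the flow, not how OTRS or any other product works inside; the `Ticket` class, its method names and the sample people are made up.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"
    RESOLVED = "resolved"


@dataclass
class Ticket:
    submitter: str      # the user who has the problem
    description: str    # written by the user, not by you
    status: Status = Status.NEW
    notifications: list = field(default_factory=list)

    def assign(self, technician: str) -> None:
        """Put the ticket in someone's queue and tell them about it."""
        self.status = Status.IN_PROGRESS
        self.notifications.append(
            f"to {technician}: new ticket from {self.submitter}")

    def resolve(self, fix: str) -> None:
        """Close the ticket and tell the original user it is fixed."""
        self.status = Status.RESOLVED
        self.notifications.append(f"to {self.submitter}: fixed ({fix})")


# Made-up example: a user reports a stuck printer.
ticket = Ticket("pat", "The printer by the front desk is stuck")
ticket.assign("you")
ticket.resolve("cleared a paper jam")

print(ticket.status.value)
for note in ticket.notifications:
    print(note)
```

Notice that the user writes the description and can read the status at any time, which is where the time savings come from.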

While you may think this is extra work, I can tell you from experience that such systems dramatically reduce the amount of time and headache that answering incoming support calls requires. Entry of the original problem is performed by the user with the problem. That cuts down your time and improves the accuracy of the description. When that user wants to know the status of their issue, they look it up in the trouble ticket system instead of calling you. Again, a big time and headache saver.

Even more substantial time savings can be had once you have used the system for some time and have accumulated lots of problem-and-its-resolution data within the system. With proper handling, each issue that is entered and resolved can be searched by users before they enter a new issue to see if their problem has been solved in the past. If it hasn't and they enter a new ticket anyway, your ability to search the past problems for related issues or knowledgeable people regarding the affected systems is a big time savings.
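Searching past tickets doesn't have to be fancy to be useful. Here is a minimal sketch in Python of a keyword search over resolved tickets; real trouble ticket systems have their own search features, and the sample tickets and the `search_tickets` function here are made up for illustration.

```python
# Made-up history of resolved tickets.
resolved = [
    {"problem": "Printer on the second floor is stuck",
     "resolution": "Cleared a paper jam"},
    {"problem": "Lost password for payroll application",
     "resolution": "Reset the password"},
    {"problem": "Cannot print checks",
     "resolution": "Reinstalled the printer driver"},
]


def search_tickets(tickets, keyword):
    """Return tickets whose problem or resolution mentions the keyword."""
    needle = keyword.lower()
    return [
        t for t in tickets
        if needle in t["problem"].lower() or needle in t["resolution"].lower()
    ]


for match in search_tickets(resolved, "printer"):
    print(f"{match['problem']} -> {match['resolution']}")
```

Searching the resolution text as well as the problem text matters: the user who can't print checks may never have typed the word "printer", but the fix mentions it.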

Overall, a trouble ticket system is a critical tool for providing good support to the users, resolving business issues in a timely way, and reducing the time and cost you have to expend to solve those problems.

Open Source trouble ticket systems:

OTRS - Open Ticket Request System www.otrs.org
Web Amoeba Ticket System (for Mambo and Joomla CMSes) www.webamoeba.co.uk
Simple Ticket - www.simpleticket.net

If you are running Progress OpenEdge or WebSpeed, you should know that Allegro Consultants offers customers a free copy of its Progress-based trouble ticket system. www.allegroconsultants.com

Tuesday, May 27, 2008

Managing a server critical to your business

Wednesday, March 26, 2008

Keeping track of problems
