The Outsourcing Continuum, Part VII: SLAs

When you put a crucial portion of your company’s service into the hands of a third-party service provider, you lose some control over the operational aspects. You are relying on the partner to meet your expectations for quality of service. To protect yourself from some of that risk, there are two service level metrics that you want to address in the contract: uptime and response time.

Uptime

Uptime refers to the amount of time a system is available for use. It’s calculated as a percentage of the total available time. If you have 99% uptime for the month, the system was available for use 712.8 hours out of the total 720 hours in a 30-day period. To get a good feel for what uptime levels are appropriate for your company, look at the current levels of service you are providing to your customers. For example, if you are providing online service processing only during the hours your offices are open, and you intend to continue with that practice, then you probably don’t need 24×7 service levels, but you do want to have the system up and functional when you’re actively providing support to your customers.

The service level you want for the hours you are supporting customers is very high, whereas the uptime for the rest of the day can be much lower. Typically, you will want to get agreement for uptime to be in excess of 98% for operational use hours. That number will exclude any scheduled maintenance downtime: if the system is scheduled to be down for maintenance, that time is removed from the available time in the uptime calculation.
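The uptime calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and it assumes that scheduled maintenance is subtracted from the available hours before the percentage is taken, as the paragraph above describes.

```python
def uptime_percent(total_hours, scheduled_maintenance_hours, unscheduled_downtime_hours):
    """Return uptime as a percentage of the available (non-maintenance) hours.

    All names here are illustrative; a real contract defines its own terms.
    """
    # Scheduled maintenance is excluded from the available time.
    available_hours = total_hours - scheduled_maintenance_hours
    if available_hours <= 0:
        raise ValueError("available hours must be positive")
    # Only unscheduled downtime counts against the provider.
    up_hours = available_hours - unscheduled_downtime_hours
    return 100.0 * up_hours / available_hours


# A 30-day month has 720 hours; 7.2 hours of unscheduled downtime gives about 99%.
print(round(uptime_percent(720, 0, 7.2), 2))
```

With 20 hours of scheduled maintenance in the same month, the available time drops to 700 hours, so 14 hours of unscheduled downtime would work out to 98%.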

Response Time

Response time refers to the amount of time that it takes to get an answer back from the system once the user has entered data. It is typically measured in seconds from the time the user presses the submit key until data is returned to the screen. There can be a wide range of response times depending on the work the system is required to do to come up with an answer and the complexity of the network.

You want to determine what an average response time should be and build that into your SLA. For example, for a system that is doing a simple calculation and is directly connected to the user terminal, you would expect to see a response in less than 2 seconds. For a system that is doing a complex task, like scoring a loan application, with the user connected via the Internet, you might expect to see response times of as much as 45 seconds.
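As a rough sketch of how an average response time target might be checked, the Python below compares the mean of a set of measured response times against an agreed figure. The function name, the sample values, and the choice of a simple arithmetic mean are assumptions made for illustration; your contract would need to spell out how and where the measurements are taken.

```python
from statistics import mean


def meets_response_sla(samples_seconds, target_seconds):
    """Return True if the average measured response time is within the target.

    samples_seconds: response times in seconds, from submit to data on screen.
    target_seconds: the agreed average response time (hypothetical SLA value).
    """
    if not samples_seconds:
        raise ValueError("at least one measurement is required")
    return mean(samples_seconds) <= target_seconds


# Simple calculation on a directly connected terminal, 2-second target.
print(meets_response_sla([1.2, 1.8, 2.4], 2.0))
```

An average can hide a handful of very slow requests, so you may also want to agree on how outliers are treated.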

The primary issue here is productivity. If an application is too slow, the user will not get as much accomplished in the same period as they would with a more responsive product. They will also get impatient and perceive it as a poor system, which is not a good customer service situation.

Enforcement

On the enforcement side of the SLA, you have a couple of options. If the provider fails to perform at the desired level, you can ask for a refund of monthly fees or you can terminate the contract. Each of these options is valid on its own, but the combination is even better. If your uptime doesn’t meet the SLA level, you may want to consider getting a refund of a portion of the monthly fees you’re paying for the system. For example, you might want to get a 10% refund for every percentage point under the agreed-to uptime. If the system was available for only 96% of the month, not counting scheduled maintenance, and you had a 98% SLA, you would get a 20% refund on your monthly service fees. This again is a productivity issue for your company. If the application wasn’t available, then you weren’t able to do business, and thus there is some loss for which you should be compensated.
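The refund arithmetic can be sketched as follows. The 10% per percentage point rate comes from the example above; the function name, the decision to count only whole percentage points, and the cap at 100% of the monthly fee are assumptions for illustration, and a real contract should state how partial points are handled.

```python
import math


def refund_percent(actual_uptime, sla_uptime, rate_per_point=10.0):
    """Return the share of the monthly fee to refund, as a percentage.

    Assumes only full percentage points of shortfall count, and that the
    refund can never exceed 100% of the fee. Both are illustrative choices.
    """
    shortfall = sla_uptime - actual_uptime
    if shortfall <= 0:
        return 0.0  # SLA met, no refund due
    full_points = math.floor(shortfall)
    return min(full_points * rate_per_point, 100.0)


# 96% actual uptime against a 98% SLA: two points short, so a 20% refund.
monthly_fee = 5000.00  # hypothetical fee
pct = refund_percent(96.0, 98.0)
print(pct, monthly_fee * pct / 100)
```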

To address chronic uptime or response time problems, you can build in a contract termination clause that applies if uptime and/or response time levels are not acceptable over a defined period. Typically, that period would be a fixed time frame of 3 to 6 months. You can also make missed service levels in 6 months out of 12 the trigger for contract termination.
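A trigger of the "6 months out of 12" kind can be expressed as a simple count over the most recent months. The sketch below is one possible reading, not contract language: the function name is hypothetical, and it assumes that each month has already been marked as having met or missed the SLA and that only the trailing 12 months are considered.

```python
def termination_triggered(monthly_sla_met, window_months=12, failure_threshold=6):
    """Return True if the SLA was missed in enough of the most recent months.

    monthly_sla_met: list of booleans, oldest first, True when the month met the SLA.
    window_months and failure_threshold are illustrative defaults (6 out of 12).
    """
    recent = monthly_sla_met[-window_months:]
    failures = sum(1 for met in recent if not met)
    return failures >= failure_threshold


# Every other month missed over the past year: 6 failures out of 12.
history = [True, False] * 6
print(termination_triggered(history))
```

The same function covers a fixed period as well; for example, `window_months=3, failure_threshold=3` would require three missed months in a row at the end of the history.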