Service Level Agreements

Several weeks ago, we reviewed ITIL, its practices, terminology, and certification (you can read that here). One of the concepts ITIL defines is the SLA, or Service Level Agreement. An SLA is an agreement between an IT service provider and a customer. It describes the IT service, documents Service Level Targets, and specifies the responsibilities of both the IT Service Provider and the Customer. A single SLA might cover multiple IT services or customers. Another way to think of an SLA is simply as a promise, or a set of promises: the SLA describes what the promises are, but not how they will be kept.

The actual SLA will depend on the company you work for, but in general it will include a contract, a service description, service hours, service availability, service performance, and so on. For example, your company’s SLA might say that a specific server cluster must have 100% uptime. This would mean the IT department must spend time and money to ensure that the cluster never goes down. Another part of the SLA might say that whenever a user has a problem that prevents them from doing their job, such as a keyboard or other hardware failure, a solution must be in place within one hour. The required resolution time for some issues may even depend on the type of user or department that submits the request (for instance, any issue submitted by a Chief Officer might have to be resolved within one day).

Ultimately, SLAs provide a standard way to measure many aspects of an IT department’s performance. Tracking how many issues fall into each category of a company’s SLA can help the IT department and the company set short- and long-term goals for individual staff and for the department as a whole. In addition, tracking how many issues ‘break’ the SLA (either by being delivered late or by falling short of the quality the SLA defines) may give some indication of a staff member’s work quality and workload. Some SLA metrics include:

  • Defect Rates - The number of errors in deliverables (such as failures per month or the number of missed deadlines per quarter)
  • Technical Quality - Used specifically for outsourced application development, this measures the technical quality of the code (such as program size, degree of complexity, and coding defects)
  • Service Availability - If services were down, were they down for longer than promised? Or, what percentage of the time is an application available during normal working hours?
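
As a rough illustration of the Service Availability metric, the Python sketch below computes the percentage of a single working-hours window (for example, 9:00 to 17:00 on one day) during which a service was up, given a list of outage start and end times. The function name `availability_percent` and its input format are assumptions made for this example, not anything ITIL prescribes. Outages are clipped to the window, and overlapping outages are merged so that downtime is not counted twice.

```python
from datetime import datetime


def availability_percent(
    window_start: datetime,
    window_end: datetime,
    outages: list[tuple[datetime, datetime]],
) -> float:
    """Return the percentage of [window_start, window_end) the service was up.

    Each outage is a (start, end) pair. Outages are clipped to the window,
    and overlapping outages are merged so downtime is not counted twice.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window_end must be after window_start")

    # Keep only outages that overlap the window, clipped to its edges.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    )

    downtime = 0.0
    merged_start = merged_end = None
    for start, end in clipped:
        if merged_end is None or start > merged_end:
            # Gap before this outage: close out the previous merged interval.
            if merged_end is not None:
                downtime += (merged_end - merged_start).total_seconds()
            merged_start, merged_end = start, end
        else:
            # Overlaps the current merged interval: extend it.
            merged_end = max(merged_end, end)
    if merged_end is not None:
        downtime += (merged_end - merged_start).total_seconds()

    return 100.0 * (total - downtime) / total
```

The result could then be compared against whatever availability the SLA promises (a hypothetical target such as 99.5%) to decide whether that promise was kept. Measuring a whole week or month would mean summing uptime and window length across each working-hours window, which this sketch leaves out.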

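The resolution-time promises described earlier (one hour for an issue that stops someone from working, one day for a request from a Chief Officer) lend themselves to a simple breach count. The following Python sketch assumes each ticket records a category, an opened time, and an optional resolved time; the category names, the `RESOLUTION_TARGETS` table, and the three-day "standard" limit are all hypothetical. It only detects late delivery, not the quality-based breaches an SLA may also define, and it treats a still-open ticket as breached once its deadline has passed.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical resolution targets loosely based on the examples in the text.
# A real SLA would define its own categories and limits.
RESOLUTION_TARGETS = {
    "blocking": timedelta(hours=1),   # e.g. keyboard or other hardware failure
    "executive": timedelta(days=1),   # e.g. any request from a Chief Officer
    "standard": timedelta(days=3),    # hypothetical default for everything else
}


@dataclass
class Ticket:
    category: str
    opened: datetime
    resolved: Optional[datetime] = None  # None means the ticket is still open


def breaches_sla(ticket: Ticket, now: datetime) -> bool:
    """True if the ticket was resolved late, or is still open past its deadline."""
    deadline = ticket.opened + RESOLUTION_TARGETS[ticket.category]
    finished = ticket.resolved if ticket.resolved is not None else now
    return finished > deadline


def breach_counts(tickets: list[Ticket], now: datetime) -> Counter:
    """Count late-delivery SLA breaches per ticket category."""
    return Counter(t.category for t in tickets if breaches_sla(t, now))
```

Grouping breaches by category (or by staff member, if tickets also recorded an assignee) is the kind of tracking described earlier. Keep in mind that a raw count like this says nothing about how difficult each issue was.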
When a company decides how to measure SLA performance, it is important to choose metrics that motivate the right behavior. You might decide it is acceptable to take a little longer than promised on some issues (in other words, to miss deadlines) if doing so lays the groundwork for a long-term solution. In that case, the number of issues resolved may not be a good measure. Also, some issues are inherently easier than others, and a company may not want to reward staff for simply resolving as many easy issues as possible instead of tackling the more time-consuming ones.

Have questions or suggestions? Please feel free to comment below or contact me.