What is a Service Level Agreement (SLA)?

Technical writer

Infrastructure

26.05.2022


A service level agreement (SLA) is an agreement that defines what kind and what level of service a company commits to providing. The term is used mostly in industries such as television and information technology.

Unlike a regular service contract, an SLA goes into considerable detail: it describes service quality, technical support response times, and other indicators.

General SLA principles

The service level agreement usually follows these principles:

Transparency. The interaction between the provider and the client must be as transparent as possible. Every process must have a clear and reasonable purpose. There is no room for vague terms or confusing wording, and both sides should avoid expressions that might be misunderstood.
Clear rules and rights. The rules and rights of both sides must be fully understandable. For instance, if a company promises that its services will be available 99.99% of the time and the user finds out that this is not true, the user should be able to receive compensation.
Expectation management. For example, clients may expect technical support to be available at any time and to answer even the most trivial questions, but not every provider can offer that. In that case, either the client changes provider or lowers their expectations, or the company makes its technical support team more capable.

Among other things, an SLA usually specifies how long it takes to resolve a client's problems, what compensation the user is entitled to, and in which cases they can claim it.

An SLA doesn't have to be a giant stack of pages. What matters most is that the agreement is transparent and easy to read. Look at large, successful companies such as Amazon: the SLA for its S3 service fits on a single page.

On Amazon's S3 SLA page, you can read about the monthly uptime commitment for the service and the compensation you receive if it is not met.

What a typical SLA consists of

We looked at Amazon's SLA a moment ago. It is not a standard, just one way to design an SLA that reflects the specific characteristics of the service provided by the company (the author of the SLA).

If we're talking about the IT industry, a typical SLA would contain:

The rules for using the product or providing some service.
The responsibilities of both sides, and mechanisms that let users and providers hold each other accountable.
Concrete procedures the provider follows to fix any flaws the user encounters.

An SLA also states exactly how long the agreement remains in force. Sometimes the client and the provider also describe how new requirements can be added to the service's functionality if necessary.

It is also common to list indicators of the actual level of service quality:

The reliability and availability of the service.
The time it takes to react to system faults and malfunctions.
The time it takes to resolve system faults and malfunctions.
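
To make the last two indicators concrete, here is a minimal Python sketch that averages reaction and resolution times over a set of incident records. The Incident record and its fields are hypothetical; a real provider would take these timestamps from its own ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical incident record with the three timestamps we need.
    reported: datetime
    first_response: datetime
    resolved: datetime


def avg_reaction_minutes(incidents):
    """Average time from report to first response, in minutes."""
    return mean((i.first_response - i.reported) / timedelta(minutes=1) for i in incidents)


def avg_resolution_hours(incidents):
    """Average time from report to resolution, in hours."""
    return mean((i.resolved - i.reported) / timedelta(hours=1) for i in incidents)


incidents = [
    Incident(datetime(2022, 5, 2, 9, 0), datetime(2022, 5, 2, 9, 10), datetime(2022, 5, 2, 11, 0)),
    Incident(datetime(2022, 5, 3, 14, 0), datetime(2022, 5, 3, 14, 20), datetime(2022, 5, 3, 18, 0)),
]
print(avg_reaction_minutes(incidents))  # 15.0
print(avg_resolution_hours(incidents))  # 3.0
```

Averages are only one option; an SLA may just as well define these indicators as a maximum or a percentile.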

You may also want to describe how billing with the client works. For example, some companies charge for the level of service actually delivered, while others insist on a fixed-price plan. Don't forget to tell users about penalties, if there are any. If the client can receive compensation, the provider must explain why, how, and where the customer can get it.
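
Compensation rules are easiest to check when they are written as explicit thresholds. The sketch below assumes a hypothetical tier table (the thresholds and credit percentages are invented for illustration and are not taken from any real SLA) and a 30-day month.

```python
# Hypothetical credit tiers: (minimum monthly uptime %, credit as % of the monthly fee).
# Checked from the top down; the first threshold the uptime reaches wins.
CREDIT_TIERS = [
    (99.99, 0),
    (99.0, 10),
    (95.0, 25),
    (0.0, 100),
]


def monthly_uptime(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime percentage for one month, given total downtime in minutes."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def service_credit(uptime_percent: float, monthly_fee: float) -> float:
    """Compensation owed to the customer for one month."""
    for threshold, credit_percent in CREDIT_TIERS:
        if uptime_percent >= threshold:
            return monthly_fee * credit_percent / 100
    return monthly_fee  # only reachable for invalid (negative) uptime


# 216 minutes of downtime in a 30-day month is 99.5% uptime, which falls in the 10% tier.
print(service_credit(monthly_uptime(216), monthly_fee=200.0))  # 20.0
```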

Key parameters of an SLA

SLA parameters are a set of measurable metrics. You would never write something like "We will fix any fault before you know about it" in an SLA. That is an example of a vague statement that only makes it harder for the provider and the customer to reach agreement.

Take a metric such as operating hours. It shouldn't be abstract: it must specify the concrete days and time periods when customers can count on the technical support team.

Some companies divide their customers into separate groups. One group can contact technical support at any time, the second only on workdays, and the third cannot call for help at all.
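
A rule like this is easy to express precisely. In this sketch, the group names and the 9 AM to 6 PM workday window are assumptions for illustration; the text only says that the second group is served on workdays.

```python
from datetime import datetime, time

# Hypothetical customer groups mapped to support policies.
SUPPORT_POLICY = {
    "premium": "any_time",
    "standard": "workdays",
    "basic": "no_support",
}
# Assumed working hours for the "workdays" policy.
WORKDAY_START = time(9, 0)
WORKDAY_END = time(18, 0)


def support_available(group: str, when: datetime) -> bool:
    """Can a customer from `group` contact technical support at `when`?"""
    policy = SUPPORT_POLICY[group]
    if policy == "any_time":
        return True
    if policy == "workdays":
        is_workday = when.weekday() < 5  # Monday=0 ... Friday=4
        return is_workday and WORKDAY_START <= when.time() < WORKDAY_END
    return False


saturday_noon = datetime(2022, 5, 28, 12, 0)
print(support_available("premium", saturday_noon))   # True
print(support_available("standard", saturday_noon))  # False
```

A real schedule would also need public holidays and time zones, which this sketch leaves out.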

Such metrics are extremely important, because without them neither side can clearly understand what to expect from the collaboration. That's why you have to consider a few things:

Metrics must be published and accessible to anyone.
There shouldn't be any statements that can be misunderstood.
Metrics must not change without warning. Customers have the right to know about any change in advance.

When setting metrics, don't overdo it: overly ambitious targets can drive up the price of the company's services.

Here is an example. Suppose a problem takes an average specialist about 4 hours to solve, while an expert can solve it in 2 hours. Writing "2 hours" into your SLA is bad practice: you would have to rely on experts for every such case, which makes the work much more expensive very quickly. If you write "1 hour", you will not only pay much more but will also regularly pay compensation to users who trusted your promise and were let down.

Operating hours are not the only metric you should care about. What else is important? For example, the time it takes technical support to respond. The values of these metrics can also depend on external factors such as the customer's status or the severity of the problem.

Suppose a company provides an outsourced IT service. One group of its users pays for a premium plan and another does not. Support response times for the two groups may differ, because the premium group is more privileged: one group might get help within 15 minutes, the other within a day. If such differences exist, it is essential to state them in the SLA.
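
Here is a minimal sketch of how such differences could be written down and checked. The plan names are hypothetical; the 15-minute and one-day targets come from the example above.

```python
from datetime import datetime, timedelta

# Hypothetical plans; targets follow the example: 15 minutes vs. one day.
RESPONSE_TARGETS = {
    "premium": timedelta(minutes=15),
    "free": timedelta(days=1),
}


def response_deadline(plan: str, opened: datetime) -> datetime:
    """Latest moment support may first respond without breaching the SLA."""
    return opened + RESPONSE_TARGETS[plan]


def response_breached(plan: str, opened: datetime, first_response: datetime) -> bool:
    return first_response > response_deadline(plan, opened)


opened = datetime(2022, 5, 26, 10, 0)
answered = datetime(2022, 5, 26, 10, 30)
print(response_breached("premium", opened, answered))  # True
print(response_breached("free", opened, answered))     # False
```

The same 30-minute answer breaches the premium target but is well within the target for the other plan.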

Besides response time, the SLA should cover resolution time: how long it takes to fix the problem the user has run into. The same logic applies. Even if the customer is very important to the company, their requests may be handled at different speeds depending on how serious the problem is.

Suppose a client has an extremely severe problem: the local network is down, and all internal processes have stalled as a result. Such problems must be prioritized, and the SLA can spell out what kind of help the client can expect in this situation.

Another day, the same customer may ask for help with a less critical issue. For example, the network works fine, but a few new devices need to be connected to it. Spending hours or even days on such a task is acceptable.
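
Resolution targets can be keyed by severity in the same way. The severity levels and the 4-hour and 3-day targets below are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # e.g. the local network is down and work has stopped
    LOW = "low"            # e.g. a few new devices need to be connected


# Hypothetical resolution targets, for illustration only.
RESOLUTION_TARGETS = {
    Severity.CRITICAL: timedelta(hours=4),
    Severity.LOW: timedelta(days=3),
}


def resolution_due(severity: Severity, reported: datetime) -> datetime:
    """Deadline for resolving a problem of the given severity."""
    return reported + RESOLUTION_TARGETS[severity]


reported = datetime(2022, 5, 26, 9, 0)
print(resolution_due(Severity.CRITICAL, reported))  # 2022-05-26 13:00:00
print(resolution_due(Severity.LOW, reported))       # 2022-05-29 09:00:00
```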

These and many other considerations should be reflected in the SLA and accepted by both the customer and the provider. This approach reduces the number of potential conflicts, because everything becomes clear to everyone.

Availability of the service

For the provider, one of the most important SLA parameters is availability. It can be expressed in days, hours, or minutes over a given period. For instance, a provider might guarantee that its cloud storage will be available 99.99% of the time over a year.

At first glance, 99% and 100% look almost the same. The difference becomes huge, however, once you remember that the percentage applies to a 365-day period. 99% means that customers accept that the server may be unavailable for about 3.65 days, almost four days, a year. 100% would mean no downtime at all, but no one can guarantee such reliability. The figure is always 99.something percent.

At Hostman, we guarantee 99.99% uptime, which means servers may be unavailable for up to about 52 minutes a year.

You might find providers that promise uptime of up to 99.9999% and swear that their servers will be down for no more than about 30 seconds a year. Making such promises is not a good idea, for two important reasons:

The higher the promised uptime the higher the price of the service.
Few clients actually need such uptime. In most cases, 99.98% is more than enough.

The number of nines matters less than the actual downtime the SLA allows. A year is the default period in SLAs, so 99.95% uptime means about 4.4 hours (roughly 4 hours 23 minutes) of downtime per year.
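
Converting a percentage into downtime is simple arithmetic: allowed downtime = period length × (100 − uptime %) / 100. Here is a minimal sketch, assuming a 365-day year (leap years ignored).

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def allowed_downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime, in hours, that still meets the uptime target."""
    return period_hours * (100 - uptime_percent) / 100


for target in (99.0, 99.95, 99.99, 99.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target}%: {hours:.2f} h = {hours * 60:.1f} min per year")
```

This gives roughly 87.6 hours (3.65 days) for 99%, 4.38 hours for 99.95%, 52.6 minutes for 99.99%, and about 31.5 seconds for 99.9999%. Passing a different period_hours shows how the same percentage translates into downtime for a month or a quarter.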

Some providers, however, use a different period. If the SLA doesn't say, the user should ask which period is used to evaluate uptime. Some companies mislead customers by advertising 99.95% uptime while actually measuring it per month rather than per year.
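
The evaluation period can change whether the same outages satisfy the SLA. This sketch uses hypothetical outage data (a single 3-hour outage in March 2022) and checks it against a 99.95% target, once per calendar month and once for the whole year.

```python
import calendar


def uptime_percent(downtime_minutes: float, period_minutes: float) -> float:
    return 100 * (period_minutes - downtime_minutes) / period_minutes


def meets_target_per_month(downtime_by_month, target, year):
    """Evaluate each calendar month on its own."""
    results = {}
    for month, downtime in downtime_by_month.items():
        days = calendar.monthrange(year, month)[1]
        results[month] = uptime_percent(downtime, days * 24 * 60) >= target
    return results


def meets_target_per_year(downtime_by_month, target, year):
    """Evaluate the whole year as a single period."""
    days = 366 if calendar.isleap(year) else 365
    return uptime_percent(sum(downtime_by_month.values()), days * 24 * 60) >= target


# Hypothetical data: one 3-hour outage in March 2022, no other downtime.
downtime = {month: 0 for month in range(1, 13)}
downtime[3] = 180

print(meets_target_per_year(downtime, 99.95, 2022))      # True
print(meets_target_per_month(downtime, 99.95, 2022)[3])  # False
```

The same outage fits within the yearly allowance but breaks the target for March, so a quoted percentage means little until you know its period.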

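Another important point is cumulative availability. A service that depends on several components can be no more available than the least available of them, so the overall figure is capped by the lowest availability stated in the SLA. If the components fail independently, the overall availability is the product of their individual availabilities, which is lower still.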
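
For components in series that fail independently, the overall availability is the product of the component availabilities. Here is a minimal sketch; the independence assumption is a simplification, since real failures are often correlated, and the two component figures are hypothetical.

```python
from math import prod


def serial_availability(*availabilities: float) -> float:
    """Overall availability when every component must be up,
    assuming the components fail independently of each other."""
    return prod(availabilities)


# Hypothetical example: compute at 99.99% and storage at 99.95%.
overall = serial_availability(0.9999, 0.9995)
print(f"{overall:.4%}")  # 99.9400%
```

The result is slightly below 99.95%, the lower of the two figures.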

Pros of SLA

Signing and adhering to an SLA pays off for both sides. With an SLA, a company can protect itself from unexpected customer demands (such as fixing a non-critical problem at 3 AM) and define its own responsibilities precisely.

SLAs have other advantages too. Providers can organize not only their external processes but also their internal ones. For example, with a well-written SLA, a company can set up several tiers of technical support and manage them more efficiently.

At the same time, customers who sign the agreement will clearly understand what kind of service they will receive and how they can communicate with the company.

The difference between SLA and SLO

An SLA can also serve as an indicator of user satisfaction, on a scale from 0% (lowest) to 100% (highest).

Of course, 100% is unattainable, just as it is impossible to provide 100% uptime and promise it in an SLA. That's why it is important to choose metrics wisely and keep the numbers in the SLA realistic.

If you don't have a team that is ready to work at night, don't promise your customers 24/7 technical support. Remember that you can revise the SLA later, when the team grows and a more advanced level of support becomes viable for the company. Customers will be very happy about that.

Companies also use another, internal tool to monitor the service level: the SLO, where the O stands for "objectives". An SLO is oriented toward the company's future goals and describes the level of service the company wants to achieve.

Let's return to technical support for an example. Suppose a company can currently process about 50 requests per day and works 5 days a week from 9 AM to 6 PM. These figures should be recorded in the SLA so customers can see them.

At the same time, the company creates a second document: the service level objectives. It serves as the foundation for future service improvements. The SLO contains current metrics and a list of tasks the company must complete to reach a new level of quality, for example, raising the number of user requests processed per day from 50 to 75. The future SLA depends heavily on the current SLO.
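
The relationship between the two documents can be sketched as a simple progress check. The 50 and 75 requests per day come from the example above; measuring them as an average of daily counts, and the sample week of data, are assumptions for illustration.

```python
from statistics import mean

SLA_COMMITMENT = 50  # requests per day promised in the SLA today
SLO_TARGET = 75      # requests per day the company is aiming for


def slo_report(daily_processed):
    """Compare average daily throughput with the SLA and the SLO."""
    average = mean(daily_processed)
    gap_closed = (average - SLA_COMMITMENT) / (SLO_TARGET - SLA_COMMITMENT)
    return {
        "average": average,
        "meets_sla": average >= SLA_COMMITMENT,
        "meets_slo": average >= SLO_TARGET,
        # Share of the 50-to-75 gap already closed, clamped to [0, 1].
        "progress": max(0.0, min(1.0, gap_closed)),
    }


# Hypothetical week of measurements (Monday to Friday).
print(slo_report([55, 60, 65, 60, 60]))
```

Here the team already exceeds the SLA commitment and has closed 40% of the gap to the objective.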

How to create an SLA

When drafting an SLA, it's best to start with the descriptive part. It usually contains a glossary, a high-level description of the system, the roles of users and the technical support team, and so on. This part can also set out the boundaries of the service: the territory where it is provided, its hours, and its functionality.

The next section is the service description: the functions, features, and goods a user gets by working with the company. Here the company must describe in detail what the user can count on after signing the contract, and on what terms.

After the introductory sections, you can narrow the focus and make the details more specific. This is the main part, where the actual level of service is explained in detail. Here you describe:

Metrics that reflect the quality of service provided (and they must be easy to measure).
The definition of every metric, expressed as concrete numbers rather than abstract statements, so that both sides can refer to this part of the SLA.
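
One way to keep metric definitions unambiguous is to record them in a structured, machine-checkable form alongside the text. The Metric structure below is a hypothetical illustration, not a standard SLA format, and the two sample metrics reuse figures mentioned earlier in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    # Hypothetical structure for one SLA metric definition.
    name: str
    target: float
    unit: str
    period: str             # evaluation period, e.g. "calendar year"
    higher_is_better: bool  # True for uptime, False for response time

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


METRICS = [
    Metric("availability", 99.99, "%", "calendar year", higher_is_better=True),
    Metric("first response, premium plan", 15, "minutes", "per request", higher_is_better=False),
]

for metric in METRICS:
    print(metric.name, metric.target, metric.unit, metric.period)
```

Stating the unit and the evaluation period next to each target leaves no room for the "per month or per year?" ambiguity discussed earlier.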

The last part of an SLA commonly contains links to additional documents that explain other conditions in detail.

At every stage of preparing an SLA, a company must remember that it is a regulatory document that helps control everything connected with the service. The more control a company has over its processes, the better. If an SLA doesn't give the company some degree of control, there is no reason for it to exist.

Checklist: what to consider when drafting an SLA

If you are not signing someone else's SLA but drafting your own to offer to potential clients, keep these things in mind:

Customers. In large systems, it is advisable to divide users into separate groups and communicate with each group individually. This helps you distribute resources and get the work done efficiently, even at times of peak load.
Services. At this stage, consider which customer groups need which types of services. For example, your company might provide a CRM system to e-commerce businesses. If they can't access it, their business grinds to a halt and they start losing money, and they will hold the provider that failed them responsible. That's why such services get the highest priority and must take precedence over simple tasks like replacing a printer or creating a new account.
Service quality parameters. These should be tied to your company's business goals and to what users want, for example, the hours and conditions under which each service is provided. One company may want to work 24/7, while another offers access to technical support only 5 days a week, from 9 AM to 9 PM.
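
Prioritizing requests by service importance maps naturally onto a priority queue. The service names and ranks below are hypothetical; the idea is simply that a CRM outage is handled before routine tasks such as replacing a printer.

```python
import heapq
from itertools import count

# Hypothetical priority ranks: a lower number is handled first.
SERVICE_PRIORITY = {
    "crm_access": 0,           # business-critical: clients lose money while it is down
    "new_account": 2,          # routine task
    "printer_replacement": 2,  # routine task
}


class RequestQueue:
    """Serves requests by service priority, then in order of arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker for requests with the same rank

    def push(self, service: str, description: str) -> None:
        rank = SERVICE_PRIORITY[service]
        heapq.heappush(self._heap, (rank, next(self._arrival), description))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = RequestQueue()
queue.push("printer_replacement", "Replace the office printer")
queue.push("crm_access", "CRM is unreachable")
print(queue.pop())  # CRM is unreachable
```

The arrival counter keeps requests of equal rank in first-come, first-served order and stops the heap from ever comparing descriptions.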

Any change to the SLA should be explained to every user, regardless of their status or privilege level, before it comes into force.

An SLA is a living document. In practice, you will find that some parameters or goals do not align well with the direction the business is taking. That's why management often decides to revise and optimize the SLA.

Remember: an SLA is not a marketing tool. It is a way for the company to communicate with its users as clearly and efficiently as possible, under rules that both sides accept.