How To Create an SLA
SLAs vs Priorities
Previously, we've established priorities based on impact and urgency.
Priorities help the IT team decide what to work on next, but they are a bit of a blunt tool. At the most superficial level, priorities and SLAs can be viewed as the same thing. E.g. the target is that Priority 1 incidents will be resolved in 90 minutes, and the SLA achievement for the month is that 98% of Priority 1 incidents were resolved in 90 minutes. Indeed, this is probably the way to go if you want to keep it simple.
SLAs add extra dimensions. They set quantitative measurements and performance levels through which expectations can be communicated to both the customer and the service provider. So an SLA can also include, for example, availability metrics and capacity boundaries.
Benefits of SLAs
So, there are a few clear benefits to having service-level agreements in place and published.
Setting clear expectations – Defines the scope, performance, process and responsibilities of each party.
Improving communications – Sets a documented framework up front that all stakeholders can review and help improve, which in turn helps working relationships.
Performance measurement – By explicitly capturing the needed performance levels, it should be apparent whether the targets are being met and, if not, why not. This may support discussions about further resourcing or adjusting the targets to something more realistic.
The 3 Types of SLA
There are three main types of standard SLA models. First, you must be clear on your model and how you wish to implement it.
Service-Based SLA
A single agreement covers a service, and all customers share the same level of service.
Customer-Based SLA
SLAs differ for each customer of that service.
Multi-Layer SLA
A single SLA offers multiple layers as options, for example gold, silver and bronze levels, depending on the cost.
Key Notes
Can you measure it?
Ensure that nothing goes into your SLA that is not easy to measure. For example, it is probably too complicated if it requires lots of manual calculations or different pieces of data to be brought together.
A great example is availability, a measure that people often find tricky to obtain. If you can automate it, that's great. Equally, you could use a simple report that summarises the major incidents in a period, totals the outage periods, and uses that to calculate availability. What you want to avoid is manually going through MI reports or incidents, adding things up and trying to remember whether there was an outage. That is a quick way to expose your lack of maturity to anyone you engage with on the figures, as soon as they recall an issue you'd overlooked. By automating it, you should avoid such problems.
Always be transparent about how you calculate figures.
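As a minimal sketch of the availability calculation described above, the following Python assumes you can export each major incident's outage start and end times from your reports. The function name, the data layout and the example dates are all illustrative assumptions, and the sketch measures against calendar time rather than agreed service hours.

```python
from datetime import datetime, timedelta


def availability_pct(period_start, period_end, outages):
    """Return availability as a percentage of the reporting period.

    `outages` is a list of (start, end) datetime pairs, e.g. exported from
    major incident reports. Outages are clipped to the period, and
    overlapping outages are only counted once.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")

    # Keep only outages that touch the period, clipped to its boundaries.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if end > period_start and start < period_end
    )

    downtime = timedelta(0)
    covered_until = period_start
    for start, end in clipped:
        start = max(start, covered_until)  # skip time already counted
        if end > start:
            downtime += end - start
            covered_until = end

    return 100.0 * (1 - downtime.total_seconds() / total)


# Hypothetical month: two overlapping outages (90 minutes combined) plus
# a separate 30-minute outage, i.e. 120 minutes of downtime in 30 days.
outages = [
    (datetime(2024, 6, 1, 10, 0), datetime(2024, 6, 1, 11, 0)),
    (datetime(2024, 6, 1, 10, 30), datetime(2024, 6, 1, 11, 30)),
    (datetime(2024, 6, 15, 2, 0), datetime(2024, 6, 15, 2, 30)),
]
print(round(availability_pct(datetime(2024, 6, 1), datetime(2024, 7, 1), outages), 2))
# prints 99.72
```

Whatever method you use, publishing the formula alongside the figure (here, one minus downtime divided by total time in the period) is what keeps the number transparent.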
Baseline before promising
If you don't know how you are currently performing against potential metrics / KPIs, then you probably shouldn't publish them to a customer.
Saying you can hit 95% of all priority 3 incidents within the target is great if you know you can currently do it. Otherwise, you are immediately going to over-promise and under-deliver.
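One way to baseline, sketched here in Python, is to take the historical resolution times for a priority and check what percentage would have met the target you are thinking of promising. The function names and the sample figures are made up for illustration; in practice the list would come from an export of your ITSM tool.

```python
def baseline_compliance(resolution_minutes, target_minutes):
    """Percentage of historical tickets resolved within the proposed target."""
    if not resolution_minutes:
        raise ValueError("no historical tickets to baseline against")
    met = sum(1 for minutes in resolution_minutes if minutes <= target_minutes)
    return 100.0 * met / len(resolution_minutes)


def safe_to_promise(resolution_minutes, target_minutes, promised_pct):
    """True only if past performance already meets the promised percentage."""
    return baseline_compliance(resolution_minutes, target_minutes) >= promised_pct


# Hypothetical priority 3 history (minutes to resolve) against an 8-hour target.
history = [120, 300, 450, 200, 480, 600, 90, 240, 510, 100]
print(baseline_compliance(history, 480))   # prints 80.0
print(safe_to_promise(history, 480, 95))   # prints False
```

In this made-up history only 80% of tickets met the target, so publishing 95% would be an over-promise from day one.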
Simplifying the SLA Model
Let's pause here, just for a moment.
I said SLAs are not typically just priorities; there can be much more to them. This is very true, for example, of an agreement for an outsourced service or a bespoke contractual agreement. The SLAs in these circumstances may include not only response and resolution times but measures of availability, security responses, disaster recovery, etc.
So there will almost certainly be multiple dimensions to it.
However, by adopting a simplified approach to SLAs where there is an opportunity, organizations can better focus on critical aspects of service delivery without being overwhelmed by complex agreements.
This two-part approach consists of the following:
Response and Resolution Times SLA: For service requests within the ITSM system, this part of the SLA focuses on response and resolution times. These measurable targets are tied directly to service requests, enabling easy performance tracking and ensuring support teams remain accountable for timely responses and resolutions.
Overarching SLA Agreement: A separate, overarching SLA agreement can be created to cover broader aspects of service delivery, such as help desk opening hours, service availability details, escalation paths, and more. By consolidating these details into a single document, organizations can maintain a centralized reference point for these key elements without complicating the SLAs tied to individual service requests.
This adapted approach has several benefits for small to medium-sized organizations:
Simplification: By separating response and resolution times from the overarching agreement, organizations can maintain a simplified structure that is easier to manage and understand.
Focus on key metrics: This approach allows organizations to prioritize the most critical aspects of service delivery, such as response and resolution times, without being distracted by additional metrics that may not be as relevant to their specific needs.
Flexibility: Because the broader details live in the overarching SLA agreement, organizations can adapt their service delivery expectations and requirements without revising individual response and resolution time SLAs.
Streamlined management: With this two-part approach, organizations can manage their SLAs more effectively, making tracking performance and enforcing accountability easier.
Stick with me here, and I'll try to explain in a couple of different ways. Firstly, here's a diagram showing how we could implement workflows around response & resolution times, which might vary, while the general agreement contains the other details in the SLA.
So, to reflect on the above model for a moment in a visual way, it might look like this:
Part One: Response & Resolution Times
The following is a screenshot from ManageEngine's SLA configuration tool. Note that here we can configure the response and fulfilment times, escalation times, and actions.
Part Two: Overarching SLA Agreement
Here, we have the more comprehensive service-level agreement document that contains everything else, such as availability metrics and hours of operation.
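If it helps to think of the two parts as data, a rough Python sketch might look like the following. Every name and value here is an assumption for illustration only, apart from the 90-minute Priority 1 resolution target used as an example earlier; your ITSM tool will have its own way of storing this.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TimeTargets:
    """Part one: the targets the ITSM tool tracks per ticket."""
    response: timedelta
    resolution: timedelta


@dataclass(frozen=True)
class OverarchingAgreement:
    """Part two: everything else, kept in a single document."""
    help_desk_hours: str
    availability_target_pct: float
    escalation_path: tuple


# Part one: illustrative response/resolution targets keyed by priority.
RESPONSE_RESOLUTION_TARGETS = {
    1: TimeTargets(response=timedelta(minutes=15), resolution=timedelta(minutes=90)),
    2: TimeTargets(response=timedelta(hours=1), resolution=timedelta(hours=4)),
    3: TimeTargets(response=timedelta(hours=4), resolution=timedelta(hours=8)),
}

# Part two: illustrative overarching details, changed independently of part one.
AGREEMENT = OverarchingAgreement(
    help_desk_hours="Mon-Fri 08:00-18:00",
    availability_target_pct=99.5,
    escalation_path=("Service Desk Lead", "IT Service Manager"),
)

print(RESPONSE_RESOLUTION_TARGETS[1].resolution)  # prints 1:30:00
```

The point of the split is visible in the structure: the help desk hours or availability target can change without touching the per-priority targets, and vice versa.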
The more complex the SLAs you create, the greater the downstream effort, including reporting, communication, resourcing to meet targets, etc.
Having a myriad of SLAs for different teams and scenarios can become an absolute managerial nightmare.
If you can't automate that scenario, then don't do it. You can end up in a situation where you are missing service levels, and you won't know it, but guess what? Your customer will.
Configuring SLAs
Configuring the SLA response/resolution times in the ITSM tool
Configuring your SLAs in your ITSM system is something I cannot help with in great detail, as there are too many products out there. Let's assume it builds on everything covered up to this point, followed by the steps below:
Define the escalation paths
You'll need to define the thresholds for triggering automated alerts or notifications when SLA targets are at risk of being breached. This would include defining both the thresholds for alerts and whom to alert.
What you configure will depend significantly on how you wish to implement things, but let's suggest that for lower-priority issues, you set the threshold to 80% of the response and resolution times (e.g. on a 4-hour SLA, 4 hrs * 0.8 = 3.2 hrs). For more urgent or critical SLAs, you might set it to 50% (e.g. on a 1-hour response, 1 hr * 0.5 = 0.5 hrs, or 30 mins).
These are just suggestions; it's entirely down to how you configure your SLAs and escalation points.
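The threshold arithmetic above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch uses plain elapsed time; a real ITSM tool would normally also account for business hours and paused clocks.

```python
from datetime import datetime, timedelta


def escalation_time(opened_at, sla_duration, threshold):
    """Return when the at-risk alert should fire.

    `threshold` is the fraction of the SLA that may elapse before alerting,
    e.g. 0.8 for lower priorities or 0.5 for critical ones.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be greater than 0 and at most 1")
    return opened_at + sla_duration * threshold


opened = datetime(2024, 6, 3, 9, 0)

# 4-hour SLA at 80%: alert after 3.2 hours (3 hours 12 minutes).
print(escalation_time(opened, timedelta(hours=4), 0.8))  # prints 2024-06-03 12:12:00

# 1-hour response at 50%: alert after 30 minutes.
print(escalation_time(opened, timedelta(hours=1), 0.5))  # prints 2024-06-03 09:30:00
```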
Build real-time dashboard monitoring
Having dashboards that allow everyone to view where things are in real time is invaluable. Here are a few ideas on what widgets/metrics you could add to your dashboard to track response & resolution times.
Open Tickets by SLA Status: Include a chart or list that categorises open tickets by their SLA status (e.g., within target, at risk, or breached). This can help you prioritise resources and take necessary actions to ensure SLA compliance. You may wish to refine this by priority level (e.g., Priority 1, Priority 2, etc.).
At-Risk Tickets: Display a countdown or progress bar for tickets at risk of breaching their SLA targets. This can help you quickly identify tickets that require immediate attention.
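Most ITSM tools provide these widgets out of the box, but the logic behind them is simple enough to sketch in Python. The status labels come from the list above; the function names, the ticket layout and the 80% at-risk threshold are assumptions for illustration, and elapsed time is again plain clock time.

```python
from collections import Counter
from datetime import datetime, timedelta


def sla_status(opened_at, sla_duration, now, at_risk_threshold=0.8):
    """Classify a ticket as 'within target', 'at risk' or 'breached'."""
    elapsed = now - opened_at
    if elapsed > sla_duration:
        return "breached"
    if elapsed >= sla_duration * at_risk_threshold:
        return "at risk"
    return "within target"


def time_remaining(opened_at, sla_duration, now):
    """Time left before breach (negative once breached), for a countdown."""
    return opened_at + sla_duration - now


def open_tickets_by_status(tickets, now):
    """Count open tickets per SLA status for a dashboard chart."""
    return Counter(sla_status(t["opened_at"], t["sla"], now) for t in tickets)


now = datetime(2024, 6, 3, 13, 0)
tickets = [
    {"opened_at": datetime(2024, 6, 3, 12, 0), "sla": timedelta(hours=4)},
    {"opened_at": datetime(2024, 6, 3, 9, 30), "sla": timedelta(hours=4)},
    {"opened_at": datetime(2024, 6, 3, 8, 0), "sla": timedelta(hours=4)},
]
print(dict(open_tickets_by_status(tickets, now)))
# prints {'within target': 1, 'at risk': 1, 'breached': 1}
```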
Create reports based on KPIs
While the above is used for real-time tracking, you should now configure reporting on historical performance. As previously mentioned, less is more, so pick just one or two that resonate with you. Here are some suggestions for consideration:
SLA Compliance Rate Over Time: Generate a report that shows the percentage of requests and incidents that met their SLA targets for response and resolution times over a specific period (e.g., monthly, quarterly, or yearly).
SLA Breach Rate Over Time: Create a report that displays the percentage of requests and incidents that breached their SLA targets for response and resolution times, allowing you to analyse trends and identify potential issues.
Resolution Time Trends: Generate a report that illustrates the trends in average resolution times for requests and incidents. This can help you gauge the overall efficiency of your help desk over time.
First Response Time Trends: Create a report that shows the trends in average first response times for requests and incidents. This can provide insights into your team's responsiveness and customer satisfaction levels.
First Contact Resolution (FCR) Rate Over Time: Generate a report that displays the FCR rate for a specific period, allowing you to track improvements in resolving issues on the first interaction with customers.
Ticket Volume Trends: Create a report that shows the number of requests and incidents over time, which can help you identify patterns in workload and allocate resources effectively.
SLA Performance by Category: Generate a report that breaks down SLA compliance and breach rates by ticket category, allowing you to pinpoint specific areas that may require process improvements or additional resources.
SLA Performance by Priority: Create a report that displays SLA compliance and breach rates based on ticket priority. This can help you assess your help desk's effectiveness in handling high-priority requests and incidents.
Escalations Over Time: Generate a report that shows the number of escalated tickets over time and the average time taken for escalation. This can provide insights into your team's ability to identify and address critical issues.
SLA Performance by Agent: Create a report that displays individual agent performance regarding SLA compliance, breach rates, response times, and resolution times. This can help you identify top performers, areas for staff training, and process improvement opportunities.
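As an illustration of the first two reports, here is a minimal Python sketch that groups closed tickets by month and calculates the compliance rate; the breach rate is simply 100 minus that figure. The function name and the input layout (a resolved date plus a met/missed flag per ticket) are assumptions about what you can export from your ITSM tool.

```python
from collections import defaultdict
from datetime import datetime


def compliance_by_month(tickets):
    """Return {'YYYY-MM': compliance %} for (resolved_at, met_sla) pairs."""
    counts = defaultdict(lambda: [0, 0])  # month -> [met, total]
    for resolved_at, met_sla in tickets:
        month = resolved_at.strftime("%Y-%m")
        counts[month][0] += int(met_sla)
        counts[month][1] += 1
    return {
        month: round(100.0 * met / total, 1)
        for month, (met, total) in sorted(counts.items())
    }


# Hypothetical export: (resolved date, whether the SLA target was met).
tickets = [
    (datetime(2024, 5, 2), True),
    (datetime(2024, 5, 9), True),
    (datetime(2024, 5, 17), False),
    (datetime(2024, 5, 28), True),
    (datetime(2024, 6, 4), True),
    (datetime(2024, 6, 11), False),
]
print(compliance_by_month(tickets))
# prints {'2024-05': 75.0, '2024-06': 50.0}
```

Swapping the month key for a category, priority or agent field would give the corresponding breakdown reports from the same data.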