The following summarises the key components of a major incident process under ITIL.
The following video gives an overview of the process and its key concepts.
Major Incident Management Process Overview
The following steps summarise the major incident process. They are also available in the downloadable file above, which can be tailored to your own purposes.
1) Investigation
Objective
Swiftly identify the likely cause of the incident and explore initial mitigation options.
Procedure
The designated receiving team has one hour to carry out the primary investigation.
In many circumstances, it is more efficient to promptly restart a particular component or service than to carry out in-depth diagnostics.
Should the initial investigation require external expertise or additional support, the Major Incident Manager (MI Mgr) may be consulted.
2) Contact the Major Incident Manager
Objective
Ensure coordinated and effective incident handling.
Procedure
If the service disruption persists beyond one hour without a resolution, the incident owner must engage the MI Mgr.
The MI Mgr assumes responsibility for overseeing the recovery process and facilitating communications, even if a resolution is expected imminently.
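The one-hour escalation rule in steps 1 and 2 can be expressed as a simple check. The following is a minimal Python sketch rather than anything prescribed by ITIL: the function name `must_engage_mi_manager` and its parameters are hypothetical, and the constant simply mirrors the one-hour window stated above.

```python
from datetime import datetime, timedelta

# Mirrors the one-hour primary investigation window described above.
INITIAL_INVESTIGATION_WINDOW = timedelta(hours=1)


def must_engage_mi_manager(disruption_start: datetime, now: datetime, resolved: bool) -> bool:
    """Return True when the incident owner must engage the MI Mgr.

    Hypothetical helper: escalation is required once an unresolved
    disruption has lasted longer than the investigation window.
    """
    if resolved:
        return False
    return (now - disruption_start) > INITIAL_INVESTIGATION_WINDOW


start = datetime(2024, 1, 1, 9, 0)
print(must_engage_mi_manager(start, start + timedelta(minutes=30), resolved=False))  # False
print(must_engage_mi_manager(start, start + timedelta(minutes=75), resolved=False))  # True
```

A check like this could sit behind a ticketing-system reminder, so that the escalation does not depend on someone watching the clock.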
3) Assess Criteria for Major Incident
Objective
Determine the gravity of the situation and decide on the course of action.
Procedure
The MI Mgr evaluates whether the ongoing situation qualifies as a major incident based on predefined criteria.
This evaluation ensures that the MI process isn't initiated unnecessarily, preventing resource wastage.
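Predefined criteria are easiest to apply consistently when they are written down as explicit rules. The sketch below shows one possible shape in Python. The criteria themselves are not given in this summary, so the `IncidentSnapshot` fields and the threshold of 100 users are purely illustrative assumptions that should be replaced with the values from your own major incident policy.

```python
from dataclasses import dataclass

# Illustrative threshold only; real criteria come from your own MI policy.
USERS_AFFECTED_THRESHOLD = 100


@dataclass
class IncidentSnapshot:
    """Hypothetical summary of what is known when the MI Mgr assesses the incident."""
    users_affected: int
    critical_service_down: bool


def qualifies_as_major_incident(snapshot: IncidentSnapshot) -> bool:
    """Apply the assumed criteria: meeting any one of them is enough."""
    return (
        snapshot.critical_service_down
        or snapshot.users_affected >= USERS_AFFECTED_THRESHOLD
    )


print(qualifies_as_major_incident(IncidentSnapshot(users_affected=12, critical_service_down=False)))  # False
print(qualifies_as_major_incident(IncidentSnapshot(users_affected=12, critical_service_down=True)))   # True
```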
4) Investigate & Escalate
Objective
Perform an in-depth analysis and involve higher tiers if needed.
Procedure
The investigating team is given a predetermined window to delve deeper into the incident.
The primary focus remains on service recovery, with root cause analysis being a subsequent priority.
All significant findings and updates are meticulously recorded.
5) Manage Recovery & Comms
Objective
Restore normalcy and keep stakeholders informed.
Procedure
The MI Mgr holds the reins, supervising all efforts aimed at service restoration.
While the MI Mgr might seek external assistance, they remain the central figure guiding the overall recovery process.
6) Investigation Review Meetings
Objective
Facilitate effective team communication during the crisis.
Procedure
If the situation demands, the MI Mgr assembles the concerned teams for urgent review meetings.
These meetings are focused on framing the problem, prioritising actions, and assigning ownership to ensure swift resolution.
7) Update Stakeholders
Objective
Keep major stakeholders in the loop.
Procedure
The MI Mgr leads the communication efforts, updating stakeholders about the ongoing progress.
Updates are structured and provided in a consistent, standard format.
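One way to keep updates in a consistent, standard format is to generate them from a fixed template. The following Python sketch illustrates the idea; the `StakeholderUpdate` class and its field names are assumptions made for this example, not a format defined by ITIL or by this process.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StakeholderUpdate:
    """Hypothetical standard update; the set of fields is an assumption."""
    incident_ref: str
    issued_at: datetime
    current_status: str
    business_impact: str
    actions_in_progress: str
    next_update_at: datetime

    def render(self) -> str:
        """Render every update with the same headings in the same order."""
        fmt = "%Y-%m-%d %H:%M"
        return "\n".join([
            f"Major Incident Update: {self.incident_ref}",
            f"Issued: {self.issued_at.strftime(fmt)}",
            f"Status: {self.current_status}",
            f"Impact: {self.business_impact}",
            f"Actions in progress: {self.actions_in_progress}",
            f"Next update: {self.next_update_at.strftime(fmt)}",
        ])


update = StakeholderUpdate(
    incident_ref="MI-0001",  # made-up reference for illustration
    issued_at=datetime(2024, 1, 1, 10, 30),
    current_status="Investigating",
    business_impact="Email unavailable for all users",
    actions_in_progress="Mail gateway restart under way",
    next_update_at=datetime(2024, 1, 1, 11, 0),
)
print(update.render())
```

Including the time of the next update in every message lets stakeholders know when to expect news.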
8) Communicate Resolution
Objective
Inform stakeholders once the service is restored.
Procedure
Once service is restored, the MI Mgr informs all concerned parties, enlisting support from the Help Desk where needed.
9) Produce an MI Report
Objective
Document the incident and its handling for future reference.
Procedure
The MI Mgr drafts a comprehensive report capturing the incident's impact, significant events, follow-up actions, and, if identified, the root cause.
This report is shared within 24 hours of incident closure. If the root cause remains elusive, the problem management process is triggered.
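The two rules in this step (the 24-hour sharing deadline and the hand-off to problem management) can be sketched as small helpers. This is a minimal Python illustration; the function names are hypothetical, and treating an empty or missing root cause as "not found" is an assumption of the example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Mirrors the 24-hour sharing window stated above.
REPORT_SHARING_WINDOW = timedelta(hours=24)


def report_due_by(incident_closed_at: datetime) -> datetime:
    """Latest time by which the MI report should be shared."""
    return incident_closed_at + REPORT_SHARING_WINDOW


def needs_problem_management(root_cause: Optional[str]) -> bool:
    """True when no root cause was recorded, so problem management is triggered."""
    return root_cause is None or not root_cause.strip()


closed = datetime(2024, 1, 1, 14, 0)
print(report_due_by(closed))                             # 2024-01-02 14:00:00
print(needs_problem_management(None))                    # True
print(needs_problem_management("Expired certificate"))   # False
```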
10) Close Incident
Objective
Close the incident record and complete the process.
Procedure
The MI record is formally closed, with the location of the MI report and any follow-up activities recorded in it.
Major Incident Roles & Responsibilities
Role | Responsibilities |
Help Desk Staff | • Identify and log incidents as they are reported by users. • Capture information that will help in the analysis of the issue. • Provide updates to customers where requested. • Escalate incidents to the appropriate technical teams or the Major Incident Manager as needed. |
Investigating Technical Teams | • Collaborate with other technical teams or 3rd party suppliers as necessary to resolve incidents. • Implement fixes, workarounds, or recovery actions to restore services. • Update the incident management system with the incident resolution progress and status. • Provide input to the Major Incident Manager on the incident status, impact, and expected resolution time. • Participate in post-incident reviews to identify areas for improvement and implement corrective actions. |
Major Incident Manager | • Coordinate the overall major incident response and resolution process. • Engage and mobilize necessary resources, including technical teams and 3rd party suppliers. • Establish and maintain communication channels with stakeholders, including senior management and affected users. • Ensure timely and accurate updates are provided to stakeholders. • Monitor and track the progress of major incident resolution. • Facilitate post-incident reviews to identify areas for improvement and implement corrective actions. • Create the Major Incident Report. |