On-call schedules
Use on-call schedules in Grafana IRM to automate rotation management and alert escalation. With on-call schedules, you can:
- Ensure consistent coverage for alerts and incidents
- Distribute on-call responsibilities across your team
- Define custom rotation patterns that match your team’s needs
- Manage schedule changes and temporary coverage adjustments
- Monitor and improve schedule quality
Key concepts
Before creating schedules in Grafana IRM, understand the following key components:
- Rotations: Recurring patterns that determine when team members are on-call, such as weekly or monthly cycles
- Shifts: Specific time periods when a team member is responsible for responding to incidents and alerts
- Schedule layers: Multiple concurrent rotation patterns that define different response priorities (a rotation in a higher layer overrides a rotation in a lower layer)
- Time zones: Settings that ensure accurate schedule display and management for distributed teams across regions
- Overrides: Temporary schedule modifications to accommodate time off, shift swaps, or special coverage needs
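As a rough illustration of how these components fit together, the following Python sketch resolves who is on-call at a given moment. It is a simplified model, not Grafana IRM's implementation: the `Rotation`, `Override`, and `resolve` names are hypothetical, layers are assumed to be listed from highest to lowest priority, and all times are assumed to be timezone-aware UTC values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Rotation:
    """Hypothetical model: users take turns, each for one shift_length."""
    users: list[str]
    start: datetime
    shift_length: timedelta

    def on_call_at(self, when: datetime) -> Optional[str]:
        if when < self.start or not self.users:
            return None
        # Count the whole shifts elapsed since the rotation started.
        index = (when - self.start) // self.shift_length
        return self.users[index % len(self.users)]


@dataclass
class Override:
    """Hypothetical model: a temporary replacement covering [start, end)."""
    user: str
    start: datetime
    end: datetime


def resolve(layers: list[Rotation], overrides: list[Override],
            when: datetime) -> Optional[str]:
    # An active override takes precedence over every rotation layer.
    for override in overrides:
        if override.start <= when < override.end:
            return override.user
    # Assumption: layers are ordered from highest to lowest priority.
    for rotation in layers:
        user = rotation.on_call_at(when)
        if user is not None:
            return user
    return None


utc = timezone.utc
weekly = Rotation(["alice", "bob"], datetime(2024, 1, 1, tzinfo=utc), timedelta(weeks=1))
swap = Override("carol", datetime(2024, 1, 9, tzinfo=utc), datetime(2024, 1, 10, tzinfo=utc))

print(resolve([weekly], [swap], datetime(2024, 1, 3, tzinfo=utc)))      # alice
print(resolve([weekly], [swap], datetime(2024, 1, 9, 12, tzinfo=utc)))  # carol
print(resolve([weekly], [swap], datetime(2024, 1, 11, tzinfo=utc)))     # bob
```

Keeping all timestamps in UTC and converting only for display is one common way to avoid ambiguity for distributed teams.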
Schedule types
Grafana IRM offers three flexible ways to manage on-call schedules:
IRM app managed schedules
Create and manage schedules directly through the Grafana IRM web UI:
- Design custom rotation patterns that match your team’s needs
- Configure multiple rotation layers
- Preview and validate schedule coverage in real time
- Handle overrides and time zone adjustments
To learn more, refer to Create on-call schedules.
Calendar import (iCal)
Import schedules from calendar applications like Google Calendar:
- Manage rotations using familiar calendar tools
- Import schedules via iCal URLs
- Automatically sync schedule changes
- Set up override calendars for temporary coverage adjustments
- Support multiple assignees and priority levels
- View and monitor schedules through the IRM schedule interface
To learn more, refer to Import schedules.
Infrastructure as code
Manage schedules programmatically through Terraform and version control:
- Define schedules as code
- Track schedule changes in version control
- Automate schedule creation and updates
- Integrate with CI/CD pipelines
- Ensure consistent schedule configuration
- Scale schedule management across teams
To learn more, refer to Schedules as code.
Schedule quality
Monitor and improve the effectiveness of your schedules.
Quality metrics
- Coverage gaps
- Distribution balance
- User workload
- Schedule predictability
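To make the first of these metrics concrete, here is a minimal sketch of how coverage gaps could be detected from a list of shift intervals. The `coverage_gaps` function and its interval representation are assumptions made for illustration. Grafana IRM calculates its metrics internally, and this does not reproduce that calculation.

```python
from datetime import datetime

Interval = tuple[datetime, datetime]  # (start, end), end exclusive


def coverage_gaps(shifts: list[Interval], window: Interval) -> list[Interval]:
    """Return the parts of `window` that no shift covers."""
    window_start, window_end = window
    gaps = []
    cursor = window_start  # everything before `cursor` is covered or already reported
    for start, end in sorted(shifts):
        if end <= cursor:
            continue  # this shift ends before the region still being examined
        if start >= window_end:
            break  # this and all later shifts start after the window
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


day = (datetime(2024, 1, 1, 0), datetime(2024, 1, 2, 0))
shifts = [
    (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 8)),
    (datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 18)),
]
for gap_start, gap_end in coverage_gaps(shifts, day):
    print(gap_start.isoformat(), "->", gap_end.isoformat())
```

With the sample data, the uncovered periods are 08:00 to 10:00 and 18:00 to midnight.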
Quality score
The schedule quality score (0-100) helps you assess and improve your schedules:
Score | Rating | Description |
---|---|---|
81-100 | Great | Well-balanced, consistent coverage |
61-80 | Good | Minor improvements possible |
41-60 | Medium | Some gaps or imbalances |
21-40 | Low | Significant improvements needed |
0-20 | Poor | Major coverage issues |
Note
If the schedule includes deleted users or users who lack the permissions required to be on-call, the quality score is also affected.
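If you process scores in your own tooling, the bands in the table can be expressed as a small lookup. This sketch only mirrors the published ranges. The `quality_rating` function is a hypothetical helper, and it assumes the score is an integer.

```python
def quality_rating(score: int) -> str:
    """Map a 0-100 schedule quality score to the rating bands in the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score > 80:
        return "Great"
    if score > 60:
        return "Good"
    if score > 40:
        return "Medium"
    if score > 20:
        return "Low"
    return "Poor"


print(quality_rating(85))  # Great
print(quality_rating(60))  # Medium
print(quality_rating(20))  # Poor
```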
Combine schedules and escalation levels
To organize different levels of on-call responsibility, define an escalation chain with several “Notify schedule” steps separated by “Wait” steps:
Primary level
- First line of defense for incident response
- Handles initial incident assessment and resolution
- Typically staffed by team members most familiar with the system
Secondary level
- Provides backup support when primary cannot respond
- Activated if primary doesn’t acknowledge within the set timeframe
- Often includes more experienced team members or specialists
Tertiary level
- Final escalation point for critical incidents
- Ensures coverage when primary and secondary are unavailable
- May include senior team members or management
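The three levels above can be sketched as a chain of alternating “Notify schedule” and “Wait” steps. The following Python model is illustrative only: the `NotifySchedule` and `Wait` classes, the `notifications` function, and the wait durations are assumptions, not the Grafana IRM API or its defaults.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class NotifySchedule:
    schedule: str  # hypothetical schedule name


@dataclass
class Wait:
    minutes: int


Step = Union[NotifySchedule, Wait]


def notifications(chain: list[Step],
                  acked_after: Optional[int]) -> list[tuple[int, str]]:
    """Return (minute, schedule) pairs notified before acknowledgement.

    `acked_after` is the minute at which someone acknowledges the alert,
    or None if nobody ever does.
    """
    elapsed = 0
    sent = []
    for step in chain:
        if acked_after is not None and elapsed >= acked_after:
            break  # acknowledged, so escalation stops
        if isinstance(step, Wait):
            elapsed += step.minutes
        else:
            sent.append((elapsed, step.schedule))
    return sent


chain = [
    NotifySchedule("primary"),
    Wait(15),
    NotifySchedule("secondary"),
    Wait(30),
    NotifySchedule("tertiary"),
]

print(notifications(chain, acked_after=10))    # [(0, 'primary')]
print(notifications(chain, acked_after=20))    # [(0, 'primary'), (15, 'secondary')]
print(notifications(chain, acked_after=None))  # all three levels are notified
```

In this model, the length of each “Wait” step determines how long a level has to acknowledge before the next schedule is notified.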
Benefits of layered schedules
- Distributes on-call workload across teams
- Provides clear escalation paths
- Ensures continuous coverage for critical systems
- Allows for specialized response teams