Configure default alerts
Default alerts are data source-managed alert rules that integrate with Grafana Alerting to trigger alerts based on the pass/fail result of a check run. They use low/medium/high sensitivity thresholds that are shared across all checks. If your check needs a specific threshold, use per-check alerts instead.
Synthetic Monitoring includes three pre-configured alert rules:
- `HighSensitivity`: Fires an alert if more than 5% of a check's probes fail for 5 minutes.
- `MedSensitivity`: Fires an alert if more than 10% of a check's probes fail for 5 minutes.
- `LowSensitivity`: Fires an alert if more than 25% of a check's probes fail for 5 minutes.
Create the pre-configured sensitivity alert rules
To create the default sensitivity alert rules:
- Navigate to Testing & synthetics > Synthetics > Alerts.
- If you have not already set up the default alert rules, click the Populate default alerts button.
- The `HighSensitivity`, `MedSensitivity`, and `LowSensitivity` rules are automatically generated. These rules query the probe success percentage and check options to decide whether to fire alerts.
You only have to do this once for your Grafana stack.
Enable pre-configured sensitivity alerts on a check
To configure a check to trigger an alert:
- Navigate to Testing & synthetics > Synthetics > Checks.
- Click New Check to create a new check, or edit an existing check in the list.
- Click the Alerting section to show the alerting fields.
- Select a sensitivity level to associate with the check and click Save.
This sensitivity value is published to the `alert_sensitivity` label on the `sm_check_info` metric each time the check runs on a probe. The default alert rules use that label value to determine which checks to evaluate: a check with a sensitivity level enabled triggers the corresponding alert when its success percentage drops below that level's threshold. In other words, the alert sensitivity option sets a metric label value, and that label decides whether the sensitivity alerts apply to the check.
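Because the sensitivity is just a label value, you can query it directly. The query below is an illustrative sketch, not something the product requires you to run; it lists the checks that currently report the high sensitivity level:

```promql
# sm_check_info is an info-style metric (its value is 1); the alert_sensitivity
# label carries the level selected in the check's Alerting section.
# List the checks that currently have the high sensitivity level enabled:
sm_check_info{alert_sensitivity="high"}
```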
How default alert rules work
The default alert rules are built with one recording rule and three alert rules, one for each sensitivity level: `high`, `medium`, and `low`.
The recording rule (`instance_job_severity:probe_success:mean5m`) queries the Prometheus check metrics, evaluating the success rate of the check and the `alert_sensitivity` label. If `alert_sensitivity` is defined, the recording rule saves the results as new precomputed metrics:

- `instance_job_severity:probe_success:mean5m{alert_sensitivity="high"}`
- `instance_job_severity:probe_success:mean5m{alert_sensitivity="medium"}`
- `instance_job_severity:probe_success:mean5m{alert_sensitivity="low"}`
Then, each default alert rule queries its corresponding metric and evaluates it against its threshold to decide whether to fire the alert or not.
For example, if a check has `alert_sensitivity=high`, its success rate is evaluated and compared against the rule's threshold (which defaults to 95%). If the success rate drops below the threshold, the alert rule enters a pending state. If the success rate remains below the threshold for the duration of the pending period (defaults to 5m), the rule starts firing. For further details, refer to Alert Rule Evaluation.

You can edit the threshold values and the duration of the pending period, but you can't edit the predefined `alert_sensitivity` values.
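As an illustration, the high-sensitivity alert rule could be expressed along these lines. The rule name, group name, and annotation below are placeholders, not the identifiers Synthetic Monitoring actually uses:

```yaml
# Sketch of the high-sensitivity alert: fire when the precomputed success
# percentage stays below the 95% threshold for the full pending period.
groups:
  - name: synthetic_monitoring_alert_sketch
    rules:
      - alert: SyntheticMonitoringHighSensitivityCheckFailing  # placeholder name
        expr: instance_job_severity:probe_success:mean5m{alert_sensitivity="high"} < 95
        for: 5m  # pending period before the rule starts firing
        annotations:
          summary: "Check {{ $labels.job }} success rate is below 95% (high sensitivity)"
```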
Avoid alert flapping
When you enable alerting for a check, run that check from multiple probe locations, preferably three or more. That way, a problem with a single probe or with network connectivity from one location won't needlessly trigger an alert, because the other locations running the same check continue to report their results.