Add a metric for the timestamp of the latest successful backup/a way to check the failure or success of backups #1100

Open
@glowing-axolotl

Is your feature request related to a problem? Please describe.
We want to monitor that all backups are created correctly and that they are recent enough. If no successful backup has been created in the last n days (for example, due to a problem in a cron job), an alert should fire in PMM.

Describe the solution you'd like
A possible solution would be for this exporter to expose a metric along the lines of backup_timestamp: a Gauge whose value is the Unix timestamp of the latest successful backup.
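To make the request concrete, here is a minimal sketch of the logic such a metric would need. It is not the exporter's code: the `BackupRecord` shape, the `"done"` status value, and both function names are assumptions made for illustration. The output follows the Prometheus text exposition format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BackupRecord:
    # Hypothetical record shape; these field names are assumptions for this sketch.
    name: str
    status: str         # assumed: "done" marks a successful backup
    finished_unix: int  # completion time, seconds since the Unix epoch


def last_successful_timestamp(backups: Iterable[BackupRecord]) -> Optional[int]:
    """Return the newest completion time among successful backups, or None."""
    done = [b.finished_unix for b in backups if b.status == "done"]
    return max(done) if done else None


def render_gauge(name: str, help_text: str, value: Optional[int]) -> str:
    """Render one gauge in the Prometheus text exposition format.

    If there is no successful backup, only the HELP/TYPE lines are emitted,
    so the series itself is absent.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    if value is not None:
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [
        BackupRecord("monday", "done", 1700000000),
        BackupRecord("tuesday", "error", 1700086400),  # failed run is ignored
    ]
    print(render_gauge(
        "backup_timestamp",
        "Unix timestamp of the latest successful backup.",
        last_successful_timestamp(records),
    ))
```

Failed runs never move the value forward, so a broken cron job or a failing backup both show up as a timestamp that stops advancing.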

Describe alternatives you've considered
You could just use a Grafana alert on the already existing PBM dashboard, but this has some issues:

  • Grafana's embedded SQLite doesn't scale well with multiple alerts configured (especially as the number of metric types increases), so it's really only a temporary workaround
  • Grafana alerts don't show up in Prometheus's ALERTS synthetic time series, so querying it won't return the alarm, and therefore you won't get the related history in PMM
  • You cannot query this from an outside system to take action, since Grafana doesn't expose its alerts (AFAIK) in a readable way (and IMHO it shouldn't)
  • The Grafana dashboard actually runs Grafana-side transformations on the metric name. No metric value exposes the timestamp, which means additional processing is required on the Grafana server, and there is no way to parse metric names as strings in Prometheus.

Additional context
Here is how other projects implemented this check:

  • Velero/OADP: See velero_backup_last_status, velero_backup_last_successful_timestamp, and velero_backup_partial_failure_total. A possible query in this case is increase(velero_backup_failure_total{schedule="schedulename"}[24h]) > 0 to check for failures in the last 24 hours, as per the OKD documentation.
  • borg_exporter: borg_last_backup_timestamp, where in theory you could alert with time() - borg_last_backup_timestamp > 86400 (the same pattern works with velero_backup_last_successful_timestamp).
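The `time() - <timestamp metric> > 86400` comparison above is the whole alert condition. The sketch below restates it in Python to make the edge cases explicit. The function name and the handling of a missing timestamp are assumptions of this sketch, not behaviour of any of the exporters mentioned.

```python
import time
from typing import Optional

MAX_AGE_SECONDS = 86400  # 24 hours, the threshold used in the expression above


def backup_is_stale(
    last_success_unix: Optional[float],
    now: Optional[float] = None,
    max_age: float = MAX_AGE_SECONDS,
) -> bool:
    """Mirror the check `time() - <timestamp metric> > max_age`.

    A missing timestamp is treated as stale here. That is a choice of this
    sketch: in PromQL an absent series produces no result for the comparison,
    so it would need a separate absent() style rule.
    """
    if last_success_unix is None:
        return True
    if now is None:
        now = time.time()
    return now - last_success_unix > max_age


if __name__ == "__main__":
    # A backup that finished 25 hours ago should trigger the alert.
    print(backup_is_stale(time.time() - 25 * 3600))
```

Because the comparison is strict, a backup exactly 86400 seconds old does not fire yet.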
