add per-tenant alertmanager metrics #2124
@pracucci I pared down the number of user metrics considerably. The ones that remain either convey essential basic information about the number of alerts/silences or help users avoid silent failures.
I also made sure to remove the unused functions I added to |
I have left some comments around metrics code.
LGTM with nit. I haven't taken a close look at the tests as I think Marco and Peter took a close look there.
@jtlisi Could you also resolve the conversations that have already been implemented?
@jtlisi Thanks for addressing my feedback. I'm a bit concerned about the "unpause" logic and wondering if there's any issue there. Please take a look at comments.
LGTM with minor nits.
(Btw, flaky test is now fixed on master)
Approving because my previous concern about the "unpause" logic turned out not to be an issue. However, I left a few comments that I would be glad if you could address before merging. Thanks @jtlisi!
pkg/alertmanager/alertmanager.go
Outdated
	return nil
}

func (am *Alertmanager) isActive() bool {
Why isn't this exposed, while Pause() is? I think it should mirror Pause() and be exposed too.
Makes sense
pkg/alertmanager/multitenant.go
Outdated
userAM.Stop()
delete(am.alertmanagers, user)
// The user alertmanager is only paused in order to retain the prometheus metrics
// it has reported to it's registry. If a new config for this user appears, this structure
it's > its
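The pause-instead-of-delete idea discussed in this thread can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from the PR: the types `tenantAM` and `multiAM`, the `syncConfigs` function, and the plain map standing in for a Prometheus registry are all hypothetical names made up for the example. It only assumes what the code comment above states, namely that a tenant whose config disappears is paused rather than removed so its reported metrics are retained, and that it is reactivated if a config for that tenant shows up again.

```go
package main

import (
	"fmt"
	"sync"
)

// tenantAM is a hypothetical stand-in for a per-tenant Alertmanager.
// The metrics map stands in for the tenant's metrics registry.
type tenantAM struct {
	mtx     sync.Mutex
	active  bool
	metrics map[string]float64
}

func newTenantAM() *tenantAM {
	return &tenantAM{active: true, metrics: map[string]float64{}}
}

// Pause marks the instance inactive but keeps its metrics.
func (am *tenantAM) Pause() {
	am.mtx.Lock()
	defer am.mtx.Unlock()
	am.active = false
}

// Unpause mirrors Pause and marks the instance active again.
func (am *tenantAM) Unpause() {
	am.mtx.Lock()
	defer am.mtx.Unlock()
	am.active = true
}

// IsActive reports whether the instance is currently active.
func (am *tenantAM) IsActive() bool {
	am.mtx.Lock()
	defer am.mtx.Unlock()
	return am.active
}

// multiAM is a hypothetical stand-in for the multi-tenant wrapper.
type multiAM struct {
	alertmanagers map[string]*tenantAM
}

// syncConfigs creates or unpauses tenants that have a config and
// pauses (rather than deletes) tenants whose config has disappeared.
func (m *multiAM) syncConfigs(users map[string]bool) {
	for user := range users {
		am, ok := m.alertmanagers[user]
		if !ok {
			m.alertmanagers[user] = newTenantAM()
			continue
		}
		if !am.IsActive() {
			am.Unpause()
		}
	}
	for user, am := range m.alertmanagers {
		if !users[user] {
			am.Pause()
		}
	}
}

func main() {
	m := &multiAM{alertmanagers: map[string]*tenantAM{}}
	m.syncConfigs(map[string]bool{"tenant-a": true})
	m.alertmanagers["tenant-a"].metrics["alerts"] = 3

	// The config disappears: the tenant is paused, its metrics stay.
	m.syncConfigs(map[string]bool{})
	am := m.alertmanagers["tenant-a"]
	fmt.Println(am.IsActive(), am.metrics["alerts"])

	// The config reappears: the same structure is reused.
	m.syncConfigs(map[string]bool{"tenant-a": true})
	fmt.Println(am.IsActive(), am.metrics["alerts"])
}
```

Exporting both `Pause` and `Unpause` in the sketch follows the review suggestion that the two operations should be symmetric.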
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Moved from #2116 because integration tests don't work for Grafana org repo PRs due to NOQUAY being set as an environment variable.
What this PR does:
This PR takes advantage of the util.MetricFamiliesPerUser struct to provide per-tenant Alertmanager metrics.
Which issue(s) this PR fixes:
Fixes #1631
Checklist
CHANGELOG.md updated
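To make the idea of per-tenant metrics in the description above more concrete, here is a rough, stdlib-only sketch of flattening per-tenant values into one series per metric and tenant. It does not use the Prometheus client or the real util.MetricFamiliesPerUser struct, whose API is not shown in this conversation. The `perUserMetrics` and `sample` types, the `collect` method, and the metric names `alerts` and `silences` are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one exported series: a metric name, a tenant label, a value.
type sample struct {
	Name  string
	User  string
	Value float64
}

// perUserMetrics maps tenant ID -> metric name -> value. It is a
// hypothetical stand-in for a set of per-tenant metric registries.
type perUserMetrics map[string]map[string]float64

// collect flattens the per-tenant values into one sample per
// (metric, tenant) pair, sorted by metric name and then tenant so
// the output is deterministic.
func (p perUserMetrics) collect() []sample {
	var out []sample
	for user, metrics := range p {
		for name, v := range metrics {
			out = append(out, sample{Name: name, User: user, Value: v})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].User < out[j].User
	})
	return out
}

func main() {
	p := perUserMetrics{
		"tenant-a": {"alerts": 2, "silences": 1},
		"tenant-b": {"alerts": 5},
	}
	// Print each series with the tenant as a label.
	for _, s := range p.collect() {
		fmt.Printf("%s{user=%q} %g\n", s.Name, s.User, s.Value)
	}
}
```

Keeping one value per tenant and attaching the tenant as a label at collection time is one way to let operators see per-tenant alert and silence counts from a single scrape target.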