Concurrent fetch of azure metricdefinitions and batchApi usage#41790
Conversation
|
This pull request is now in conflicts. Could you fix it? 🙏 |
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
|
Microsoft.DocumentDb/databaseAccounts (1 resource)resource type:
Activity:
# x-pack/metricbeat/modules.d/azure.yml
- module: azure
metricsets:
- database_account
enabled: true
period: 300s
client_id: '${AZURE_CLIENT_ID:""}'
client_secret: '${AZURE_CLIENT_SECRET:""}'
tenant_id: '${AZURE_TENANT_ID:""}'
subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
refresh_list_interval: 600s
UPDATE: I didn't build the right version, I'm re-testing 9.0.0 8.17.19.0.0
Issues(1) Timegrain for azure.database_account.create_account.count is emptyIn version 8.17.1, the timegrain for this field is PT5M. (2) The azure.database_account.service_availability.avg (timegrain PT1H) is missingVersion 9.0.0 always collects 7 documents with PT5M, while version 8.17.1 collect 7 documents PT5M + 1 document PT1H during the first iteration and again every 60 mins. Is 9.0.0 missing the PT1H document on the first iteration? Waiting for the next iteration to double-check. After 75 mins, no UPDATE: tested by @MichaelKatsoulis I managed to collect |
|
UPDATE: I built the wrong version, I'm re-testing 9.0.0 with Microsoft.DocumentDb/databaseAccounts (1 resource) and I'll update the previous comment. My apologies for the noise. |
Microsoft.KeyVault/vaults (10 resources)resource type:
Activity:
- module: azure
metricsets:
- monitor
enabled: true
period: 60s
client_id: '${AZURE_CLIENT_ID:""}'
client_secret: '${AZURE_CLIENT_SECRET:""}'
tenant_id: '${AZURE_TENANT_ID:""}'
subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
refresh_list_interval: 600s
resources:
- resource_query: "resourceType eq 'Microsoft.KeyVault/vaults'"
resource_group:
- "mbranca-az-scalability-kv-r10"
metrics:
- dimensions:
- name: ActivityType
value: '*'
- name: ActivityName
value: '*'
- name: StatusCode
value: '*'
- name: StatusCodeClass
value: '*'
ignore_unsupported: true
name:
- ServiceApiLatency
- Availability
- ServiceApiResult
namespace: Microsoft.KeyVault/vaults
timegrain: PT1M
- dimensions:
- name: ActivityType
value: '*'
- name: ActivityName
value: '*'
ignore_unsupported: true
name:
- ServiceApiHit
namespace: Microsoft.KeyVault/vaults
timegrain: PT1M
- dimensions:
- name: ActivityType
value: '*'
- name: ActivityName
value: '*'
- name: TransactionType
value: '*'
ignore_unsupported: true
name:
- SaturationShoebox
namespace: Microsoft.KeyVault/vaults
timegrain: PT1MNotes: When the key vaults are unused (like in this resource group), they only generates a subset of metrics:
8.17.1In progress. I can see the three metrics (Availability, API Hits, API Results), grouped in two documents. So 2 documents x 10 resources = 20 documents per iteration: 9.0.0In progress. First iterations are okay. I get the same number of documents (20) as 8.17.1 and same values. Still checking, but this case looks good. |
|
@MichaelKatsoulis, I found a couple of issues relate to timegrain in the Microsoft.DocumentDb/databaseAccounts (1 resource) test. |
Microsoft.ContainerRegistry/registries (1 resource)resource type:
Activity:
- module: azure
metricsets:
- container_registry
enabled: true
period: 300s
client_id: '${AZURE_CLIENT_ID:""}'
client_secret: '${AZURE_CLIENT_SECRET:""}'
tenant_id: '${AZURE_TENANT_ID:""}'
subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
refresh_list_interval: 600sSince we had issue with PT1H metrics, I tried another metricset with this timegrain. 8.17.1After one iteration, 8.17.1 collected:
9.0.0After one iteration, 8.17.1 collected:
Conclusion✅ With the recent code changes 8.17.1 and 9.0.0 yield the same outcome. Metrics docs |
…-azure-metricdefinitions
zmoog
left a comment
There was a problem hiding this comment.
The performance gains from the new sdk/monitor/query/azmetrics package with batch API are extremely compelling.
I would love to simplify the internal structure, but I am also okay with going with the PR as-is, collect customer feedback, and switch to the batch API in the next release.
I added a few non-blocking comments for things we may want to address before merging.
| _boolean_ | ||
| Optional, by default is set to False. Set this to True when facing scalability issues. When configured, the azure batch api will be used | ||
| to fetch metrics of multiple resources in one api call. | ||
| Currently supported metricsets are monitor, container_registry, container_instance, container_service, compute_vm, compute_vm_scaleset, database_account and storage. |
There was a problem hiding this comment.
Can we also add storage to the list, or remove it because the metricset supports all the metricsets, right?
There was a problem hiding this comment.
Isn't storage in this list?
…d batchApi usage (#43923) * Concurrent fetch of azure metricdefinitions and batchApi usage (#41790) * Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum * Resolve conflicts --------- Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
…nd batchApi usage (#44243) * Concurrent fetch of azure metricdefinitions and batchApi usage (#41790) * Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc * Resolve conflicts --------- Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>
…nd batchApi usage (#44241) The changes affect azure monitor and relevant metricsets. The list of metricsets affected are: - `monitor` - `container_registry` - `container_instance` - `container_service` - `compute_vm` - `compute_vm_scaleset` - `database_account` - `storage_account` A new configuration parameter is introduced `enable_batch_api` of type boolean. If set to `false`(default) nothing changes in the way the metrics are collected for these metricsets. If set to `true`: - The metric definitions of resources are collected asynchronously and write the results in a channel. - The channel is read and when the number of definitions collected reach 50 (batch API limit) - The metrics definitions are grouped based on criteria(1) and the azure BatchAPI is used to retrieve metrics of multiple resources with one api call. 1. Grouping criteria are - Namespace - SubscriptionID - Location - Names - TimeGrain - Dimensions
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
…nd batchApi usage (#44242) The changes affect azure monitor and relevant metricsets. The list of metricsets affected are: - `monitor` - `container_registry` - `container_instance` - `container_service` - `compute_vm` - `compute_vm_scaleset` - `database_account` - `storage_account` A new configuration parameter is introduced `enable_batch_api` of type boolean. If set to `false`(default) nothing changes in the way the metrics are collected for these metricsets. If set to `true`: - The metric definitions of resources are collected asynchronously and write the results in a channel. - The channel is read and when the number of definitions collected reach 50 (batch API limit) - The metrics definitions are grouped based on criteria(1) and the azure BatchAPI is used to retrieve metrics of multiple resources with one api call. 1. Grouping criteria are - Namespace - SubscriptionID - Location - Names - TimeGrain - Dimensions
| return fmt.Errorf("no resources were found based on all the configurations options entered") | ||
| } | ||
|
|
||
| metricStores := make(map[ResDefGroupingCriteria]*MetricStore) |
There was a problem hiding this comment.
metricStores only work on a single goroutine, so no mutex is needed?
| groupedMetrics := map[ResDefGroupingCriteria][]Metric{ | ||
| criteria: store.GetMetrics(), | ||
| } | ||
| metricValues := client.GetMetricsInBatch(groupedMetrics, referenceTime, report) |
There was a problem hiding this comment.
In another points, if MetricStore need mutex, here should have an bug when another goroutine AddMetric into store.
bug for array copy will not share the resize action, see playground
But for now, MetricStore is only work in single goroutine, so it work well for now, right?






The changes affect azure monitor and relevant metricsets. The list of metricsets affected are:
monitorcontainer_registrycontainer_instancecontainer_servicecompute_vmcompute_vm_scalesetdatabase_accountstorage_accountA new configuration parameter is introduced
enable_batch_apiof type boolean.If set to
false(default) nothing changes in the way the metrics are collected for these metricsets.If set to
true:metrics of multiple resources with one api call.
Proposed commit message
Checklist
CHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs