Skip to content

Count over time Traceql query doubles the actual span count randomly #4257

Closed
@imjibi

Description

@imjibi

Describe the bug
When trying to count the number of spans using count_over_time the actual count gets doubled randomly. Query used was {} | count_over_time() by (name)

To Reproduce
Steps to reproduce the behavior:

  1. Use https://github.com/grafana/docker-otel-lgtm/tree/main and make sure it uses Tempo version 2.6.0
  2. Change backend to Azure blob storage and fill in required details. Mine was as follows,
  storage:
    trace:
      backend: azure
      azure:
        container_name: 'containername'
        storage_account_name: 'storageaccount'
        storage_account_key: 'key'
  1. Start generating traffic using curl command that can be found here. Make sure to send traffic without a delay (wanted to replicate our production scenario hence chose to skip delay) and I triggered traffic from command line as follows
while true
do
curl -s http://localhost:8081/rolldice
done
  1. After waiting for almost an hour, go to Grafana and choose Tempo data source and run {name="roll"} | count_over_time() by (name) . Please make sure step is set to 1m and time range is last 1 hour - this is really important as issue happens only when range is set to last 1 hour, if you choose an hour window in the past query works just fine. It will show a time series with a huge spike at a random point (if you don't see any spike at once just try hitting Run Query every minute ). Looking the value you could see that its exactly twice the actual value and the next minute the doubling behaviour moves to next point in time.
  2. The following snap shot show the buggy behaviour
    At 16:26 you could see that the count is 4.87K which is the actual value
    Image
    After a moment later the count is changed to 9.74K which exactly the double of the actual value.
    Image
    And then it continuously slides through the window.
    Actual Count at 16:27 is 4.73K
    Image
    Couple of minutes later it is 9.45K
    Image

Expected behavior
The count should n't get doubled when looking at the last 1 hour span count
Environment:

  • Infrastructure: k8s, docker-compose
  • Deployment tool: helm, docker

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions