Reduce imported opts queries by ukutaht · Pull Request #6106 · plausible/analytics

ukutaht · 2026-02-26T19:40:58Z

Currently a call to Query.build results in redundant Repo.preload(site, :completed_imports) calls:

2 times for queries with no comparisons
3 times for queries with comparisons

This PR makes two changes to preloading completed_imports:

Do not preload at all if we are skipping imports anyway due to :unsupported_interval or :unsupported_query. The biggest win here is that for daily stats (interval == "hour") we will do 0 preloads as opposed to 2 or 3 Postgres roundtrips.
If preloading is necessary, hoist the preload a bit higher in the call tree so the data does not need to be fetched multiple times.
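
A rough sketch of the two changes listed above, not the actual diff. Everything here is hypothetical: `PreloadSketch`, `preload_imports/2` (a stand-in for `Repo.preload(site, :completed_imports)` that bumps a counter instead of hitting Postgres), and `imports_supported?/1`, which assumes that hourly and minute intervals are the cases where imports get skipped.

```elixir
defmodule PreloadSketch do
  # Hypothetical stand-in for Repo.preload(site, :completed_imports).
  # It bumps a counter so the number of simulated roundtrips can be observed.
  def preload_imports(site, counter) do
    Agent.update(counter, &(&1 + 1))
    Map.put(site, :completed_imports, [])
  end

  # Assumption: only "hour" and "minute" intervals rule out imported data.
  def imports_supported?(params) do
    Map.get(params, :interval) not in ["hour", "minute"]
  end

  # Change 1: skip the preload entirely when imports would be skipped anyway.
  # Change 2: preload once here and pass the loaded site to every step below.
  def build(site, params, counter) do
    site =
      if imports_supported?(params) do
        preload_imports(site, counter)
      else
        site
      end

    %{main: main_opts(site), comparison: comparison_opts(site)}
  end

  # Both steps read the already loaded data instead of preloading again.
  defp main_opts(site), do: Map.get(site, :completed_imports, :skipped)
  defp comparison_opts(site), do: Map.get(site, :completed_imports, :skipped)
end

{:ok, counter} = Agent.start_link(fn -> 0 end)
site = %{id: 1}

hourly = PreloadSketch.build(site, %{interval: "hour"}, counter)
after_hourly = Agent.get(counter, & &1)

_daily = PreloadSketch.build(site, %{interval: "day"}, counter)
after_daily = Agent.get(counter, & &1)

IO.inspect({after_hourly, after_daily}, label: "simulated preloads")
```

In this sketch the hourly query triggers no simulated preload, and the daily query triggers exactly one even though both the main and comparison steps read the imports.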

/sites sparklines

I looked into this because the separate sparkline graph queries on /sites currently result in 5 identical preloads per site card: 3 preloads for query_24h_stats and 2 for query_24h_intervals. With the default page_size of 24, this results in 120 Postgres queries per page load when one per site would suffice.
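
The arithmetic behind those numbers, as a quick check. The per-site counts and the page size come from the paragraph above; the variable names are only for illustration.

```elixir
page_size = 24

# Per-site site_imports query counts, as stated in the paragraph above.
per_site = [master: 5, one_per_site: 1]

# Multiply each per-site count by the number of site cards on a page.
totals = Map.new(per_site, fn {scenario, count} -> {scenario, count * page_size} end)

IO.inspect(totals, label: "site_imports queries per page load")
```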

This PR cuts that down from 5 redundant queries per site to 0, because with interval == "hour" the preloads are not needed. For the longer time ranges in the works by @aerosol, it will be 2 queries per site. It would be possible to cut that down to 1 per site at the cost of some complexity: the Sparkline.overview_24h function would have to figure out whether imports are supported for the query and, if so, run the preload before calling query_24h_stats and query_24h_intervals.

However, this would require exposing some of the Query internals, and I felt it isn't worth it at the moment. One duplicated (surely cached on the Postgres side) query per site is not terrible.

aerosol · 2026-02-27T07:51:45Z

Interesting, I like where this is going, but I find the query building code a bit difficult to follow. How did you determine the number of preloads in your assessment?

I understand that for "time:minute" and "time:hour" we don't need them at all, but otherwise aren't preloads idempotent at Ecto level?

aerosol · 2026-02-27T07:54:35Z

Another minor optimization we could also do there is, we don't need to query for :visitors, :visits, :pageviews, :views_per_visit for regular site cards, just :visitors should be enough (as opposed to consolidated views). But that can be done by the way of building those more complex queries at varying ranges.

ukutaht · 2026-02-27T11:06:53Z

how did you determine the number of preloads in your assessment?

I noticed that loading the /sites page resulted in a lot of duplicate DB queries in local server logs, but it was pretty hard to parse out what was going on. I captured the logs from a single page refresh and asked Claude to figure out what was happening. My previous PR addressed the first issue it found; this PR addresses the second.

Manually verified with logs of a single sparkline request.

master:

iex(3)> Plausible.Stats.Sparkline.overview_24h(site)
12:38:08.454 [debug] QUERY OK source="teams" db=2.2ms idle=1965.2ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:38:08.455 [debug] QUERY OK source="subscriptions" db=0.9ms idle=1968.5ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:38:08.461 [debug] QUERY OK source="site_imports" db=1.4ms idle=1973.4ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.462 [debug] QUERY OK source="site_imports" db=0.4ms idle=1975.0ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.462 [debug] QUERY OK source="site_imports" db=0.3ms idle=1975.7ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.497 [debug] QUERY OK db=33.8ms idle=1968.4ms
SELECT s0."pageviews",s0."visitors",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08], 1, ~N[2026-02-19 10:38:08], ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51
12:38:08.509 [debug] QUERY OK db=11.4ms idle=2.4ms
SELECT s0."pageviews",s0."visitors",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-25 10:38:08], ~N[2026-02-26 10:38:08], 1, ~N[2026-02-18 10:38:08], ~N[2026-02-25 10:38:08], ~N[2026-02-26 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_comparison_query/1, at: lib/plausible/stats/query_runner.ex:81
12:38:08.513 [debug] QUERY OK source="teams" db=0.5ms idle=1033.0ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:38:08.515 [debug] QUERY OK source="subscriptions" db=1.0ms idle=1033.9ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:38:08.516 [debug] QUERY OK source="site_imports" db=0.7ms idle=868.1ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.516 [debug] QUERY OK source="site_imports" db=0.6ms idle=62.3ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.524 [debug] QUERY OK source="sessions_v2" db=7.0ms idle=21.6ms
SELECT toUInt64(round(uniq(s0."user_id") * any(_sample_factor))) AS "visitors",toStartOfHour(f1) AS "time" FROM "sessions_v2" AS s0 ARRAY JOIN timeSlots(toTimeZone(s0."start", {$0:String}), toUInt32(timeDiff(s0."start", s0."timestamp")), toUInt32({$1:Int64})) AS f1 WHERE ((s0."site_id" = {$2:Int64}) AND (s0."start" >= {$3:DateTime}) AND (s0."timestamp" >= {$4:DateTime}) AND (s0."start" <= {$5:DateTime})) GROUP BY "time" ORDER BY "time" ["Etc/UTC", 900, 1, ~N[2026-02-19 10:38:08], ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51

Note that there are 5 queries to site_imports.

this branch:

iex(4)> Plausible.Stats.Sparkline.overview_24h(site)
12:28:40.530 [debug] QUERY OK source="teams" db=3.1ms idle=1294.5ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:28:40.532 [debug] QUERY OK source="subscriptions" db=0.8ms idle=1298.6ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:28:40.532 [debug] QUERY OK source="site_imports" db=0.5ms idle=1299.6ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Stats.Query.put_imported_opts/2, at: lib/plausible/stats/query.ex:155
12:28:40.558 [debug] QUERY OK db=24.7ms idle=1316.4ms
SELECT s0."visitors",s0."pageviews",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40], 1, ~N[2026-02-19 10:28:40], ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51
12:28:40.577 [debug] QUERY OK db=19.0ms idle=1341.8ms
SELECT s0."visitors",s0."pageviews",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-25 10:28:40], ~N[2026-02-26 10:28:40], 1, ~N[2026-02-18 10:28:40], ~N[2026-02-25 10:28:40], ~N[2026-02-26 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_comparison_query/1, at: lib/plausible/stats/query_runner.ex:81
12:28:40.578 [debug] QUERY OK source="teams" db=0.6ms idle=1345.5ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:28:40.579 [debug] QUERY OK source="subscriptions" db=0.4ms idle=1346.6ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:28:40.587 [debug] QUERY OK source="sessions_v2" db=6.8ms idle=1363.5ms
SELECT toUInt64(round(uniq(s0."user_id") * any(_sample_factor))) AS "visitors",toStartOfHour(f1) AS "time" FROM "sessions_v2" AS s0 ARRAY JOIN timeSlots(toTimeZone(s0."start", {$0:String}), toUInt32(timeDiff(s0."start", s0."timestamp")), toUInt32({$1:Int64})) AS f1 WHERE ((s0."site_id" = {$2:Int64}) AND (s0."start" >= {$3:DateTime}) AND (s0."timestamp" >= {$4:DateTime}) AND (s0."start" <= {$5:DateTime})) GROUP BY "time" ORDER BY "time" ["Etc/UTC", 900, 1, ~N[2026-02-19 10:28:40], ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51

There is 1 query to site_imports. In my PR description I said there would be 0, but I overlooked the fact that query_24h_stats does not use an interval, so it will still make one query to site_imports.

aren't preloads idempotent at Ecto level?

They are, but only when the result is stored and reused. We often don't store it. For example:

site = ...
Repo.preload(site, :completed_imports) # Fires DB query. Returns site with completed_imports preloaded
Repo.preload(site, :completed_imports) # Fires DB query again because the preload from last line was discarded
site = Repo.preload(site, :completed_imports) # Preload completed imports and store the site with preloaded data in variable
Repo.preload(site, :completed_imports) # Does not fire DB query since the site variable now has preloaded data
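
The same sequence as a runnable simulation, in case it helps to see the counts. `FakeRepo` is a made-up stand-in, not Ecto: it only mimics the behaviour that a preload is skipped when the association is already loaded on the struct it receives, and it records each simulated query as a message to the current process.

```elixir
defmodule FakeRepo do
  # Hypothetical stand-in for Repo.preload/2. It "queries" only while the
  # association is still marked :not_loaded on the value it is given.
  def preload(%{completed_imports: :not_loaded} = site, :completed_imports) do
    send(self(), :query)
    %{site | completed_imports: []}
  end

  # Already loaded: return the site untouched, no query.
  def preload(site, :completed_imports), do: site
end

site = %{id: 1, completed_imports: :not_loaded}

FakeRepo.preload(site, :completed_imports)        # queries, result discarded
FakeRepo.preload(site, :completed_imports)        # queries again
site = FakeRepo.preload(site, :completed_imports) # queries, result kept
FakeRepo.preload(site, :completed_imports)        # no query this time

# Count the simulated queries recorded in this process's mailbox.
{:messages, messages} = Process.info(self(), :messages)
query_count = Enum.count(messages, &(&1 == :query))

IO.inspect(query_count, label: "simulated queries")
```

Four calls produce three simulated queries here, because only the last call receives a site that already carries the loaded data.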

Another minor optimization we could also do there is, we don't need to query for :visitors, :visits, :pageviews, :views_per_visit for regular site cards

Nice! I hadn't considered that. I don't think it's that minor because it means we'll avoid JOINing with sessions in clickhouse queries for site cards. That would be a significant win.
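
To illustrate why the metric list matters, here is a small sketch. The metric-to-table mapping is an assumption read off the aggregate query in the logs above (pageviews, visitors and visits selected from events_v2, views_per_visit from sessions_v2); `MetricTablesSketch` is hypothetical and not how the query builder actually decides this.

```elixir
defmodule MetricTablesSketch do
  # Assumption based on the aggregate query logged above, not on the real
  # query builder: which table each metric is read from.
  @table_for %{
    visitors: :events_v2,
    visits: :events_v2,
    pageviews: :events_v2,
    views_per_visit: :sessions_v2
  }

  # Returns the distinct tables a list of metrics would touch.
  def tables(metrics) do
    metrics |> Enum.map(&Map.fetch!(@table_for, &1)) |> Enum.uniq()
  end

  # A join is only needed when more than one table is involved.
  def needs_join?(metrics), do: length(tables(metrics)) > 1
end

all_metrics = [:visitors, :visits, :pageviews, :views_per_visit]

IO.inspect(MetricTablesSketch.needs_join?(all_metrics), label: "all four metrics")
IO.inspect(MetricTablesSketch.needs_join?([:visitors]), label: "visitors only")
```

Under that assumption, dropping views_per_visit from the site card query is what removes the need for the join.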

aerosol · 2026-02-27T18:54:42Z

Nice! I hadn't considered that. I don't think it's that minor because it means we'll avoid JOINing with sessions in clickhouse queries for site cards. That would be a significant win.

We can include it in this PR if you like via #6109

ukutaht added 2 commits February 26, 2026 20:58

Reduce database queries for imported_opts in sparkline graphs

6a1f047

Conditionally preload imports in query builder

fcc8ec6

ukutaht requested review from a team and RobertJoonas February 26, 2026 19:40

Extract Imported.schema_supports_interval?/1

24947d9

ukutaht force-pushed the reduce-imported-opts-queries branch from b8b2af8 to 24947d9 Compare February 26, 2026 20:01

ukutaht marked this pull request as draft February 26, 2026 20:12

Fix test failures

c634347

ukutaht marked this pull request as ready for review February 27, 2026 01:14

aerosol mentioned this pull request Feb 27, 2026

Don't join sessions for regular site cards #6109

Open

aerosol approved these changes Feb 27, 2026

Reduce imported opts queries #6106
ukutaht wants to merge 4 commits into master from reduce-imported-opts-queries
