
Reduce imported opts queries #6106

Open
ukutaht wants to merge 4 commits into master from reduce-imported-opts-queries


@ukutaht
Contributor

@ukutaht ukutaht commented Feb 26, 2026

Currently, a call to Query.build results in redundant Repo.preload(site, :completed_imports) calls:

  • 2 times for queries with no comparisons
  • 3 times for queries with comparisons

This PR makes two changes to preloading completed_imports:

  1. Do not preload at all if we are skipping imports anyway due to :unsupported_interval or :unsupported_query. The biggest win here is that for daily stats (interval == "hour") we will do 0 preloads instead of 2 or 3 Postgres roundtrips.
  2. If preloading is necessary, hoist the preload higher in the call tree so that it is not fetched multiple times.
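
The two changes can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Query code: the module name, the skip_reason/1 helper, the :include_imported and :skip_imported_reason keys, and the counter-based fake preload are all assumptions made so the example runs without Ecto. It also only models the :unsupported_interval case.

```elixir
defmodule ImportedOptsSketch do
  # Hypothetical stand-in for Repo.preload(site, :completed_imports).
  # It bumps a counter so the number of simulated roundtrips can be checked.
  def preload_completed_imports(site, counter) do
    Agent.update(counter, &(&1 + 1))
    Map.put(site, :completed_imports, [%{id: 1, status: :completed}])
  end

  # Assumption: imports are skipped for minute and hour intervals.
  def skip_reason(%{interval: interval}) when interval in ["minute", "hour"],
    do: :unsupported_interval

  def skip_reason(_query), do: nil

  # Change 1: no preload at all when imports are skipped anyway.
  # Change 2: preload once here and hand the loaded site to callers.
  def build(site, query, counter) do
    case skip_reason(query) do
      nil ->
        site = preload_completed_imports(site, counter)
        {site, Map.put(query, :include_imported, site.completed_imports != [])}

      reason ->
        {site, Map.merge(query, %{include_imported: false, skip_imported_reason: reason})}
    end
  end
end

{:ok, counter} = Agent.start_link(fn -> 0 end)
site = %{id: 1}

{_site, hourly} = ImportedOptsSketch.build(site, %{interval: "hour"}, counter)
{_site, daily} = ImportedOptsSketch.build(site, %{interval: "day"}, counter)

IO.inspect(hourly.include_imported, label: "hourly includes imports")
IO.inspect(daily.include_imported, label: "daily includes imports")
# Only the "day" query triggered a simulated preload.
IO.inspect(Agent.get(counter, & &1), label: "preloads")
```

Returning the preloaded site alongside the query is one way to make the hoisting explicit; the real change may thread the data differently.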

/sites sparklines

I looked into this because the separate sparkline graph queries on /sites currently result in 5 identical preloads per site card: 3 for query_24h_stats and 2 for query_24h_intervals. With the default page_size of 24, this results in 120 Postgres queries per page load (5 × 24) when one per site would suffice.

With this PR, the 5 identical redundant queries per site drop to 0, because with interval == "hour" the preloads are not needed. For the longer time ranges in the works by @aerosol, it will be 2 queries per site. It would be possible to cut that down to 1 per site at the cost of some complexity: the Sparkline.overview_24h function would have to figure out whether imports are supported for the query and, if so, run the preload before calling query_24h_stats and query_24h_intervals.

However, this would require exposing some of the Query internals, and I felt it isn't worth it at the moment. One duplicated (surely cached on the Postgres side) query per site is not terrible.
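
For reference, the "1 query per site" alternative described above might look something like the sketch below. Everything here is hypothetical: imports_supported?/1 stands in for the check that Query would have to expose, the preload is passed in as a function so the example runs without Ecto, and stats/1 and intervals/2 are placeholders for query_24h_stats and query_24h_intervals.

```elixir
defmodule SparklineSketch do
  # Hypothetical predicate; in the real code this decision lives inside Query.
  def imports_supported?(interval), do: interval not in ["minute", "hour"]

  # `preload` is injected so the sketch has no database dependency.
  def overview(site, interval, preload) do
    site = if imports_supported?(interval), do: preload.(site), else: site

    # Both downstream calls receive the same, already preloaded site.
    %{stats: stats(site), intervals: intervals(site, interval)}
  end

  defp stats(site), do: %{imports: Map.get(site, :completed_imports, :not_loaded)}

  defp intervals(site, interval),
    do: %{interval: interval, imports: Map.get(site, :completed_imports, :not_loaded)}
end

{:ok, calls} = Agent.start_link(fn -> 0 end)

# Fake preload that counts how often it is invoked.
preload = fn site ->
  Agent.update(calls, &(&1 + 1))
  Map.put(site, :completed_imports, [])
end

result = SparklineSketch.overview(%{id: 1}, "day", preload)
IO.inspect(result, label: "overview")
IO.inspect(Agent.get(calls, & &1), label: "preloads")
```

The cost is visible even in the sketch: the caller needs to know when imports are supported, which is exactly the Query internal this PR avoids exposing.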

@ukutaht ukutaht requested review from a team and RobertJoonas February 26, 2026 19:40
@ukutaht ukutaht force-pushed the reduce-imported-opts-queries branch from b8b2af8 to 24947d9 Compare February 26, 2026 20:01
@ukutaht ukutaht marked this pull request as draft February 26, 2026 20:12
@ukutaht ukutaht marked this pull request as ready for review February 27, 2026 01:14
@aerosol
Member

aerosol commented Feb 27, 2026

Interesting, I like where this is going, but I find the query-building code a bit difficult to follow. How did you determine the number of preloads in your assessment?

I understand that for "time:minute" and "time:hour" we don't need them at all, but otherwise aren't preloads idempotent at Ecto level?

@aerosol
Member

aerosol commented Feb 27, 2026

Another minor optimization we could also do there is, we don't need to query for :visitors, :visits, :pageviews, :views_per_visit for regular site cards, just :visitors should be enough (as opposed to consolidated views). But that can be done by the way of building those more complex queries at varying ranges.

@ukutaht
Contributor Author

ukutaht commented Feb 27, 2026

how did you determine the number of preloads in your assessment?

I noticed that loading the /sites page resulted in a lot of duplicate DB queries in the local server logs, but it was pretty hard to parse out what was going on. I captured the logs from a single page refresh and asked Claude to figure out what was happening. It found the issue addressed by my previous PR; this PR addresses the second issue.

Manually verified with logs of a single sparkline request.

master:

iex(3)> Plausible.Stats.Sparkline.overview_24h(site)
12:38:08.454 [debug] QUERY OK source="teams" db=2.2ms idle=1965.2ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:38:08.455 [debug] QUERY OK source="subscriptions" db=0.9ms idle=1968.5ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:38:08.461 [debug] QUERY OK source="site_imports" db=1.4ms idle=1973.4ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.462 [debug] QUERY OK source="site_imports" db=0.4ms idle=1975.0ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.462 [debug] QUERY OK source="site_imports" db=0.3ms idle=1975.7ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.497 [debug] QUERY OK db=33.8ms idle=1968.4ms
SELECT s0."pageviews",s0."visitors",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08], 1, ~N[2026-02-19 10:38:08], ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51
12:38:08.509 [debug] QUERY OK db=11.4ms idle=2.4ms
SELECT s0."pageviews",s0."visitors",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-25 10:38:08], ~N[2026-02-26 10:38:08], 1, ~N[2026-02-18 10:38:08], ~N[2026-02-25 10:38:08], ~N[2026-02-26 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_comparison_query/1, at: lib/plausible/stats/query_runner.ex:81
12:38:08.513 [debug] QUERY OK source="teams" db=0.5ms idle=1033.0ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:38:08.515 [debug] QUERY OK source="subscriptions" db=1.0ms idle=1033.9ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:38:08.516 [debug] QUERY OK source="site_imports" db=0.7ms idle=868.1ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.516 [debug] QUERY OK source="site_imports" db=0.6ms idle=62.3ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.524 [debug] QUERY OK source="sessions_v2" db=7.0ms idle=21.6ms
SELECT toUInt64(round(uniq(s0."user_id") * any(_sample_factor))) AS "visitors",toStartOfHour(f1) AS "time" FROM "sessions_v2" AS s0 ARRAY JOIN timeSlots(toTimeZone(s0."start", {$0:String}), toUInt32(timeDiff(s0."start", s0."timestamp")), toUInt32({$1:Int64})) AS f1 WHERE ((s0."site_id" = {$2:Int64}) AND (s0."start" >= {$3:DateTime}) AND (s0."timestamp" >= {$4:DateTime}) AND (s0."start" <= {$5:DateTime})) GROUP BY "time" ORDER BY "time" ["Etc/UTC", 900, 1, ~N[2026-02-19 10:38:08], ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51

Note that there are 5 queries to site_imports.

this branch:

iex(4)> Plausible.Stats.Sparkline.overview_24h(site)
12:28:40.530 [debug] QUERY OK source="teams" db=3.1ms idle=1294.5ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:28:40.532 [debug] QUERY OK source="subscriptions" db=0.8ms idle=1298.6ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:28:40.532 [debug] QUERY OK source="site_imports" db=0.5ms idle=1299.6ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Stats.Query.put_imported_opts/2, at: lib/plausible/stats/query.ex:155
12:28:40.558 [debug] QUERY OK db=24.7ms idle=1316.4ms
SELECT s0."visitors",s0."pageviews",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40], 1, ~N[2026-02-19 10:28:40], ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51
12:28:40.577 [debug] QUERY OK db=19.0ms idle=1341.8ms
SELECT s0."visitors",s0."pageviews",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-25 10:28:40], ~N[2026-02-26 10:28:40], 1, ~N[2026-02-18 10:28:40], ~N[2026-02-25 10:28:40], ~N[2026-02-26 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_comparison_query/1, at: lib/plausible/stats/query_runner.ex:81
12:28:40.578 [debug] QUERY OK source="teams" db=0.6ms idle=1345.5ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:28:40.579 [debug] QUERY OK source="subscriptions" db=0.4ms idle=1346.6ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:28:40.587 [debug] QUERY OK source="sessions_v2" db=6.8ms idle=1363.5ms
SELECT toUInt64(round(uniq(s0."user_id") * any(_sample_factor))) AS "visitors",toStartOfHour(f1) AS "time" FROM "sessions_v2" AS s0 ARRAY JOIN timeSlots(toTimeZone(s0."start", {$0:String}), toUInt32(timeDiff(s0."start", s0."timestamp")), toUInt32({$1:Int64})) AS f1 WHERE ((s0."site_id" = {$2:Int64}) AND (s0."start" >= {$3:DateTime}) AND (s0."timestamp" >= {$4:DateTime}) AND (s0."start" <= {$5:DateTime})) GROUP BY "time" ORDER BY "time" ["Etc/UTC", 900, 1, ~N[2026-02-19 10:28:40], ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51

There is 1 query to site_imports. In my PR description I said it would be 0, but I overlooked the fact that query_24h_stats does not use an interval, so it will still make one query to site_imports.

aren't preloads idempotent at Ecto level?

They are, but only if the result is stored and reused. We often don't do that. For example:

site = ...
Repo.preload(site, :completed_imports) # Fires DB query. Returns site with completed_imports preloaded
Repo.preload(site, :completed_imports) # Fires DB query again because the preload from last line was discarded
site = Repo.preload(site, :completed_imports) # Preload completed imports and store the site with preloaded data in variable
Repo.preload(site, :completed_imports) # Does not fire DB query since the site variable now has preloaded data
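
The same behaviour can be reproduced without a database. The snippet below is a simplified simulation, not Ecto itself: FakeRepo is a made-up module that only queries (here: increments a counter) when the association on the map it receives is still unloaded, which mirrors how Repo.preload skips associations that are already loaded unless force: true is given.

```elixir
defmodule FakeRepo do
  # Simplified stand-in for Repo.preload/2: "query" only when the
  # association is still unloaded on the value that was passed in.
  def preload(%{completed_imports: :not_loaded} = site, :completed_imports, counter) do
    Agent.update(counter, &(&1 + 1))
    %{site | completed_imports: []}
  end

  # Already loaded: return the site untouched, no query.
  def preload(site, :completed_imports, _counter), do: site
end

{:ok, counter} = Agent.start_link(fn -> 0 end)
site = %{id: 1, completed_imports: :not_loaded}

# Query 1: the returned site is discarded.
FakeRepo.preload(site, :completed_imports, counter)
# Query 2: `site` is still the unloaded value.
FakeRepo.preload(site, :completed_imports, counter)
# Query 3: this time the result is kept.
site = FakeRepo.preload(site, :completed_imports, counter)
# No query: `site` now carries the loaded association.
FakeRepo.preload(site, :completed_imports, counter)

IO.inspect(Agent.get(counter, & &1), label: "queries")
```

Four calls produce three simulated queries, because data is immutable and only the rebound variable carries the preloaded association forward.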

Another minor optimization we could also do there is, we don't need to query for :visitors, :visits, :pageviews, :views_per_visit for regular site cards

Nice! I hadn't considered that. I don't think it's that minor because it means we'll avoid JOINing with sessions in clickhouse queries for site cards. That would be a significant win.

@aerosol
Member

aerosol commented Feb 27, 2026

Nice! I hadn't considered that. I don't think it's that minor because it means we'll avoid JOINing with sessions in clickhouse queries for site cards. That would be a significant win.

We can include it in this PR if you like via #6109
