- Requires Python 3.11
pip install -r requirements.txt
# Index English (default)
python index_pages.py --base_directory /opt/clickhouse-docs --algolia_app_id 7AL1W7YVZK --algolia_api_key <write_key>
# Index Japanese
python index_pages.py --base_directory /opt/clickhouse-docs --algolia_app_id 7AL1W7YVZK --algolia_api_key <write_key> --locale jp
# Index Chinese
python index_pages.py --base_directory /opt/clickhouse-docs --algolia_app_id 7AL1W7YVZK --algolia_api_key <write_key> --locale zh
# Index Russian
python index_pages.py --base_directory /opt/clickhouse-docs --algolia_app_id 7AL1W7YVZK --algolia_api_key <write_key> --locale ru
# Using the shell script
./run_indexer.sh --locale jp
usage: index_pages.py [-h] [-d BASE_DIRECTORY] [-x] --algolia_app_id ALGOLIA_APP_ID --algolia_api_key ALGOLIA_API_KEY [--algolia_index_name ALGOLIA_INDEX_NAME] [--locale {en,jp,zh,ru}]
Index search pages.
options:
-h, --help show this help message and exit
-d BASE_DIRECTORY, --base_directory BASE_DIRECTORY
Path to root directory of docs repo
-x, --dry_run Dry run, do not send results to Algolia.
--algolia_app_id ALGOLIA_APP_ID
Algolia Application ID
--algolia_api_key ALGOLIA_API_KEY
Algolia Admin API Key
--algolia_index_name ALGOLIA_INDEX_NAME
Algolia Index Name
--locale {en,jp,zh,ru}
Locale to index (default: en)

Before pushing any changes to the production app, test on the dev app and make a backup of the English search index "clickhouse". You can do so from the search tab -> manage index -> duplicate. Give the duplicate index a name like "clickhouse-backup-DD-MM-YYYY".
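As a pre-push check, the dry-run flag from the usage text can be combined with each supported locale. The sketch below only builds the command lines and a backup index name in the suggested format. The helper names `dry_run_commands` and `backup_index_name` are hypothetical and not part of this repository; the flags are the ones listed in the usage output.

```python
from datetime import date

# Locale choices as listed in the index_pages.py usage text.
LOCALES = ["en", "jp", "zh", "ru"]


def dry_run_commands(base_directory, app_id, api_key, locales=LOCALES):
    """Build one dry-run index_pages.py command (as an argv list) per locale.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    commands = []
    for locale in locales:
        commands.append([
            "python", "index_pages.py",
            "--base_directory", base_directory,
            "--algolia_app_id", app_id,
            "--algolia_api_key", api_key,
            "--locale", locale,
            "--dry_run",  # do not send results to Algolia
        ])
    return commands


def backup_index_name(day, index="clickhouse"):
    """Return a backup name in the suggested <index>-backup-DD-MM-YYYY form."""
    return f"{index}-backup-{day.strftime('%d-%m-%Y')}"
```

Each returned list can be passed to `subprocess.run` as-is; `backup_index_name(date.today())` gives the name to type into the duplicate dialog.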
We use these to evaluate search performance. results.csv contains a list of authoritative search results for 200 terms.
We use this to compute an average nDCG.
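For readers unfamiliar with the metric, here is a minimal sketch of how an average nDCG can be computed from expected and returned results. It assumes binary relevance (a returned URL either is or is not in the authoritative list) and a cutoff of `k=3`; both are illustrative assumptions, and `compute_ndcg.py` may weight or truncate results differently. The function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(returned_urls, expected_urls, k=3):
    """nDCG@k with binary relevance (assumed here, not taken from compute_ndcg.py)."""
    expected = set(expected_urls)
    seen = set()
    gains = []
    for url in returned_urls[:k]:
        # Count each expected URL once so duplicates cannot push the score above 1.
        hit = url in expected and url not in seen
        gains.append(1.0 if hit else 0.0)
        seen.add(url)
    # Ideal ranking: every expected URL (up to k) appears at the top.
    ideal_dcg = dcg([1.0] * min(k, len(expected)))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_ndcg(results, k=3):
    """Mean nDCG over (returned_urls, expected_urls) pairs, one pair per search term."""
    scores = [ndcg(returned, expected, k) for returned, expected in results]
    return sum(scores) / len(scores) if scores else 0.0
```

A term whose only expected page comes back first scores 1.0; if it comes back second, it scores 1 / log2(3), roughly 0.63.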
- Requires Python 3.11
pip install -r requirements.txt
You need to comment out either the dev or the prod details, depending on which app you want to test. The API key shown is the public search key, so it is not sensitive. Find the actual key you need in the Algolia app under settings -> API keys
# dev details
# ALGOLIA_APP_ID = "7AL1W7YVZK"
# ALGOLIA_API_KEY = "43bd50d4617a97c9b60042a2e8a348f9"
# Prod details
ALGOLIA_APP_ID = "5H9UG7CX5W"
ALGOLIA_API_KEY = "4a7bf25cf3edbef29d78d5e1eecfdca5"
python compute_ndcg.py -d
usage: compute_ndcg.py [-h] [-d] [-v] [input_csv]
Compute nDCG for Algolia search results.
positional arguments:
input_csv Path to the input CSV file (default: results.csv).
options:
-h, --help show this help message and exit
-d, --detailed Print detailed results for each search term.
-v, --validate Validate links.

| Date | Average nDCG | Results | Changes |
|---|---|---|---|
| 20/01/2025 | 0.4700 | View Results | Baseline |
| 21/01/2025 | 0.5021 | View Results | Index _ character and move language to English |
| 24/01/2025 | 0.7072 | View Results | Process markdown, and tune settings. |
| 24/01/2025 | 0.7412 | View Results | Include manual promotions for ambiguous terms. |
| 28/08/2025 | 0.5729 | View Results | Evaluation was unfortunately not run or recorded for the recent search improvements. |
Note: exact scores may vary due to constant content changes.
- Some pages are not optimized for retrieval e.g.
  a. https://clickhouse.com/docs/sql-reference/aggregate-functions/combinators#-if will never return for `countIf`, `sumif`, `multiif`
- Some pages are hidden e.g. https://clickhouse.com/docs/install#from-docker-image - this needs to be a separate page.
- Some pages e.g. https://clickhouse.com/docs/sql-reference/statements/alter need headings e.g. `Alter table`
- https://clickhouse.com/docs/optimize/sparse-primary-indexes needs to be optimized for `primary key`
- `case when` - https://clickhouse.com/docs/sql-reference/functions/conditional-functions needs to be improved. Maybe keywords or a header
- `has` - https://clickhouse.com/docs/sql-reference/functions/array-functions#hasarr-elem tricky
- `codec` - we need better content
- `shard` - need a better page
- `populate` - we need to have a subheading on the mv page
- `contains` - https://clickhouse.com/docs/sql-reference/functions/string-search-functions needs words
- `replica` - need more terms on https://clickhouse.com/docs/architecture/horizontal-scaling but we need a better page
- Better chunking - using a markdown chunker which respects code and table boundaries
- Skip pages
- Segment h2 and h3 headings on case changes and on numerics e.g. toInt8 -> to Int 8
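The segmentation idea in the last item can be sketched with a regular expression that inserts a space at lower-to-upper case boundaries and at letter/digit boundaries. This is only an illustration of the proposed improvement, not code from the indexer; `segment_identifier` is a hypothetical name.

```python
import re

# Zero-width boundaries where a space should be inserted.
_BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z])"        # lowercase followed by uppercase: to|Int
    r"|(?<=[A-Za-z])(?=[0-9])"    # letter followed by digit: Int|8
    r"|(?<=[0-9])(?=[A-Za-z])"    # digit followed by letter: 64|x
)


def segment_identifier(name):
    """Split a function-style name into searchable words, e.g. toInt8 -> 'to Int 8'."""
    return _BOUNDARY.sub(" ", name)


print(segment_identifier("toInt8"))   # to Int 8
print(segment_identifier("countIf"))  # count If
```

All-lowercase names such as `sumif` have no case boundary and pass through unchanged, so they would still need keywords or manual promotions.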
The search index is automatically updated through a GitHub Actions workflow (.github/workflows/build-search.yml) in three scenarios:
- When: Every day at 4:00 AM UTC (11 PM CDT / 12 AM EDT during daylight saving time; 10 PM CST / 11 PM EST otherwise)
- What: Automatically re-indexes all documentation to keep search results fresh
- When: A PR is merged to `main` with the `update search` label
- How to use: Add the `update search` label to your PR before merging to trigger an immediate index update
- Use case: Important for major documentation restructures or when search needs to reflect changes immediately
- When: Via GitHub Actions UI (workflow_dispatch)
- How: Go to Actions → Update Algolia Search → Run workflow
- Use case: Emergency updates or testing
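The three scenarios above boil down to a small decision rule, sketched here in Python. This is an illustration only and not how the workflow file is written. `schedule`, `pull_request` and `workflow_dispatch` are standard GitHub Actions event names; that the workflow reacts to `pull_request` for the label case is an assumption, and `should_update_search` is a hypothetical name.

```python
def should_update_search(event_name, merged=False, base_branch="", labels=()):
    """Return True when one of the three documented triggers applies."""
    # Scheduled daily run, or a manual run from the Actions UI.
    if event_name in ("schedule", "workflow_dispatch"):
        return True
    # A PR merged to main that carries the "update search" label.
    if event_name == "pull_request":
        return merged and base_branch == "main" and "update search" in labels
    return False
```

For example, a merged PR to `main` without the label returns `False`, so its changes reach the index at the next scheduled run.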