Collection of user guides, tools, and links to resources for working with OpenAQ data.
- Resources
- Download OpenAQ archive data from S3 using awscli
- How big is the OpenAQ S3 bucket?
- Convert ndjson to InfluxDB line protocol format
- Convert CSV to InfluxDB line protocol format
- Contributing
- Access OpenAQ data via a filterable SNS topic
- Using Athena to access the whole archive
- Air Quality Collection with TimescaleDB - Sample Application
- openaq.org - The main OpenAQ website, contains CSV download pages and the world pollutant map.
- ropensci/ropenaq - R package for the OpenAQ API
- nickolasclarke/openaq - JavaScript client for the OpenAQ API
- dhhagan/py-openaq - Python wrapper for the OpenAQ API
- openaq-postman - Postman collections for working with OpenAQ API
- jackkoppa/cityaq - Compare air quality for cities
- dolugen/openaq-browser - A web client for OpenAQ API
- barronh/scrapenaq - Download and convert OpenAQ archived data with Pandas
- dolugen/openaq-swagger - OpenAPI v3 spec of OpenAQ API
- dolugen/sns-s3-influxdb - Populate InfluxDB with air quality data
- OpenAQ on AWS - Information about OpenAQ's publicly available S3 bucket and SNS topic.
OpenAQ stores metric data in a publicly available S3 bucket. One way to download from the archive is with the aws s3 command.
Prerequisites: You need a free AWS account, and awscli installed and configured.
Download a single file:
    aws s3 cp s3://openaq-fetches/realtime-gzipped/2020-06-06/1591476667.ndjson .
Download files for 1 day:
    aws s3 cp s3://openaq-fetches/realtime-gzipped/2020-06-06/ . --recursive
You can go up 1 level and download the entire archive if you wish.
If you prefer not to use awscli, take a look at barronh/scrapenaq, a tool that uses a scraping approach instead.
To see how big the OpenAQ S3 bucket is:
    aws s3 ls --summarize --human-readable --recursive s3://openaq-fetches
As of June 2020, it's 323 GB.
The archive files in the S3 bucket are in ndjson format, or newline-delimited JSON. That means each file is plain text in which every line is a separate, complete JSON object.
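Because every line is an independent JSON object, an ndjson file can be read one line at a time without loading the whole file. The following is a minimal sketch using only the Python standard library; the function name `iter_ndjson` and the sample records are hypothetical and are not taken from this repo.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        yield json.loads(line)


# Hypothetical sample data; real archive records carry more fields.
sample = '{"parameter": "pm25", "value": 12.5}\n{"parameter": "o3", "value": 0.03}\n'
records = list(iter_ndjson(sample.splitlines()))
print(len(records), records[0]["parameter"])
```

An open file object is also an iterable of lines, so `iter_ndjson(open(path))` should work the same way on a downloaded file.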
To convert these files to InfluxDB's line protocol, you can use the ndjson2lineprotocol.py script found in this repo.
    cat *.ndjson | ./ndjson2lineprotocol.py
The script outputs to standard output, so you may want to redirect it to a file.
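To illustrate what such a conversion involves, here is a rough sketch of turning one record into a line of the form `measurement,tags fields timestamp`. It is not the code of ndjson2lineprotocol.py. The record layout (`parameter`, `value`, `city`, `country`, `date.utc`), the choice of tags, and the function names are all assumptions made for illustration.

```python
from datetime import datetime


def escape(text: str, chars: str) -> str:
    """Backslash-escape each of the given special characters."""
    for char in chars:
        text = text.replace(char, "\\" + char)
    return text


def to_line_protocol(record: dict) -> str:
    """Build one line: measurement,tags fields timestamp."""
    # Assumption: "city" and "country" become tags, "value" the only field.
    tags = {key: str(record[key]) for key in ("city", "country") if key in record}
    tag_part = "".join(
        f",{escape(key, ',= ')}={escape(val, ',= ')}"
        for key, val in sorted(tags.items())
    )
    # Assumption: the timestamp is ISO 8601 in UTC with a trailing "Z".
    when = datetime.fromisoformat(record["date"]["utc"].replace("Z", "+00:00"))
    nanoseconds = int(when.timestamp()) * 1_000_000_000  # whole seconds only
    measurement = escape(record["parameter"], ", ")
    return f"{measurement}{tag_part} value={float(record['value'])} {nanoseconds}"


# Hypothetical record, shaped like the assumed layout above.
example = {
    "date": {"utc": "2020-06-06T20:00:00.000Z"},
    "parameter": "pm25",
    "value": 12.5,
    "city": "New York",
    "country": "US",
}
print(to_line_protocol(example))
```

Commas, equals signs, and spaces are escaped in tag keys and values because they act as separators in line protocol. The timestamp is written in nanoseconds, which is InfluxDB's default precision.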
In addition to the S3 option, you can filter and download data as CSV from the openaq.org website.
After downloading the CSV, feed the file to csv2lineprotocol.py like so:
    cat openaq.csv | ./csv2lineprotocol.py
Something missing or in need of fixing here? Please use the issues page to submit requests and ask questions. You can also create a Pull Request with your changes.

