doccano is an open-source text annotation tool for humans. It provides annotation features for text classification, sequence labeling, and sequence to sequence tasks. You can create labeled data for sentiment analysis, named entity recognition, text summarization, and so on. Just create a project, upload data, and start annotating. You can build a dataset in hours.
Try the annotation demo.
Read the documentation at https://doccano.github.io/doccano/.
- Collaborative annotation
- Multi-language support
- Mobile support
- Emoji 😄 support
- Dark theme
- RESTful API
There are three options to run doccano:
- pip (Python 3.8+)
- Docker
- Docker Compose
To install doccano, run:
pip install doccano
By default, SQLite 3 is used as the database. If you want to use PostgreSQL, install the additional dependencies:
pip install 'doccano[postgresql]'
Then set the DATABASE_URL environment variable according to your PostgreSQL credentials:
DATABASE_URL="postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}?sslmode=disable"
After installation, run the following commands:
# Initialize database.
doccano init
# Create a super user.
doccano createuser --username admin --password pass
# Start a web server.
doccano webserver --port 8000
In another terminal, run the following command:
# Start the task queue to handle file upload/download.
doccano task
Go to http://127.0.0.1:8000/.
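As a minimal sketch of the PostgreSQL setup described earlier, the DATABASE_URL can be assembled from the individual POSTGRES_* variables before doccano is started. The function name build_database_url and the fallback values are assumptions made for illustration; they are not part of doccano.

```shell
# Hypothetical helper: assemble DATABASE_URL from the POSTGRES_* variables.
# The fallback values are placeholders, not doccano defaults.
build_database_url() {
  user="${POSTGRES_USER:-doccano}"
  password="${POSTGRES_PASSWORD:-doccano}"
  host="${POSTGRES_HOST:-localhost}"
  port="${POSTGRES_PORT:-5432}"
  db="${POSTGRES_DB:-doccano}"
  # Values are passed as printf arguments, so a '%' in them is printed as-is.
  printf 'postgres://%s:%s@%s:%s/%s?sslmode=disable\n' \
    "$user" "$password" "$host" "$port" "$db"
}

DATABASE_URL="$(build_database_url)"
export DATABASE_URL
echo "$DATABASE_URL"
```

The sketch does not percent-encode anything, so a password containing characters such as @ or / would have to be encoded before it is placed in the URL.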
As a one-time setup, create a Docker container as follows:
docker pull doccano/doccano
docker container create --name doccano \
-e "ADMIN_USERNAME=admin" \
-e "ADMIN_EMAIL=admin@example.com" \
-e "ADMIN_PASSWORD=password" \
-v doccano-db:/data \
-p 8000:8000 doccano/doccano
Next, start doccano by running the container:
docker container start doccano
Go to http://127.0.0.1:8000/.
To stop the container, run docker container stop doccano -t 5. All data created in the container will persist across restarts.
If you want to use the latest features, specify the nightly tag:
docker pull doccano/doccano:nightly
To run doccano with Docker Compose, you need to install Git and clone the repository:
git clone https://github.com/doccano/doccano.git
cd doccano
Note for Windows developers: Be sure to configure Git to handle line endings correctly, or you may encounter status code 127 errors while running the services in later steps. Cloning with the Git config option below ensures that your working directory handles line endings correctly.
git clone https://github.com/doccano/doccano.git --config core.autocrlf=input
Then, create an .env file with variables in the following format (see ./docker/.env.example):
# platform settings
ADMIN_USERNAME=admin
ADMIN_PASSWORD=password
ADMIN_EMAIL=admin@example.com
# rabbit mq settings
RABBITMQ_DEFAULT_USER=doccano
RABBITMQ_DEFAULT_PASS=doccano
# database settings
POSTGRES_USER=doccano
POSTGRES_PASSWORD=doccano
POSTGRES_DB=doccano
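Before starting the services, it can help to confirm that the .env file defines every variable listed above. The following is a sketch only: the function name check_env_file is hypothetical, and the list of keys is simply copied from the example above rather than taken from doccano itself.

```shell
# Hypothetical helper: fail if any key from the example .env is missing or empty.
check_env_file() {
  env_file="$1"
  missing=""
  for key in ADMIN_USERNAME ADMIN_PASSWORD ADMIN_EMAIL \
             RABBITMQ_DEFAULT_USER RABBITMQ_DEFAULT_PASS \
             POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB; do
    # Require "KEY=" followed by at least one character.
    if ! grep -q "^${key}=..*" "$env_file" 2>/dev/null; then
      missing="$missing $key"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing or empty:$missing" >&2
    return 1
  fi
  echo "ok"
}

# Example call (commented out so that sourcing this file has no side effects):
# check_env_file .env
```

A non-zero return value makes the helper easy to chain in front of the compose command with &&.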
After running the following command, access http://127.0.0.1/.
docker-compose -f docker/docker-compose.prod.yml --env-file .env up

| Service | Button |
|---|---|
| AWS (see footnote 1) | |
| Heroku | |
See the documentation for details.
As with any software, doccano is under continuous development. If you have requests for features, please file an issue describing your request. Also, if you want to see work towards a specific feature, feel free to contribute by working towards it. The standard procedure is to fork the repository, add a feature or fix a bug, then file a pull request so that your changes can be merged into the main repository and included in the next release.
Here are some tips that might be helpful: How to Contribute to Doccano Project
@misc{doccano,
title={{doccano}: Text Annotation Tool for Human},
url={https://github.com/doccano/doccano},
note={Software available from https://github.com/doccano/doccano},
author={
Hiroki Nakayama and
Takahiro Kubo and
Junya Kamura and
Yasufumi Taniguchi and
Xu Liang},
year={2018},
}
For help and feedback, feel free to contact the author.
From the root directory doccano/, run:
docker build --no-cache --progress=plain --file ./docker/Dockerfile.prod --platform=linux/amd64 -t doccano:be_20240813 ./
docker build --no-cache --progress=plain --file ./docker/Dockerfile.nginx --platform=linux/amd64 -t doccano:fe_20240813 ./
test:
docker build --no-cache --progress=plain -t doccano:20230911 ./docker/docker-frontend/ &> build.log
From the repository root folder:
- sudo docker-compose -f docker/docker-compose.prod.yml ps
- sudo docker-compose -f docker/docker-compose.prod.yml up -d
- docker-compose -f docker/docker-compose.prod.yml --env-file .env up (not tried yet)
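After the stack has been started in the background (the up -d variant above), the web UI may take a while to answer. A small polling loop, sketched below, can confirm that it is reachable. The function name wait_for_doccano, the retry count, and the 2-second delay are assumptions; the default URL is the one given earlier for the Docker Compose setup.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_doccano() {
  url="${1:-http://127.0.0.1/}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors (status >= 400).
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "doccano is reachable at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Example call (commented out so that sourcing this file has no side effects):
# wait_for_doccano http://127.0.0.1/ 30
```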
These steps were done in us-east-1 (N. Virginia), using the base name doccano for all resources, for instance doccano-vpc, doccano-sg, etc.
- Create Secrets
- Create VPC
- Create Security Group for ALB
- Create Target Group
- Create ALB
- Create RDS
- Populate Secrets needed by the EC2
- Create EC2 instance
- Add instance/s to the target group
- Update SSL Cert and listeners
- add useful secrets to secrets manager:
- quay_io_creds (quay.io login creds)
- doccano_creds (all the information needed in the .env file, mostly doccano and DB credentials)
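One possible way to prepare the doccano_creds secret is to convert the .env file into a flat JSON object and pass it to the AWS CLI. The helper name env_to_json is hypothetical, the sketch assumes that values contain no double quotes or backslashes, and whether ec2_user_data.sh expects the secret in this shape is not stated in these notes.

```shell
# Hypothetical helper: turn KEY=VALUE lines into a flat JSON object.
# Comment lines and blank lines are skipped.
# Assumes values contain no double quotes or backslashes.
env_to_json() {
  awk -F= '
    /^[A-Za-z_][A-Za-z0-9_]*=/ {
      key = $1
      value = substr($0, length(key) + 2)   # everything after the first "="
      printf "%s\"%s\":\"%s\"", (n++ ? "," : "{"), key, value
    }
    END { print (n ? "}" : "{}") }
  ' "$1"
}

# Example call (commented out; it needs AWS credentials and creates a real secret):
# aws secretsmanager create-secret --name doccano_creds \
#   --secret-string "$(env_to_json .env)"
```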
Select the following options:
- VPC and more
- Pick your CIDR block and name (doccano-vpc)
- 2 availability zones, 2 private and 2 public subnets
- Only 1 NAT gateway (in 1 availability zone). We are going to deploy in only that one zone, since this application doesn't need to be fault tolerant and can have some downtime. We still create 2 zones so that it will be easier to add a second NAT gateway later if we decide to.
- Leave the other options as they are
- Name: doccano-alb-sg
- Select doccano-vpc
- Create the security group for the ALB, allowing inbound traffic from anywhere on ports 80 and 443
- Maybe restrict to UChicago IPs for now?
- Create the target group for the ALB (type: instances, name: doccano-target-group, port: 80, protocol version: HTTP1, health check path: /)
- Click the Create button; instances are added later
- Name: doccano-alb
- Internet-facing
- IPv4
- Select doccano-vpc
- Select the two availability zones and the 2 public subnets for the ALB
- Select the doccano-alb-sg security group as well as the VPC default security group (the default group allows full connectivity between the EC2 instance and the ALB)
- Listeners: listen on 443 (use 80 if you don't have a certificate yet; you can update it later) and forward to the target group created previously (you may need to refresh the page for it to show up)
- Click Create
- PostgreSQL
- Version 13.13
- Production
- Single DB instance
- Identifier: doccano
- Master username: doccano
- Credentials managed in Secrets Manager
- t3.small
- 100 GiB storage
- doccano VPC
- Force creation of a new DB subnet group
- doccano VPC default security group
- Name: doccano-ec2-20240813-1, Amazon Linux 2023 AMI, t3.medium
- Select a key pair
- Select the doccano VPC and a private subnet
- No public IP; use the existing security group 'default' of the doccano VPC
- 40 GB gp3
- Instance profile: the previously created role ec2_secrets_manager_role. If it does not exist yet, create it with a trust relationship to EC2, the SecretsManagerReadWrite policy so that the instance can read the secrets, and CloudWatchLogsFullAccess so that the Docker logs can be pushed to CloudWatch.
- update ec2_user_data.sh script
- click on create
- Add the EC2 instance created previously to the target group
- For some reason, if the instance is internal only, it will not turn healthy in the target group. As a workaround, attach an Elastic IP, wait 2 to 3 minutes for the instance to turn healthy, and then remove the Elastic IP again. That seems to solve the issue. (To try: add an inbound rule to the EC2 instance's security group that allows port 80 from the ALB's security group; this should resolve the issue.)
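The security group fix suggested in the last item could be applied with the AWS CLI roughly as follows. This is a sketch under assumptions: both security group IDs are placeholders, and the DOCCANO_APPLY switch and the run wrapper are hypothetical conveniences so that the command is only printed unless explicitly enabled.

```shell
# Placeholders (assumptions): replace with the real security group IDs.
EC2_SG_ID="${EC2_SG_ID:-sg-REPLACE-WITH-EC2-SG}"
ALB_SG_ID="${ALB_SG_ID:-sg-REPLACE-WITH-ALB-SG}"

# Hypothetical wrapper: print the command unless DOCCANO_APPLY=1 is set.
run() {
  if [ "${DOCCANO_APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Allow HTTP (port 80) into the EC2 instance's group from the ALB's group only.
run aws ec2 authorize-security-group-ingress \
  --group-id "$EC2_SG_ID" \
  --protocol tcp \
  --port 80 \
  --source-group "$ALB_SG_ID"
```

Referencing the ALB's security group as the source, rather than a CIDR range, keeps the instance closed to everything except traffic that comes through the load balancer.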
- get a cert from certificate manager for the ALB
- update the DNS provider with the CNAME of the ALB
- add certificate to the HTTPS 443 listener in the ALB and point to the target group
- Add a listener on port 80 that redirects (action: redirect to URL, status code 301) to port 443, using HTTPS://#{host}:443/#{path}?#{query}
- Update the ec2_user_data.sh file with the new tag
- Repeat the step "Create EC2 instance"
- Repeat the step "Add instance/s to the target group"
- Remove the old instance from the target group
- Terminate the old instance
- sudo dnf search postgresql
- sudo dnf install -y postgresql15
Footnotes
1. (1) An EC2 KeyPair cannot be created automatically, so make sure you have an existing EC2 KeyPair in one region, or create one yourself. (2) If you want to access doccano via HTTPS in AWS, here is an instruction.

