@llbbl llbbl commented Jun 27, 2025

Add Python Testing Infrastructure

Summary

This PR sets up a comprehensive testing infrastructure for the OCR dataset tools project using Poetry as the package manager and pytest as the testing framework.

Changes Made

Package Management

  • Poetry Setup: Created pyproject.toml, which configures Poetry as the project's package manager
  • Dependencies: Migrated all necessary dependencies including PyTorch, OpenCV, and other OCR-related packages
  • Dev Dependencies: Added pytest, pytest-cov, and pytest-mock for testing

Testing Configuration

  • pytest Configuration: Set up pytest with:

    • Test discovery patterns for test_*.py and *_test.py files
    • Coverage reporting with HTML and XML output formats
    • Custom markers for unit, integration, and slow tests
    • Strict mode with verbose output
  • Coverage Settings: Configured coverage to:

    • Track convert and dataset packages
    • Exclude test files and __init__.py from coverage
    • Generate reports in multiple formats
    • Currently set to 0% threshold (should be changed to 80% when actual tests are added)
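
To illustrate the custom markers listed above, here is a minimal sketch of how a test module might tag its tests. The `polygon_area` helper and both test names are hypothetical and are not part of this PR; only the marker names `unit`, `integration`, and `slow` come from the configuration described here.

```python
import pytest


def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    area = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


@pytest.mark.unit
def test_rectangle_area():
    # A 4 x 3 rectangle has area 12.
    assert polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]) == 12.0


@pytest.mark.integration
@pytest.mark.slow
def test_many_polygons():
    # 1000 rectangles of size n x 1; their areas sum to 1 + 2 + ... + 1000.
    boxes = [[(0, 0), (n, 0), (n, 1), (0, 1)] for n in range(1, 1001)]
    assert sum(polygon_area(box) for box in boxes) == 500500.0
```

With markers applied this way, `-m unit` would select only the first test, and the standard pytest expression `-m "not slow"` would skip the second.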

Directory Structure

tests/
├── __init__.py
├── conftest.py          # Shared pytest fixtures
├── test_setup_validation.py  # Infrastructure validation tests
├── unit/
│   └── __init__.py
└── integration/
    └── __init__.py

Fixtures (in conftest.py)

  • temp_dir: Creates temporary directory for test files
  • sample_image: Generates test images
  • sample_detection_json: Creates detection JSON test data
  • sample_recognition_txt: Creates recognition text test data
  • mock_dataset_config: Provides mock configuration
  • sample_points: Creates polygon point arrays
  • mock_lmdb_env: Mocks LMDB database environment
  • reset_modules: Cleans module imports between tests
  • capture_stdout: Captures print output for testing
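
As a rough sketch of what two of these fixtures could look like, the snippet below implements `temp_dir` and `sample_detection_json` using only the standard library and pytest. The JSON layout and the `write_detection_json` helper are assumptions made for illustration; the actual fixtures in conftest.py may produce different data.

```python
import json
import shutil
import tempfile
from pathlib import Path

import pytest


def write_detection_json(directory: Path) -> Path:
    """Write a small, hypothetical detection annotation file and return its path."""
    annotation = {
        "image": "sample.jpg",
        "polygons": [
            {"points": [[10, 10], [100, 10], [100, 40], [10, 40]], "text": "hello"},
        ],
    }
    path = directory / "detection.json"
    path.write_text(json.dumps(annotation), encoding="utf-8")
    return path


@pytest.fixture
def temp_dir():
    """Yield a fresh temporary directory and remove it after the test."""
    path = Path(tempfile.mkdtemp())
    yield path
    shutil.rmtree(path, ignore_errors=True)


@pytest.fixture
def sample_detection_json(temp_dir):
    """Provide the path to a detection JSON file created inside temp_dir."""
    return write_detection_json(temp_dir)
```

A test would then request a fixture by naming it as a parameter, for example `def test_loader(sample_detection_json): ...`.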

Additional Changes

  • Updated .gitignore: Added comprehensive entries for:
    • Testing artifacts (.pytest_cache/, coverage.xml, htmlcov/)
    • Claude settings (.claude/*)
    • Python build artifacts and virtual environments
    • Note: poetry.lock is intentionally NOT ignored

How to Use

Installing Dependencies

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install project dependencies
poetry install

Running Tests

Both commands are available and will run the same test suite:

poetry run test
# or
poetry run tests
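
One way such commands can be provided is through entries in Poetry's `[tool.poetry.scripts]` table that point at a Python callable. The sketch below assumes a hypothetical module `run_tests.py` with a `main` function that both `test` and `tests` would reference; the wiring actually used in this PR is not shown here and may differ.

```python
import sys
from typing import List, Optional


def build_pytest_args(argv: Optional[List[str]] = None) -> List[str]:
    """Return the arguments to hand to pytest, defaulting to the command line."""
    return list(sys.argv[1:] if argv is None else argv)


def main(argv: Optional[List[str]] = None) -> int:
    """Run pytest with the forwarded arguments and return its exit code."""
    # Imported lazily so this module can be imported without pytest installed.
    import pytest

    return int(pytest.main(build_pytest_args(argv)))
```

Because the arguments are forwarded unchanged, options such as `-m unit` or a test file path would reach pytest exactly as typed.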

Test Options

All standard pytest options are available:

# Run only unit tests
poetry run test -m unit

# Run with verbose output
poetry run test -v

# Run a specific test file
poetry run test tests/test_setup_validation.py

# Run the tests and write an HTML coverage report
poetry run test --cov-report=html

Notes

  1. Coverage Threshold: Currently set to 0% so that the infrastructure can be set up before any real tests exist. It should be raised to 80% in pyproject.toml once actual tests are written.

  2. Validation Tests: The included test_setup_validation.py verifies that the testing infrastructure is properly configured and all dependencies are correctly installed.

  3. Next Steps: Developers can now immediately start writing unit and integration tests for the convert and dataset packages using the provided infrastructure.

  4. Poetry Lock File: The poetry.lock file will be generated on first install and should be committed to ensure reproducible builds.

- Set up Poetry as package manager with pyproject.toml configuration
- Add pytest, pytest-cov, and pytest-mock as dev dependencies
- Configure pytest with coverage reporting and custom markers
- Create test directory structure with fixtures in conftest.py
- Add validation tests to verify infrastructure setup
- Update .gitignore with testing and Claude-related entries
- Configure test commands accessible via `poetry run test` and `poetry run tests`