A robust NestJS application for scraping news articles from popular Czech news websites. Built with enterprise-grade logging, comprehensive testing, and an automated CI/CD pipeline.
- News Scraping: Automated scraping from major Czech news websites
- iDnes.cz
- Hospodářské noviny (HN.cz)
- Aktuálně.cz
- Novinky.cz
- Blesk.cz
- Enterprise Logging: Winston-based logging with file and console outputs
- Database: SQLite with TypeORM for data persistence
- API Documentation: Scalar UI for interactive API documentation
- Background Jobs: Scheduled scraping every hour using NestJS Schedule
- Duplicate Prevention: Content-based deduplication using SHA-256 hashing
- Comprehensive Testing: Unit tests, e2e tests, and code coverage
- CI/CD Pipeline: GitHub Actions with security scanning and code quality checks
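The content-based deduplication mentioned in the feature list can be illustrated with a small sketch. It is not the project's actual implementation: the `ScrapedArticle` shape, the choice to hash title plus content, the whitespace normalization, and the in-memory `Set` are all assumptions made for illustration. Only the use of SHA-256 comes from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape; the project's real Article entity may differ.
interface ScrapedArticle {
  title: string;
  url: string;
  content: string;
}

// Hash normalized text so that trivial whitespace differences
// do not produce different fingerprints (normalization is an assumption).
function contentHash(article: ScrapedArticle): string {
  const normalized = `${article.title}\n${article.content}`
    .replace(/\s+/g, " ")
    .trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Keep the first article seen for each hash. In a real application the
// "seen" set would more likely be a unique column in the database.
function dedupe(
  articles: ScrapedArticle[],
  seen: Set<string> = new Set(),
): ScrapedArticle[] {
  const fresh: ScrapedArticle[] = [];
  for (const article of articles) {
    const hash = contentHash(article);
    if (!seen.has(hash)) {
      seen.add(hash);
      fresh.push(article);
    }
  }
  return fresh;
}
```

Because the hash depends only on the text, the same article reached through two different URLs would be treated as a duplicate in this sketch.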
- Bun (v1.2.17 or higher)
- Node.js (v20 or higher)
- Clone the repository:
git clone <repository-url>
cd nest-scraping-api
- Install dependencies:
bun install
- Create the logs directory:
mkdir -p logs

bun run start:dev
bun run build
bun run start:prod
bun run start:debug

bun run test
bun run test:cov
bun run test:watch
bun run test:e2e
bun run test:ci

# Check and fix linting issues
bun run lint
# Check linting issues only (no auto-fix)
bun run lint:check
bun run type-check
bun run format
bun run audit

The project maintains a minimum code coverage threshold of 80% for:
- Branches
- Functions
- Lines
- Statements
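To make the per-metric threshold concrete, here is a minimal sketch of such a check. Jest offers this kind of gate through its `coverageThreshold` option, but whether the project configures it that way is an assumption. The `failingMetrics` helper and the sample numbers are hypothetical and are not the project's real results.

```typescript
type Metric = "branches" | "functions" | "lines" | "statements";
type CoverageSummary = Record<Metric, number>;

// Minimum percentage required for every metric, as stated above.
const THRESHOLD = 80;

// Return the metrics whose coverage percentage falls below the threshold.
function failingMetrics(
  summary: CoverageSummary,
  threshold: number = THRESHOLD,
): Metric[] {
  const metrics: Metric[] = ["branches", "functions", "lines", "statements"];
  return metrics.filter((metric) => summary[metric] < threshold);
}

// Illustrative numbers only, not taken from the project's reports.
const sample: CoverageSummary = {
  branches: 78,
  functions: 85,
  lines: 90,
  statements: 90,
};

// Only "branches" is below 80 in this sample.
console.log(failingMetrics(sample));
```

Each metric is checked independently, so a high overall figure does not compensate for a single metric that falls short.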
Current Coverage: 83.33%
Coverage reports are generated in multiple formats:
- HTML: coverage/index.html
- LCOV: coverage/lcov.info
- Console output
- Statements: 91.02%
- Branches: 77.09%
- Functions: 87.5%
- Lines: 91.07%
The project uses Codecov for continuous coverage monitoring and reporting.
Create a .env file in the root directory:
# Application
PORT=3000
NODE_ENV=development
# Logging
LOG_LEVEL=info
# Database
DB_TYPE=sqlite
DB_DATABASE=db.sqlite3

Logs are stored in the logs/ directory:
- logs/combined.log: All log levels
- logs/error.log: Error level only
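As a rough illustration of how the .env variables listed above might be consumed, the sketch below reads them from an env-like object and falls back to the sample values. The `AppConfig` interface and the `loadConfig` function are hypothetical names. How the application actually loads its configuration is not described here, so treat this as one possible approach.

```typescript
// Hypothetical typed view of the settings from the sample .env file.
interface AppConfig {
  port: number;
  nodeEnv: string;
  logLevel: string;
  dbType: string;
  dbDatabase: string;
}

// Read settings from an env-like object, using the sample .env values as defaults.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  // Reject values that are not usable TCP port numbers.
  if (Number.isNaN(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    nodeEnv: env.NODE_ENV ?? "development",
    logLevel: env.LOG_LEVEL ?? "info",
    dbType: env.DB_TYPE ?? "sqlite",
    dbDatabase: env.DB_DATABASE ?? "db.sqlite3",
  };
}
```

In a Node.js or Bun process this could be called as `loadConfig(process.env)`. Validating the port at startup surfaces a misconfigured .env file early, before the server tries to bind to it.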
Once the application is running, you can access:
- API Documentation (Scalar UI): http://localhost:3000/reference
- OpenAPI JSON: http://localhost:3000/api-json
The API documentation includes:
- Interactive API explorer
- Request/response examples
- Authentication details
- Schema definitions
src/
├── config/
│   └── logging.config.ts         # Winston logging configuration
├── entities/
│   └── article.entity.ts         # Article database entity
├── scraping/
│   ├── scraping.module.ts        # Scraping module
│   ├── scraping.service.ts       # Core scraping logic
│   └── scraping.service.spec.ts  # Unit tests
├── app.controller.ts             # Main controller
├── app.module.ts                 # Root module
├── app.service.ts                # App service
└── main.ts                       # Application entry point
test/
├── setup.ts                      # Test environment setup
└── scraping.e2e-spec.ts          # End-to-end tests
logs/                             # Application logs
coverage/                         # Test coverage reports
The project includes a comprehensive GitHub Actions workflow that runs on every push and pull request:
- Test: Runs linting, type checking, and tests with coverage
- Security: Performs security audits and vulnerability scanning
- Build: Creates production build artifacts (main branch only)
- Automated testing with Jest
- Code coverage reporting to Codecov
- Security scanning with Snyk
- Dependency vulnerability checks
- Automated builds and deployments
- Port already in use:
# Change the port in the .env file
PORT=3001
- Database issues:
# Remove the existing database and restart
rm db.sqlite3
bun run start:dev
- Logging issues:
# Ensure the logs directory exists
mkdir -p logs
Run the application in debug mode to get detailed logs:
bun run start:debug

- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
# Create feature branch
git checkout -b feature/your-feature
# Make changes and test
bun run test
bun run lint
bun run type-check
# Commit changes
git commit -m "feat: add your feature"
# Push and create PR
git push origin feature/your-feature

This project is licensed under the MIT License.
For support and questions:
- Create an issue in the repository
- Check the API documentation at /reference
- Review the test files for usage examples