Intelligent Document Query System

A FastAPI service that downloads PDFs, embeds their text with sentence-transformers, retrieves the most relevant chunks with FAISS, and generates answers to questions with Google Gemini.

Features

- PDF Processing: Download and extract text from PDF URLs using PyMuPDF
- Vector Embeddings: Create semantic embeddings using sentence-transformers (all-MiniLM-L6-v2)
- Vector Search: Efficient similarity search using FAISS
- AI-Powered Answers: Generate contextual answers using Google Gemini 2.5 Flash (with a mock fallback)
- Production Ready: Configured for deployment on Render, with error handling and a health check endpoint

Tech Stack

- FastAPI: Async Python web framework
- sentence-transformers: Sentence embedding models (all-MiniLM-L6-v2)
- FAISS: Efficient vector similarity search
- PyMuPDF: PDF text extraction
- Google Gemini: AI-powered answer generation
- httpx: Async HTTP client for PDF downloads

API Endpoints

POST `/api/v1/hackrx/run`

Process a PDF document and answer questions using AI-powered semantic search.

Request Body:

{
  "documents": "https://example.com/document.pdf",
  "questions": [
    "What is the grace period for premium payment?",
    "Does this policy cover maternity?"
  ]
}

Response Body:

{
  "answers": [
    {
      "question": "What is the grace period for premium payment?",
      "answer": "A grace period of thirty days is provided for premium payment delays.",
      "source_clause": "Clause 3.2: A grace period of thirty days is provided...",
      "explanation": "Answer derived based on Clause 3.2 in the document."
    },
    {
      "question": "Does this policy cover maternity?",
      "answer": "Yes, maternity coverage is included under the medical benefits section.",
      "source_clause": "Section 5.1: Medical benefits include maternity coverage...",
      "explanation": "Information found in the medical benefits section of the policy."
    }
  ]
}
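
The request and response shapes above can be modelled on the client side. The following is a minimal sketch using only the Python standard library; the names `QueryRequest`, `Answer`, and `parse_answers` are hypothetical and are not taken from this repository's code.

```python
import json
from dataclasses import dataclass


@dataclass
class QueryRequest:
    """Hypothetical client-side model of the POST body."""
    documents: str        # URL of the PDF to process
    questions: list[str]  # questions to answer from that PDF

    def to_json(self) -> str:
        # Basic client-side checks (an assumption; the server may validate differently).
        if not self.documents.startswith(("http://", "https://")):
            raise ValueError("documents must be an http(s) URL")
        if not self.questions:
            raise ValueError("at least one question is required")
        return json.dumps({"documents": self.documents, "questions": self.questions})


@dataclass
class Answer:
    """Hypothetical model of one item in the "answers" array."""
    question: str
    answer: str
    source_clause: str
    explanation: str


def parse_answers(body: str) -> list[Answer]:
    """Turn a response body shaped like the example above into Answer objects."""
    data = json.loads(body)
    return [Answer(**item) for item in data["answers"]]
```

Keeping the four answer fields in one dataclass means an unexpected or missing key in a response fails loudly at parse time rather than later in client code.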

GET `/health`

Health check endpoint that returns the status of all system components.

Response:

{
  "status": "healthy",
  "components": {
    "pdf_processor": true,
    "embedding_manager": true,
    "ai_generator": true
  }
}
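
A monitoring script might interpret the health response as follows. This is a sketch under the assumption that every entry in `components` is a boolean, as in the example above; the function names `is_healthy` and `failing_components` are hypothetical.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only if status is "healthy" and every component flag is true.

    If "components" is missing or empty, only the status field is checked.
    """
    data = json.loads(body)
    components = data.get("components", {})
    return data.get("status") == "healthy" and all(components.values())


def failing_components(body: str) -> list[str]:
    """List the names of components whose flag is false, in sorted order."""
    components = json.loads(body).get("components", {})
    return sorted(name for name, ok in components.items() if not ok)
```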

Setup & Installation

Local Development

Clone the repository

git clone <repository-url>
cd intelligent-document-query

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

cp .env.example .env
# Edit .env and add your Gemini API key

Run the application

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Access the API
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- Main Endpoint: POST http://localhost:8000/api/v1/hackrx/run

Environment Variables

- `GEMINI_API_KEY`: Your Google Gemini API key (needed for AI-generated answers; without it the service falls back to mock responses)
- `DEBUG`: Set to `true` for development mode (default: `false`)
- `CORS_ORIGINS`: Allowed CORS origins (default: `*`)

Getting a Gemini API Key

Go to Google AI Studio
Sign in with your Google account
Click "Get API key"
Create a new project or select an existing one
Generate and copy your API key
Add it to your .env file as GEMINI_API_KEY=your_key_here

Deployment on Render

This application is ready for deployment on Render with the included render.yaml configuration.

Fork/clone this repository
Connect to Render
- Go to Render Dashboard
- Click "New Web Service"
- Connect your GitHub repository
Configure Environment Variables
- Add GEMINI_API_KEY in the Render dashboard
- Other variables are configured in render.yaml
Deploy
- Render will automatically build and deploy using the configuration

Architecture

The system uses a modular architecture:

- FastAPI: Async web framework with automatic API documentation
- PDF Processor: Downloads and extracts text from PDF URLs using PyMuPDF
- Embedding Manager: Creates vector embeddings using sentence-transformers, with a TF-IDF fallback
- FAISS: Efficient vector similarity search for finding relevant document chunks
- AI Generator: Google Gemini 2.5 Flash for answer generation, with a mock fallback
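
The retrieval step (embed the question, embed the chunks, return the closest chunks) can be illustrated without any third-party libraries. The sketch below swaps in word-count vectors and a brute-force cosine search as stand-ins for sentence-transformers and FAISS; it is not the project's implementation, and the function names and sample strings (adapted from the example response) are illustrative only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the question and keep the k best (brute force)."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "Clause 3.2: A grace period of thirty days is provided",
    "Section 5.1: Medical benefits include maternity coverage",
]
best = top_k("What is the grace period for premium payment?", chunks, k=1)
```

A vector index such as FAISS performs the same nearest-neighbour lookup but avoids rescoring every chunk in Python, which matters once a document produces many chunks.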

Additional Features

- Robust PDF Processing: Handles various PDF formats and sizes
- Smart Text Chunking: Intelligent text segmentation with overlap for context preservation
- Fallback Systems: TF-IDF vectorizer when advanced embeddings aren't available
- Mock Responses: Template answers when AI API is unavailable
- Error Handling: Comprehensive error handling with meaningful messages
- Production Ready: Optimized for deployment with health checks and monitoring
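
Chunking with overlap, mentioned in the list above, can be sketched as a sliding window over words. The function name `chunk_text` and the default sizes are assumptions for illustration; the project's actual chunker may split on sentences, characters, or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.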

Example Usage

curl -X POST "http://localhost:8000/api/v1/hackrx/run" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": "https://example.com/policy.pdf",
    "questions": [
      "What is the coverage limit?",
      "Are pre-existing conditions covered?"
    ]
  }'
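
The same call can be made from Python. This sketch uses only the standard library and assumes the server is running at the local development address shown earlier; `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local development address from this README


def build_request(pdf_url: str, questions: list[str]) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    payload = json.dumps({"documents": pdf_url, "questions": questions}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/hackrx/run",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(pdf_url: str, questions: list[str], timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(pdf_url, questions), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `ask("https://example.com/policy.pdf", ["What is the coverage limit?"])` returns the decoded response dictionary, whose `"answers"` list follows the response body format documented above. The 60-second timeout is an arbitrary choice to allow for PDF download and processing.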

Requirements

- Python 3.11+
- FastAPI
- Google Gemini API key (optional, uses mock responses without it)
- Internet access for PDF downloads

License

This project is open source and available under the MIT License.
