A full-stack web app that summarizes the papers you upload and provides citations. It uses the o4-mini and text-embedding-ada-002 models.

vvemulakonda/Researchpaper-summarizer

AI-Powered Research Assistant

A full-stack web application that allows users to upload academic PDFs and ask natural language questions about the content using AI-powered retrieval augmented generation (RAG).

🏗️ Architecture

AI-Powered Research Assistant/
├── frontend/           # HTML, CSS, JavaScript interface
├── backend/            # Flask API server
├── ml/                 # Machine learning modules
├── data/               # PDF storage and FAISS index
├── docs/               # Documentation
└── requirements.txt    # Python dependencies

🚀 Features

  • PDF Upload & Processing: Parse academic PDFs using PyMuPDF
  • Semantic Search: Generate embeddings using OpenAI's text-embedding-ada-002
  • RAG Implementation: Retrieve relevant chunks and generate answers with GPT-4
  • Citation Generation: Automatic APA/MLA citation formatting
  • Vector Storage: FAISS index for efficient similarity search
  • Related Papers: Find similar papers using cosine similarity
  • Azure OpenAI Support: Use Azure OpenAI endpoints and keys
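The "Related Papers" feature relies on cosine similarity between embeddings. The following is a minimal, standard-library sketch of that idea, not the project's actual code: the function names `cosine_similarity` and `rank_related` are hypothetical, and it assumes each document has already been reduced to a single vector (for example by averaging its chunk embeddings, which is also an assumption).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_related(
    doc_id: str, doc_vectors: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other document by similarity to doc_id, best first."""
    query = doc_vectors[doc_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in doc_vectors.items()
        if other_id != doc_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real ada-002 embeddings have 1536 dimensions.
vectors = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
related = rank_related("paper_a", vectors)
```

Here `related` lists `paper_b` ahead of `paper_c`, because `paper_b` points in nearly the same direction as `paper_a`. A FAISS index would perform the same comparison far faster over many vectors.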

🛠️ Setup Instructions

Prerequisites

  • Python 3.8+
  • OpenAI API key OR Azure OpenAI configuration
  • pip (Python package manager)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd Researchpaper-summarizer
  2. Install dependencies

    pip install -r requirements.txt
  3. Configure API Keys

    Option A: OpenAI Direct API

    export OPENAI_API_KEY="your-openai-api-key-here"

    Option B: Azure OpenAI

    export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
    export AZURE_OPENAI_API_KEY="your-azure-api-key-here"
    
    # Optional: Specify deployment names
    export AZURE_OPENAI_DEPLOYMENT_NAME="your-gpt-4-deployment"
    export AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME="your-embedding-deployment"
  4. Run the application

    python backend/app.py
  5. Access the application

    • Open your browser and navigate to http://localhost:5000

🔑 API Configuration

OpenAI Direct API

  • Get your API key from OpenAI Platform
  • Set environment variable: OPENAI_API_KEY

Azure OpenAI

  • Create an Azure OpenAI resource in Azure Portal
  • Get your endpoint URL and API key
  • Set environment variables:
    • AZURE_OPENAI_ENDPOINT
    • AZURE_OPENAI_API_KEY
    • AZURE_OPENAI_DEPLOYMENT_NAME (optional)
    • AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME (optional)
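One way the backend could choose between the two providers is to inspect these environment variables at startup. The sketch below is an illustration only: the function name `resolve_provider`, the returned dictionary keys, and the rule that Azure takes precedence when both sets of variables are present are all assumptions, not documented behavior of `config.py`.

```python
import os
from typing import Dict, Mapping, Optional

# Optional Azure variables named in this README.
OPTIONAL_AZURE_VARS = (
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
)


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Pick Azure OpenAI when its endpoint and key are set, else OpenAI.

    The Azure-first precedence is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "").strip()
    azure_key = env.get("AZURE_OPENAI_API_KEY", "").strip()
    if endpoint and azure_key:
        settings = {"provider": "azure", "endpoint": endpoint, "api_key": azure_key}
        # Deployment names are optional, so include them only when present.
        for name in OPTIONAL_AZURE_VARS:
            if env.get(name):
                settings[name.lower()] = env[name]
        return settings
    openai_key = env.get("OPENAI_API_KEY", "").strip()
    if openai_key:
        return {"provider": "openai", "api_key": openai_key}
    raise RuntimeError(
        "Set OPENAI_API_KEY, or AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY"
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
example = resolve_provider({"OPENAI_API_KEY": "your-openai-api-key-here"})
```

Failing fast with a clear error when neither configuration is complete tends to be easier to debug than a later authentication failure on the first request.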

📁 Project Structure

Backend (/backend)

  • app.py: Main Flask application
  • routes.py: API endpoints for upload, question-answering, and citations
  • config.py: Configuration settings

ML Modules (/ml)

  • pdf_parser.py: PDF parsing and chunking
  • embedding_generator.py: OpenAI/Azure embedding generation
  • vector_search.py: FAISS index management and similarity search
  • rag_engine.py: Retrieval Augmented Generation implementation
  • citation_generator.py: APA/MLA citation formatting
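To illustrate what APA/MLA formatting involves, here is a deliberately simplified sketch. It is not the contents of `citation_generator.py`: the `PaperMeta` class and both functions are hypothetical, the sample paper is invented, and the output covers only author, year, title, and journal (no volume, issue, pages, or DOI). It also assumes at least one author is present.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PaperMeta:
    authors: List[Tuple[str, str]]  # (last name, first name) pairs
    year: int
    title: str
    journal: str


def format_apa(p: PaperMeta) -> str:
    """Simplified APA style: 'Last, F., & Last, F. (Year). Title. Journal.'"""
    names = [f"{last}, {first[0]}." for last, first in p.authors]
    if len(names) > 1:
        author_part = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        author_part = names[0]
    return f"{author_part} ({p.year}). {p.title}. {p.journal}."


def format_mla(p: PaperMeta) -> str:
    """Simplified MLA style: 'Last, First, and First Last. "Title." Journal, Year.'"""
    last, first = p.authors[0]
    author_part = f"{last}, {first}"
    if len(p.authors) == 2:
        last2, first2 = p.authors[1]
        author_part += f", and {first2} {last2}"
    elif len(p.authors) > 2:
        author_part += ", et al"  # the closing period is added below
    return f'{author_part}. "{p.title}." {p.journal}, {p.year}.'


# Invented metadata, used only to show the two output shapes.
paper = PaperMeta(
    authors=[("Doe", "Jane"), ("Roe", "Richard")],
    year=2020,
    title="An Example Study",
    journal="Journal of Examples",
)
apa = format_apa(paper)
mla = format_mla(paper)
```

For this sample, `apa` is `Doe, J., & Roe, R. (2020). An Example Study. Journal of Examples.` and `mla` is `Doe, Jane, and Richard Roe. "An Example Study." Journal of Examples, 2020.`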

Frontend (/frontend)

  • index.html: Main application interface
  • style.css: Styling and responsive design
  • script.js: Frontend JavaScript functionality

Data (/data)

  • uploads/: Stored PDF files
  • embeddings/: FAISS index files
  • metadata/: Document metadata storage

🔧 API Endpoints

  • POST /upload: Upload and process PDF files
  • POST /ask: Ask questions about uploaded documents
  • GET /documents: List uploaded documents
  • GET /related/<doc_id>: Find related papers
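A client could call these endpoints roughly as sketched below, using only the standard library. The request shapes are assumptions: this README does not document the payload format, so the JSON field name `question` and the helper names are hypothetical. The base URL is the address given in the setup instructions.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # address from the setup instructions


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request with a JSON body."""
    # The "question" field name is an assumption, not a documented contract.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_related_request(doc_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /related/<doc_id> request, escaping the id for use in a path."""
    safe_id = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(url=f"{base_url}/related/{safe_id}", method="GET")


req = build_ask_request("What dataset does the paper use?")
# With the server running, send it with: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to inspect the URL, method, and body before the backend is running.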

🎯 Usage

  1. Upload PDFs: Drag and drop or select academic PDF files
  2. Ask Questions: Type natural language questions about the content
  3. Get Answers: Receive AI-generated answers with citations
  4. Explore Related Papers: Discover similar academic papers

🔒 Security Notes

  • Store your API keys securely
  • Implement proper file validation for PDF uploads
  • Consider rate limiting for API endpoints
  • Add authentication for production use
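As one possible starting point for upload validation, the sketch below checks the file extension, the size, and the leading PDF signature. The function name `validate_pdf_upload` is hypothetical, the 50 MB ceiling is taken from the Performance section and interpreted here as 50 × 1024 × 1024 bytes (an assumption), and the strict signature check is a simplification, since some PDF readers tolerate a few bytes before `%PDF-`.

```python
import os

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB, read as mebibytes (assumption)
PDF_SIGNATURE = b"%PDF-"  # every PDF header begins with these bytes


def validate_pdf_upload(filename: str, data: bytes) -> None:
    """Raise ValueError if the upload does not look like an acceptable PDF."""
    extension = os.path.splitext(filename)[1].lower()
    if extension != ".pdf":
        raise ValueError("file must have a .pdf extension")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 50 MB limit")
    if not data.startswith(PDF_SIGNATURE):
        raise ValueError("file does not start with the PDF signature")


# A tiny stand-in payload; a real upload would be the full file contents.
validate_pdf_upload("paper.pdf", b"%PDF-1.7\n...")
```

These checks only reject obviously wrong files. They do not prove the PDF is well formed or safe to parse, so the parser should still handle malformed input defensively.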

📊 Performance

  • Supports PDFs up to 50 MB
  • Processes documents in chunks of 1000 tokens
  • Retrieves top-5 most relevant chunks for each question
  • FAISS index enables fast similarity search
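The chunk size and top-5 retrieval figures above can be pictured with the following sketch. It is an approximation under stated assumptions: whitespace-separated words stand in for model tokens (the real tokenizer will count differently), chunks do not overlap, and the names `chunk_tokens` and `top_k_chunks` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

CHUNK_SIZE = 1000  # tokens per chunk, as stated above
TOP_K = 5          # chunks retrieved per question, as stated above


def chunk_tokens(tokens: Sequence[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a token sequence into consecutive, non-overlapping chunks."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def top_k_chunks(scores: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Whitespace splitting is a rough stand-in for real tokenization.
tokens = ("word " * 2500).split()
chunks = chunk_tokens(tokens)

# Similarity scores would come from the vector index; these are made up.
best = top_k_chunks([0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4])
```

With 2,500 tokens the text splits into chunks of 1,000, 1,000, and 500 tokens, and `best` holds the five highest-scoring chunk indices in descending score order.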

💰 Cost Considerations

OpenAI Direct API

  • GPT-4: ~$0.03 per 1K input tokens, ~$0.06 per 1K output tokens
  • text-embedding-ada-002: ~$0.0001 per 1K tokens
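The approximate rates above translate into per-document and per-question estimates as in this sketch. The token counts are illustrative assumptions (a 50,000-token paper, five 1,000-token chunks plus about 200 tokens of prompt, and a 300-token answer), and the function names are hypothetical. Actual prices may differ from the rates quoted here.

```python
# Approximate USD rates per 1,000 tokens, copied from the list above.
GPT4_INPUT_PER_1K = 0.03
GPT4_OUTPUT_PER_1K = 0.06
ADA_002_PER_1K = 0.0001


def embedding_cost(total_tokens: int) -> float:
    """Estimated cost of embedding a document once."""
    return total_tokens / 1000 * ADA_002_PER_1K


def question_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of answering one question with GPT-4."""
    return (
        input_tokens / 1000 * GPT4_INPUT_PER_1K
        + output_tokens / 1000 * GPT4_OUTPUT_PER_1K
    )


# Illustrative sizes only; real documents and answers will vary.
paper_embedding = embedding_cost(50_000)
one_question = question_cost(input_tokens=5 * 1000 + 200, output_tokens=300)
```

Under these assumptions, embedding the paper costs about half a cent, while each question costs roughly $0.17, so the generation step dominates the bill.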

Azure OpenAI

  • Pricing varies by region and model deployment
  • Check Azure pricing calculator for your specific setup

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details
