Feed it a research topic -> Get back a comprehensive thematic tree, parallel methodology evaluations, and a structured, funding-ready grant proposal evaluated for novelty. All without a vector database or embeddings layer.
VMARO is an advanced 8-stage, multi-agent AI pipeline orchestrating academic research and grant writing. Instead of the traditional, generic RAG mechanism (chunking texts and vector similarity), VMARO utilizes LLM-native structural synthesis to construct an interpretable "Thematic Tree" directly from multiple live academic sources.
The multi-model engine sequentially analyzes literature, detects emerging macro-trends, isolates critical research gaps, pits multiple methodologies against each other in a parallel "challenger" phase, formats the outcomes to specific institutional guidelines (e.g., NIH, NSF, ERC), and finally generates the full-bodied proposal with a quantified novelty score and PDF/LaTeX exports.
- Vectorless Navigation: No FAISS, no ChromaDB. Replaces black-box semantic retrieval with direct semantic clustering, constructing a visual Thematic Tree directly from high-signal abstracts and metadata.
- Intelligent Quality Gates: Built-in "LLM-as-a-Judge" layers validate outputs iteratively between stages. If data is shallow or hallucinatory, the gate will flag it (PASS, REVISE, FAIL).
- Parallel Methodology Evaluation: VMARO doesn't just pick the first idea. It drafts a primary methodology, constructs a challenger counter-approach, and objectively evaluates which design has stronger statistical power and feasibility.
- Intent-Aware Preprocessing: Raw user input, whether a phrase or a paragraph, is normalized into a structured payload with domain classification, query variants, and explicit research intent (survey_gaps, propose_methodology) before retrieval begins. This prevents garbage-in-garbage-out at the pipeline root.
- Institutional Format Matching: Automatically restructures and tunes rhetorical tone to align with rigorous schemas (e.g., NSF, NIH, ERC) using a dedicated Format Matcher. You can upload custom JSON format templates as well.
- Stateful Resiliency: All outputs cache natively via utils/cache.py. Process interrupted? The pipeline resumes immediately from the last checkpoint to save API credits.
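The checkpoint-and-resume behavior in the last item can be pictured with a minimal sketch. Everything in it is an assumption for illustration: the `run_stage` helper, the one-JSON-file-per-stage layout, and the cache directory name are hypothetical and are not taken from `utils/cache.py`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def run_stage(cache_dir: Path, name: str, compute: Callable[[], Any]) -> Any:
    """Return the cached output of a stage, computing and saving it on a miss."""
    path = cache_dir / f"{name}.json"
    if path.exists():
        # Checkpoint hit: reuse the stored result instead of spending API credits again.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls = []


def mine_literature() -> dict:
    calls.append("literature")  # stands in for an expensive API/LLM call
    return {"papers": ["paper-a", "paper-b"]}


# Hypothetical cache location; a temporary directory keeps the sketch side-effect free.
cache_dir = Path(tempfile.mkdtemp()) / "vmaro_cache"
first = run_stage(cache_dir, "literature", mine_literature)
second = run_stage(cache_dir, "literature", mine_literature)  # simulated rerun after an interruption
```

On the second call the stage function is not invoked again; the result is read back from disk, which is the property that lets an interrupted run pick up where it stopped.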
[Research Topic]
↓
0️⃣ Topic Normalization (Intent classification + query variant generation)
↓
1️⃣ Literature Mining (Multi-API Fetcher: arXiv, PubMed, Scholar + LLM)
↓
2️⃣ Thematic Tree Builder (Clusters into hierarchical themes) → 🛡️ [Quality Gate 1]
↓
3️⃣ Trend Analysis (Detects dominant/emerging signals)
↓
4️⃣ Gap Identification (Auto-detects and ranks multiple research gaps) → 🛡️ [Quality Gate 2]
↓
[User Intervenes: Selects Gap or Defines Custom]
↓
5️⃣ Methodology Evaluator (Drafts Primary vs Challenger Methodologies -> Selects Winner)
↓
6️⃣ Format Selection (Matches winning approach to grant styles + User Override)
↓
7️⃣ Grant Writing (Detailed content generation constrained by format schema)
↓
8️⃣ Novelty Scoring (Coarse tree pass → Deep paper comparison → 0-100 Score)
↓
[Streamlit Dashboard / LaTeX PDF Export]
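One way to picture how a quality gate sits between two stages is the sketch below. It is not VMARO's implementation: the `Verdict` enum, the `run_with_gate` helper, the `max_revisions` budget, and the choice to re-run a stage on REVISE are all assumptions made for illustration, and the judge is a plain function standing in for the LLM-as-a-Judge call.

```python
from enum import Enum
from typing import Any, Callable


class Verdict(Enum):
    PASS = "PASS"
    REVISE = "REVISE"
    FAIL = "FAIL"


def run_with_gate(
    stage: Callable[[int], Any],
    judge: Callable[[Any], Verdict],
    max_revisions: int = 2,
) -> Any:
    """Run a stage and re-run it while the judge asks for revisions (assumed policy)."""
    for attempt in range(max_revisions + 1):
        output = stage(attempt)
        verdict = judge(output)
        if verdict is Verdict.PASS:
            return output
        if verdict is Verdict.FAIL:
            raise RuntimeError(f"quality gate failed on attempt {attempt}")
        # Verdict.REVISE: fall through and regenerate the output.
    raise RuntimeError("quality gate still requested revisions after the retry budget")


# Toy stand-ins: the "stage" produces one more theme per attempt,
# and the "judge" accepts a tree once it has at least three themes.
def build_tree(attempt: int) -> dict:
    return {"themes": ["theme"] * (attempt + 2)}


def judge_tree(tree: dict) -> Verdict:
    return Verdict.PASS if len(tree["themes"]) >= 3 else Verdict.REVISE


tree = run_with_gate(build_tree, judge_tree)
```

Keeping the verdict as a three-valued enum separates "try again" from "stop the pipeline", so a hard failure never silently flows into the next stage.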
git clone https://github.com/your-org/vmaro.git
cd vmaro
# Create and sync virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Edit .env to add the API keys for your accounts. VMARO uses multiple providers (Gemini / Groq / AWS) and rotates requests across the configured keys in round-robin order to work around restrictive free-tier rate limits.
# Foundational LLMs
GROQ_API_KEY_1=your_key
GEMINI_4_AWS_KEY_1=your_key
# External sources (optional, standard use bypasses these if not provided)
SEMANTIC_SCHOLAR_KEY=
To let the automated orchestrator handle everything programmatically:
python main.py --topic "Federated Learning in Bioinformatics"
Want to bypass the parallel methodology evaluation? Add the --no-parallel flag.
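For readers wiring their own entry point, the two flags above map naturally onto `argparse`. This is only a sketch: whether `main.py` actually uses `argparse` is not stated here, and `build_parser` and `run_parallel_evaluation` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--topic", required=True, help="Research topic to analyze")
    parser.add_argument(
        "--no-parallel",
        action="store_true",
        help="Skip the primary-vs-challenger methodology evaluation",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--topic", "Federated Learning in Bioinformatics", "--no-parallel"]
)
run_parallel_evaluation = not args.no_parallel
```

`argparse` turns `--no-parallel` into the attribute `no_parallel`, so inverting it once at the boundary keeps the rest of the code working with a positive flag.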
To utilize the dynamic visualizer (Agraph), manual gap selection intervention, and one-click Format/PDF generation:
streamlit run app.py
Open http://localhost:8501 in your browser.
vmaro/
├── agents/
│   ├── literature_agent.py        # Agent 1: Multi-API Fetch & Consolidate
│   ├── tree_agent.py              # Agent 2: Hierarchical Clusterer
│   ├── trend_agent.py             # Agent 3: Macro-Signals Identification
│   ├── gap_agent.py               # Agent 4: Target Discovery
│   ├── methodology_agent.py       # Agent 5a: Method generation
│   ├── methodology_evaluator.py   # Agent 5b: Primary vs Challenger eval
│   ├── format_matcher.py          # Agent 6: Matching proposal formats
│   ├── grant_agent.py             # Agent 7: Format-compliant Grant Writing
│   └── novelty_agent.py           # Agent 8: Score validation
├── utils/
│   ├── multi_api_fetcher.py       # Scholar, PubMed, arXiv, CrossRef multiplexer
│   ├── schema.py                  # Pydantic-like validations, LLM cleanup & Key rotation
│   ├── quality_gate.py            # Quality evaluator middleware
│   ├── format_loader.py           # Loads and registers JSON schemas for Grants
│   └── latex_exporter.py          # Converts generated outputs to PDF / TeX
├── app.py                         # Modern Streamlit UI application
├── main.py                        # CrewAI Orchestrator Execution script
└── ...
Capabilities:
- Deduplication: Multi-API fetches eliminate cross-source duplicates.
- Robust Fail-Safes: All keys are iterated cyclically, and clean_json_response() parses markdown-polluted LLM responses.
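The capabilities above can be illustrated with a short, self-contained sketch. None of these helpers come from the VMARO codebase: `key_cycle`, `parse_fenced_json`, and `dedupe_by_title` are hypothetical stand-ins, `parse_fenced_json` is not the project's `clean_json_response()`, and matching duplicates by normalized title is an assumption about one reasonable strategy.

```python
import itertools
import json
import re
from typing import Any, Iterable, Iterator, List

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the sketch readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def key_cycle(keys: Iterable[str]) -> Iterator[str]:
    """Yield API keys round-robin so consecutive requests use different keys."""
    pool = [k for k in keys if k]
    if not pool:
        raise ValueError("no API keys configured")
    return itertools.cycle(pool)


def parse_fenced_json(text: str) -> Any:
    """Parse JSON that may be wrapped in a markdown code fence."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)


def dedupe_by_title(papers: List[dict]) -> List[dict]:
    """Keep the first paper per title, ignoring case and punctuation."""
    seen, unique = set(), []
    for paper in papers:
        key = re.sub(r"[^a-z0-9]+", " ", paper["title"].lower()).strip()
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique


keys = key_cycle(["key-1", "key-2"])
order = [next(keys) for _ in range(3)]

reply = "Here is the result:\n" + FENCE + 'json\n{"verdict": "PASS"}\n' + FENCE
parsed = parse_fenced_json(reply)

papers = dedupe_by_title([
    {"title": "Federated Learning: A Survey", "source": "arXiv"},
    {"title": "federated learning - a survey", "source": "PubMed"},
])
```

In practice a DOI or arXiv identifier, when present, is a stronger deduplication key than a title; the title fallback is shown because it works across sources that do not share identifiers.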
Future Items:
- Paper count is intentionally bounded at 20 to optimize token efficiency and maintain coherent thematic clustering; larger corpora dilute signal without improving output quality at current LLM context limits.
- Deeper automated web-searching in the Methodology generation phase for specific up-to-date Python/R package implementations.







