Ligands | Devpost

Inspiration

Many diseases come down to proteins behaving badly, and most drugs are molecules that fit into the misbehaving protein to correct it. The data and tools needed to find these molecules (protein structures, ligand databases, docking software) are already free and online. But researchers still spend weeks manually stitching these tools together just to figure out which molecules are worth testing. That due diligence is one of the biggest bottlenecks in early-stage drug discovery. We wanted to build the AI that finally connects the dots.

What it does

A researcher describes their disease target and gets back a ranked list of the most promising drug candidates, with 3D visualizations showing exactly how each molecule fits into the protein. The system automatically searches public databases, prepares protein structures, runs multiple independent docking methods on cloud GPUs, cross-validates results, and explains why it ranked each candidate where it did. It compresses weeks of manual work into minutes.

How we built it

We built an LLM-orchestrated pipeline around open-source structure and docking tools. Proteins are resolved via UniProt and RCSB PDB. Ligands are resolved primarily through PubChem (with CCD support), and ChEMBL is used to retrieve known binders and approved drugs. Boltz-2 (structure and affinity prediction) and GNINA (CNN-scored docking) can run on Modal GPUs, including in parallel through subagents. Predicted complexes are checked with PoseBusters and their protein-ligand interactions are profiled with PLIP. Results are captured in typed envelopes, with optional Postgres logging for provenance. Consensus synthesis is currently handled at the agent/report layer rather than by a dedicated ranking engine.
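To make the idea of a typed envelope concrete, here is a minimal sketch using Python dataclasses. It is an illustration under assumptions, not the project's actual schema: the class name `ToolEnvelope`, every field name, and the placeholder IDs are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ToolEnvelope:
    """Hypothetical wrapper around the result of one tool call."""

    tool: str        # which engine or checker produced the result
    target_id: str   # protein identifier (placeholder format)
    ligand_id: str   # ligand identifier (placeholder format)
    status: str      # "ok", "partial", or "error"
    payload: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    # Timestamp recorded at creation so a log row can be traced later.
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize to JSON, e.g. for writing to a provenance log."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values only; these are not real identifiers or scores.
envelope = ToolEnvelope(
    tool="gnina",
    target_id="TARGET-1",
    ligand_id="LIGAND-1",
    status="ok",
    payload={"note": "engine-specific fields would go here"},
)
print(envelope.to_json())
```

Keeping the envelope frozen means a downstream step cannot silently mutate an upstream result, which makes a logged record easier to trust.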

Challenges we ran into

Integrating the frontend and backend was a constant battle. Each component (protein resolution, docking engines, QC checks, interaction analysis) has its own data formats, output structures, and failure modes, and getting them all to talk to each other cleanly through the API took way more iteration than we expected. On the agent side, context engineering was a huge challenge. The LLM orchestrator needs enough scientific context to make good tool selection decisions, but with too much context it loses focus or makes contradictory calls. Getting the right information into the right tool calls at the right time, especially when chaining multiple steps where each depends on the previous output, required a lot of careful prompt design and structured output handling. We also ran into tricky issues with agent tool calls failing silently or returning partial results that looked valid but weren't, which forced us to build validation layers between every step.
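A validation layer of the kind described above could look roughly like the following. This is a sketch under assumptions: the result shape (a `ligand_id` plus a list of `poses`, each with a numeric `score`) and the names `validation_problems`, `require_valid`, and `StepValidationError` are hypothetical, chosen only to show how a partial result can be rejected before the next step consumes it.

```python
import math
from typing import Any, Dict, List

# Hypothetical schema: keys every docking result is expected to carry.
REQUIRED_KEYS = ("ligand_id", "poses")


class StepValidationError(ValueError):
    """Raised when a step's output should not be passed downstream."""


def validation_problems(result: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if key not in result:
            problems.append(f"missing key: {key}")

    poses = result.get("poses")
    if isinstance(poses, list):
        if not poses:
            problems.append("no poses returned")
        for i, pose in enumerate(poses):
            score = pose.get("score") if isinstance(pose, dict) else None
            is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
            # NaN or infinite scores look like numbers but are not usable.
            if not is_number or not math.isfinite(score):
                problems.append(f"pose {i}: score is missing or not finite")
    elif "poses" in result:
        problems.append("poses is not a list")
    return problems


def require_valid(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pass the result through unchanged, or fail loudly."""
    problems = validation_problems(result)
    if problems:
        raise StepValidationError("; ".join(problems))
    return result
```

Calling `require_valid(...)` between steps turns a silent partial result into an explicit error that the orchestrator can retry or report.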

Accomplishments that we're proud of

We gave the system the protein that causes chronic myeloid leukemia, mixed the actual FDA-approved drug (Gleevec) anonymously into a pool of 50 molecules, and our platform ranked it near the top. It independently rediscovered a real cancer drug. We also compressed a workflow that published NIH protocols say takes around 5 hours (for just the docking step) into under 15 minutes, while running more validation than most researchers have time to do manually.
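A spike-in check like this one can be scored with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the molecule IDs are placeholders, and the position of the known drug in the example list is made up rather than taken from our run.

```python
import math
from typing import List


def recovery_rank(ranked_ids: List[str], known_id: str) -> int:
    """1-based position of the known drug in the ranked list."""
    return ranked_ids.index(known_id) + 1


def in_top_fraction(ranked_ids: List[str], known_id: str, fraction: float = 0.1) -> bool:
    """True if the known drug lands in the top `fraction` of the pool."""
    cutoff = max(1, math.ceil(len(ranked_ids) * fraction))
    return recovery_rank(ranked_ids, known_id) <= cutoff


# Placeholder ranking of 50 anonymized IDs. The position of "mol-17" is
# invented for illustration and is NOT the project's measured result.
ranked = ["mol-08", "mol-31", "mol-17"] + [f"decoy-{i:02d}" for i in range(47)]

print(recovery_rank(ranked, "mol-17"))
print(in_top_fraction(ranked, "mol-17", fraction=0.1))
```

Because the pool is anonymized before ranking, the known drug's ID is only mapped back to its name after the ranking is fixed.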

What we learned

The data and tools to accelerate drug discovery are already out there. The missing piece was intelligent orchestration. We also learned that multi-engine consensus isn't just a nice-to-have. The cases where Boltz-2 and GNINA disagree are often the most scientifically interesting, and looking at them catches errors that either method alone would miss. And we gained a deep appreciation for how much tedious manual work computational biologists deal with daily.
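One simple way to surface disagreements between two engines is to compare ranks rather than raw scores. The sketch below is not how Ligands does it today (consensus is currently synthesized at the agent/report layer). It is a hypothetical rank-based comparison that assumes each engine's scores have already been oriented so that higher is better. The function names, the `max_gap` threshold, and the example scores are all made up for illustration.

```python
from typing import Dict, List, Tuple


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank ligands 1..N, assuming a higher score is better.

    Ties are broken by ligand ID so the output is deterministic.
    """
    ordered = sorted(scores, key=lambda lig: (-scores[lig], lig))
    return {lig: i + 1 for i, lig in enumerate(ordered)}


def compare_engines(
    a: Dict[str, float], b: Dict[str, float], max_gap: int = 2
) -> Tuple[List[str], List[str]]:
    """Return (consensus order, disputed ligands) for two engines.

    Only ligands scored by both engines are compared. A ligand is
    disputed when its two ranks differ by more than `max_gap`.
    """
    shared = set(a) & set(b)
    ra = ranks({lig: a[lig] for lig in shared})
    rb = ranks({lig: b[lig] for lig in shared})
    consensus = sorted(shared, key=lambda lig: (ra[lig] + rb[lig], lig))
    disputed = sorted(lig for lig in shared if abs(ra[lig] - rb[lig]) > max_gap)
    return consensus, disputed


# Made-up scores for four placeholder ligands, not real engine output.
engine_a = {"lig1": 0.9, "lig2": 0.5, "lig3": 0.1, "lig4": 0.7}
engine_b = {"lig1": 0.8, "lig2": 0.6, "lig3": 0.9, "lig4": 0.2}

consensus, disputed = compare_engines(engine_a, engine_b)
print(consensus)  # ['lig1', 'lig3', 'lig2', 'lig4']
print(disputed)   # ['lig3']
```

Here `lig3` is ranked last by one engine and first by the other, so it is flagged for a closer look instead of being averaged away.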

What's next for Ligands

The immediate next step is talking to actual researchers and scientists here at UIUC. We want to understand their real workflows, figure out where the biggest pain points are, and see if we can get them to try our platform on their own projects. Beyond that, we want to scale to full library screening with hundreds or thousands of candidates per run, add more docking engines for stronger consensus, and validate across more disease targets like Alzheimer's, Parkinson's, and rare diseases that pharma companies underinvest in because the markets are too small. The infrastructure is disease-agnostic, and we want to make it accessible to any researcher with a question.