jakep-allenai
  • Seattle, WA

A fast AWS S3 browser, with inspiration from s5cmd

Rust 13 3 Updated May 29, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,877 8,904 Updated Jun 30, 2026

Rapid fuzzy string matching in Python using various string metrics

Python 3,981 160 Updated Jun 22, 2026

Access a database of word frequencies, in various natural languages.

Python 1,682 113 Updated Jan 4, 2025

🚀 Efficient implementations for emerging model architectures

Python 5,285 569 Updated Jul 1, 2026

Utilities for batched LLM calls with retries

Python 51 2 Updated Jun 12, 2026

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts stru…

HTML 791 62 Updated Jul 1, 2026

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 551 63 Updated Jul 20, 2025

High-performance In-browser LLM Inference Engine

TypeScript 18,289 1,316 Updated Jun 9, 2026

Nano vLLM

Python 14,256 2,272 Updated Apr 26, 2026

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

Python 9,803 1,179 Updated Jul 1, 2026

Next-generation Punkt sentence boundary detection with zero dependencies

Python 31 1 Updated Nov 18, 2025

OCR & Document Extraction using vision models

TypeScript 12,240 846 Updated May 20, 2025

OCR Benchmark

TypeScript 636 53 Updated Oct 21, 2025

OLMost every training recipe you need to perform data interventions with the OLMo family of models.

Python 72 19 Updated May 29, 2026

qpdf: A content-preserving PDF document transformer

C++ 5,187 387 Updated Jun 19, 2026

A pipeline for performing OCR on historical newspapers

Python 7 2 Updated Jan 22, 2026

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.

Perl 1,279 145 Updated Jul 1, 2026

A computer algebra system written in pure Python

Python 14,732 5,359 Updated Jul 1, 2026

Python 36 3 Updated Jan 17, 2026

Tile primitives for speedy kernels

Cuda 3,503 300 Updated Jun 15, 2026

Toolkit for linearizing PDFs for LLM datasets/training

Python 18,220 1,500 Updated Mar 25, 2026

📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools lik…

TypeScript 26,753 1,410 Updated Jul 1, 2026

Synthetic data curation for post-training and structured data extraction

Python 1,696 142 Updated Jun 21, 2026

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

Rust 86,295 9,341 Updated Jul 1, 2026

Parallel S3 and local filesystem execution tool.

Go 4,095 340 Updated Jun 13, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,871 6,853 Updated Jul 1, 2026

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Python 10,100 1,592 Updated Jun 30, 2026

Streaming replication for SQLite.

Go 13,772 361 Updated Jul 1, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,147 336 Updated Jul 1, 2026