15 stars written in Python
TensorFlow code and pre-trained models for BERT

Python · 40,045 stars · 9,703 forks · Updated Jul 23, 2024

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python · 24,976 stars · 5,893 forks · Updated Aug 14, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python · 6,925 stars · 406 forks · Updated Mar 27, 2026

Modeling, training, eval, and inference code for OLMo

Python · 6,568 stars · 777 forks · Updated Nov 24, 2025

Library to scrape and clean web pages to create massive datasets.

Python · 2,266 stars · 323 forks · Updated Nov 11, 2020

A full spaCy pipeline and models for scientific/biomedical documents.

Python · 1,970 stars · 257 forks · Updated Dec 4, 2025

Code for Defending Against Neural Fake News, https://rowanzellers.com/grover/

Python · 917 stars · 218 forks · Updated May 22, 2023

Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)

Python · 469 stars · 93 forks · Updated Apr 11, 2024

Code for collecting, processing, and preparing datasets for the Common Pile

Python · 258 stars · 26 forks · Updated Feb 11, 2026

We evaluate many models used for biomedical and clinical nlp tasks, and train new models that perform much better.

Python · 164 stars · 27 forks · Updated Jul 29, 2021

An Interactive Tool for Scalable and Reproducible Error Analysis.

Python · 109 stars · 11 forks · Updated Jul 22, 2021

Code and Data for Evaluation WG

Python · 42 stars · 24 forks · Updated May 4, 2022

Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OFLbgUP04nC

Python · 30 stars · 4 forks · Updated Oct 17, 2021

A large (>5k) collection of search questions asked about Coronavirus 🦠

Python · 14 stars · 1 fork · Updated Mar 21, 2020