Starred repositories
Plug-and-play CPU percentage and icon indicator for Tmux.
🎮 ⌨ An easy-to-use tool to change the behaviour of your input devices.
Lightweight coding agent that runs in your terminal
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
gunicorn 'Green Unicorn' is a WSGI HTTP Server for UNIX, fast clients and sleepy applications.
Simple, powerful, and fast logging for Python.
Segment a given audio into utterances using a trained end-to-end ASR model.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Gemma open-weight LLM library, from Google DeepMind
The official Python library for the OpenAI API
verl: Volcano Engine Reinforcement Learning for LLMs
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients. Published in Nature.
DataComp: In search of the next generation of multimodal datasets
Qwen3Guard is a multilingual guardrail model series developed by the Qwen team at Alibaba Cloud.

