A LLaMA2-7b chatbot with memory running on CPU, optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
Updated Feb 27, 2024 - Python
Lightweight 3-tier memory and context management for AI agents. Token-budget-aware, SQLite-backed, zero-config. Works with Claude, OpenAI, or any LLM agent system.
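The second listing describes a token-budget-aware, SQLite-backed memory store for agents. As a rough illustration of that idea (a hypothetical sketch, not the repository's actual API), a store can persist messages in SQLite and assemble a context window by walking entries from newest to oldest until a token budget is exhausted:

```python
import sqlite3


class MemoryStore:
    """Hypothetical sketch of a token-budget-aware, SQLite-backed memory.

    Not the repository's actual API; names and behavior are assumptions.
    """

    def __init__(self, db_path=":memory:", token_budget=100):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "content TEXT, tokens INTEGER)"
        )
        self.token_budget = token_budget

    def add(self, content):
        # Naive token estimate: whitespace-split word count.
        tokens = len(content.split())
        self.conn.execute(
            "INSERT INTO memory (content, tokens) VALUES (?, ?)",
            (content, tokens),
        )
        self.conn.commit()

    def context(self):
        # Collect the most recent entries that fit the budget,
        # then return them in chronological order.
        rows = self.conn.execute(
            "SELECT content, tokens FROM memory ORDER BY id DESC"
        ).fetchall()
        picked, used = [], 0
        for content, tokens in rows:
            if used + tokens > self.token_budget:
                break
            picked.append(content)
            used += tokens
        return list(reversed(picked))
```

Because recent entries are considered first, older messages silently drop out of the assembled context once the budget is reached, which is one common way such stores stay within an LLM's context limit.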