Running LLMs Locally on the NVIDIA Jetson Orin Nano with Yzma

I'm excited to share my experience getting Yzma working on the NVIDIA Jetson Orin Nano - a journey that went from "impossible" to "working" in just 3 days!

What is Yzma? Yzma is a Go library that runs LLMs and VLMs locally via llama.cpp, with hardware acceleration (CUDA, Metal, Vulkan) integrated directly into your applications - no external server and no CGo required.

The Challenge: The llama.cpp libraries Yzma needs didn't exist for the Jetson's ARM64 architecture. After I opened an issue on the project, Ron Evans jumped in and made it happen within days.

What I Cover in My Tutorial:
✅ Downloading and configuring the necessary ARM64 libraries (CUDA/Vulkan)
✅ Setting up the development environment on the Jetson
✅ Writing your first Yzma program (see the sketch below for the general shape)
✅ Running local LLM inference with hardware acceleration

Bonus Discovery: William Kennedy has started the Kronk project - a library that provides an OpenAI-like API on top of Yzma. This opens up even more possibilities for edge AI applications!

This is a great example of the power of open-source collaboration and the growing ecosystem around edge AI devices. If you're working with edge computing or AI at the edge, or just curious about running LLMs on embedded devices, check out my detailed setup guide: https://lnkd.in/eKwb_s4F

A huge thank you to Ron Evans for the incredible support! 🙏

#EdgeAI #MachineLearning #Golang #NVIDIA #Jetson #OpenSource #AI #DeveloperCommunity #EdgeComputing
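To give a feel for what the tutorial builds toward, here is a minimal sketch of a "first Yzma program". This is illustrative only: the import path and the Load/LoadModel/Generate names, as well as the library path and model filename, are my assumptions, not Yzma's actual API - follow the linked guide for working code.

```go
// Hypothetical sketch only: the package path and every function name
// below are assumed for illustration, not Yzma's documented API.
package main

import (
	"fmt"
	"log"

	yzma "github.com/hybridgroup/yzma/pkg/llm" // assumed import path
)

func main() {
	// Point the library at the llama.cpp shared libraries downloaded
	// for ARM64 (the CUDA or Vulkan builds set up in the tutorial).
	if err := yzma.Load("/opt/llama.cpp/lib"); err != nil { // assumed call
		log.Fatal(err)
	}

	// Load a local GGUF model file (filename is a placeholder).
	model, err := yzma.LoadModel("models/model.gguf") // assumed call
	if err != nil {
		log.Fatal(err)
	}
	defer model.Close()

	// Run a single prompt with hardware-accelerated local inference.
	out, err := model.Generate("Why is the sky blue?") // assumed call
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

The point of the design is in that shape: everything runs in-process in your Go binary, with no external inference server to deploy and no CGo toolchain to fight with on the Jetson.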
I'm excited to see what people are doing with yzma! Thanks for the awesome post!
Kronk is finally close to having a stable API. As of yesterday, there is support for tool calling and Jinja templates. Without Ron, none of this exists.