Building a Desktop LLM App with cpp-httplib
Have you ever wanted to add a web API to your own C++ library, or quickly build an Electron-like desktop app? In Rust you might reach for "Tauri + axum," but in C++ the same approach has always seemed out of reach.
With cpp-httplib, webview/webview, and cpp-embedlib, you can take the same approach in pure C++ — and produce a small, easy-to-distribute single binary.
In this tutorial we build an LLM-powered translation app using llama.cpp, progressing step by step from "REST API" to "SSE streaming" to "Web UI" to "desktop app." Translation is just the vehicle — replace llama.cpp with your own library and the same architecture works for any application.

If you know basic C++17 and understand the basics of HTTP / REST APIs, you're ready to start.
Chapters
- Set up the project — Fetch dependencies, configure the build, write scaffold code
- Embed llama.cpp and create a REST API — Return translation results as JSON
- Add token streaming with SSE — Stream responses token by token
- Add model discovery and management — Download and switch models from Hugging Face
- Add a Web UI — A browser-based translation interface
- Turn it into a desktop app with WebView — A single-binary desktop application
- Reading the llama.cpp server source code — Compare with production-quality code
- Making it your own — Swap in your own library and customize