Description
What would you like to be added?
I’ve spent the last few weeks developing a Remote API for Gemini CLI that allows for 100% feature
parity between the CLI and a remote interface (web/mobile). This enables long-running tasks: you can
start a complex refactoring job, disconnect, and reconnect later from another device to review
"thoughts," confirm tool calls, or interact with the shell.
Why is this needed?
- Motivation & Use Case
As a heavy user of Gemini CLI, I frequently find myself working in environments where I don't have
constant access to my primary workstation (e.g., commuting, working from a tablet or phone). While
the CLI is powerful, it currently requires a persistent local terminal session.
- The Gap in Existing Protocols
While the project recently introduced the A2A (Agent-to-Agent) protocol via SSE, it is primarily
designed for inter-agent task execution. It lacks several features critical for a human-centric
remote UI:
- Session Persistence: A2A is task-based, whereas a remote UI needs to manage the entire session
history and system state.
- Real-time PTY Streaming: A2A's shell output mechanism isn't optimized for interactive terminal
apps (ANSI codes, cursor positioning).
- Low-Latency Interaction: A persistent, bidirectional WebSocket connection avoids per-request
overhead and provides the low latency required for a smooth "live terminal" experience.
Additional context
- Proposed Solution
I propose adding an optional RemoteApiService to packages/cli.
- Core Logic: A WebSocket server (based on ws) that acts as a bidirectional "mirror" of the CLI
state.
- React Integration: A useRemoteApi hook that synchronizes uiState (history, status, RAM, tokens,
Git branch) and exposes uiActions (prompt submission, tool confirmation, stop generation).
- PTY Proxy: Real-time streaming of shell output directly from ShellExecutionService.
- Mirroring: Any command sent via the Remote API is rendered in the local CLI as if typed manually,
ensuring a "single source of truth."
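To make the "mirror" idea concrete, here is a minimal sketch of how state snapshots and remote actions could flow through a single dispatch path. Everything in it is an assumption for illustration: the UiState fields, the message shapes, the RemoteMirror class, and the Connection interface (a stand-in for a ws socket so the sketch has no dependencies) are hypothetical and are not taken from the prototype or from Gemini CLI.

```typescript
// Hypothetical, simplified state; the real uiState carries more fields.
interface UiState {
  history: string[];
  status: "idle" | "busy";
}

// Hypothetical messages a remote client may send.
type ClientMessage =
  | { type: "submitPrompt"; text: string }
  | { type: "stop" };

// Stand-in for a WebSocket connection: anything that can send a string.
interface Connection {
  send(data: string): void;
}

// Validate untrusted input before it reaches the dispatcher.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as { type?: unknown; text?: unknown };
  if (msg.type === "stop") return { type: "stop" };
  if (msg.type === "submitPrompt" && typeof msg.text === "string") {
    return { type: "submitPrompt", text: msg.text };
  }
  return null;
}

class RemoteMirror {
  private state: UiState = { history: [], status: "idle" };
  private connections = new Set<Connection>();

  // A newly attached (or reconnecting) client gets a full snapshot first.
  attach(conn: Connection): void {
    this.connections.add(conn);
    conn.send(JSON.stringify({ type: "state", state: this.state }));
  }

  detach(conn: Connection): void {
    this.connections.delete(conn);
  }

  // Single entry point: local input and remote messages both go through here,
  // so every connected client observes the same state.
  dispatch(msg: ClientMessage): void {
    switch (msg.type) {
      case "submitPrompt":
        this.state = {
          history: [...this.state.history, msg.text],
          status: "busy",
        };
        break;
      case "stop":
        this.state = { ...this.state, status: "idle" };
        break;
    }
    this.broadcast();
  }

  // Returns false when the raw payload is rejected.
  handleRaw(raw: string): boolean {
    const msg = parseClientMessage(raw);
    if (msg === null) return false;
    this.dispatch(msg);
    return true;
  }

  private broadcast(): void {
    const payload = JSON.stringify({ type: "state", state: this.state });
    for (const conn of this.connections) conn.send(payload);
  }
}
```

In a real integration, the local CLI input handler would call the same dispatch method, which is what would keep the terminal and the remote UI in step.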
- Security Model
Security is a top priority. The proposed implementation:
- Strict Localhost: The server is hard-coded to listen only on 127.0.0.1, preventing accidental
network exposure.
- Proxy-Ready: For remote access, users are encouraged to use a secure proxy/orchestrator (I have
a reference Rust implementation) that adds authentication and encryption.
- Optional Token: We can implement a shared secret/token generated at startup (similar to Jupyter)
to prevent unauthorized local applications from connecting.
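The optional token could look roughly like the sketch below, which uses only Node's built-in crypto module. The helper names generateToken and tokenMatches are hypothetical, and how the token would reach the client (printed at startup, written to a file, passed to the proxy) is left open here.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper: create a fresh random token once at startup.
function generateToken(): string {
  // 32 random bytes, hex-encoded to a 64-character string.
  return randomBytes(32).toString("hex");
}

// Hypothetical helper: check a presented token without leaking timing.
// Both values are hashed first so the buffers always have equal length,
// which timingSafeEqual requires.
function tokenMatches(expected: string, presented: string): boolean {
  const a = createHash("sha256").update(expected).digest();
  const b = createHash("sha256").update(presented).digest();
  return timingSafeEqual(a, b);
}
```

A server would call tokenMatches during the WebSocket handshake and close the connection on a mismatch; the loopback-only bind remains the first line of defense either way.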
- Precedent
The packages/devtools package already employs a WebSocketServer for log ingestion. This proposal
expands on that established pattern to provide active session control.
- Current Status: Working Prototype
I have developed a working prototype in my fork. While it successfully demonstrates the core
functionality (streaming PTY, syncing history, and handling confirmations), it is currently at the
RFC stage.
Some areas, such as strict TypeScript typing for internal events and edge-case error handling, would
benefit from collaboration with the core team to ensure they align perfectly with the project's
long-term architectural goals. I am eager to use this prototype as a foundation for a formal
implementation.
I am raising this here as a general concept to find out whether the project is interested in such a feature. If so, we can discuss the details and implement it as agreed. My current implementation was developed “for myself” and therefore only works as a prototype. The GIF animation uses my Gemini CLI fork, together with a Rust orchestrator that launches Gemini CLI in the desired folder and manages sessions, and a web interface. In this issue, I am only proposing to add a WebSocket API to Gemini CLI itself.
