[Proposal] Stateful Remote WebSocket API for Interactive Control #20782

What would you like to be added?

I’ve spent the last few weeks developing a Remote API for Gemini CLI that allows for full feature
parity between the CLI and a remote interface (web or mobile). This enables long-running tasks: you
can start a complex refactoring job, disconnect, and reconnect later from another device to review
"thoughts," confirm tool calls, or interact with the shell.

Why is this needed?

Motivation & Use Case

As a heavy user of Gemini CLI, I frequently find myself working in environments where I don't have
constant access to my primary workstation (e.g., commuting, working from a tablet or phone). While
the CLI is powerful, it currently requires a persistent local terminal session.

The Gap in Existing Protocols

While the project recently introduced the A2A (Agent-to-Agent) protocol via SSE, it is primarily
designed for inter-agent task execution. It lacks several features critical for a human-centric
remote UI:

Session Persistence: A2A is task-based, whereas a remote UI needs to manage the entire session
history and system state.
Real-time PTY Streaming: A2A's shell output mechanism isn't optimized for interactive terminal
apps (ANSI codes, cursor positioning).
Low-Latency Interaction: A WebSocket connection stays open and is full-duplex, so input and output
flow in both directions without per-request overhead, which a smooth "live terminal" experience
needs.

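To make the session-persistence point concrete, here is a minimal sketch of how a server could let a reconnecting client catch up on output it missed. It is an illustration only, not code from the prototype: the `ReplayBuffer` class, the `Chunk` shape, and the sequence-number scheme are all assumptions.

```typescript
// Hypothetical replay buffer: every name here is illustrative, not prototype code.
interface Chunk {
  seq: number; // monotonically increasing sequence number
  data: string; // raw output, e.g. PTY bytes decoded as text
}

class ReplayBuffer {
  private chunks: Chunk[] = [];
  private nextSeq = 1;
  private readonly maxChunks: number;

  constructor(maxChunks: number) {
    this.maxChunks = maxChunks;
  }

  // Record new output and drop the oldest chunk once the buffer is full.
  append(data: string): Chunk {
    const chunk: Chunk = { seq: this.nextSeq++, data };
    this.chunks.push(chunk);
    if (this.chunks.length > this.maxChunks) {
      this.chunks.shift();
    }
    return chunk;
  }

  // Return every retained chunk the client has not seen yet.
  since(lastSeenSeq: number): Chunk[] {
    return this.chunks.filter((c) => c.seq > lastSeenSeq);
  }
}
```

Under this scheme, a client would send the last sequence number it received when it reconnects, and the server would answer with `since(lastSeenSeq)` before resuming the live stream. Output older than the buffer limit would have to come from the synced session history instead.
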
Additional context

Proposed Solution

I propose adding an optional RemoteApiService to packages/cli.

Core Logic: A WebSocket server (based on ws) that acts as a bidirectional "mirror" of the CLI
state.
React Integration: A useRemoteApi hook that synchronizes uiState (history, status, RAM, tokens,
Git branch) and exposes uiActions (prompt submission, tool confirmation, stop generation).
PTY Proxy: Real-time streaming of shell output directly from ShellExecutionService.
Mirroring: Any command sent via the Remote API is rendered in the local CLI as if typed manually,
ensuring a "single source of truth."
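
As a rough illustration of the mirror idea, the wire format could look something like the sketch below. This is not the prototype's actual protocol: the message names, their fields, and the `UiActions` interface are hypothetical stand-ins for whatever `uiState` and `uiActions` really expose.

```typescript
// Hypothetical wire format; none of these names come from the prototype.
type ClientMessage =
  | { type: "prompt"; text: string }
  | { type: "confirmTool"; callId: string; approved: boolean }
  | { type: "stop" };

type ServerMessage =
  | { type: "state"; history: string[]; status: string }
  | { type: "pty"; data: string };

// Stand-in for the uiActions the hook would expose.
interface UiActions {
  submitPrompt(text: string): void;
  confirmTool(callId: string, approved: boolean): void;
  stopGeneration(): void;
}

// Parse and validate one incoming frame; returns null for anything malformed.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as Record<string, unknown>;
  switch (msg.type) {
    case "prompt":
      return typeof msg.text === "string"
        ? { type: "prompt", text: msg.text }
        : null;
    case "confirmTool":
      return typeof msg.callId === "string" && typeof msg.approved === "boolean"
        ? { type: "confirmTool", callId: msg.callId, approved: msg.approved }
        : null;
    case "stop":
      return { type: "stop" };
    default:
      return null;
  }
}

// Route a validated message to the same actions the local UI uses,
// so remote input follows the same path as typed input.
function dispatch(msg: ClientMessage, actions: UiActions): void {
  switch (msg.type) {
    case "prompt":
      actions.submitPrompt(msg.text);
      break;
    case "confirmTool":
      actions.confirmTool(msg.callId, msg.approved);
      break;
    case "stop":
      actions.stopGeneration();
      break;
  }
}

// Serialize a state snapshot or PTY chunk for sending to the client.
function encodeServerMessage(msg: ServerMessage): string {
  return JSON.stringify(msg);
}
```

Validating frames before dispatch keeps malformed or unknown messages from reaching the UI layer, and routing through one `dispatch` function is one way to preserve the "single source of truth" property.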

Security Model

Security is a top priority. The proposed implementation relies on the following:
Strict Localhost: The server is hard-coded to listen only on 127.0.0.1, preventing accidental
network exposure.
Proxy-Ready: For remote access, users are encouraged to use a secure proxy/orchestrator (I have
a reference Rust implementation) that adds authentication and encryption.
Optional Token: We can implement a shared secret/token generated at startup (similar to Jupyter)
to prevent unauthorized local applications from connecting.
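
The localhost and token checks could be sketched roughly as follows. This is an assumption about how the optional token might work, not the prototype's code: the helper names are hypothetical, and how the client presents the token (query parameter, header, or first message) is left open. The server itself would additionally be bound to 127.0.0.1, which ws supports through the `host` option of `WebSocketServer`.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Generate a fresh shared secret at startup (hypothetical helper).
// 32 random bytes encode to a 64-character hex string.
function generateToken(): string {
  return randomBytes(32).toString("hex");
}

// Compare the presented token against the expected one in constant time.
function tokenMatches(expected: string, presented: string | undefined): boolean {
  if (presented === undefined) return false;
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(presented, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

// Defense in depth: reject any peer whose address is not loopback,
// even though the server is only meant to listen on 127.0.0.1.
function isLoopbackAddress(addr: string | undefined): boolean {
  return addr === "127.0.0.1" || addr === "::1" || addr === "::ffff:127.0.0.1";
}
```

A connection handler would call `isLoopbackAddress` on the socket's remote address and `tokenMatches` on whatever the client presented, and close the connection if either check fails.
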

Precedent

The packages/devtools package already employs a WebSocketServer for log ingestion. This proposal
expands on that established pattern to provide active session control.

Current Status: Working Prototype

I have developed a working prototype in my fork. While it successfully demonstrates the core
functionality (streaming PTY, syncing history, and handling confirmations), it is currently at the
RFC stage.

Some areas, such as strict TypeScript typing for internal events and edge-case error handling, would
benefit from collaboration with the core team to ensure they align perfectly with the project's
long-term architectural goals. I am eager to use this prototype as a foundation for a formal
implementation.

I am raising this here as a general concept to find out whether the project is interested in such a feature. If so, we can discuss the details and implement it as decided. My current implementation was developed for my own use and therefore can only serve as a prototype. The GIF animation shows my Gemini CLI fork together with a Rust orchestrator, which launches Gemini CLI in the desired folder and manages sessions, and a web interface. In this issue, I am only proposing to add a WebSocket API to Gemini CLI itself.
