
[Proposal] Stateful Remote WebSocket API for Interactive Control #20782

@pavver


What would you like to be added?

I’ve spent the last few weeks developing a Remote API for Gemini CLI that allows for 100% feature
parity between the CLI and a remote interface (web/mobile). This enables long-running tasks: you can
start a complex refactoring job, disconnect, and reconnect later from another device to review
"thoughts," confirm tool calls, or interact with the shell.
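One way to picture the disconnect/reconnect flow is a bounded event log with sequence numbers: a reconnecting client reports the last sequence number it saw and receives everything newer. This is a minimal sketch under that assumption, not the prototype's code; `SessionEvent`, `SessionLog`, and the event kinds are hypothetical names.

```typescript
// Hypothetical sketch: none of these names come from Gemini CLI or the prototype.
interface SessionEvent {
  seq: number; // monotonically increasing sequence number
  kind: "thought" | "tool_call" | "shell_output";
  payload: string;
}

class SessionLog {
  private events: SessionEvent[] = [];
  private nextSeq = 1;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Record an event and drop the oldest one once the buffer is full.
  append(kind: SessionEvent["kind"], payload: string): SessionEvent {
    const event: SessionEvent = { seq: this.nextSeq++, kind, payload };
    this.events.push(event);
    if (this.events.length > this.capacity) {
      this.events.shift();
    }
    return event;
  }

  // Events that a reconnecting client has not seen yet.
  since(lastSeenSeq: number): SessionEvent[] {
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }
}

// A client that disconnected after event 1 catches up on reconnect.
const log = new SessionLog(1000);
log.append("thought", "Planning the refactor");
log.append("tool_call", "write_file src/app.ts");
log.append("shell_output", "npm test: 12 passed");
const missed = log.since(1);
console.log(missed.map((e) => `${e.seq}:${e.kind}`).join(", "));
```

A bounded buffer keeps memory predictable for long-running jobs; a client that has fallen further behind than the buffer holds would need a full state snapshot instead.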

Why is this needed?

  1. Motivation & Use Case
    As a heavy user of Gemini CLI, I frequently find myself working in environments where I don't have
    constant access to my primary workstation (e.g., commuting, working from a tablet or phone). While
    the CLI is powerful, it currently requires a persistent local terminal session.
  2. The Gap in Existing Protocols
    While the project recently introduced the A2A (Agent-to-Agent) protocol via SSE, it is primarily
    designed for inter-agent task execution. It lacks several features critical for a human-centric
    remote UI:
  • Session Persistence: A2A is task-based, whereas a remote UI needs to manage the entire session
    history and system state.
  • Real-time PTY Streaming: A2A's shell output mechanism isn't optimized for interactive terminal
    apps (ANSI codes, cursor positioning).
  • Low-Latency Interaction: A WebSocket keeps a single full-duplex connection open, so keystrokes
    and output avoid per-request HTTP overhead, which a smooth "live terminal" experience needs.

Additional context

  1. Proposed Solution
    I propose adding an optional RemoteApiService to packages/cli.
  • Core Logic: A WebSocket server (based on ws) that acts as a bidirectional "mirror" of the CLI
    state.
  • React Integration: A useRemoteApi hook that synchronizes uiState (history, status, RAM, tokens,
    Git branch) and exposes uiActions (prompt submission, tool confirmation, stop generation).
  • PTY Proxy: Real-time streaming of shell output directly from ShellExecutionService.
  • Mirroring: Any command sent via the Remote API is rendered in the local CLI as if typed manually,
    ensuring a "single source of truth."
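To make the "mirror" idea concrete, the messages could be modelled as two discriminated unions: state and PTY frames flowing from the CLI to the client, and actions flowing back. The sketch below is an assumption about what such a wire format might look like, not the prototype's actual protocol; every type, field, and function name here (`ServerMessage`, `ClientMessage`, `UiActions`, `parseClientMessage`, `dispatch`) is hypothetical.

```typescript
// Hypothetical wire format; all names are illustrative, not taken from the prototype.
type ServerMessage =
  | { type: "state"; history: string[]; status: "idle" | "busy"; gitBranch: string }
  | { type: "pty"; data: string } // raw shell output, ANSI codes included
  | { type: "confirm_request"; callId: string; tool: string };

type ClientMessage =
  | { type: "prompt"; text: string }
  | { type: "confirm"; callId: string; approved: boolean }
  | { type: "stop" };

// Stand-in for the actions the proposal says the hook would expose.
interface UiActions {
  submitPrompt(text: string): void;
  confirmTool(callId: string, approved: boolean): void;
  stopGeneration(): void;
}

// Validate untrusted input before it reaches the CLI; returns null if malformed.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as Record<string, unknown>;
  const type = msg["type"];
  if (type === "prompt") {
    const text = msg["text"];
    return typeof text === "string" ? { type: "prompt", text } : null;
  }
  if (type === "confirm") {
    const callId = msg["callId"];
    const approved = msg["approved"];
    return typeof callId === "string" && typeof approved === "boolean"
      ? { type: "confirm", callId, approved }
      : null;
  }
  if (type === "stop") return { type: "stop" };
  return null;
}

// Route a validated message to the same actions the local UI would call.
function dispatch(msg: ClientMessage, actions: UiActions): void {
  switch (msg.type) {
    case "prompt":
      actions.submitPrompt(msg.text);
      break;
    case "confirm":
      actions.confirmTool(msg.callId, msg.approved);
      break;
    case "stop":
      actions.stopGeneration();
      break;
  }
}

// Usage: record which actions a remote message would trigger.
const calls: string[] = [];
const actions: UiActions = {
  submitPrompt: (text) => { calls.push(`prompt:${text}`); },
  confirmTool: (callId, approved) => { calls.push(`confirm:${callId}:${approved}`); },
  stopGeneration: () => { calls.push("stop"); },
};
const incoming = parseClientMessage('{"type":"confirm","callId":"c1","approved":true}');
if (incoming !== null) dispatch(incoming, actions);

// A PTY frame as it might be serialized for the client.
const ptyFrame: ServerMessage = { type: "pty", data: "\u001b[32mok\u001b[0m\r\n" };
const encodedFrame = JSON.stringify(ptyFrame);
```

Routing remote messages through the same action functions as local input is what would let a remote command appear in the local CLI as if it had been typed there.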
  2. Security Model
    Security is a top priority. The proposed implementation:
  • Strict Localhost: The server is hard-coded to listen only on 127.0.0.1, preventing accidental
    network exposure.
  • Proxy-Ready: For remote access, users are encouraged to use a secure proxy/orchestrator (I have
    a reference Rust implementation) that adds authentication and encryption.
  • Optional Token: We can implement a shared secret/token generated at startup (similar to Jupyter)
    to prevent unauthorized local applications from connecting.
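The optional token could work roughly as follows: generate a random secret at startup, show it in the local terminal, and compare it in constant time against whatever a connecting client presents. This is a sketch under assumptions rather than the prototype's code; passing the token as a `token` query parameter and the helper names `generateToken`, `tokenFromUrl`, and `isAuthorized` are all hypothetical. The loopback restriction itself would, with the `ws` package, correspond to passing `host: "127.0.0.1"` when constructing the `WebSocketServer`.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Generate a fresh 256-bit secret once at startup (hex-encoded, 64 characters).
function generateToken(): string {
  return randomBytes(32).toString("hex");
}

// Hypothetical convention: the client sends the token as ?token=... on the upgrade URL.
function tokenFromUrl(requestUrl: string): string | undefined {
  const url = new URL(requestUrl, "http://127.0.0.1");
  return url.searchParams.get("token") ?? undefined;
}

// Constant-time comparison so response timing does not leak how much of the token matched.
function isAuthorized(expected: string, presented: string | undefined): boolean {
  if (presented === undefined) return false;
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(presented, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

// Usage: accept a connection only if the presented token matches.
const token = generateToken();
const accepted = isAuthorized(token, tokenFromUrl(`/?token=${token}`));
const rejected = isAuthorized(token, tokenFromUrl("/?token=wrong"));
console.log(accepted, rejected);
```

A check like this only guards against other local processes; remote access would still go through the authenticating proxy described above.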

  3. Precedent
    The packages/devtools package already employs a WebSocketServer for log ingestion. This proposal
    expands on that established pattern to provide active session control.

  4. Current Status: Working Prototype
    I have developed a working prototype in my fork. While it successfully demonstrates the core
    functionality (streaming PTY, syncing history, and handling confirmations), it is currently at
    the RFC stage.

Some areas, such as strict TypeScript typing for internal events and edge-case error handling, would
benefit from collaboration with the core team to ensure they align perfectly with the project's
long-term architectural goals. I am eager to use this prototype as a foundation for a formal
implementation.

[Image: GIF animation of the prototype]

I am raising this here as a general concept to find out whether the project is interested in such a feature. If so, we can discuss the details and implement it as decided. My current implementation was developed for my own use, so it should be treated only as a prototype. The GIF animation shows my Gemini CLI fork together with a Rust orchestrator, which launches Gemini CLI in the desired folder and manages sessions, and a web interface. In this issue, I am proposing only the addition of a WebSocket API to Gemini CLI itself.
