droidclaw

experimental. i wanted to build something to turn my old android devices into ai agents. after a few hours of reverse engineering accessibility trees and the kernel and playing with tailscale... it worked.

ai agent that controls your android phone. give it a goal in plain english - it figures out what to tap, type, and swipe. it reads the screen, asks an llm what to do, executes via adb, and repeats until the job is done.

one of the biggest things it can do right now is delegate incoming requests to chatgpt, gemini, or google search on the device... and hand the result back. a few years back this kind of automation needed predefined flows. now think of it as automation with ai intelligence: you don't need to worry about messy apis. just install your favorite apps, write workflows or give it goals on the fly, and it will get them done.

$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)

setup

curl -fsSL https://droidclaw.ai/install.sh | sh

installs bun and adb if missing, clones the repo, sets up .env. or do it manually:

# install adb
brew install android-platform-tools

# install bun (required — npm/node won't work)
curl -fsSL https://bun.sh/install | bash

# clone and setup
git clone https://github.com/unitedbyai/droidclaw.git
cd droidclaw && bun install
cp .env.example .env

note: droidclaw requires bun, not node/npm. it uses bun-specific apis (Bun.spawnSync, native .env loading) that don't exist in node.
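for the curious, here's roughly what that means in practice - an illustrative one-liner using bun's api, not droidclaw's actual code:

// illustrative only - not from droidclaw's source. Bun.spawnSync is a bun
// global with no node equivalent, which is why node/npm won't run this project.
const proc = Bun.spawnSync(["adb", "shell", "input", "tap", "540", "1200"]);
if (!proc.success) console.error("adb failed:", proc.stderr.toString());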

edit .env - the fastest way to start is with groq (free tier):

LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here

or run fully local with ollama (no api key needed). pull the model first:

ollama pull llama3.2

then set in .env:

LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.2

connect your phone (usb debugging on):

adb devices   # should show your device
bun run src/kernel.ts

that's the simplest way - just type a goal and let the agent figure it out. but for anything you want to run repeatedly, there are two modes: workflows and flows.

workflows

workflows are ai-powered. you describe goals in natural language, and the llm decides how to navigate, what to tap, what to type. use these when the ui might change, when you need the agent to think, or when chaining goals across multiple apps.

bun run src/kernel.ts --workflow examples/workflows/research/weather-to-whatsapp.json

each workflow is a json file - just a name and a list of steps:

{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google.android.googlequicksearchbox", "goal": "search for chennai weather today" },
    { "goal": "share the result to whatsapp contact Sanju" }
  ]
}

you can also pass form data into steps when you need to inject specific text:

{
  "name": "slack standup",
  "steps": [
    {
      "app": "com.Slack",
      "goal": "open #standup channel, type the message and send it",
      "formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
    }
  ]
}
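put together, a workflow file has a simple shape. here's a hypothetical typescript rendering of the schema the two examples above imply (illustrative - not a type from droidclaw's source):

// illustrative type for the workflow json shape shown above - not from the repo
interface WorkflowStep {
  app?: string;                       // android package id, e.g. "com.whatsapp"
  goal: string;                       // natural-language goal for the llm
  formData?: Record<string, string>;  // optional exact text to inject
}

interface Workflow {
  name: string;
  steps: WorkflowStep[];
}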

examples

35 ready-to-use workflows organised by category:

messaging - whatsapp, telegram, slack, email

social - instagram, youtube, cross-posting

productivity - calendar, notes, github, notifications

research - search, compare, monitor

lifestyle - food, transport, music, fitness

flows

for tasks where you don't need ai thinking at all - just a fixed sequence of taps and types. no llm calls, instant execution. good for things you do exactly the same way every time.

bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml

each flow is a yaml file - app metadata up top, then the steps:

appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- wait: 2
- tap: "Contact Name"
- wait: 1
- tap: "Message"
- type: "hello from droidclaw"
- tap: "Send"
- done: "Message sent"
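since there's no llm in the loop, a flow runner is little more than a switch over step types. a minimal sketch, assuming the step names from the yaml above (not the actual flow.ts):

// illustrative flow-runner sketch - not droidclaw's flow.ts
type FlowStep =
  | "launchApp"
  | { wait: number }    // seconds to pause
  | { tap: string }     // element label to find and tap
  | { type: string }    // text to type
  | { done: string };   // finish with a message

async function runFlow(steps: FlowStep[]): Promise<string | undefined> {
  for (const step of steps) {
    if (step === "launchApp") {
      // adb shell am start ... (stubbed)
    } else if ("wait" in step) {
      await Bun.sleep(step.wait * 1000);
    } else if ("tap" in step) {
      // look up the element by label in the accessibility dump, then adb input tap x y
    } else if ("type" in step) {
      // adb shell input text '<escaped text>'
    } else if ("done" in step) {
      return step.done;
    }
  }
}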

examples

5 flow templates in examples/flows/.

quick comparison

                     workflows                 flows
format               json                      yaml
uses ai              yes                       no
handles ui changes   yes                       no
speed                slower (llm calls)        instant
best for             complex/multi-app tasks   simple repeatable tasks

providers

provider     cost           vision   notes
groq         free tier      no       fastest to start
ollama       free (local)   yes*     no api key, runs on your machine
openrouter   per token      yes      200+ models
openai       per token      yes      gpt-4o
bedrock      per token      yes      claude on aws

*ollama vision requires a vision model like llama3.2-vision or llava

config

all in .env:

key               default    what
MAX_STEPS         30         steps before giving up
STEP_DELAY        2          seconds between actions
STUCK_THRESHOLD   3          steps before stuck recovery
VISION_MODE       fallback   off / fallback / always
MAX_ELEMENTS      40         ui elements sent to llm
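bun loads .env natively, so these keys land directly on process.env. reading them with the defaults above looks roughly like this (a sketch - the real parsing lives in src/config.ts):

// illustrative env parsing using the defaults from the table above
const MAX_STEPS = Number(process.env.MAX_STEPS ?? 30);
const STEP_DELAY = Number(process.env.STEP_DELAY ?? 2);
const STUCK_THRESHOLD = Number(process.env.STUCK_THRESHOLD ?? 3);
const VISION_MODE = process.env.VISION_MODE ?? "fallback";
const MAX_ELEMENTS = Number(process.env.MAX_ELEMENTS ?? 40);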

how it works

each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.

the llm thinks before acting - returns { think, plan, action }. if the screen doesn't change for 3 steps, stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter), it falls back to screenshots.
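sketched in typescript, that loop looks something like this. every helper name here is hypothetical - this is the shape of the algorithm, not the actual kernel.ts:

// sketch of the agent loop - all helpers here are hypothetical, not kernel.ts
let lastScreen = "";
let stuckCount = 0;

for (let step = 1; step <= MAX_STEPS; step++) {
  const tree = dumpAccessibilityTree();                 // adb uiautomator dump
  const elements = filterElements(tree, MAX_ELEMENTS);  // cap what the llm sees

  // empty tree (webviews, flutter) -> fall back to a screenshot
  const screenshot = elements.length === 0 ? takeScreenshot() : undefined;

  const { think, action } = await askLlm(goal, elements, screenshot);
  console.log("think:", think);
  if (action.type === "done") break;
  await execute(action);                                // tap / type / swipe via adb

  // stuck recovery: same screen for STUCK_THRESHOLD steps in a row
  const screen = hashScreen();
  stuckCount = screen === lastScreen ? stuckCount + 1 : 0;
  lastScreen = screen;
  if (stuckCount >= STUCK_THRESHOLD) await recoverFromStuck();

  await Bun.sleep(STEP_DELAY * 1000);
}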

source

src/
  kernel.ts          main loop
  actions.ts         22 actions + adb retry
  skills.ts          6 multi-step skills
  workflow.ts        workflow orchestration
  flow.ts            yaml flow runner
  llm-providers.ts   5 providers + system prompt
  sanitizer.ts       accessibility xml parser
  config.ts          env config
  constants.ts       keycodes, coordinates
  logger.ts          session logging

remote control with tailscale

the default setup is usb - phone plugged into your laptop. but you can go further.

install tailscale on both your android device and your laptop/vps. once they're on the same tailnet, connect adb over the network:

# on your phone: enable wireless debugging (developer options → wireless debugging)
# note the ip:port shown on the screen

# from your laptop/vps, anywhere in the world:
adb connect <phone-tailscale-ip>:<port>
adb devices   # should show your phone

bun run src/kernel.ts

now your phone is a remote ai agent. leave it on a desk, plugged into power, and control it from your vps, your laptop at a cafe, or a cron job running workflows at 8am every morning. the phone doesn't need to be on the same wifi or even in the same country.

this is what makes old android devices useful again - they become always-on agents that can do things in apps that don't have apis.

troubleshooting

"adb: command not found" - install adb or set ADB_PATH in .env

"no devices found" - check usb debugging is on, tap "allow" on the phone

agent repeating itself - stuck detection handles this. if it persists, switch to a stronger model

contributors

built by unitedby.ai — an open ai community

acknowledgements

droidclaw's workflow orchestration was influenced by android action kernel from action state labs. we took the core idea of sub-goal decomposition and built a different system around it — with stuck recovery, 22 actions, multi-step skills, and vision fallback.

license

mit
