experimental. i wanted to turn my old android devices into ai agents. after a few hours reverse engineering accessibility trees, hacking on the kernel, and playing with tailscale... it worked.
ai agent that controls your android phone. give it a goal in plain english - it figures out what to tap, type, and swipe. it reads the screen, asks an llm what to do, executes via adb, and repeats until the job is done.
one of the biggest things it can do right now is delegate incoming requests to chatgpt, gemini, or google search on the device... and hand the result back. a few years back this kind of automation needed predefined flows. think of this as automation with ai intelligence: you don't need to worry about messy apis. just install your favourite apps, write workflows or give it goals on the fly, and it will get them done.
```
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
```
```
curl -fsSL https://droidclaw.ai/install.sh | sh
```

installs bun and adb if missing, clones the repo, and sets up .env. or do it manually:
```
# install adb
brew install android-platform-tools

# install bun (required — npm/node won't work)
curl -fsSL https://bun.sh/install | bash

# clone and setup
git clone https://github.com/unitedbyai/droidclaw.git
cd droidclaw && bun install
cp .env.example .env
```

note: droidclaw requires bun, not node/npm. it uses bun-specific apis (`Bun.spawnSync`, native `.env` loading) that don't exist in node.
edit .env - the fastest way to start is with groq (free tier):

```
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
```

or run fully local with ollama (no api key needed):

```
ollama pull llama3.2
```

then in .env:

```
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.2
```

connect your phone (usb debugging on):

```
adb devices   # should show your device
bun run src/kernel.ts
```

that's the simplest way - just type a goal and let the agent figure it out. but for anything you want to run repeatedly, there are two modes: workflows and flows.
workflows are ai-powered. you describe goals in natural language, and the llm decides how to navigate, what to tap, what to type. use these when the ui might change, when you need the agent to think, or when chaining goals across multiple apps.
```
bun run src/kernel.ts --workflow examples/workflows/research/weather-to-whatsapp.json
```

each workflow is a json file - just a name and a list of steps:
```json
{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google.android.googlequicksearchbox", "goal": "search for chennai weather today" },
    { "goal": "share the result to whatsapp contact Sanju" }
  ]
}
```

you can also pass form data into steps when you need to inject specific text:
```json
{
  "name": "slack standup",
  "steps": [
    {
      "app": "com.Slack",
      "goal": "open #standup channel, type the message and send it",
      "formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
    }
  ]
}
```

35 ready-to-use workflows organised by category:
messaging - whatsapp, telegram, slack, email
- slack-standup - post daily standup to a channel
- whatsapp-broadcast - send a message to multiple contacts
- telegram-send-message - send a telegram message
- email-reply - draft and send an email reply
- whatsapp-to-email - forward whatsapp messages to email
- slack-check-messages - read unread slack messages
- email-digest - summarise recent emails
- telegram-channel-digest - digest a telegram channel
- whatsapp-reply - reply to a whatsapp message
- send-whatsapp-vi - send whatsapp to a specific contact
social - instagram, youtube, cross-posting
- social-media-post - post across platforms
- social-media-engage - like/comment on posts
- instagram-post-check - check recent instagram posts
- youtube-watch-later - save videos to watch later
productivity - calendar, notes, github, notifications
- morning-briefing - read messages, calendar, weather across apps
- github-check-prs - check open pull requests
- calendar-create-event - create a calendar event
- notes-capture - capture a quick note
- notification-cleanup - clear and triage notifications
- screenshot-share-slack - screenshot and share to slack
- translate-and-reply - translate a message and reply
- logistics-workflow - multi-app logistics coordination
research - search, compare, monitor
- weather-to-whatsapp - get weather via google ai mode, share to whatsapp
- multi-app-research - research across multiple apps
- price-comparison - compare prices across shopping apps
- news-roundup - collect news from multiple sources
- google-search-report - search google and save results
- check-flight-status - check flight status
lifestyle - food, transport, music, fitness
- food-order - order food from a delivery app
- uber-ride - book an uber ride
- spotify-playlist - create or add to a spotify playlist
- maps-commute - check commute time
- fitness-log - log a workout
- expense-tracker - log an expense
- wifi-password-share - share wifi password
- do-not-disturb - toggle do not disturb with exceptions
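all of the workflows above share the json shape from the earlier examples. a minimal typescript sketch of that shape and a loader check - the type names and the `validateWorkflow` helper here are illustrative, not the actual definitions from src/workflow.ts:

```typescript
// illustrative types matching the json examples above
type WorkflowStep = {
  app?: string;                        // package to launch before pursuing the goal
  goal: string;                        // natural-language goal for the llm
  formData?: Record<string, string>;   // literal text to inject into the goal
};

type Workflow = { name: string; steps: WorkflowStep[] };

// hypothetical loader check - rejects files that don't fit the shape
function validateWorkflow(raw: unknown): Workflow {
  const wf = raw as Workflow;
  if (typeof wf?.name !== "string") throw new Error("workflow needs a name");
  if (!Array.isArray(wf.steps) || wf.steps.length === 0)
    throw new Error("workflow needs at least one step");
  for (const step of wf.steps) {
    if (typeof step.goal !== "string") throw new Error("every step needs a goal");
  }
  return wf;
}
```

`app` and `formData` are optional on every step; only `goal` is required.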
flows are for tasks where you don't need ai thinking at all - just a fixed sequence of taps and types. no llm calls, instant execution. good for things you do exactly the same way every time.
```
bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml
```

```yaml
appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- wait: 2
- tap: "Contact Name"
- wait: 1
- tap: "Message"
- type: "hello from droidclaw"
- tap: "Send"
- done: "Message sent"
```

5 flow templates in examples/flows/:
- send-whatsapp - send a whatsapp message
- google-search - run a google search
- create-contact - add a new contact
- clear-notifications - clear all notifications
- toggle-wifi - toggle wifi on/off
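the flow format above is small enough to parse by hand. a line-based sketch of a parser for it - purely illustrative, the real runner (src/flow.ts) presumably uses a proper yaml parser:

```typescript
// hypothetical parser for the flow format shown above:
// key: value metadata, a "---" divider, then "- action: arg" steps
type FlowStep = { action: string; arg?: string | number };

function parseFlow(text: string): { meta: Record<string, string>; steps: FlowStep[] } {
  const meta: Record<string, string> = {};
  const steps: FlowStep[] = [];
  let inSteps = false;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "---") { inSteps = true; continue; }
    if (!line) continue;
    if (!inSteps) {
      // metadata header, e.g. `appId: com.whatsapp`
      const [key, ...rest] = line.split(":");
      meta[key.trim()] = rest.join(":").trim();
    } else {
      // step, e.g. `- wait: 2` or bare `- launchApp`
      const body = line.replace(/^- /, "");
      const i = body.indexOf(":");
      if (i === -1) { steps.push({ action: body }); continue; }
      const arg = body.slice(i + 1).trim().replace(/^"|"$/g, "");
      steps.push({ action: body.slice(0, i), arg: /^\d+$/.test(arg) ? Number(arg) : arg });
    }
  }
  return { meta, steps };
}
```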
| | workflows | flows |
|---|---|---|
| format | json | yaml |
| uses ai | yes | no |
| handles ui changes | yes | no |
| speed | slower (llm calls) | instant |
| best for | complex/multi-app tasks | simple repeatable tasks |
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free tier | no | fastest to start |
| ollama | free (local) | yes* | no api key, runs on your machine |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
*ollama vision requires a vision model like llama3.2-vision or llava
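the provider table could map to code along these lines - the names, the vision map, and the groq default are guesses for illustration; the real selection logic lives in src/llm-providers.ts:

```typescript
// illustrative provider selection from .env - not the actual implementation
type Provider = "groq" | "ollama" | "openrouter" | "openai" | "bedrock";

const SUPPORTS_VISION: Record<Provider, boolean> = {
  groq: false,        // no vision on the free tier
  ollama: true,       // only with a vision model (llama3.2-vision, llava)
  openrouter: true,
  openai: true,
  bedrock: true,
};

function pickProvider(env: Record<string, string | undefined>): Provider {
  // assumed default: groq, since it's the documented fastest way to start
  const p = (env.LLM_PROVIDER ?? "groq") as Provider;
  if (!(p in SUPPORTS_VISION)) throw new Error(`unknown LLM_PROVIDER: ${p}`);
  return p;
}
```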
all in .env:

| key | default | what |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.
the llm thinks before acting - returns { think, plan, action }. if the screen doesn't change for 3 steps, stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter), it falls back to screenshots.
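one way the "screen doesn't change for 3 steps" check could work is to fingerprint each screen dump and count consecutive repeats. a sketch under that assumption - the function name and the fingerprint are illustrative, not droidclaw's actual code:

```typescript
// hypothetical stuck detector: returns true once the screen fingerprint
// has repeated `threshold` times in a row
function makeStuckDetector(threshold = 3) {
  let last = "";
  let repeats = 0;
  return (screenDump: string): boolean => {
    // cheap stand-in for a real hash - any stable fingerprint works
    const fingerprint = `${screenDump.length}:${screenDump.slice(0, 64)}`;
    repeats = fingerprint === last ? repeats + 1 : 0;
    last = fingerprint;
    return repeats >= threshold; // true -> trigger stuck recovery
  };
}
```

any change to the screen resets the counter, so normal progress never trips it.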
```
src/
  kernel.ts          main loop
  actions.ts         22 actions + adb retry
  skills.ts          6 multi-step skills
  workflow.ts        workflow orchestration
  flow.ts            yaml flow runner
  llm-providers.ts   5 providers + system prompt
  sanitizer.ts       accessibility xml parser
  config.ts          env config
  constants.ts       keycodes, coordinates
  logger.ts          session logging
```
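to give a feel for what the sanitizer step does, here's a rough sketch: pull clickable nodes out of a `uiautomator dump`-style xml string and keep only the fields the llm needs, capped at MAX_ELEMENTS. regex parsing is a shortcut for illustration - src/sanitizer.ts is a real xml parser:

```typescript
// illustrative accessibility-dump filter - not droidclaw's actual sanitizer
type UiElement = { text: string; bounds: string };

function extractClickable(xml: string, maxElements = 40): UiElement[] {
  const out: UiElement[] = [];
  const nodeRe = /<node[^>]*clickable="true"[^>]*>/g;
  for (const m of xml.match(nodeRe) ?? []) {
    // prefer visible text, fall back to the content description
    const text =
      /text="([^"]+)"/.exec(m)?.[1] ?? /content-desc="([^"]+)"/.exec(m)?.[1] ?? "";
    const bounds = /bounds="([^"]+)"/.exec(m)?.[1] ?? "";
    if (text) out.push({ text, bounds });
    if (out.length >= maxElements) break;
  }
  return out;
}
```

dropping non-clickable and unlabeled nodes is what keeps the prompt small enough to send every step.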
the default setup is usb - phone plugged into your laptop. but you can go further.
install tailscale on both your android device and your laptop/vps. once they're on the same tailnet, connect adb over the network:
```
# on your phone: enable wireless debugging (developer options → wireless debugging)
# note the ip:port shown on the screen

# from your laptop/vps, anywhere in the world:
adb connect <phone-tailscale-ip>:<port>
adb devices   # should show your phone
bun run src/kernel.ts
```

now your phone is a remote ai agent. leave it on a desk, plugged into power, and control it from your vps, your laptop at a cafe, or a cron job running workflows at 8am every morning. the phone doesn't need to be on the same wifi or even in the same country.
this is what makes old android devices useful again - they become always-on agents that can do things in apps that don't have apis.
"adb: command not found" - install adb or set ADB_PATH in .env
"no devices found" - check usb debugging is on, tap "allow" on the phone
agent repeating - stuck detection handles this. if it persists, use a better model
built by unitedby.ai — an open ai community
droidclaw's workflow orchestration was influenced by android action kernel from action state labs. we took the core idea of sub-goal decomposition and built a different system around it — with stuck recovery, 22 actions, multi-step skills, and vision fallback.
mit