Website · Download · Report Bug · Changelog
A high-performance website crawler built for serious SEO audits. Targets 1M+ URLs on a single machine, with a dense Screaming Frog-style UI, 150+ SEO issue checks, and zero native dependencies.
FreeCrawl is a desktop SEO crawler that scales to 1M+ URLs on a single machine at 80–150 URL/s via undici with keep-alive, with optional JavaScript rendering through headless Chromium (Playwright) that captures the post-JS DOM, full-page / above-fold / mobile screenshots, LCP candidate elements, and a per-URL mobile usability audit. It runs 150+ on-page SEO checks across 30 top-level tabs: exact and near-duplicate clustering (SimHash + LSH) with a dedicated Cluster view, full hreflang validation (reciprocity, self-reference, inconsistent lang), OpenGraph / Twitter Card / JSON-LD / Web App Manifest parsing, AMP smoke validation, structured-data validation across 17 schema types (duplicate @id, malformed @type, missing required + Google-recommended props), a configurable performance budget (response-time / page-size / LCP / CLS ceilings), WCAG accessibility checks (<main> landmark, skip-link, ARIA roles, heading order), security headers + SSL/TLS chain audit + active/passive mixed content split, readability scores (Flesch, Flesch–Kincaid, Gunning Fog), and a 10-rule custom extraction engine (CSS selectors, XPath, regex, and JSONPath for JSON API responses) with a live Preview dialog and JSON import / export for sharing rule sets.
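To make the SimHash idea behind near-duplicate clustering concrete, here is a minimal sketch. It is not FreeCrawl's implementation: the 32-bit fingerprint width, the FNV-1a token hash, the whitespace/punctuation tokenizer, and the function names are all assumptions chosen for brevity, and the LSH bucketing step is not shown.

```typescript
// 32-bit FNV-1a string hash. Multiplication by the FNV prime 16777619 is
// written as shifts so the arithmetic stays exact in 32-bit integer space.
function fnv1a(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h >>> 0;
}

// SimHash: every token votes +1 / -1 on each bit position according to its
// hash; the fingerprint keeps the bits whose vote total is positive.
function simhash(text: string): number {
  const weights: number[] = [];
  for (let b = 0; b < 32; b++) weights.push(0);
  const tokens = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const h = fnv1a(token);
    for (let b = 0; b < 32; b++) {
      weights[b] += ((h >>> b) & 1) === 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let b = 0; b < 32; b++) {
    if (weights[b] > 0) fingerprint |= 1 << b;
  }
  return fingerprint >>> 0;
}

// Number of differing bits between two fingerprints; a small distance
// suggests the two texts are near-duplicates.
function hammingDistance(a: number, b: number): number {
  let x = (a ^ b) >>> 0;
  let count = 0;
  while (x !== 0) {
    count += x & 1;
    x >>>= 1;
  }
  return count;
}
```

Two pages would then be treated as near-duplicates when `hammingDistance(simhash(a), simhash(b))` falls below a chosen threshold; the threshold itself is a tuning decision this sketch does not prescribe.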
The dense dark UI renders virtualized 1M-row tables that live-stream rows during a crawl, with List / Tree / Cluster view toggles, column pin (sticky-left) + drag-to-reorder + show/hide, advanced AND/OR filters across 24 fields × 12 operators, per-tab quick-filter dropdowns, a 16-tab Details panel per URL (URL Details, Outline, Inlinks, Outlinks, Images, Resources, Extracted Data, SERP Snippet, HTTP Headers, Cookies, Structured Data, View Source, View Rendered, Screenshot, Duplicates, Analytics), an English + Turkish UI including every Settings panel, in-app scheduled crawls (hourly / daily / weekly / custom), a live memory monitor, robots.txt syntax validator, URL rewriting with live preview, project-vs-project compare diff, a Cytoscape link graph in its own native window (force-directed, BFS / radial / Sugiyama tree, and directory-tree layouts, a By-LCP above-the-fold colour overlay, and a click-to-trace Crawl Path Report that highlights the shortest discovery path from the homepage to any page), a standalone Log File Analyzer window (Apache / Nginx / IIS / custom access logs: bot hits per URL, crawl budget, response-code distribution, daily trend, and crawl × log orphan detection, with 40+ bot detection, optional reverse-DNS verification, per-bot URL filtering, and one-click CSV / Excel export of every table), and a .seoproject file association for OS double-click.
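The reason a 1M-row table can stay responsive is that a virtualized list only renders the rows near the viewport. The sketch below shows that windowing calculation in isolation; it assumes fixed-height rows, and the function and parameter names are illustrative rather than taken from FreeCrawl or @tanstack/react-virtual.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number;   // last row index to render (exclusive)
}

// Computes which rows need DOM nodes for the current scroll position.
// `overscan` adds a few extra rows above and below the viewport so fast
// scrolling does not reveal blank space.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const visible = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, first + visible + overscan);
  return { start, end };
}
```

With a 600 px viewport and 24 px rows, only about 35 rows are mounted at any moment regardless of whether the table holds a thousand rows or a million.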
Integrations cover Google Search Console (clicks, impressions, CTR, position + URL Inspection coverage), Google Analytics 4 (sessions, users, bounce, engagement), PageSpeed Insights Lighthouse audits, custom AI prompts via OpenAI / Anthropic Claude / local Ollama with {url}/{title}/{description}/{h1}/{body} variables, and SEO authority providers (Ahrefs / Majestic / Moz / Semrush) behind a single dropdown. Each integration card now ships with an in-app step-by-step Setup Guide modal in English and Turkish that walks you through OAuth client creation, test-user setup, API enablement, and the most common errors. Exports go through a unified Export Crawl Data dialog (Excel .xlsx / CSV UTF-8 / JSON / XML with hierarchical category selection and nested folder output) plus a standalone HTML audit report, sitemap generator (standard / image / hreflang / sharded / gzipped), and direct streaming to Google Sheets and BigQuery. Project files can be saved as password-protected encrypted snapshots (.seoproject.enc, AES-256-GCM + PBKDF2). The MCP server exposes 90 tools that let Claude Code or any MCP client drive crawls live, query every UI surface, run single or bulk exports, save / open encrypted project snapshots, fetch GSC/GA4 data on demand, and modify settings; every action a human user can take in the desktop app is callable from an agent. Crawl-completion webhooks, OS notifications, 22 built-in reports (histograms, top/bottom URLs, link positions, top words, cross-source orphans), per-URL Duplicates view, and a custom-CSS theme override round out the suite. Everything runs fully local: no telemetry, no cloud, MIT-style license.
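The {url}/{title}/{description}/{h1}/{body} prompt variables mentioned above suggest a simple placeholder substitution. The following is a sketch of how such a template could be expanded; the `PageFields` shape and the `renderPrompt` name are hypothetical, and how FreeCrawl treats unknown placeholders is not documented here, so this version simply leaves them untouched.

```typescript
// Hypothetical per-URL fields available to a prompt template.
interface PageFields {
  url: string;
  title: string;
  description: string;
  h1: string;
  body: string;
}

// Replaces each known {variable} with the page's value. Anything that is
// not one of the five documented variables is left as-is.
function renderPrompt(template: string, page: PageFields): string {
  const values: Record<string, string> = {
    url: page.url,
    title: page.title,
    description: page.description,
    h1: page.h1,
    body: page.body,
  };
  return template.replace(
    /\{(url|title|description|h1|body)\}/g,
    (_match: string, key: string) => values[key]
  );
}
```

A template such as `Suggest a better title for {url}. Current title: {title}` would then yield one concrete prompt per crawled page.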
| Layer | Choice |
|---|---|
| Runtime | Node.js 22 LTS+ (ESM-first) |
| Language | TypeScript 5.7+ strict |
| Desktop shell | Electron 41 |
| Build | electron-vite 5 / Vite 7 |
| UI | React 19 + Tailwind 3.4 + Zustand 5 |
| Tables | @tanstack/react-table + @tanstack/react-virtual |
| HTTP | undici 8 |
| HTML parse | cheerio (htmlparser2 fast path) |
| Queue | p-queue 8 |
| robots | robots-parser 3 |
| Storage | node:sqlite + WAL (zero native deps) |
| Distribution | electron-builder 26 |
Tip
End users: download the prebuilt installer from the Releases page; no setup is required.
Windows: easiest path is the .bat launcher
Double-click FreeCrawl-SEO-Tool-Start.bat at the repo root. It verifies Node.js, runs npm install on first launch, then starts the app with npm run dev.
Don't want to install? Grab the portable .exe from the Releases page; it runs without installation.
Or manually:
git clone https://github.com/kemalai/FreeCrawl-SEO-Tool.git
cd FreeCrawl-SEO-Tool
npm install
npm run dev
macOS: Apple Silicon + Intel
Easiest path is the FreeCrawl-SEO-Tool-Start.sh launcher at the repo root, with the same one-click flow as the Windows .bat (verifies Node, prompts to install on first run, then starts the app).
chmod +x FreeCrawl-SEO-Tool-Start.sh
./FreeCrawl-SEO-Tool-Start.sh
Or manually:
# 1. Install prerequisites (skip any you already have)
brew install node@22 git
xcode-select --install # Command Line Tools, required once
# 2. Clone and run
git clone https://github.com/kemalai/FreeCrawl-SEO-Tool.git
cd FreeCrawl-SEO-Tool
npm install
npm run dev
If macOS Gatekeeper blocks an unsigned local build ("App is damaged"):
xattr -cr "/Applications/FreeCrawl SEO.app"
Linux: Debian / Ubuntu / Fedora / Arch
Easiest path is the FreeCrawl-SEO-Tool-Start.sh launcher at the repo root (same as macOS).
Prebuilt installers are available for all three families: .AppImage (universal), .deb (Debian / Ubuntu), and .rpm (Fedora / RHEL).
# 1. Install Node.js 22 LTS (Debian / Ubuntu via NodeSource)
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs git
# Fedora / RHEL : sudo dnf install nodejs:22 git
# Arch : sudo pacman -S nodejs npm git
# 2. Clone and run
git clone https://github.com/kemalai/FreeCrawl-SEO-Tool.git
cd FreeCrawl-SEO-Tool
npm install
npm run dev
Some headless / minimal distros also need GTK/X11 runtime libs for Electron:
sudo apt install -y libgtk-3-0 libnss3 libasound2t64
CLI (headless crawl)
npm run build:cli
node apps/cli/dist/index.js https://example.com --depth 2 --max 500 --out out.csv
node apps/cli/dist/index.js --list urls.txt --out out.json # list mode + JSON
# Log file analysis: parse an access log into the project for crawl × log reporting
node apps/cli/dist/index.js analyze-logs samples/apache-access.txt --project crawl.seoproject
node apps/cli/dist/index.js analyze-logs access.log --format iis-w3c --verify-bots --json
Sample logs ship in samples/ (apache-access.txt, iis-access.txt) so you can try the Log File Analyzer immediately: in the desktop app, open Log Analyzer → Open Log Analyzer Window… and import one.
CI / CD recipes: ready-to-use GitHub Actions and GitLab CI examples that crawl your site on a schedule, fail the build when broken-URL count exceeds a threshold, and upload the crawl as an artifact.
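As a rough illustration of the "fail the build when broken-URL count exceeds a threshold" step, the sketch below turns a list of crawl rows into a process exit code. The `CrawlRow` shape (a `statusCode` field per URL) is a hypothetical stand-in, not the documented schema of the CLI's JSON output, and treating every status of 400 or above as broken is likewise an assumption.

```typescript
// Hypothetical row shape; the real CLI output schema may differ.
interface CrawlRow {
  url: string;
  statusCode: number;
}

// Counts rows whose HTTP status is a client or server error.
function countBroken(rows: CrawlRow[]): number {
  return rows.filter((row) => row.statusCode >= 400).length;
}

// Returns 1 (fail the CI job) when the broken count is above the allowed
// maximum, and 0 (pass) otherwise.
function exitCodeFor(rows: CrawlRow[], maxBroken: number): number {
  return countBroken(rows) > maxBroken ? 1 : 0;
}
```

A CI script would parse the crawl export into rows, call `exitCodeFor`, and exit with the returned code so the pipeline step fails when the threshold is exceeded.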
Production build (per-platform installers)
npm run build # all packages + desktop + CLI
npm --workspace apps/desktop run build:win # Windows installer (NSIS) + portable .exe
npm --workspace apps/desktop run build:mac # macOS DMG (arm64 + x64)
npm --workspace apps/desktop run build:linux # AppImage / .deb / .rpm
MCP server: query AND drive crawls from Claude / any MCP client
FreeCrawl ships an MCP (Model Context Protocol) server that exposes the active .seoproject to AI agents over stdio. Two capabilities in one server:
- Read-only data access to the SQLite project. It runs alongside the desktop app without contention (WAL allows concurrent readers).
- Live crawl control. When the desktop app is open, an agent can start / pause / resume / stop crawls and poll progress in real time. This goes through a localhost-only HTTP bridge (127.0.0.1, ephemeral random port, 32-byte Bearer token auth, discovery file written to <userData>/mcp-bridge.json on app launch).
90 tools: every action a human can take in the desktop app is callable from an agent. Representative groups:
| Group | Tools |
|---|---|
| Top-level data | get_summary, get_overview_counts, top_issues, query_urls, get_url_detail |
| Per-URL detail sub-tabs | get_url_source (raw + rendered + screenshot paths), get_url_inlinks, get_url_outlinks, get_url_images, get_url_headers, get_url_duplicates, get_url_analytics, get_url_cert |
| Integration rows | query_gsc, query_ga4, query_pagespeed, query_ai, query_seo |
| Specialised queries | query_images, query_broken_links, list_duplicate_clusters |
| Reports | report_status_code_histogram, report_indexability_distribution, report_content_kind_distribution, report_depth_histogram, report_response_time_histogram, report_inlinks_histogram, report_word_count_histogram, report_url_length_histogram, report_top_urls_by, report_top_anchor_texts, report_external_domain_health, report_image_weight_per_page, report_link_position_breakdown, report_pages_per_directory, report_word_count_per_directory, report_sitemap_orphans, report_analytics_coverage, report_server_header_breakdown, report_query_string_variants, report_sitemap_priority_mismatch, report_top_words |
| Exports & mutations | export_csv, export_json, export_xml, export_html_report, export_tabular, export_bulk, export_broken_links, export_images, sitemap_generate, sitemap_validate, robots_test, robots_validate, compare_load, crawl_add_url, respider_urls, remove_urls, data_delete_by_domain, graph_snapshot, get_crawl_path, url_rewrite_preview, extraction_preview |
| Project files | project_save_as, project_save_encrypted, project_open_encrypted |
| Integration fetch | google_auth_status, gsc_list_sites, gsc_fetch, ga4_list_properties, ga4_fetch, pagespeed_run, ai_run, seo_run, get_fetch_progress, cancel_fetch |
| Config & schedule | get_crawl_config, set_crawl_config, schedule_get, schedule_set |
| Project management | list_projects, set_project, current_project |
| Crawl control (desktop must be open) | start_crawl, stop_crawl, pause_crawl, resume_crawl, clear_crawl, get_crawl_progress, get_desktop_project |
start_crawl accepts a startUrl plus optional whitelisted overrides (scope, maxDepth, maxUrls, maxConcurrency, maxRps, crawlDelayMs, requestTimeoutMs, respectRobotsTxt, followRedirects, crawlExternal, userAgent, include/excludePatterns); anything you don't override keeps the desktop user's saved value. Crawls launched via MCP go through the same code path as the UI's Start button, so progress shows up in the desktop app live as the agent drives it. Call clear_crawl first when you want a fresh BFS after a completed crawl with the same seed URL (otherwise the crawler treats the re-start as a resume and exits because every URL is already in the DB).
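The override behaviour described above (whitelisted keys win, everything else keeps the saved value) can be sketched as a small merge function. This is an illustration only: the `CrawlConfig` type and `applyOverrides` name are hypothetical, and expanding "include/excludePatterns" into `includePatterns` and `excludePatterns` is an assumption about the real key names.

```typescript
type ConfigValue = string | number | boolean | string[];
type CrawlConfig = Record<string, ConfigValue>;

// Keys an agent may override, taken from the list in the paragraph above.
// The two *Patterns names are an assumed expansion of "include/excludePatterns".
const ALLOWED_OVERRIDES: string[] = [
  "scope", "maxDepth", "maxUrls", "maxConcurrency", "maxRps",
  "crawlDelayMs", "requestTimeoutMs", "respectRobotsTxt",
  "followRedirects", "crawlExternal", "userAgent",
  "includePatterns", "excludePatterns",
];

// Starts from the saved desktop settings and applies only whitelisted
// overrides; any other key in `overrides` is ignored.
function applyOverrides(saved: CrawlConfig, overrides: CrawlConfig): CrawlConfig {
  const merged: CrawlConfig = {};
  for (const key of Object.keys(saved)) {
    merged[key] = saved[key];
  }
  for (const key of Object.keys(overrides)) {
    if (ALLOWED_OVERRIDES.indexOf(key) !== -1) {
      merged[key] = overrides[key];
    }
  }
  return merged;
}
```

Returning a new object instead of mutating `saved` keeps the desktop user's stored settings intact after an agent-driven crawl.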
1. Build it once:
npm run build:mcp
This produces apps/mcp-server/dist/index.js.
2. Register it with your MCP client.
Claude Desktop
Edit your Claude Desktop config:
| Platform | Path |
|---|---|
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Linux | ~/.config/Claude/claude_desktop_config.json |
{
"mcpServers": {
"freecrawl": {
"command": "node",
"args": ["/absolute/path/to/FreeCrawl-SEO-Tool/apps/mcp-server/dist/index.js"]
}
}
}
Restart Claude Desktop. The freecrawl server appears under the tools icon.
Claude Code (CLI)
claude mcp add freecrawl -- node /absolute/path/to/FreeCrawl-SEO-Tool/apps/mcp-server/dist/index.js
Other MCP clients
Run the binary directly with stdio transport:
node apps/mcp-server/dist/index.js
The server speaks newline-delimited JSON-RPC 2.0; point any MCP-compatible client at it.
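For anyone wiring up a custom client, the newline-delimited framing amounts to one JSON object per line in each direction. The sketch below shows only that framing; the helper names are hypothetical, and a real client still has to perform the MCP `initialize` handshake before calling tools.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Serialises one request as a single line terminated by "\n".
function encodeRequest(id: number, method: string, params?: unknown): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: id, method: method };
  if (params !== undefined) {
    request.params = params;
  }
  return JSON.stringify(request) + "\n";
}

// Splits buffered server output into complete JSON messages and returns
// the trailing partial line (if any) so it can be prepended to the next chunk.
function decodeLines(buffer: string): { messages: unknown[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() || "";
  const messages: unknown[] = [];
  for (const line of parts) {
    if (line.trim().length > 0) {
      messages.push(JSON.parse(line));
    }
  }
  return { messages: messages, rest: rest };
}
```

For example, `encodeRequest(2, "tools/call", { name: "get_summary", arguments: {} })` produces the line a client would write to the server's stdin to invoke the get_summary tool listed earlier.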
3. Try it. Ask your agent things like:
- "Crawl https://example.com with maxDepth 3 and watch the progress."
- "Show the 10 URLs with the longest response time in my last crawl."
- "What are the top 5 issue categories with the most affected pages?"
- "List every URL with a missing meta description."
- "Pause the running crawl, then resume it once I've checked the first 1000 URLs."
Pointing at a non-default project:
By default the server reads <userData>/projects/default.seoproject (the same file the desktop app uses). Override with the FREECRAWL_PROJECT env var, or call the set_project tool mid-session:
{
"mcpServers": {
"freecrawl": {
"command": "node",
"args": ["/path/to/apps/mcp-server/dist/index.js"],
"env": { "FREECRAWL_PROJECT": "/path/to/audit.seoproject" }
}
}
}
For developers / source builds
| Component | Minimum | Where |
|---|---|---|
| Node.js | 22 LTS (24 also OK) | nodejs.org |
| npm | 10+ (ships with Node) | bundled |
| Git | any recent | git-scm.com |
Why no Python / MSBuild / node-gyp? FreeCrawl uses Node 22's built-in node:sqlite instead of better-sqlite3. There are zero native dependencies: npm install never invokes a C++ compiler.
Verify your setup:
node --version # v22.x.x or v24.x.x
npm --version # 10+
Runtime requirements (any platform)
- Outbound HTTPS access to the sites you crawl. Behind a corporate proxy? Set HTTPS_PROXY=http://your-proxy:port before launch, or enter a proxy URL in Settings → Network. HTTP/HTTPS proxies route through undici's ProxyAgent, and SOCKS5 / SOCKS4 proxies (socks5://…, including the h / 4a remote-DNS variants, handy for routing crawls through Tor or an SSH tunnel) are tunnelled automatically.
- TLS root certificates. Node ships with the Mozilla CA bundle. If your antivirus or company proxy performs HTTPS inspection (Kaspersky, ESET, Zscaler, BlueCoat, …), set NODE_EXTRA_CA_CERTS=C:\path\to\corp-ca-bundle.crt; otherwise crawls fail with UNABLE_TO_GET_ISSUER_CERT_LOCALLY.
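The split between HTTP(S) proxies and SOCKS proxies described in the first bullet comes down to looking at the URL scheme. The sketch below shows one way to make that decision; `classifyProxy` and `ProxyKind` are hypothetical names, and the actual agent construction (undici's ProxyAgent or a SOCKS tunnel) is intentionally left out.

```typescript
type ProxyKind = "http" | "socks" | "unsupported";

// Decides which transport a proxy URL needs, based only on its scheme.
// socks5h and socks4a are the remote-DNS variants of SOCKS5 and SOCKS4.
function classifyProxy(proxyUrl: string): ProxyKind {
  const match = /^([a-z][a-z0-9+.-]*):\/\//i.exec(proxyUrl.trim());
  if (match === null) {
    return "unsupported"; // no scheme, e.g. "your-proxy:8080"
  }
  const scheme = match[1].toLowerCase();
  if (scheme === "http" || scheme === "https") {
    return "http";
  }
  if (scheme === "socks5" || scheme === "socks5h" || scheme === "socks4" || scheme === "socks4a") {
    return "socks";
  }
  return "unsupported";
}
```

Rejecting scheme-less values early gives the user a clear configuration error instead of a connection failure deep inside a crawl.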
Disk + memory budget
| Resource | Size |
|---|---|
| node_modules after npm install | ~600 MB |
| Production Electron build | ~150 MB |
| Peak RAM, 100K-URL crawl | ~100 MB |
| 1M-URL crawl | comfortably under 1 GB |
FreeCrawl-SEO-Tool/
├── FreeCrawl-SEO-Tool-Start.bat   # Windows one-click launcher
├── FreeCrawl-SEO-Tool-Start.sh    # macOS / Linux one-click launcher
├── CHANGELOG.md                   # versioned release notes
├── apps/
│   ├── desktop/                   # Electron app (main + preload + renderer)
│   ├── cli/                       # headless Node CLI
│   └── mcp-server/                # MCP server for AI agents
└── packages/
    ├── shared-types/              # IPC + domain types
    ├── db/                        # ProjectDb (node:sqlite) + migrations
    └── core/                      # crawler engine (UI-agnostic)
Dependency graph
graph LR
A[shared-types] --> B[db]
B --> C[core]
C --> D[desktop]
C --> E[cli]
B --> F[mcp-server]
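One practical use of the graph above is deriving a safe build order. The sketch below assumes each arrow means "is consumed by" (so db depends on shared-types, and so on) and runs a depth-first topological sort over that reading; the `DEPENDS_ON` map and `buildOrder` function are illustrative and not part of the repo's tooling.

```typescript
// Each package mapped to the workspace packages it depends on,
// read off the graph above under the "arrow = is consumed by" assumption.
const DEPENDS_ON: Record<string, string[]> = {
  "shared-types": [],
  "db": ["shared-types"],
  "core": ["db"],
  "desktop": ["core"],
  "cli": ["core"],
  "mcp-server": ["db"],
};

// Depth-first topological sort: a package is emitted only after all of
// its dependencies. Throws if the graph contains a cycle.
function buildOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  function visit(pkg: string): void {
    if (state[pkg] === "done") return;
    if (state[pkg] === "visiting") {
      throw new Error("dependency cycle at " + pkg);
    }
    state[pkg] = "visiting";
    const deps = graph[pkg] || [];
    for (const dep of deps) {
      visit(dep);
    }
    state[pkg] = "done";
    order.push(pkg);
  }

  for (const pkg of Object.keys(graph)) {
    visit(pkg);
  }
  return order;
}
```

Under this reading, mcp-server only needs shared-types and db to be built first, which is consistent with it reading the project database directly rather than going through the crawler engine.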
Note
Active development. All 30 analysis tabs (Internal, External, Response Codes, URL, Page Titles, Meta Description, H1, H2, Content, Images, Canonicals, Directives, Redirects, Pagination, Hreflang, AMP, Structured Data, Meta Refresh, Custom Extraction, Custom Search, Security, Duplicates, Links, Broken Links, SERP, PageSpeed, Search Console, GA4, AI, SEO Authority) are working, as are the standalone Visualization window, advanced search, per-tab quick-filter dropdown + List/Tree view toggle, 150+ issue categories, sitemap export variants, Export Crawl Data dialog (XLSX / CSV-UTF-8 / JSON / XML with hierarchical tree picker + nested folder output), list mode, custom extraction, near-duplicate + exact-duplicate detection, hreflang validation, project compare, Cytoscape visualization (force-directed / BFS / radial / Sugiyama / directory-tree layouts + LCP above-the-fold overlay + click-to-trace Crawl Path Report), Basic / Digest / Bearer + form-based login (HTTP multi-step + Playwright browser-driven SPA login), HTTP / SOCKS5 proxy, webhook, MCP server with crawl control + live progress, Google PageSpeed / Search Console / URL Inspection / GA4 integrations, AI per-URL prompts (OpenAI / Anthropic / Ollama), SEO Authority providers (Ahrefs / Majestic / Moz / Semrush), Google Sheets + BigQuery direct export, encrypted project snapshots (AES-256-GCM), cross-source orphan detection, JavaScript rendering with Playwright (post-JS DOM, screenshot capture, LCP candidate, Mobile Usability audit), memory-limit auto-pause watchdog, OS notifications, robots.txt syntax validator, URL rewriting + preview, status-code diagnosis banner, live memory monitor, in-app scheduled crawl, multi-language UI (EN + TR) with full Settings coverage, standalone Log File Analyzer (Apache / Nginx / IIS / custom access logs: bot hits per URL, crawl budget, response-code distribution, daily trend, crawl × log orphan detection, 40+ bot detection with reverse-DNS verification, per-bot URL filter, CSV / Excel export, and a headless analyze-logs CLI), .seoproject file association, in-app logs, and diagnostic popups. Cross-platform installers (Windows .exe + portable, macOS .dmg, Linux .AppImage / .deb / .rpm): release builds ship Playwright Chromium offline so JS rendering works on first launch. Live-streaming UX with first row in ~1 s.
Upcoming (V2): Plugin system, Light theme, Multi-window, Code-signing + auto-update.
| Found a bug? | Open an issue |
| Have a feature idea? | Start a discussion |
| Want the prebuilt app? | Download a release |
| Project website | freecrawl.net |
MIT: see LICENSE
Built with ❤ for SEO professionals who want a fast, free, open alternative to Screaming Frog.