Web-DL

Download the complete source and assets of any website for offline viewing.

Web-DL mirrors a website with wget, compresses it with archiver, and streams live progress back to your browser over a Socket.IO channel — then hands you a ready-to-download .zip.

Features

Safe by design — wget is launched with spawn() and an argument array (never a shell string), so a URL can't be interpreted as a command.
SSRF protection — the server DNS-resolves each host and refuses private, loopback, link-local and cloud-metadata addresses.
Tunable downloads — set crawl depth, include/exclude file types, a size quota, wait between requests, page requisites, and whether to follow external links.
Cancel anytime — stop a running job; the wget process is killed and partial files are removed.
Concurrency control — a configurable cap with a queue keeps the server from being overwhelmed.
Live progress — real-time progress bar, file count, current file and downloaded size.
Download history — list, re-download or delete previously generated zips.
Auto-cleanup — old zips are swept on an interval so disk usage stays bounded.
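
As a rough illustration of the auto-cleanup idea, the sketch below deletes `.zip` files whose modification time is older than a given TTL. The helper name `sweepOldZips` and its signature are assumptions made for this example; the project's real logic lives in lib/cleanup.js and may differ.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: remove .zip files in `dir` that are older than `ttlMs`.
// Returns the names of the files it deleted.
function sweepOldZips(dir, ttlMs, now = Date.now()) {
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith('.zip')) continue; // leave non-zip files alone
    const file = path.join(dir, name);
    const ageMs = now - fs.statSync(file).mtimeMs;
    if (ageMs > ttlMs) {
      fs.rmSync(file);
      removed.push(name);
    }
  }
  return removed;
}
```

A server could call such a helper on a timer, for example `setInterval(() => sweepOldZips('public/sites', 86400000), 3600000)`, which matches the TTL and interval defaults listed under Configuration.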

Getting started

Important: Web-DL runs wget as a child process, so make sure Node.js 18+ and wget are installed and on your PATH.

git clone https://github.com/nooblk-98/Web-DL.git
cd Web-DL
npm install
npm start

Then open http://localhost:3000, paste a URL, tweak the options, and download.

How it works

Every download is built from a fixed set of base wget flags:

wget --mirror --convert-links --adjust-extension --page-requisites --no-if-modified-since <url>

| Flag | Why |
| --- | --- |
| `--mirror` | recursive download of the whole site |
| `--convert-links` | rewrite links (including those in CSS) to relative paths for offline viewing |
| `--adjust-extension` | add `.html` / `.css` extensions based on content type |
| `--page-requisites` | fetch the CSS, JS and images needed to render each page |
| `--no-if-modified-since` | do not send `If-Modified-Since` conditional requests in the timestamping mode that `--mirror` enables |
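
These flags reach wget as separate array elements rather than as one shell string. To illustrate why that matters, the sketch below passes a string containing shell metacharacters to a child process. It uses `spawnSync` and the Node binary as a stand-in for wget so that it runs without wget installed; both choices are assumptions for the demo, not a description of how wget/index.js is written.

```javascript
const { spawnSync } = require('node:child_process');

// A URL-like string carrying shell metacharacters.
const hostile = 'https://example.com/; echo pwned';

// Stand-in child process: the Node binary writes its last argument back out.
// In Web-DL the command would be 'wget' and the array would hold the wget flags.
const result = spawnSync(
  process.execPath,
  ['-e', 'process.stdout.write(process.argv.at(-1))', hostile],
  { encoding: 'utf8' }
);

// No shell is involved, so the child receives the string as a single argument.
console.log(result.stdout === hostile);
```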

User-supplied options (depth, filters, --no-parent, etc.) are layered on top through a strict allowlist — raw flags from the client are never accepted.
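
A minimal sketch of the allowlist idea follows. Only known option keys are translated into wget flags, numeric values are clamped, and anything else the client sends is ignored. The option names (`depth`, `quotaMb`, `waitSeconds`, `accept`, `reject`, `noParent`, `spanHosts`) and the function `buildWgetArgs` are hypothetical; lib/wgetArgs.js may use different names and rules.

```javascript
// Base flags listed in this section.
const BASE_FLAGS = [
  '--mirror',
  '--convert-links',
  '--adjust-extension',
  '--page-requisites',
  '--no-if-modified-since',
];

// Upper bounds mirroring the documented defaults
// (MAX_DEPTH, MAX_QUOTA_MB, MAX_WAIT_SECONDS).
const LIMITS = { depth: 10, quotaMb: 2048, waitSeconds: 30 };

// Comma-separated extensions such as "html,css,png".
const EXTENSION_LIST = /^[a-z0-9]+(,[a-z0-9]+)*$/i;

// Parse an integer and clamp it to [min, max]; null when not a number.
function clampInt(value, min, max) {
  const n = Number.parseInt(value, 10);
  if (!Number.isInteger(n)) return null;
  return Math.min(Math.max(n, min), max);
}

function buildWgetArgs(url, options = {}) {
  const parsed = new URL(url); // throws on malformed input
  if (!['http:', 'https:'].includes(parsed.protocol)) {
    throw new Error('Only http and https URLs are allowed');
  }

  const args = [...BASE_FLAGS];

  // Depth starts at 1 because wget treats --level=0 as unlimited.
  const depth = clampInt(options.depth, 1, LIMITS.depth);
  if (depth !== null) args.push(`--level=${depth}`);

  const quotaMb = clampInt(options.quotaMb, 1, LIMITS.quotaMb);
  if (quotaMb !== null) args.push(`--quota=${quotaMb}m`);

  const wait = clampInt(options.waitSeconds, 0, LIMITS.waitSeconds);
  if (wait !== null) args.push(`--wait=${wait}`);

  for (const key of ['accept', 'reject']) {
    const value = options[key];
    if (typeof value === 'string' && EXTENSION_LIST.test(value)) {
      args.push(`--${key}=${value}`);
    }
  }

  if (options.noParent === true) args.push('--no-parent');
  if (options.spanHosts === true) args.push('--span-hosts');

  args.push(parsed.href); // the URL always goes last
  return args;
}
```

Because unknown keys are never read, a client that sends something like `{ rawFlags: '--execute=...' }` has no way to add a flag to the array.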

The request flows through the server like this:

Browser ──Socket.IO──▶ socket/socket.js ──▶ lib/jobQueue.js ──▶ wget/index.js (spawn wget)
   ▲                                                                    │
   │  live progress, file counts, status                               ▼
   └──────────────────── archiver/index.js ◀──── mirrored site folder (downloads/)
                                │
                                ▼
                    public/sites/<host>.zip  ──▶ served via express.static + /api/history

The client submits a URL and options over Socket.IO.
lib/urlGuard.js validates the URL and blocks SSRF targets; lib/wgetArgs.js builds a safe argument array.
lib/jobQueue.js enforces the concurrency cap; wget/index.js spawns wget and streams progress.
archiver/index.js zips the mirrored folder into public/sites/, the temp mirror is removed, and the download link is sent back.
lib/cleanup.js periodically deletes zips older than the configured TTL.
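
The concurrency cap in the steps above can be pictured as a small promise-based queue. The sketch below is one possible shape for it, assuming a class named `JobQueue` with a `run(task)` method; these names are illustrative and are not taken from lib/jobQueue.js.

```javascript
// Hypothetical queue: at most `limit` tasks run at once, the rest wait in order.
class JobQueue {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.waiting = [];
  }

  // `task` is a function returning a promise (for example, one wget job).
  // The returned promise settles with the task's own result.
  run(task) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, resolve, reject });
      this._next();
    });
  }

  // Start queued tasks while there is free capacity.
  _next() {
    while (this.running < this.limit && this.waiting.length > 0) {
      const { task, resolve, reject } = this.waiting.shift();
      this.running += 1;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.running -= 1;
          this._next(); // a slot is free, so pull the next waiting job
        });
    }
  }
}
```

With `new JobQueue(3)`, matching the documented default for `MAX_CONCURRENT_DOWNLOADS`, a fourth request would wait until one of the first three finishes.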

Configuration

Every setting is optional and falls back to the default listed below. Set them through environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `PORT` | `3000` | HTTP port |
| `MAX_CONCURRENT_DOWNLOADS` | `3` | Max simultaneous `wget` jobs |
| `ZIP_TTL_MS` | `86400000` (24h) | Age after which generated zips are deleted |
| `CLEANUP_INTERVAL_MS` | `3600000` (1h) | How often the cleanup sweep runs |
| `DOWNLOAD_ROOT` | `./downloads` | Working directory for site mirrors |
| `MAX_DEPTH` | `10` | Upper bound for user-supplied crawl depth |
| `MAX_QUOTA_MB` | `2048` | Upper bound for user-supplied max download size |
| `MAX_WAIT_SECONDS` | `30` | Upper bound for the wait-between-requests option |
| `ALLOW_PRIVATE_HOSTS` | `false` | Set `true` to permit localhost/private hosts (local testing only) |
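
For illustration, a loader for these variables might look like the sketch below. The function names and the shape of the returned object are assumptions; the actual code in config/ may be organised differently. Only the variable names and defaults come from the list above.

```javascript
// Read a positive integer from `env`, or use `fallback` when the value is
// missing, non-numeric, or not positive.
function intFromEnv(env, name, fallback) {
  const n = Number.parseInt(env[name] ?? '', 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Hypothetical loader; defaults are the documented ones.
function loadConfig(env = process.env) {
  return {
    port: intFromEnv(env, 'PORT', 3000),
    maxConcurrentDownloads: intFromEnv(env, 'MAX_CONCURRENT_DOWNLOADS', 3),
    zipTtlMs: intFromEnv(env, 'ZIP_TTL_MS', 86400000),
    cleanupIntervalMs: intFromEnv(env, 'CLEANUP_INTERVAL_MS', 3600000),
    downloadRoot: env.DOWNLOAD_ROOT || './downloads',
    maxDepth: intFromEnv(env, 'MAX_DEPTH', 10),
    maxQuotaMb: intFromEnv(env, 'MAX_QUOTA_MB', 2048),
    maxWaitSeconds: intFromEnv(env, 'MAX_WAIT_SECONDS', 30),
    // Only the exact string "true" enables private hosts.
    allowPrivateHosts: env.ALLOW_PRIVATE_HOSTS === 'true',
  };
}
```

Passing the environment in as a parameter keeps the loader easy to unit test, for example `loadConfig({ PORT: '8080' })`.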

HTTP API

Beyond the Socket.IO download channel, a small REST API manages generated zips:

| Method | Endpoint | Description |
| --- | --- | --- |
| `GET` | `/api/history` | List generated zips (`name`, `size`, `modified`, `url`) |
| `DELETE` | `/api/history/:name` | Delete one zip (path-traversal guarded) |
| `GET` | `/sites/:name.zip` | Re-download a zip (served via `express.static`) |
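
The path-traversal guard on the delete endpoint could be approximated as follows. This is a sketch under assumptions: the helper name `resolveZipPath` is invented for the example, and the real route may validate names differently.

```javascript
const path = require('node:path');

// Hypothetical guard: map a client-supplied name to a file inside `sitesDir`,
// or return null when the name could escape that directory.
function resolveZipPath(sitesDir, name) {
  if (typeof name !== 'string' || !name.endsWith('.zip')) return null;
  if (/[\\/]/.test(name)) return null; // no path separators allowed

  const root = path.resolve(sitesDir);
  const full = path.resolve(root, name);

  // Defence in depth: the resolved path must still sit under the root.
  return full.startsWith(root + path.sep) ? full : null;
}
```

A route handler would respond with an error status when the helper returns `null`, and delete the file only when it returns a path.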

Scripts

| Command | Description |
| --- | --- |
| `npm start` | Run the server |
| `npm run dev` | Run with `NODE_ENV=development` |
| `npm test` | Run the Jest unit tests |
| `npm run lint` / `npm run lint:fix` | ESLint |
| `npm run format` | Prettier |

Project structure

app.js              Express app wiring (routes, views, error handling)
bin/www             Server entry point (HTTP + Socket.IO + cleanup)
routes/             HTTP routes (index, users, history API)
socket/             Socket.IO download orchestration
lib/                Core logic: urlGuard, wgetArgs, jobQueue, activeJobs, sites, cleanup
wget/               wget process spawning + progress parsing
archiver/           Zips a mirrored site folder
config/             Central config, limits and constants
views/              Handlebars templates
public/             Static assets + generated zips (public/sites)
__tests__/          Jest unit tests

Security

Warning: The SSRF check runs before wget starts, but wget re-resolves DNS and follows redirects on its own, so a public host that redirects to an internal address could still be reached. Redirects are capped via --max-redirect; for hardened deployments, also run the server in a network-restricted environment.
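
For context, the pre-flight address check that this warning qualifies might look like the sketch below. It uses Node's `net.BlockList` and `dns.promises.lookup`; the specific ranges, the `isBlockedAddress` and `assertPublicHost` names, and the decision to refuse non-IP input are assumptions for illustration, not the contents of lib/urlGuard.js.

```javascript
const net = require('node:net');
const dns = require('node:dns').promises;

// Hypothetical blocklist covering the address classes named under Features.
const blocked = new net.BlockList();
blocked.addSubnet('0.0.0.0', 8, 'ipv4');      // "this network"
blocked.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
blocked.addSubnet('10.0.0.0', 8, 'ipv4');     // private
blocked.addSubnet('172.16.0.0', 12, 'ipv4');  // private
blocked.addSubnet('192.168.0.0', 16, 'ipv4'); // private
blocked.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, includes 169.254.169.254
blocked.addAddress('::1', 'ipv6');            // loopback
blocked.addSubnet('fc00::', 7, 'ipv6');       // unique local
blocked.addSubnet('fe80::', 10, 'ipv6');      // link-local

// True when `ip` is not a valid IP literal or falls in a blocked range.
function isBlockedAddress(ip) {
  const family = net.isIP(ip); // 4, 6, or 0 for "not an IP"
  if (family === 0) return true;
  return blocked.check(ip, family === 4 ? 'ipv4' : 'ipv6');
}

// Resolve every address for `hostname` and reject if any one is blocked.
async function assertPublicHost(hostname) {
  const records = await dns.lookup(hostname, { all: true });
  for (const { address } of records) {
    if (isBlockedAddress(address)) {
      throw new Error(`Refusing ${hostname}: resolves to ${address}`);
    }
  }
  return records.map((record) => record.address);
}
```

A check of this kind only describes the DNS answer at the moment it runs, which is why the redirect cap and network-level isolation are still recommended.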

Credits

Web-DL is built on top of the original Website-downloader by Ahmad Ibrahim — many thanks for the original work this project is based on.
