🤖 Reddit Scraper

Modular Reddit data collection framework
Scrape subreddits, posts, and users into clean structured JSON.

✨ Overview

A modular Reddit scraping pipeline designed for data collection, analytics, and research workflows.

The project gathers structured data about:

📚 Subreddits
📝 Posts
👤 Users

and exports everything as clean JSON datasets ready for:

databases
machine learning pipelines
analytics
data exploration

A single command (python run.py) runs the whole pipeline; no manual scraping steps are required.

🚀 Features

Modular scraper architecture
Structured JSON output
Automated scraping workflow
MongoDB import script
Splitting utility for large JSON datasets
Environment-based configuration

Collects

Entity	Data
Subreddits	metadata & statistics
Posts	content, scores, engagement
Users	profile & activity info

🧠 How It Works


run.py
│
├── subreddits.py
├── posts.py
└── users.py
↓
JSON datasets
↓
(optional) MongoDB import

Each scraper is independent and reusable.

📦 Installation

1️⃣ Clone & set up environment

git clone https://github.com/glowfi/reddit-scraper
cd reddit-scraper

python -m venv env
source env/bin/activate      # Linux / macOS
# env\Scripts\activate       # Windows

pip install -r requirements.txt

2️⃣ Configure API credentials

Edit env-sample and rename it to .env:

username=<RedditUsername>
password=<RedditPassword>
client_id=<ClientID>
client_secret=<ClientSecret>

TOTAL_SUBREDDITS_PER_TOPICS=6
SUBREDDIT_SORT_FILTER="hot"
POSTS_PER_SUBREDDIT=10
POSTS_SORT_FILTER="new"

Create Reddit API credentials here:

👉 https://www.reddit.com/prefs/apps
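The sample settings above are plain KEY=value lines. The stdlib sketch below shows how such a file could be read, but it is not the project's own loader, which may use a library such as python-dotenv instead. It parses the file, checks that the four credentials are present, and converts the two count settings to integers. `parse_env` and `load_config` are hypothetical names.

```python
from pathlib import Path

# Key names come from the sample .env above; the parsing logic is a hypothetical sketch.
REQUIRED_KEYS = ("username", "password", "client_id", "client_secret")
INT_KEYS = ("TOTAL_SUBREDDITS_PER_TOPICS", "POSTS_PER_SUBREDDIT")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments and stripping quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "hot" -> hot
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_config(path: str = ".env") -> dict[str, object]:
    """Read .env, require the credentials, and convert numeric limits to int."""
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    missing = [k for k in REQUIRED_KEYS if not values.get(k)]
    if missing:
        raise ValueError(f"missing settings in {path}: {', '.join(missing)}")
    config: dict[str, object] = dict(values)
    for key in INT_KEYS:
        if key in values:
            config[key] = int(values[key])
    return config
```

Failing early on missing credentials gives a clear error before any API request is attempted.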

3️⃣ Run scraper

python run.py

Pipeline execution:

Scrape subreddits
Scrape posts
Scrape users
Export JSON datasets
Optional dataset splitting
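The steps above map naturally onto a chain of independent functions that each export their own JSON file. The sketch below is not the project's run.py. `scrape_subreddits`, `scrape_posts` and `scrape_users` are hypothetical stubs returning made-up records in place of whatever subreddits.py, posts.py and users.py actually expose. Passing each step's output to the next, and deriving users from post authors, are likewise assumptions.

```python
import json
from pathlib import Path
from typing import Callable

Record = dict
# Hypothetical step signature: each scraper receives the previous step's records.
Step = Callable[[list[Record]], list[Record]]


def run_pipeline(steps: list[tuple[str, Step]], out_dir: str = "output") -> dict[str, int]:
    """Run each scraper in order and export its records to <out_dir>/<name>.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    previous: list[Record] = []
    counts: dict[str, int] = {}
    for name, step in steps:
        records = step(previous)
        (out / f"{name}.json").write_text(json.dumps(records, indent=2), encoding="utf-8")
        counts[name] = len(records)
        previous = records
    return counts


# Stub scrapers standing in for subreddits.py, posts.py and users.py (hypothetical data).
def scrape_subreddits(_: list[Record]) -> list[Record]:
    return [{"name": "python"}, {"name": "learnpython"}]


def scrape_posts(subreddits: list[Record]) -> list[Record]:
    return [{"subreddit": s["name"], "title": f"post in {s['name']}", "author": "u1"} for s in subreddits]


def scrape_users(posts: list[Record]) -> list[Record]:
    # Deduplicate authors so each user would be fetched only once.
    return [{"username": name} for name in sorted({p["author"] for p in posts})]
```

Because every step only consumes and returns plain records, any one of them can be rerun or reused on its own.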

📊 Output Examples

The example JSON files are large (16–25 MB); download them instead of viewing them in the browser.
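Once downloaded, a dataset can be summarised from Python rather than opened in a browser. This sketch assumes each file holds a single top-level JSON array of objects; the exact layout and file names are assumptions. It loads the whole file into memory, which is workable at the 16–25 MB sizes mentioned above.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(path: str, sample: int = 1) -> dict:
    """Return the record count, keys ordered by frequency, and a few sample records."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Count how many records contain each key to reveal the document shape.
    key_counts = Counter(k for r in records if isinstance(r, dict) for k in r)
    return {
        "count": len(records),
        "keys": [k for k, _ in key_counts.most_common()],
        "sample": records[:sample],
    }
```

For example, print(json.dumps(summarize("posts.json"), indent=2)) would show the fields present in a (hypothetically named) posts file.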

🗂️ Project Structure

reddit-scraper/
├── subreddits.py
├── posts.py
├── users.py
├── run.py
├── utils/
│   ├── split.py
│   └── import_data_to_mongodb.sh
└── output/

🧩 Utilities

Tool	Purpose
`run.py`	Executes full scraping pipeline
`utils/split.py`	Splits large JSON datasets
`utils/import_data_to_mongodb.sh`	Bulk imports JSON datasets into MongoDB
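The interface of utils/split.py isn't described here. Purely as an illustration, splitting a large top-level JSON array into fixed-size part files could look like the sketch below. `split_json`, its `chunk_size` parameter and the <stem>_partN.json naming are assumptions, not the script's actual behavior.

```python
import json
from pathlib import Path
from typing import Optional


def split_json(path: str, chunk_size: int, out_dir: Optional[str] = None) -> list[Path]:
    """Split a top-level JSON array into files holding at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    src = Path(path)
    records = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Write parts next to the source file unless another directory is given.
    target = Path(out_dir) if out_dir else src.parent
    target.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    for i in range(0, len(records), chunk_size):
        part = target / f"{src.stem}_part{i // chunk_size + 1}.json"
        part.write_text(json.dumps(records[i : i + chunk_size]), encoding="utf-8")
        parts.append(part)
    return parts
```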

🗄️ MongoDB Import

After scraping:

./utils/import_data_to_mongodb.sh

Ensure MongoDB is running beforehand.
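The contents of import_data_to_mongodb.sh aren't shown here. If you would rather import from Python, one possible approach is to batch documents into pymongo's `Collection.insert_many`; this is an assumption, not the script's method. `import_json` and `batched` are hypothetical helpers. The function accepts any object with an `insert_many` method, so it can be exercised without a running server.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def batched(items: list[Any], size: int) -> Iterator[list[Any]]:
    """Yield consecutive slices of at most `size` items (never an empty slice)."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


def import_json(collection: Any, path: str, batch_size: int = 1000) -> int:
    """Insert a JSON array file into `collection` in batches; return the number inserted."""
    docs = json.loads(Path(path).read_text(encoding="utf-8"))
    total = 0
    for batch in batched(docs, batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total


# Usage against a real server (requires `pip install pymongo`; names are hypothetical):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   import_json(client["reddit"]["posts"], "output/posts.json")
```

Batching keeps each insert request bounded in size when a dataset is tens of megabytes.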

⚠️ Notes

Reddit API rate limits apply
Scraping speed depends on network/API limits
Designed for research & data workflows
Respect Reddit API terms of service
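Because Reddit enforces rate limits server-side, a scraper has to pace its own requests. The sketch below is a generic client-side sliding-window limiter and not necessarily what this project does. The budget (`max_calls` per `period` seconds) is an assumed parameter that you should set from Reddit's current API documentation. The clock and sleep functions are injectable so the limiter can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(
        self,
        max_calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()

    def wait(self) -> None:
        """Block until another call fits in the window, then record it."""
        now = self.clock()
        # Forget calls that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires, then drop it.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Calling limiter.wait() before each API request keeps the scraper within the chosen budget.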

🤝 Contributing

Contributions, improvements, and issue reports are welcome.

Small focused PRs are preferred.

📄 License

GPL-3.0
