Skip to content

AI-powered secret detection for pre-commit pipelines using transformers and regex.

License

Notifications You must be signed in to change notification settings

markorskip/neuroleaks

Repository files navigation

NeuroLeaks

AI-powered secret detection for code commits using transformers and regex.

NeuroLeaks combines traditional regex scanning with CodeBERT, a state-of-the-art transformer model, to detect secrets like API keys, passwords, tokens, and credentials — even when they're obfuscated or renamed.


🔐 Why NeuroLeaks?

Feature Regex Only ML Only NeuroLeaks
Fast detection
Understands code context
Finds obfuscated secrets
Works in CI / locally
Pre-commit compatible

Quick Start

1. Clone the Repo

git clone https://github.com/YOUR_USERNAME/neuroleaks.git
cd neuroleaks

2. Install Dependencies

It's recommended to do this inside a virtual environment.

pip install -r requirements.txt

Using NeuroLeaks in Pre-commit

Step 1: Install pre-commit (if you haven’t already)

pip install pre-commit

Step 2: Reference NeuroLeaks in Your Project’s .pre-commit-config.yaml

repos:
  - repo: https://github.com/markorskip/neuroleaks
    rev: main  # or use a specific tag like v0.1.0
    hooks:
      - id: neuroleaks

Step 3: Install the Hook

From the root of your project:

pre-commit install

Step 4: Run It Manually (Optional)

pre-commit run --all-files

You’ll get output like:

NeuroLeaks Secret Detection......................................❌
- [REGEX] Line 3: 'api_key = "AIzaSy123..."' matched api key pattern
- [ML]    Line 9: 'token = ghp_abcd1234...' (risk score: 0.91)

Testing NeuroLeaks

NeuroLeaks includes a test suite to validate its ML and regex behavior.

Run the tests

pytest tests/

🧠 How It Works

  • Regex Engine: Scans added lines for common secret patterns (e.g., AWS keys, GitHub tokens, passwords).
  • Transformer Model: Uses microsoft/codebert-base to assess whether a line is likely to contain a secret, even if it's obfuscated or renamed.
  • Hybrid Detection: If regex doesn’t catch it, ML does. Both methods are used on staged diffs via git diff --cached.

License

MIT License


Badge

Want to display that your repo is checked by NeuroLeaks? Add this badge:

![NeuroLeaks Verified](https://img.shields.io/badge/Secrets-Checked%20by%20NeuroLeaks-blueviolet?style=flat&logo=ai)

Coming Soon

  • Docker support
  • Fine-tuned CodeBERT on real-world leaked secrets
  • GitHub Action for CI integration
  • Web dashboard to review and manage leaks

Questions or Feedback?

Open an issue or start a discussion. Contributions welcome!

About

AI-powered secret detection for pre-commit pipelines using transformers and regex.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages