On March 19, 2026, Zach Rice, the original creator of Gitleaks, released Betterleaks, an MIT-licensed secret scanner that replaces his own widely used tool. The reason? Entropy-based detection, the method Gitleaks and most other scanners rely on, misses nearly a third of leaked secrets. Betterleaks uses a fundamentally different approach called Token Efficiency, built on BPE tokenization, and achieves 98.6% recall compared to Gitleaks’ 70.4%.
If you run CI/CD pipelines, push code with AI assistants like Claude Code or Cursor, or maintain any kind of deployment infrastructure, this matters. Leaked API keys, database credentials, and cloud tokens remain one of the most common attack vectors in production breaches. Here is what changed, how Betterleaks works, and how to set it up.
What Was Wrong With Gitleaks?
Gitleaks has been the go-to open-source secret scanner since its release in 2017. It scans Git repositories for hardcoded secrets using two detection methods: regex patterns for known credential formats and Shannon entropy for high-randomness strings that might be tokens or keys.
The regex approach works well for structured secrets — AWS access keys that start with AKIA, GitHub tokens prefixed with ghp_, or Stripe keys beginning with sk_live_. But many secrets do not follow predictable patterns. Generic API keys, database passwords, OAuth tokens, and webhook secrets are just random strings.
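To make the prefix-based approach concrete, here is a minimal Python sketch. The three patterns are simplified illustrations written for this article, not the actual rules shipped with Gitleaks or Betterleaks, and the rule names are made up.

```python
import re

# Illustrative patterns only; real scanner rule sets are more detailed.
KNOWN_FORMATS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe-live-key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}


def match_known_formats(text: str) -> list[str]:
    """Return the names of every known format found in text."""
    return [name for name, pattern in KNOWN_FORMATS.items() if pattern.search(text)]


# AKIAIOSFODNN7EXAMPLE is the placeholder key from AWS documentation.
print(match_known_formats("aws_key = AKIAIOSFODNN7EXAMPLE"))
print(match_known_formats("db_password = x9Lq2vRt8ZpW"))
```

The first line is caught because of its AKIA prefix. The second returns an empty list: the password is just as sensitive, but there is no structure for a regex to anchor on.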
That is where entropy detection was supposed to help. Shannon entropy measures randomness — a string like aB3xK9mP2qR7 has higher entropy than hello-world, so it is more likely to be a secret. The problem: entropy also flags Base64-encoded data, UUIDs, hashes, minified JavaScript, and URL parameters. The result is either too many false positives (making the scanner unusable in CI) or too many missed secrets when you tighten the threshold.
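For reference, character-level Shannon entropy fits in a few lines of Python. This is a generic sketch of the formula, not code taken from Gitleaks, and the UUID is an arbitrary sample added for comparison.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the character distribution of s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


for sample in ["hello-world", "aB3xK9mP2qR7", "550e8400-e29b-41d4-a716-446655440000"]:
    print(f"{sample!r}: {shannon_entropy(sample):.2f} bits/char")
```

The random-looking string scores higher than hello-world, as expected. The UUID also lands well above hello-world, which is exactly the false-positive problem: an entropy threshold cannot tell a harmless identifier from a credential.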
Rice documented the core issue in his announcement: entropy-based methods achieve around 70% recall in real-world benchmarks. That means roughly 3 out of every 10 leaked secrets pass through undetected.
Token Efficiency: How Betterleaks Actually Works
Betterleaks replaces entropy with a technique called Token Efficiency, which uses BPE (Byte Pair Encoding) tokenization, the same subword tokenization scheme used by large language models like GPT and Claude.
The idea is simple but clever. BPE tokenizers learn common patterns in text during training. When you feed normal English text or code through a BPE tokenizer, it compresses efficiently because the tokenizer recognizes familiar patterns: words, variable names, function calls. An identifier like database_connection_string splits into only a few tokens because its subwords are common.
Secrets, however, are random. A string like sk_live_4eC39HqLyjWDarjtT1zdp7dc does not compress well because the random portion does not match any learned patterns. It takes many more tokens to represent.
Token Efficiency measures the ratio of characters to tokens. A high ratio (many characters per token) means the string compressed well, so it is probably normal code. A low ratio (few characters per token) means the tokenizer struggled, so it is likely a secret. Here is how they compare:
| String | Characters | Tokens | Ratio | Classification |
|---|---|---|---|---|
| database_connection_string | 26 | 4 | 6.5 | Normal code |
| my_password_123 | 15 | 4 | 3.75 | Normal code |
| ghp_ABCDe1234fGHIj5678kLMNOpqr | 30 | 14 | 2.1 | Likely secret |
| 4eC39HqLyjWDarjtT1zdp7dc | 24 | 16 | 1.5 | Definitely secret |
This approach sidesteps the fundamental flaw of entropy detection. UUIDs, Base64 payloads, and hashes — the false positives that plague entropy scanners — actually tokenize somewhat efficiently because their character distributions follow patterns BPE has seen before. True secrets, generated from cryptographically secure random sources, do not.
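The characters-per-token idea can be sketched without a real tokenizer. The Python below uses a tiny hand-written vocabulary and greedy longest-match splitting as a stand-in for trained BPE merges. Both the vocabulary and the function names are hypothetical, and this is not how Betterleaks is implemented, so the ratios will not match the table above. The point is only the direction of the effect.

```python
# Toy stand-in for a trained BPE vocabulary (hypothetical; a real tokenizer
# learns tens of thousands of merges from a large corpus).
VOCAB = {"database", "connection", "string", "password", "config", "user", "my", "123", "_"}
MAX_PIECE = max(len(piece) for piece in VOCAB)


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split; unknown characters become 1-char tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_PIECE, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


def token_efficiency(text: str) -> float:
    """Characters per token: higher means the text compressed better."""
    tokens = tokenize(text)
    return len(text) / len(tokens) if tokens else 0.0


for sample in ["database_connection_string", "my_password_123", "4eC39HqLyjWDarjtT1zdp7dc"]:
    print(f"{sample}: {token_efficiency(sample):.2f} chars/token")
```

With this toy vocabulary the two identifiers come out at 5.2 and 3.0 characters per token, while the random string falls to 1.0 because no piece of it is recognized. A scanner built on this idea would flag strings whose ratio drops below some threshold.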
Benchmark Results
| Method | Recall | Precision | F1 Score |
|---|---|---|---|
| Shannon Entropy (Gitleaks default) | 70.4% | 82.1% | 75.8% |
| Regex-only (Gitleaks patterns) | 61.2% | 94.3% | 74.2% |
| Token Efficiency (Betterleaks) | 98.6% | 89.4% | 93.8% |
The recall jump from 70.4% to 98.6% is the headline number, but the F1 score tells the full story. Betterleaks detects far more secrets while keeping false positives manageable.
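F1 is the harmonic mean of precision and recall, so the last column can be checked directly from the other two. A short Python sketch using the figures from the table:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the benchmark table.
results = {
    "Shannon Entropy": (82.1, 70.4),
    "Regex-only": (94.3, 61.2),
    "Token Efficiency": (89.4, 98.6),
}
for method, (precision, recall) in results.items():
    print(f"{method}: F1 = {f1_score(precision, recall):.1f}%")
```

This reproduces the 75.8%, 74.2%, and 93.8% shown in the table. Because the harmonic mean punishes whichever metric is weaker, regex-only scores lowest despite having the best precision.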
Setting Up Betterleaks
Betterleaks is a drop-in replacement for Gitleaks. It reads existing Gitleaks configuration files, supports the same CLI flags, and produces compatible output formats. If you already run Gitleaks in CI, switching takes minutes.
Installation
Install from the official releases or build from source:
# Binary install (Linux amd64)
curl -sSL https://github.com/AikidoSec/betterleaks/releases/latest/download/betterleaks_linux_amd64.tar.gz | tar xz
sudo mv betterleaks /usr/local/bin/
# Or via Go
go install github.com/AikidoSec/betterleaks@latest
# Verify installation
betterleaks version
Scanning a Repository
# Scan the current directory
betterleaks detect
# Scan a specific repo path
betterleaks detect --source /path/to/repo
# Scan only staged changes (pre-commit hook)
betterleaks protect
# Output in JSON for CI integration
betterleaks detect --report-format json --report-path results.json
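Once you have a JSON report, a few lines of Python can summarize it for a CI log. This sketch assumes the report is a Gitleaks-style JSON array whose objects carry RuleID, File, and StartLine fields. That matches Gitleaks output, but treat it as an assumption for Betterleaks until you have checked a real report. The sample data is made up.

```python
import json
from collections import Counter


def summarize_report(report_json: str) -> Counter:
    """Count findings per rule in a Gitleaks-style JSON report (a list of objects)."""
    findings = json.loads(report_json)
    return Counter(f.get("RuleID", "unknown") for f in findings)


# Hypothetical sample; in CI you would read results.json from disk instead.
sample = json.dumps([
    {"RuleID": "aws-access-token", "File": "deploy/env.sh", "StartLine": 12},
    {"RuleID": "generic-api-key", "File": "app/settings.py", "StartLine": 40},
    {"RuleID": "generic-api-key", "File": "app/client.py", "StartLine": 7},
])
summary = summarize_report(sample)
for rule_id, count in summary.most_common():
    print(f"{rule_id}: {count}")
```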
Using Your Existing Gitleaks Config
If you have a .gitleaks.toml configuration file, Betterleaks reads it automatically:
# .gitleaks.toml — works with both Gitleaks and Betterleaks
[allowlist]
paths = [
'''vendor/''',
'''node_modules/''',
'''\.min\.js$''',
]
commits = [
"abc123def456",
]
[[rules]]
id = "custom-internal-token"
description = "Internal API token pattern"
regex = '''INTERNAL_[A-Za-z0-9]{32}'''
Custom rules and allowlists carry over unchanged. Betterleaks adds Token Efficiency on top of whatever regex rules you already have defined.
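Before relying on a custom rule, it helps to sanity-check the regex against a known-good and a known-bad sample. The Python sketch below does that for the rule and allowlist paths from the config above. Gitleaks is written in Go and uses Go's regex engine, so Python's re module is only a reasonable proxy for simple patterns like these. The helper names and sample token are hypothetical.

```python
import re

# Patterns copied from the .gitleaks.toml example.
rule = re.compile(r"INTERNAL_[A-Za-z0-9]{32}")
allowlist_paths = [re.compile(p) for p in (r"vendor/", r"node_modules/", r"\.min\.js$")]


def rule_matches(line: str) -> bool:
    """True when the custom rule finds a token anywhere in the line."""
    return rule.search(line) is not None


def is_allowlisted(path: str) -> bool:
    """True when any allowlist pattern matches somewhere in the path."""
    return any(p.search(path) for p in allowlist_paths)


token = "INTERNAL_" + "a1B2c3D4" * 4  # 32 alphanumeric characters after the prefix
print(rule_matches(f"api_token = {token}"))        # True
print(rule_matches("api_token = INTERNAL_short"))  # False
print(is_allowlisted("static/app.min.js"))         # True
print(is_allowlisted("src/app.js"))                # False
```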
CI/CD Integration
The real value of a secret scanner is catching leaks before they reach your remote repository. Here are the three most common integration points.
Pre-Commit Hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/AikidoSec/betterleaks
    rev: v1.0.0
    hooks:
      - id: betterleaks
This scans staged changes before every commit. If a secret is detected, the commit is blocked with a clear message showing the file, line number, and detected secret type.
GitHub Actions
# .github/workflows/secrets-scan.yml
name: Secret Scan
on: [push, pull_request]
jobs:
  betterleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: AikidoSec/betterleaks-action@v1
        with:
          args: detect --redact
GitLab CI
# .gitlab-ci.yml
secret-scan:
  stage: test
  image: ghcr.io/aikidosec/betterleaks:latest
  script:
    - betterleaks detect --report-format json --report-path gl-secret-detection-report.json
  artifacts:
    reports:
      secret_detection: gl-secret-detection-report.json
Why This Matters for AI-Assisted Development
Betterleaks was explicitly designed for a world where developers write code alongside AI agents. Tools like Claude Code, Cursor, and GitHub Copilot generate code at a pace that makes manual review of every line unrealistic. An AI assistant might autocomplete a configuration file with a placeholder that looks like a real API key, or pull in an example from training data that contains an actual leaked credential.
The speed matters. A pre-commit hook that takes 30 seconds to scan a large repo becomes a friction point when you are committing 20 times a day in an AI-assisted workflow. Betterleaks is optimized for incremental scanning — the protect command scans only staged changes, keeping the feedback loop under 2 seconds for typical commits.
The recall rate matters even more. When AI generates dozens of configuration files in a session, a scanner that misses 30% of secrets is not a safety net — it is a false sense of security. The jump to 98.6% recall closes that gap enough to be a genuine guardrail rather than a best-effort filter.
What Betterleaks Does Not Do
Betterleaks is a scanner, not a remediation platform. It tells you where secrets are — it does not rotate them, revoke them, or alert your team. For a complete secret management workflow, you still need:
- A secret manager: HashiCorp Vault, Doppler, or sops for encrypted files
- Automated rotation: cloud provider key rotation policies (AWS IAM, GCP service accounts)
- Monitoring: GitHub secret scanning alerts, GitGuardian, or similar for post-push detection
- Server-side security: firewalls, intrusion prevention, and network segmentation on your hosting environment (tools like Fail2ban or CrowdSec handle the perimeter layer)
Think of Betterleaks as the first line of defense. It catches secrets before they enter your Git history. Everything else handles what happens when something slips through or when secrets need to be rotated on schedule.
Migrating From Gitleaks
If you are already running Gitleaks, here is the migration checklist:
| Step | Action | Notes |
|---|---|---|
| 1 | Install Betterleaks alongside Gitleaks | Both can coexist during transition |
| 2 | Run both on your repo and compare output | betterleaks detect vs gitleaks detect |
| 3 | Review new findings from Betterleaks | Benchmark recall is ~28 percentage points higher, so expect noticeably more findings |
| 4 | Update allowlist for confirmed false positives | Same .gitleaks.toml format |
| 5 | Replace Gitleaks in CI pipelines | Swap binary/image name; config carries over |
| 6 | Update pre-commit hooks | Change repo URL in .pre-commit-config.yaml |
| 7 | Remove Gitleaks | Clean up old binary and CI references |
Run both scanners in parallel for at least one sprint before cutting over. The comparison output helps you understand what Token Efficiency catches that entropy misses — and more importantly, whether any new findings require immediate credential rotation.
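The comparison itself is easy to script. The Python sketch below diffs two JSON reports and lists what only the newer scanner found. It assumes both tools emit Gitleaks-style findings with File and StartLine fields, and it keys on file and line rather than rule ID because rule names may differ between scanners. The function names and sample data are hypothetical.

```python
import json


def finding_keys(report_json: str) -> set[tuple[str, int]]:
    """Reduce a Gitleaks-style report to comparable (file, line) keys."""
    return {(f["File"], f["StartLine"]) for f in json.loads(report_json)}


def new_findings(old_report: str, new_report: str) -> list[tuple[str, int]]:
    """Findings present in the new report but absent from the old one."""
    return sorted(finding_keys(new_report) - finding_keys(old_report))


# Hypothetical reports; in practice, load the JSON files each scanner wrote.
gitleaks_report = json.dumps([
    {"File": "deploy/env.sh", "StartLine": 12, "RuleID": "aws-access-token"},
])
betterleaks_report = json.dumps([
    {"File": "deploy/env.sh", "StartLine": 12, "RuleID": "aws-access-token"},
    {"File": "app/settings.py", "StartLine": 40, "RuleID": "generic-api-key"},
])
for path, line in new_findings(gitleaks_report, betterleaks_report):
    print(f"only in Betterleaks: {path}:{line}")
```

Anything in that delta that turns out to be a live credential should be rotated before you finish the migration.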
Securing Your Pipeline End to End
Secret scanning is one layer of a secure development pipeline. Recent incidents — including the Docker v29 breaking changes that exposed firewall misconfigurations — show how quickly infrastructure assumptions can become vulnerabilities.
A strong pipeline security stack on a self-hosted VPS looks like this:
- Pre-commit: Betterleaks for secret scanning
- CI pipeline: Container image scanning (Trivy or Grype) plus dependency audit
- Network: WireGuard VPN for admin access, firewall rules for exposed ports
- Runtime: Intrusion prevention, log monitoring, automated updates
If you are running CI/CD on your own infrastructure, a Cloud VPS with full root access gives you complete control over your pipeline security tooling — no vendor lock-in on which scanners or policies you can deploy. For teams that want the infrastructure secured without managing it themselves, CWH’s Managed Support handles server hardening, firewall configuration, and ongoing security patching.
Bottom Line
Betterleaks solves a real problem: entropy-based secret detection was not good enough, and the original Gitleaks author knew it. Token Efficiency is a smarter approach that leverages the same tokenization technology behind modern AI models to distinguish secrets from noise.
The migration is straightforward — same config format, same CLI interface, dramatically better detection. If you run any kind of CI/CD pipeline, switching from Gitleaks to Betterleaks is one of the fastest security improvements you can make this week. Install it, run both scanners in parallel, review the delta, and cut over.