Going Paperless in 2026: The Complete Guide to Paperless-ngx

Transform your paper chaos into a searchable digital archive with Paperless-ngx. Self-hosted document management with OCR, AI tagging, and complete privacy.

Take control of your documents with the best self-hosted document management system.


The Paper Problem We All Ignore

You know that drawer. The one stuffed with old receipts, tax documents from three years ago, warranty cards for appliances you no longer own, and cables for devices you’ve never even heard of. We all have one. Some of us have several.

The average American household receives 41 pounds of junk mail every year. Add in utility bills, insurance statements, medical records, and the instruction manual for that blender you bought in 2019, and you’re drowning in paper. Finding a specific document when you need it becomes an archaeological expedition. “Where did I put the warranty for the dishwasher?” is a question that triggers a full-scale excavation.

But here’s the real problem: paper is fragile. One flood, one fire, one over-caffeinated morning with a spilled coffee, and years of records are gone. Paper doesn’t have a backup. Paper doesn’t have search. Paper doesn’t have a “find all receipts from Amazon in 2024” button.

What you need is a paperless office system — one that’s completely under your control. No cloud subscription fees, no privacy concerns about who’s reading your tax returns, and no vendor lock-in when the service shuts down or changes its pricing model.

That’s where Paperless-ngx comes in.

What is Paperless-ngx?

Paperless-ngx is an open-source document management system that transforms your paper chaos into a searchable, organized digital archive. It’s the community-supported successor to the original Paperless project, actively maintained by contributors who use it themselves and backed by thousands of self-hosting enthusiasts.

Here’s what makes it powerful:

  • Complete control: Your documents live on your server, in your home. No third-party access. No subscription fees. No “we updated our terms of service” surprises.
  • Intelligent OCR: Every document becomes fully searchable. Need to find that one receipt with “ACME Corp” on it? Type the name, and Paperless finds it in seconds.
  • Smart organization: Automatic tagging, document type classification, and correspondent tracking. Set up rules once, and Paperless organizes everything going forward.
  • AI-powered suggestions: The system learns your filing habits and suggests tags and correspondents for new documents.
  • Integration ready: Works with Home Assistant, Nextcloud, n8n, and practically any tool that speaks HTTP.

In 2026, with privacy concerns at an all-time high and subscription fatigue setting in across every service category, self-hosting your document management isn’t just cost-effective — it’s a statement about digital sovereignty. Your financial records, medical documents, and contracts should belong to you, not a corporation.

Key Features That Matter

Paperless-ngx isn’t just a document scanner with a web interface. It’s a full document management platform with features that rival enterprise solutions costing hundreds per month.

Document Processing

  • Optical Character Recognition (OCR): Powered by Tesseract, Paperless extracts text from scanned documents, PDFs, and images. Every word becomes searchable, even in multi-page documents scanned at an angle.
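  • Automatic tagging: Define matching rules such as “if the content contains ‘invoice’, add the tag ‘invoice’” and Paperless applies them to every new document. Matching works on the document text (keywords, regular expressions, or learned patterns), so a numeric condition like “amount > $500” needs a regular expression or an external script.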
  • Document types: Categorize documents as invoices, receipts, contracts, bank statements, medical records, warranties, or custom types you define.
  • Correspondents: Track who sent or received each document — companies, government agencies, contractors.
  • Storage paths: Automatically organize files by year, month, document type, or custom logic.
  • Full-text search: Find any document by any word it contains, instantly.

AI and Automation

  • Machine learning: Paperless observes how you tag documents and suggests matching tags for similar new uploads.
  • Barcode recognition: Scan documents with barcodes for automatic routing and identification.
  • Email processing: Connect your email accounts via IMAP, and Paperless will automatically import attachments from specified senders. Perfect for digital statements and receipts.
  • Workflow automations: Create processing pipelines — import, classify, tag, notify, archive.
  • Full REST API: Integrate with any automation platform or write custom scripts.

User Experience

  • Modern web interface: Clean, responsive, and mobile-friendly. No desktop app required.
  • Bulk operations: Select multiple documents and change tags, correspondents, or types in one action.
  • Saved views: Create custom filters like “All tax documents from 2026” or “Receipts over $100 this month.”
  • In-browser preview: View PDFs without leaving the interface.
  • Dark mode: Because who likes staring at white screens at midnight?

Security and Privacy

  • Self-hosted: Documents never leave your network unless you choose to expose them.
  • Multi-user support: Create accounts for family members with permission controls.
  • Two-factor authentication: Keep your documents secure with 2FA.
  • Audit logging: Track who accessed or modified what and when.
  • Optional encryption: Add at-rest encryption for sensitive documents, for example by keeping the media directory on an encrypted disk or volume.

Hardware Requirements

One of the best things about Paperless-ngx is that it’s surprisingly lightweight. You don’t need a server rack or enterprise hardware to run it effectively.

Minimum Requirements (Home/Light Use)

Resource | Requirement
CPU | 2 cores
RAM | 2GB (4GB recommended)
Storage | 50GB+ (depends on document volume)
OS | Linux (Docker), macOS, or Windows

Resource | Recommendation
CPU | 4+ cores
RAM | 8GB+ (OCR is memory-intensive)
Storage | 500GB+ SSD (for fast search indexing)
Optional | GPU for accelerated OCR (not required)

Storage Planning

The storage question deserves some thought. A typical scanned page at 300 DPI results in roughly 1MB of data. If you’re digitizing 10,000 pages (a moderately sized household archive), that’s about 10GB. But Paperless also keeps an archived, searchable copy of each document alongside the original, plus thumbnails, the search index, and metadata, so plan for approximately 1.5-2MB per page in total, or 15-20GB for that 10,000-page archive.
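The per-page arithmetic above can be checked with a quick calculation. This is a minimal sketch: the function name estimate_storage_gb is made up for illustration, and it uses 1GB = 1000MB to match the round numbers in the text.

```shell
#!/usr/bin/env bash
# Rough storage estimate: pages x MB-per-page, reported in GB.
# Uses 1 GB = 1000 MB, matching the round numbers in the text.
estimate_storage_gb() {
  local pages="$1" mb_per_page="$2"
  awk -v p="$pages" -v mb="$mb_per_page" 'BEGIN { printf "%.1f\n", p * mb / 1000 }'
}

estimate_storage_gb 10000 1.5   # lower bound, prints 15.0
estimate_storage_gb 10000 2     # upper bound, prints 20.0
```

Swap in your own page count to size a disk before you start scanning.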

The good news: storage is cheaper than ever. A 1TB SSD costs less than a decent scanner. Your grandmother’s filing cabinet? Not so scalable.

Installation Guide: Docker Setup

Paperless-ngx is designed to run in Docker, which makes deployment straightforward regardless of your host operating system. Here’s a complete, production-ready setup.

The fastest way to get started is with the official installation script:

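bash -c "$(curl --location --silent --show-error https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"

The script asks a few questions interactively, which is why it is run with bash -c rather than piped into bash. It then downloads the latest release, creates the necessary directory structure, and generates a docker-compose.yml with sensible defaults.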

Manual Docker Compose Setup

For those who prefer full control, here’s a complete docker-compose.yml that you can customize:

version: "3.9"

services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    environment:
      # Core settings
      PAPERLESS_URL: https://paperless.yourdomain.com
      PAPERLESS_SECRET_KEY: "change-this-to-a-long-random-string"
      
      # Database (PostgreSQL recommended for production)
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPORT: 5432
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: your-database-password
      
      # Redis broker
      PAPERLESS_REDIS: redis://broker:6379
      
      # Admin user (set once, then remove)
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: your-secure-password
      
      # OCR settings
      PAPERLESS_OCR_LANGUAGE: eng
      
      # Optional: allow HTTP Basic authentication for API clients
      PAPERLESS_ENABLE_HTTP_BASIC_AUTH: "true"
      
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5

  db:
    image: docker.io/library/postgres:15
    container_name: paperless-db
    restart: unless-stopped
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: your-database-password
    volumes:
      - ./pgdata:/var/lib/postgresql/data

  broker:
    image: docker.io/library/redis:7
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - ./redisdata:/data

Save this as docker-compose.yml, then run:

docker compose up -d

Docker Compose will pull the necessary images, and Paperless will initialize the database and start listening on port 8000. Point your browser to http://your-server:8000 and log in with the admin credentials you configured.
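Before the first start, the PAPERLESS_SECRET_KEY placeholder in the Compose file should be replaced with a real random value. One possible way to produce one is sketched here, using only standard command-line tools; the function name generate_secret_key is illustrative.

```shell
#!/usr/bin/env bash
# Generate a random value for PAPERLESS_SECRET_KEY.
# 48 random bytes encode to exactly 64 base64 characters (no padding).
generate_secret_key() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
  echo
}

generate_secret_key
```

Paste the output between the quotes in docker-compose.yml. Base64 output never contains a $ sign, so Compose will not try to interpolate it.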

Directory Structure Explained

Directory | Purpose
/consume | Drop files here for automatic import. Paperless processes and removes them.
/media | Stores your original documents and generated thumbnails. Back this up.
/data | Contains the search index and other application data, plus the SQLite database if you skip PostgreSQL (in the Compose setup above, the database lives in ./pgdata). Critical for restore.
/export | Destination for the built-in export/backup tool.

Reverse Proxy (Production)

For remote access, run Paperless behind a reverse proxy with HTTPS. Here’s a simple Caddy configuration:

paperless.yourdomain.com {
    reverse_proxy localhost:8000
}

Or with Nginx:

server {
    listen 443 ssl;
    server_name paperless.yourdomain.com;
    
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    
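    location / {
        proxy_pass http://localhost:8000;
        # WebSocket support (used for live status updates in the web UI)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }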
}

Basic Configuration and First Steps

Once Paperless is running, a few initial configurations will set you up for success.

Create Your Tag Structure

Tags are your primary organization tool. Think carefully about your top-level categories. A good starting structure:

  • Financial: bank-statements, invoices, receipts, taxes
  • Home: warranties, manuals, maintenance, insurance
  • Medical: records, prescriptions, insurance-claims
  • Legal: contracts, certificates, government
  • Work: contracts, expenses, training

You can nest tags, but keep top-level categories limited to 10-15. Too many branches make navigation cumbersome.

Set Up Document Types

Document types work alongside tags for classification:

  • Invoice
  • Receipt
  • Contract
  • Bank Statement
  • Medical Record
  • Insurance Document
  • Warranty
  • Manual/Guide
  • Correspondence
  • Certificate

Add Correspondents

Correspondents represent entities you exchange documents with:

  • Your bank
  • Insurance company
  • Employer
  • Healthcare providers
  • Government agencies (IRS, DMV, etc.)

Paperless learns from your assignments and will start suggesting correspondents for similar documents.

Configure Auto-Tagging Rules

Rules let you automate organization. Examples:

Rule | Condition | Action
Bank statements | Correspondent contains “Bank of America” | Add tag: bank-statement
Amazon purchases | Content contains “amazon.com” | Add tag: shopping, correspondent: Amazon
Tax documents | Document type is “Form 1099” or “W-2” | Add tag: taxes, tax-2026

To create rules, navigate to Settings → Workflows and define your triggers.
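To make the condition/action idea concrete, here is a small illustration of what the example rules express. It is not how Paperless implements matching. It is a simplified sketch that treats every condition as a case-insensitive keyword check on the document text, and suggest_tags is a made-up name.

```shell
#!/usr/bin/env bash
# Illustration only: print the tags the example rules would assign to a
# piece of document text. Every rule is simplified to a keyword check.
suggest_tags() {
  local text="$1" tags=""
  if printf '%s\n' "$text" | grep -qi 'bank of america'; then
    tags="$tags,bank-statement"
  fi
  if printf '%s\n' "$text" | grep -qi 'amazon\.com'; then
    tags="$tags,shopping"
  fi
  if printf '%s\n' "$text" | grep -qiE 'form 1099|w-2'; then
    tags="$tags,taxes,tax-2026"
  fi
  # Strip the leading comma; prints an empty line when nothing matched
  echo "${tags#,}"
}

suggest_tags "Order confirmation from Amazon.com"
suggest_tags "Form 1099-INT issued by Bank of America"
```

The first call prints shopping and the second prints bank-statement,taxes,tax-2026. In Paperless itself you express the same conditions through the matching settings and workflow triggers rather than a script.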

OCR and Search Capabilities

This is where Paperless-ngx shines. The OCR engine transforms static images into searchable text, making every document instantly retrievable.

How OCR Works

When you upload a document (PDF, image, or other format), Paperless:

  1. Applies the configured OCR language or languages
  2. Rotates and deskews if necessary
  3. Runs Tesseract OCR to extract all text
  4. Indexes the content in the search database
  5. Generates a searchable PDF

The result: you can find “receipt for the HDMI cable” by searching for “HDMI” — even if that text is buried inside a multi-page PDF.
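To see roughly what the rotate, deskew, OCR, and searchable-PDF steps do, you can run a similar pipeline by hand with the ocrmypdf command-line tool, which wraps Tesseract. This is a sketch under assumptions: ocrmypdf and the relevant Tesseract language data must be installed, the function name ocr_document is illustrative, and Paperless's own settings may differ from the flags chosen here.

```shell
#!/usr/bin/env bash
# Approximate the rotate, deskew, OCR, and searchable-PDF steps by hand.
# Assumes the ocrmypdf CLI and Tesseract language data are installed.
ocr_document() {
  local input="$1" output="$2" languages="${3:-eng}"
  # --rotate-pages: fix page orientation
  # --deskew:       straighten crooked scans
  # --sidecar:      also write the recognized text to a .txt file
  ocrmypdf --rotate-pages --deskew -l "$languages" \
    --sidecar "${output%.pdf}.txt" "$input" "$output"
}

# Usage: ocr_document scan.pdf scan-searchable.pdf eng+deu
```

The resulting PDF has a hidden text layer, which is what makes the content searchable.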

Search Features

  • Full-text search: Search inside document content, not just filenames
  • Fuzzy matching: Find “receipt” even if you typed “reciept” by mistake
  • Advanced filters: Combine text search with tags, dates, correspondents
  • Saved searches: Bookmark complex queries for quick access
  • Search operators: Use tag:receipt correspondent:Amazon created:[2026-01-01 to 2026-01-31] for precise filtering
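The same full-text search is available over the REST API, which is handy for scripts. A minimal sketch follows, assuming you have created an API token for your user; PAPERLESS_URL, PAPERLESS_TOKEN, and the function name search_documents are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Full-text search through the REST API.
# PAPERLESS_URL and PAPERLESS_TOKEN are placeholders for your instance
# and an API token created for your user.
search_documents() {
  local query="$1"
  # --get plus --data-urlencode appends the URL-encoded query string
  curl -fsS --get \
    -H "Authorization: Token ${PAPERLESS_TOKEN}" \
    --data-urlencode "query=${query}" \
    "${PAPERLESS_URL}/api/documents/"
}

# Usage:
#   export PAPERLESS_URL=https://paperless.yourdomain.com
#   export PAPERLESS_TOKEN=your-api-token
#   search_documents "HDMI"
```

The response is JSON, so it can be piped into a tool like jq to pull out titles or document IDs.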

Multi-Language OCR

If your documents are in multiple languages, configure them in your environment:

PAPERLESS_OCR_LANGUAGE: eng+spa+deu

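This tells Tesseract to recognize English, Spanish, and German text, even within the same document. Languages that are not bundled with the image can be installed by listing them in PAPERLESS_OCR_LANGUAGES.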

Automation Workflows

Paperless-ngx excels at hands-off document processing. Set up workflows once, and documents flow through your pipeline automatically.

Email Import

Connect your email accounts to automatically capture digital statements and receipts:

  1. Navigate to Settings → Mail
  2. Add an IMAP account with your email provider’s settings
  3. Create mail rules to filter by sender, subject, or attachment type
  4. Paperless downloads matching attachments and imports them

Example use case: Every time your utility company sends “Your Bill is Ready,” Paperless imports the attached PDF, tags it as “utilities,” and assigns the correspondent.

Consume Folder Automation

Paperless watches the /consume directory for new files. Drop documents there, and they’re processed automatically:

# Scan from a network scanner to consume folder
scp scanned-doc.pdf paperless-server:/path/to/paperless/consume/

# Or use a hot folder from your scanner software
# Most scanner apps support "scan to folder" — point it at your consume directory

For advanced workflows, combine with automation tools:

# n8n workflow example
# Watch for new Google Drive uploads, then push to Paperless
{
  "nodes": [
    {
      "type": "n8n-nodes-base.googleDriveTrigger",
      "parameters": {"event": "fileAdded"}
    },
    {
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://paperless.yourdomain.com/api/documents/post_document/",
        "method": "POST",
        "authentication": "genericCredentialType"
      }
    }
  ]
}
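Stripped of the n8n wrapper, the HTTP request in that workflow is a multipart form upload to the post_document endpoint. Here is a hedged sketch with curl; PAPERLESS_URL, PAPERLESS_TOKEN, and the function name upload_document are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Upload one file to Paperless through the REST API (multipart form upload).
# PAPERLESS_URL and PAPERLESS_TOKEN are placeholders for your instance
# and an API token created for your user.
upload_document() {
  local file="$1"
  # -F sends the file as the "document" form field and implies POST
  curl -fsS \
    -H "Authorization: Token ${PAPERLESS_TOKEN}" \
    -F "document=@${file}" \
    "${PAPERLESS_URL}/api/documents/post_document/"
}

# Usage: upload_document ./scanned-doc.pdf
```

This is useful when the consume folder is not reachable from the machine doing the scanning.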

Webhook Integrations

Paperless can send webhooks when documents are processed. Use this with Home Assistant for notifications:

# Home Assistant automation
automation:
  - alias: "New Tax Document Notification"
    trigger:
      - platform: webhook
        webhook_id: paperless-tax-doc
    condition:
      - condition: template
        value_template: "{{ 'taxes' in trigger.json.tags }}"
    action:
      - service: notify.mobile_app_your_phone  # replace with your device's notify service
        data:
          message: "New tax document uploaded: {{ trigger.json.title }}"
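Before wiring up Paperless, it helps to confirm that the Home Assistant side works by sending a test payload yourself. This sketch assumes the webhook payload carries title and tags fields, as the automation above expects; the Home Assistant address and the function name send_test_webhook are placeholders.

```shell
#!/usr/bin/env bash
# Send a test payload to the Home Assistant webhook used by the automation.
# The JSON fields (title, tags) mirror what the automation reads; how your
# Paperless webhook is configured to send them is an assumption here.
send_test_webhook() {
  local ha_url="$1"
  curl -fsS -X POST \
    -H "Content-Type: application/json" \
    -d '{"title": "Form 1099 (test)", "tags": ["taxes"]}' \
    "${ha_url}/api/webhook/paperless-tax-doc"
}

# Usage: send_test_webhook http://homeassistant.local:8123
```

If the notification arrives, the remaining work is making the Paperless webhook send a body with the same shape.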

Security and Backup Strategy

Your documents contain sensitive information. Security isn’t optional — it’s essential.

Network Security

Practice | Why It Matters
Run behind reverse proxy | Enables HTTPS encryption
Use VPN for remote access | Avoids exposing Paperless directly to the internet
Enable 2FA | Prevents unauthorized access even with compromised password
Regular updates | Security patches fix vulnerabilities
Network isolation | Run on a separate VLAN or behind a firewall

User Permissions

Create separate accounts for family members. Paperless supports per-user and per-group permissions, which you can use to set up roles such as:

  • Admin: Full access to all settings and documents
  • Standard: Can view and edit documents
  • View-only: Read access without modification rights

Backup Strategy

The 3-2-1 rule: three copies, two different media types, one off-site.

What to back up:

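  • media/: Your original documents (critical)
  • data/: Search index and application data (plus the SQLite database, if you use it)
  • pgdata/: The PostgreSQL database in the Compose setup above (the built-in export also captures its metadata)
  • docker-compose.yml and environment files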

Built-in export:

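# Export all documents and metadata
# (the target is a path inside the container; it maps to ./export on the host)
docker exec paperless document_exporter /usr/src/paperless/export --zip

# The resulting zip file is in your ./export directory
# Copy it to backup storage (expects a single zip in ./export)
cp ./export/*.zip /backup-location/paperless-$(date +%Y%m%d).zip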

Automated backup script:

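#!/bin/bash
# Daily backup script for Paperless-ngx
set -euo pipefail

# Directory that contains docker-compose.yml (placeholder, adjust to your setup)
PAPERLESS_DIR="/path/to/paperless"
BACKUP_DIR="/mnt/backup/paperless"
DATE=$(date +%Y%m%d)

mkdir -p "$BACKUP_DIR"

# Create backup (container path; mapped to $PAPERLESS_DIR/export on the host)
docker exec paperless document_exporter /usr/src/paperless/export --zip

# Move and rename (expects a single zip in the export directory)
mv "$PAPERLESS_DIR"/export/*.zip "$BACKUP_DIR/paperless-$DATE.zip"

# Rotate backups (keep 30 days)
find "$BACKUP_DIR" -name "*.zip" -mtime +30 -delete

echo "Backup complete: paperless-$DATE.zip"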
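A backup job that silently stops running is worse than none, so a freshness check is worth adding next to it. The following is a minimal sketch; backup_is_fresh is an illustrative name, and the two-day default is an arbitrary choice for a daily job.

```shell
#!/usr/bin/env bash
# Succeeds if the directory holds at least one .zip modified within the
# last N days (default: 2). Fails if the directory is empty or missing.
backup_is_fresh() {
  local dir="$1" max_age_days="${2:-2}"
  [ -n "$(find "$dir" -name '*.zip' -mtime "-${max_age_days}" -print 2>/dev/null)" ]
}

# Usage (for example from cron, some time after the backup job):
#   backup_is_fresh /mnt/backup/paperless 2 || echo "No recent Paperless backup!" >&2
```

Hook the failure branch into whatever alerting you already use, such as an email or a Home Assistant notification.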

Off-site backup:

Use rclone or rsync to sync your backups to cloud storage or a remote server:

# Sync to Backblaze B2 (example)
rclone sync /mnt/backup/paperless b2:my-backup-bucket/paperless

Migrating from Paper Archives

Ready to digitize that filing cabinet? Here’s a practical workflow.

Scanner Setup

Invest in a scanner with these features:

  • Document feeder (ADF): Batch scan 20+ pages at once
  • Duplex scanning: Both sides simultaneously
  • Scan-to-folder: Output directly to Paperless consume folder

Recommended scanners:

Budget | Model | Notes
$100-200 | Brother ADS-2700W | Great value, WiFi included
$300-400 | Fujitsu ScanSnap iX1600 | Excellent software, fast
$500+ | Fujitsu fi-8000 series | Enterprise-grade, heavy duty

Scanning Best Practices

  • 300 DPI for text documents — sufficient for OCR without massive files
  • 600 DPI for photos or documents with small print
  • Enable auto-color detection — grayscale scans are smaller but color is preserved when needed
  • Enable deskew — straightens crooked scans automatically
  • Enable blank page removal — skips blank sides of double-sided documents
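The DPI recommendations above come down to simple arithmetic: doubling the resolution quadruples the number of pixels per page. A quick sketch, with scan_pixels as an illustrative name and US Letter paper as the example size:

```shell
#!/usr/bin/env bash
# Pixel dimensions of a scanned page: inches x DPI on each side.
scan_pixels() {
  local width_in="$1" height_in="$2" dpi="$3"
  awk -v w="$width_in" -v h="$height_in" -v d="$dpi" \
    'BEGIN { printf "%dx%d\n", w * d, h * d }'
}

scan_pixels 8.5 11 300   # US Letter at 300 DPI, prints 2550x3300
scan_pixels 8.5 11 600   # same page at 600 DPI, prints 5100x6600
```

That fourfold jump in pixels is why 600 DPI is best reserved for photos and small print.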

Digitization Workflow

  1. Sort by category: Group similar documents before scanning
  2. Batch scan: Run 20-50 documents through the feeder at once
  3. Review in Paperless: Check auto-tagging and correct as needed
  4. Shred originals: Once confirmed, securely dispose of paper copies

For documents you must keep in physical form (original certificates, notarized documents), store them securely and note their location in Paperless with an “original-kept” tag.

Handling Backlog

If you have years of documents to digitize:

  • Start with the most recent and important (last 12 months of financial documents)
  • Work backward chronologically
  • Don’t try to digitize everything at once — set a goal of 50-100 documents per session
  • Consider which documents even need digitizing. That cable bill from 2019? Maybe just shred it.

Conclusion: Why Paperless-ngx Is Your Best Choice

In 2026, your options for document management are clear:

You could pay $10-15 per month for cloud services like Evernote, OneDrive, or Google Drive. They’ll hold your documents hostage behind paywalls, mine your data for advertising, and change terms of service whenever convenient. When they shut down or raise prices, you scramble.

Or you could take control.

Paperless-ngx gives you:

  • Privacy: Your financial records, medical documents, and contracts stay on your server
  • Freedom: No subscription fees, no vendor lock-in, no surprising price increases
  • Power: Enterprise-grade features without enterprise-grade costs
  • Community: Active development, helpful forums, and a project that’s maintained by people who use it daily

The transition takes effort. Scanning your archives requires time. Setting up workflows requires thought. But once it’s running, you’ll wonder how you ever lived with paper.

Start small. Install Paperless. Scan this month’s documents. Watch it organize itself. Then, when you’re ready, tackle that drawer.

Your future self — the one who finds the dishwasher warranty in 30 seconds flat — will thank you.


Ready to go paperless? Try the Paperless-ngx demo (login: demo/demo) or check out the official documentation.

Anthony Lattanzio

Tech Enthusiast & Builder

I'm a tech enthusiast who loves building things with hardware and software. By night, I run a homelab that's grown way beyond what any reasonable person needs. Check out about me for more.
