v2.5 — Document Understanding + Generic Extraction

Extract structured data from ANY document. Zero config.

Drop in any document — invoices, contracts, reports, forms, or something we have never seen. Quilldoc understands it, extracts every field, and returns structured JSON. Self-hosted. 1.0 confidence on real docs.

Self-hostable · MIT licensed · $0.01/doc
1.0

Confidence on real documents

0+

Fields auto-detected per document

Zero

Config needed for unknown doc types

$0.01

Per document — self-hosted

The Quilldoc Pipeline

From document to structured data in seconds. Watch it work.

Upload → Understand → Extract → Verify → Deliver

Pipeline complete. 22 fields extracted at 1.0 confidence.

See it in action

Upload a document and get structured JSON in seconds. No signup required.

How it works

From document to structured data in seconds.

Step 1

Upload your document

PDF, image, or scanned document. Any format, any quality level.

Step 2

AI extracts the data

11-stage pipeline with vision models, OCR, math verification, and constrained JSON.
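
One way to picture the math verification stage: recompute the invoice total from the extracted line items and compare it with the extracted total. This is a minimal sketch, not Quilldoc's actual implementation. The field names (`line_items`, `quantity`, `unit_price`, `tax`, `total`) and the `verify_totals` helper are assumptions made for illustration.

```python
from decimal import Decimal


def verify_totals(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when line items plus tax add up to the stated total."""
    subtotal = sum(
        (
            Decimal(str(item["quantity"])) * Decimal(str(item["unit_price"]))
            for item in invoice["line_items"]
        ),
        Decimal("0"),
    )
    expected = subtotal + Decimal(str(invoice.get("tax", "0")))
    return abs(expected - Decimal(str(invoice["total"]))) <= tolerance


# Hypothetical extraction result, made up for this example.
sample_invoice = {
    "line_items": [
        {"quantity": 2, "unit_price": "19.99"},
        {"quantity": 1, "unit_price": "5.00"},
    ],
    "tax": "3.60",
    "total": "48.58",
}

totals_match = verify_totals(sample_invoice)
```

Using `Decimal` rather than floats keeps currency arithmetic exact, so the tolerance only has to absorb rounding in the source document.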

Step 3

Get structured JSON

Schema-matched output with field-level confidence scores and visual grounding.
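
To make field-level confidence and visual grounding concrete, here is a sketch of how a client might flag fields for human review. The payload shape (`fields`, `confidence`, `bbox`) and the `fields_needing_review` helper are assumptions for illustration; the real response format may differ.

```python
import json

# Hypothetical response payload; the real field names may differ.
RESPONSE_JSON = """
{
  "document_type": "invoice",
  "fields": [
    {"name": "invoice_number", "value": "INV-001", "confidence": 0.99,
     "bbox": {"page": 1, "x": 420, "y": 88, "width": 130, "height": 22}},
    {"name": "total", "value": "48.58", "confidence": 0.72,
     "bbox": {"page": 1, "x": 455, "y": 640, "width": 90, "height": 20}}
  ]
}
"""


def fields_needing_review(response: dict, threshold: float = 0.9) -> list[str]:
    """Names of fields whose confidence falls below the threshold."""
    return [f["name"] for f in response["fields"] if f["confidence"] < threshold]


response = json.loads(RESPONSE_JSON)
low_confidence = fields_needing_review(response)
```

Each flagged field carries its bounding box, so a review tool could highlight the exact region of the page a reviewer should check.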

Built for any document

Understanding first, extraction second. Quilldoc reads your documents the way a human would — then does it faster.

Document Understanding

Semantic analysis and auto-classification. Quilldoc reads and understands your document before extracting a single field.

Generic Extraction

Works on any document without a predefined schema. Drop in something we have never seen and get structured JSON back.

Schema Suggestion

Auto-generates reusable schemas from your documents. Process one doc, get a schema for the next thousand.
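
As a rough illustration of the idea, the sketch below derives a JSON-Schema-style description from one extracted document. It is not Quilldoc's schema suggestion algorithm. The `suggest_schema` helper and the sample purchase-order fields are hypothetical.

```python
def suggest_schema(extracted: dict) -> dict:
    """Build a JSON-Schema-style description from one extracted document."""
    type_names = {str: "string", bool: "boolean", int: "integer", float: "number"}

    def describe(value):
        if isinstance(value, dict):
            return {
                "type": "object",
                "properties": {key: describe(val) for key, val in value.items()},
            }
        if isinstance(value, list):
            # Assumes homogeneous lists: only the first element is inspected.
            return {"type": "array", "items": describe(value[0]) if value else {}}
        if value is None:
            return {"type": "null"}
        return {"type": type_names.get(type(value), "string")}

    return describe(extracted)


# Hypothetical extraction result from a single purchase order.
first_doc = {
    "po_number": "PO-17",
    "total": 120.5,
    "lines": [{"sku": "A1", "qty": 3}],
}
schema = suggest_schema(first_doc)
```

A schema inferred from a single document only captures the fields that document happened to contain, so in practice it would likely be reviewed or merged across several samples before reuse.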

Self-Hosting

Full privacy, zero cloud dependency. Run entirely on your infrastructure with your own GPU. MIT licensed.

3-Tier Extraction

Text-first for speed, Claude for accuracy, local VLM for privacy. Automatic fallback through all three tiers.
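
Automatic fallback can be sketched as a chain of extractors tried in order. This is an illustration under stated assumptions, not the actual pipeline: the tier order, the rule that an empty result or an exception triggers the next tier, and every function name below are hypothetical. The stand-in tiers do not call any real model.

```python
from typing import Callable, Optional

Extractor = Callable[[bytes], Optional[dict]]


def extract_with_fallback(
    document: bytes, tiers: list[tuple[str, Extractor]]
) -> tuple[str, dict]:
    """Try each tier in order; return the first non-empty result and its tier name."""
    errors = []
    for name, extractor in tiers:
        try:
            result = extractor(document)
        except Exception as exc:  # a failing tier should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: no result")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Stand-in tiers for demonstration only.
def text_first(doc: bytes) -> Optional[dict]:
    return None  # e.g. a scanned image with no text layer


def hosted_model(doc: bytes) -> Optional[dict]:
    raise ConnectionError("offline")


def local_vlm(doc: bytes) -> Optional[dict]:
    return {"total": "48.58"}


tier_used, fields = extract_with_fallback(
    b"%PDF-1.7 ...",
    [("text", text_first), ("hosted", hosted_model), ("local_vlm", local_vlm)],
)
```

Returning the tier name alongside the result lets callers log which path produced each extraction, which is useful when comparing speed and accuracy across tiers.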

Visual Grounding

Pixel-level field locations on every extracted value. See exactly where each data point came from in the original document.

Open Source

100% open source. Zero black boxes.

Quilldoc is MIT-licensed and fully transparent. You own your pipeline, your data, and your deployment.

Star on GitHub · MIT License

No Vendor Lock-in

Self-host on your own infrastructure. Switch, fork, or extend anytime.

Your Data Stays Yours

When you self-host with the local VLM tier, documents never leave your servers. Zero telemetry, zero third-party calls.

Community-Driven

Built in the open with contributions from developers worldwide.

Audit the Code

Full source access. Review every line of the extraction pipeline.

Simple, flat pricing

No per-page fees. No surprises. Cancel anytime.

Starter

$99/mo

For small teams getting started with document automation.

  • 1,000 documents/month
  • 3 document types
  • REST API access
  • Email support
  • Community schemas
Start Free Trial
Most popular

Pro

$499/mo

For teams processing at scale with full flexibility.

  • 10,000 documents/month
  • Unlimited document types
  • Custom schemas
  • Priority API access
  • Priority support
  • Analytics dashboard
  • Webhook integrations
Start Free Trial

Enterprise

Custom

Dedicated infrastructure with premium support.

  • Unlimited documents
  • Dedicated GPU cluster
  • Custom model training
  • SLA guarantee
  • Dedicated support
  • SSO / SAML
  • Custom integrations
Contact Sales

Trusted by Teams

What our users say

We replaced our entire Azure Document Intelligence setup with Quilldoc. Processing 4,000 invoices a month at a fraction of the cost with better accuracy on our edge cases.

Michael Kovacs

Head of Engineering, Fintech Corp

The self-hosting aspect was non-negotiable for us. Our legal documents never leave our infrastructure, and the extraction quality rivals any cloud service we tested.

Sarah Reeves

CTO, LegalFlow

Schema suggestion is a game-changer. We uploaded one purchase order and Quilldoc generated a reusable schema that handled 95% of our vendor formats out of the box.

James Torres

VP of Operations, SupplyChain.io

Quilldoc vs alternatives

The only self-hosted, flat-rate document extraction platform.

Feature | Quilldoc | Azure | LandingAI
Self-hosted option
Unknown doc types (zero-config)
Auto schema suggestion
No per-page pricing
Visual grounding
MIT licensed
Table extraction
Math verification
Setup time | 5 min | 1 day | 2 weeks
Cost per doc | $0.01 | $0.01–0.10 | Enterprise
Unknown doc handling | Auto | Manual | Manual

Ready to extract smarter?

Start processing documents in under 5 minutes. No credit card required.