What is Quilldoc?

Quilldoc extracts structured data from any document — invoices, receipts, contracts, bank statements, medical reports, and more. Upload a PDF, get back clean JSON with every field extracted, verified, and grounded to the source.

Any document type

Works with any document type, with no templates or pre-configuration needed. Quilldoc first analyzes each document's structure and content, then extracts the data.

Built for developers

Accurate, fast document processing at scale. Simple REST API, structured JSON output, confidence scores on every field.

# Upload a document, get structured JSON

curl -X POST https://api.quilldoc.studio/extract \
  -H "X-API-Key: sk_live_..." \
  -F "file=@invoice.pdf"

{
  "doc_type": "invoice",
  "confidence": 0.96,
  "data": {
    "vendor_name": { "value": "Acme Corp", "confidence": 0.98 },
    "total_amount": { "value": 1250.00, "confidence": 0.96 },
    "invoice_date": { "value": "2026-02-15", "confidence": 0.92 }
  }
}
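A minimal Python sketch of consuming a response like the one above. It keeps only the `value` of each field and reports the lowest per-field confidence. It assumes every entry under `data` has the `{ "value", "confidence" }` shape shown. `flatten_fields` and `min_confidence` are illustrative helper names, not part of any Quilldoc SDK.

```python
import json

# Sample response copied from the /extract example.
RESPONSE = """
{
  "doc_type": "invoice",
  "confidence": 0.96,
  "data": {
    "vendor_name": { "value": "Acme Corp", "confidence": 0.98 },
    "total_amount": { "value": 1250.00, "confidence": 0.96 },
    "invoice_date": { "value": "2026-02-15", "confidence": 0.92 }
  }
}
"""


def flatten_fields(result: dict) -> dict:
    """Map each extracted field name to its bare value, dropping confidences."""
    return {name: field["value"] for name, field in result["data"].items()}


def min_confidence(result: dict) -> float:
    """Return the lowest per-field confidence in the result."""
    return min(field["confidence"] for field in result["data"].values())


result = json.loads(RESPONSE)
print(flatten_fields(result))
# {'vendor_name': 'Acme Corp', 'total_amount': 1250.0, 'invoice_date': '2026-02-15'}
print(min_confidence(result))
# 0.92
```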

Key Concepts

Core terms and ideas you will encounter throughout the Quilldoc platform and API.

Document Processing Pipeline

The 11-stage AI pipeline that ingests, understands, and extracts data from documents. Each stage — from PDF parsing and OCR through extraction and verification — runs automatically and produces auditable intermediate results.

Knowledge Boards

Collections of documents you can chat with. Upload a set of related documents to a board, then ask questions and get cited answers grounded in the source material.

Schemas

Define what fields to extract from a document type (e.g., invoice_number, total_amount, line_items). Quilldoc can also auto-suggest schemas for unknown document types based on its understanding of the content.

Confidence Scores

Every extracted field gets a confidence score from 0 to 1. High confidence (≥ 0.85) means the field is auto-approved. Medium confidence (≥ 0.60 and < 0.85) means the field needs review. Low confidence (< 0.60) means the field is flagged for human verification.
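The three tiers can be expressed as a small Python function. This is a sketch that mirrors the documented thresholds client-side, assuming a score of exactly 0.85 is high and exactly 0.60 is medium, as the ≥ signs suggest. The tier labels returned are illustrative.

```python
def confidence_tier(score: float) -> str:
    """Classify a field confidence using the documented thresholds.

    >= 0.85         -> "high"   (auto-approved)
    0.60 to < 0.85  -> "medium" (needs review)
    < 0.60          -> "low"    (flagged for human verification)
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {score}")
    if score >= 0.85:
        return "high"
    if score >= 0.60:
        return "medium"
    return "low"


for s in (0.98, 0.85, 0.72, 0.60, 0.41):
    print(s, confidence_tier(s))
```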

Grounding

Every extracted value is linked back to its exact location in the source document — bounding boxes, page numbers, and source text. You can verify any field by clicking through to where it appears in the original PDF.

Document Understanding

Before extraction, AI analyzes each document to identify its purpose, structure, key entities, and logical sections. This enables zero-config processing of any document type without pre-defined templates.

Review Queue

Documents with any field below the auto-approval threshold (0.85) are automatically routed to a review queue. Reviewers can approve, reject, or correct individual extracted fields, and corrections feed back into the system to improve future accuracy.
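Quilldoc does this routing server-side. If you want to highlight the same fields in your own UI, a client-side sketch might look like the following. It assumes the 0.85 auto-approval threshold from Confidence Scores applies per field. The helper name and the sample values are made up for illustration.

```python
AUTO_APPROVE_THRESHOLD = 0.85  # from the Confidence Scores tiers


def fields_needing_review(result: dict, threshold: float = AUTO_APPROVE_THRESHOLD) -> list:
    """Return names of fields below the threshold, lowest confidence first."""
    low = [
        (field["confidence"], name)
        for name, field in result.get("data", {}).items()
        if field["confidence"] < threshold
    ]
    return [name for _, name in sorted(low)]


# Made-up sample result in the documented {"value", "confidence"} shape.
result = {
    "data": {
        "vendor_name": {"value": "Acme Corp", "confidence": 0.98},
        "total_amount": {"value": 1250.00, "confidence": 0.71},
        "tax": {"value": 100.00, "confidence": 0.55},
    }
}
print(fields_needing_review(result))  # ['tax', 'total_amount']
```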

API Reference

The Quilldoc REST API lets you upload documents, extract structured data, manage schemas, and integrate with your workflows.

Base URL: https://api.quilldoc.studio

Getting Started

All API requests require an API key passed in the X-API-Key header. Create your first key in the dashboard; after that, you can also create keys through the API Keys endpoints.

curl -X POST https://api.quilldoc.studio/documents/upload \
  -H "X-API-Key: your-api-key" \
  -F "file=@invoice.pdf"

Authentication

All endpoints require an API key in the X-API-Key header. Keys can be created, listed, and revoked via the API Keys endpoints below.

# Include in every request
-H "X-API-Key: sk_live_..."

Documents API

Upload, process, and retrieve documents and their extraction results.

POST /documents/upload

Upload a PDF document for processing. Optionally specify a schema to use.

Request Body

Content-Type: multipart/form-data

file: <binary PDF>
schema: "invoice"        # optional
priority: "high"         # optional: low | normal | high

Response

{
  "document_id": "doc_abc123",
  "status": "queued",
  "created_at": "2026-03-06T10:00:00Z"
}
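For clients without curl, here is a stdlib-only Python sketch that builds the same multipart upload, including the optional `schema` and `priority` fields. `build_upload_request` is a hypothetical helper. It only constructs a `urllib.request.Request`, so nothing is sent until you pass it to `urllib.request.urlopen`. The part layout follows standard multipart/form-data encoding. Labelling the PDF part `application/pdf` is an assumption.

```python
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.quilldoc.studio"
PRIORITIES = ("low", "normal", "high")


def build_upload_request(api_key: str, filename: str, pdf_bytes: bytes,
                         schema: Optional[str] = None,
                         priority: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a multipart POST to /documents/upload."""
    if priority is not None and priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {PRIORITIES}")
    boundary = uuid.uuid4().hex  # random delimiter, very unlikely to occur in the PDF bytes
    parts = []
    # Optional text fields, sent only when provided.
    for name, value in (("schema", schema), ("priority", priority)):
        if value is not None:
            parts.append(
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{value}\r\n'.encode()
            )
    # The PDF itself, in the "file" field (same name as curl's -F "file=@...").
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/pdf\r\n\r\n'.encode()
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return urllib.request.Request(
        f"{BASE_URL}/documents/upload",
        data=b"".join(parts),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_upload_request("sk_live_...", "invoice.pdf", b"%PDF-1.7 ...",
                           schema="invoice", priority="high")
# Sending it (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["document_id"])
```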

GET /documents

List all documents with optional filtering and pagination.

Response

{
  "documents": [
    {
      "id": "doc_abc123",
      "filename": "invoice_001.pdf",
      "status": "completed",
      "doc_type": "invoice",
      "confidence": 0.94,
      "created_at": "2026-03-06T10:00:00Z"
    }
  ],
  "total": 142,
  "page": 1,
  "per_page": 20
}
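The listing response reports `total`, `page`, and `per_page`, which is enough to walk every page. A Python sketch, assuming pages are numbered from 1 as in the sample. The query parameter that selects a page is not documented here, so the HTTP call is abstracted into a hypothetical `fetch_page` callback, and an offline stand-in is used to demonstrate it.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every document from a paginated GET /documents listing.

    fetch_page(page) performs the request for one page (1-based) and returns
    the parsed JSON body with "documents", "total", and "per_page" keys.
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["documents"]
        # Stop on an empty page or once this page reaches the reported total.
        if not body["documents"] or page * body["per_page"] >= body["total"]:
            return
        page += 1


# Offline stand-in for the API: 5 documents served 2 per page.
ALL_DOCS = [{"id": f"doc_{i}"} for i in range(5)]


def fake_fetch(page: int) -> dict:
    start = (page - 1) * 2
    return {"documents": ALL_DOCS[start:start + 2], "total": len(ALL_DOCS),
            "page": page, "per_page": 2}


print([d["id"] for d in iter_documents(fake_fetch)])
# ['doc_0', 'doc_1', 'doc_2', 'doc_3', 'doc_4']
```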

GET /documents/{id}/status

Get the current processing status and stage of a document.

Response

{
  "document_id": "doc_abc123",
  "status": "processing",
  "current_stage": "extraction",
  "stage_number": 8,
  "total_stages": 11,
  "started_at": "2026-03-06T10:00:05Z"
}
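A common pattern is to poll this endpoint until processing finishes. The Python sketch below takes the HTTP call as a `fetch_status` callback so it can be tested offline. `"completed"` appears in the documented responses; treating `"failed"` as a terminal status is an assumption based on the retry endpoint. The 2-second interval and 5-minute timeout are arbitrary defaults. If you prefer push notifications, the Webhooks API covers `document.completed` and `document.failed` events.

```python
import time
from typing import Callable

# "completed" appears in the documented responses; "failed" is assumed from
# the retry endpoint, which applies to failed documents.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_document(fetch_status: Callable[[], dict],
                      interval: float = 2.0,
                      timeout: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /documents/{id}/status until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {status['status']!r} after {timeout}s")
        if "stage_number" in status:
            # Progress fields from the documented response, e.g. "stage 8/11: extraction".
            print(f"stage {status['stage_number']}/{status['total_stages']}: "
                  f"{status['current_stage']}")
        sleep(interval)


# In real use, fetch_status would wrap an authenticated
# GET /documents/doc_abc123/status call and return the parsed JSON.
```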

GET /documents/{id}/result

Get the extracted data, confidence scores, and grounding information.

Response

{
  "document_id": "doc_abc123",
  "doc_type": "invoice",
  "confidence": 0.94,
  "data": {
    "vendor_name": { "value": "Acme Corp", "confidence": 0.98 },
    "total_amount": { "value": 1250.00, "confidence": 0.96 },
    "invoice_date": { "value": "2026-02-15", "confidence": 0.92 }
  },
  "grounding": { ... }
}

GET /documents/{id}/pdf

Download the original uploaded PDF file.

PATCH /documents/{id}/correct

Submit manual corrections for extracted fields. Used by the review queue.

Request Body

{
  "corrections": {
    "vendor_name": "Acme Corporation",
    "total_amount": 1250.50
  }
}

Response

{
  "document_id": "doc_abc123",
  "status": "corrected",
  "updated_fields": ["vendor_name", "total_amount"]
}
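When a reviewer edits a form, you usually want to send only the fields that actually changed. A Python sketch that diffs the extracted `data` against the reviewed values and produces a request body in the documented `corrections` shape. `build_corrections` is a hypothetical helper. Treating fields absent from the extraction as changes assumes the endpoint accepts them.

```python
def build_corrections(extracted: dict, reviewed: dict) -> dict:
    """Build a PATCH /documents/{id}/correct body with only the changed fields.

    extracted: the "data" object from GET /documents/{id}/result,
               shaped {field: {"value": ..., "confidence": ...}}.
    reviewed:  {field: value} after a reviewer's edits.
    """
    changed = {
        name: value
        for name, value in reviewed.items()
        # Fields missing from the extraction count as changed (assumes the
        # endpoint accepts them).
        if extracted.get(name, {}).get("value") != value
    }
    return {"corrections": changed}


extracted = {
    "vendor_name": {"value": "Acme Corp", "confidence": 0.98},
    "total_amount": {"value": 1250.00, "confidence": 0.96},
    "invoice_date": {"value": "2026-02-15", "confidence": 0.92},
}
reviewed = {"vendor_name": "Acme Corporation", "total_amount": 1250.50,
            "invoice_date": "2026-02-15"}
print(build_corrections(extracted, reviewed))
# {'corrections': {'vendor_name': 'Acme Corporation', 'total_amount': 1250.5}}
```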

POST /documents/{id}/retry

Retry processing for a failed document.

Response

{
  "document_id": "doc_abc123",
  "status": "queued"
}

POST /documents/{id}/chat

Ask a natural-language question about the document's contents.

Request Body

{
  "message": "What is the payment due date?"
}

Response

{
  "answer": "The payment due date is March 15, 2026.",
  "confidence": 0.91,
  "source_page": 1
}

POST /documents/{id}/accept-schema

Accept a suggested schema for this document type.

Response

{
  "schema_name": "invoice_v2",
  "status": "accepted"
}

GET /documents/{id}/related

Find related documents via cross-document matching.

Response

{
  "related": [
    { "id": "doc_def456", "relation": "same_vendor", "similarity": 0.89 }
  ]
}

Export API

Export extraction results in multiple formats.

GET /documents/{id}/export/{format}

Export extracted data. Supported formats: json, csv, excel.

Response

# format = json
{
  "vendor_name": "Acme Corp",
  "total_amount": 1250.00,
  "line_items": [...]
}

# format = csv
Content-Type: text/csv

# format = excel
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
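Because the format is a path segment, a typo only surfaces as an error response. A small Python sketch that validates the format before building the URL and picks a matching file extension. The `.xlsx` extension follows from the `excel` content type (an Office Open XML spreadsheet). `export_url` and `export_filename` are hypothetical helper names.

```python
from urllib.parse import quote

BASE_URL = "https://api.quilldoc.studio"

# Supported export formats and the file extension to save each one with.
EXPORT_EXTENSIONS = {"json": ".json", "csv": ".csv", "excel": ".xlsx"}


def export_url(document_id: str, fmt: str) -> str:
    """Return the export URL, rejecting unsupported formats early."""
    if fmt not in EXPORT_EXTENSIONS:
        raise ValueError(f"unsupported export format {fmt!r}; "
                         f"use one of {sorted(EXPORT_EXTENSIONS)}")
    return f"{BASE_URL}/documents/{quote(document_id, safe='')}/export/{fmt}"


def export_filename(document_id: str, fmt: str) -> str:
    """Suggest a local filename for the downloaded export."""
    return document_id + EXPORT_EXTENSIONS[fmt]


print(export_url("doc_abc123", "excel"))
# https://api.quilldoc.studio/documents/doc_abc123/export/excel
print(export_filename("doc_abc123", "excel"))
# doc_abc123.xlsx
```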

Schemas API

Manage document schemas that define which fields to extract.

GET /schemas

List all available schemas including built-in and custom ones.

Response

{
  "schemas": [
    { "name": "invoice", "type": "built-in", "field_count": 12 },
    { "name": "receipt", "type": "built-in", "field_count": 8 },
    { "name": "custom_po", "type": "custom", "field_count": 15 }
  ]
}

GET /schemas/{name}

Get the full schema definition including field types and descriptions.

Response

{
  "name": "invoice",
  "fields": [
    { "name": "vendor_name", "type": "string", "required": true },
    { "name": "total_amount", "type": "number", "required": true },
    { "name": "line_items", "type": "array", "required": false }
  ]
}
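A schema definition makes it possible to check an extraction result for missing required fields before accepting it. A Python sketch that combines the schema response above with the `data` object from a result. `missing_required` is a hypothetical helper. Counting a `null` value as missing is an assumption.

```python
def missing_required(schema: dict, data: dict) -> list:
    """Return names of required schema fields with no extracted value.

    schema: a GET /schemas/{name} response.
    data:   the "data" object from an extraction result.
    A field counts as missing if it is absent or its value is null (assumption).
    """
    return [
        field["name"]
        for field in schema["fields"]
        if field.get("required") and data.get(field["name"], {}).get("value") is None
    ]


INVOICE_SCHEMA = {
    "name": "invoice",
    "fields": [
        {"name": "vendor_name", "type": "string", "required": True},
        {"name": "total_amount", "type": "number", "required": True},
        {"name": "line_items", "type": "array", "required": False},
    ],
}
DATA = {
    "vendor_name": {"value": "Acme Corp", "confidence": 0.98},
    "total_amount": {"value": 1250.00, "confidence": 0.96},
    "invoice_date": {"value": "2026-02-15", "confidence": 0.92},
}
print(missing_required(INVOICE_SCHEMA, DATA))  # []
```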

POST /schemas

Create a custom schema for a new document type.

Request Body

{
  "name": "purchase_order",
  "fields": [
    { "name": "po_number", "type": "string", "required": true },
    { "name": "vendor", "type": "string", "required": true },
    { "name": "line_items", "type": "array", "required": true }
  ]
}

Response

{
  "name": "purchase_order",
  "status": "created"
}

DELETE /schemas/{name}

Delete a custom schema. Built-in schemas cannot be deleted.

Batch API

Upload and process multiple documents in a single batch.

POST /batch/upload

Upload multiple PDFs for batch processing.

Request Body

Content-Type: multipart/form-data

files: [<binary PDF>, <binary PDF>, ...]
schema: "invoice"   # optional

Response

{
  "batch_id": "batch_xyz789",
  "document_count": 5,
  "status": "processing"
}

GET /batch/{id}

Get batch processing status and summary.

Response

{
  "batch_id": "batch_xyz789",
  "status": "processing",
  "total": 5,
  "completed": 3,
  "failed": 0,
  "pending": 2
}
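The summary counts are enough to drive a progress bar. A Python sketch, assuming a document counts as finished once it is either completed or failed, and that the batch is done when the finished count reaches `total`. `batch_progress` is a hypothetical helper.

```python
def batch_progress(summary: dict) -> tuple:
    """Return (percent finished, is_done) for a GET /batch/{id} response."""
    total = summary["total"]
    finished = summary["completed"] + summary["failed"]
    percent = 100.0 if total == 0 else 100.0 * finished / total
    return percent, finished >= total


sample = {"batch_id": "batch_xyz789", "status": "processing",
          "total": 5, "completed": 3, "failed": 0, "pending": 2}
print(batch_progress(sample))  # (60.0, False)
```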

GET /batch/{id}/documents

List all documents in a batch with their individual statuses.

Response

{
  "documents": [
    { "id": "doc_001", "filename": "inv_1.pdf", "status": "completed" },
    { "id": "doc_002", "filename": "inv_2.pdf", "status": "processing" }
  ]
}

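Webhooks API

Receive notifications when document processing events occur.

POST /webhooks

Register a webhook URL for one or more events.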

Request Body

{
  "url": "https://your-app.com/webhook",
  "events": ["document.completed", "document.failed"]
}

Response

{
  "webhook_id": "wh_abc123",
  "url": "https://your-app.com/webhook",
  "events": ["document.completed", "document.failed"],
  "status": "active"
}
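On the receiving side, your endpoint needs to route each delivery to the right handler. The delivery payload format is not documented in this reference, so the Python sketch below **assumes** a JSON body with `"event"` and `"document_id"` keys. Adjust it to the real payload. It covers only the two events shown above, and it does not authenticate deliveries. `make_dispatcher` is a hypothetical helper.

```python
import json
from typing import Callable

# ASSUMPTION: deliveries are JSON objects with "event" and "document_id" keys.
Handler = Callable[[dict], None]


def make_dispatcher(handlers: dict) -> Callable[[bytes], bool]:
    """Return a function that parses a raw webhook body and calls its handler."""
    def dispatch(raw_body: bytes) -> bool:
        payload = json.loads(raw_body)
        handler = handlers.get(payload.get("event"))
        if handler is None:
            return False  # an event this receiver does not handle
        handler(payload)
        return True
    return dispatch


seen = []
dispatch = make_dispatcher({
    "document.completed": lambda p: seen.append(("done", p["document_id"])),
    "document.failed": lambda p: seen.append(("failed", p["document_id"])),
})
dispatch(b'{"event": "document.completed", "document_id": "doc_abc123"}')
print(seen)  # [('done', 'doc_abc123')]
```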

GET /webhooks

List all registered webhooks.

Response

{
  "webhooks": [
    {
      "id": "wh_abc123",
      "url": "https://your-app.com/webhook",
      "events": ["document.completed", "document.failed"],
      "status": "active"
    }
  ]
}

DELETE /webhooks/{id}

Delete a registered webhook.

Review Queue

Manage documents flagged for human review due to low confidence.

GET /review-queue

Get all documents in the review queue, sorted by priority.

Response

{
  "items": [
    {
      "document_id": "doc_abc123",
      "reason": "low_confidence",
      "confidence": 0.62,
      "flagged_fields": ["total_amount", "tax"],
      "created_at": "2026-03-06T10:05:00Z"
    }
  ],
  "total": 8
}

PATCH /review-queue/{id}

Resolve a review queue item by approving or correcting the extraction.

Request Body

{
  "action": "approve",
  "corrections": {}
}

Response

{
  "document_id": "doc_abc123",
  "status": "resolved"
}

API Keys

Create and manage API keys for authentication.

POST /api-keys

Create a new API key. The full key is returned only once, in this response.

Request Body

{
  "name": "production-key"
}

Response

{
  "key": "sk_live_abc123def456...",
  "prefix": "sk_live_abc",
  "name": "production-key",
  "created_at": "2026-03-06T10:00:00Z"
}

GET /api-keys

List all API keys. Only each key's prefix is returned; the full key is never shown again.

Response

{
  "keys": [
    { "prefix": "sk_live_abc", "name": "production-key", "created_at": "2026-03-06T10:00:00Z" }
  ]
}

DELETE /api-keys/{prefix}

Revoke an API key by its prefix.

Health Checks

Monitor service health and readiness.

GET /health

Full health check including database, Redis, and MinIO status.

Response

{
  "status": "healthy",
  "version": "2.3.0",
  "database": "connected",
  "redis": "connected",
  "minio": "connected"
}
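A monitoring script usually needs to report *which* dependency is down, not just that something is wrong. A Python sketch over the response above. It assumes any value other than `"connected"` means the dependency is unavailable and any overall status other than `"healthy"` is a failure. Other values are not documented here. `is_healthy` is a hypothetical helper.

```python
DEPENDENCIES = ("database", "redis", "minio")


def is_healthy(body: dict) -> tuple:
    """Return (overall_ok, list of dependencies not reporting "connected")."""
    down = [dep for dep in DEPENDENCIES if body.get(dep) != "connected"]
    return body.get("status") == "healthy" and not down, down


sample = {"status": "healthy", "version": "2.3.0", "database": "connected",
          "redis": "connected", "minio": "connected"}
print(is_healthy(sample))  # (True, [])
```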

GET /health/live

Kubernetes liveness probe. Returns 200 if the service is running.

GET /health/ready

Kubernetes readiness probe. Returns 200 if the service can accept traffic.

Utility Endpoints

Direct extraction and parsing without document storage.

POST /extract

Run extraction directly on an uploaded file without storing it. Useful for testing.

Request Body

Content-Type: multipart/form-data

file: <binary PDF>
schema: "invoice"   # optional

Response

{
  "doc_type": "invoice",
  "confidence": 0.93,
  "data": { ... }
}

POST /parse

Parse a document into structured Markdown without running extraction.

Request Body

Content-Type: multipart/form-data

file: <binary PDF>

Response

{
  "pages": 3,
  "markdown": "# Invoice\n\nVendor: Acme Corp\n...",
  "tables": [...]
}