OctonData Document Processing API

Extract structured data from documents with high accuracy

Quick Start: Which Endpoint Should I Use?

Use Case                                     | Endpoint                   | Output
Get document text/Markdown                   | POST /api/v1/parse         | Markdown + chunks
Extract form data (invoices, IDs, tax forms) | POST /api/v1/extract       | Structured JSON fields
Process large files or batches               | POST /api/v1/extract/async | Job ID (poll for results)
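
The table above amounts to a simple lookup. As a minimal sketch, the helper below maps a use case to the path listed in the table. The function name choose_endpoint and its parameters are illustrative assumptions, not part of the API.

```python
# Paths are taken from the quick-start table; the function name and its
# parameters are illustrative assumptions, not part of the API.
PARSE = "/api/v1/parse"
EXTRACT = "/api/v1/extract"
EXTRACT_ASYNC = "/api/v1/extract/async"


def choose_endpoint(want_fields: bool, large_or_batch: bool = False) -> str:
    """Return the endpoint path suggested by the quick-start table."""
    if large_or_batch:
        # Large files or batches go through the background job endpoint.
        return EXTRACT_ASYNC
    if want_fields:
        # Forms, invoices, IDs, tax forms: structured JSON fields.
        return EXTRACT
    # Plain text or Markdown output.
    return PARSE


print(choose_endpoint(want_fields=False))
print(choose_endpoint(want_fields=True))
print(choose_endpoint(want_fields=True, large_or_batch=True))
```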

Available Endpoints

POST /api/v1/parse

Converts documents to clean, structured Markdown. Best for full-text extraction and search indexing. Returns content in chunks with page coordinates.
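
A request to the endpoint above might be assembled as in the sketch below. Only the method and path come from this page. The base URL, the bearer-token header, and sending the raw file bytes as the body are all assumptions; the real API may expect multipart form data or a JSON payload instead. The request is built but not sent.

```python
import urllib.request

# Assumptions (not stated in this document): the base URL, the bearer-token
# header, and sending the raw file bytes as the request body.
BASE_URL = "https://api.example.com"  # hypothetical host


def build_parse_request(document: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/parse."""
    return urllib.request.Request(
        url=BASE_URL + "/api/v1/parse",
        data=document,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/octet-stream",  # assumed body format
        },
    )


req = build_parse_request(b"%PDF-1.7 ...", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Once the actual host, authentication scheme, and body format are confirmed, the request could be sent with urllib.request.urlopen(req).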

POST /api/v1/extract

Extract structured data from forms, invoices, IDs, and tax documents. Returns field-level data with confidence scores. Auto-detects document type.
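
Because each field carries a confidence score, a caller can route low-confidence values to manual review. The sketch below illustrates that idea on a made-up response. The key names (document_type, fields, name, value, confidence), the sample values, and the 0.9 threshold are assumptions for illustration, since the response schema is not given here.

```python
# Hypothetical response shape: the key names "document_type", "fields",
# "name", "value" and "confidence" are assumptions for illustration.
sample_response = {
    "document_type": "invoice",
    "fields": [
        {"name": "invoice_number", "value": "INV-001", "confidence": 0.98},
        {"name": "total", "value": "120.50", "confidence": 0.62},
    ],
}


def split_by_confidence(response: dict, threshold: float = 0.9):
    """Separate fields into accepted values and names needing manual review."""
    accepted, needs_review = {}, []
    for field in response.get("fields", []):
        if field["confidence"] >= threshold:
            accepted[field["name"]] = field["value"]
        else:
            needs_review.append(field["name"])
    return accepted, needs_review


accepted, needs_review = split_by_confidence(sample_response)
print(accepted)
print(needs_review)
```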

POST /api/v1/extract/async

Background processing for large files. Submit a job, then poll for results or receive a webhook notification on completion.
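
The submit-and-poll flow could look roughly like the loop below. It takes a caller-supplied fetch_status function rather than calling a specific status URL, because this page does not name one. The "status" key and the values "completed" and "failed" are assumptions as well.

```python
import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Poll until the job reaches a terminal state or attempts run out.

    fetch_status is a caller-supplied function that returns the job's current
    state as a dict. The "status" key and the values "completed" / "failed"
    are assumptions; this document does not specify them.
    """
    for attempt in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") in ("completed", "failed"):
            return job
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for a real HTTP call, so the sketch runs without network access.
_fake_states = iter([{"status": "processing"}, {"status": "completed"}])
result = poll_job(lambda job_id: next(_fake_states), "job-123", interval_s=0.0)
print(result["status"])
```

Passing the fetch function in keeps the loop independent of any HTTP client. Where a webhook is configured, polling like this would not be needed.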

POST /api/v1/upload/request

Get a signed URL for uploading files larger than 100 MB. Returns a document_id to use for processing.
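
A client can decide up front whether a file needs the signed-URL route. The check below is a minimal sketch: the 100 MB limit comes from the description above, while the helper name and the reading of a megabyte as 10**6 bytes (rather than 2**20) are assumptions.

```python
# The 100 MB limit comes from the endpoint description. Whether the service
# counts a megabyte as 10**6 or 2**20 bytes is not stated; 10**6 is assumed.
DIRECT_UPLOAD_LIMIT_BYTES = 100 * 10**6


def needs_signed_upload(size_bytes: int) -> bool:
    """True when the file exceeds 100 MB and should use /api/v1/upload/request."""
    return size_bytes > DIRECT_UPLOAD_LIMIT_BYTES


print(needs_signed_upload(5 * 10**6))    # False
print(needs_signed_upload(250 * 10**6))  # True
```

In practice the size would typically come from os.path.getsize(path) before choosing the upload route.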

GET /health

Health check endpoint. Returns service status and component health.
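
A monitoring script might reduce the health response to a list of failing components. The payload below is made up: the keys "status" and "components" and the value "ok" are assumptions, since the response format is not documented here.

```python
# Hypothetical payload: the keys "status" and "components" and the value
# "ok" are assumptions, not documented here.
sample_health = {
    "status": "degraded",
    "components": {"api": "ok", "ocr": "ok", "queue": "down"},
}


def unhealthy_components(health: dict) -> list:
    """Return the names of components whose state is not "ok", sorted."""
    components = health.get("components", {})
    return sorted(name for name, state in components.items() if state != "ok")


print(unhealthy_components(sample_health))
```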