Document Intelligence
Experimental · 10 credits
Production Recommendation
This is a direct endpoint for development and testing. For production workloads, use the Data Intelligence Pipeline: it provides structured Data Packages with quality metrics, is async by default, and is covered by Enterprise SLAs.
Overview
Document Intelligence (V2) uses advanced AI models and layout analysis for document processing. Extract text, tables, and structured content from PDFs, images, and Office documents.
Key features:
- Supports PDF, images, DOCX, XLSX, PPTX, HTML
- Layout detection and table extraction
- Chart and seal recognition
- Multiple output formats (Markdown, JSON, HTML, XLSX)
- Performance mode with auto-refinement
API Reference
https://api.latence.ai/api/v1/document_intelligence/process
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_url | string | | — | Public URL to document |
| file_base64 | string | | — | Base64-encoded file data |
| filename | string | | — | Filename for type detection |
| mode | string | | — | Processing mode: default or performance |
| output_format | string | | — | Output: markdown, json, html, xlsx |
| max_pages | integer | | — | Maximum pages to process |
| pipeline_options | object | | — | Pipeline configuration |
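As an illustration of how the parameters above fit together for a local file, here is a minimal sketch that builds a request body using `file_base64` and `filename`. The helper name `build_payload` is hypothetical and not part of the SDK. The field names and allowed values come from the table, but the HTTP method, the body encoding, and the authentication scheme are not stated on this page, so the sketch stops at constructing the dictionary and does not send it.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Optional

# Allowed values as listed in the request parameter table.
ALLOWED_MODES = {"default", "performance"}
ALLOWED_FORMATS = {"markdown", "json", "html", "xlsx"}


def build_payload(
    path: str,
    mode: str = "default",
    output_format: str = "markdown",
    max_pages: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request body for a local file (hypothetical helper, not an SDK call)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")

    file_path = Path(path)
    payload: Dict[str, Any] = {
        # Raw bytes are base64-encoded and sent as ASCII text.
        "file_base64": base64.b64encode(file_path.read_bytes()).decode("ascii"),
        # The filename is passed along so the file type can be detected.
        "filename": file_path.name,
        "mode": mode,
        "output_format": output_format,
    }
    # Only include max_pages when the caller sets a limit.
    if max_pages is not None:
        payload["max_pages"] = max_pages
    return payload
```

Validating `mode` and `output_format` locally catches typos before a request is made. Whether the service rejects unknown values on its own is not documented here.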
Response Fields
| Field | Type | Description |
|---|---|---|
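| content | string | Extracted document content in the requested output format |
| pages_processed | number | Number of pages processed |
| output_format | string | Output format of the returned content |
| success | boolean | Whether processing succeeded |
| usage | object | Usage information, such as credits consumed |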
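The fields above can be consumed along these lines. This is a minimal sketch that assumes the response has already been decoded into a plain dictionary. `extract_content` is a hypothetical helper, and the sample values mirror the response example on this page. How failures are actually reported (for instance, whether an error message accompanies `success: false`) is not documented here, so the sketch only checks the `success` flag.

```python
from typing import Any, Dict


def extract_content(response: Dict[str, Any]) -> str:
    """Return the extracted content, or raise if the response reports failure."""
    if not response.get("success", False):
        raise RuntimeError("document processing did not succeed")
    # "usage" may be missing in a malformed response, so read it defensively.
    credits = response.get("usage", {}).get("credits")
    print(
        f"{response['pages_processed']} page(s), "
        f"format={response['output_format']}, credits={credits}"
    )
    return response["content"]


# Sample values taken from the response example on this page.
sample = {
    "content": "# Document Title\n\nExtracted text content...",
    "pages_processed": 3,
    "output_format": "markdown",
    "success": True,
    "usage": {"credits": 15.0},
}

text = extract_content(sample)
```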
Response Example
```json
{
  "content": "# Document Title\n\nExtracted text content...",
  "pages_processed": 3,
  "output_format": "markdown",
  "success": true,
  "usage": { "credits": 15.0 }
}
```
Code Examples
```python
from latence import Latence

client = Latence(api_key="YOUR_API_KEY")

# Process a document from URL
result = client.experimental.document_intelligence.process(
    file_url="https://example.com/document.pdf",
    output_format="markdown",  # or "json", "html", "xlsx"
)
print(result.content)  # Extracted text in markdown format
print(f"Pages processed: {result.pages_processed}")

# Or from a local file with performance mode
result = client.experimental.document_intelligence.process(
    file_path="/path/to/document.pdf",
    mode="performance",  # Auto-refinement enabled
    pipeline_options={
        "use_chart_recognition": True,
        "use_seal_recognition": True,
    },
)
```
Explore Tutorials & Notebooks
Deep-dive examples and interactive notebooks in our GitHub repository
Looking for production-grade processing?
The Data Intelligence Pipeline chains services automatically and returns structured Data Packages.