Data Intelligence Pipeline

Turn messy, sensitive documents into RAG-ready knowledge graphs. One call. Structured Data Packages. Production-grade quality metrics.

latence-python SDK

v0.2

Pipeline-first SDK. Submit files, get structured Data Packages.

Tutorials & Notebooks · Code Examples · Pipeline Guides
Explore on GitHub

Quick Start

1. Get your API Key

Create an account and generate an API key from your dashboard.

Create Account →
2. Install the SDK
pip install latence
View on GitHub →
3. Submit a Pipeline
job = client.pipeline.run(files=["doc.pdf"])
View Example →
4. Get Data Package
pkg = job.wait_for_completion()
print(pkg.document.markdown)
pkg.merge(save_to="out.json")
Data Package →

The Pipeline

Core Product

Submit documents. Get back structured, high-quality data packages ready for RAG, agents, and LLM workflows.

Smart Defaults

Just provide files. The intelligent default pipeline runs Document Intelligence → Entity Extraction → Knowledge Graph automatically. No configuration required.

Structured Data Package

Not raw JSON dumps. Organized sections with document markdown, entities, knowledge graphs, quality metrics, and confidence scores.

DAG Execution

Services execute as a directed acyclic graph -- independent branches run in parallel. Track per-stage progress with real-time callbacks. Resumable on partial failure.
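The parallel-branch behavior described above can be sketched with the standard library's `graphlib`. This is an illustration of DAG scheduling, not the SDK's internals; the stage names are taken from the default pipeline, but the dependency edges are assumptions for the example.

```python
from graphlib import TopologicalSorter

# Illustrative stage graph: Entity Extraction and Redaction both depend only on
# Document Intelligence, so they become ready in the same batch and can run in
# parallel. Edges here are assumed for the example.
stages = {
    "document_intelligence": set(),
    "entity_extraction": {"document_intelligence"},
    "redaction": {"document_intelligence"},
    "knowledge_graph": {"entity_extraction"},
}

ts = TopologicalSorter(stages)
ts.prepare()
order = []
while ts.is_active():
    ready = list(ts.get_ready())  # independent stages surface together
    order.append(sorted(ready))
    ts.done(*ready)
print(order)
```

Each inner list is a batch of stages with no unmet dependencies; a runner can dispatch each batch concurrently, which is what makes partial failure resumable per branch.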

ZIP Archive Export

Download results as an organized ZIP archive with markdown documents, entity JSON, knowledge graph data, quality reports, and a human-readable README.

Data Consolidation

Merge all outputs into a single, document-centric JSON with zero redundancy. One call to pkg.merge() and you have production-ready data for downstream consumption.
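A document-centric merge like the one `pkg.merge()` performs can be sketched with plain dicts. The field names (`markdown`, `entities`, `relations`) are assumptions for illustration; the real consolidated JSON schema may differ.

```python
import json

# Hypothetical per-service payloads; real Data Package field names may differ.
document = {"id": "doc-1", "markdown": "# Contract\n..."}
entities = [{"text": "Acme Corp", "type": "ORG"}]
relations = [{"head": "Acme Corp", "rel": "party_to", "tail": "doc-1"}]

# Document-centric merge: one object per document, each section attached once,
# so nothing is repeated across files.
merged = {**document, "entities": entities, "knowledge_graph": {"relations": relations}}

with open("out.json", "w") as f:
    json.dump(merged, f, indent=2)
```

The key property is zero redundancy: each section appears exactly once under its document, rather than as separate files that repeat document metadata.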

Full Guide
Pipeline Documentation
Complete guide: smart defaults, step configuration, Data Package structure, fluent builder, async job handling, and pricing.
Document Intelligence · Redaction · Entity Extraction · Knowledge Graph
View Guide

Authentication

All API requests require authentication using a Bearer token.

pipeline.py
from latence import Latence
 
client = Latence(api_key="YOUR_API_KEY")
 
# Submit files -- smart defaults handle the rest
job = client.pipeline.run(files=["contract.pdf"])
 
# Wait for the composed Data Package
pkg = job.wait_for_completion()
 
# Structured, summarized results
print(pkg.document.markdown) # Clean extracted text
print(pkg.entities.summary) # Entity counts by type
print(pkg.knowledge_graph.relations) # Full relation list
 
pkg.download_archive("./results.zip") # ZIP export
pkg.merge(save_to="./output.json") # Consolidated JSON
1. Get your API key from the Dashboard → API Keys page
2. Install the Python SDK: pip install latence
3. The SDK handles authentication automatically
4. Never share your API key or commit it to version control
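To keep the key out of source control, read it from the environment. The variable name `LATENCE_API_KEY` is an assumption for this sketch, not a name the SDK requires.

```python
import os


def load_api_key() -> str:
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get("LATENCE_API_KEY")  # assumed variable name
    if not key:
        raise RuntimeError("Set LATENCE_API_KEY before creating the client")
    return key


# client = Latence(api_key=load_api_key())
os.environ["LATENCE_API_KEY"] = "sk-demo"  # demo only; export this in your shell instead
print(load_api_key())
```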

Rate Limits

API requests are rate-limited per API key, per service. Limits apply uniformly — there are no tier-based rate limits.

Service | Rate Limit
Document Intelligence | 500 req/min
Entity Extraction | 1,000 req/min
Relation Extraction / Knowledge Graph | 500 req/min
Redaction | 1,000 req/min
Compression | 1,500 req/min
Embed (unified) | 1,500 req/min
Embedding (dense) | 2,000 req/min
ColBERT | 1,000 req/min
ColPali | 1,000 req/min
Chunking | 5,000 req/min
Dataset Intelligence | 100 req/min
Enrichment (Coming Soon) | 2,500 req/min

Rate Limit Headers

x-ratelimit-limit | Maximum requests allowed in the window
x-ratelimit-remaining | Requests remaining in the current window
x-credits-used | Credits charged for this request
x-credits-remaining | Your remaining credit balance
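A client can read these headers after each response to throttle itself before hitting a 429. The literal values below are made up for illustration; only the header names come from the table above.

```python
# Assumed response headers for illustration; names are from the table above,
# values arrive as strings on the wire.
headers = {
    "x-ratelimit-limit": "500",
    "x-ratelimit-remaining": "499",
    "x-credits-used": "10",
    "x-credits-remaining": "4990",
}

# Convert once, then check before issuing the next request.
usage = {name: int(value) for name, value in headers.items()}
low_on_requests = usage["x-ratelimit-remaining"] < 10
low_on_credits = usage["x-credits-remaining"] < usage["x-credits-used"]
print(low_on_requests, low_on_credits)
```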

Error Codes

Standard HTTP error codes with additional context.

Code | Name | Description
400 | Bad Request | Invalid request parameters
401 | Unauthorized | Missing or invalid API key
402 | Insufficient Credits | Your credit balance is too low
429 | Too Many Requests | Rate limit exceeded
500 | Internal Server Error | Unexpected server error
Error Response Example (JSON)
{
  "error": "Rate limit exceeded",
  "details": "Maximum 500 requests per 60000ms",
  "retry_after": 60
}
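The `retry_after` field in the error body gives a server-suggested backoff in seconds. A minimal sketch of honoring it on a 429, with a fallback when the body is not parseable:

```python
import json


def retry_delay(body: str, default: float = 1.0) -> float:
    """Extract the server-suggested backoff (seconds) from a 429 error body."""
    try:
        return float(json.loads(body).get("retry_after", default))
    except (ValueError, TypeError):
        # Non-JSON or malformed body: fall back to a conservative default.
        return default


body = '{"error": "Rate limit exceeded", "details": "Maximum 500 requests per 60000ms", "retry_after": 60}'
delay = retry_delay(body)
# time.sleep(delay)  # then retry the request
print(delay)
```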

Experimental / Developer APIs

Self-Service

Direct access to individual services for development, testing, and custom workflows.

These endpoints are available for development and testing. For production workloads, use the Data Intelligence Pipeline above -- it provides structured Data Packages, quality metrics, and is covered by Enterprise SLAs.

Experimental · 1 cr
Embedding
Generates dense vector embeddings with Matryoshka dimension support. Choose an embedding dimension (256, 512, 768, or 1024) to balance quality against performance.
View Documentation
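Matryoshka embeddings nest smaller representations inside larger ones, so a 1024-dim vector can be cut down to its leading 256, 512, or 768 values. A sketch of that trade-off with a stand-in vector (not a real API response):

```python
# Stand-in for a 1024-dim embedding returned by the service.
full = [i / 1024 for i in range(1024)]


def truncate(vec: list[float], dim: int) -> list[float]:
    """Keep the leading `dim` values; Matryoshka training makes prefixes usable."""
    assert dim in (256, 512, 768, 1024), "supported Matryoshka dimensions"
    return vec[:dim]


small = truncate(full, 256)
print(len(small))
```

In practice truncated Matryoshka vectors are usually re-normalized before cosine similarity; that step is omitted here for brevity.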
Experimental · 5 cr
ColBERT
ColBERT provides state-of-the-art neural retrieval with token-level embeddings. Using late interaction, it delivers superior ranking precision compared to traditional dense embeddings.
View Documentation
Experimental · 10 cr
ColPali
ColPali combines vision and language models for searching documents where visual context matters. Ideal for documents with charts, diagrams, tables, and complex formatting.
View Documentation
Experimental · 3 cr
Compression
Dramatically reduces token count while preserving meaning. TOON encoding and intelligent compression achieve up to 80% token reduction.
View Documentation
Experimental · 10 cr
Document Intelligence
Document Intelligence (V2) uses advanced AI models and layout analysis for document processing. Extract text, tables, and structured content from PDFs, images, and Office documents.
View Documentation
Experimental · 5 cr
Entity Extraction
Zero-shot entity extraction using a NER-inspired approach. Extract any entity type without training: just provide labels, or let the AI generate them automatically.
View Documentation
Experimental · 5 cr
Relation Extraction
Extract relations and build structured knowledge graphs from unstructured text. Discover entity relationships and output them in RDF/Turtle, Neo4j property graph, or custom formats.
View Documentation
Experimental · 10 cr
Redaction
GDPR-compliant PII detection and redaction. Automatically find and remove sensitive information from text with configurable masking or replacement.
View Documentation
Experimental · 0 cr
Chunking
Split text into semantically meaningful chunks using four strategies; the character and token strategies are free.
View Documentation
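As an illustration of the simplest strategy, here is a character-window chunker with overlap. This is a local sketch of the concept, not the service's implementation; the `size` and `overlap` parameters are chosen for the example.

```python
def chunk_by_characters(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Fixed-size character windows, each sharing `overlap` chars with the previous."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


chunks = chunk_by_characters("Split text into semantically meaningful chunks.")
print(chunks)
```

Overlap keeps sentence fragments that straddle a boundary retrievable from both neighboring chunks, at the cost of some duplicated characters.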
Coming Soon
Enrichment
10-dimensional per-chunk and corpus-level feature enrichment for retrieval-optimized data.
Experimental · 51.85 cr
Dataset Intelligence
Corpus-level knowledge graph construction, ontology induction, and incremental dataset ingestion. Transforms pipeline outputs into entities, relations, graph embeddings, and ontological concepts.
View Documentation