Enrichment (Coming Soon)
Experimental · 0.5 credits

Per-chunk and corpus-level feature enrichment across 10 feature groups for retrieval-optimized data. Coming Soon.
Production Recommendation
This is a direct endpoint for development and testing. For production workloads, use the Data Intelligence Pipeline: it provides structured Data Packages with quality metrics, is async by default, and is covered by Enterprise SLAs.
Overview
The Enrichment service computes 10 feature groups per chunk and at corpus level for retrieval-optimized data. This service requires a corpus-level streaming architecture and will be available in a future release.
**Feature groups:** quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy
**Status:** Coming Soon — corpus-level processing requires a dedicated architecture for streaming data from B2, efficient computation across 100k+ chunks, and concurrent multi-user processing.
When to Use
Use enrichment when you need chunk-level metadata (quality, semantic role, centrality) for advanced retrieval strategies like adaptive context selection, quality filtering, or redundancy detection. This service is not yet available.
API Reference
POST https://api.latence.ai/api/v1/enrichment/chunk

Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to chunk (max 5,000,000 characters) |
| strategy | string | No | hybrid | Chunking strategy: character, token, semantic, or hybrid |
| chunk_size | integer | No | 512 | Target chunk size (range 64 - 8192) |
| chunk_overlap | integer | No | 50 | Overlap between adjacent chunks (range 0 - 8191) |
| min_chunk_size | integer | No | 64 | Minimum chunk size; smaller chunks are discarded (range 1 - 8192) |
| request_id | string | No | — | Optional request tracking ID |
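The parameters above map directly onto a JSON request body. A minimal client-side sketch that validates the documented ranges before sending; `build_chunk_payload` is a hypothetical helper, not part of any SDK:

```python
import json

API_URL = "https://api.latence.ai/api/v1/enrichment/chunk"

def build_chunk_payload(text, strategy="hybrid", chunk_size=512,
                        chunk_overlap=50, min_chunk_size=64):
    """Validate parameters client-side and build the request body."""
    if strategy not in ("character", "token", "semantic", "hybrid"):
        raise ValueError(f"invalid strategy: {strategy}")
    if not (64 <= chunk_size <= 8192):
        raise ValueError("chunk_size must be in [64, 8192]")
    if not (0 <= chunk_overlap <= 8191):
        raise ValueError("chunk_overlap must be in [0, 8191]")
    return {
        "text": text,
        "strategy": strategy,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "min_chunk_size": min_chunk_size,
    }

payload = build_chunk_payload("Your document text here...")
# Send with your HTTP client of choice, e.g.:
# requests.post(API_URL, json=payload, headers={"Authorization": f"Bearer {key}"})
print(json.dumps(payload))
```

Validating ranges locally avoids burning a request on a 400 response.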
Response Fields
| Field | Type | Description |
|---|---|---|
success | boolean | Whether the request succeeded |
data.chunks | array | Array of chunk objects with content, index, start, end, char_count, token_count, semantic_score, section_path |
data.num_chunks | integer | Total number of chunks produced |
data.strategy | string | Chunking strategy used |
data.chunk_size | integer | Target chunk size parameter |
data.processing_time_ms | number | Processing time in milliseconds |
usage | object | Credit usage information |
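The `start`/`end` offsets in each chunk object let you verify that adjacent chunks overlap as configured. A small sketch on hand-built chunk dicts (illustrative values, not real API output):

```python
def chunk_overlaps(chunks):
    """Character overlap between adjacent chunks, from their start/end offsets."""
    return [max(0, chunks[i]["end"] - chunks[i + 1]["start"])
            for i in range(len(chunks) - 1)]

# Two synthetic chunks with the default chunk_overlap of 50 characters:
chunks = [
    {"index": 0, "start": 0, "end": 512},
    {"index": 1, "start": 462, "end": 974},
]
overlaps = chunk_overlaps(chunks)  # one value per adjacent pair
```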
Response Example
```json
{
  "success": true,
  "data": {
    "chunks": [
      {
        "content": "Introduction to machine learning...",
        "index": 0,
        "start": 0,
        "end": 512,
        "char_count": 498,
        "token_count": 127,
        "semantic_score": 0.87,
        "section_path": ["Chapter 1", "Section 1.1"]
      }
    ],
    "num_chunks": 42,
    "strategy": "hybrid",
    "chunk_size": 512,
    "processing_time_ms": 45.2
  },
  "usage": { "credits": 0.5 }
}
```

POST https://api.latence.ai/api/v1/enrichment/enrich

Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to enrich (max 5,000,000 characters) |
| strategy | string | No | hybrid | Chunking strategy: character, token, semantic, or hybrid |
| chunk_size | integer | No | 512 | Target chunk size (range 64 - 8192) |
| chunk_overlap | integer | No | 50 | Overlap between adjacent chunks (range 0 - 8191) |
| min_chunk_size | integer | No | 64 | Minimum chunk size (range 1 - 8192) |
| encoding_format | string | No | float | Embedding output format: float or base64 |
| features | array | No | all 10 | Feature groups to compute. Valid: quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy |
| request_id | string | No | — | Optional request tracking ID |
Response Fields
| Field | Type | Description |
|---|---|---|
success | boolean | Whether the request succeeded |
data.chunks | array | Array of chunk objects |
data.num_chunks | integer | Total number of chunks |
data.embeddings | array | One embedding per chunk (float arrays or base64 strings) |
data.embedding_dim | integer | Embedding dimension (e.g. 1024) |
data.encoding_format | string | Embedding format used |
data.features | object | Feature groups keyed by name: quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy |
data.strategy | string | Chunking strategy used |
data.processing_time_ms | number | Processing time in milliseconds |
usage | object | Credit usage information |
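When encoding_format is base64, the embedding must be decoded client-side. A sketch that assumes little-endian float32 packing (the actual byte layout is an assumption here; confirm it against the API before relying on it):

```python
import base64
import struct

def decode_embedding(b64: str, dim: int):
    """Decode a base64 embedding into floats, assuming little-endian float32."""
    raw = base64.b64decode(b64)
    if len(raw) != dim * 4:
        raise ValueError("unexpected payload size for the stated dimension")
    return list(struct.unpack(f"<{dim}f", raw))

# Round-trip demo on synthetic data (not real API output):
vec = [0.012, -0.034, 0.056]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
decoded = decode_embedding(encoded, 3)
```

The length check guards against a mismatched embedding_dim, which would otherwise silently misalign every float.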
Response Example
```json
{
  "success": true,
  "data": {
    "chunks": [
      {
        "content": "Introduction to machine learning...",
        "index": 0,
        "start": 0,
        "end": 512,
        "char_count": 498,
        "token_count": 127
      }
    ],
    "num_chunks": 42,
    "embeddings": [[0.012, -0.034, 0.056]],
    "embedding_dim": 1024,
    "encoding_format": "float",
    "features": {
      "quality": {
        "per_chunk": [{"coherence_score": 0.72, "is_short": false, "is_long": false, "word_count": 87, "avg_word_length": 5.2}],
        "aggregate": {"mean_coherence": 0.65, "short_chunks": 3, "long_chunks": 0}
      },
      "semantic": {
        "per_chunk": [{"rhetorical_role": "definition", "rhetorical_confidence": 0.82, "centrality": 0.67}],
        "aggregate": {"role_distribution": {"definition": 0.23}, "mean_centrality": 0.55}
      }
    },
    "strategy": "hybrid",
    "processing_time_ms": 1250.4
  },
  "usage": { "credits": 0.5 }
}
```

Error Handling
All errors return a JSON body with error and details fields.
| Status | Code | Message | Description |
|---|---|---|---|
| 400 | MISSING_FIELD | Missing required field: text | The text field is required |
| 400 | INVALID_STRATEGY | Invalid strategy. Must be character, token, semantic, or hybrid | Unknown chunking strategy |
| 400 | INVALID_FEATURES | Unknown features: ['invalid']. Valid: ['quality', ...] | Invalid feature group name |
| 429 | RATE_LIMITED | Rate limit exceeded | Too many requests; retry after the Retry-After interval |
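For 429 responses, a client should honor the Retry-After header when present and fall back to exponential backoff otherwise. A hedged sketch; `retry_after_seconds` is a hypothetical helper, not part of any SDK:

```python
def retry_after_seconds(headers, attempt, base=1.0, cap=30.0):
    """Pick a wait time: honor Retry-After when present, else exponential backoff."""
    ra = headers.get("Retry-After")
    if ra is not None:
        try:
            return float(ra)  # delta-seconds form
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * (2 ** attempt))

# A 429 with "Retry-After: 5" waits 5 seconds; without the header,
# the wait doubles per attempt and is capped at 30 seconds.
```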
Billing
Pricing Formula
cost = (characters / 1,000,000) × rate

Add-ons & Multipliers
| Option | Price | Description |
|---|---|---|
| Chunk task | $0.10 / 1M chars | Text chunking only |
| Enrich task | $0.50 / 1M chars | Chunk + embed + 10 feature groups |
Pricing Examples
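Applying the formula with the rates from the table above; `cost_usd` is an illustrative helper for the arithmetic:

```python
RATES_USD_PER_1M_CHARS = {"chunk": 0.10, "enrich": 0.50}

def cost_usd(characters: int, task: str) -> float:
    """cost = (characters / 1,000,000) x rate"""
    return (characters / 1_000_000) * RATES_USD_PER_1M_CHARS[task]

# Enriching a 2,000,000-character corpus: 2 x $0.50 = $1.00
# Chunking a 500,000-character document: 0.5 x $0.10 = $0.05
```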
Code Examples
```python
from latence import Latence

client = Latence(api_key="YOUR_API_KEY")

# Chunk a document
chunks = client.experimental.enrichment.chunk(
    text="Your document text here...",
    strategy="hybrid",
    chunk_size=512,
)
print(f"{chunks.data.num_chunks} chunks")

# Full enrichment with features
result = client.experimental.enrichment.enrich(
    text="Your document text here...",
    strategy="hybrid",
    features=["quality", "semantic", "drift"],
)
for chunk in result.data.chunks:
    print(f"  [{chunk.index}] {chunk.content[:80]}...")
print(f"Mean coherence: {result.data.features['quality']['aggregate']['mean_coherence']:.2f}")
```

Best Practices
- Use the hybrid strategy for the best balance of speed and boundary quality
- Start with chunk_size=512 and adjust based on your downstream model's context window
- Use the features parameter to compute only the feature groups you need; this reduces processing time
- For large documents (>1MB), prefer the character or hybrid strategy over semantic
- Use the quality feature to filter low-coherence chunks before indexing
- Use drift detection to identify natural document sections without relying on headings
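The quality-filtering practice above can be sketched as a post-processing step. Field names follow the enrich response example; `filter_low_coherence` is a hypothetical helper and the threshold is an assumption to tune per corpus:

```python
def filter_low_coherence(chunks, quality_per_chunk, threshold=0.5):
    """Keep only chunks whose quality coherence_score meets the threshold."""
    return [c for c, q in zip(chunks, quality_per_chunk)
            if q["coherence_score"] >= threshold]

# Synthetic data shaped like data.chunks and data.features["quality"]["per_chunk"]:
chunks = [{"index": 0, "content": "..."}, {"index": 1, "content": "..."}]
quality = [{"coherence_score": 0.72}, {"coherence_score": 0.31}]
kept = filter_low_coherence(chunks, quality)  # drops the 0.31 chunk
```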
Explore Tutorials & Notebooks
Deep-dive examples and interactive notebooks in our GitHub repository
Looking for production-grade processing?
The Data Intelligence Pipeline chains services automatically and returns structured Data Packages.