Trace
Experimental — Groundedness & phantom-hallucination scoring
Catch hallucinations, drift, and unused context before your users do. Three lanes over one pod: RAG groundedness, agentic-code phantom scoring with cross-turn session chaining, and a stateless session rollup for live scoreboards.
Quickstart
Install the SDK, export your API key, and score your first RAG answer.
from latence import Latence
client = Latence() # reads LATENCE_API_KEY from the environment
r = client.experimental.trace.rag(
response_text="Paris is the capital of France.",
raw_context="France's capital city is Paris.",
)
print(r.score) # 0.0 - 1.0
print(r.band) # "green" | "amber" | "red" | "unknown"
print(r.context_coverage_ratio) # how much of the answer is grounded in context
print(r.context_unused_ratio) # how much retrieved context was dead weight
You now know whether the answer was grounded, how much of your retrieved context was actually used, and whether to trust it. Keep reading for the code and rollup lanes.
Three lanes, one mental model
Pick the lane that matches what your app is doing right now. The URL pins the lane server-side, so payloads cannot be cross-wired.
Did the answer actually come from the context you retrieved?
r = client.experimental.trace.rag(
response_text=answer,
raw_context=ctx,
)
print(r.band, r.score)
Catch phantom APIs and drift across agentic coding turns, with opaque session chaining.
t = client.experimental.trace.code(
response_text=patch,
raw_context=repo,
response_language_hint="python",
)
print(t.session_signals.recommendation)
Stateless, sub-ms aggregation of N per-turn outputs into one scoreboard.
rollup = client.experimental.trace.rollup(turns=[t1, t2])
print(rollup.noise_pct, rollup.retrieval_waste_pct)
Signals → actions
The response fields are routing rules, not diagnostics. Read them and know exactly what to upgrade next.
| Signal | What it means | Next step |
|---|---|---|
| `band` amber/red, low `context_coverage_ratio` | The answer isn't grounded in what you retrieved. | Upgrade data quality → clean upstream documents with the Data Intelligence Pipeline. |
| High `context_unused_ratio`, `retrieval_waste_pct` > 30% | Your retriever is shipping the wrong chunks. | Upgrade retrieval → swap in ColSearch, our OSS late-interaction retrieval engine. |
| `session_signals.recommendation` = `re_anchor` / `fresh_chat` | Session drift is compounding across agent turns. | Reset the agent's context on the next turn — hand back a fresh `session_state` or start a new session. |
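The routing in this table can be sketched as a small dispatcher. Everything below is illustrative: the `route_signal` helper and its thresholds (0.5 coverage, 0.3 unused) are assumptions, not SDK code or pod defaults, but the field names match the response objects documented in the lanes that follow.

```python
def route_signal(band, context_coverage_ratio=None,
                 context_unused_ratio=None, recommendation=None):
    """Map Trace response fields to one of the three upgrade paths.

    Hypothetical helper: thresholds are illustrative, not pod defaults.
    """
    if recommendation in ("re_anchor", "fresh_chat"):
        return "reset_session"          # drift is compounding across turns
    if band in ("amber", "red") and (
        context_coverage_ratio is None or context_coverage_ratio < 0.5
    ):
        return "upgrade_data_quality"   # answer not grounded in the context
    if context_unused_ratio is not None and context_unused_ratio > 0.3:
        return "upgrade_retrieval"      # retriever ships the wrong chunks
    return "ok"
```

For example, `route_signal("red", context_coverage_ratio=0.2)` routes to the data-quality path, while a green band with high unused context routes to retrieval.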
RAG lane — trace.rag()
Score a response for groundedness against retrieval context. At least one of raw_context, chunk_ids, or support_units must be supplied.
https://api.latence.ai/api/v1/trace/rag
Request parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| response_text | string | required | — | The generated response text to score. |
| query_text | string | — | — | Optional query for query-conditioned diagnostics. |
| raw_context | string | — | — | Raw context string to segment and encode on demand. |
| chunk_ids | list[str \| int] | — | — | External chunk ids whose stored support vectors to reuse (fast path). |
| support_units | list[SupportUnitInput \| dict] | — | — | Structured premise lane with per-unit provenance. |
| primary_metric | "reverse_context" \| "triangular" | — | reverse_context | Headline metric selector. |
| evidence_limit | int (1-128) | — | 8 | Maximum top evidence links in the sparse response. |
| coverage_threshold | float (0.0-1.0) | — | 0.5 | Per-unit reverse-context threshold. |
| segmentation_mode | "sentence" \| "sentence_packed" \| "paragraph" | — | sentence_packed | How raw_context is segmented. |
| attribution_mode | "closed_book" \| "open_domain" | — | closed_book | Evidence policy. |
| include_triangular_diagnostics | bool | — | true | Include query-conditioned diagnostics. |
| heatmap_format | "none" \| "data" \| "html" | — | data | Heatmap surface. `html` also returns a self-contained `<div>`. |
| verification_samples | list[str] | — | — | Alternate responses for semantic-entropy fusion. |
| session_id | string | — | — | Opaque, hashed session identifier (never raw user text). |
| request_id | string | — | — | Optional tracking ID. |
| verbose | bool | — | false | Return full diagnostics instead of the compact summary. |
| return_job | bool | — | false | Return JobSubmittedResponse for async polling. |
Response fields
| Field | Type | Description |
|---|---|---|
| score | float \| null | Top-level groundedness score (0-1). |
| primary_metric | string \| null | Metric backing `score`. Typically reverse_context or triangular; pod may echo a derived name like groundedness_v2. |
| band | string \| null | Risk band: green / amber / red / unknown. |
| structured_score | dict \| null | Per-component score breakdown. |
| nli_aggregate | float \| null | Aggregate NLI entailment score. |
| context_coverage_ratio | float \| null | Fraction of the response grounded in context. |
| context_usage_ratio | float \| null | Fraction of context actually used. |
| context_unused_ratio | float \| null | Fraction of context left unused — the retrieval-quality signal. |
| context_uncertain_ratio | float \| null | Fraction with uncertain grounding. |
| support_units_usage | object \| null | Aggregate counts of used / unused / uncertain support units. |
| support_units | list \| null | Per-unit verdicts: source_id, usage_state, coverage_score. |
| reason | string \| null | Human-readable reason for the band. |
| warnings | list[str] \| null | Non-fatal scorer warnings. |
| file_attribution | object \| null | Per-file owner share and reason-code histogram. |
| heatmap | object \| null | Structured heatmap payload. |
| heatmap_html | string \| null | Self-contained `<div>` when heatmap_format="html". |
| latency_ms | float \| null | Pod-side scoring latency. |
| scoring_mode | "rag" \| "code" | Lane echoed back by the pod. |
| session_id | string \| null | Echoed session id. |
| version | string \| null | Pod handler version tag. |
Structured premises with SupportUnitInput
Pass per-unit provenance (source_id, speaker, timestamp) and the scorer propagates it back onto every support_units verdict.
from latence import Latence, SupportUnitInput
client = Latence()
units = [
SupportUnitInput(text="Paris is the capital of France.", source_id="doc-42"),
SupportUnitInput(text="It sits on the Seine.", source_id="doc-42"),
{"text": "Population: 2.1M.", "source_id": "wiki"},
]
r = client.experimental.trace.rag(
response_text="Paris, France's capital, sits on the Seine.",
support_units=units,
)
for u in (r.support_units or []):
print(u.source_id, u.usage_state, u.coverage_score)
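Because every verdict carries its source_id back, you can fold the per-unit results into a per-document tally and spot sources that are consistently dead weight. A minimal sketch: the tally helpers below are hypothetical, not SDK code; only the verdict fields (`source_id`, `usage_state`) come from the response shape above.

```python
from collections import Counter, defaultdict

def _field(unit, name):
    """Read a field from either a dict or an attribute-style verdict object."""
    return unit.get(name) if isinstance(unit, dict) else getattr(unit, name, None)

def usage_by_source(support_units):
    """Tally usage_state per source_id from support-unit verdicts (illustrative)."""
    tally = defaultdict(Counter)
    for unit in support_units or []:
        tally[_field(unit, "source_id")][_field(unit, "usage_state")] += 1
    return dict(tally)

# Simulated verdicts in the documented shape:
verdicts = [
    {"source_id": "doc-42", "usage_state": "used"},
    {"source_id": "doc-42", "usage_state": "used"},
    {"source_id": "wiki", "usage_state": "unused"},
]
summary = usage_by_source(verdicts)
print(summary["doc-42"]["used"], summary["wiki"]["unused"])  # 2 1
```

In real use you would pass `r.support_units` from a `.rag()` response instead of the simulated list.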
Client-side validation (before the HTTP call)
- ValueError: `response_text` must be a non-empty string.
- ValueError: Trace scoring requires at least one of: raw_context, chunk_ids, or support_units.
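If you want to fail fast in your own pipeline before building the request, the same checks are easy to mirror. A sketch that reproduces the documented ValueErrors; the guard function itself is hypothetical, not part of the SDK:

```python
def validate_rag_args(response_text, raw_context=None,
                      chunk_ids=None, support_units=None):
    """Mirror the documented client-side validation (illustrative)."""
    if not isinstance(response_text, str) or not response_text.strip():
        raise ValueError("`response_text` must be a non-empty string.")
    if not (raw_context or chunk_ids or support_units):
        raise ValueError(
            "Trace scoring requires at least one of: "
            "raw_context, chunk_ids, or support_units."
        )

validate_rag_args("Paris is the capital.", raw_context="ctx")  # passes silently
```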
Code lane — trace.code()
Score an agentic-coding turn for phantom APIs and drift. Superset of RAG with three code-lane additions. Chain turns by round-tripping the opaque next_session_state into the next call — the payload is fully opaque to the client.
https://api.latence.ai/api/v1/trace/code
Additional request parameters (on top of all RAG fields)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| response_language_hint | "python" \| "typescript" \| "go" \| "rust" \| ... | — | — | Hint for the AST extractor. |
| emit_chunk_ownership | bool | — | false | Return per-unit ownership table (adds a few KB over the wire). |
| session_state | SessionState \| dict | — | — | Echo the previous turn's next_session_state verbatim to chain turns. |
Additional response fields (on top of all RAG response fields)
| Field | Type | Description |
|---|---|---|
| code_lane | object \| null | Composite / AST / NLI diagnostics for the turn. |
| next_session_state | SessionState \| null | Opaque state to pass into the next code() call. |
| session_signals | object \| null | EMA groundedness, drift, phantom rate, recommendation (continue / re_anchor / fresh_chat). |
Multi-turn session chaining
from latence import Latence
client = Latence()
turn1 = client.experimental.trace.code(
response_text="def add(a, b): return a + b",
raw_context="# utils.py\ndef sub(a, b): return a - b",
response_language_hint="python",
)
turn2 = client.experimental.trace.code(
response_text="def mul(a, b): return a * b",
raw_context="# utils.py\ndef sub(a, b): return a - b",
response_language_hint="python",
session_state=turn1.next_session_state, # chain turns
)
print(turn2.band)
print(turn2.session_signals.recommendation) # continue | re_anchor | fresh_chat
print(turn2.session_signals.ema_groundedness)
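In a longer loop, chaining usually combines with the recommendation signal: keep passing next_session_state forward while the pod says continue, and drop the chain when it asks for re_anchor or fresh_chat. The sketch below isolates that decision as a pure helper (`carry_state` is hypothetical) and exercises it with stand-in objects so it runs without a network call; in real use, `turn` would be the response of `client.experimental.trace.code(...)`.

```python
from types import SimpleNamespace

def carry_state(turn):
    """Return the session_state to pass into the next code() call.

    Chains on `continue`; returns None (start fresh / re-anchor) otherwise.
    Only relies on `.session_signals.recommendation` and
    `.next_session_state`, matching the code-lane response shape.
    """
    signals = getattr(turn, "session_signals", None)
    rec = getattr(signals, "recommendation", None) if signals else None
    if rec in ("re_anchor", "fresh_chat"):
        return None  # drop the chain; re-anchor context on the next turn
    return getattr(turn, "next_session_state", None)

# Stand-ins shaped like code-lane responses:
healthy = SimpleNamespace(
    session_signals=SimpleNamespace(recommendation="continue"),
    next_session_state={"opaque": "blob"},
)
drifting = SimpleNamespace(
    session_signals=SimpleNamespace(recommendation="re_anchor"),
    next_session_state={"opaque": "blob"},
)
print(carry_state(healthy))   # keep chaining: the opaque state
print(carry_state(drifting))  # None: reset the session
```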
Client-side validation (before the HTTP call)
- ValueError: `response_text` must be a non-empty string.
- ValueError: Trace scoring requires at least one of: raw_context, chunk_ids, or support_units.
Rollup lane — trace.rollup()
Aggregate N per-turn outputs into a session scoreboard. Stateless, CPU-only, sub-millisecond on the pod — safe to call on every keystroke.
https://api.latence.ai/api/v1/trace/rollup
Request parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| turns | list[TraceResponse \| dict] | required | — | Ordered per-turn outputs (response objects from .rag() / .code() work directly). |
| session_id | string | — | — | Echoed on the response. |
| heatmap_format | "none" \| "data" \| "html" | — | data | Session-level heatmap surface. |
| request_id | string | — | — | Optional tracking ID. |
Response fields
| Field | Type | Description |
|---|---|---|
| turns_processed | int \| null | Number of per-turn outputs aggregated. |
| noise_pct | float \| null | Fraction of turns flagged as noise. |
| model_drift_pct | float \| null | Fraction of turns with model drift. |
| retrieval_waste_pct | float \| null | Fraction of retrieved context left unused across the session. |
| reason_code_histogram | dict[str, int] \| null | Count of each reason code over the window. |
| recommendations | list[str] \| null | Session-level recommendations. |
| risk_band_trail | list[str] \| null | Risk band per turn, chronological. |
| drift_trend | dict[str, Any] \| null | Drift trajectory summary ({last, max, mean, min, ...}). |
| top_dead_files | list \| null | Files consistently marked dead-weight. |
| heatmap | object \| null | Session-level heatmap payload. |
| heatmap_html | string \| null | Self-contained `<div>` when heatmap_format="html". |
Example
from latence import Latence
client = Latence()
# turns can be TraceResponse objects from .rag() / .code(), or plain dicts.
rollup = client.experimental.trace.rollup(turns=[turn1, turn2])
print(rollup.noise_pct) # fraction of turns flagged as noise
print(rollup.retrieval_waste_pct) # fraction of retrieved context left unused
print(rollup.model_drift_pct) # fraction of turns with drift
print(rollup.reason_code_histogram) # why the turns failed, aggregated
print(rollup.risk_band_trail) # per-turn band, chronological
print(rollup.recommendations) # actionable session-level advice
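The scoreboard pairs naturally with the Signals → actions table: for example, gate a deploy or fire an alert when retrieval waste crosses the 30% line. A sketch under stated assumptions: `session_alerts` and the 0.5 noise threshold are hypothetical, and the `*_pct` fields are assumed to be 0-1 fractions as described in the response table.

```python
from types import SimpleNamespace

def session_alerts(rollup, waste_threshold=0.3, noise_threshold=0.5):
    """Turn rollup fractions into alert strings (illustrative thresholds)."""
    alerts = []
    waste = getattr(rollup, "retrieval_waste_pct", None)
    noise = getattr(rollup, "noise_pct", None)
    if waste is not None and waste > waste_threshold:
        alerts.append("upgrade retrieval: session retrieval waste over threshold")
    if noise is not None and noise > noise_threshold:
        alerts.append("too many noisy turns: inspect reason_code_histogram")
    return alerts

# Stand-in shaped like a rollup response:
bad_session = SimpleNamespace(noise_pct=0.6, retrieval_waste_pct=0.45)
print(session_alerts(bad_session))  # both alerts fire
```

In real use you would pass the object returned by `client.experimental.trace.rollup(...)` directly.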
Client-side validation (before the HTTP call)
- ValueError: `turns` must be a non-empty list.
Turns are normalized server-side via scoring_mode injection — you can mix RAG and code turns in the same rollup.
Async and background jobs
Every method has an awaitable twin under AsyncLatence. Pass return_job=True on either rag() or code() to fire-and-forget and poll later (rollup is always inline).
from latence import AsyncLatence
async with AsyncLatence() as client:
r = await client.experimental.trace.rag(
response_text="Paris is the capital of France.",
raw_context="France's capital city is Paris.",
)
print(r.score, r.band)
# Background job: fire-and-forget, poll later via client.jobs.wait().
job = client.experimental.trace.rag(
response_text="...",
raw_context="...",
return_job=True,
)
result = client.jobs.wait(job.job_id)
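AsyncLatence also lets you fan out a batch of scoring calls concurrently with asyncio.gather. The pattern below is shown against a stub scorer so it runs standalone; in real code, `score_one` would instead `await client.experimental.trace.rag(response_text=..., raw_context=...)`, and the stub's fixed "green" band is an assumption for illustration only.

```python
import asyncio

async def score_one(response_text, raw_context):
    # Stand-in for: await client.experimental.trace.rag(
    #     response_text=response_text, raw_context=raw_context)
    await asyncio.sleep(0)  # simulate the network round trip
    return {"response_text": response_text, "band": "green"}

async def score_batch(pairs):
    """Fan out N scoring calls concurrently; results come back in order."""
    return await asyncio.gather(
        *(score_one(text, ctx) for text, ctx in pairs)
    )

results = asyncio.run(score_batch([
    ("Paris is the capital of France.", "France's capital city is Paris."),
    ("Berlin is in Germany.", "Berlin is the capital of Germany."),
]))
print([r["band"] for r in results])
```

Because gather preserves input order, you can zip the results straight back onto the prompts you scored.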
Next steps
Trace detected a data-quality bottleneck?
When context_coverage_ratio is low or bands trend amber/red, the fix is upstream. Run the Data Intelligence Pipeline over your documents to get clean markdown, resolved entities, and a typed knowledge graph in one call.