kusuka — to weave [Swahili]
You have multiple data sources measuring the same thing. Kusuka weaves them together and shows you exactly where the threads don't meet — where the gap is, why it exists, what kind of fix it needs, and how far it's spread.
Upload two spreadsheets to reconcile, paste JSON for a quick comparison, or build a full position matrix. Everything runs in your browser — no data leaves your machine.
Every monitoring tool asks "does this value look wrong?" Kusuka asks "do two independent paths agree?" That's a fundamentally harder question to fool.
Kusuka runs a Strand — a structured comparison of two data paths at every position in your pipeline. The output is a convergence matrix you can read like a scan.
Any two independent measurements of the same system. Source A vs Source B. Model output vs ground truth. This month vs last month. You choose the paths.
Every position — each row (entity) by each column (pipeline layer) — gets a convergence glyph. Agreement, approximation, drift, failure, or missing data. Nothing hides.
Block of nulls = source problem. Systematic column = volume issue. Gradual drift = formula error. Row-localised = orphan record. The shape tells you what to fix and who should fix it.
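The glyph matrix described above can be sketched in a few lines. This is an illustrative Python sketch, not Kusuka's implementation: the SMAPE thresholds and the '=', '~', 'd' symbols are assumptions, while 'o' (one-sided null) and 'X' (outright disagreement) follow glyphs named elsewhere on this page.

```python
def smape(a, b):
    """Symmetric mean absolute percentage error for one position (0 = identical)."""
    if a == b:
        return 0.0
    return abs(a - b) / ((abs(a) + abs(b)) / 2)

def glyph(a, b, approx=0.02, drift=0.25):
    """Assign a convergence glyph to one (row, layer) position.
    The thresholds here are illustrative assumptions, not Strand Spec values."""
    if a is None and b is None:
        return ' '   # missing on both paths
    if a is None or b is None:
        return 'o'   # one-sided null: data present on one path only
    s = smape(a, b)
    if s == 0.0:
        return '='   # exact agreement
    if s <= approx:
        return '~'   # approximation: within tolerance
    if s <= drift:
        return 'd'   # drift: systematic but bounded gap
    return 'X'       # failure: the two paths disagree outright

def matrix(path_a, path_b):
    """Build the convergence matrix: one glyph per entity (row) per layer (column)."""
    return [[glyph(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(path_a, path_b)]
```

Reading the matrix is then exactly the shape analysis above: a column of 'o' glyphs points at a source, a column of 'd' glyphs at a formula, a single row of 'X' glyphs at an orphan record.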
Run Kusuka regularly. The change in convergence between runs tells you whether your pipeline is healing or degrading — before downstream users notice.
The exact position and layer where the gap first appeared. Not "something is off" — "it started here, at this step, on this entity."
Gap classification. Block null = missing source. Systematic column = volume problem. Drift = formula error. The shape of disagreement IS the diagnosis.
Ripple tracking. A gap at layer 3 propagates to layers 4, 5, 6. Kusuka shows the full chain — so you fix the root, not the symptoms.
Every gap gets a class. Each class maps to a team and a fix type. Source team for nulls. Engineering for drift. Data for volume. No ambiguity about ownership.
Run Kusuka over time. The change in convergence between runs — positive, negative, or flat — tells you if your fixes are working before anyone downstream notices.
We validate Kusuka against real problems across unrelated domains. Each study below uses real data and is fully interactive. The engine doesn't change — the domains do.
36-edge traffic network simulated with SUMO. We blocked a road at minute 20. Kusuka identified the exact segment, separated sensor noise from real congestion, and tracked the wave across neighbouring roads.
72 hours of temperature data across 24 Kenyan weather stations. When a cold front arrived, satellite estimates lagged behind ground readings. Kusuka showed exactly which stations were in the anomaly zone.
120-minute window across the Rift Valley. M4.2 earthquake at minute 45, aftershock at minute 75. Kusuka tracked wave propagation, found monitoring blind spots, and separated instrument noise from real ground motion.
6 trading episodes across 7 pipeline layers (data, signal, gate, fill, position, exit, outcome). Kusuka found 83% convergence — and one orphan row from a deprecated code path that would have gone unnoticed.
Your domain not listed? Kusuka works anywhere you have two independent ways to measure the same system. Health, finance, agriculture, infrastructure, software pipelines — tell us what you're working with.
These aren't demos — they're real Kusuka runs against production systems and research pipelines. Each one found issues that would have gone undetected by traditional monitoring.
6 trading episodes validated across 7 pipeline layers — from raw data ingestion through signal generation, gate logic, order fill, position management, exit, and outcome recording.
100 lessons in a GNN-powered knowledge graph validated across 7 layers — from raw lesson text through structural embedding, GCN propagation, semantic grounding, cross-domain linking, human-applied confirmation, and final retrieval weight.
Neural machine translation output for 4 African languages (Kikuyu, Luo, Swahili, Gusii) validated by back-translating to English and measuring semantic round-trip fidelity.
127 prediction markets with 235 price snapshots and 601 price points, validated across a 5-layer pipeline from market creation through price capture, signal generation, episode tracking, and outcome resolution.
A community health data warehouse with 784K monthly performance records across 4 counties, validated across 4 KPIs by comparing the aggregation layer against the filtered BI layer that feeds dashboards used by 600+ field staff.
The registered_households column showed 312K one-sided nulls (o glyphs) — data present upstream but absent downstream. This isn't a bug; it's a design choice. But without Kusuka, nobody had quantified how much data the filter drops (95%). In a pipeline serving 600+ users, knowing the shape of what you're NOT showing is as important as verifying what you are.
A ClickHouse-backed dbt pipeline serving 600+ dashboard users across 5 regions. 4 independent Strand tests run against production data: metrics-vs-fact reconciliation, staging-to-intermediate preservation, entity dedup boundary, and 9-month temporal drift analysis.
The two layers agreed on children_assessed but diverged on 3 other KPIs. iccm_visits diverged because the metrics layer applies a two-gate filter (sick child + non-empty diagnoses) before counting, while a naive recount from the fact table skips the first gate. referred_visits diverged 3–10x because the metrics layer counts process-step flags within the sick-child cohort, while the fact table's referral flag captures all referrals, including non-ICCM pathways. Neither layer is wrong — they measure different populations. But without Strand testing, the gap was invisible.

dbt_kusuka brings Strand convergence into your existing dbt project; install it with dbt deps. Compare any two models — or any two raw SQL queries — column by column. Get the glyph matrix, SMAPE scores, gap classifications, temporal drift detection, and summary stats. All in SQL, all in your warehouse. No external calls. No dependencies.
Works with PostgreSQL, ClickHouse, BigQuery, Snowflake, DuckDB, and Redshift. Cross-database type casting handled automatically. Strand Spec v1.1 thresholds configurable via dbt vars.
Kusuka isn't just diagnostic — it's a gate. Set a convergence threshold. If two independent paths don't agree above that threshold, the pipeline blocks. No diverged data reaches production. No silent drift compounds overnight.
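The gate logic reduces to a threshold check over the glyph matrix. A minimal Python sketch, assuming one glyph per position; the function names, the 0.95 default, and the set of glyphs counted as converged ('=' exact, '~' approximate) are all illustrative assumptions, not the shipped implementation:

```python
def convergence(glyph_matrix, converged=('=', '~')):
    """Fraction of scored positions whose glyph counts as agreement.
    Positions missing on both paths (' ') are excluded from the score."""
    scored = [g for row in glyph_matrix for g in row if g != ' ']
    if not scored:
        return 0.0
    return sum(g in converged for g in scored) / len(scored)

def gate(glyph_matrix, threshold=0.95):
    """Block the pipeline when convergence drops below the threshold."""
    score = convergence(glyph_matrix)
    if score < threshold:
        raise RuntimeError(
            f"Convergence {score:.1%} below gate threshold {threshold:.0%}: blocking")
    return score
```

Wired into a build step, a raised error is what stops diverged data from reaching production.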
kusuka_converge — your dbt build fails if convergence drops below threshold. Ships as a generic test. One line in your schema.yml.
Run a Strand comparison on PR data vs main branch data. If any column diverges (X glyph), the merge blocks. Catch schema drift, formula errors, and data source changes before they land.
kusuka_no_drift + strand_temporal — track convergence across time periods. Not just "is it wrong now" but "is it getting worse?" dΞ/dt is the derivative of your pipeline health.
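The dΞ/dt idea can be sketched as a finite difference over per-run convergence scores. A minimal Python sketch, assuming one score per run; the helper names and the flat tolerance are illustrative assumptions, not the kusuka_no_drift implementation:

```python
def drift_rate(history):
    """Approximate dXi/dt from the two most recent runs.
    `history` is a list of (run_index, convergence_score) pairs."""
    (t0, c0), (t1, c1) = history[-2], history[-1]
    return (c1 - c0) / (t1 - t0)

def trend(history, flat=0.005):
    """Label the pipeline's trajectory: healing, degrading, or flat.
    The `flat` tolerance absorbs run-to-run noise and is an assumed value."""
    rate = drift_rate(history)
    if rate > flat:
        return "healing"
    if rate < -flat:
        return "degrading"
    return "flat"
```

A "degrading" label between runs is the early warning: convergence is still above threshold, but the direction says it won't stay there.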
Kusuka compares data paths. But data isn't always numbers — sometimes the gap between what someone says and what they do is the most important signal. These enrichments extend Strand into new territory.
We don't demo with fake data. We start with your actual sources, run a real Kusuka analysis, and show you what your system is missing. If it works — and it will — we build from there.
Give us access to your two data sources. We run Kusuka against your real system and deliver a diagnostic report showing what we found.
Continuous convergence monitoring. Kusuka runs against your live data and alerts you when the gap between paths changes — before downstream users notice.
Run Kusuka inside your own platform. Self-hosted, your data never leaves your infrastructure. Full control.
Start with a 30-day pilot. Real data, real results, real insight into what your system is missing.
Book a pilot

The Strand metaphor was born watching John Chore do a manual reconciliation. He called it reconciliation. But the pattern underneath — two strands, woven together, gaps visible at every position — that was DNA. Kusuka exists because the structure was always there. It just needed someone to stop and see it.