Data Convergence Engine

kusuka — to weave [Swahili]

Two paths. One truth.
The gap is the insight.

You have multiple data sources measuring the same thing. Kusuka weaves them together and shows you exactly where the threads don't meet — where the gap is, why it exists, what kind of fix it needs, and how far it's spread.

Try Kusuka
Two files. One truth. See where the threads don't meet.

Upload two spreadsheets to reconcile, paste JSON for a quick comparison, or build a full position matrix. Everything runs in your browser — no data leaves your machine.

Why Kusuka
Not better anomaly detection. A different question entirely.

Every monitoring tool asks "does this value look wrong?" Kusuka asks "do two independent paths agree?" That's a fundamentally harder question to fool.

Traditional monitoring
  • Needs historical data to learn "normal"
  • Tells you something is wrong
  • Gives you a health score or pass/fail
  • You diagnose after detection
  • Vertical-specific — different tool per domain
  • Cold start problem — weeks before useful
Kusuka
  • Needs two paths. Day one value. No training period
  • Tells you what KIND of wrong
  • Gives you a position-level convergence matrix
  • The shape of the gap IS the diagnosis
  • Domain-agnostic — same engine, any pipeline
  • First run finds things
How Kusuka Works
Connect. Weave. Read the matrix.

Kusuka runs a Strand — a structured comparison of two data paths at every position in your pipeline. The output is a convergence matrix you can read like a scan.

01 — Connect
Define your two paths

Any two independent measurements of the same system. Source A vs Source B. Model output vs ground truth. This month vs last month. You choose the paths.

02 — Weave
Kusuka computes the matrix

Every position — each row (entity) by each column (pipeline layer) — gets a convergence glyph. Agreement, approximation, drift, failure, or missing data. Nothing hides.

03 — Classify
Gaps name themselves

Block of nulls = source problem. Systematic column = volume issue. Gradual drift = formula error. Row-localised = orphan record. The shape tells you what to fix and who should fix it.

04 — Track
Watch convergence over time

Run Kusuka regularly. The change in convergence between runs tells you whether your pipeline is healing or degrading — before downstream users notice.
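
The Connect and Weave steps above can be sketched in a few lines of Python. This is an illustration, not the Kusuka engine: the SMAPE-style comparison and the glyph thresholds are assumptions (the real Strand Spec makes thresholds configurable), and the directional-glyph rule is a guess at how `>` and `<` are assigned.

```python
# Hypothetical thresholds -- the real Strand Spec makes these configurable.
APPROX_SMAPE = 0.05    # at or below this: approximate agreement (~)
DIVERGE_SMAPE = 0.25   # above this: divergence (X)

def glyph(a, b):
    """Assign a convergence glyph to one position x layer cell."""
    if a is None and b is None:
        return "."                      # missing on both paths
    if a is None or b is None:
        return "o"                      # one-sided / pending
    if a == b:
        return "="                      # exact convergence
    smape = 2 * abs(a - b) / (abs(a) + abs(b))
    if smape <= APPROX_SMAPE:
        return "~"                      # approximate
    if smape <= DIVERGE_SMAPE:
        return ">" if a > b else "<"    # directional gap
    return "X"                          # diverged

def weave(path_a, path_b, positions, layers):
    """Build the convergence matrix: one glyph per (position, layer)."""
    return {
        pos: [glyph(path_a.get((pos, l)), path_b.get((pos, l))) for l in layers]
        for pos in positions
    }

layers = ["Source", "Ingest", "Transform"]
path_a = {("B", "Source"): 100, ("B", "Ingest"): 100, ("B", "Transform"): 130}
path_b = {("B", "Source"): 100, ("B", "Ingest"): 101, ("B", "Transform"): 100}
print(weave(path_a, path_b, ["B"], layers))  # {'B': ['=', '~', 'X']}
```

The point of the sketch: each cell is judged independently, so the matrix shape (a null row, a diverged column, a drift run) emerges without any model of "normal".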

Example: a Kusuka run across 5 positions and 6 pipeline layers

             Source  Ingest  Transform  Model  Output  Report
Position A     =       =        =         =      =       =
Position B     =       =        ~         >      >       >
Position C     .       .        .         .      .       .
Position D     =       =        =         X      X       X
Position E     =       =        =         ~      =       =

Legend: = Converged · ~ Approximate · > Path A exceeds B · < Path B exceeds A · X Diverged · . Missing data · o Pending
Reading the matrix: Position B shows drift starting at the Transform layer — a formula or logic issue that propagates downstream. Position C is entirely null — a source problem, the data never arrived. Position D converges through Transform but fails at Model — something broke in that specific layer. You don't need to investigate to know what kind of problem each is. The pattern tells you.
What Kusuka Gives You
Not just detection. Diagnosis.
WHERE

The exact position and layer where the gap first appeared. Not "something is off" — "it started here, at this step, on this entity."

WHY

Gap classification. Block null = missing source. Systematic column = volume problem. Drift = formula error. The shape of disagreement IS the diagnosis.

HOW FAR

Ripple tracking. A gap at layer 3 propagates to layers 4, 5, 6. Kusuka shows the full chain — so you fix the root, not the symptoms.

WHAT KIND

Every gap gets a class. Each class maps to a team and a fix type. Source team for nulls. Engineering for drift. Data for volume. No ambiguity about ownership.

IS IT HEALING

Run Kusuka over time. The change in convergence between runs — positive, negative, or flat — tells you if your fixes are working before anyone downstream notices.
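
The gap classes above can be read mechanically off the glyph matrix. A minimal sketch of that idea, using the example matrix from earlier; the shape rules and the team routing here are invented for illustration, and the real classifier is certainly richer:

```python
def classify_gaps(matrix, layers):
    """Name gap shapes in a glyph matrix ({position: [glyphs]}) and route them.

    The rules and the team mapping are illustrative assumptions, not
    Kusuka's actual classifier.
    """
    findings = []
    positions = list(matrix)
    # Row shapes: all-null rows and row-localised failures.
    for pos, row in matrix.items():
        if all(g == "." for g in row):
            findings.append((pos, "block_null", "source team"))
        elif "X" in row and row.count("X") >= len(row) // 2:
            findings.append((pos, "row_localised", "owning team"))
    # Column shapes: one layer diverging for most positions at once.
    for i, layer in enumerate(layers):
        col = [matrix[p][i] for p in positions]
        if col.count("X") > len(col) // 2:
            findings.append((layer, "systematic_column", "engineering"))
    # Drift: approximation that hardens into a directional gap downstream.
    for pos, row in matrix.items():
        if "~" in row and row[row.index("~"):].count(">") >= 2:
            findings.append((pos, "drift", "engineering"))
    return findings

example = {
    "A": ["=", "=", "=", "=", "=", "="],
    "B": ["=", "=", "~", ">", ">", ">"],
    "C": [".", ".", ".", ".", ".", "."],
    "D": ["=", "=", "=", "X", "X", "X"],
    "E": ["=", "=", "=", "~", "=", "="],
}
layers = ["Source", "Ingest", "Transform", "Model", "Output", "Report"]
findings = classify_gaps(example, layers)
# [('C', 'block_null', 'source team'),
#  ('D', 'row_localised', 'owning team'),
#  ('B', 'drift', 'engineering')]
```

Each finding carries an owner, which is the "no ambiguity about ownership" claim in miniature: the shape decides who gets paged.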

Validated
Same engine. Different domains. Proven results.

We validate Kusuka against real problems across unrelated domains. Each study below uses real data and is fully interactive. The engine doesn't change — the domains do.

Transport & Traffic

Road incident detection

36-edge traffic network simulated with SUMO. Blocked a road at minute 20. Kusuka identified the exact segment, separated sensor noise from real congestion, and tracked the wave across neighbouring roads.

Path A: Ground truth vehicle counts
Path B: Sensor network (70% coverage, 15% noise, 5% dropout)
Explore the study →
Weather & Climate

Cold front detection

72 hours of temperature data across 24 Kenyan weather stations. When a cold front arrived, satellite estimates lagged behind ground readings. Kusuka showed exactly which stations were in the anomaly zone.

Path A: Ground weather stations
Path B: Satellite-derived temperature estimates
Explore the study →
Seismic & Geothermal

Earthquake localisation

120-minute window across the Rift Valley. M4.2 earthquake at minute 45, aftershock at minute 75. Kusuka tracked wave propagation, found monitoring blind spots, and separated instrument noise from real ground motion.

Path A: Primary seismometers (precise, 60% coverage)
Path B: Secondary accelerometers (noisy, 90% coverage)
Explore the study →
Quantitative Finance

Forex pipeline validation

6 trading episodes across 7 pipeline layers (data, signal, gate, fill, position, exit, outcome). Kusuka found 83.3% convergence — and one orphan row from a deprecated code path that would have gone unnoticed.

Path A: Episode metadata (expected state)
Path B: Pipeline artifacts (observed state)

Your domain not listed? Kusuka works anywhere you have two independent ways to measure the same system. Health, finance, agriculture, infrastructure, software pipelines — tell us what you're working with.

Case Studies
13 Strand runs. 9 domains. Every run found something.

These aren't demos — they're real Kusuka runs against production systems and research pipelines. Each one found issues that would have gone undetected by traditional monitoring.

Quantitative Finance
83.3% converged

Forex Trading Pipeline

6 trading episodes validated across 7 pipeline layers — from raw data ingestion through signal generation, gate logic, order fill, position management, exit, and outcome recording.

EUR/USD =======
GBP/USD =======
USD/JPY =======
AUD/JPY =======
XAU/USD =======
USD/KES XXXXX..
What Kusuka found: 5 of 6 episodes fully converged. One row — USD/KES — showed row-localised failure across all layers. Root cause: an orphan episode from a deprecated code path (pre-IG Markets migration). The asset wasn't in the active instrument list, so it had no composite signal, no gate logic, and no IG-specific fields. Traditional monitoring would have missed it entirely — the pipeline didn't error, it just silently carried a dead position.
2.236 Frobenius norm
0.345 Normalised score
30 Cells evaluated
25/30 Converged
AI Infrastructure
45.6% converged

Graph Neural Network Knowledge Pipeline

100 lessons in a GNN-powered knowledge graph validated across 7 layers — from raw lesson text through structural embedding, GCN propagation, semantic grounding, cross-domain linking, human-applied confirmation, and final retrieval weight.

What Kusuka found: Four entire columns showed block-null patterns. L1 (structural embedding) and L6 (applied confirmation) were 100% null — meaning the GNN had structural features and application tracking that existed in schema but had never been populated with real data. L2 (GCN propagation) was 93% null, L5 (grounded cross-links) 86% null. The pipeline looked healthy from the outside — lessons were stored, retrieved, and used. But Kusuka revealed that 4 of 7 layers were essentially hollow. One orphan row (a sensor-mesh lesson) was also missing L3 and L4 entirely.
19.52 Frobenius norm
0.738 Normalised score
700 Cells evaluated
block_null Gap class
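
The Frobenius and normalised figures above are consistent with one simple reading: score each cell's deviation (simplified here to 1 for a non-converged cell, 0 for a converged one), take the Frobenius norm of that deviation matrix, and normalise by the square root of the cell count. That reading reproduces this study's numbers, but it is an assumption; the real engine may use graded per-cell deviations rather than 0/1.

```python
import math

def strand_scores(deviations):
    """Frobenius norm of a deviation matrix plus a size-independent score.

    `deviations` is a flat list of per-cell deviation magnitudes.
    Dividing by sqrt(cell count) is an assumed normalisation that
    happens to match this study's reported figures.
    """
    frobenius = math.sqrt(sum(d * d for d in deviations))
    normalised = frobenius / math.sqrt(len(deviations))
    return frobenius, normalised

# GNN study: 700 cells, 45.6% converged -> 319 converged, 381 not.
frob, norm = strand_scores([0.0] * 319 + [1.0] * 381)
print(round(frob, 2), round(norm, 3))   # 19.52 0.738
```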
Machine Translation
0.317 normalised score

Translation Quality — Back-translation Strand

Neural machine translation output for 4 African languages (Kikuyu, Luo, Swahili, Gusii) validated by back-translating to English and measuring semantic round-trip fidelity.

What Kusuka found: Gusii (guz) showed a complete block-null row — the language isn't in the NLLB-200 model at all, so back-translation was impossible. This wasn't an error anyone would see in logs; the system simply had no path for that language. For the remaining languages, Kusuka found column-localised divergence on the Jaccard similarity metric — because Jaccard penalises paraphrase (a correct translation that uses different words scores low). This led to a new "Path Gamma" using character n-gram language models, which confirmed that 9 of 30 flagged rows were actually paraphrases (correct), not failures. Resource ordering confirmed: Swahili strongest, then Kikuyu, then Luo.
4 Languages tested
1 Block-null language
9/30 False positives caught
v1.3 Spec upgrade triggered
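
The Jaccard weakness described above is easy to demonstrate. In this toy example (the sentences are invented), token-set Jaccard scores a faithful paraphrase far lower than a near-verbatim sentence whose meaning is flipped:

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A & B| / |A | B|."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

original   = "the harvest was very good this year"
paraphrase = "this year the crop turned out well"   # correct, different words
wrong      = "the harvest was very bad this year"   # one word changed, meaning flipped

print(round(jaccard(original, paraphrase), 2))  # 0.27 -- penalised for paraphrase
print(round(jaccard(original, wrong), 2))       # 0.75 -- rewarded despite the error
```

This is exactly why a surface-overlap metric produces column-localised divergence on correct translations, and why a complementary path (like the character n-gram "Path Gamma") was needed to separate paraphrase from failure.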
Prediction Markets
59.8% converged

Market Prediction Pipeline

127 prediction markets with 235 price snapshots and 601 price points, validated across a 5-layer pipeline from market creation through price capture, signal generation, episode tracking, and outcome resolution.

What Kusuka found: 59.8% convergence with the dominant glyph being "=" — meaning the majority of the pipeline is healthy, but significant gaps exist in the signal and tracking layers. The Frobenius norm of 6.856 (normalised 0.626) shows meaningful structural disagreement — not noise, but real pipeline gaps where market data exists but downstream processing hasn't kept pace.
6.856 Frobenius norm
0.626 Normalised score
127 Markets evaluated
= Dominant glyph
Community Health
784K rows evaluated

CHW Performance Pipeline — Aggregation vs BI Layer

A community health data warehouse with 784K monthly performance records across 4 counties, validated across 4 KPIs by comparing the aggregation layer against the filtered BI layer that feeds dashboards used by 600+ field staff.

household_visits =====XX
children_assessed =====X.
iccm_assessments ======.
registered_hh ooooo..
What Kusuka found: The BI layer applies a 3-month rolling window filter. Within that window, all 41,264 rows converged at 100% — zero mismatches on any KPI. But the full outer join revealed 742K historical rows in the aggregation layer that don't exist in the BI layer. The registered_households column showed 312K one-sided nulls (o glyphs) — data present upstream but absent downstream. This isn't a bug; it's a design choice. But without Kusuka, nobody had quantified how much data the filter drops (95%). In a pipeline serving 600+ users, knowing the shape of what you're NOT showing is as important as verifying what you are.
784K Rows evaluated
100% In-window convergence
95% Historical rows filtered
dbt Package: dbt_kusuka
13 Strand runs across 9 domains in production: Finance, AI, Translation, Politics, Prediction Markets, Community Health, Agentic Systems, Corpus QA, dbt Pipelines
Every single run found something traditional monitoring missed.
For Data Engineers
Pure SQL. No API. Just dbt deps.

dbt_kusuka brings Strand convergence into your existing dbt project. Compare any two models — or any two raw SQL queries — column-by-column. Get the glyph matrix, SMAPE scores, gap classifications, temporal drift detection, and summary stats. All in SQL, all in your warehouse. No external calls. No dependencies.

Install
# packages.yml
packages:
  - git: "https://github.com/achillesheel02/dbt-kusuka"
    revision: v0.1.0
$ dbt deps
Use — with models
-- your_strand_model.sql
{{ dbt_kusuka.strand_verify(
   relation_a=ref('expected'),
   relation_b=ref('observed'),
   join_key='id',
   compare_columns=['rev', 'users']
) }}
-- ad-hoc: raw SQL, no models needed
{{ dbt_kusuka.strand_verify_query(
   sql_a="SELECT region, sum(rev)...",
   sql_b="SELECT region, sum(rev)...",
   join_key='region',
   compare_columns=['rev']
) }}
8 Macros: verify, verify_query, temporal, smape, glyph, summary, report, cast
4 Generic Tests: converge, no_nulls, no_diverged, no_drift
784K Rows Tested: production ClickHouse pipeline
0 Dependencies: pure SQL, works on any warehouse

Works with PostgreSQL, ClickHouse, BigQuery, Snowflake, DuckDB, and Redshift. Cross-database type casting handled automatically. Strand Spec v1.1 thresholds configurable via dbt vars.

Ship With Confidence
Quality gates. Your pipeline doesn't ship unless convergence holds.

Kusuka isn't just diagnostic — it's a gate. Set a convergence threshold. If two independent paths don't agree above that threshold, the pipeline blocks. No diverged data reaches production. No silent drift compounds overnight.

dbt Test Gate

kusuka_converge — your dbt build fails if convergence drops below threshold. Ships as a generic test. One line in your schema.yml.

models:
  - name: revenue
    tests:
      - dbt_kusuka.kusuka_converge:
          threshold: 0.90
CI/CD Gate

Run a Strand comparison on PR data vs main branch data. If any column diverges (X glyph), the merge blocks. Catch schema drift, formula errors, and data source changes before they land.

# github action step
- run: dbt test --select tag:kusuka_gate
# blocks merge on failure
Temporal Drift Gate

kusuka_no_drift + strand_temporal — track convergence across time periods. Not just "is it wrong now" but "is it getting worse?" dΞ/dt is the derivative of your pipeline health.

# temporal drift detection
Jan: 94.2% → Feb: 93.8% stable
Feb: 93.8% → Mar: 81.1% diverging
→ kusuka_no_drift: FAIL
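
The drift gate above amounts to thresholding the run-over-run change in convergence. A minimal sketch that reproduces the example; the 2-point threshold is an invented default (the dbt package makes thresholds configurable via vars):

```python
def drift_gate(convergence_by_run, max_drop=2.0):
    """Flag run-over-run convergence drops larger than `max_drop` points.

    `convergence_by_run` is an ordered list of (label, convergence %) pairs.
    Returns (passed, report lines). The default threshold is illustrative.
    """
    report, passed = [], True
    for (prev_label, prev), (label, cur) in zip(convergence_by_run,
                                                convergence_by_run[1:]):
        status = "diverging" if (prev - cur) > max_drop else "stable"
        passed = passed and status == "stable"
        report.append(f"{prev_label}: {prev}% -> {label}: {cur}% {status}")
    return passed, report

runs = [("Jan", 94.2), ("Feb", 93.8), ("Mar", 81.1)]
ok, lines = drift_gate(runs)
# ok is False: the Feb -> Mar drop (12.7 points) exceeds the threshold
```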
Strand Enrichments
Beyond numbers. What the words and patterns say.

Kusuka compares data paths. But data isn't always numbers — sometimes the gap between what someone says and what they do is the most important signal. These enrichments extend Strand into new territory.

Language Analysis
Ξ_L
Semantic surface vs action substrate

Path A: what they say. Path B: what they do. Ξ_L measures the gap between language and action — across political actors, corporate communications, influencer content, or any domain where words and deeds should align.

Contradiction detection (say X, do Y)
Silence analysis (what's NOT being said)
Temporal drift (language shifts before action shifts)
Coordination patterns (multiple actors, same script)
Available in Strand
Influence Detection
Kioo
kioo — mirror [Swahili]

Is this person's impact real? Kioo measures the gap between claimed influence and actual outcomes — engagement that converts vs engagement that performs. Multi-dimensional convergence: the more dimensions that align, the stronger the signal. A mirror, not a weapon.

Audience resonance patterns (real vs manufactured)
Influence trajectory over time (via Mwenendo)
Cross-platform consistency (same person, different channels)
Brand safety scoring (authentic communication vs capture)
Coming soon
Work With Us
From pilot to production

We don't demo with fake data. We start with your actual sources, run a real Kusuka analysis, and show you what your system is missing. If it works — and it will — we build from there.

Your data already has the answer.
Let Kusuka weave it together.

Start with a 30-day pilot. Real data, real results, real insight into what your system is missing.

Book a pilot

The Strand metaphor was born watching John Chore do a manual reconciliation. He called it reconciliation. But the pattern underneath — two strands, woven together, gaps visible at every position — that was DNA. Kusuka exists because the structure was always there. It just needed someone to stop and see it.