Data Convergence Engine

kusuka — to weave [Swahili]

Two paths. One truth.
The gap is the insight.

You have multiple data sources measuring the same thing. Kusuka weaves them together and shows you exactly where the threads don't meet — where the gap is, why it exists, what kind of fix it needs, and how far it's spread.

Try Kusuka
Two files. One truth. See where the threads don't meet.

Upload two spreadsheets to reconcile, paste JSON for a quick comparison, or build a full position matrix. Everything runs in your browser — no data leaves your machine.

Why Kusuka
Not better anomaly detection. A different question entirely.

Every monitoring tool asks "does this value look wrong?" Kusuka asks "do two independent paths agree?" That's a fundamentally harder question to fool.

Traditional monitoring
  • Needs historical data to learn "normal"
  • Tells you something is wrong
  • Gives you a health score or pass/fail
  • You diagnose after detection
  • Vertical-specific — different tool per domain
  • Cold start problem — weeks before useful
Kusuka
  • Needs two paths. Day one value. No training period
  • Tells you what KIND of wrong
  • Gives you a position-level convergence matrix
  • The shape of the gap IS the diagnosis
  • Domain-agnostic — same engine, any pipeline
  • First run finds things
How Kusuka Works
Connect. Weave. Read the matrix.

Kusuka runs a Strand — a structured comparison of two data paths at every position in your pipeline. The output is a convergence matrix you can read like a scan.

01 — Connect
Define your two paths

Any two independent measurements of the same system. Source A vs Source B. Model output vs ground truth. This month vs last month. You choose the paths.

02 — Weave
Kusuka computes the matrix

Every position — each row (entity) by each column (pipeline layer) — gets a convergence glyph. Agreement, approximation, drift, failure, or missing data. Nothing hides.

03 — Classify
Gaps name themselves

Block of nulls = source problem. Systematic column = volume issue. Gradual drift = formula error. Row-localised = orphan record. The shape tells you what to fix and who should fix it.

04 — Track
Watch convergence over time

Run Kusuka regularly. The change in convergence between runs tells you whether your pipeline is healing or degrading — before downstream users notice.
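
The Connect and Weave steps above can be sketched in a few lines of Python. This is an illustration, not the Kusuka engine: the SMAPE-style comparison and the glyph thresholds are assumptions (the real Strand Spec makes thresholds configurable), and the directional-glyph rule is a guess at how `>` and `<` are assigned.

```python
# Hypothetical thresholds -- the real Strand Spec makes these configurable.
APPROX_SMAPE = 0.05    # at or below this: approximate agreement (~)
DIVERGE_SMAPE = 0.25   # above this: divergence (X)

def glyph(a, b):
    """Assign a convergence glyph to one position x layer cell."""
    if a is None and b is None:
        return "."                      # missing on both paths
    if a is None or b is None:
        return "o"                      # one-sided / pending
    if a == b:
        return "="                      # exact convergence
    smape = 2 * abs(a - b) / (abs(a) + abs(b))
    if smape <= APPROX_SMAPE:
        return "~"                      # approximate
    if smape <= DIVERGE_SMAPE:
        return ">" if a > b else "<"    # directional gap
    return "X"                          # diverged

def weave(path_a, path_b, positions, layers):
    """Build the convergence matrix: one glyph per (position, layer)."""
    return {
        pos: [glyph(path_a.get((pos, l)), path_b.get((pos, l))) for l in layers]
        for pos in positions
    }

layers = ["Source", "Ingest", "Transform"]
path_a = {("B", "Source"): 100, ("B", "Ingest"): 100, ("B", "Transform"): 130}
path_b = {("B", "Source"): 100, ("B", "Ingest"): 101, ("B", "Transform"): 100}
print(weave(path_a, path_b, ["B"], layers))  # {'B': ['=', '~', 'X']}
```

The point of the sketch: each cell is judged independently, so the matrix shape (a null row, a diverged column, a drift run) emerges without any model of "normal".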

Example: a Kusuka run across 5 positions and 6 pipeline layers

             Source  Ingest  Transform  Model  Output  Report
Position A     =       =        =         =      =       =
Position B     =       =        ~         >      >       >
Position C     .       .        .         .      .       .
Position D     =       =        =         X      X       X
Position E     =       =        =         ~      =       =

Legend: = Converged · ~ Approximate · > Path A exceeds B · < Path B exceeds A · X Diverged · . Missing data · o Pending
Reading the matrix: Position B shows drift starting at the Transform layer — a formula or logic issue that propagates downstream. Position C is entirely null — a source problem, the data never arrived. Position D converges through Transform but fails at Model — something broke in that specific layer. You don't need to investigate to know what kind of problem each is. The pattern tells you.
What Kusuka Gives You
Not just detection. Diagnosis.
WHERE

The exact position and layer where the gap first appeared. Not "something is off" — "it started here, at this step, on this entity."

WHY

Gap classification. Block null = missing source. Systematic column = volume problem. Drift = formula error. The shape of disagreement IS the diagnosis.

HOW FAR

Ripple tracking. A gap at layer 3 propagates to layers 4, 5, 6. Kusuka shows the full chain — so you fix the root, not the symptoms.

WHAT KIND

Every gap gets a class. Each class maps to a team and a fix type. Source team for nulls. Engineering for drift. Data for volume. No ambiguity about ownership.

IS IT HEALING

Run Kusuka over time. The change in convergence between runs — positive, negative, or flat — tells you if your fixes are working before anyone downstream notices.
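
The gap classes above can be read mechanically off the glyph matrix. A minimal sketch of that idea, using the example matrix from earlier; the shape rules and the team routing here are invented for illustration, and the real classifier is certainly richer:

```python
def classify_gaps(matrix, layers):
    """Name gap shapes in a glyph matrix ({position: [glyphs]}) and route them.

    The rules and the team mapping are illustrative assumptions, not
    Kusuka's actual classifier.
    """
    findings = []
    positions = list(matrix)
    # Row shapes: all-null rows and row-localised failures.
    for pos, row in matrix.items():
        if all(g == "." for g in row):
            findings.append((pos, "block_null", "source team"))
        elif "X" in row and row.count("X") >= len(row) // 2:
            findings.append((pos, "row_localised", "owning team"))
    # Column shapes: one layer diverging for most positions at once.
    for i, layer in enumerate(layers):
        col = [matrix[p][i] for p in positions]
        if col.count("X") > len(col) // 2:
            findings.append((layer, "systematic_column", "engineering"))
    # Drift: approximation that hardens into a directional gap downstream.
    for pos, row in matrix.items():
        if "~" in row and row[row.index("~"):].count(">") >= 2:
            findings.append((pos, "drift", "engineering"))
    return findings

example = {
    "A": ["=", "=", "=", "=", "=", "="],
    "B": ["=", "=", "~", ">", ">", ">"],
    "C": [".", ".", ".", ".", ".", "."],
    "D": ["=", "=", "=", "X", "X", "X"],
    "E": ["=", "=", "=", "~", "=", "="],
}
layers = ["Source", "Ingest", "Transform", "Model", "Output", "Report"]
findings = classify_gaps(example, layers)
# [('C', 'block_null', 'source team'),
#  ('D', 'row_localised', 'owning team'),
#  ('B', 'drift', 'engineering')]
```

Each finding carries an owner, which is the "no ambiguity about ownership" claim in miniature: the shape decides who gets paged.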

Validated
Same engine. Different domains. Proven results.

We validate Kusuka against real problems across unrelated domains. Each study below uses real data and is fully interactive. The engine doesn't change — the domains do.

Transport & Traffic

Road incident detection

36-edge traffic network simulated with SUMO. Blocked a road at minute 20. Kusuka identified the exact segment, separated sensor noise from real congestion, and tracked the wave across neighbouring roads.

Path A: Ground truth vehicle counts
Path B: Sensor network (70% coverage, 15% noise, 5% dropout)
Explore the study →
Weather & Climate

Cold front detection

72 hours of temperature data across 24 Kenyan weather stations. When a cold front arrived, satellite estimates lagged behind ground readings. Kusuka showed exactly which stations were in the anomaly zone.

Path A: Ground weather stations
Path B: Satellite-derived temperature estimates
Explore the study →
Seismic & Geothermal

Earthquake localisation

120-minute window across the Rift Valley. M4.2 earthquake at minute 45, aftershock at minute 75. Kusuka tracked wave propagation, found monitoring blind spots, and separated instrument noise from real ground motion.

Path A: Primary seismometers (precise, 60% coverage)
Path B: Secondary accelerometers (noisy, 90% coverage)
Explore the study →
Quantitative Finance

Forex pipeline validation

6 trading episodes across 7 pipeline layers (data, signal, gate, fill, position, exit, outcome). Kusuka found 83.3% convergence — and one orphan row from a deprecated code path that would have gone unnoticed.

Path A: Episode metadata (expected state)
Path B: Pipeline artifacts (observed state)

Your domain not listed? Kusuka works anywhere you have two independent ways to measure the same system. Health, finance, agriculture, infrastructure, software pipelines — tell us what you're working with.

Case Studies
13 Strand runs. 9 domains. Every run found something.

These aren't demos — they're real Kusuka runs against production systems and research pipelines. Each one found issues that would have gone undetected by traditional monitoring.

Quantitative Finance
83.3% converged

Forex Trading Pipeline

6 trading episodes validated across 7 pipeline layers — from raw data ingestion through signal generation, gate logic, order fill, position management, exit, and outcome recording.

EUR/USD =======
GBP/USD =======
USD/JPY =======
AUD/JPY =======
XAU/USD =======
USD/KES XXXXX..
What Kusuka found: 5 of 6 episodes fully converged. One row — USD/KES — showed row-localised failure across all layers. Root cause: an orphan episode from a deprecated code path (pre-IG Markets migration). The asset wasn't in the active instrument list, so it had no composite signal, no gate logic, and no IG-specific fields. Traditional monitoring would have missed it entirely — the pipeline didn't error, it just silently carried a dead position.
2.236 Frobenius norm
0.345 Normalised score
30 Cells evaluated
25/30 Converged
AI Infrastructure
45.6% converged

Graph Neural Network Knowledge Pipeline

100 lessons in a GNN-powered knowledge graph validated across 7 layers — from raw lesson text through structural embedding, GCN propagation, semantic grounding, cross-domain linking, human-applied confirmation, and final retrieval weight.

What Kusuka found: Four entire columns showed block-null patterns. L1 (structural embedding) and L6 (applied confirmation) were 100% null — meaning the GNN had structural features and application tracking that existed in schema but had never been populated with real data. L2 (GCN propagation) was 93% null, L5 (grounded cross-links) 86% null. The pipeline looked healthy from the outside — lessons were stored, retrieved, and used. But Kusuka revealed that 4 of 7 layers were essentially hollow. One orphan row (a sensor-mesh lesson) was also missing L3 and L4 entirely.
19.52 Frobenius norm
0.738 Normalised score
700 Cells evaluated
block_null Gap class
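
The Frobenius and normalised figures above are consistent with one simple reading: score each cell's deviation (simplified here to 1 for a non-converged cell, 0 for a converged one), take the Frobenius norm of that deviation matrix, and normalise by the square root of the cell count. That reading reproduces this study's numbers, but it is an assumption; the real engine may use graded per-cell deviations rather than 0/1.

```python
import math

def strand_scores(deviations):
    """Frobenius norm of a deviation matrix plus a size-independent score.

    `deviations` is a flat list of per-cell deviation magnitudes.
    Dividing by sqrt(cell count) is an assumed normalisation that
    happens to match this study's reported figures.
    """
    frobenius = math.sqrt(sum(d * d for d in deviations))
    normalised = frobenius / math.sqrt(len(deviations))
    return frobenius, normalised

# GNN study: 700 cells, 45.6% converged -> 319 converged, 381 not.
frob, norm = strand_scores([0.0] * 319 + [1.0] * 381)
print(round(frob, 2), round(norm, 3))   # 19.52 0.738
```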
Machine Translation
0.317 normalised score

Translation Quality — Back-translation Strand

Neural machine translation output for 4 African languages (Kikuyu, Luo, Swahili, Gusii) validated by back-translating to English and measuring semantic round-trip fidelity.

What Kusuka found: Gusii (guz) showed a complete block-null row — the language isn't in the NLLB-200 model at all, so back-translation was impossible. This wasn't an error anyone would see in logs; the system simply had no path for that language. For the remaining languages, Kusuka found column-localised divergence on the Jaccard similarity metric — because Jaccard penalises paraphrase (a correct translation that uses different words scores low). This led to a new "Path Gamma" using character n-gram language models, which confirmed that 9 of 30 flagged rows were actually paraphrases (correct), not failures. Resource ordering confirmed: Swahili strongest, then Kikuyu, then Luo.
4 Languages tested
1 Block-null language
9/30 False positives caught
v1.3 Spec upgrade triggered
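
The Jaccard weakness described above is easy to demonstrate. In this toy example (the sentences are invented), token-set Jaccard scores a faithful paraphrase far lower than a near-verbatim sentence whose meaning is flipped:

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A & B| / |A | B|."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

original   = "the harvest was very good this year"
paraphrase = "this year the crop turned out well"   # correct, different words
wrong      = "the harvest was very bad this year"   # one word changed, meaning flipped

print(round(jaccard(original, paraphrase), 2))  # 0.27 -- penalised for paraphrase
print(round(jaccard(original, wrong), 2))       # 0.75 -- rewarded despite the error
```

This is exactly why a surface-overlap metric produces column-localised divergence on correct translations, and why a complementary path (like the character n-gram "Path Gamma") was needed to separate paraphrase from failure.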
Prediction Markets
59.8% converged

Market Prediction Pipeline

127 prediction markets with 235 price snapshots and 601 price points, validated across a 5-layer pipeline from market creation through price capture, signal generation, episode tracking, and outcome resolution.

What Kusuka found: 59.8% convergence with the dominant glyph being "=" — meaning the majority of the pipeline is healthy, but significant gaps exist in the signal and tracking layers. The Frobenius norm of 6.856 (normalised 0.626) shows meaningful structural disagreement — not noise, but real pipeline gaps where market data exists but downstream processing hasn't kept pace.
6.856 Frobenius norm
0.626 Normalised score
127 Markets evaluated
= Dominant glyph
Community Health
784K rows evaluated

CHW Performance Pipeline — Aggregation vs BI Layer

A community health data warehouse with 784K monthly performance records across 4 counties, validated across 4 KPIs by comparing the aggregation layer against the filtered BI layer that feeds dashboards used by 600+ field staff.

household_visits =====XX
children_assessed =====X.
iccm_assessments ======.
registered_hh ooooo..
What Kusuka found: The BI layer applies a 3-month rolling window filter. Within that window, all 41,264 rows converged at 100% — zero mismatches on any KPI. But the full outer join revealed 742K historical rows in the aggregation layer that don't exist in the BI layer. The registered_households column showed 312K one-sided nulls (o glyphs) — data present upstream but absent downstream. This isn't a bug; it's a design choice. But without Kusuka, nobody had quantified how much data the filter drops (95%). In a pipeline serving 600+ users, knowing the shape of what you're NOT showing is as important as verifying what you are.
784K Rows evaluated
100% In-window convergence
95% Historical rows filtered
dbt Package: dbt_kusuka
13 Strand runs across 9 domains in production: Finance, AI, Translation, Politics, Prediction Markets, Community Health, Agentic Systems, Corpus QA, dbt Pipelines
Every single run found something traditional monitoring missed.
For Data Engineers
Pure SQL. No API. Just dbt deps.

dbt_kusuka brings Strand convergence into your existing dbt project. Compare any two models — or any two raw SQL queries — column-by-column. Get the glyph matrix, SMAPE scores, gap classifications, temporal drift detection, and summary stats. All in SQL, all in your warehouse. No external calls. No dependencies.

Install
# packages.yml
packages:
  - git: "https://github.com/achillesheel02/dbt-kusuka"
    revision: v0.1.0
$ dbt deps
Use — with models
-- your_strand_model.sql
{{ dbt_kusuka.strand_verify(
   relation_a=ref('expected'),
   relation_b=ref('observed'),
   join_key='id',
   compare_columns=['rev', 'users']
) }}
-- ad-hoc: raw SQL, no models needed
{{ dbt_kusuka.strand_verify_query(
   sql_a="SELECT region, sum(rev)...",
   sql_b="SELECT region, sum(rev)...",
   join_key='region',
   compare_columns=['rev']
) }}
8 Macros: verify, verify_query, temporal, smape, glyph, summary, report, cast
4 Generic Tests: converge, no_nulls, no_diverged, no_drift
784K Rows Tested: production ClickHouse pipeline
0 Dependencies: pure SQL, works on any warehouse

Works with PostgreSQL, ClickHouse, BigQuery, Snowflake, DuckDB, and Redshift. Cross-database type casting handled automatically. Strand Spec v1.1 thresholds configurable via dbt vars.

Ship With Confidence
Quality gates. Your pipeline doesn't ship unless convergence holds.

Kusuka isn't just diagnostic — it's a gate. Set a convergence threshold. If two independent paths don't agree above that threshold, the pipeline blocks. No diverged data reaches production. No silent drift compounds overnight.

dbt Test Gate

kusuka_converge — your dbt build fails if convergence drops below threshold. Ships as a generic test. One line in your schema.yml.

models:
  - name: revenue
    tests:
      - dbt_kusuka.kusuka_converge:
          threshold: 0.90
CI/CD Gate

Run a Strand comparison on PR data vs main branch data. If any column diverges (X glyph), the merge blocks. Catch schema drift, formula errors, and data source changes before they land.

# github action step
- run: dbt test --select tag:kusuka_gate
# blocks merge on failure
Temporal Drift Gate

kusuka_no_drift + strand_temporal — track convergence across time periods. Not just "is it wrong now" but "is it getting worse?" dΞ/dt is the derivative of your pipeline health.

# temporal drift detection
Jan: 94.2% → Feb: 93.8% stable
Feb: 93.8% → Mar: 81.1% diverging
→ kusuka_no_drift: FAIL
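
The drift gate above amounts to thresholding the run-over-run change in convergence. A minimal sketch that reproduces the example; the 2-point threshold is an invented default (the dbt package makes thresholds configurable via vars):

```python
def drift_gate(convergence_by_run, max_drop=2.0):
    """Flag run-over-run convergence drops larger than `max_drop` points.

    `convergence_by_run` is an ordered list of (label, convergence %) pairs.
    Returns (passed, report lines). The default threshold is illustrative.
    """
    report, passed = [], True
    for (prev_label, prev), (label, cur) in zip(convergence_by_run,
                                                convergence_by_run[1:]):
        status = "diverging" if (prev - cur) > max_drop else "stable"
        passed = passed and status == "stable"
        report.append(f"{prev_label}: {prev}% -> {label}: {cur}% {status}")
    return passed, report

runs = [("Jan", 94.2), ("Feb", 93.8), ("Mar", 81.1)]
ok, lines = drift_gate(runs)
# ok is False: the Feb -> Mar drop (12.7 points) exceeds the threshold
```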
Strand Enrichments
Beyond numbers. What the words and patterns say.

Kusuka compares data paths. But data isn't always numbers — sometimes the gap between what someone says and what they do is the most important signal. These enrichments extend Strand into new territory.

Language Analysis
Ξ_L
Semantic surface vs action substrate

Path A: what they say. Path B: what they do. Ξ_L measures the gap between language and action — across political actors, corporate communications, influencer content, or any domain where words and deeds should align.

Contradiction detection (say X, do Y)
Silence analysis (what's NOT being said)
Temporal drift (language shifts before action shifts)
Coordination patterns (multiple actors, same script)
Available in Strand
Influence Detection
Kioo
kioo — mirror [Swahili]

Is this person's impact real? Kioo measures the gap between claimed influence and actual outcomes — engagement that converts vs engagement that performs. Multi-dimensional convergence: the more dimensions that align, the stronger the signal. A mirror, not a weapon.

Audience resonance patterns (real vs manufactured)
Influence trajectory over time (via Mwenendo)
Cross-platform consistency (same person, different channels)
Brand safety scoring (authentic communication vs capture)
Coming soon
Work With Us
From pilot to production

We don't demo with fake data. We start with your actual sources, run a real Kusuka analysis, and show you what your system is missing. If it works — and it will — we build from there.

Your data already has the answer.
Let Kusuka weave it together.

Start with a 30-day pilot. Real data, real results, real insight into what your system is missing.

Book a pilot

The Strand metaphor was born watching John Chore do a manual reconciliation. He called it reconciliation. But the pattern underneath — two strands, woven together, gaps visible at every position — that was DNA. Kusuka exists because the structure was always there. It just needed someone to stop and see it.