
Building Sentinel: AI Reasoning for AML Alert Triage With a Full Audit Trail

How SEYSO Services built Sentinel — a production-grade AML triage layer that sits on top of any transaction monitoring system and cuts 40–60% of false-positive alerts with cited evidence and complete regulatory traceability.

11 min read · By SEYSO SERVICES INC

40–60% of false-positive alerts auto-cleared · 4–12s per alert · 100% audit-traceable · 3 modules, 1 pattern

Your TM system is 95% false positive. Your investigators know it.

A mid-size bank generates tens of thousands of AML alerts a year. Between 90% and 98% are false positives. Every one is manually reviewed by a trained investigator who pulls 90 days of transaction history, checks prior alerts on the customer, compares behaviour to peer baselines, reviews KYC and related-party data, and writes the disposition rationale. Four of those five steps are data assembly, not judgment.

The Big-4 tuning engagement model has not moved this number in a decade. The bottleneck is not the rules engine; it's what happens after the rule fires.

What Sentinel is — and what it is not

Sentinel is the reasoning layer that sits on top of your existing transaction monitoring system. It does not replace Actimize, SAS, Verafin, Oracle FCCM, or your internal stack. For every alert your system raises, Sentinel:

  1. Assembles the 90-day transaction history, prior alerts, peer baselines and KYC into a single context blob.
  2. Runs a two-pass Claude analysis — first pass produces a structured analysis, second pass critiques it looking for missed red flags.
  3. Emits a Pydantic-validated disposition (clear · clear_with_note · escalate_to_l2 · escalate_to_sar · request_additional_info) with cited evidence.
  4. Logs every input, output and version stamp to the audit trail.
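The four steps above can be sketched as a single orchestration function. This is a minimal illustration under stated assumptions, not Sentinel's code: `triage_alert`, the shape of each input, and the `analyze` / `critique` callables are hypothetical stand-ins. In production each pass would wrap a Claude call and parse its structured JSON; injecting them here keeps the control flow visible and testable without network access.

```python
from typing import Any, Callable

# Hypothetical type for one reasoning pass: payload in, parsed JSON out.
# A real pass would call the model; here it is injected for illustration.
ReasoningPass = Callable[[dict[str, Any]], dict[str, Any]]

DISPOSITIONS = frozenset({
    "clear", "clear_with_note", "escalate_to_l2",
    "escalate_to_sar", "request_additional_info",
})


def triage_alert(
    alert: dict[str, Any],
    transactions_90d: list[dict[str, Any]],
    prior_alerts: list[dict[str, Any]],
    peer_baseline: dict[str, float],
    kyc: dict[str, Any],
    analyze: ReasoningPass,
    critique: ReasoningPass,
    audit_log: list[dict[str, Any]],
    versions: dict[str, str],
) -> dict[str, Any]:
    # 1. Assemble every source into one context blob.
    context = {
        "alert": alert,
        "transactions_90d": transactions_90d,
        "prior_alerts": prior_alerts,
        "peer_baseline": peer_baseline,
        "kyc": kyc,
    }
    # 2a. First pass: structured analysis of the context.
    first_pass = analyze({"context": context})
    # 2b. Second pass: critique the first pass, looking for missed red flags.
    final = critique({"context": context, "first_pass": first_pass})
    # 3. Reject any disposition outside the fixed vocabulary.
    if final.get("disposition") not in DISPOSITIONS:
        raise ValueError(f"invalid disposition: {final.get('disposition')!r}")
    # 4. Log the input, both pass outputs and the version stamps together.
    audit_log.append({
        "context": context,
        "first_pass": first_pass,
        "final": final,
        "versions": dict(versions),
    })
    return final
```

In this sketch a rejected output raises before anything is logged; a production system would more likely log the failed attempt as well.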

The investigator sees a prepared case, not a triage screen: a likelihood score with calibrated reasoning, red flags with cited transaction IDs, mitigating factors drawn from the customer's actual history, a recommended disposition, and specific questions to answer when judgment is required.
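The prepared case maps naturally onto a validated schema. The sketch below uses stdlib dataclasses as a dependency-free stand-in for the Pydantic model the post describes. The class and field names (`CitedClaim`, `PreparedCase`, `evidence_ids`) are our assumptions, not Sentinel's schema. What it illustrates is that the citation rule is enforced at construction time, so an uncited claim cannot exist as an object.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# The five dispositions listed in the post.
Disposition = Literal[
    "clear", "clear_with_note", "escalate_to_l2",
    "escalate_to_sar", "request_additional_info",
]


@dataclass(frozen=True)
class CitedClaim:
    """A red flag or mitigating factor, tied to the evidence behind it."""
    text: str
    evidence_ids: tuple[str, ...]  # transaction IDs or customer data-point IDs

    def __post_init__(self) -> None:
        if not self.evidence_ids:
            raise ValueError(f"uncited claim: {self.text!r}")


@dataclass(frozen=True)
class PreparedCase:
    likelihood_score: int  # 0-100
    reasoning: str
    red_flags: tuple[CitedClaim, ...]
    mitigating_factors: tuple[CitedClaim, ...]
    disposition: Disposition
    investigator_questions: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if not 0 <= self.likelihood_score <= 100:
            raise ValueError("likelihood_score must be between 0 and 100")
        # Dataclasses do not enforce Literal at runtime, so check it here.
        if self.disposition not in get_args(Disposition):
            raise ValueError(f"unknown disposition: {self.disposition!r}")
```

In Pydantic the same rules would typically be expressed with `Field(ge=0, le=100)` on the score, the `Literal` type on the disposition, and a length constraint or validator on the evidence list.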

Three modules, one architectural pattern

Same reasoning pattern applied to three structurally different AML queues. The shared recipe: Pydantic-validated structured output, two-pass analyzer + critique, cited evidence on every claim, complete audit log.

01 · Triage

Heavy-context behavioral analysis of TM alerts. Pulls 90-day transaction history, peer-baseline statistics, prior alerts. Audit prefix AN_*.

02 · Watchlist

Identity-match + jurisdiction adjudication of fuzzy sanctions / PEP hits. Light context, hybrid model split (Haiku 4.5 first pass, Sonnet 4.5 critique) to keep cost low without sacrificing rigor on the calls that matter. Audit prefix WL_AN_*.

03 · SAR

Downstream of Triage. Consumes the upstream FinalAnalysis and drafts a FinCEN Form 111 narrative fixed at 7 sections (5 W's + how + actions). Every claim is cited, and both passes run on Sonnet because this is high-stakes regulated text. Audit prefix SAR_AN_*. Lineage from alert → triage → SAR is recorded explicitly.
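The three modules differ mainly in configuration, and the SAR drafter adds structural rules on top. A minimal sketch of both: `ModuleConfig`, `MODULES`, `new_audit_id` and `check_sar_draft` are hypothetical names, the model strings are placeholder tier labels rather than Anthropic API model IDs, and the random-hex suffix after each audit prefix is our assumption (the post only gives the prefixes).

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    first_pass_model: str  # placeholder tier label, not an API model ID
    critique_model: str
    audit_prefix: str


MODULES = {
    "triage": ModuleConfig("sonnet", "sonnet", "AN_"),
    "watchlist": ModuleConfig("haiku", "sonnet", "WL_AN_"),  # cheap first pass, strong critique
    "sar": ModuleConfig("sonnet", "sonnet", "SAR_AN_"),
}

# Fixed SAR narrative layout: the five W's, plus how, plus actions.
SAR_SECTIONS = ("who", "what", "when", "where", "why", "how", "actions")


def new_audit_id(module: str) -> str:
    # Hypothetical ID scheme: module prefix + 12 random hex characters.
    return MODULES[module].audit_prefix + uuid.uuid4().hex[:12]


def check_sar_draft(
    sections: dict[str, list[tuple[str, tuple[str, ...]]]],
    alert_id: str,
    triage_audit_id: str,
) -> dict[str, str]:
    """Validate a SAR draft and return its lineage record.

    `sections` maps each section name to (sentence, evidence_ids) pairs.
    """
    if set(sections) != set(SAR_SECTIONS):
        raise ValueError(f"expected sections {SAR_SECTIONS}, got {sorted(sections)}")
    for name in SAR_SECTIONS:
        if not sections[name]:
            raise ValueError(f"empty section: {name}")
        for sentence, evidence_ids in sections[name]:
            if not evidence_ids:
                raise ValueError(f"uncited sentence in {name!r}: {sentence!r}")
    # SAR sits downstream of Triage, so the parent must be a triage analysis.
    if not triage_audit_id.startswith(MODULES["triage"].audit_prefix):
        raise ValueError(f"not a triage audit ID: {triage_audit_id!r}")
    # The lineage record alone is enough to walk alert -> triage -> SAR.
    return {
        "alert_id": alert_id,
        "triage_audit_id": triage_audit_id,
        "sar_audit_id": new_audit_id("sar"),
    }
```

Note that the prefixes are chosen so none is a prefix of another at the start of the string ("WL_AN_" and "SAR_AN_" do not start with "AN_"), which is what lets a simple `startswith` check identify the parent module.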

The non-negotiables that shaped every decision

When we sat down to build this, we wrote down five things we wouldn't compromise on — and refused features that would have meant breaking them.

  • Every rationale cites specific transaction IDs or customer data points. No vague reasoning. This is what makes the output audit-defensible and is the #1 thing regulators and BSA officers will probe.
  • Structured output only. Every Claude call returns Pydantic-validated JSON. No free-text parsing downstream.
  • Two-pass reasoning. First pass analyzes, second pass critiques the first looking for missed red flags. Both are logged.
  • Complete audit trail. Every AI decision, the prompt version, the model version, the input context hash, and the human disposition are logged. A regulator should be able to reconstruct any decision months later.
  • No real PII in the demo, ever. The sandbox runs on 100% synthetic data. Paid engagements move into your environment under your model risk controls.

Stack

Boring where it should be; sharp where it counts.

  • Data layer: DuckDB — single-file, embedded analytics. Perfect for demo portability and lifts cleanly to your existing data warehouse in a paid engagement.
  • Backend: Python 3.11+, FastAPI, Pydantic for schema validation.
  • Reasoning: Anthropic SDK — Claude Sonnet for both Triage passes and the SAR drafter; Haiku for the Watchlist first pass where cost matters and the second pass critiques anyway.
  • Frontend: React + Vite, Tailwind, Recharts, shadcn/ui — self-contained for the sales sandbox; replaceable by your investigator UI in paid engagements.
  • Audit: Per-analysis JSON records under audit/logs/. One record per decision, fully self-contained.
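The audit layout in the stack list (one self-contained JSON record per decision under audit/logs/) needs nothing beyond the standard library. A minimal sketch under our own assumptions, not Sentinel's code: records are keyed by their audit ID, files are write-once, and a temp-file-plus-rename keeps a crash from leaving a half-written record. Function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Any

AUDIT_DIR = Path("audit/logs")


def write_audit_record(record: dict[str, Any], log_dir: Path = AUDIT_DIR) -> Path:
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{record['audit_id']}.json"
    # Write-once: refuse to replace an existing decision record.
    # (Assumes a single writer; concurrent writers would need a lock.)
    if path.exists():
        raise FileExistsError(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    # os.replace is atomic on the same filesystem, so readers never see a partial file.
    os.replace(tmp, path)
    return path


def read_audit_record(audit_id: str, log_dir: Path = AUDIT_DIR) -> dict[str, Any]:
    return json.loads((log_dir / f"{audit_id}.json").read_text(encoding="utf-8"))
```

Because each file is self-contained, lifting the log into a warehouse or object store later is a copy, not a migration.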

Three demo cases that prove the point

The sales demo walks through exactly three alerts. They are the product, not the marketing material.

Riverbend Hardware · score 12/100 · auto-clear

A structuring rule fires on a small-business owner with 24 months of consistent cash-deposit history. Sentinel clears it in 4 seconds with cited prior-period comparisons.
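One way a "cited prior-period comparison" could work: compare the flagged month's cash deposits with the customer's own earlier months, and cite which months formed the baseline. A minimal sketch; the monthly aggregation, the `YYYY-MM` keys, the z-score and the two-standard-deviation tolerance are all our assumptions, not Sentinel's method.

```python
import math
import statistics
from typing import NamedTuple


class PeriodComparison(NamedTuple):
    current_total: float
    baseline_mean: float
    baseline_stdev: float
    z_score: float
    within_baseline: bool
    cited_periods: tuple[str, ...]  # the months the baseline was built from


def compare_to_prior_periods(
    monthly_cash: dict[str, float], current: str, k: float = 2.0
) -> PeriodComparison:
    # "YYYY-MM" strings sort chronologically, so a string comparison
    # selects strictly earlier months.
    prior = {m: v for m, v in monthly_cash.items() if m < current}
    if len(prior) < 2:
        raise ValueError("need at least two prior periods for a baseline")
    values = list(prior.values())
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    total = monthly_cash[current]
    if stdev == 0:
        # Perfectly flat history: any deviation at all is maximally unusual.
        z = 0.0 if total == mean else math.copysign(math.inf, total - mean)
    else:
        z = (total - mean) / stdev
    return PeriodComparison(
        current_total=total,
        baseline_mean=mean,
        baseline_stdev=stdev,
        z_score=z,
        within_baseline=abs(z) <= k,
        cited_periods=tuple(sorted(prior)),
    )
```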

Apex Global Trading · score 90/100 · escalate to SAR

A $187,500 inbound wire from the BVI, then a $185,000 ACH out to a related entity five hours later, with a shared UBO. Sentinel escalates with a complete evidence chain plus a FinCEN advisory citation in about 12 seconds. The SAR drafter then takes the same FinalAnalysis and produces a seven-section Form 111 narrative.
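The Apex pattern (funds in, nearly all of it out to a related party within hours) can be expressed as a rule over transaction pairs. A minimal sketch: `Txn`, `find_pass_through`, the 24-hour window and the 90% threshold are hypothetical, jurisdiction risk is not modelled, and in Sentinel the reasoning layer, not a hard-coded rule, makes the call. With the post's figures, $185,000 / $187,500 ≈ 0.987, so about 98.7% of the inbound wire moved on within five hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Txn:
    txn_id: str
    direction: str  # "in" or "out"
    amount: float
    at: datetime
    counterparty_ubos: frozenset[str] = frozenset()


@dataclass(frozen=True)
class PassThrough:
    inbound_id: str
    outbound_id: str
    hours_elapsed: float
    ratio: float  # outbound amount / inbound amount
    shared_ubos: frozenset[str]


def find_pass_through(
    txns: list[Txn],
    customer_ubos: frozenset[str],
    window: timedelta = timedelta(hours=24),
    min_ratio: float = 0.9,
) -> list[PassThrough]:
    inbound = [t for t in txns if t.direction == "in"]
    outbound = [t for t in txns if t.direction == "out"]
    hits = []
    for i in inbound:
        for o in outbound:
            elapsed = o.at - i.at
            # The outbound leg must follow the inbound leg within the window.
            if not timedelta(0) <= elapsed <= window:
                continue
            ratio = o.amount / i.amount
            shared = customer_ubos & o.counterparty_ubos
            # Most of the money moved on (without exceeding the inbound amount)
            # to a counterparty that shares a UBO with the customer.
            if min_ratio <= ratio <= 1.0 and shared:
                hits.append(PassThrough(
                    i.txn_id, o.txn_id, elapsed.total_seconds() / 3600, ratio, shared,
                ))
    return hits
```

Each hit carries both transaction IDs, which is exactly the kind of citation the escalation rationale needs.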

Northgate Realty · score 48/100 · human review

Round-dollar wire activity that could be legitimate real estate closings — or something else. Sentinel flags it for review with specific questions the investigator should ask when contacting the relationship manager. This is the pattern: ambiguous cases keep the human firmly in the loop, with the evidence already in front of them.
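Round-dollar activity is one of the few signals in the Northgate case that reduces to arithmetic, and it shows why the case stays with a human: the number is the same whether the wires are property closings or something else. A minimal sketch; `Wire`, `round_dollar_profile`, the $1,000 rounding unit and the integer-cents representation are our assumptions, and the output is evidence to cite, not a verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wire:
    txn_id: str
    amount_cents: int  # integer cents avoid float error in the modulo test


def round_dollar_profile(wires: list[Wire], unit_dollars: int = 1_000) -> dict:
    unit_cents = unit_dollars * 100
    # Cite every wire that is an exact multiple of the rounding unit.
    round_ids = [w.txn_id for w in wires if w.amount_cents % unit_cents == 0]
    share = len(round_ids) / len(wires) if wires else 0.0
    return {
        "round_share": share,
        "cited_txn_ids": round_ids,
        "total_wires": len(wires),
    }
```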

What we'll do for your team

A two-week proof-of-value engagement, fixed fee. We run on your sanitized or synthetic data. Up to five typologies, up to 10,000 historical alerts. You leave with a working sandbox in your environment, a prompt library tuned to your taxonomy, model-risk documentation sized for your MRM team, and a board-packet executive summary.

For the full breakdown — modules, architecture diagram, screenshots from the live sandbox, and the time-savings math at three scale profiles — see the Sentinel product page.

Sit with us for thirty minutes.

We'll walk you through the three demo cases on live data, answer model-risk questions, and come back inside a week with a proof-of-value scope sized for your taxonomy.