How Anglerfish Works
We audit training datasets for compliance risk without ingesting raw data: we compare vector representations of your data against curated corpora and produce a reproducible audit trail.
The Audit Process
Four steps from raw data to compliance-grade documentation.
Scope
Define your dataset: its domain, size, and intended use. We then determine the relevant indices and plausible risks.
Vectorise
Your data is processed locally. Only vectors leave your system, never raw text or proprietary content.
Analyse
We compare your vectorised data against public and custom corpora using tested semantic similarity techniques, then apply proprietary analysis to prioritise the matches that represent meaningful risk.
Report
Receive a reproducible compliance pack with provenance, timestamps, and opt-out signals.
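To make the Vectorise and Analyse steps concrete, here is a minimal sketch of the general idea, not Anglerfish's actual pipeline. It assumes a toy hashed bag-of-words vectoriser standing in for a real embedding model, plain cosine similarity as the comparison, and a review threshold of 0.8. The names vectorise, cosine, and flag_for_review, and the threshold value, are hypothetical and chosen only for illustration.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 64  # toy vector size (assumption; real embeddings are far larger)


def vectorise(text: str, dim: int = DIM) -> List[float]:
    """Toy local vectoriser: hashed bag-of-words, L2-normalised.

    A stand-in for a real embedding model. It runs locally, so only
    the resulting vector would need to leave the system.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def flag_for_review(
    dataset_vecs: List[List[float]],
    reference_vecs: List[List[float]],
    threshold: float = 0.8,  # hypothetical cut-off for illustration
) -> List[Tuple[int, int, float]]:
    """Return (dataset_index, reference_index, score) for every pair
    scoring at or above the threshold, highest score first."""
    hits = []
    for i, d in enumerate(dataset_vecs):
        for j, r in enumerate(reference_vecs):
            score = cosine(d, r)
            if score >= threshold:
                hits.append((i, j, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Local side: raw text is vectorised and never leaves this scope.
local_records = ["the quick brown fox jumps", "quarterly revenue grew in europe"]
dataset_vecs = [vectorise(t) for t in local_records]

# Audit side: only vectors are compared against a reference corpus.
reference_vecs = [vectorise("the quick brown fox jumps")]
hits = flag_for_review(dataset_vecs, reference_vecs)
```

Each entry in hits is a pair that a reviewer should look at, not a finding in itself. A production system would presumably use a proper embedding model and an approximate nearest-neighbour index rather than this brute-force loop.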
Similarity ≠ infringement · Similarity = signal that requires review
What Anglerfish is
A compliance and risk-reduction layer
A due-diligence system for training data
Infrastructure for auditability and provenance
