How FindPy works.
Overview
An analyst's question is decomposed by a Planner LLM into a DAG of tasks. Each task is dispatched to a specialized Agent. Agents read and write a shared Evidence Graph. Every source is content-hashed and signed at ingest; every claim cites its sources. A Synthesizer rolls findings into an analyst brief — every claim ends with an evidence ID. A WebSocket hub streams progress to the dashboard live.
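The dispatch order implied by the task DAG can be sketched with Python's standard `graphlib`. This is a minimal illustration rather than FindPy's actual orchestrator: the task names and the `plan_batches` helper are hypothetical, and the real Planner output format is not described here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_batches(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches whose dependencies are already complete.

    `dag` maps each task name to the set of tasks it depends on.
    Tasks within one batch are independent and could run in parallel.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished
    return batches


# Hypothetical plan for a single analyst question.
plan = {
    "web_search": set(),
    "image_forensics": set(),
    "geolocate": {"image_forensics"},
    "credibility": {"web_search"},
    "synthesize": {"geolocate", "credibility"},
}
batches = plan_batches(plan)
```

Each inner list is a set of tasks that an orchestrator could hand to agents concurrently; the synthesis step lands in the final batch because it depends on everything else.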
Architecture
Analyst UI (Next.js)
      │  HTTP + WS
      ▼
FastAPI ── Orchestrator ──┬─► Planner agent
                          │
                          ├─► Specialized agents (parallel)
                          │     web · news · telegram · image · c2pa
                          │     deepfake · geo · sat · credibility
                          │     narrative · cib
                          │
                          └─► Evidence graph (SQLite / Neo4j)
                                + content-addressable artifact store
                                + signed envelopes
See ARCHITECTURE.md for component diagrams and the agent contract.
The 11 agents
The forensic jury
The deepfake jury votes across five real algorithms, combining their scores by weighted aggregation and recording a per-detector rationale. Confidence gating keeps low-signal detectors from dominating, while a single strong detector (score ≥ 0.80) is enough to flip a verdict, which mirrors how OSINT triage works in practice.
| Detector | Weight | Signal |
|---|---|---|
| ELA | 0.15 | Error Level Analysis — re-save at known JPEG quality, diff against original; splices leave higher residual energy. |
| JPEG ghost | 0.15 | Per-region recompression-quality minima. Splices show inconsistent ghost minima across regions. |
| shadow physics | 0.20 | Measured shadow azimuth (Sobel gradients) vs NOAA solar position at claimed lat/lon/time. Confidence-gated. |
| GAN fingerprint | 0.20 | Horizontal/vertical Laplacian variance imbalance — proxy for real GAN-trace CNN. Drop-in slot for ONNX DFDC model. |
| amplification pattern | 0.30 | OSINT-grade signal: pHash-duplicate carriers + credibility profile; flags credible-origin → low-cred bloom. |
Hybrid verdict rule: the verdict is synthetic if the weighted score is ≥ 0.55, OR any single detector scores ≥ 0.80, OR at least two detectors fire AND the weighted score is ≥ 0.40. Otherwise it is suspect if the weighted score is ≥ 0.30 or any detector fires. Otherwise it is authentic.
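The rule can be written as a small function. This is a sketch under stated assumptions, not the project's implementation: the weights come from the table above, but the cut-off at which a detector counts as "firing" is not given in this document, so `FIRING_THRESHOLD = 0.5` is a placeholder, and confidence gating is left out.

```python
from typing import Dict

# Weights copied from the detector table; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "ela": 0.15,
    "jpeg_ghost": 0.15,
    "shadow_physics": 0.20,
    "gan_fingerprint": 0.20,
    "amplification_pattern": 0.30,
}

# ASSUMPTION: "firing" is not defined in the source; 0.5 is a placeholder.
FIRING_THRESHOLD = 0.5


def hybrid_verdict(scores: Dict[str, float],
                   firing_threshold: float = FIRING_THRESHOLD) -> str:
    """Return 'synthetic', 'suspect', or 'authentic' for detector scores in [0, 1].

    Detectors missing from `scores` are treated as scoring 0.0.
    """
    values = {name: scores.get(name, 0.0) for name in WEIGHTS}
    weighted = sum(WEIGHTS[name] * value for name, value in values.items())
    firing = sum(1 for value in values.values() if value >= firing_threshold)
    strongest = max(values.values())

    if weighted >= 0.55 or strongest >= 0.80 or (firing >= 2 and weighted >= 0.40):
        return "synthetic"
    if weighted >= 0.30 or firing >= 1:
        return "suspect"
    return "authentic"


# One strong detector is enough to flip the verdict.
single_strong = hybrid_verdict({"ela": 0.9})
# One moderate detector fires, but the weighted score stays low.
single_moderate = hybrid_verdict({"ela": 0.6})
```

Keeping the three clauses in one `if` makes the override explicit: a lone ELA score of 0.9 contributes only 0.135 to the weighted score yet still yields a synthetic verdict.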
Evidence graph
The graph is a property graph (SQLite in dev, Neo4j-ready in production):
Nodes: Investigation · Task · Source · Entity · Claim · Image · Finding
Edges: SUPPORTS · CONTRADICTS · CITES · MENTIONS · DERIVED_FROM · PRODUCED_BY · DEPENDS_ON · LOCATED_AT
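A property graph of this shape fits in two SQLite tables. The sketch below is illustrative only: the table layout, the helper names, and the direction of the CITES edge (Claim to Source) are assumptions, not FindPy's actual schema.

```python
import json
import sqlite3
from typing import List

NODE_LABELS = {"Investigation", "Task", "Source", "Entity", "Claim", "Image", "Finding"}
EDGE_TYPES = {"SUPPORTS", "CONTRADICTS", "CITES", "MENTIONS",
              "DERIVED_FROM", "PRODUCED_BY", "DEPENDS_ON", "LOCATED_AT"}


def open_graph(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a graph store with one nodes and one edges table."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS nodes (
            id    TEXT PRIMARY KEY,
            label TEXT NOT NULL,
            props TEXT NOT NULL          -- JSON-encoded properties
        );
        CREATE TABLE IF NOT EXISTS edges (
            src  TEXT NOT NULL REFERENCES nodes(id),
            type TEXT NOT NULL,
            dst  TEXT NOT NULL REFERENCES nodes(id),
            PRIMARY KEY (src, type, dst)
        );
        """
    )
    return db


def add_node(db: sqlite3.Connection, node_id: str, label: str, **props) -> None:
    if label not in NODE_LABELS:
        raise ValueError(f"unknown node label: {label}")
    db.execute("INSERT INTO nodes VALUES (?, ?, ?)",
               (node_id, label, json.dumps(props, sort_keys=True)))


def add_edge(db: sqlite3.Connection, src: str, edge_type: str, dst: str) -> None:
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type}")
    db.execute("INSERT INTO edges VALUES (?, ?, ?)", (src, edge_type, dst))


def cited_sources(db: sqlite3.Connection, claim_id: str) -> List[str]:
    """Return the IDs of all Source nodes a claim cites."""
    rows = db.execute(
        "SELECT dst FROM edges WHERE src = ? AND type = 'CITES' ORDER BY dst",
        (claim_id,),
    )
    return [row[0] for row in rows]


db = open_graph()
add_node(db, "src_1", "Source", url="https://example.org/report")
add_node(db, "clm_1", "Claim", text="The bridge was intact on the claimed date.")
add_edge(db, "clm_1", "CITES", "src_1")
cited = cited_sources(db, "clm_1")
```

Validating labels and edge types at write time keeps the dev store aligned with the vocabulary above, which should make a later move to a graph database less surprising.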
Provenance
Every Source carries a SHA-256 content hash, a producer-agent label, a retrieval timestamp, and an HMAC-SHA256 signature over the canonical JSON envelope. The signing key is generated on first run at data/.signing_key (mode 0600). Production swap: Ed25519; the Signer interface is identical.
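Signing and verifying an envelope of this kind needs only the standard library. The following is a sketch under assumptions: the envelope field names are hypothetical, "canonical JSON" is taken here to mean sorted keys with compact separators, and the demo key is hard-coded instead of being read from data/.signing_key.

```python
import hashlib
import hmac
import json


def canonical_json(envelope: dict) -> bytes:
    """Serialize deterministically so the same envelope always signs the same bytes."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def build_envelope(content: bytes, producer: str, retrieved_at: str) -> dict:
    # Field names are illustrative; the real envelope layout is not shown here.
    return {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "producer": producer,
        "retrieved_at": retrieved_at,
    }


def sign(envelope: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical JSON form, hex-encoded."""
    return hmac.new(key, canonical_json(envelope), hashlib.sha256).hexdigest()


def verify(envelope: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking where the two values differ via timing.
    return hmac.compare_digest(sign(envelope, key), signature)


key = b"demo-key-not-for-production"  # ASSUMPTION: stand-in for the generated key
envelope = build_envelope(b"<html>example</html>", "web", "2024-01-01T00:00:00Z")
signature = sign(envelope, key)
ok = verify(envelope, signature, key)

# Changing any field after signing invalidates the signature.
tampered = dict(envelope, producer="news")
tampered_ok = verify(tampered, signature, key)
```

Because the content hash is part of the signed envelope, a verifier can check the artifact bytes and the metadata with a single signature.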
# verify the entire investigation
curl "https://api.findpy.com/api/audit/<investigation_id>"
# expected:
{
"summary": { "sources": 11, "signatures_verified": 11, "artifact_hashes_ok": 11 },
"sources": [ { "id": "src_...", "signature_verified": true, ... }, ... ]
}
API reference
Live OpenAPI spec: findpy-api.fly.dev/docs
Deployment
The recommended split:
- Frontend → Vercel (free hobby tier; native Next.js).
- Backend → Fly.io (free allowance covers the demo; supports WebSockets + persistent volume).
- Optional: Cloudflare Tunnel for showing a local backend with a public URL.
Full deploy guide: DEPLOY.md
Stack & swap-in
Every dev component has a labeled production swap-in. The agent contract does not change — only the layer underneath.
| layer | dev → production |
|---|---|
| LLM | Ollama qwen2.5:7b → Qwen2.5-72B on vLLM |
| evidence graph | SQLite → Neo4j cluster |
| embeddings | hashing-trick → BGE-M3 in Qdrant |
| deepfake CNNs | pure-PIL detectors → ONNX DFDC + AIGenImageDetector |
| signing | HMAC-SHA256 → Ed25519 |
| Telegram | demo corpus → Telethon multi-account rotation |
| sat imagery | STAC discovery → STAC + band download + NDBI delta |
| multi-tenancy | none → OIDC + RBAC + structured audit log |
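For readers unfamiliar with the hashing trick named in the embeddings row, here is a minimal sketch of the idea. It is not FindPy's embedding code: the whitespace tokenizer, the 256-dimension default, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import math
from typing import List


def hashed_embedding(text: str, dim: int = 256) -> List[float]:
    """Map text to a fixed-size vector without a vocabulary or a trained model.

    Each token is hashed to a bucket index and a sign; the resulting
    bag-of-words vector is L2-normalized.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0  # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query = hashed_embedding("bridge")
self_similarity = cosine(query, query)
```

The approach is deterministic and dependency-free, which suits a dev environment, but it captures only token overlap; a learned model such as the BGE-M3 swap-in listed above is what provides semantic similarity.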