FindPy is the analyst's assistant.
A swarm of specialized AI agents that autonomously plan and conduct OSINT investigations, with cryptographic provenance on every claim. Sovereign by default. Built for the Indian Air Force.
THE PROBLEM
Open-source intelligence has outgrown the analyst. A single geopolitical event spawns thousands of articles, hundreds of social posts in a dozen languages, and floods of imagery — much of it now AI-generated or part of coordinated influence operations. Traditional tools scrape and dashboard. They don't reason, they don't verify, and they leave the analyst alone with raw data and a deadline.
THE THESIS
FindPy decomposes an analyst's plain-language question into a directed acyclic graph (DAG) of tasks and dispatches them to 11 specialized agents, including web crawlers, social listeners, satellite STAC search, image forensics, a deepfake jury, a geolocator, source credibility scoring, narrative genealogy, and coordinated-behavior detection. The agents collaborate through a shared property graph, and the planner can re-plan when partial results land.
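As a rough illustration of the task graph described above, the sketch below orders a small plan into waves of tasks that could run in parallel. The task names and the `run_in_waves` helper are hypothetical, chosen only for illustration; the actual planner output format is not specified here. It relies on the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
plan = {
    "crawl_web": set(),
    "search_stac": set(),
    "image_forensics": {"crawl_web"},
    "score_credibility": {"crawl_web"},
    "synthesize": {"image_forensics", "score_credibility", "search_stac"},
}

def run_in_waves(plan):
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # marking tasks done unlocks their dependents
    return waves

waves = run_in_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Re-planning could, under this assumption, amount to building a new dependency mapping from the partial results and running the same loop again.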
Every source is content-hashed and HMAC-signed at ingest. Every claim cites its sources. A single endpoint re-verifies every signed envelope and re-hashes every artifact byte, giving a defensible chain of custody before action.
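A minimal sketch of what hashing and signing at ingest might look like, using only Python's standard library. The envelope fields (`content_sha256`, `url`, `fetched_at`), the function names, and the JSON canonicalization are assumptions made for illustration, not FindPy's actual schema.

```python
import hashlib
import hmac
import json

def canonical(envelope: dict) -> bytes:
    # Sorted keys and fixed separators give a deterministic byte string.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_source(content: bytes, url: str, fetched_at: str, key: bytes) -> dict:
    # Hypothetical envelope layout: the content hash plus ingest metadata.
    envelope = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "url": url,
        "fetched_at": fetched_at,
    }
    signature = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return {"envelope": envelope, "signature": signature}

def verify_source(record: dict, content: bytes, key: bytes) -> bool:
    # Re-compute the signature over the envelope and re-hash the raw bytes.
    expected = hmac.new(key, canonical(record["envelope"]), hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, record["signature"])
    actual_hash = hashlib.sha256(content).hexdigest()
    hash_ok = hmac.compare_digest(actual_hash, record["envelope"]["content_sha256"])
    return signature_ok and hash_ok

key = b"demo-key-not-for-production"
record = sign_source(b"<html>report</html>", "https://example.org/a", "2024-01-01T00:00:00Z", key)
print(verify_source(record, b"<html>report</html>", key))
print(verify_source(record, b"<html>edited</html>", key))
```

An audit pass in this style would loop `verify_source` over every stored record. `hmac.compare_digest` is used so the comparison does not leak timing information.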
WHAT MAKES IT DIFFERENT
- Agentic, not pipeline. Most OSINT tools are linear: scrape→clean→dashboard. FindPy dispatches agents that read and write a shared evidence graph, and the planner can re-plan after seeing partial results.
- Verifiable evidence. Every Source carries a content hash and an HMAC signature over a canonical envelope. The audit endpoint re-verifies all signatures and re-hashes all bytes, so findings are defensible up the chain of command.
- Real forensic jury. Five real algorithms vote with explainable rationale: ELA, JPEG-ghost recompression, NOAA sun-position shadow physics, GAN-fingerprint heuristic, and amplification-pattern analysis (the OSINT-grade signal that catches influence ops even when pixels look clean).
- Sovereign by default. Hot-swappable LLM layer (Ollama / vLLM / mock). An air-gap mode switch disables all outbound network access. No hosted API in the demo path.
- IAF-vertical. Aerial-domain gazetteer, Sentinel-2 STAC change-detection wired against Copernicus, demo scenarios built around airbase change-detection rather than celebrity news.
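The forensic jury in the list above could combine its votes roughly as follows. This is a sketch under stated assumptions: the `Vote` structure, the strict-majority rule, and the sample rationales are all hypothetical, since the text does not say how the five votes are weighed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vote:
    detector: str      # e.g. "ela", "jpeg_ghost" (names are illustrative)
    suspicious: bool   # True if this detector flags the image
    rationale: str     # human-readable explanation for the analyst

def tally(votes: list[Vote]) -> dict:
    """Assumed rule: flag the image only when a strict majority of jurors do."""
    flagged = [v for v in votes if v.suspicious]
    return {
        "verdict": "suspicious" if len(flagged) * 2 > len(votes) else "not_flagged",
        "flagged_by": [v.detector for v in flagged],
        "rationale": [f"{v.detector}: {v.rationale}" for v in votes],
    }

votes = [
    Vote("ela", True, "error level differs sharply around the aircraft"),
    Vote("jpeg_ghost", True, "region shows a second compression quality"),
    Vote("sun_shadow", False, "shadow direction matches the claimed time"),
    Vote("gan_fingerprint", False, "no periodic spectral artifacts found"),
    Vote("amplification", True, "posted by accounts created the same day"),
]
result = tally(votes)
print(result["verdict"], result["flagged_by"])
```

Keeping every rationale in the result, including those from jurors that did not flag the image, is one way to keep the verdict explainable.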
HOW IT WORKS
- Analyst types a question in plain English.
- Planner LLM decomposes it into a DAG of tasks.
- Agents fan out, ingest sources, and sign every artifact at ingest.
- Image agents extract pHash + EXIF; sat-imagery agent calls STAC.
- Credibility scorer rates every source on four factors.
- Deepfake jury votes on every image; CIB agent looks for clusters.
- Synthesizer writes a brief — every claim ends with an evidence ID.
- Audit endpoint can re-verify every signature on demand.
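The synthesizer step above requires every claim to end with an evidence ID. A check for that rule might look like the sketch below. The `[E<number>]` ID format and the `uncited_claims` helper are assumptions for illustration; the brief's real citation syntax is not given here.

```python
import re

# Assumed ID format: a trailing tag such as "[E12]".
EVIDENCE_ID = re.compile(r"\[(E\d+)\]\s*$")

def uncited_claims(claims: list[str], known_ids: set[str]) -> list[str]:
    """Return claims that lack a trailing evidence ID or cite an unknown one."""
    problems = []
    for claim in claims:
        match = EVIDENCE_ID.search(claim.strip())
        if match is None or match.group(1) not in known_ids:
            problems.append(claim)
    return problems

claims = [
    "A new hangar is visible at the airbase. [E3]",
    "The runway was extended.",
    "The post belongs to a coordinated cluster. [E9]",
]
problems = uncited_claims(claims, known_ids={"E3", "E4"})
for claim in problems:
    print("needs evidence:", claim)
```

Running such a check before a brief is released would catch both missing citations and citations that point at evidence absent from the graph.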
WHO IT IS FOR
THE STACK
- LLM ............. Ollama / vLLM / OpenAI-compatible / mock
- Reasoning ....... Qwen2.5-72B (production), qwen2.5:7b (dev)
- Embeddings ...... hashing-trick (dev) → BGE-M3 (production)
- Evidence graph .. SQLite (dev) → Neo4j (production)
- Backend ......... FastAPI + WebSocket pub/sub
- Frontend ........ Next.js 14 + Tailwind + React Flow + Leaflet
- Sat imagery ..... Sentinel-2 via Copernicus Earth Search STAC
- Forensics ....... pure PIL + numpy (no model weights required)
- Signing ......... HMAC-SHA256 (dev) → Ed25519 (production)
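For readers unfamiliar with the hashing-trick embeddings listed for dev mode, here is a minimal sketch of the general technique. The dimension, whitespace tokenization, and signed-bucket scheme are assumptions; FindPy's own dev embedder may differ.

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 256) -> list[float]:
    """Map text to a fixed-size vector without a vocabulary or model weights."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] & 1 else -1.0            # sign reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalize so a dot product equals cosine similarity.
    return [x / norm for x in vec] if norm else vec

def cosine(a: list[float], b: list[float]) -> float:
    # Valid as cosine similarity only because hash_embed normalizes its output.
    return sum(x * y for x, y in zip(a, b))

query = hash_embed("airbase runway extension")
print(cosine(query, hash_embed("runway extension at the airbase")))
print(cosine(query, hash_embed("celebrity wedding photos")))
```

Because the vectors depend only on token hashes, this approach needs no downloads and is deterministic, which suits dev and air-gapped runs. It captures word overlap but not meaning, which is presumably why the stack moves to BGE-M3 in production.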
STATUS
v0.1 — prototype. 11 agents working end-to-end. Real forensic jury. Real Sentinel-2 STAC discovery. Real evidence-graph audit. 23/23 tests passing. Frontend live, backend on Fly.io Mumbai region. Designed for an ADITI 4.0 / iDEX submission to the Indian Air Force's Problem Statement 18.