Governed RAG for Regulated Industries
Evidence-backed answers. Fail-closed refusal. On your hardware.
Every response scored for factual consistency. Query-time role-based access control. Hash-chained, tamper-evident audit trails. Two live deployments with quantitative evaluation results.
Built for industries where wrong answers have regulatory or safety consequences.
Demo credentials: demo.getkeystone.ai. Log in as operator1 / demo123. Try: "What atmospheric testing is required before entering a confined space?"
Safety-critical knowledge is scattered, unfindable, and unauditable
Hundreds of SOPs live across SharePoint folders, shared drives, and filing cabinets. When a regulator asks which version of a confined space entry procedure was in effect on the day of an incident, finding the right document takes hours. When an internal audit asks who accessed a specific safety procedure last Tuesday, the honest answer is often: we don't know.
Generic AI tools make this worse. They answer with confidence whether or not the evidence supports it, send your operational documents to third-party cloud providers, and produce no audit trail of what was accessed or by whom.
Keystone was built to address this.
What the system enforces
Keystone enforces nine architectural properties simultaneously. These are not prompt instructions or configuration options. They are structural constraints in the retrieval pipeline, database schema, and API layer.
Your documents. Your hardware. Cited answers and a full audit trail.
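The fail-closed behavior named above can be sketched as a simple gate: the system returns an answer only when retrieved evidence exists and the draft answer scores above a consistency threshold, and refuses otherwise. This is an illustrative sketch only; the function and parameter names (`answer_or_refuse`, `score_consistency`, `THRESHOLD`) are hypothetical, not Keystone's actual API.

```python
THRESHOLD = 0.8  # hypothetical minimum factual-consistency score to answer

def answer_or_refuse(question: str, chunks: list[str], draft: str,
                     score_consistency) -> dict:
    """Fail-closed gate: return the draft answer only when evidence was
    retrieved and it supports the draft; otherwise refuse explicitly."""
    score = score_consistency(draft, chunks) if chunks else 0.0
    if not chunks or score < THRESHOLD:
        # Refusal is the default path; answering requires passing evidence.
        return {"answer": None, "score": score,
                "refusal": "No sufficiently supported answer found."}
    return {"answer": draft, "citations": chunks, "score": score}
```

The key design point is the direction of the default: an empty retrieval set or a low score produces a refusal, never a confident guess.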
Closes the loop between written procedure and field reality
After every answer, workers can flag what was useful, what was missing, and what was wrong. Those signals feed a governed review process. When a procedure owner validates a change, the next person who asks the same question gets a better answer. Your procedures improve because your people use them.
Live demo at demo.getkeystone.ai (operator1 / demo123). Eval baseline (KDAT-001B, 2026-04-11): P@1=0.75, MRR=0.79. Adversarial ACL: 8/8 blocked, 0 leaks. Fail-closed: 5/6 (83%). Audit chain intact.
Quantitative evaluation baseline
All capability claims are backed by evaluation evidence. The current baseline (KDAT-001B, 2026-04-11) was run against a 53-document Alberta OHS safety corpus with 2,674 indexed chunks.
- Precision at 1 (P@1): 0.75
- Mean Reciprocal Rank (MRR): 0.79
- Corpus: 53 documents, 2,674 chunks
- Hybrid retrieval: pgvector + full-text search
- Adversarial ACL testing: 8/8 blocked, 0 leaks
- Fail-closed accuracy: 5/6 (83%)
- Audit chain: intact, immutable
- INSERT-only DB role enforced
Explicitly out of scope in the current release (not claimed as capability):
- Enterprise HA or disaster recovery
- Multi-node or distributed deployment
- OIDC/SAML production identity integration
- Third-party penetration testing
- WCAG accessibility compliance
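The hash-chained, tamper-evident audit trail mentioned above can be sketched in a few lines: each entry's hash covers the previous entry's hash, so editing any historical record breaks every later link. This is a minimal illustration of the general technique, not Keystone's actual schema or serialization.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first link

def append_event(chain: list[dict], event: dict) -> list[dict]:
    """Append an audit event whose hash covers the previous entry's
    hash, making retroactive edits detectable."""
    prev = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    chain.append({"event": event, "prev": prev,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})
    return chain

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any altered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        payload = json.dumps({"prev": prev, "event": entry["event"]},
                             sort_keys=True)
        if (entry["prev"] != prev or
                hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]):
            return False
        prev = entry["hash"]
    return True
```

Combined with an INSERT-only database role, even the application's own credentials cannot rewrite history without the verification pass failing.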
Built from operational and infrastructure experience
Arnaldo Sepulveda built Keystone AI after nearly 13 years delivering and supporting enterprise platforms at Genesys for regulated and public sector customers. Those environments required production reliability, security controls, and the ability to prove what happened, when, and who authorized it.
That operational background is why Keystone is built the way it is. Every design decision maps to a documented requirement. Every capability claim maps to a demonstrated proof artifact. No overclaims, no roadmap presented as current capability.
Based in New Brunswick, Canada.
Built with
Python, FastAPI, PostgreSQL 16 with pgvector, Ollama (nomic-embed-text, qwen2.5:7b-instruct), React/TypeScript/Tailwind, Docker Compose, Caddy, Cloudflare Tunnels. No cloud dependency for core operation.
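Hybrid retrieval over pgvector and full-text search implies merging two differently-scored rankings. One common way to do that is Reciprocal Rank Fusion (RRF); the sketch below assumes RRF for illustration, since the page does not specify Keystone's actual fusion method.

```python
def rrf_fuse(vector_ranked: list[str], text_ranked: list[str],
             k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge two ranked lists of chunk IDs.
    A chunk ranked highly by either retriever rises toward the top;
    k dampens the influence of any single list (60 is a common default)."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranked, text_ranked):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

In practice the two input lists would come from a pgvector similarity query and a PostgreSQL `tsvector` full-text query over the same chunk table.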
Source and evaluation evidence: github.com/getkeystone