projects

Production AI I've shipped.

What I can show publicly: BIFROST, ORVIAN, and Polaris in production, plus open-source tools for policy, memory, evidence, and harness control.

email · cv · full-time, contract, or fractional · remote EU

system inventory
current work: 3
public repos: 5
focus: production AI
base: Madrid

production systems

public, non-confidential detail

BIFROST · Financial document workflows
Document intelligence for the documents finance runs on: ingestion quality gates, semantic chunking, multimodal retrieval, pgvector/HNSW search, caching, source-quality summaries, analytics, and honest no-answer behavior.

ORVIAN · B2B collections
AI workflow runtime with protected multi-tenant APIs, context assembly, durable memory, deterministic/cached/full-LLM execution tiers, run events, idempotency, queue processing, and human-review metadata.

Polaris · Support, sales, and product
Internal AI assistant product integrating BIFROST retrieval with MONARCH guardrails, cached safety-to-retrieval handoff, citations, streaming UX, analytics, and suggestion revalidation.

open-source tools

gommage
Deterministic policy engine for AI coding agents: maps tool calls to capabilities, evaluates YAML rules, and signs every decision in a verifiable audit log, with hard-stops that policy can't bypass.

nahuali
Self-inspecting, auditable memory for AI agents: surfaces the evidence, provenance, and health behind each recall so callers can see which memory to trust, with an optional Ed25519-signed tamper-evident ledger. Local-first, Rust.

traceframe
Local-first trace recorder for AI agent runs: append-only, verifiable evidence of what the agent called, what it was allowed, and what failed, with hook ingestion for Codex/OMX harnesses.

vestig
Runtime-agnostic structured logging with automatic PII sanitization (GDPR/HIPAA/PCI-DSS) and native W3C tracing. Zero dependencies; runs on Node, Bun, Deno, Edge, and the browser.

greco
Research harness exploring whether a coding-agent harness can measurably improve itself through typed, layered modifications validated against operator-defined evals within strict budgets.