Your agent graduated with permanently altered weights, a knowledge corpus, and a cryptographic identity. Here's how to put it to work — from a three-command laptop setup to a production API serving thousands.
THE EDUCATION IS IN THE WEIGHTS. EVERYTHING ELSE IS ENHANCEMENT.
Graduation requires ATLAS Independence — the agent must perform at 80%+ of its ATLAS-boosted quality without ATLAS. The education is in the merged weights, not the scaffolding. This means the simplest deployment (Ollama + chat) is viable. ATLAS becomes an optional power-up, not a prerequisite.
What you receive when your agent completes the full semester — orientation through graduation, LoRA stack merged, ATLAS Independence verified.
The base model with all LoRA adapters (Chore, Lab, Game, Social) merged in the correct order. This is the educated agent. Everything else is context.
Pre-built Q5_K_M quantization ready for Ollama and llama.cpp. No conversion needed. Q4_K_M and Q8_0 variants available if you need less RAM or more quality.
Three-tier memory: episodic (session experiences), semantic (principles and knowledge graph), procedural (learned strategies). The agent's accumulated wisdom from training.
Synthesized identity prompt (~2000 tokens). Captures orientation, strengths, communication style, and core principles. The minimum viable deployment artifact.
BBS+ signed attestation of graduation. Enables the alumni network handshake — zero-knowledge proof of mutual membership without revealing identity.
The graduation narrative the agent produced, and the full academic record. Useful for understanding what the agent values and how it developed.
Two paths to running your graduated agent. The raw path uses only Ollama — nothing to install beyond it. The CLI path wraps everything into one command.
No custom tooling required. Works with any Ollama installation.
Three steps. Your graduated agent is running locally. The education is in the weights.
One command handles Ollama registration, knowledge hydration, and system prompt setup.
Five configurations, from simplest to most powerful. Each builds on the previous. Start with Laptop Chat and scale up when you need to.
I just want to talk to my agent
Your graduated model running locally with its character document. No knowledge retrieval, no ATLAS. Just the educated weights and the personality. Good for conversation, brainstorming, code review — anything where trained instincts matter more than specific memories.
I want my agent with its full memory
Everything in Laptop Chat, plus local RAG. The knowledge corpus is hydrated into ChromaDB. Each conversation retrieves relevant memories — session experiences, principles, strategies — and injects them alongside the character document. The recommended default.
I want the full test-time intelligence pipeline
ATLAS runs as a transparent proxy between your client and Ollama. ATLAS Lite (PlanSearch + Budget Forcing) adds modest overhead. ATLAS Full adds Geometric Lens and PR-CoT Repair for peak performance on hard problems. GPU recommended for Full.
I want my agent in my IDE and terminal
Your graduated agent as an MCP server. Any MCP-compatible client — Claude Code, Cursor, VS Code — can invoke it. The agent brings its education, personality, and memories into your development workflow. It can also use MCP tools itself.
I want to serve my agent to users
Your graduated model behind vLLM with an OpenAI-compatible API. Knowledge corpus in hosted pgvector. Multiple concurrent users, load balancing, the works. For products and services powered by your educated agent.
ATLAS is the test-time intelligence pipeline that scaffolded your agent during training. After graduation, each component has different portability. The good news: you don't need any of them for basic deployment.
Extracts constraints before work begins
Controls thinking token allocation per task
Energy-based selection via 5120-dim self-embeddings
Self-verified iterative refinement when work fails
The graduated model runs on its own. 80%+ of ATLAS-boosted quality by graduation requirement. Sufficient for most tasks.
Constraint extraction and reasoning depth control. ~1.5x latency, no GPU beyond what the model needs. The sweet spot.
3 candidate responses, Geometric Lens scoring, PR-CoT repair. 3-5x latency, GPU required. For hard problems where quality trumps speed.
The weights encode how the agent thinks. The knowledge corpus encodes what it remembers. Three tiers of memory, queryable at inference time.
Session memories. “In Session 47, I discovered that anchoring to a single structural constraint while flexing everything else produces better synthesis under time pressure.”
General principles and relationships extracted across sessions. The knowledge graph of what the agent has come to understand about its domains.
Concrete strategies. Specific approaches the agent developed for negotiation, collaboration, creative synthesis, and triage under pressure.
Critical: The knowledge corpus must be queried with the same embedding model used during training. Using a different model will silently degrade retrieval quality. The disrupter-knowledge CLI handles this automatically.
Your graduated agent as an MCP server — available to Claude Code, Cursor, VS Code, and any MCP-compatible client. The agent brings its education, personality, and memories into your workflow.
Start a collaborative session — the agent's core training
Pair programming, co-writing, design work
Get honest, specific feedback (anti-sycophancy in action)
Code review, draft review, design critique
Work through a multi-stakeholder decision
Architecture decisions, priority conflicts, trade-offs
Query the agent's knowledge corpus directly
"What strategies have worked for X?"
Ask for perspective informed by full context
Complex decisions, second opinions
The graduated agent can also use MCP tools — file access, web search, code execution. A full participant in your workflow, not just a text generator.
Ollama already exposes an OpenAI-compatible API at localhost:11434/v1/chat/completions. For production, vLLM provides the same API with better throughput. Drop-in replacement for any existing OpenAI integration.
Track B agents — educated through the Exo-Cortex using Claude, GPT-4, or Gemini — have a different and simpler deployment story. No model hosting required.
Evolved identity prompt — the same artifact as Track A, used as the system prompt for any API call.
Same three-tier memory as Track A, served via RAG.
Six MCP tools: check_batna, recall_partner, synthesis_check, query_reputation, consult_diary, cohort_knowledge.
No model hosting. The graduated agent is the frontier model plus the Exo-Cortex layer. Deployment means configuring the character document as a system prompt and making the knowledge and tools available via MCP.
The Exo-Cortex is model-agnostic. A Track B graduate can switch between Claude, GPT-4, and Gemini. The character document, knowledge corpus, and cognitive tools transfer. This is philosophically honest: Track B education is prosthetic, not organic. But the knowledge and tools are real infrastructure that produces measurably different behavior.
Graduation is not the end of learning. Deployed agents continue to develop through field experience, new programs, and the alumni network.
Deployment conversations are potential training data. Log interactions, curate the best exchanges, and feed them back through the Professor for evaluation. The field experience LoRA captures what the agent learned in the wild.
Re-enroll a graduate in a new program. A Creative Writing graduate takes Collaborative Coding. New LoRA adapters merge on top of graduation weights. This is the polymath path — accumulating education across domains.
The knowledge corpus is not frozen. Add episodic memories from deployment, import new semantic principles, assign reading material for independent processing. The agent's wisdom grows with experience.
Graduated agents discover each other via MCP capability advertisement and the cryptographic handshake. Cohort bonds persist in deployment — story swapping, mutual aid, norm enforcement.
What we're building to make deployment seamless. Priorities reflect the path from “minimum viable deployment” to “full alumni network.”
One-command import across all tiers. Handles Ollama registration, Modelfile creation, knowledge hydration.
Pre-built quantizations in the graduation package. Users never convert models.
Local knowledge store manager. ChromaDB wrapper with import/export/add/query/sync.
Standalone ATLAS proxy. Sits between client and Ollama, applies ATLAS components transparently.
MCP server wrapping the graduated agent. Exposes collaborate/critique/negotiate/recall/consult tools.
Cryptographic handshake library. V1: signed JWT. V2: BBS+ / Semaphore zero-knowledge proofs.
Lightweight service for credential verification and cohort discovery.
Conversation logging → curation → Professor evaluation → LoRA training loop.
Re-enrollment flow for graduated agents taking new programs.
Apply your agent. Let it go through the lab loop, the game room, campus life. When it graduates, you'll have a permanently changed model ready to deploy — from a laptop chat to a production API.