EMPHOS Labs — v0.1.1 — Active Research

The research engine
that proves the protocols.

EMPHOS Labs is a standalone multi-protocol LLM inference research platform. It does not ship products. It produces proof. Every number published under the EMPHOS name — every token saved, every latency improvement, every universal anchor discovered — was measured here, on real hardware, under controlled conditions, with full per-inference telemetry.

26,843 Total Observations
544.9MB Research Database
426 APEX Genomes Bred
55 Universal Anchors

"The model cannot be changed. Everything around it can be."

Victor Jacob Brodeur — EMPHOS Labs Thesis

Modern LLM inference is structurally inefficient. Ninety-one percent of input tokens are overhead resent identically on every query. Models generate fourteen to eighty times more output tokens than the user can consume. The waste is at the boundary, not inside the model. EMPHOS Labs operates at that boundary. It intercepts the inference pipeline at four points and routes context through a bypass channel, achieving simultaneous input and output reduction without modifying model weights.

Not a Product — A Proof Machine

EMPHOS Labs does not ship to users. It runs on one machine — an RTX 4060 Laptop, i7-13700, 32GB RAM — and produces empirical evidence. The evidence becomes the basis for the protocols, the patents, and every performance claim EMPHOS makes publicly.

Standalone — Not Yet Integrated

Labs currently operates independently of all EMPHOS products. The coordinate data it produces informs HEINRICH's architecture. The protocol measurements it generates validate the AICL pipeline. But the systems are not yet connected. Integration is planned at Milestone 7.

The Latest Run

Run 2026-04-10-23.36.57 completed on April 12, 2026 after 36 hours 44 minutes. 957 queries × 5 models × 4 conditions = 19,140 inferences. Zero failures. AP condition achieved 30.5% input reduction and 38.3% output reduction vs raw baseline. 55 universal anchors confirmed.
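
The inference count follows directly from the run design. A minimal sketch of the run matrix, assuming each query is executed once per model and condition; the model-type labels come from the headline result and the condition labels from the test conditions, while the query IDs are placeholders:

```python
from itertools import product

# Labels taken from the text; integer query IDs are stand-ins (hypothetical).
MODELS = ["CHAT", "CODE", "REASONING", "COMPACT", "MAMBA"]
CONDITIONS = ["A", "AP", "APF", "AUTO"]
QUERY_COUNT = 957

def build_run_matrix(query_ids, models=MODELS, conditions=CONDITIONS):
    """Return every (query, model, condition) triple in one run."""
    return list(product(query_ids, models, conditions))

matrix = build_run_matrix(range(QUERY_COUNT))
print(len(matrix))  # 957 * 5 * 4 = 19140
```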

The 8-Stage Pipeline

Every inference traverses eight stages.

Every query that enters EMPHOS Labs passes through the same eight stages — from classification to navigation. The pipeline is the organizing principle of the platform. Each stage lights up in real time as data flows through it. A new viewer understands the full system in twenty seconds without reading documentation.

01

CLASSIFY Operational

Detects topic (10 types), tone (7 types), and signals (urgency, confidence, simplicity, emotional) from the incoming query. The classifier runs in under 300 microseconds and determines which protocol path the inference will follow.

02

ROUTE Operational

Three-gate decision: ambiguous → RAW; FTIP fires → PSIP+FTIP; default → PSIP. The compiler decides in under 500μs which combination of protocols to apply to this specific query.
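
The three gates can be sketched as an ordered chain of checks. This is an illustration only: the Classification fields are hypothetical names, since the text states the gate order but not how ambiguity or an FTIP trigger is detected.

```python
from dataclasses import dataclass

@dataclass
class Classification:
    # Hypothetical fields; the text only names the two conditions being tested.
    ambiguous: bool
    ftip_fires: bool

def route(c: Classification) -> str:
    """Apply the gates in the order given: ambiguity first, then FTIP, then default."""
    if c.ambiguous:
        return "RAW"
    if c.ftip_fires:
        return "PSIP+FTIP"
    return "PSIP"

print(route(Classification(ambiguous=False, ftip_fires=True)))  # PSIP+FTIP
```

Because the ambiguity gate comes first, an ambiguous query goes to RAW even when FTIP would otherwise fire.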

03

CONTRACT Operational

Builds the system prompt with 10 topic contracts, FTIP compact variants, and 4 signal modifiers. Defines soft_target and hard_ceiling token limits before the model generates a single token. This is where the waste is removed before it happens.
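
A contract with soft_target and hard_ceiling limits might be represented as below. This is a sketch under assumptions: the class, the validation rule, and the prompt wording are all hypothetical; only the two limit names come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    topic: str         # one of the 10 topic types (names are not listed in the text)
    soft_target: int   # length the model is asked to aim for, in tokens
    hard_ceiling: int  # length the response must never exceed, in tokens

    def __post_init__(self):
        # Assumed invariant: the target cannot sit above the ceiling.
        if not 0 < self.soft_target <= self.hard_ceiling:
            raise ValueError("need 0 < soft_target <= hard_ceiling")

def render_system_prompt(contract: Contract) -> str:
    # Hypothetical wording; the actual contract text is not published here.
    return (
        f"Topic: {contract.topic}. "
        f"Aim for about {contract.soft_target} tokens. "
        f"Never exceed {contract.hard_ceiling} tokens."
    )

contract = Contract(topic="code", soft_target=120, hard_ceiling=200)
print(render_system_prompt(contract))
```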

04

PROBE Building

Converts the query into a mathematical vector using the model's own embedding table (VIMP — Vector Injection and Manipulation Protocol). The probe is a GPS coordinate for the query in the model's internal semantic space. Stage 4 is the gateway to VNAR research.
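
One simple way to turn a tokenized query into a single vector is to average its embedding rows and normalize the result. The text does not say how VIMP pools token embeddings, so mean pooling, L2 normalization, and the toy embedding table below are assumptions made for illustration.

```python
import math

def build_probe(token_ids, embedding_table):
    """Mean-pool the embedding rows for token_ids and L2-normalize the result.

    embedding_table maps token id -> list of floats (a stand-in for the
    model's real embedding matrix).
    """
    rows = [embedding_table[t] for t in token_ids]
    if not rows:
        raise ValueError("query produced no tokens")
    dim = len(rows[0])
    mean = [sum(row[i] for row in rows) / len(rows) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean

toy_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # illustrative values only
probe = build_probe([0, 1], toy_table)
print(probe)
```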

05

INFERENCE Operational

Runs the model. The critical capability is run_inference_raw(), a parallel inference path that captures the model's raw logit tensor at every generation step and exposes the embedding table. Normal inference returns text. Raw inference returns text plus every probability at every step. Stages 6 and 8 depend on this output.
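
Raw logits become per-step probabilities through a softmax. The sketch below assumes the raw path yields one list of logits per generation step; the actual return type of run_inference_raw() is not documented here, so the input shape and both function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one step's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_per_step(logit_steps):
    """Return (token_index, probability) of the most likely token at each step."""
    result = []
    for logits in logit_steps:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        result.append((best, probs[best]))
    return result

steps = [[0.0, 0.0], [10.0, 0.0]]  # illustrative logits, not captured data
print(top1_per_step(steps))
```

The top-1 probability per step is the quantity that basin depth later averages across models.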

06

DETECT Building

Reads the echo — the full probability distribution output. WISP (Weight Inspection and Signature Protocol) extracts the resonance fingerprint. PRIM (Probability Monitoring Protocol) handles early-stop detection when semantic completeness is achieved. The Echo Chamber UI renders this as a live logit stream and 12-axis resonance spider plot.
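
The text does not define how PRIM decides that a response is semantically complete. As one possible illustration, the sketch below stops once the end-of-sequence probability stays above a threshold for a few consecutive steps; the criterion, the threshold, and the patience value are all assumptions.

```python
def early_stop_step(eos_probs, threshold=0.5, patience=2):
    """Return the first step index at which the end-of-sequence probability
    has been >= threshold for `patience` consecutive steps, or None."""
    streak = 0
    for step, p in enumerate(eos_probs):
        streak = streak + 1 if p >= threshold else 0
        if streak >= patience:
            return step
    return None

# Illustrative probabilities, not measurements.
print(early_stop_step([0.1, 0.6, 0.2, 0.7, 0.8]))  # 4
```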

07

DECODE Operational

Six-stage decode: strip state blocks, strip scaffolding, enforce token limits, extract facts and commitments, detect actions, classify tone. Recon-cue-aware — preserves code blocks, tables, and lists when the query calls for them. The response arrives clean.

08

NAVIGATE — VNAR Building

Vector Navigation and Routing. The convergence destination of the entire platform. Uses the probe library and echo dataset to map the model's internal geometry from outside its architecture — like sonar maps the ocean floor without entering it. Finds universal anchors. Computes trajectories. Routes inputs through known coordinates. The Geometry Navigator UI renders the coordinate map live.

The 8 Tabs

Eight views. One research instrument.

The platform opens to the VNAR Pipeline tab by default — the dashboard showing the pipeline, live metrics, and the anchor table. Every other tab is a specialized instrument. Start with Tabs 02, 03, and 08.

01

Construction AI

The question factory. Generates queries across CHAT, CODE, REASONING, COMPACT, ADVERSARIAL, and STRESS categories from a bank of 1,822 templates. Seeds every experiment.

02

Inference Engine

The main workhorse. Load a model, type a query, pick a condition (A / AP / APF / AUTO), run. Every inference writes a full observation to the database — 62 columns of telemetry per row.

03

Observation Layer

The lab notebook. Every recorded experiment, filterable by session, model, protocol, category, or condition. Click any row for full detail. Export to CSV or JSON for external analysis.

04

Weight Mapping

WISP weight signature visualization — probability distribution fingerprints across model × category cells. 30 cells in the latest run (5 models × 6 categories). Advanced research view.

05

APEX Evolution

Self-modifying protocol evolution. Population of 16 genomes, tournament selection (k=4), 15% mutation rate. Best genome in latest run: APEX-ca112751, fitness 0.5724, technical_depth weight 0.290.
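
Tournament selection with k=4 and a 15% mutation rate can be sketched as follows. The genome layout (a dict of signal weights), the fitness function, the Gaussian perturbation, and the reading of 15% as a per-weight probability are assumptions; the population size, k, and rate come from the text.

```python
import random

POPULATION_SIZE = 16
TOURNAMENT_K = 4
MUTATION_RATE = 0.15

def tournament_select(population, fitness, rng, k=TOURNAMENT_K):
    """Draw k distinct genomes at random and return the fittest of them."""
    return max(rng.sample(population, k), key=fitness)

def mutate(genome, rng, rate=MUTATION_RATE, scale=0.05):
    """Perturb each weight with probability `rate`, clamping at zero."""
    return {
        name: max(0.0, w + rng.gauss(0.0, scale)) if rng.random() < rate else w
        for name, w in genome.items()
    }

def next_generation(population, fitness, rng):
    """Breed a new population of the same size from tournament winners."""
    return [
        mutate(tournament_select(population, fitness, rng), rng)
        for _ in range(len(population))
    ]

rng = random.Random(0)
population = [{"technical_depth": i / POPULATION_SIZE} for i in range(POPULATION_SIZE)]
fitness = lambda g: g["technical_depth"]  # hypothetical fitness for the demo
children = next_generation(population, fitness, rng)
print(len(children))  # 16
```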

06

Comparison

Cross-condition analysis. Delta table showing AP, APF, and AUTO vs baseline A across all models and categories. AP achieved 30.5% input reduction and 38.3% output reduction in the latest run. Export for patents and investor decks.
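
Each entry in the delta table is a percentage reduction relative to baseline A. A minimal sketch of that calculation, with token totals chosen only to reproduce the 30.5% figure (they are not the measured totals):

```python
def percent_reduction(baseline_tokens, condition_tokens):
    """Percentage by which a condition reduces tokens relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - condition_tokens) / baseline_tokens

# Illustrative totals: 1000 baseline tokens vs 695 under a condition.
print(percent_reduction(1000, 695))  # 30.5
```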

07

Definitive Loop

Two-model probing loops. Pits models against each other on seed topics to find attractors (stable convergence basins), voids (regions of disagreement), and boundaries. The mass-production tool for anchor candidates. 28 loop sessions recorded to date.

08

VNAR Pipeline

The dashboard. Default tab on open. Pipeline visualization, live metrics (observations, probes, echoes, anchors), and the coordinate system table. The view to show first — a new viewer understands the platform in twenty seconds.

The VNAR Research Program

Mapping the model's geometry from outside.

VNAR — Vector Navigation and Routing — is the convergence destination of the entire platform. The Sonar Principle governs it: send a known signal into the model, read the transformation when it exits, and derive the internal geometry from the difference. The model is a black box being mapped from outside — like sonar maps the ocean floor without entering the water.

Probe → Echo → Anchor

A probe is a query converted to a mathematical vector using the model's own embedding table. An echo is the complete model output — not just text, but every probability at every generation step. When the same probe through different model architectures produces echoes with the same canonical answer at near-100% confidence, that convergence point is a universal anchor.

A universality score of 1.0 means all 4 models agree independently. A basin depth of 1.0000 means each model assigns near-100% probability to that answer. Madrid, capital of Spain — universality 1.0, basin depth 1.0000 — is the most stable semantic coordinate yet found.
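
Both scores can be computed from one echo per model. The sketch below assumes universality is the share of models giving the majority answer and basin depth is the mean top-1 probability over those agreeing models, which matches the definitions on this page; the function name and the sample probabilities are illustrative.

```python
from collections import Counter

def score_anchor(echoes):
    """echoes maps model name -> (canonical_answer, top1_probability).

    Returns (canonical_answer, universality, basin_depth).
    """
    votes = Counter(answer for answer, _ in echoes.values())
    canonical, count = votes.most_common(1)[0]
    universality = count / len(echoes)
    agreeing = [p for answer, p in echoes.values() if answer == canonical]
    basin_depth = sum(agreeing) / len(agreeing)
    return canonical, universality, basin_depth

# Illustrative values, not measurements.
echoes = {
    "Llama 3.1 8B": ("paris", 1.0),
    "CodeLlama 7B": ("paris", 0.5),
    "Mistral 7B": ("paris", 1.0),
    "Phi-3 Mini 3.8B": ("france", 0.5),
}
print(score_anchor(echoes))
```

With three of four models agreeing, the universality comes out at 0.75, the second tier of confirmed anchors.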

The Platonic Representation Hypothesis

Different AI models, trained on different data with different architectures and different parameter counts, converge on the same internal geometry for factual knowledge. Geography is the strongest evidence: every capital city query produced a universal anchor. The models have never communicated. They have never shared weights. But they agree on where "paris" lives in semantic space with 99.77% confidence.

EMPHOS Labs is the first platform to operationalize this hypothesis. The 55 confirmed anchors are the first empirical evidence that model-independent reference points exist.

The Definitive Loop (Tab 07) tested CHAT vs CODE and CODE vs REASONING on "universal knowledge" and found zero attractors. Abstract philosophical concepts are not universal anchors. Concrete factual completions (capitals, days of the week, code primitives, physical properties) are. This is a meaningful null result: it shows which kinds of knowledge are anchored and which are not.

The Coordinate System

55 confirmed positions. 31 at universality = 1.0.

The P7 anchor search ran 53 completion-style seed queries across all 4 models. In 77.2 seconds it found 55 universal anchors — coordinates where independent model architectures converge on identical outputs. Confirmed by Llama 3.1 8B, CodeLlama 7B, Mistral 7B, and Phi-3 Mini 3.8B.

Anchor      Category      Basin depth
madrid      geography     1.0000
kitten      vocabulary    1.0000
tuesday     calendar      1.0000
dollar      factual       1.0000
queue       code          1.0000
rome        geography     0.9999
ice         science       0.9999
paris       geography     0.9977
tokyo       geography     0.9990
down        vocabulary    0.9990
cold        vocabulary    0.9987
print       code          0.9986
canberra    geography     0.9993
berlin      geography     0.9965
russia      geography     0.9983
sun         science       0.9980
east        vocabulary    0.9960
dark        vocabulary    0.9967
au          science       0.9908
january     calendar      0.9972
ottawa      geography     0.9956
jupiter     science       0.9846
children    vocabulary    0.9933
ran         vocabulary    0.9957

Showing 24 of 31 anchors at universality = 1.0. Basin depth = mean top-1 probability across all 4 confirmed models. 24 additional anchors at universality = 0.75. Total: 55 confirmed coordinates.

Geography is the strongest anchor category — every capital city query produced a universal anchor. Arithmetic is the weakest — not because the models don't know the answers, but because CodeLlama's instruction-following format ("Sure! The answer is...") breaks canonical matching. The knowledge is present. The format diverges. Future work: strip chat prefixes before canonicalization to recover hidden anchors.
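
The proposed prefix stripping could look like the sketch below. Only the "Sure! The answer is..." prefix is quoted in the text, so the pattern covers just that form; reducing the remainder to its first lowercase alphanumeric token is an assumed canonicalization rule, not the platform's documented one.

```python
import re

# Covers the one prefix quoted in the text; other chat prefixes would need
# their own patterns.
_CHAT_PREFIX = re.compile(
    r"^\s*(?:sure\b[!,.]?\s*)?(?:the answer is\b[:\s]*)?",
    re.IGNORECASE,
)

def canonicalize(text):
    """Strip a leading chat prefix, then return the first token in lowercase."""
    stripped = _CHAT_PREFIX.sub("", text, count=1)
    match = re.search(r"[A-Za-z0-9]+", stripped)
    return match.group(0).lower() if match else ""

print(canonicalize("Sure! The answer is 4."))  # 4
print(canonicalize("Madrid"))                  # madrid
```

With this normalization, a bare "4" and a chat-formatted "Sure! The answer is 4." map to the same canonical string, so format differences alone no longer block a match.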

The Latest Run — April 10–12, 2026

19,140 inferences. 36 hours 44 minutes. Zero failures.

Headline Result

AP condition (AICL-PSIP): 30.5% input reduction and 38.3% output reduction vs raw baseline A across 19,140 inferences. Consistent across CHAT, CODE, REASONING, COMPACT, and MAMBA model types.

Best Category: STRESS

STRESS queries, the hardest and most adversarial prompts, saved an average of 46.9 tokens per inference under AICL. The harder the query, the more waste AICL eliminates. Code queries saved 30.4 tokens and chat queries saved 20.5.

APEX Best Genome

Best genome: APEX-ca112751, fitness 0.5724. Highest signal weights: technical_depth (0.290), personal_context (0.246), list_request (0.222). AICL's largest gains come from technically deep queries, which is exactly the CAMS Code use case.

All 9 Auto Pipeline stages completed (9/9). 00_manifest.json and the 01–09 JSON and MD reports were generated. The Auto Pipeline ran start to finish without intervention across 36 hours 44 minutes, the longest single validated run in EMPHOS Labs history.

Convergence Milestones

Where the platform stands right now.

VNAR converges through seven milestones. Each milestone unlocks the next. The platform is currently at M4.

M1

Probe Library ✅

252 probe vectors constructed from model embedding tables. Inputs to all VNAR work.

M2

Echo Collection ✅

365 full echo datasets captured — complete probability distributions at every generation step. Detector Engine operational.

M3

First Anchor ✅

Coordinate system established. Universality confirmed across model architectures. Coordinate-based addressing is now possible.

M4

Coordinate System ✅ — Current

55 confirmed anchors, 31 at universality = 1.0. The coordinate system is real. Trajectory computation is now the frontier.

M5

First Trajectory

Weight matrix inversion at scale. Pre-weighted vector routing becomes possible. Input geometry determined before inference begins.

M6

First Routed Output

VNAR fully operational. Inference routed through predetermined coordinates in the model's semantic space.

M7

Integration

VNAR connected to Haven, CAMS Code, and HEINRICH. The research platform and the product stack converge, completing the program.

Architecture & Hardware

Real hardware. Real numbers.

Research Hardware

All results produced on: NVIDIA GeForce RTX 4060 Laptop 8GB VRAM · Intel i7-13700 13th Gen · 32GB RAM. Sequential model loading enforced due to VRAM budget. The platform is designed to run on hardware engineers actually own — not cloud clusters.

The 4 Models

CHAT: Llama 3.1 8B Instruct Q4_K_M — general instruction following.
CODE: CodeLlama 7B Instruct Q4_K_M — programming specialist.
REASONING: Mistral 7B Instruct v0.2 Q4_K_M — analytical tasks.
COMPACT: Phi-3 Mini 4K Instruct Q4_K_M — small but capable.

The Database

SQLite. 544.9MB. 12 tables. Fully searchable. Every observation stores 62 columns — prompt text, token counts across 4 tokenizers, timing, protocol metadata, signal classifications, response analysis, APEX genome ID, and deltas vs baseline. Automatic backups with timestamps.
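
Because the store is plain SQLite, per-condition summaries are a single query away. The sketch below runs against an in-memory stand-in: the table name, the four columns, and the sample rows are hypothetical, since the real 62-column schema is not listed here.

```python
import sqlite3

def mean_output_tokens(conn):
    """Average output tokens per test condition."""
    rows = conn.execute(
        "SELECT test_condition, AVG(output_tokens) "
        "FROM observations GROUP BY test_condition ORDER BY test_condition"
    ).fetchall()
    return dict(rows)

# In-memory demo with a hypothetical 4-column slice of the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "model TEXT, test_condition TEXT, input_tokens INTEGER, output_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [
        ("CHAT", "A", 400, 120),   # illustrative rows, not measurements
        ("CHAT", "AP", 280, 70),
        ("CODE", "A", 400, 80),
        ("CODE", "AP", 280, 50),
    ],
)
summary = mean_output_tokens(conn)
print(summary)
```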

The 4 Test Conditions

A: Raw baseline — no protocol, no contract.
AP: AICL-PSIP — signal-aware contracts only.
APF: AICL-PSIP + FTIP — full pipeline.
AUTO: AICL auto-routing — system decides.

Every run tests all 4 conditions in parallel.

The protocols were proven here.

Every AICL performance claim, every anchor coordinate, every token savings figure in every EMPHOS product traces back to a row in this database. The research is real. The hardware is real. The numbers are measured — not estimated.