FOOD SCIENCE × AI
AI agent system for food scientists: formula generation, ingredient modification, reverse engineering from nutrition labels. Built a context engineering observability system that lets AI debug and improve itself.
ROLE
Lead AI Engineer
YEAR
2023-2025
STACK
Next.js · Python · OpenAI Agents SDK
STATUS
In Production
Reverse engineering a food formula from a nutrition label is a systematic process of narrowing an infinite solution space. Each iteration eliminates impossible combinations based on macros, nutrients, ingredient constraints, and category-specific rules.
Food scientists do this through a combination of linear programming, domain knowledge, and intuition built over years. The question: Can an AI agent learn this process well enough to do it reliably?
Even ChatGPT Pro ($200/month) gets confused by this task in predictable ways. This wasn't a prompting problem. It was a context engineering problem.
This project evolved across two phases as AI capabilities expanded.
PHASE 1
Built a system for 30+ page research reports (before deep research products existed). GPT-4 Mini acted as orchestrator and summarizer; Claude did the actual writing. Rolling summaries kept Claude on track across sections without repetition or context loss.
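A minimal sketch of the rolling-summary pattern, with hypothetical call_writer and call_summarizer parameters standing in for the two model APIs:

```python
# Sketch of the rolling-summary orchestration (illustrative only).
# call_writer / call_summarizer are hypothetical wrappers around the
# writing model (Claude) and the cheap orchestrator model.

def write_report(outline: list[str], call_writer, call_summarizer) -> str:
    sections: list[str] = []
    rolling_summary = ""  # compressed memory of everything written so far

    for heading in outline:
        # The writer sees only the next heading plus the rolling summary,
        # never the full text of prior sections, so context stays small.
        section = call_writer(
            f"Write the section '{heading}'.\n"
            f"Summary of the report so far:\n{rolling_summary}\n"
            "Do not repeat material already covered."
        )
        sections.append(section)

        # The orchestrator folds the new section into the summary,
        # keeping the writer on track across 30+ pages.
        rolling_summary = call_summarizer(
            f"Update this running summary:\n{rolling_summary}\n"
            f"with this new section:\n{section}"
        )

    return "\n\n".join(sections)
```

The writer never sees full prior sections, only the compressed summary, which is what kept long reports coherent inside a small context window.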
Worked directly with food scientists to deconstruct their reverse engineering process. Learned the math (linear programming, nutrient calculations), the heuristics, the category-specific rules. This became the foundation for Phase 2.
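The linear programming at the core of the process is worth seeing concretely. A toy sketch with made-up nutrient values, using the standard auxiliary-variable trick to minimize total absolute deviation from the label:

```python
# Toy version of the core LP: find ingredient proportions x that best
# match the label's per-gram nutrient panel, subject to the label-order
# rule (ingredients are listed by descending weight). Nutrient values
# are made up for illustration.
import numpy as np
from scipy.optimize import linprog

N = np.array([
    [0.10, 0.01, 0.75],   # wheat flour: protein, fat, carbs per gram
    [0.00, 1.00, 0.00],   # vegetable oil
    [0.00, 0.00, 1.00],   # sugar
]).T                      # rows = nutrients, cols = ingredients

label = np.array([0.06, 0.12, 0.55])  # target per-gram composition
m, n = N.shape

# Decision variables: [x (proportions), t (per-nutrient absolute error)].
# Minimizing sum(t) with |N @ x - label| <= t linearizes the L1 error.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[N, -np.eye(m)], [-N, -np.eye(m)]])
b_ub = np.concatenate([label, -label])

# Label order: x1 >= x2 >= x3, i.e. x_{i+1} - x_i <= 0.
order = np.zeros((n - 1, n + m))
for i in range(n - 1):
    order[i, i], order[i, i + 1] = -1.0, 1.0
A_ub = np.vstack([A_ub, order])
b_ub = np.concatenate([b_ub, np.zeros(n - 1)])

A_eq = np.concatenate([np.ones(n), np.zeros(m)])[None, :]  # sum(x) = 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0])
print(res.x[:n])  # estimated proportions; residual error hints at
                  # unmodeled mass such as moisture
```

Each constraint that binds eliminates a slice of the solution space, which is exactly the iteration the scientists described.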
PHASE 2
Embedding-based retrieval system for matching ingredients to USDA database entries. Handles fuzzy matching, brand variations, and composite ingredients.
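A sketch of the matching step, with a placeholder in-memory database and an assumed embedding model name:

```python
# Illustrative sketch of the matcher: embed a label ingredient and the
# USDA candidate descriptions, then rank by cosine similarity. The model
# name and the tiny in-memory "database" are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()
usda_entries = [
    "Wheat flour, white, all-purpose, enriched",
    "Oil, canola",
    "Sugars, granulated",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

db = embed(usda_entries)
db /= np.linalg.norm(db, axis=1, keepdims=True)

def match(ingredient: str, top_k: int = 1) -> list[str]:
    q = embed([ingredient])[0]
    q /= np.linalg.norm(q)
    scores = db @ q                          # cosine similarity
    best = np.argsort(scores)[::-1][:top_k]
    return [usda_entries[i] for i in best]

print(match("enriched bleached flour"))     # fuzzy match despite wording
```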
Deconstructed the entire reverse-from-label process into discrete tools and agents. Math operations, nutrient validation, ingredient lookup, substitution checking. Each became a callable tool. The orchestrating agent dynamically adjusts its approach based on label complexity.
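A sketch of the decomposition in the shape of the OpenAI Agents SDK; the tool bodies here are stubs standing in for the real math, validation, and lookup logic:

```python
# Sketch in the shape of the OpenAI Agents SDK; tool bodies are stubs.
from agents import Agent, Runner, function_tool

@function_tool
def lookup_ingredient(name: str) -> str:
    """Return per-gram nutrient data for the closest USDA match."""
    return '{"match": "wheat flour", "protein": 0.10, "fat": 0.01}'  # stub

@function_tool
def validate_nutrients(formula: str, label: str) -> str:
    """Recompute the panel for a candidate formula; report deviations."""
    return "protein +0.4g, fat -0.1g, carbs +1.2g"  # stub

@function_tool
def check_substitution(ingredient: str, category: str) -> str:
    """Apply category-specific rules to a proposed ingredient swap."""
    return "allowed"  # stub

orchestrator = Agent(
    name="reverse_engineer",
    instructions=(
        "Given a nutrition label, propose a candidate formula, validate "
        "it with the tools, and iterate until the deviation is within "
        "tolerance."
    ),
    tools=[lookup_ingredient, validate_nutrients, check_substitution],
)

result = Runner.run_sync(orchestrator, "Label: protein 6g, fat 12g, carbs 55g")
print(result.final_output)
```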
Standard LLM observability platforms (Braintrust, LangSmith, Helicone) only answer "how did we ask?" They're prompt engineering tools. For complex agent systems, you need to answer "what did the LLM know?" What data reached each agent? Was it accurate? What's missing from the pipeline? I built a harness that provides observability over the context itself, not just the prompts.
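A minimal sketch of what context-level observability can look like; ContextTrace is a hypothetical store, not a real library. The point is recording what each agent actually received, with provenance, rather than the prompt template:

```python
# Minimal sketch of context-level observability; ContextTrace is a
# hypothetical store. Log what each agent actually received, with
# provenance, not just how it was prompted.
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class ContextSnapshot:
    agent: str
    step: int
    inputs: dict          # the exact data handed to the agent
    sources: list         # where each piece of context came from
    timestamp: float = field(default_factory=time.time)

class ContextTrace:
    def __init__(self, path: str):
        self.path = path

    def record(self, snap: ContextSnapshot) -> None:
        # Append-only JSONL, so a later pass (human or Claude) can diff
        # what an agent knew against what it should have known.
        with open(self.path, "a") as f:
            f.write(json.dumps(asdict(snap)) + "\n")

trace = ContextTrace("run_042.jsonl")
trace.record(ContextSnapshot(
    agent="nutrient_validator", step=3,
    inputs={"formula": {"flour": 0.6}, "label": {"protein": 6.0}},
    sources=["usda_lookup", "label_parser"],
))
```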
TACE (total absolute composition error) and MAPE (mean absolute percentage error) metrics against golden datasets with known-good USDA formulas. Reward function: TACE under 10. This let me measure whether context improvements actually worked, and iterate systematically instead of guessing.
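A sketch of the two metrics. TACE is implemented here at face value as the sum of absolute differences between predicted and golden composition percentages; the production definition may differ in detail:

```python
# Sketch of the eval metrics; TACE definition assumed from its name.
import numpy as np

def tace(pred: dict, gold: dict) -> float:
    """Total absolute composition error, in percentage points."""
    keys = set(pred) | set(gold)
    return sum(abs(pred.get(k, 0.0) - gold.get(k, 0.0)) for k in keys)

def mape(pred: np.ndarray, gold: np.ndarray) -> float:
    """Mean absolute percentage error over the nutrient panel."""
    gold = np.where(gold == 0, 1e-9, gold)  # avoid division by zero
    return float(np.mean(np.abs((pred - gold) / gold)) * 100)

gold = {"flour": 55.0, "sugar": 25.0, "oil": 20.0}
pred = {"flour": 52.0, "sugar": 28.0, "oil": 20.0}
assert tace(pred, gold) == 6.0              # under the reward bar of 10
```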
The breakthrough: Claude Code drives the eval harness. Agents report back why they failed; Claude reads the feedback, applies fixes, and reruns the tests. The system improves itself (sketched in code after the diagram below). I could run this loop for hours, across multiple sessions, using Agent Focus for context handoffs.
SELF-IMPROVING DEVELOPMENT LOOP
RUN EVAL
Execute test harness on golden datasets
COLLECT FEEDBACK
Agents report why they failed
ANALYZE
Claude reads failure patterns
FIX
Apply targeted improvements
REPEAT
Loop until metrics converge
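The loop above, as a sketch; run_evals and analyze_and_fix are stand-ins for the real harness and for Claude Code's analyze-and-edit step:

```python
# Shape of the self-improving loop. The callables are stand-ins: in the
# real system, run_evals is the harness over the golden datasets and
# analyze_and_fix is Claude Code reading agent failure reports and
# editing the codebase.
from typing import Callable

def improvement_loop(
    run_evals: Callable[[], tuple],        # -> (mean TACE, failure reports)
    analyze_and_fix: Callable[[list], None],
    target_tace: float = 10.0,             # the reward bar: TACE under 10
    max_rounds: int = 50,
) -> float:
    mean_tace = float("inf")
    for _ in range(max_rounds):
        mean_tace, failures = run_evals()  # RUN EVAL + COLLECT FEEDBACK
        if mean_tace < target_tace:
            break                          # metrics converged
        analyze_and_fix(failures)          # ANALYZE + FIX, then REPEAT
    return mean_tace
```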
30→5
TACE SCORE
Average error reduction
20+
AGENT COMPONENTS
Tools, validators, orchestrators
∞
SELF-IMPROVING
Autonomous optimization loop
TACE scores dropped from 30-60 (unreliable) to 0-25 (production-ready, even on adversarial labels). The system is now in production, handling real reverse engineering requests.
The leverage hierarchy: Context Engineering > Prompt Engineering > Math Tweaks. For complex AI applications, you can't fix problems by editing prompts in a UI. You need observability over the entire context pipeline: what data reaches each agent, whether it's accurate, what's missing.
This system doesn't replace food scientists. It amplifies them. Domain experts can now iterate on formulations faster, validate nutrition data automatically, and explore possibilities that would take days to compute manually.
The self-improving development loop is the meta-innovation: AI that can debug and improve itself across multi-day development cycles, with humans setting goals rather than writing fixes.