Your AI agent generated a pull request. Tests pass. CI is green. You merge it. The code runs perfectly - it just skips the fraud check before processing payments. Traditional monitoring sees success. You see a compliance violation six hours later.
Same trace. Two completely different interpretations.
Every span returned 200 OK, so status-based monitoring reports success. Story-based monitoring catches the sequence violation: the fraud check was skipped before payment processing. This is a silent failure - successful execution of the wrong behavior.
We built observability backward. We instrument code. Capture telemetry. Then try to figure out what it means. But we never wrote down what was supposed to happen in the first place.
You're reverse-engineering what the code should have done.
Intent is the starting point, not an afterthought.
Story-based monitoring flips the model. You write down what should happen before the agent runs. Then telemetry proves whether it did.
The manifest is a single file that connects intent, implementation, and verification. It's what you write before your agent ships code. It's what telemetry gets compared against after.
```yaml
story: "Process checkout with fraud check"
intent:
  actor: "checkout-service"
  goal: "Validate cart, check fraud, process payment, confirm order"
steps:
  - name: "validate cart"
    expects: "validate_cart span with items"
  - name: "check fraud"
    expects: "check_fraud span before process_payment"
    # ^ This is the step traditional monitoring misses
  - name: "process payment"
    expects: "process_payment span with amount"
  - name: "confirm order"
    expects: "confirm_order span with order_id"
verification:
  mode: "strict"
  on_sequence_violation: "alert"
```

You declare what the agent should do before it runs. This becomes ground truth - the source of "should" in your system.
Standard OpenTelemetry spans capture what actually happened: which operations ran, in what sequence, with what results.
Runtime telemetry is validated against the manifest. A visual storyboard shows what matched the intent - and what didn't.
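As a concrete illustration, here is a minimal sketch of that validation step. Spans are simplified dicts rather than real OTEL span objects, and the step names mirror the checkout manifest above; a production verifier would read actual trace data:

```python
# Expected step order, as declared in the manifest above.
EXPECTED_ORDER = ["validate_cart", "check_fraud", "process_payment", "confirm_order"]

def verify_sequence(spans, expected=EXPECTED_ORDER):
    """Check recorded spans against the declared order; return any violations."""
    names = [s["name"] for s in sorted(spans, key=lambda s: s["start_ns"])]
    violations = [f"missing: {step}" for step in expected if step not in names]
    present = [step for step in expected if step in names]
    for earlier, later in zip(present, present[1:]):
        if names.index(earlier) > names.index(later):
            violations.append(f"sequence violation: {earlier} must run before {later}")
    return violations

# Every span returned OK, but the fraud check ran after the payment:
trace = [
    {"name": "validate_cart",   "start_ns": 1, "status": "OK"},
    {"name": "process_payment", "start_ns": 2, "status": "OK"},
    {"name": "check_fraud",     "start_ns": 3, "status": "OK"},
    {"name": "confirm_order",   "start_ns": 4, "status": "OK"},
]
print(verify_sequence(trace))
# → ['sequence violation: check_fraud must run before process_payment']
```

Status-code monitoring sees four successful spans; the verifier sees the ordering rule broken.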
Traditional logs repeat the same text millions of times. Event templates store the structure once and extract only the variables - cutting bytes and surfacing what matters.
Payment declined: insufficient funds. Customer cust_8x9k2m attempted $299.99 charge but account balance is $45.20. Retry recommended with alternate payment method.
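To make that concrete, here is a minimal sketch of template-based extraction applied to the payment-declined message above. The `{name}` placeholder syntax and field names are illustrative assumptions, not a specific log-template format:

```python
import re

# Stored once per template; only the extracted variables are stored per event.
TEMPLATE = ("Payment declined: {reason}. Customer {customer_id} attempted "
            "{amount} charge but account balance is {balance}. "
            "Retry recommended with {suggestion}.")

def template_to_regex(template):
    """Compile a template with {name} placeholders into a regex with named groups."""
    parts = re.split(r"\{(\w+)\}", template)  # literals and names alternate
    pattern = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+?)"
        for i, p in enumerate(parts)
    )
    return re.compile(pattern + r"$")

PATTERN = template_to_regex(TEMPLATE)

MESSAGE = ("Payment declined: insufficient funds. Customer cust_8x9k2m attempted "
           "$299.99 charge but account balance is $45.20. "
           "Retry recommended with alternate payment method.")

variables = PATTERN.match(MESSAGE).groupdict()
print(variables)  # only these values need to be stored per event
```

Millions of such events share one template; only the per-event variables differ.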
Agents don't just write code - they ship it. They make autonomous decisions. And they can succeed at every step while completely missing the point. Traditional monitoring wasn't built for this.
An agent can complete every operation successfully and still violate the requirement. Status codes won't catch it.
The manifest becomes the source of "should" in your system. Not documentation. Not tribal knowledge. A versioned, durable artifact.
The storyboard doesn't just show what happened - it shows whether it matched the declared intent. Instantly.
No proprietary formats. No vendor lock-in. Standard OTEL spans, enriched with behavioral verification. Use your existing instrumentation.
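Enrichment can mean nothing more than adding attributes to the spans you already emit. The sketch below uses plain dicts in place of real OTEL span objects, and the `story.*` attribute keys are illustrative assumptions, not an OTEL semantic convention:

```python
# Steps declared in the manifest (assumed, mirroring the checkout example).
DECLARED_STEPS = {"validate_cart", "check_fraud", "process_payment", "confirm_order"}

def enrich_span(span, seen, declared=DECLARED_STEPS):
    """Return a copy of a span dict with behavioral-verification attributes added."""
    attrs = dict(span.get("attributes", {}))
    attrs["story.declared"] = span["name"] in declared
    # Encode the manifest's ordering rule: fraud check must precede payment.
    if span["name"] == "process_payment":
        attrs["story.sequence_ok"] = "check_fraud" in seen
    seen.add(span["name"])
    return {**span, "attributes": attrs}

seen = set()
enriched = [enrich_span({"name": n}, seen)
            for n in ["validate_cart", "process_payment", "confirm_order"]]
```

Here the payment span carries `story.sequence_ok: false` because the fraud check never ran; downstream tooling only has to read the attribute.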
Explore a real working example using Backlog.md - an open-source task manager. See how manifests validate behavior, catch silent failures, and generate storyboards in real time.
No signup required. Fully interactive in your browser.