05 · feedback · mixed

What happens after you act on a recommendation

A rec is written. You read it, act on it (or don't), and mark its disposition. From that point on, two things happen: the markdown file gets patched to reflect your action, and the system's per-domain action-rate updates. If a door's action-rate falls below 30%, the system flags that door as unhealthy: a recommender nobody acts on is broken, whatever the quality of its content.

The full feedback loop

A. rec is created — markdown + DB row, status = open
operator picks act, snooze (≤7d), or dismiss
B. surface writes the disposition — UI button (Lovable / FastAPI) or markdown edit (learning)
C. rec store updated — status, acted_disposition, fit_1_5, snoozed_until
D. next /rx-* invocation — step 0.5 auto-revive + step 0.7 Phase W reconcile
E. markdown patched — status, acted_at, phase_w_synced_at
F. action-rate updates (per-domain rolling fraction) and feeds back into A

Snoozed recs auto-revive back to open — lazily, at the next invocation.
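A minimal sketch of the lazy auto-revive at step 0.5, assuming recs are plain dicts with `id`, `status`, and `snoozed_until` keys and that "expired" means `snoozed_until <= now`. The function name and the dict shape are illustrative, not the system's actual code.

```python
from datetime import datetime, timezone


def auto_revive(recs, now):
    """Flip snoozed recs whose snooze has expired back to open.

    Assumption: each rec is a dict; 'snoozed_until' is a timezone-aware
    datetime on snoozed recs. Returns the ids that were revived.
    """
    revived = []
    for rec in recs:
        if rec["status"] == "snoozed" and rec["snoozed_until"] <= now:
            rec["status"] = "open"
            revived.append(rec["id"])
    return revived
```

Because this runs only inside the next /rx-* invocation, a rec whose snooze expired days ago stays snoozed in the store until a command is run.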

The reconcile step is lazy: it fires when you next run an /rx-* command.

Phase W reconciler in plain language

Phase W = the reconciler. After you act on a rec in the UI, the database row is updated. But the markdown file on disk still says status: open. The reconciler is the step that fixes that mismatch. It runs at step 0.7 of the next /rx-<door> invocation: queries recent rec rows, compares each to its on-disk markdown, and patches the frontmatter where the DB has newer state. If everything is already in sync, it does nothing.

Two assumptions worth naming. The reconciler leans on rec frontmatter staying machine-parseable YAML, and on nothing writing these files concurrently — a safe bet today because the system is single-operator and single-laptop, and the reconcile runs inline in the command rather than as a daemon. There is no lock protocol because there is no second writer. The cost: a hand-edit that breaks the YAML silently stops the patch for that file until it's fixed.

What gets patched into frontmatter

Field	Patched when
`status`	DB row moved from open → snoozed / acted / dismissed
`acted_disposition`	operator picked acted_as_prescribed / acted_modified / skipped / dismissed
`acted_at`	any non-open transition stamps this
`subjective_fit_1_5`	operator gave a 1–5 rating in the UI
`outcome_note`	operator typed a free-text reflection
`snoozed_until` · `snooze_count`	operator snoozed the rec
`phase_w_synced_at`	stamped every time the reconciler touches the file (idempotency marker)
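The patch step can be sketched as follows, assuming the frontmatter has already been parsed into a dict and that "the DB has newer state" simply means a tracked field differs from the DB row. Both are assumptions made for illustration: the real reconciler may compare timestamps, and `reconcile` is a hypothetical name.

```python
from datetime import datetime, timezone

# Fields the reconciler may patch, taken from the table above.
PATCHED_FIELDS = (
    "status", "acted_disposition", "acted_at", "subjective_fit_1_5",
    "outcome_note", "snoozed_until", "snooze_count",
)


def reconcile(frontmatter, db_row, now):
    """Patch frontmatter in place from db_row; return True if anything changed.

    Assumption: any difference on a tracked field means the DB is newer.
    """
    changed = False
    for field in PATCHED_FIELDS:
        if field in db_row and frontmatter.get(field) != db_row[field]:
            frontmatter[field] = db_row[field]
            changed = True
    if changed:
        # Stamp only when the file was actually touched.
        frontmatter["phase_w_synced_at"] = now.isoformat()
    return changed
```

Calling it a second time with the same row returns `False` and leaves the frontmatter untouched, which matches the "if everything is already in sync, it does nothing" behaviour.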

The disposition options

acted_as_prescribed

Did the exact thing the rec said. Strongest positive signal.

acted_modified

Took the direction but altered the specifics. Still counts as acted for action-rate.

skipped

Made an explicit choice not to act but agreed with the framing. The one non-acting disposition that stays in the action-rate denominator: it pulls the rate down without counting as a win. Recorded as its own acted_disposition value, distinct from dismissed. See the formula below.

dismissed

Rejected the rec entirely. Excluded from the action-rate — a rejection is treated as a rec-quality signal (tracked via acted_disposition and the fit score), not a follow-through miss, so it neither helps nor hurts the ratio.

The subjective fit score (1–5)

Required whenever the operator acts on a rec (a Pydantic validator rejects an acted_* disposition without it); optional on the non-acting dispositions. It is the operator's rating of how well the rec fit the actual moment. Useful for /rx-analyze later — recs from sources or signals that consistently get fit ≤2 are candidates for weight reduction.
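The rule can be sketched without dependencies. The source enforces it with a Pydantic validator; the stand-in below uses a stdlib dataclass instead, so the class name and error messages are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

ACTED = {"acted_as_prescribed", "acted_modified"}
NON_ACTING = {"skipped", "dismissed"}


@dataclass
class Disposition:
    """Hypothetical stand-in for the disposition payload."""
    acted_disposition: str
    subjective_fit_1_5: Optional[int] = None

    def __post_init__(self):
        if self.acted_disposition not in ACTED | NON_ACTING:
            raise ValueError(f"unknown disposition: {self.acted_disposition}")
        fit = self.subjective_fit_1_5
        if fit is not None and not 1 <= fit <= 5:
            raise ValueError("subjective_fit_1_5 must be between 1 and 5")
        # The fit score is mandatory only when the operator acted.
        if self.acted_disposition in ACTED and fit is None:
            raise ValueError("acted_* dispositions require subjective_fit_1_5")
```

`Disposition("skipped")` is accepted, while `Disposition("acted_modified")` without a score raises `ValueError`.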

Outcome attribution — the finance closed loop

Finance is the only door where the loop closes all the way to P&L. The trades table has a related_rec_id FK (D-046) — when a trade is journalled, the operator can link it back to the rec that triggered it. /rx-analyze later joins trades to recs to compute per-signal hit-rate weighted by realised P&L.

Closed-loop attribution lives in finance only. The fitness equivalent is "did the workout actually happen", proxied by the existence of a next-session log.

De-biasing the loop — when feedback flatters itself

A closed loop has a failure mode that looks like learning: a rec nudges the operator into a trade, the trade's P&L is then credited back to the rec, and the system congratulates itself for an outcome it caused rather than predicted. Three app-side mechanisms break that flywheel.

Attribution split

Each trade carries rec_influence_kind — preceded_independent (the idea predated or was independent of the rec) vs influenced (the rec drove it). It is operator-set at trade capture, not inferred from timestamps. /rx-analyze's predictive_lift excludes influenced trades — only independent (and unclassified legacy) trades can credit a rec, so the system can't take credit for moves it caused.
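As a sketch, the exclusion is a one-line filter. It assumes trades are dicts and that unclassified legacy trades have `rec_influence_kind` missing or `None`; `creditable_trades` is a hypothetical helper name.

```python
def creditable_trades(trades):
    """Keep only trades that may credit a rec in predictive_lift.

    Assumption: legacy trades carry no rec_influence_kind (missing or None)
    and are treated like preceded_independent.
    """
    return [t for t in trades if t.get("rec_influence_kind") != "influenced"]
```

The filter is deliberately a deny-list on `influenced`, so anything the operator never classified still counts, as the text describes.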

Explicit hypothesis linkage

Recs now carry linked_hypothesis_ids named at compose time (match_type="explicit"). The old D-046 substring heuristic isn't removed — it's demoted to a fallback suggestion: any already-explicitly-linked hypothesis suppresses its substring match, and the rest are tagged match_type="substring_fallback". The old landmine — a rec mentioning "NVDA" silently matching every nvda-slug hypothesis — is defused.
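A sketch of the merge, assuming the old substring heuristic is available as a list of hypothesis ids it would have matched. The heuristic itself is not reproduced here, and `merge_links` and the output dict shape are illustrative.

```python
def merge_links(explicit_ids, substring_match_ids):
    """Combine explicit links with demoted substring suggestions.

    Explicit links come first; a substring match is kept only when the
    same hypothesis is not already explicitly linked.
    """
    links = [{"hypothesis_id": h, "match_type": "explicit"} for h in explicit_ids]
    seen = set(explicit_ids)
    for h in substring_match_ids:
        if h not in seen:
            links.append({"hypothesis_id": h, "match_type": "substring_fallback"})
            seen.add(h)
    return links
```

Downstream code can then treat `substring_fallback` entries as suggestions to confirm rather than as links.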

Value vs engagement

Action-rate says recs get acted on; it says nothing about whether acting paid. P&L-per-rec sums realised P&L over a rec's linked closed trades; a divergence_flag fires when action-rate ≥ 30% AND total value < 0 — green engagement, red money. Surfaced in the /rx-analyze report. The value metric generalises per door (P&L for finance, drift-improvement for fitness/nutrition, goal-progress for learning) — but only finance has hard money today.
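The two quantities can be sketched like this, assuming trades are dicts with `related_rec_id` (the FK named above), a `realised_pnl` number, and a `closed` boolean. The last two field names are assumptions for illustration.

```python
def pnl_per_rec(trades, rec_id):
    """Sum realised P&L over the closed trades linked to one rec."""
    return sum(
        t["realised_pnl"]
        for t in trades
        if t["related_rec_id"] == rec_id and t["closed"]
    )


def divergence_flag(action_rate, total_value, threshold=0.30):
    """True when engagement looks healthy but the value axis is negative."""
    return action_rate >= threshold and total_value < 0
```

For example, an action-rate of 0.58 with a total value of -120 fires the flag, while the same rate with a positive total does not.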

Action-rate itself is unchanged — still acted / (acted + skipped) (dismissed + snoozed excluded). De-biasing adds a value axis beside it; it doesn't touch the engagement formula.

Prediction accuracy and model drift (finance)

The finance door sits on top of a price-prediction model, so the loop has a second self-monitoring layer the other doors don't. Two things are measured continuously, and one of them is itself a rec signal.

Offline accuracy — measured without any traffic

With a single operator there is no click-stream, so quality is judged against the market, not against engagement:

Per-rule hit-rate = correct / total over the last 30 prediction evaluations, bucketed per (ticker, horizon, model, interval). It is a bare ratio — no Wilson interval or Bayesian smoothing (a known gap at n=1).
MAPE / RMSE accuracy grid — mean absolute percentage error and RMSE per (ticker, horizon, model), over rolling 7-, 30-, 90-, and 365-day windows.
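The three metrics are standard. A minimal sketch, assuming MAPE is expressed as a percentage and that samples with a zero actual value are skipped (both conventions are assumptions, and the bucketing per ticker, horizon, and model is left out):

```python
import math


def hit_rate(outcomes, window=30):
    """Bare correct/total ratio over the last `window` evaluations."""
    recent = outcomes[-window:]
    return sum(recent) / len(recent) if recent else None


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; zero actuals are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return None
    return 100 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root mean squared error over paired samples."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        return None
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))
```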

Model-drift detector — the `drift_alerts` signal's source

A regime change can silently degrade a checkpoint: predictions stay plausible but get systematically worse on exactly the tickers being traded. The detector catches that by watching its own error rate:

Mechanism	compare `recent_mape` (last 30 days) to `all_time_mape`
Fires when	`recent_mape / all_time_mape ≥ 1.5` (recent error is 1.5× the baseline)
Minimum data	≥10 recent samples AND ≥30 all-time samples
Action on fire	writes a `DriftAlert` row + Telegram ping. Alert-only — no auto-retrain, no auto-demote, no reweight. Idempotent per `(ticker, horizon, model)`.
Feeds back as	the `drift_alerts` component of the finance drift composite (weight 0.15)
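The firing condition in the table reduces to a small pure function. This sketch assumes the two MAPE values and sample counts are computed elsewhere; the function name and the handling of a non-positive baseline are assumptions.

```python
def mape_drift_fires(recent_mape, all_time_mape, n_recent, n_all_time,
                     ratio_threshold=1.5, min_recent=10, min_all_time=30):
    """Return True when recent error is at least 1.5x the all-time baseline."""
    # Not enough data: stay silent rather than alert on noise.
    if n_recent < min_recent or n_all_time < min_all_time:
        return False
    # Assumption: a zero or negative baseline cannot form a ratio, so no alert.
    if all_time_mape <= 0:
        return False
    return recent_mape / all_time_mape >= ratio_threshold
```

A recent MAPE of 6.0 against a baseline of 4.0 sits exactly on the threshold and fires, provided both sample minimums are met.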

Why alert-only, and why it was built. A Kronos checkpoint degraded to roughly 2× MAPE after a regime shift; it was noticed only when realised P&L drifted from the predicted move over several weeks. The fix was to surface degradation proactively as a rec signal rather than a separate ops alarm. It stays alert-only because a 30-day / 10-sample window is too thin to justify auto-demoting a long-running rule — regime changes are often transient, and a human acknowledgment is a circuit-breaker that doesn't amplify short-term variance.

Note this is model drift (prediction-quality degradation), distinct from the user-state drift the rx composite measures — and now also distinct from the two further detectors below.

Two more drift detectors (pure math, not yet scheduled)

Two newer detectors were added as pure functions — deterministic, cheap, and deliberately unwired for now: the series-assembly and scheduling (a cron) aren't in place yet, so today they're callable building blocks, not live alarms.

Detector	Mechanism	Fires when
Preference drift	least-squares slope of action-rate / subjective-fit over an evenly-spaced rolling window (e.g. per week)	`slope ≤ −0.05` → `declining`; `≥ +0.05` → `improving`; else `stable` (<2 points → `insufficient_data`)
Embedding-distribution shift	cosine distance between rolling per-domain embedding centroids of incoming chunks	`distance > 0.15` → `shifted`

Preference drift is the discrete answer to "is the operator quietly losing trust?" — it watches the slope, not just whether the absolute number sits below a band. The embedding-shift monitor watches whether the corpus itself is drifting under the index. Both are distinct from the MAPE-ratio model-drift detector above.
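Both detectors are short enough to sketch in plain Python. The thresholds and the four preference-drift labels come from the table above; the function names, the use of the point index as the x-axis, and the boolean return for the embedding check are assumptions.

```python
import math


def preference_drift(series, threshold=0.05):
    """Classify the least-squares slope of an evenly spaced series."""
    n = len(series)
    if n < 2:
        return "insufficient_data"
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    if slope <= -threshold:
        return "declining"
    if slope >= threshold:
        return "improving"
    return "stable"


def cosine_distance(a, b):
    """1 - cosine similarity between two centroid vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (norm_a * norm_b)


def is_shifted(prev_centroid, curr_centroid, threshold=0.15):
    """True when the rolling centroid has moved past the threshold."""
    return cosine_distance(prev_centroid, curr_centroid) > threshold
```

A weekly action-rate series of 0.6, 0.5, 0.4 has a slope of about -0.1 per step and is classed as `declining`, even though every point is still above the 30% band.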

Action-rate as system-death signal

D-003 · the < 30% rule

An open-loop recommender that nobody acts on is broken regardless of the quality of its content. The /rx-*-history commands surface this metric per domain. The threshold:

[Gauge: per-domain action-rate, coloured by band. Example reading: 58%, healthy. A tick marks 30%, the GREEN/YELLOW boundary and the bar the future cross-domain meta-rec must clear to unlock. Below 15% the door is RED (critical).]

action_rate = acted / (acted + skipped)

The formula filters on acted_disposition, and only three of the four values appear in it. Numerator = acted_as_prescribed + acted_modified. Denominator = those plus skipped. dismissed and snoozed are both excluded entirely — so, perhaps counter-intuitively, a skip (agreed but didn't act) drags the rate down while a dismiss (rejected outright) doesn't touch it. The reasoning: the metric measures follow-through among recs the operator accepted as legitimate; a dismissal is a rec-quality problem handled elsewhere, not a follow-through miss.

Run-gate: a domain with fewer than 5 dispositioned recs is skipped as "insufficient data" — this, not the 30%, is what blocks the metric per domain
Window: rolling, per-domain (no aggregate across doors)
Health bands: GREEN ≥30% · YELLOW 15–30% · RED <15% (D-003) — these are reporting bands, not run-gates
What 30% actually gates: it is the unlock precondition for the future cross-domain meta-rec (Phase O / /rx-meta) — which needs ≥30% on every gated door plus ≥10 dispositioned recs across domains — not a switch that stops /rx-analyze
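The formula, run-gate, and bands above fit in a few lines. This sketch assumes the input is one domain's list of `acted_disposition` values within the rolling window, and that "dispositioned" counts all four values (snoozed recs are still open and never appear). The function names are illustrative.

```python
ACTED = {"acted_as_prescribed", "acted_modified"}


def action_rate(dispositions, min_recs=5):
    """acted / (acted + skipped) for one domain, or None if not computable."""
    # Run-gate: fewer than 5 dispositioned recs is "insufficient data".
    if len(dispositions) < min_recs:
        return None
    acted = sum(d in ACTED for d in dispositions)
    skipped = sum(d == "skipped" for d in dispositions)
    # Assumption: if every rec was dismissed there is no ratio to report.
    if acted + skipped == 0:
        return None
    return acted / (acted + skipped)


def health_band(rate):
    """Map a rate to the D-003 reporting bands."""
    if rate >= 0.30:
        return "GREEN"
    if rate >= 0.15:
        return "YELLOW"
    return "RED"
```

With two acted, one skipped, and two dismissed recs the rate is 2/3: the dismissals satisfy the run-gate but never enter the ratio.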

Why the band cutoff is global while every other dial is per-door: the 30% threshold measures operator trust in the rec channel itself, not domain dynamics — an unacted rec costs the same attention in every door, regardless of whether that door decides daily (fitness) or slowly (finance). The window is already per-domain; a per-door cutoff is a v1.x candidate once every door has enough dispositioned recs for one to be statistically meaningful.

Proposals only, never auto-applied (D-019). Even when /rx-analyze is allowed to run, it only proposes weight adjustments — it never rewrites them. The operator approves each change. This is the same human-in-the-loop stance as the model-drift detector above: the system surfaces, the operator decides.

Per-door surface comparison

Door	How operator dispositions	Reconcile direction
Zeus	Lovable verocity UI button → Supabase UPDATE	Supabase → markdown (step 0.7)
Athena	no surface (D-047)	not applicable (no DB writes)
Lakshmi	FastAPI panel → `POST /v1/rx/recs/{id}/disposition`	TradingV postgres → markdown (step 0.7)
Ganesh	`/rx-learning-status <id>` directly edits markdown	not applicable — markdown is authoritative
