The Price of Knowing You
An ad agent wants to show you the right ad. To do it well, it has to climb a ladder: from seeing correlations, to deciding under uncertainty, to asking what a fact about you is worth, all the way to imagining the ad it didn't show. This is a topologically sorted tour of that ladder: Bayesian networks → utility → decision networks → value of perfect information → interventions → counterfactuals. Each rung has a worked example you can poke at.
One agent — the Defendotron, a real homework I wrote and have since modernized so it runs again — appears twice. At rung 2 it asks "which ad, and is your demographic worth buying?" (value of information). At rung 3 the same agent asks "given you clicked, would you have clicked under a different ad?" (a counterfactual). Watching it climb is the whole story.
The framing follows Pearl's Ladder of Causation as taught by Andrew Forney (LMU CMSI 4320), with the probabilistic machinery from Russell & Norvig's AIMA. The three tiers:
| Tier | Question | Example | Denizen |
|---|---|---|---|
| 3 · imagine | What if $X$ had been $x'$? | "Would I be home had I left the 405?" | Structural Causal Models |
| 2 · do | What if I do $X=x$? | "Will exercise lower my cholesterol?" | Causal Bayes nets · RL |
| 1 · see | What if I see $X=x$? | "Is symptom $X$ tied to disease $Y$?" | Bayesian networks |
Act I · Seeing
Bayesian networks — what do I believe?
Start at the bottom. A Bayesian network answers "what do I believe?" by storing a big joint distribution cheaply: a DAG of cause-ish arrows, plus one small conditional probability table (CPT) per node. The joint factorizes along the structure:
$$P(X_1,\dots,X_n)=\prod_i P\!\left(X_i \mid \text{parents}(X_i)\right)$$
Our running example (Forney's, and the one we'll ride all the way up): Stressed, Vapes, Exercises, develops Heart-disease, with $S\to V$, $S\to E$, $V\to H$, $E\to H$. Instead of the $2^4-1=15$ free numbers of a full joint table over four binary variables, we store four little CPTs holding $1+2+2+4=9$:
$$P(S)\,P(V\!\mid\!S)\,P(E\!\mid\!S)\,P(H\!\mid\!V,E)$$
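To make the factorization concrete, here is a minimal sketch that builds the joint from four CPTs. The probabilities are invented for illustration (they are not Forney's numbers), and the helper names `bern` and `joint` are hypothetical.

```python
from itertools import product

# Hypothetical CPTs: illustrative numbers only, not taken from the lectures.
P_S1 = 0.3                      # P(S=1)
P_V1 = {0: 0.2, 1: 0.6}         # P(V=1 | S=s)
P_E1 = {0: 0.7, 1: 0.3}         # P(E=1 | S=s)
P_H1 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.4}  # P(H=1 | V=v, E=e)


def bern(p_one, value):
    """Probability that a binary variable equals `value`, given P(var=1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(s, v, e, h):
    """P(S=s, V=v, E=e, H=h) as the product of the four CPT entries."""
    return bern(P_S1, s) * bern(P_V1[s], v) * bern(P_E1[s], e) * bern(P_H1[(v, e)], h)


# Sanity checks: the 16 joint entries sum to 1, and only 9 numbers were stored.
total = sum(joint(*world) for world in product((0, 1), repeat=4))
n_params = 1 + len(P_V1) + len(P_E1) + len(P_H1)

print(f"sum over all 16 worlds = {total:.6f}")
print(f"free parameters stored = {n_params} (vs. 15 for the full joint table)")
print(f"P(S=1, V=1, E=0, H=1)  = {joint(1, 1, 0, 1):.4f}")
```

Any entry of the 16-row joint table is recovered on demand by multiplying one row from each CPT, which is the whole storage trick.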
d-separation — independence you can read off the picture
The graph also tells you what's independent of what, via three local rules. Conditioning on the middle node blocks a fork ($X\!\leftarrow\!Z\!\to\!Y$) or a chain ($X\!\to\!Z\!\to\!Y$); but conditioning on a collider ($X\!\to\!Z\!\leftarrow\!Y$) opens it: observing a common effect makes its causes dependent ("explaining away"). Unblocked forks and opened colliders are the two seeds of the spurious correlations we'll fight in Act III.
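The fork and collider rules can be checked numerically on the vaping network. This is a sketch under assumed CPT values (made up for illustration); `prob` and `p_e1_given` are hypothetical helpers that brute-force the joint.

```python
from itertools import product

# Hypothetical CPTs: illustrative numbers only.
P_S1 = 0.3
P_V1 = {0: 0.2, 1: 0.6}         # P(V=1 | S=s)
P_E1 = {0: 0.7, 1: 0.3}         # P(E=1 | S=s)
P_H1 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.4}  # P(H=1 | V, E)


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def joint(S, V, E, H):
    return bern(P_S1, S) * bern(P_V1[S], V) * bern(P_E1[S], E) * bern(P_H1[(V, E)], H)


def prob(**fixed):
    """Probability of a partial assignment, by summing matching joint entries."""
    total = 0.0
    for values in product((0, 1), repeat=4):
        world = dict(zip("SVEH", values))
        if all(world[name] == val for name, val in fixed.items()):
            total += joint(**world)
    return total


def p_e1_given(**given):
    """P(E=1 | given)."""
    return prob(E=1, **given) / prob(**given)


# Fork V <- S -> E, nothing observed: V and E are dependent.
fork_open = p_e1_given(V=1) - p_e1_given(V=0)
# Same fork with S observed: the path is blocked, the difference vanishes.
fork_blocked = p_e1_given(V=1, S=1) - p_e1_given(V=0, S=1)
# Collider V -> H <- E: also observing H re-opens a dependence between V and E.
collider_opened = p_e1_given(V=1, S=1, H=1) - p_e1_given(V=0, S=1, H=1)

print(f"fork, S unobserved:     {fork_open:+.4f}")
print(f"fork, S observed:       {fork_blocked:+.4f}")
print(f"collider, H observed:   {collider_opened:+.4f}")
```

With these numbers the first and third differences are nonzero and the second is zero up to rounding, matching what the three rules predict.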
Inference — asking the network questions
Given evidence $e$, we want $P(Q\mid e)$. Forney's four-step recipe: label the query $Q$, the evidence $e$, and the hidden variables $Y$ (everything else); find $P(Q,e)=\sum_y P(Q,e,y)$ from the factorization; find $P(e)=\sum_q P(q,e)$; divide. That's enumeration (AIMA's enumeration_ask); variable elimination is the same idea with the sums pushed inward. When the network is too big, you sample (likelihood weighting, Gibbs).
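As a sketch of the enumeration recipe, the function below answers a query on the vaping network by brute-force summation. The CPT numbers are invented, and `enumerate_posterior` is a hypothetical name chosen to avoid suggesting it is AIMA's own implementation.

```python
from itertools import product

# Hypothetical CPTs: illustrative numbers only.
P_S1 = 0.3
P_V1 = {0: 0.2, 1: 0.6}         # P(V=1 | S=s)
P_E1 = {0: 0.7, 1: 0.3}         # P(E=1 | S=s)
P_H1 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.4}  # P(H=1 | V, E)
NAMES = ("S", "V", "E", "H")


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def joint(S, V, E, H):
    return bern(P_S1, S) * bern(P_V1[S], V) * bern(P_E1[S], E) * bern(P_H1[(V, E)], H)


def enumerate_posterior(query_var, evidence):
    """Return P(query_var | evidence) as {value: probability}."""
    # Step 1: everything that is neither query nor evidence is hidden.
    hidden = [n for n in NAMES if n != query_var and n not in evidence]
    unnormalized = {}
    for q in (0, 1):
        # Step 2: P(Q=q, e) = sum over hidden assignments of the joint.
        total = 0.0
        for values in product((0, 1), repeat=len(hidden)):
            world = {query_var: q, **evidence, **dict(zip(hidden, values))}
            total += joint(**world)
        unnormalized[q] = total
    # Step 3: P(e) = sum over q of P(q, e).
    p_evidence = sum(unnormalized.values())
    # Step 4: divide.
    return {q: p / p_evidence for q, p in unnormalized.items()}


posterior = enumerate_posterior("H", {"V": 1})
print(f"P(H=1 | V=1) = {posterior[1]:.3f}")
```

The cost is exponential in the number of hidden variables, which is the motivation for variable elimination and sampling on larger networks.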
Act II · Deciding
Utility & decision networks — what should I do?
Beliefs don't act. To decide, add what you want: a utility $U(s)$ over outcomes. A rational agent picks the action with the highest expected utility:
$$\mathrm{EU}(a\mid e)=\sum_s P(s\mid e,a)\,U(s),\qquad \mathrm{MEU}(e)=\max_a \mathrm{EU}(a\mid e).$$
A decision network is a Bayes net plus decision nodes
(your actions) and a utility node. You run inference inside an argmax
over actions. That's AIMA's DecisionNetwork — and it's
exactly what the Defendotron does.
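Here is a minimal sketch of "inference inside an argmax", using a toy model with one demographic and two ads. Every number and name below is an assumption made for illustration. This is not the Defendotron's code or data.

```python
# Hypothetical toy model: one binary demographic G and two candidate ads.
P_G = {0: 0.5, 1: 0.5}                                   # prior P(G=g)
P_CLICK = {("ad1", 0): 0.10, ("ad1", 1): 0.02,           # P(click | ad, G=g)
           ("ad2", 0): 0.03, ("ad2", 1): 0.08}
UTILITY = {"click": 100.0, "no_click": 0.0}              # U(outcome)
ACTIONS = ("ad1", "ad2")


def belief_over_g(evidence):
    """P(G | e): a point mass if G was observed, otherwise the prior."""
    if "G" in evidence:
        return {g: 1.0 if g == evidence["G"] else 0.0 for g in P_G}
    return dict(P_G)


def eu(action, evidence):
    """EU(a | e) = sum over outcomes of P(outcome | e, a) * U(outcome)."""
    total = 0.0
    for g, p_g in belief_over_g(evidence).items():
        p_click = P_CLICK[(action, g)]
        total += p_g * (p_click * UTILITY["click"] + (1 - p_click) * UTILITY["no_click"])
    return total


def meu(evidence):
    """Return (best action, its expected utility) given the evidence."""
    best = max(ACTIONS, key=lambda a: eu(a, evidence))
    return best, eu(best, evidence)


print(meu({}))          # best ad with no evidence
print(meu({"G": 1}))    # best ad once G=1 is known
```

In this toy model the best ad flips from ad1 to ad2 once G=1 is observed, which is the situation where learning G has value.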
The Defendotron — a decision network on real data
For CMSI 3300 I built an ad-selection agent: learn a Bayes net from a
demographic CSV, then choose which ad to serve to maximize expected
payoff. It has hand-rolled eu(), meu(), and
vpi(). The catch Forney's students hit years later: it was
written against pomegranate 0.x and a 2022 pgmpy,
and stopped running. So I modernized it
(BicScore→BIC, BayesianNetwork→DiscreteBayesianNetwork,
fixed a utility-variable indexing bug) and re-ran it on the original
adbot-data.csv:
# pgmpy 1.1.2 — verified MEU + VPI on the original adbot-data.csv
MEU(no evidence) → serve {Ad1:1, Ad2:0} E[utility] ≈ 746.8
MEU(T=0, G=0) → serve {Ad1:0, Ad2:0} E[utility] ≈ 797.0
VPI("G" | -) ≈ 20.77 # gender is worth ~21 utility units to learn
VPI("F" | -) ≈ 0 # feature F can't change the decision → worthless
VPI("G" | H=0,T=1,P=0) ≈ 66.8 # in context, the same fact is worth 3× more
The modernized run reproduced the original
test-suite's headline numbers — MEU≈746.8 (author's value
746.72) and VPI(G)=20.77 exactly. Where the structure-learner
lands on a slightly different DAG across pgmpy versions, the
downstream VPI shifts too (and can even dip numerically negative — which
is why you clamp it). That sensitivity is itself the lesson: VPI is only
as stable as the graph beneath it.
Value of Perfect Information — the crown jewel
Now the question this whole project is named for. Before you act, is it worth learning one more fact? The value of perfect information of observing $E'$ is the expected best-you-could-do after seeing it, minus the best-you-could-do now — denominated in utility:
$$\mathrm{VPI}_e(E')=\Big[\textstyle\sum_v P(E'\!=\!v\mid e)\,\mathrm{MEU}\!\big(e\cup\{E'\!=\!v\}\big)\Big]-\mathrm{MEU}(e).$$
Three properties worth holding onto: VPI is never negative
(information can't hurt in expectation); it's non-additive
(two facts can be redundant or synergistic); and it is
zero exactly when the fact can't change your decision —
VPI measures decision-relevance, not Shannon information. That's
why VPI(F)≈0 above: F is informative about the world but
irrelevant to which ad wins.
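The VPI formula can be sketched on a toy model with two features: one that flips the best ad and one that never does. All probabilities and names are hypothetical, and each feature is modeled on its own for simplicity. This is not the Defendotron's vpi().

```python
# Hypothetical toy model: two binary features, each considered in isolation.
# P_CLICK[feature][(ad, value)] = P(click | ad, feature=value). Numbers are made up.
PRIOR = {"G": {0: 0.5, 1: 0.5}, "F": {0: 0.5, 1: 0.5}}
P_CLICK = {
    "G": {("ad1", 0): 0.10, ("ad1", 1): 0.02, ("ad2", 0): 0.03, ("ad2", 1): 0.08},
    "F": {("ad1", 0): 0.04, ("ad1", 1): 0.08, ("ad2", 0): 0.035, ("ad2", 1): 0.075},
}
ACTIONS = ("ad1", "ad2")
CLICK_VALUE = 100.0   # utility of a click; a non-click is worth 0


def eu(ad, feature, belief):
    """Expected utility of `ad` under a belief {value: prob} over `feature`."""
    return sum(p * P_CLICK[feature][(ad, v)] * CLICK_VALUE for v, p in belief.items())


def meu(feature, belief):
    """Best achievable expected utility under the given belief."""
    return max(eu(ad, feature, belief) for ad in ACTIONS)


def vpi(feature):
    """Expected MEU after observing `feature`, minus MEU now."""
    prior = PRIOR[feature]
    meu_now = meu(feature, prior)
    meu_after = sum(p * meu(feature, {v: 1.0}) for v, p in prior.items())
    return meu_after - meu_now


print(f"VPI(G) = {vpi('G'):.2f}")   # G changes which ad wins
print(f"VPI(F) = {vpi('F'):.2f}")   # F shifts click rates but ad1 wins either way
```

F moves the click probabilities (so it is informative about the world) yet ad1 stays best for both of its values, so its VPI is zero up to floating-point rounding. That rounding is one reason an implementation might clamp the result at zero.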
My hand-rolled vpi() turns out to implement the
identical formula to AIMA's InformationGatheringAgent.vpi()
(Fig 16.9) — I re-derived the textbook on real data without knowing it.
Act III · Doing
Why correlations aren't enough
The Defendotron is powerful, but the network it decides with was learned from observational data, so its beliefs live on tier 1: it knows how variables covary, not what happens if you act. Ask it "does vaping cause heart disease?" and the naive answer $P(H\!=\!1\mid V\!=\!1)-P(H\!=\!1\mid V\!=\!0)$ is wrong, for three reasons:
- Observational equivalence: $S\!\to\!V$ and $V\!\to\!S$ encode the same independences but opposite causes.
- Spurious correlation: observing $V$ also tells you about $S$, so information leaks along the back-path $V\!\leftarrow\!S\!\to\!E\!\to\!H$, which is no part of the effect we want.
- Latent confounders: variables we never recorded that drive two we did.
Forney's tagline: causal queries demand causal assumptions, encoded in the model's structure. Promote the Bayes net to a Causal Bayesian Network, where an arrow $\text{Parents}(Y)\to Y$ means $Y=f_Y(\text{Parents}(Y))$ — forcing a cause changes the effect, but not vice-versa.
The do-operator — surgery on the graph
Observing differs from intervening. We write $do(X=x)$ for forcing $X$ to $x$ regardless of its normal causes. Structurally, $do(X=x)$ builds the mutilated graph $G_x$: the original graph with every inbound edge to $X$ cut. $X$ is now set by fiat, so its parents no longer explain it:
| world | graph | factorization |
|---|---|---|
| observe (nature) | $S\!\to\!V,\,S\!\to\!E,\,V\!\to\!H,\,E\!\to\!H$ | $P(S)\,P(V\!\mid\!S)\,P(E\!\mid\!S)\,P(H\!\mid\!V,E)$ |
| $do(V=v)$ | cut $S\!\to\!V$ | $P(S)\,\underline{\ 1\ }\,P(E\!\mid\!S)\,P(H\!\mid\!V\!=\!v,E)$ |
The payoff is striking: you can compute the answer to a randomized experiment you never ran — provided your causal assumptions hold.
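The difference between the two factorizations can be seen numerically. This sketch uses made-up CPTs for the vaping network and two hypothetical helpers: one conditions on $V$, the other drops the $P(V\mid S)$ factor as the mutilated graph prescribes.

```python
from itertools import product

# Hypothetical CPTs: illustrative numbers only.
P_S1 = 0.3
P_V1 = {0: 0.2, 1: 0.6}         # P(V=1 | S=s)
P_E1 = {0: 0.7, 1: 0.3}         # P(E=1 | S=s)
P_H1 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.4}  # P(H=1 | V, E)


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def joint(s, v, e, h):
    return bern(P_S1, s) * bern(P_V1[s], v) * bern(P_E1[s], e) * bern(P_H1[(v, e)], h)


def p_h1_observe(v):
    """P(H=1 | V=v): ordinary conditioning on the original network."""
    num = sum(joint(s, v, e, 1) for s, e in product((0, 1), repeat=2))
    den = sum(joint(s, v, e, h) for s, e, h in product((0, 1), repeat=3))
    return num / den


def p_h1_do(v):
    """P(H=1 | do(V=v)): the P(V|S) factor is replaced by 1 and V is pinned to v."""
    return sum(bern(P_S1, s) * bern(P_E1[s], e) * bern(P_H1[(v, e)], 1)
               for s, e in product((0, 1), repeat=2))


print(f"P(H=1 | V=1)     = {p_h1_observe(1):.3f}")
print(f"P(H=1 | do(V=1)) = {p_h1_do(1):.3f}")
```

Under these assumed numbers, seeing a vaper raises the heart-disease estimate more than making someone vape would, because seeing also shifts the belief about stress.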
Backdoor & adjustment
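When a confounder muddies $X\to Y$, block the non-causal back-paths by conditioning on an admissible set $Z$: one that contains no descendant of $X$ and blocks every path between $X$ and $Y$ that enters $X$ through an incoming arrow (the backdoor criterion):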
$$P(Y\mid do(X\!=\!x))=\sum_z P(Y\mid X\!=\!x,Z\!=\!z)\,P(Z\!=\!z).$$
That adjustment formula is the heart of CMSI 4320's Assignment 4 ("Back Doors, Do, and other Funny Sounding Things"): translate an English query to a $do$-expression, then hand-compute it — e.g. $P(H\!=\!1\mid do(E\!=\!1),S\!=\!1)$ on the vaping net. A randomized controlled trial is a physical $do()$; the algebra lets us emulate one from observational data.
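A sketch of the adjustment formula on the vaping network, using only observational quantities and $Z=\{S\}$. The CPT numbers are invented, the helpers `prob` and `cond` are hypothetical, and the results are not the assignment's answers.

```python
from itertools import product

# Hypothetical CPTs: illustrative numbers only.
P_S1 = 0.3
P_V1 = {0: 0.2, 1: 0.6}         # P(V=1 | S=s)
P_E1 = {0: 0.7, 1: 0.3}         # P(E=1 | S=s)
P_H1 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.4}  # P(H=1 | V, E)


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def joint(S, V, E, H):
    return bern(P_S1, S) * bern(P_V1[S], V) * bern(P_E1[S], E) * bern(P_H1[(V, E)], H)


def prob(**fixed):
    """Observational probability of a partial assignment."""
    total = 0.0
    for values in product((0, 1), repeat=4):
        world = dict(zip("SVEH", values))
        if all(world[name] == val for name, val in fixed.items()):
            total += joint(**world)
    return total


def cond(query, given):
    """P(query | given), both passed as {name: value} dicts."""
    return prob(**query, **given) / prob(**given)


def p_h1_do_v(v):
    """Backdoor adjustment over Z = {S}: sum_s P(H=1 | V=v, S=s) P(S=s)."""
    return sum(cond({"H": 1}, {"V": v, "S": s}) * prob(S=s) for s in (0, 1))


naive_effect = cond({"H": 1}, {"V": 1}) - cond({"H": 1}, {"V": 0})
adjusted_effect = p_h1_do_v(1) - p_h1_do_v(0)
# With S already observed, the back-path E <- S -> V -> H is blocked, so
# P(H=1 | do(E=1), S=1) reduces to the ordinary conditional P(H=1 | E=1, S=1).
p_h1_do_e1_s1 = cond({"H": 1}, {"E": 1, "S": 1})

print(f"naive contrast:           {naive_effect:.3f}")
print(f"backdoor-adjusted effect: {adjusted_effect:.3f}")
print(f"P(H=1 | do(E=1), S=1):    {p_h1_do_e1_s1:.3f}")
```

With these numbers the naive contrast overstates the causal effect of vaping, since part of the observed association travels through stress.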
Act IV · Imagining
Structural causal models & the top rung
The most expressive model writes down the mechanisms themselves. A structural causal model $M=\langle U,V,F\rangle$ has endogenous variables $V$, exogenous "luck" variables $U$ (one $U_{V_i}$ per endogenous $V_i$, with $E[U_{V_i}]=0$), and structural equations $F$ of the form $V_i\gets f_{V_i}(\text{pa}_i,U_{V_i})$. Fix the $U$ and everything is determined; that determinism is what makes counterfactuals computable. When two variables secretly share a $U$ (in the linear case, $\mathrm{Cov}(U_X,U_Y)\neq 0$), you have an unobserved confounder.
Counterfactuals — abduction, action, prediction
A counterfactual conditions on what actually happened and asks about a world where one thing was different — two worlds sharing the same luck. You can't just intervene (that throws away the evidence about the luck); you must first recover it. The three-step engine, on a linear afterschool→GPA model ($Z\gets 3X+U_Z,\ Y\gets 2X+4Z+U_Y$):
- Abduction — solve the $U$ from the observed facts.
- Action — apply $do(X=x')$ to the mutilated model, carrying the $U$ forward.
- Prediction — recompute the outcome.
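The three steps can be traced on the linear model above. The observed triple $(X, Z, Y) = (1, 4, 20)$ and the counterfactual value $x'=2$ are hypothetical, picked only so the arithmetic is easy to follow.

```python
def f_z(x, u_z):
    """Structural equation Z <- 3X + U_Z."""
    return 3 * x + u_z


def f_y(x, z, u_y):
    """Structural equation Y <- 2X + 4Z + U_Y."""
    return 2 * x + 4 * z + u_y


# Hypothetical observation for one student (not from the lectures).
x_obs, z_obs, y_obs = 1, 4, 20

# 1. Abduction: invert the equations to recover this student's luck terms.
u_z = z_obs - 3 * x_obs
u_y = y_obs - 2 * x_obs - 4 * z_obs

# 2. Action: force X to the counterfactual value, keeping u_z and u_y.
x_cf = 2

# 3. Prediction: push the new X through the same mechanisms.
z_cf = f_z(x_cf, u_z)
y_cf = f_y(x_cf, z_cf, u_y)

# For contrast: a plain intervention that ignores the evidence and uses
# the mean luck (U = 0) answers a population question, not this student's.
y_do_only = f_y(x_cf, f_z(x_cf, 0), 0)

print(f"recovered luck: U_Z={u_z}, U_Y={u_y}")
print(f"counterfactual Y had X been {x_cf}: {y_cf}")
print(f"interventional Y under do(X={x_cf}) with U=0: {y_do_only}")
```

The gap between the last two numbers is exactly the contribution of the recovered luck, which is why skipping abduction gives a different answer.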
The ladder closes — VPI becomes ETT
Here is the payoff the project was built around. CMSI 4320's Assignment 5 ("Ifs, Onlys, and Buts") pushes counterfactuals into named quantities. The headline one is the Effect of Treatment on the Treated:
$$\mathrm{ETT}=E\big[Y_{x'}-Y_{x}\ \big|\ X=x\big]$$
— among those who actually took action $x$, what would have happened under $x'$? And the assignment's worked instance is a web-advertising system: demographics $\to$ ad shown $\to$ clickthrough. Given the people who clicked, would they have clicked under a different ad?
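For a discrete model, the ETT formula can be evaluated by enumerating the exogenous variables. The sketch below invents a tiny ad SCM: the mechanisms, the response types, and all probabilities are assumptions for illustration, not the assignment's model. It conditions on the ad shown, as the formula does, rather than on the click.

```python
from itertools import product

# Hypothetical exogenous distributions (made-up numbers).
P_UD = {0: 0.5, 1: 0.5}                                 # luck behind demographic D
P_UX = {0: 0.8, 1: 0.2}                                 # noise in the ad policy
P_UY = {"never": 0.7, "always": 0.1, "matched": 0.2}    # click response type


def f_d(u_d):
    """D <- U_D."""
    return u_d


def f_x(d, u_x):
    """X <- D xor U_X: the policy usually shows the ad matching the demographic."""
    return d ^ u_x


def f_y(x, d, u_y):
    """Y <- click response: some users never click, some always, some only on a match."""
    if u_y == "always":
        return 1
    if u_y == "never":
        return 0
    return int(x == d)


def ett(x, x_alt):
    """E[Y_{x_alt} - Y_x | X = x], by enumerating every exogenous setting."""
    weighted_diff = 0.0
    mass = 0.0
    for (u_d, p_d), (u_x, p_x), (u_y, p_y) in product(
            P_UD.items(), P_UX.items(), P_UY.items()):
        d = f_d(u_d)
        if f_x(d, u_x) != x:
            continue                      # abduction: keep only units with X = x
        p = p_d * p_x * p_y
        y_factual = f_y(x, d, u_y)
        y_counter = f_y(x_alt, d, u_y)    # action + prediction: force X, same luck
        weighted_diff += p * (y_counter - y_factual)
        mass += p
    return weighted_diff / mass


print(f"ETT for users shown ad 1, had they seen ad 0: {ett(1, 0):+.3f}")
```

A negative value here says the users who were shown ad 1 would have clicked less often under ad 0, so in this toy world the policy's targeting was helping.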
That is the Defendotron's own question, one rung higher. At tier 2 it asked "is your demographic worth buying before I pick an ad?" (VPI). At tier 3 the same ad agent asks "given you clicked, would you have clicked under the ad I didn't show?" (ETT). Same machinery — beliefs, utilities, the value of information — climbed from seeing to imagining. The ad agent walked the whole ladder.
The same act yields attributable risk and probability of necessity (the drug-liability question: of those harmed after taking the drug, what fraction were harmed because of it?), and Forney's research thread, MABUC — multi-armed bandits with unobserved confounders, where an agent uses its own intent (a counterfactual signal) to beat a confounded slot machine. That's the bridge from this curriculum into reinforcement learning.
Where this shows up
The same ladder, eight domains:
| Domain | Tier-1/2 (see / decide) | Tier-3 (imagine) |
|---|---|---|
| Advertising (the spine) | Defendotron: which ad? VPI of a demographic | ETT: would they have clicked under another ad? |
| Medicine | vaping → heart-disease; exercise → cholesterol | drug-liability attributable risk; physician drug choice |
| Public policy | agricultural subsidy decision | terrorism-predictor causal structure |
| Education | study-habits MDP | afterschool → GPA counterfactual |
| Hiring | screen on predicted performance | counterfactual fairness: same hire under a different background? |
| Navigation | taxi-dispatch / grid-world MDP | "home by now had I left the 405?" |
| Genomics | DNA-sequence HMM (Viterbi) | — |
| Risk under uncertainty | tiger-doors POMDP; value of listening | — |
The tier-2 bridge — MDPs, bandits, reinforcement learning — is its own story. In robot-arm-sim a crawler teaches itself to walk by Q-learning, and the punchline is the same as here: a model-based method (one that can imagine roads not taken) beats every model-free one. Decision theory and that RL project are two halves of the same ladder.
What's mine vs. what's cited. The
Defendotron ad agent (the eu/meu/vpi
engine, structure learning, and the modernization that got it running
again) is my CMSI 3300 homework, author-confirmed in the source. The
causal-hierarchy framing, the vaping and afterschool
examples, and the do/counterfactual machinery are from
Andrew Forney's CMSI 4320
lectures and assignments (Pearl's Ladder of Causation). The
inference and VPI reference implementations are from Russell &
Norvig's AIMA (probability.py,
making_simple_decision4e.py, Ch. 13–16). The three widgets
above are my own, written to mirror the verified Python.
Built as a node of a larger indexed archive of my coursework.
Reproduce the Defendotron with pip install pgmpy pandas
and the original adbot-data.csv.