TheMurrow

Why “Good” Data Still Leads to Bad Decisions

Dashboards can be accurate and still mislead. Here’s a plain-English guide to bias, metrics, incentives, and context—so “data-driven” doesn’t become “wrong with confidence.”

By TheMurrow Editorial
February 2, 2026

Key Points

  • Recognize that accuracy isn’t adequacy: clean dashboards can still mislead through biased selection, warped metrics, or context-free interpretation.
  • Expect KPIs to change behavior: Goodhart’s and Campbell’s laws explain why targets corrupt indicators and can worsen real performance.
  • Force context into decisions: segment to avoid Simpson’s paradox, check base rates for rare events, and study exits as carefully as winners.

Dashboards rarely lie. They just don’t tell the whole truth.

Plenty of organizations have what any sane person would call “good data”: clean pipelines, large samples, statistically significant results, meticulous charts updated every morning at 9. And yet the decisions that follow can be strangely, stubbornly wrong—products shipped into the wrong market, incentives that rot culture, “data-driven” strategies that collapse on contact with reality.

The uncomfortable lesson is that accuracy is not the same thing as adequacy. Data can be technically correct and still mislead because the selection is biased, the metric is warped, or the interpretation forgets what the numbers can’t say.

The failure isn’t just human. It’s structural. Teams build measurement systems that reward the wrong behavior, hide the people who disappeared from the dataset, and flatten complex worlds into a single blended rate.

“The most dangerous data isn’t fake. It’s clean, persuasive, and quietly incomplete.”

— TheMurrow

The hidden premise: “good data” depends on asking the right question

Data quality discussions often start in the familiar places: instrumentation, completeness, schema, statistical significance. Those basics matter. Yet even perfect measurement can’t rescue a bad premise.

A team can run impeccable analyses on the wrong population, optimize a precise proxy for the wrong outcome, or treat correlation as causation because leadership needs a clean story by Friday. The result is a decision that looks defensible—numbers, charts, p-values—while driving the organization somewhere it didn’t intend to go.

Three failure modes show up again and again:

- Bias in what gets measured: sampling, selection, missingness, survivorship.
- Bias in how success is defined: metric design, incentives, Goodhart-style collapses.
- Bias in how results are interpreted: base rates, aggregation errors, causal confusion, and what happens when models meet the real world.

Each failure mode can exist inside a “high-quality” data environment. The pipeline may be pristine. The dashboard may be correct. The decision can still be wrong.

“Good” is often defined too narrowly

A quiet trap sits inside the phrase “data-driven”: it suggests that the numbers themselves do the driving. In practice, people decide what to measure, what to ignore, what to count as success, and what trade-offs are acceptable.

Those choices often happen far upstream from the spreadsheet—during scoping meetings, KPI negotiations, and executive reviews where pressure favors simple stories over honest uncertainty. Numbers then inherit those assumptions and transmit them with a false air of inevitability.

Bias isn’t only prejudice. It’s systematic error that hides in plain sight.

In everyday conversation, bias means unfairness. In statistics, bias often means something more banal and more dangerous: systematic error—a consistent tilt that pulls conclusions away from reality.

Sampling and selection issues are especially treacherous because the data can look clean. Fields are filled. Values are valid. The dataset might be huge. None of that guarantees representativeness.

Sampling bias, selection bias, survivorship bias

Survivorship bias is a classic example: “a logical or statistical error where attention is paid only to entities that made it through a selection filter,” producing incorrect conclusions. Encyclopaedia Britannica uses the term in precisely that sense, emphasizing how selection filters can distort what you think you’re observing.

The canonical case comes from World War II. Analysts examined returning aircraft and proposed adding armor where bullet holes were most common. Statistician Abraham Wald saw the trap: the planes with those holes survived. Armor belonged where returning planes showed the least damage—those were likely the fatal hit zones in planes that never returned. Britannica recounts the episode as a vivid demonstration of how selection filters can invert the correct decision.

The modern versions are everywhere:

- “We interviewed our top customers.” (You interviewed survivors—the people already inclined to like you.)
- “We analyzed users who stayed six months.” (You conditioned on an outcome—retention—and treated it as a neutral filter.)
- “We used only completed surveys.” (Nonresponse isn’t random; it often correlates with dissatisfaction or disengagement.)

“A dataset can be accurate and still be unrepresentative—the neatest table can be the most biased.”

— TheMurrow

Attrition: when missingness isn’t random

Survivorship bias often hides inside attrition. People drop out of funnels and studies for reasons that relate to the outcome. Britannica notes this dynamic in longitudinal contexts: missing data often isn’t a clerical accident; it’s a signal.

A product team might celebrate improving satisfaction based on follow-up surveys—while unhappy users churn and stop responding. A health study might show better outcomes over time because the sickest participants dropped out. The data isn’t “wrong.” It’s incomplete in a way that points systematically toward a flattering story.
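To make that concrete, here is a minimal simulation in Python. All numbers are hypothetical: the only assumption is that the happier someone is, the more likely they are to answer a follow-up survey.

```python
import random

random.seed(0)

# Hypothetical population: satisfaction scores from 1 (unhappy) to 5 (delighted).
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(100_000)]

# Assumption: response probability rises with satisfaction.
response_prob = {1: 0.10, 2: 0.30, 3: 0.50, 4: 0.70, 5: 0.90}
responses = [score for score in population if random.random() < response_prob[score]]

true_mean = sum(population) / len(population)
survey_mean = sum(responses) / len(responses)

print(f"True mean satisfaction (everyone):  {true_mean:.2f}")
print(f"Survey mean (responders only):      {survey_mean:.2f}")  # noticeably higher
```

Nothing about anyone’s satisfaction changed; only who answered did. The survey average still drifts upward.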

The hard part is cultural, not technical. Teams prefer data that feels solid. Missingness feels messy. Yet messy is often the truth.

When a metric becomes a target, it stops measuring what you think

Even when a dataset is representative, decision-making can be derailed by what the organization chooses to optimize. The most common mistake is confusing a proxy for the thing it stands in for.

Metrics are necessary. They’re also fragile. Once a number becomes a target—tied to promotions, budgets, reputation—people adapt. Sometimes they cheat. More often, they simply respond rationally to the incentive system.

Goodhart’s Law: the KPI that collapses under pressure

The best-known warning comes from Goodhart’s Law, often summarized as: “When a measure becomes a target, it ceases to be a good measure.” Wikipedia traces the concept to economist Charles Goodhart, who criticized monetary targeting in the UK. The broader point travels well beyond central banking: when leadership uses a metric for control, the statistical relationship that made it useful can break.

The corporate examples are painfully familiar:

- A call center optimizes for shorter handle time—and customers call back more often because problems aren’t solved.
- A sales organization rewards booked revenue—and discovers later it has optimized for deals that don’t retain.
- A content moderation team faces quotas—and quality declines because speed becomes the job.

Goodhart doesn’t require villains. It requires incentives.

Campbell’s Law: the pressure to corrupt the indicator

Campbell’s Law extends the idea into social systems. Donald T. Campbell’s formulation is blunt: the more a quantitative social indicator is used for decision-making, the more it becomes vulnerable to corruption pressures and the more it distorts the processes it’s meant to monitor.

If Goodhart feels like a business caution, Campbell reads like a public policy diagnosis. Education, policing, healthcare—any domain where a metric becomes the public proof of performance—becomes susceptible to distortion. The “numbers” become the work, rather than evidence about the work.

“Metrics don’t merely measure behavior. Under pressure, they manufacture it.”

— TheMurrow

Survivorship bias in the boardroom: why “best customers” give the worst advice

Organizations love to learn from winners. The instinct is understandable and often useful—until it becomes the only instinct.

Interviewing top customers can yield insight into what your product does well. Yet it can also hide why others never adopted, quickly churned, or quietly left. The dataset of “people who love us” is a selection filter with a halo.

Conditioning on outcomes: the retention trap

A common analytics move is to analyze “users who stayed” to find what “drives retention.” That phrase often smuggles in a logical error. If you restrict your dataset to people who retained, you’ve removed the very variation you need to understand why others didn’t.

The pattern repeats in hiring (“let’s model what predicts success using only current employees”), in education (“let’s study what top students do”), and in investing (“let’s learn only from unicorn founders”). Survivors tend to have traits that look causal but may be correlated with hidden filters: access, timing, luck, or systemic advantage.
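A small sketch of the trap, using simulated data (the onboarding variable and the rates are hypothetical): a statistic computed only on retained users can sound decisive even when the churned group looks almost identical.

```python
import random

random.seed(1)

# Hypothetical cohort (simulated): did the user complete onboarding, and did they retain?
# Assumption: onboarding completion is common for everyone, not just for retained users.
users = []
for _ in range(10_000):
    onboarded = random.random() < 0.85                      # most users finish onboarding
    retained = random.random() < (0.40 if onboarded else 0.35)
    users.append((onboarded, retained))

retained_users = [u for u in users if u[1]]
churned_users = [u for u in users if not u[1]]

def onboarding_rate(group):
    return sum(1 for onboarded, _ in group if onboarded) / len(group)

# Survivor-only view: sounds like onboarding "drives" retention.
print(f"Completed onboarding, retained users: {onboarding_rate(retained_users):.0%}")
# Symmetric view: churned users completed onboarding almost as often.
print(f"Completed onboarding, churned users:  {onboarding_rate(churned_users):.0%}")
```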

A better discipline: study exits, not just entrances

Survivorship bias doesn’t argue against studying high performers. It argues for symmetry. If you collect “why they stayed,” you also need “why they left.” If you analyze conversions, you need drop-offs. If you build a portrait of success, you need the shadow: failure modes.

Practically, that means designing data collection to capture:

- churn and cancellation reasons,
- nonresponse patterns,
- cohorts that never converted,
- and segments excluded by default filters.

Otherwise, “insight” becomes a flattering mirror.

Simpson’s paradox: the dashboard can be correct and still reverse the truth

Even with representative data and sane metrics, interpretation can go wrong because aggregation can hide the real story.

Simpson’s paradox is the cleanest example: a trend that appears in aggregated data can reverse when the data is broken into groups. The numbers don’t contradict each other; your interpretation does.

The UC Berkeley admissions case (1973)

Wikipedia’s canonical case involves UC Berkeley graduate admissions for Fall 1973. At the top line, the results looked damning:

- Men: 44% admitted
- Women: 35% admitted

Those are straightforward rates. They are also incomplete. When researchers examined admissions by department, the picture shifted. Women applied more heavily to more competitive departments with lower admission rates overall. Aggregation produced what looked like a single story; segmentation revealed a different mechanism.

Those two numbers—44% vs. 35%—remain one of the most sobering reminders that a single blended metric can be both accurate and misleading.
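A toy illustration of the mechanism (the department names and counts below are hypothetical, not the actual Berkeley figures): each segment can favor one group while the blended rate favors the other, purely because applicants concentrate in different departments.

```python
# Hypothetical counts, two departments only.
# Format: (men applied, men admitted, women applied, women admitted)
departments = {
    "Dept A (less selective)":   (800, 480, 200, 130),  # men 60%, women 65%
    "Dept B (highly selective)": (200, 40, 800, 200),   # men 20%, women 25%
}

def admit_rates(men_applied, men_admitted, women_applied, women_admitted):
    return men_admitted / men_applied, women_admitted / women_applied

totals = [0, 0, 0, 0]
for name, counts in departments.items():
    men_rate, women_rate = admit_rates(*counts)
    print(f"{name}: men {men_rate:.0%}, women {women_rate:.0%}")  # women higher in both
    totals = [t + c for t, c in zip(totals, counts)]

men_rate, women_rate = admit_rates(*totals)
print(f"Aggregate: men {men_rate:.0%}, women {women_rate:.0%}")   # men higher overall
```

Women are admitted at a higher rate in both departments, yet the blended rate favors men, because women applied mostly to the selective department.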

What Simpson’s paradox looks like in modern organizations

In business, Simpson’s paradox shows up as:

- A company-wide conversion rate that rises while key segments decline.
- A “diversity improvement” statistic that masks declines in senior roles.
- A customer satisfaction average that hides polarized experiences (delighted power users, frustrated new users).

The fix is not complicated, but it requires discipline: slice the data the way the world is structured—by cohort, channel, geography, department, tenure, or product line—before you trust the headline rate.

Base rates and rare events: why “high accuracy” can still fail you

One of the most persistent interpretation errors is treating “accuracy” as a universal sign of quality. In many real-world problems, the event you care about is rare: fraud, disease, security incidents, harmful content. Rare events change everything.

When base rates are low, a system can be “accurate” in a way that’s practically useless—because it mostly predicts that nothing happens.
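A minimal sketch with made-up numbers: when only 1% of transactions are fraudulent, a “model” that flags nothing at all still scores 99% accuracy.

```python
# Hypothetical volumes: 1% of transactions are fraudulent.
total_transactions = 100_000
fraud_cases = 1_000

# A "model" that never flags anything is right about every legitimate transaction
# and wrong about every fraudulent one.
true_negatives = total_transactions - fraud_cases
false_negatives = fraud_cases

accuracy = true_negatives / total_transactions
print(f"Accuracy of the do-nothing model: {accuracy:.1%}")        # 99.0%
print(f"Fraud cases caught:               0 of {false_negatives}")
```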

Screening trade-offs in plain English

Medical screening provides the clearest language for this problem. Screening tools must navigate trade-offs between false positives and false negatives. A test that flags many people may catch more true cases but can also produce needless fear and follow-up procedures. A test that is conservative may miss cases.
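A hypothetical worked example (illustrative parameters, not drawn from any specific screening study): when the condition is rare, even a test with strong sensitivity and specificity produces mostly false positives among the people it flags.

```python
# Illustrative parameters only; not taken from any specific screening study.
prevalence = 0.01       # 1% of the screened population has the condition
sensitivity = 0.90      # probability the test flags a true case
specificity = 0.91      # probability the test clears a true non-case

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * (1 - specificity)

# Positive predictive value: of all positive flags, how many are real cases?
ppv = true_positives / (true_positives + false_positives)
print(f"Share of positive flags that are real cases: {ppv:.0%}")   # roughly 9%
```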

Research summaries such as those produced by bodies like the U.S. Preventive Services Task Force (USPSTF) often rely on large datasets (for example, BCSC data in a 2016 evidence summary) to quantify these trade-offs. The deeper point is not a single figure; it’s the logic: without the base rate, you don’t know what an “accurate” flag means in the lives of real people.

In organizational settings, the same logic applies to risk models, content moderation, and compliance systems. Leaders demand one number—accuracy—when the real question is moral and operational: Which kind of error can we live with, and at what cost?

The leadership mistake: hiding trade-offs behind a score

A single score feels decisive. It also hides policy choices. Every threshold embeds values: whose time gets wasted, who gets investigated, who gets blocked, who gets missed. Data can inform those choices, but it cannot make them disappear.

So what do you do? A practical checklist for decisions that deserve better than a dashboard

None of these problems require cynicism about data. They require seriousness about measurement.

Organizations that make better calls tend to build habits that feel almost old-fashioned: skepticism, context, and a willingness to look at uncomfortable slices of reality.

Practical takeaways you can apply immediately

Before you ship a decision based on “good data,” ask:

- What are we not measuring? Identify likely blind spots: churned users, nonresponders, rejected applicants, failed transactions.
- Who is missing, and why? Treat missingness as information, not just noise. Attrition often correlates with dissatisfaction or risk.
- Is the metric a proxy or the mission? If the metric is a proxy, name what it stands in for and what it fails to capture.
- Could incentives distort it? Goodhart’s Law and Campbell’s Law aren’t academic. If compensation or status depends on the number, expect adaptation.
- What happens when we segment? Check for Simpson’s paradox by breaking out key groups: department, cohort, channel, geography.
- What’s the base rate? If the event is rare, don’t let “accuracy” drive the decision. Ask about false positives, false negatives, and costs.

A team doesn’t need perfection to improve. It needs a shared language for how data misleads when everyone is acting in good faith.

Pre-decision checklist

  • What are we not measuring?
  • Who is missing, and why?
  • Is the metric a proxy or the mission?
  • Could incentives distort it?
  • What happens when we segment?
  • What’s the base rate?

Multiple perspectives: why metric-first cultures persist

KPI-heavy systems don’t survive because leaders are foolish. They survive because organizations need coordination. Metrics are legible across teams. They travel upward. They make performance auditable.

The critique is not “stop measuring.” The critique is: measure with humility. Treat metrics as instruments, not verdicts—and expect the instrument to change the thing it measures once careers depend on it.

The real work: keeping judgment alive in an age of numbers

The deepest risk of “good data” isn’t error. It’s complacency—the belief that because a number is precise, it is complete.

Survivorship bias warns that selection filters can erase the very cases you need to see. Goodhart’s Law warns that targets reshape reality. Campbell’s Law warns that institutional pressure can corrupt the indicator. Simpson’s paradox warns that aggregation can flip the story. Base rates warn that even honest performance scores can mislead when the world is lopsided.

The solution is not to retreat into instinct. It’s to build decision processes that force contact with what the dashboard hides: who is missing, what incentives are doing, which segments disagree, which errors you are choosing.

Data can sharpen judgment. It can also anesthetize it.

The difference is whether you treat numbers as answers—or as prompts for better questions.

Key Insight

The article’s core claim: accuracy is not adequacy. “Good” data can still mislead via biased selection, warped incentives, or context-free interpretation.

Editor’s Note

The critique here isn’t “stop measuring.” It’s: measure with humility—and design decision processes that surface what dashboards hide.
About the Author
TheMurrow Editorial covers explainers for TheMurrow.

Frequently Asked Questions

How can data be “clean” and still lead to bad decisions?

Clean data usually means the values are valid, consistent, and well-processed. Decisions go wrong when the selection is biased (you’re measuring the wrong slice of reality), the metric is a proxy that breaks under incentives, or interpretation ignores context—like base rates or subgroup differences.

What is survivorship bias in business terms?

Survivorship bias happens when you focus only on people or cases that made it through a filter—like “top customers” or “users retained six months.” The missing cases (non-buyers, churned users, failed products) often contain the reasons outcomes differ. Ignoring them can invert what you should do next.

What does Goodhart’s Law mean for KPIs?

Goodhart’s Law is often summarized as “when a measure becomes a target, it ceases to be a good measure.” Once a KPI drives rewards or punishments, people adapt to hit the number, sometimes in ways that damage the underlying goal. The KPI may still rise while performance falls.

How is Campbell’s Law different from Goodhart’s Law?

Goodhart’s Law focuses on how turning a measure into a target breaks its usefulness. Campbell’s Law adds a social dimension: heavy reliance on quantitative indicators invites corruption pressures and distorts the system being measured. It’s frequently discussed in education, policing, and healthcare—places where metrics become public proof.

What is Simpson’s paradox, and why should leaders care?

Simpson’s paradox occurs when a trend in aggregate data reverses when you split the data into groups. The UC Berkeley 1973 admissions case is famous: overall admission rates (44% men vs. 35% women) looked like one story, but department-level patterns changed the interpretation. Leaders should segment results before trusting headlines.

Why is “accuracy” a misleading model metric for rare events?

When the event is rare, a model can be highly “accurate” by mostly predicting that nothing happens. What matters instead are error trade-offs: false positives and false negatives, plus their costs. Screening contexts—like those summarized by groups such as the USPSTF—highlight how base rates shape what a positive result means.
