0.95 Doesn't Mean "Cause": Correlation vs. Causation

At the end of the last post I left this question hanging.

Does run differential cause winning percentage, or do they just move together?

We confirmed r = 0.9x across 5 seasons and 150 team-seasons. The number itself isn't going anywhere. But what the number means is a separate problem.

Looking at r = 0.95, can we say "run differential causes wins"?

You've Probably Heard This One

Anyone who's poked around data analysis even a little has seen this sentence.

Correlation is not causation.

It gets quoted so often it sounds like a slogan, but the meaning is serious. Two variables moving together doesn't mean one of them is producing the other.

There are classic examples.

Ice cream sales and drowning incidents have a strong positive correlation. Ice cream doesn't cause drowning. They both go up because of summer.
Shoe size and vocabulary have a strong positive correlation. Bigger feet don't help you learn more words. Adults have bigger feet and bigger vocabularies than kids. Age is dragging both up.

These hidden third variables are called lurking variables (or confounders). Two things look like they're walking hand in hand, but really they're both being pulled by something invisible.
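Here is a minimal sketch of a lurking variable at work. Everything in it is made up for illustration: the temperature range, the coefficients, and the noise levels are assumptions, not real sales or drowning figures. Neither outcome appears in the other's formula, yet the two correlate strongly until temperature is held roughly fixed.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature is the lurking variable.
temperature = [rng.uniform(0, 35) for _ in range(5000)]

# Each outcome depends on temperature plus its own independent noise.
# Neither outcome feeds into the other.
ice_cream_sales = [10 * t + rng.gauss(0, 40) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson_r(ice_cream_sales, drownings)

# Hold the lurking variable roughly fixed: only days between 20 and 22 degrees.
band = [i for i, t in enumerate(temperature) if 20 <= t <= 22]
r_within_band = pearson_r(
    [ice_cream_sales[i] for i in band],
    [drownings[i] for i in band],
)

print(f"r across all days:      {r_raw:.2f}")
print(f"r within a narrow band: {r_within_band:.2f}")
```

Across all days the correlation should come out strong. Inside the narrow temperature band it should mostly vanish, because the thing pulling both variables has been taken away.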

Okay, back to our data.

Could there be a lurking variable behind run diff and win rate?

"Team Quality" Is the Obvious Suspect

There's one obvious candidate: team quality.

Think about it. If a team is good — strong hitters, strong pitchers, solid defense — then naturally two things happen.

Runs scored go up, runs allowed go down → bigger run differential
They win more games → higher win rate

The arrows look like this.

       Team Quality
       ↙          ↘
   Run Diff      Win Rate

Run differential isn't "causing" win rate. They're both children of a common parent called team quality. The r = 0.95 we're seeing might just be two siblings moving in sync because of who their parent is.

Frame it this way and 0.95 starts to look different. It just says "these two move together" — it doesn't say one of them is making the other happen.

But... is that really the whole story?

Baseball Is a Special Case

We have to step into baseball-specific territory here. The lurking-variable logic above fits most data well. But for run diff and win rate, there's something unusual going on.

What does it mean, by definition, to win a baseball game?

Finish the game with more runs than the other team. That normally takes 9 innings, with extra innings when the score is level.

In other words, in any single game, runs scored > runs allowed means you win. That's the rule of the sport. It isn't a statistical observation — it's the rulebook itself.

Now lift that to the season level.

A season is 162 games.
In each of those 162 games, "runs scored > runs allowed that day" gives you a win, otherwise a loss.
Add up runs scored and runs allowed across all 162 games and you get season RS and season RA.
Season run diff = RS − RA = "how much more we scored than the opponent across 162 games."

So season win rate and season run diff can't be two unrelated variables. They're tied together by definition in every single game, and that tie is just being summed up 162 times.
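A rough sketch of that summing-up, assuming a season is stored as a list of (runs scored, runs allowed) pairs. The scores are random placeholders rather than real MLB results, and the variable names are my own. The point is only that win rate and run diff are computed from the same list.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical 162-game season: each game is a (runs_scored, runs_allowed) pair.
# Level scores are re-drawn because a baseball game cannot end in a tie.
games = []
while len(games) < 162:
    scored, allowed = rng.randint(0, 10), rng.randint(0, 10)
    if scored != allowed:
        games.append((scored, allowed))

# Win rate keeps only the sign of each game's margin.
wins = sum(1 for scored, allowed in games if scored > allowed)
win_rate = wins / len(games)

# Run differential keeps the size of each margin and adds it all up.
season_rs = sum(scored for scored, _ in games)
season_ra = sum(allowed for _, allowed in games)
run_diff = season_rs - season_ra

# The same number falls out of summing the per-game margins directly.
margin_sum = sum(scored - allowed for scored, allowed in games)

print(f"win rate: {win_rate:.3f}, run diff: {run_diff:+d}")
```

Both numbers are computed from the single `games` list. Nothing else feeds into either of them.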

This is fundamentally different from the ice-cream-vs-drowning case. Ice cream sales and drowning are separate events that happen to coincide in summer. Runs and wins are two facets of the same event.

So Is It Causation?

Careful here. "Tied by definition" doesn't mean "run diff causes win rate" either.

Here's a more precise framing. Season run diff and season win rate are both summaries of a deeper fact — "how runs played out across 162 games."

Win rate summarizes those 162 games as win/loss binaries. A 6–5 win and a 15–0 win are both "1 win."
Run diff summarizes those same 162 games as a sum of margins. A 6–5 win is +1; a 15–0 win is +15.

Same games, two lenses. That's why they're so tightly coupled. But the two lenses see slightly different information.

Suppose a team wins half its games, always by 1, and loses the other half, always by 10. Its win rate is average; its run diff is brutal.
Flip it: lose by 1, win by 10. Same average win rate, huge positive run diff.

These teams are exactly where the two lenses disagree. That's why r is 0.9x and not exactly 1.0. The shortfall from 1.0, about 0.05, is the "win rate is sensitive to how runs are distributed across games; run diff isn't" part.
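The two lopsided teams above can be worked through in a few lines. This assumes each team splits a 162-game season evenly at 81 wins and 81 losses. The `summarize` helper and the team names are made up for the example.

```python
def summarize(margins):
    """Return (win_rate, run_diff) for a list of per-game run margins."""
    wins = sum(1 for m in margins if m > 0)
    return wins / len(margins), sum(margins)

# Assumption: both made-up teams go 81-81 over 162 games.
narrow_wins_big_losses = [+1] * 81 + [-10] * 81
big_wins_narrow_losses = [+10] * 81 + [-1] * 81

team_a = summarize(narrow_wins_big_losses)
team_b = summarize(big_wins_narrow_losses)

print("narrow wins, big losses:", team_a)
print("big wins, narrow losses:", team_b)
```

Both teams land on a 0.5 win rate, while their run diffs sit at −729 and +729. No real team is this extreme, but the same effect at a smaller scale is what keeps r below 1.0.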

Re-reading 0.95

Let's pin down how to read r = 0.95.

❌ Wrong: "Run diff causes wins. Increase run diff by N and win rate goes up by Y."

❌ Also wrong: "These two numbers just happen to dance together by coincidence."

✅ Right: "Run diff and win rate are two summaries of the same set of game outcomes. Knowing one lets you predict the other almost exactly. The 'almost', the gap between 0.95 and a perfect 1.0, comes from how the runs were distributed across games."

This distinction sounds nitpicky, but it matters when you actually do analysis. If someone says "let's design a strategy to grow our run differential," that's the wrong prescription. Run differential isn't a target to maximize — it's a byproduct of playing good baseball.

A loose analogy: r = 0.95 here is like the relationship between height and weight. They move together strongly, but "to get taller, gain weight" is not the takeaway.

What the Data Doesn't Tell You

Worth pausing here to mark a limit of data analysis.

We computed r from 5 seasons and 150 points. The number is precise. But that number alone can't tell you why the relationship exists. To get to "why," you have to know what kind of game baseball is and how a single game gets decided.

This is the typical shape of data analysis.

Data tells you "what moves with what."
Domain knowledge tells you "why it moves together."

Either one alone is dangerous. Data without domain knowledge concludes that ice cream causes drowning. Domain knowledge without data ends in "I just have a feeling that..." You need both for it to be analysis.

The Next Question

So far we've looked at run diff and win rate as just these two numbers. But as mentioned in passing above, the correlation between them falls a little short of 1.0.

Two teams can have the same run diff but different win rates.
The difference comes from how they distributed the same runs across games.
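A toy illustration of those two lines, using a made-up six-game stretch rather than real data. Both versions have identical runs scored and the same set of runs-allowed values. Only the pairing of scores to games changes.

```python
def win_rate(scored, allowed):
    """Fraction of games in which runs scored exceed runs allowed."""
    wins = sum(1 for s, a in zip(scored, allowed) if s > a)
    return wins / len(scored)

# Hypothetical per-game runs scored, identical in both versions.
runs_scored = [1, 5, 9, 3, 7, 2]

# Same six runs-allowed values, matched to different games.
allowed_a = [2, 4, 8, 2, 6, 3]
allowed_b = [8, 6, 2, 4, 2, 3]

run_diff_a = sum(runs_scored) - sum(allowed_a)
run_diff_b = sum(runs_scored) - sum(allowed_b)

rate_a = win_rate(runs_scored, allowed_a)
rate_b = win_rate(runs_scored, allowed_b)

print(f"version A: run diff {run_diff_a:+d}, win rate {rate_a:.3f}")
print(f"version B: run diff {run_diff_b:+d}, win rate {rate_b:.3f}")
```

Both versions finish at +2, yet A wins four of six and B wins two of six. Run diff cannot see the difference; win rate can.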

This naturally raises the next question.

If run differential alone leaves something on the table, is there a better formula for win rate?

There already is one in the baseball stats world, and it has a slightly grandiose name: the Pythagorean Expectation. In the next post I'll plug it into our data and see how well it actually fits.

This analysis was done with just-mlb, a tool I built. Look at the 5-season scatter, see the r = 0.95 for yourself, and try to spot where the gap from a perfect 1.0 lives between the points.
