
Can We Put a Number on "Related"?

(edited: April 13, 2026)

In the last post, I made a scatter plot.

Run differential on the X-axis, winning percentage on the Y-axis. The dots flowed from bottom-left to top-right. The feeling was clear: "these are related."

But that's just a feeling.

"Looks related" means different things to different people. One person sees a strong trend, another sees too much scatter. Same chart, different conclusions.

Can we express how strong this relationship is with a single number?


The Limits of Eyeballing

Imagine comparing two scatter plots:

  • A: dots sit almost perfectly on a straight line
  • B: dots trend upward, but they're pretty spread out

You can tell A shows a "stronger" relationship. But you can't answer:

  • How much stronger is A than B?
  • Is B's relationship "weak" or "moderate"?
  • Does this relationship get stronger or weaker across seasons?

To compare, track, and judge, you need numbers, not eyes.


A Number Called r

There's a number for exactly this. The correlation coefficient, usually written as r.

Skipping the formula, here's what matters:

  • r = 1 → dots form a perfect upward line. One goes up, the other always goes up too.
  • r = -1 → perfect downward line. One goes up, the other always goes down.
  • r = 0 → no linear relationship. Dots scattered with no upward or downward trend.
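
For readers who do want the formula, here is a minimal sketch of the standard Pearson correlation in plain Python. The function name pearson_r and the toy lists are made up for illustration; they are not MLB data and this is not necessarily how just-mlb computes it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How x and y move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable spreads around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Toy data (illustrative only, not MLB numbers).
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # perfect upward line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # perfect downward line
print(pearson_r(x, [2, 1, 4, 1, 2]))   # no linear trend
```

On Python 3.10 or newer, statistics.correlation from the standard library computes the same quantity.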

In reality, r is rarely exactly 1, -1, or 0. It falls somewhere in between. So we need a rough scale for its size (the sign only tells you the direction):

r value      Interpretation
0.9+         Very strong
0.7 – 0.9    Strong
0.4 – 0.7    Moderate
0.2 – 0.4    Weak
Below 0.2    Negligible
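
The scale above can be turned into a small lookup. This is only a sketch: the function name describe_strength is hypothetical, and two choices are my assumptions rather than anything the table states. It looks at the absolute value of r, and it puts a boundary value such as 0.9 in the higher band.

```python
def describe_strength(r):
    """Map a correlation coefficient to the rule-of-thumb labels in the table."""
    size = abs(r)  # assumption: the sign gives direction, the magnitude gives strength
    # Assumption: a value exactly on a boundary goes to the higher band.
    if size >= 0.9:
        return "Very strong"
    if size >= 0.7:
        return "Strong"
    if size >= 0.4:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Negligible"

for value in (0.95, 0.75, -0.9, 0.1):
    print(value, describe_strength(value))
```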

So What's the r for Run Differential vs. Win%?

I calculated it with 2025 MLB data — all 30 teams.


Result:

r ≈ 0.95 — Very strong positive correlation

0.95. Almost 1.

That gut feeling from the scatter plot wasn't wrong. Not only is there a relationship — it's a very strong one. Teams with a big run differential almost always have a high winning percentage.


What About Other Combinations?

Is it only run differential that correlates strongly with winning percentage?


  • Wins (W) vs. Losses (L): r around -0.9. Obviously — more wins means fewer losses.
  • Runs Scored (R) vs. Win% (Pct): r in the 0.7–0.8 range. Scoring runs matters, but not as much as run differential.

Run differential tracks winning percentage more closely than runs scored alone. It's not just about offense: the gap between runs scored and runs allowed is what counts. The numbers back that up.


But Is 30 Teams Enough?

Here's the nagging question.

We only have 30 data points. One season, 30 teams. Would r still be 0.95 for the 2024 season? For 2023?

Drawing a strong conclusion from 30 points feels a bit premature. Maybe 2025 is just an outlier year.

We need more data to be sure.
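
One rough way to gauge how much 30 points can pin down r is an approximate confidence interval based on the Fisher z-transform. This is a technique I'm bringing in for illustration, not something from the original analysis. It assumes the 30 teams behave like a random sample from a roughly bivariate-normal population, which is debatable here. The function name fisher_interval is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need more than 3 data points")
    z = math.atanh(r)          # move r onto an approximately normal scale
    se = 1 / math.sqrt(n - 3)  # standard error on that scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

low, high = fisher_interval(0.95, 30)
print(f"Approximate 95% interval for r: {low:.2f} to {high:.2f}")
```

Under those assumptions the interval stays high, but it only reflects sampling noise within one season. It says nothing about whether the relationship holds in other years, which still takes more seasons of data.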

→ In the next post, I'll combine multiple seasons and check whether this relationship holds year after year — or if it's just a one-season fluke.


This analysis was done with just-mlb, a tool I built. Want to see the correlation coefficient on the scatter plot yourself? Check it out.
