Monday, February 4, 2019

Overturned 49% by Replay, Umpires Still 99.5% Accurate

When we investigated the 49.3% of umpire calls shipped to video review that have been overturned (4,046 out of 8,212) since expanded replay's 2014 inception, we found that umpires were 99.5% accurate. With MLB overturning umpires at a nearly 50-50 clip, how do we reconcile a documented 49.3% overturn rate with 99.5% replay-eligible accuracy? Are umpires getting better or worse, and what does 49.3%, or 4,046, really signify?

Today's edition of Ask the UEFL is stats-based and comes from Sam Ryan's question, which represents what quite a few baseball fans ordinarily think upon seeing that umpires are overturned at a rate of 49%:
Would any expert on this site mind commenting on a statistic that I find shocking? Ever since instant replay was installed, umpires have been adjudicated wrong an average of 48% of the time!! I am NOT trying to start an argument. It's a sincere question. I know I couldn't keep my job with that record, and I don't quite understand how MLB umps keep theirs. I make plenty of mistakes in my job.....but if I was proven wrong half the time I assure you my Board of Directors would hand me a pink slip.
Definitions: To discuss overturn rates, we must define the term "overturn rate" itself. It's vitally important to know that a 48 or 49% overturn rate isolates only those calls that fit several criteria. For one, these calls tend to be bang-bang decisions, though they also include those once described by Tony La Russa and others as "the obvious miss." We also must define "the time" in Sam's statement, "...proven wrong half the time." As we will discuss below, "the time" does not refer to an entire regulation game or season, but instead to a very small part of some games that corresponds to less than one third of one inning.
Related Post: Replay History - Overturned Calls Outnumber Upheld (7/27/16).

Graphic: Umpire Overturn Rate.
Second, the calls selected for review are subject to bias. With an approximate ratio of nine Manager's Challenges to one Crew Chief Review (in 2018 it was 146 Crew Chief Reviews out of 1,398 total reviews), we see that approximately nine of every ten reviews depend on a team deciding whether to challenge a call based on confounding variables, such as (a) whether an overturn will help the team, or (b) whether the team has a chance of winning the challenge.

This is different from, for instance, ESPN's 2010 study, which led off with the headline that umpires err on 20% of their non-ball/strike close calls. The ESPN study did not filter out calls that (a) failed to benefit a given team, or (b) were correct to begin with; however, the study did fall into the trap of excluding calls that weren't defined as "close," pursuant to a subjective framework.

Yet accounting for these biases still falls glaringly short of the fact that umpires do much more than adjudicate close plays. We have no publicly available metric to rule on issues such as umpire positioning or game management, but even so, we still have many other calls that occur every game that don't wind up under review.

And therein lies the difference between an umpire's overturn rate and an umpire's accuracy rate.

Overturn Rate is based only on calls that go to replay, which sometimes spur controversy and occasionally result in ejection when New York Replay HQ reaches a decision that fails to satisfy one team or another; ejections over upheld or overturned replay rulings still exist. In a career such as Sam describes, these are the few decisions one makes on the job that are bound to upset someone, whether right or wrong.
Related Post: MLB Ejections 173-174 - Joe West (3-4; CWS x2) (9/22/18).

If reviewed, this call is computed for both rates.
If not reviewed, it is only part of accuracy rate.
Accuracy Rate includes all calls eligible for replay, whether or not they actually go to video review. They include all aforementioned close calls that serve as the basis for the overturn rate as well as less-controversial rulings and similar "easy" declarations. In a career such as Sam describes, these are the bulk of the decisions one makes, including the no-brainers as well as the controversial ones that just might be wrong.
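To make the distinction between the two rates concrete, here is a minimal sketch in Python. The `Call` record, its field names, and the sample numbers are all hypothetical and invented purely for illustration (they are not real MLB data), and the sketch assumes for simplicity that every un-reviewed call was correct and that every reviewed-and-incorrect call was overturned.

```python
from dataclasses import dataclass


@dataclass
class Call:
    correct: bool   # was the on-field ruling right? (hypothetical field)
    reviewed: bool  # did the call go to Replay Review? (hypothetical field)


def overturn_rate(calls):
    """Share of *reviewed* calls that were wrong (i.e., overturned)."""
    reviewed = [c for c in calls if c.reviewed]
    return sum(not c.correct for c in reviewed) / len(reviewed)


def accuracy_rate(calls):
    """Share of *all* replay-eligible calls that were right."""
    return sum(c.correct for c in calls) / len(calls)


# Invented toy sample: 200 replay-eligible calls, 2 reviewed, 1 overturned.
calls = [Call(correct=True, reviewed=False)] * 198
calls += [Call(correct=True, reviewed=True), Call(correct=False, reviewed=True)]

print(f"overturn rate: {overturn_rate(calls):.1%}")
print(f"accuracy rate: {accuracy_rate(calls):.1%}")
```

The same single missed call drives both numbers; the only thing that changes is the denominator (reviewed calls versus all replay-eligible calls).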

For instance, while ESPN's 2010 study highlighted the 20% miss statistic front-and-center, it failed to fully extrapolate the data, which found that umpires miss one call for every 220 chances, which corresponds to an overall accuracy rate of 99.55%. See the trend? 20% would correspond to a pre-replay version of an overturn rate, while 99.55% was umpiring's overall accuracy rate.
Related Post: Berating Officials: The Grand New American Pastime? (10/10/12).
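The conversion from "one miss for every 220 chances" to an accuracy percentage can be checked with a few lines of Python. The function name below is our own (hypothetical), and the only input is the 220 figure cited above.

```python
def accuracy_from_miss_interval(chances_per_miss: float) -> float:
    """Convert 'one miss per N chances' into an accuracy fraction."""
    return 1 - 1 / chances_per_miss


# ESPN's 2010 figure as cited above: one missed call per 220 chances.
espn_accuracy = accuracy_from_miss_interval(220)
print(f"{espn_accuracy:.2%}")
```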

Numerical Analysis: We experienced 1,356 regular season replay events in 2018, which, over a 2,430-game season, is one review for every 1.79 games played (or 0.559 reviews per game). Of these, umpires were overturned 666 times. 666 overturns over 2,430 games corresponds to one overturned call for every 3.649 games played.
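For readers who want to reproduce the per-game figures, here is a short Python sketch using only the 2018 totals quoted above; the variable names are ours. One caveat: dividing 1,356 by 2,430 directly gives roughly 0.558, so the 0.559 quoted above appears to come from inverting the rounded 1.79. The difference is too small to change anything that follows.

```python
# 2018 regular-season totals quoted in the article.
GAMES = 2430
REVIEWS = 1356
OVERTURNS = 666

games_per_review = GAMES / REVIEWS      # about 1.79 games per review
reviews_per_game = REVIEWS / GAMES      # about 0.558 (article rounds via 1 / 1.79)
games_per_overturn = GAMES / OVERTURNS  # about 3.649 games per overturned call

print(f"games per review:   {games_per_review:.2f}")
print(f"reviews per game:   {reviews_per_game:.3f}")
print(f"games per overturn: {games_per_overturn:.3f}")
```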

Plays like these are only counted once in the #s.
Assuming umpires adjudicate a bare minimum of 52 decisions per game (51 outs [8½ innings] + 1 run-producing play), we find that 0.559 reviews-per-game equates to 0.559 reviews-per-52 decisions.* Applying a 49.3% overturn rate to our 0.559 reviews-per-game yields a result of 0.276 overturns-per-game. 0.276 overturns for every 52 decisions equates to one overturn for every 188 decisions, or an overall accuracy rate of 99.47% for MLB umpires on replay-eligible calls.
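The bare-minimum arithmetic above can be sketched in Python as follows. The constants are the article's own figures, the variable names are ours, and the intermediate rounding to three decimals mirrors the rounding used in the text (without it, the "one per N decisions" figure lands slightly above 188, while the 99.47% result is unchanged).

```python
DECISIONS_PER_GAME = 52   # 51 outs (8.5 innings) + 1 run-producing play
REVIEWS_PER_GAME = 0.559  # figure used in the article
OVERTURN_RATE = 0.493     # 4,046 overturns out of 8,212 reviews

# Round to three decimals, as the article does, before carrying forward.
overturns_per_game = round(REVIEWS_PER_GAME * OVERTURN_RATE, 3)

# How many decisions pass, on average, between overturned calls.
decisions_per_overturn = DECISIONS_PER_GAME / overturns_per_game

# Floor on accuracy: treats every non-overturned decision as correct.
accuracy = 1 - overturns_per_game / DECISIONS_PER_GAME

print(f"overturns per game: {overturns_per_game}")
print(f"one overturn per {int(decisions_per_overturn)} decisions")
print(f"minimum accuracy: {accuracy:.2%}")
```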

Remember, 99.47% is a bare minimum rate and the real accuracy rate is likely higher. For instance, HP Umpire David Rackley's six calls in six seconds from a Houston walk-off in July forced the umpire to make at least six decisions on no fewer than six different rules during one play. In the end, the bare minimum methodology would count this as just one decision, if even that: the game likely already saw 52 decisions, which means Rackley's rapid response was more than likely ignored entirely under the bare minimum method. (Even so, none of these decisions were replay-eligible to begin with.)
Related Post: Wild Walk-Off - Analysis of 6 Rules for 1 Play in Houston (7/11/18).

Conclusion: In 2010, ESPN found that umpires were 99.55% accurate on all non-ball/strike (incl. check swing) calls. In 2019, we found that umpires are at least 99.47% accurate on all replay-eligible decisions. As a drop of 0.08 percentage points is not statistically significant, especially given that 99.47% is a minimum value, we cannot conclude that umpires are worse than they were at the decade's start (nor can we conclude that they are better). If anything, we would hypothesize that technology is improving, meaning that it is easier now than it was even 10 years ago to discern an incorrect call.

Overall, 49.3% as an overturn rate, while factually correct, can be misleading if taken out of context. As the earlier pie chart and bar graphic indicate, the 49.3% figure is taken from such a relatively small part of the game that it is rather akin to picking apart one minute out of a three-hour movie (at one review for every 1.79 games played, it's more like one minute out of a five-hour, 22-minute film) and performing the analysis from there.

Nonetheless, with the league staff as a whole erring on approximately one decision every 188 chances [2014-18]—or once every 3.649 games [2018]—we conclude that just like Sam Ryan, umpires do make mistakes on the job, but unlike Sam's initial assertion, umpires are not proven wrong "half the time"—unless "half the time" applies strictly to those 0.559 times per game that Replay Review is invoked.

*There's no perfect way of figuring out how many replay-eligible decisions or calls an umpire makes per game. 52-per-game assumes every batter is retired in order through the top of the ninth and then the first batter in the bottom of the frame wins it for the home team in walk-off fashion. This fails to account for non-replay-eligible rulings (such as strikeouts, air outs on the infield, interference, etc.), but also assumes all 2,430 regular season MLB games experience the bare minimum of batters. We take both failures on opposite sides of the spectrum and, very roughly, cancel them out to produce the one-per-188 figure.

