Friday, October 9, 2020

Umpire Plate Scores - Comparing Scoring Methods

When we introduced UEFL f/x 3.0, we spoke of the plate score tool's transparency, accuracy, timeliness, and thoroughness for ball/strike umpire analysis. One question we received asks why numbers from another ump scorecard source don't match ours.

To begin, there exist roughly two main methods for strike zone analysis. One is to let MLB's Baseball Savant make all determinations as to whether a given pitch is located within or outside of the strike zone; the other is to parse the raw data and calculate a probable ball/strike status for each pitch.

Boston University/UmpScores
The BU version, made famous by Mark T. Williams' umpire study, employs the former method, which is more convenient and less processor-intensive for a tremendously large set of data points. As we found in 2019, this Baseball Savant scraping method relies on a variable known simply as "zone," which does exactly what it sounds like: it assigns a single-digit number (1-9) to a pitch located within the strike zone and a double-digit number (11-14) to a pitch located outside the zone. Much like a MIDI file sans percussion from the 1990s, track 10 isn't used.

Because the zone variable is itself derived from other variables, this approach doesn't work with measured data directly; it relies on a number already processed on Savant's terms. Not great for the scientific method, but a lot easier: simply check whether an umpire's ball/strike call matches Savant's zone designation of 11-14 (ball) or 1-9 (strike). Thus, a ball call on a pitch with zone 14 would be deemed Correct, and a ball call on a pitch with zone 9 would be Incorrect. Simple.
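The zone-field method described above can be sketched in a few lines. This is a minimal illustration, not BU's actual code; it assumes a pitch record carries Statcast's `zone` number and the umpire's call as a string.

```python
def call_matches_zone(zone: int, umpire_call: str) -> bool:
    """Return True if the umpire's call agrees with Savant's zone number.

    Statcast zones 1-9 are inside the strike zone; 11-14 are outside.
    Zone 10 is unused (like MIDI channel 10 reserved for percussion).
    """
    in_zone = 1 <= zone <= 9
    return in_zone == (umpire_call == "strike")

# The examples from the text: a ball call on zone 14 is Correct,
# while a ball call on zone 9 is Incorrect.
print(call_matches_zone(14, "ball"))  # True (correct call)
print(call_matches_zone(9, "ball"))   # False (incorrect call)
```

Note how every decision here defers to Savant's pre-computed zone number; the method never touches a measured coordinate.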

UEFL f/x and UmpireScorecards appear to share the second of these data collection methods, analyzing the raw variables px, pz, sz_bot, and sz_top.

The difference between UEFL f/x's tri-score report and Scorecards' or Auditor's single, simplified report lies primarily in how calculations are made after collecting the raw data.

Whereas UEFL f/x runs each pitch through three different analyses for its three different scores, accounting for a certain measure of error and for the physical properties of a baseball (e.g., as we've stated since the beginning, Auditor artificially inflates umpire "missed call" statistics by failing to acknowledge that a baseball has a radius), the other systems return only a single number for what we call ML Public, a zero-error system.
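The difference between a zero-error check and a radius-aware check can be sketched as follows. This is an illustrative sketch, not any system's actual implementation; it uses Statcast's raw fields (px and pz in feet, measuring the ball's center, plus the batter's sz_bot/sz_top), a 17-inch plate per OBR 2.02, and the midpoint of the 9-9.25 inch ball circumference from OBR 3.01.

```python
import math

PLATE_HALF_WIDTH_FT = 17 / 2 / 12              # 17-inch plate (OBR 2.02), px is in feet
BALL_RADIUS_FT = 9.125 / (2 * math.pi) / 12    # midpoint of 9-9.25" circumference (OBR 3.01)

def in_zone_zero_error(px, pz, sz_bot, sz_top):
    """Zero-error ('ML Public'-style) check: the ball's CENTER must be inside the zone."""
    return abs(px) <= PLATE_HALF_WIDTH_FT and sz_bot <= pz <= sz_top

def in_zone_with_radius(px, pz, sz_bot, sz_top):
    """Radius-aware check: any part of the ball touching the zone counts as a strike."""
    return (abs(px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
            and sz_bot - BALL_RADIUS_FT <= pz <= sz_top + BALL_RADIUS_FT)

# A hypothetical edge pitch: center 0.75 ft off the plate's midline, belt-high.
# The zero-error check calls it a ball; crediting the ball's radius makes it a strike.
print(in_zone_zero_error(0.75, 2.5, 1.5, 3.5))   # False
print(in_zone_with_radius(0.75, 2.5, 1.5, 3.5))  # True
```

The same pitch grades as a miss under one method and a correct call under the other, which is exactly why a zero-error system inflates an umpire's "missed call" count on edge pitches.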

Scorecards, meanwhile, took to creating graphics representing what it calls "Worst Missed Calls in terms of change in run expectancy" (yes, there really is a significant font-size difference, and that's part of the issue: the tool is designed to measure changes in run expectancy first and treats umpire accuracy determination as something of an afterthought). This proved problematic when, for Angel Hernandez's October 7, 2020 game, Scorecards claimed the worst missed call of the game (see above), labeled Pitch 1, was a ball call that...well...looks like a correct call, drawing several critical replies, such as "1 is a ball all day long. But we are counting that as 'inaccurate'?"

This is what a zero-error (ML Public) system does: it creates zero allowance on the edges. Yet perhaps, thanks to these graphics, it becomes easier to comprehend why ML Public scores tend to trend lower than UEFL f/x scores (beyond the margin-of-error issue): computers, though they act entirely logically, can seem decidedly inhuman in how robotically they apply criteria.

The other issue pertains to Scorecards' admission that it does not know where to draw the line at the edge of the strike zone, nor how Savant graphs its strike zones (e.g., that a full game's plot isn't normalized, dating back to the Brooks Baseball days of strike zone graphics). That, too, is problematic for large-scale analysis built on very precise numbers: baseball is colloquially a game of inches, but in the 21st century it has become a game of mere fractions.

This also explains the difference between Scorecards' numbers and our ML Public numbers: the values used for the plate edges, as well as for the radius of the baseball, might differ, which would yield different plate scores from conceivably similar zero-error methods. ML Public, for instance, calculates the radius from the circumference specified in Official Baseball Rule 3.01 (Auditor's favorite rule to ignore), and the plate width from a combination of OBR 2.02 and the Definition of Terms [Strike Zone].
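Just how much those constant choices matter can be shown with a little arithmetic. This sketch (our illustration, not any system's published code) derives the radius from each end of OBR 3.01's allowed 9-9.25 inch circumference and shows a pitch whose call flips depending on which value a system picked.

```python
import math

PLATE_HALF_WIDTH_FT = 17 / 2 / 12   # 17-inch plate (OBR 2.02); px is measured in feet

def edge_boundary_ft(circumference_in: float) -> float:
    """Outer horizontal strike boundary when the ball's radius is credited to the zone."""
    radius_ft = circumference_in / (2 * math.pi) / 12   # radius = circumference / 2*pi
    return PLATE_HALF_WIDTH_FT + radius_ft

lo = edge_boundary_ft(9.0)    # ~0.8277 ft, smallest legal ball
hi = edge_boundary_ft(9.25)   # ~0.8310 ft, largest legal ball

# A hypothetical pitch at px = 0.829 ft is a ball under one constant and a
# strike under the other -- a 0.04-inch difference in assumed radius flips the call.
px = 0.829
print(abs(px) <= lo, abs(px) <= hi)   # False True
```

A game of mere fractions, indeed: two zero-error systems agreeing on method but not constants will still disagree on exactly these edge pitches.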

Apparently, the pressure might be getting to Scorecards as evidenced by Friday morning's tweet, "I KNOW BALLS 1 AND 2 ARE VERY CLOSE. Please do not yell at me."

Must not be an umpire.

Video as follows:

