Caught Looking: Analyzing Variation in Umpire Strike Zones

Abstract

As technology has advanced, Major League Baseball (MLB) has faced increasing pressure from fans, coaches, and players to use video technology to aid umpires in making calls on the field, especially the notoriously subjective ball and strike calls. In this project, we assess the ability of umpires to make ball and strike calls that match the rulebook and are consistent across game situations. Using nonlinear classification methods such as kernel linear regression and support vector machines, we learn a strike zone for each umpire based on pitch location and game circumstances. After learning a strike zone classifier for each combination of game situation and umpire, we use kernel PCA to create a low-dimensional encoding of the strike zones that can be used for inference. We then perform multiple analysis of variance and mixed-effects multivariate regression on the principal components to determine which factors have a statistically significant effect on an umpire’s strike zone. Finally, we compute a ranking of the umpires and compare our top umpires with those featured on other lists.
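The pipeline summarized above (learn one strike zone classifier per umpire, then encode the learned zones with kernel PCA) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic, the helper `simulate_calls` is hypothetical, scikit-learn's `SVC` and `KernelPCA` stand in for the classifiers and encoding named in the abstract, and the hyperparameters and the 20 x 20 evaluation grid are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)

def simulate_calls(half_width, n=400):
    """Hypothetical data: pitch (x, z) locations in feet and calls (1 = strike, 0 = ball)."""
    xz = rng.uniform([-2.0, 0.5], [2.0, 4.5], size=(n, 2))
    in_zone = (np.abs(xz[:, 0]) < half_width) & (xz[:, 1] > 1.5) & (xz[:, 1] < 3.5)
    flip = rng.random(n) < 0.05  # flip about 5% of calls to mimic noisy umpiring
    return xz, np.where(flip, ~in_zone, in_zone).astype(int)

# Fixed grid of pitch locations on which every learned zone is evaluated.
gx, gz = np.meshgrid(np.linspace(-2.0, 2.0, 20), np.linspace(0.5, 4.5, 20))
grid = np.column_stack([gx.ravel(), gz.ravel()])

# One synthetic "umpire" per zone half-width (an assumed source of variation).
half_widths = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
zones = []
for hw in half_widths:
    X, y = simulate_calls(hw)
    clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
    # Represent the learned zone by the classifier's decision values on the grid.
    zones.append(clf.decision_function(grid))
zones = np.vstack(zones)  # shape: (number of umpires, number of grid points)

# Kernel PCA compresses each zone to two components for downstream inference.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-3)
codes = kpca.fit_transform(zones)
print(codes.shape)
```

Each row of `codes` is a two-dimensional summary of one umpire's zone. In a full analysis, vectors like these would be the responses in the variance and regression models mentioned above, with one row per umpire and game situation instead of one row per umpire.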