Saturday 24 October 2015

Palmer on DiPS

Pete Palmer was interviewed in April of this year, and was asked about Voros McCracken's Defence-Independent Pitching Statistics. Palmer is arguably the most important sabermetrician OF ALL TIME. Certainly his only rival is going to be Bill James, so reading Palmer's comments on DiPS theory, which James himself regarded as important, makes an interesting comparison and contrast.

James referred to McCracken's work in the New Bill James Historical Baseball Abstract (p 885):

3. This knowledge is significant, very useful. 4. I feel stupid for not having realised this thirty years ago.
Palmer, however, has a very different take on the matter.
I didn’t have a lot of faith in [DiPS]....[McCracken] said there wasn’t a great amount of correlation from season to season. But as I said, the variations due to chance and everything in sports, baseball in particular, is a lot higher than people think. Your average could drop 60 points from one year to the next, and it’s not really statistically significant because 500 at-bats isn’t that many at-bats to verify what your current batting average should be.
Whether this opinion is rooted in statistical analysis or not, it does conform somewhat with the analysis provided in "Solving DiPS", a compilation of an on-line discussion which you can find a copy of here. One key finding in "Solving DiPS" is that, given 700 Balls in Play, some 44 per cent of the variation in outcomes is a consequence of random variance, the single largest factor. (Pitchers were assigned 28 per cent, fielding 17 per cent. Hold that thought for a moment.)
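
Palmer's remark about 500 at-bats can be put in rough numbers. The sketch below treats every at-bat (or ball in play) as an independent trial with a fixed chance of a hit. That is my simplifying assumption, not something Palmer or "Solving DiPS" spells out, and the true rates of .270 and .300 are illustrative values only.

```python
import math


def binomial_sd(true_rate, trials):
    """Standard deviation of an observed rate over `trials` independent tries."""
    return math.sqrt(true_rate * (1.0 - true_rate) / trials)


def season_to_season_sd(true_rate, trials):
    """SD of the gap between two independent seasons of equal length."""
    return math.sqrt(2.0) * binomial_sd(true_rate, trials)


# Illustrative assumptions: a true .270 hitter over 500 at-bats,
# and a true .300 BABIP over 700 balls in play.
avg_sd = binomial_sd(0.270, 500)
gap_sd = season_to_season_sd(0.270, 500)
babip_sd = binomial_sd(0.300, 700)

print(f"batting average, one season:    +/- {avg_sd:.4f}")
print(f"batting average, year-on-year:  +/- {gap_sd:.4f}")
print(f"BABIP over 700 balls in play:   +/- {babip_sd:.4f}")
```

Under these assumptions, chance alone moves a batting average by about 20 points over 500 at-bats, the gap between two such seasons by about 28 points, and a pitcher's BABIP over 700 balls in play by about 17 points.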

I have seen it suggested that Palmer does not understand DiPS, which has become a tool for projecting a pitcher's future. But from the perspective of evaluating a pitcher's season, Palmer's lack of "faith" makes more sense. BABIP's variance is irrelevant, because it is in the nature of the game. What is important is to convert extra-base hits into singles, and singles into outs.

When you don’t look at walks and strikeouts and home runs, you’re actually minimizing a difference between a good pitcher and a bad pitcher. And therefore, the gap in that category is going to be artificially low because some of the factors that would make it higher are not counted.
In other words, we shouldn't be surprised that pitchers appear to have limited or no control over the outcomes of balls in play. That has never been where the difference has been visible in the small sample size of a single season.

Finally, to return to those percentages from "Solving DiPS", what might be surprising, given the traditional reception of DiPS, is that pitchers have more control over the outcome than fielders. So, again, perhaps we should be a bit more sceptical, like Pete Palmer, of those making grandiose claims for DiPS. Insofar as anything has control over the outcome of the batted ball, it is the pitcher. Random variance is by its nature uncontrolled.

Thursday 22 October 2015

Oh, the Humanities

As I tweeted a week or so ago, this was a good season for the part of me that is a Tigers fan to miss. I have been dealing with a return of my wife's cancer (the outlook is not great but, as the last lines of the original theatrical release of Blade Runner go, "I didn't know how long we had together... Who does?"), in addition to moving house (and changing countries). However, I accumulated a few bookmarks and other ideas to work through, especially now that we can only watch other teams in the post-season.

While I was busy, a very important blog post was made back in May. Phil Birnbaum, who is nothing if not insightful in writing about sabermetrics, announced that dWAR, a measure of fielding value, seemed to him to have a significant problem. Birnbaum proposed that dWAR inherently overvalued fielding. Birnbaum's argument is rooted in mathematical accuracy, so I don't feel confident trying to explain it. If you haven't read the post already, you should go to his blog to read how he explains it.

However, his explanation boils down to three key points, if we focus on the effects:

a) The runs allocated to the fielders under dWAR are too high, by something on the order of fifty per cent. (So a team dWAR of -40 is actually more like -20.)

b) The cause of this is that when one assumes "certain balls in play are the same" (as one has to do with older baseball statistics) then the math sends all the credit to the fielders.

c) "Observations are a combination of talent and luck. If you want to divide the observed balls in play into observed pitching and observed fielding, you're also going to have to divide the luck properly."
Here, I think, we run into the problem of "All things being equal", or the distinction that the philosopher of history R.G. Collingwood made between meteorology and chemistry. It is an essential fact of human life that all things are NOT equal. People working in meteorology can collect observations of events, but cannot reproduce them at will, unlike people working in chemistry. Likewise, the historian can observe events, but they cannot create political or social crises at will, nor send qualified observers back into the past in order to collect the information needed to understand those events in the way scientists might send an expedition to view an eclipse or collect specimens. In scoring a baseball game, at best a sabermetrician can be a weatherman.

One can take issue with the statement "the assassination of Archduke Franz Ferdinand of Austria on 28 June 1914 triggered the First World War" as one of causality, but without doubt the shooting set off a diplomatic crisis that led to the war. More importantly, luck played a crucial role in the event because the Archduke's car came to a complete halt very close to where the "Yugoslav nationalist" Gavrilo Princip had stationed himself. An earlier attempt to kill the Archduke in a moving car had failed. We have no idea whether Princip could have been successful if his targets had been in a moving car. So, what percentage of responsibility for the war do we assign to Princip, to the driver, to the governor of Bosnia at whose orders the driver stopped, to the Serbian officers who conspired to arm Princip, to the Archduke or to the general diplomatic situation? Any formula that did allocate "responsibility shares" to these people would be essentially an act of faith.

Birnbaum went on to add some further details to his understanding in a thread on the blog of Tom Tango, the tremendously influential pseudonymous saberist. In the comments section of Tango's thread on the post by Birnbaum, Birnbaum suggested in one reply that it was just not possible for a system like Defensive Runs Saved or Ultimate Zone Rating to make distinctions about balls in play that could tell us something about the skill of the fielder. But before that he stated that he wanted to assign the luck to the pitcher. However, to read the comments there is to venture into a world where something like the Responsibility Shares is thought to be possible. Possibly, with enough computing power, such things can be made for evaluating baseball players. But I can't help but think the effect will be small.

To boil Birnbaum's position down, what he thinks is that about half of the dWAR effects at the team level need to be transferred from the fielders to the pitcher. Another way to think about it is that he wants a cap on the amount of Runs Allowed value distributed to the fielders. But this would also have effects on how we value players. A quick-and-dirty method would be to halve the UZR assigned to any player when calculating their WAR, although I suspect Birnbaum would object on the grounds that something true at the team level may not be true at the level of the individual player.
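
For what it is worth, the quick-and-dirty method might look something like this. Everything in the sketch is a hypothetical illustration: the function name, the split of a player's value into fielding runs and "everything else", and the ten-runs-per-win conversion are my assumptions, not how any published WAR is actually calculated.

```python
RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion, not an official constant


def war_with_scaled_fielding(other_runs, uzr_runs, fielding_share=0.5,
                             runs_per_win=RUNS_PER_WIN):
    """Return WAR after crediting only `fielding_share` of the UZR runs.

    other_runs: runs above replacement from everything except fielding
    uzr_runs:   fielding runs as reported by UZR (may be negative)
    """
    return (other_runs + fielding_share * uzr_runs) / runs_per_win


# Hypothetical player: 30 runs of non-fielding value and +16 runs by UZR.
full_credit = war_with_scaled_fielding(30.0, 16.0, fielding_share=1.0)
half_credit = war_with_scaled_fielding(30.0, 16.0)

print(f"WAR with full UZR:   {full_credit:.1f}")
print(f"WAR with halved UZR: {half_credit:.1f}")
```

The same scaling cuts both ways: a poor fielder's negative UZR is halved too, so his WAR goes up rather than down.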