De civitate sabermetricarum: Sabermetrics


Saturday, 24 October 2015

Palmer on DiPS

Pete Palmer was interviewed in April of this year, and was asked about Voros McCracken's Defence-Independent Pitching Statistics. Palmer is arguably the most important sabermetrician OF ALL TIME. Certainly his only rival is going to be Bill James, so reading Palmer's comments on DiPS theory, which James himself regarded as important, makes an interesting comparison and contrast.

James referred to McCracken's work in the New Bill James Historical Baseball Abstract (p 885):

3. This knowledge is significant, very useful. 4. I feel stupid for not having realised this thirty years ago.

Palmer, however, has a very different take on the matter.

I didn’t have a lot of faith in [DiPS]....[McCracken] said there wasn’t a great amount of correlation from season to season. But as I said, the variations due to chance and everything in sports, baseball in particular, is a lot higher than people think. Your average could drop 60 points from one year to the next, and it’s not really statistically significant because 500 at-bats isn’t that many at-bats to verify what your current batting average should be.
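Palmer's point about 500 at-bats can be checked with some back-of-the-envelope arithmetic. The sketch below treats each at-bat as an independent trial with a fixed chance of a hit, which is a simplification, and the .270 true average is an assumed figure chosen purely for illustration. Under those assumptions one season's average has a standard deviation of about 20 points, the change between two seasons about 28 points, and a 60-point drop sits a little over two standard deviations out: unusual for any one hitter, but with a few hundred regulars every year, swings of that size will turn up by chance alone.

```python
import math


def binomial_sd(true_rate: float, trials: int) -> float:
    """Standard deviation of an observed rate over `trials` independent tries."""
    return math.sqrt(true_rate * (1.0 - true_rate) / trials)


TRUE_AVG = 0.270  # assumed true-talent batting average (illustrative only)
AT_BATS = 500     # the season length Palmer mentions
DROP = 0.060      # the 60-point drop Palmer mentions

# Spread of a single season's batting average around the true figure.
sd_season = binomial_sd(TRUE_AVG, AT_BATS)

# The difference between two independent seasons has sqrt(2) times the spread.
sd_change = math.sqrt(2.0) * sd_season

# How many standard deviations the 60-point drop represents.
drop_in_sds = DROP / sd_change

print(f"one season:       {sd_season * 1000:.1f} points")
print(f"season to season: {sd_change * 1000:.1f} points")
print(f"60-point drop:    {drop_in_sds:.2f} standard deviations")
```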

Whether this opinion is rooted in statistical analysis or not, it does conform somewhat with the analysis provided in "Solving DiPS", a compilation of an on-line discussion which you can find a copy of here. One key solution in "Solving DiPS" is that, given 700 Balls in Play, some 44 per cent of the outcomes are a consequence of random variance, the single largest factor. (Pitchers were assigned 28 per cent, fielding 17 per cent. Hold that thought for a moment.)

I have seen it suggested that Palmer does not understand DiPS, which has become a tool for projecting a pitcher's future. But from the perspective of evaluating a pitcher's season, Palmer's lack of "faith" makes more sense. BABIP's variance is irrelevant, because it is in the nature of the game. What is important is to convert extra-base hits into singles, and singles into outs.

When you don’t look at walks and strikeouts and home runs, you’re actually minimizing a difference between a good pitcher and a bad pitcher. And therefore, the gap in that category is going to be artificially low because some of the factors that would make it higher are not counted.

In other words, we shouldn't be surprised that pitchers appear to have limited or no control over the outcomes of balls in play. That has never been where the difference has been visible in the small sample size of a single season.

Finally, to return to those percentages from "Solving DiPS", what might be surprising from the traditional reception of DiPS is that pitchers have more control over the outcome than fielders. So, again, perhaps we should be a bit more sceptical, like Pete Palmer, of those making grandiose claims for DiPS. Insofar as anything has control over the outcome of the batted ball, it is the pitcher. Random variance in its nature is uncontrolled.

Thursday, 22 October 2015

Oh, the Humanities

As I tweeted a week or so ago, this was a good season for the part of me that is a Tigers' fan to miss. I have been dealing with a return of my wife's cancer (the outlook is not great but, as the last lines of the original theatrical release of Blade Runner go, "I didn't know how long we had together... Who does?"), in addition to moving house (and changing countries). However, I accumulated a few bookmarks and other ideas to work through, especially now we can only watch other teams in the post-season.

While I was busy, a very important blog post was made back in May. Phil Birnbaum, who is nothing if not insightful in writing about sabermetrics, announced that dWAR, a measure of fielding value, seemed to him to have a significant problem. Birnbaum proposed that dWAR inherently overvalued fielding. Birnbaum's argument is rooted in mathematical accuracy, so I don't feel confident trying to explain it. If you haven't read the post already, you should go to his blog to read how he explains it.

However, his explanation boils down to three key points, if we focus on the effects:

a) The runs allocated to the fielders under dWAR are too high, by something on the order of fifty percent. (So a team dWAR of -40 is actually more like -20.)

b) The cause of this is that when one assumes "certain balls in play are the same" (as one has to do with older baseball statistics) then the math sends all the credit to the fielders.

"Observations are a combination of talent and luck. If you want to divide the observed balls in play into observed pitching and observed fielding, you're also going to have to divide the luck properly."

Here, I think, we run into the problem of "All things being equal", or the distinction that the philosopher of history R.G. Collingwood made between meteorology and chemistry. It is an essential fact of human life that all things are NOT equal. People working in meteorology can collect observations of events, but cannot reproduce them at will, unlike people working in chemistry. By contrast, the historian can observe events, but they cannot create political or social crises at will, nor send qualified observers back into the past in order to collect the information needed to understand those events in the way scientists might send an expedition to view an eclipse or collect specimens. In scoring a baseball game, at best a sabermetrician can be a weatherman.

One can take issue with the statement "the assassination of Archduke Franz Ferdinand of Austria on 28 June 1914 triggered the First World War" as one of causality, but without doubt the shooting set off a diplomatic crisis that led to the war. More importantly, luck played a crucial role in the event because the Archduke's car came to a complete halt very close to where the "Yugoslav nationalist" Gavrilo Princip had stationed himself. An earlier attempt to kill the Archduke in a moving car had failed. We have no idea whether Princip could have been successful if his targets had been in a moving car. So, what percentage of responsibility for the war do we assign to Princip, to the driver, to the governor of Bosnia at whose orders the driver stopped, to the Serbian officers who conspired to arm Princip, to the Archduke or to the general diplomatic situation? Any formula that did allocate "responsibility shares" to these people would be essentially an act of faith.

Birnbaum went on to add some further details to his understanding in a thread on the blog of Tom Tango, the tremendously influential pseudonymous saberist. In the comments section of Tango's thread on the post by Birnbaum, Birnbaum suggested in one reply that it was just not possible for a system like Defensive Runs Saved or Ultimate Zone Rating to make distinctions about balls in play that could tell us something about the skill of the fielder. But before that he stated that he wanted to assign the luck to the pitcher. However, reading the comments there is to venture into a world where something like the Responsibility Shares is thought to be possible. Possibly, with enough computing power, such things can be made for evaluating baseball players. But I can't help but think the effect will be small.

To boil Birnbaum's position down, what he thinks is that about half of the dWAR effects at the team level need to be transferred from the fielder to the pitcher. Another way to think about it is that he wants a cap on the amount of Runs Allowed value distributed to the fielders. But this would also have effects on how we value players. A quick-and-dirty method would be to halve the UZR assigned to any player when calculating their WAR, although I suspect Birnbaum would object on the grounds that something true at the team level may not be true at the level of the individual player.

Monday, 26 May 2014

NERD Fight

In 2010 Fangraphs' Carson Cistulli, in response to a throwaway line on ESPN by Rob Neyer discussing Cistulli's own Fangraphs post on 'Why We Watch' baseball, developed a means of capturing the appeal of a given baseball game between major-league teams to a number on a 1 to 10 scale. Cistulli christened this 'NERD'.

I didn't find NERD myself until a couple of years ago, and I have used it from time to time to help me choose what baseball game to follow. Having studied its components, I came to realise that what I find 'watchable' about a baseball game is not at all the same things Cistulli enjoys. For me, a baseball game's enjoyment depends on the following:

a) Baserunners who score. Getting men on base who don't score is the sign of a mediocre offense and a lot of frustration for their fans. Solo home runs put the 'i' in team.

b) Batters who look for contact. Nothing is more dull than watching a succession of batters standing at the plate looking for their pitch, and winding up either called out on strikes or taking a walk. Give me eight or nine Vladimir Guerreros in my lineup any day.

c) Exciting fielding plays. Grabbing a ball at the edge of one's fielding zone either starts with an exciting run towards where the ball is going to land or ends with a bang-bang play on a throw to the base.

d) A bullpen that is likely to keep it close, even if that means not adding an eighth run. I don't want to see relievers giving up more and more runs if it just isn't the starter's day.

e) A starter who works fast and misses bats in the zone ensures steady action in the game.

f) However, a starter who induces swings at pitches out of the zone probably has a lot of deceptive movement, which is a joy for pitching aficionados.

And that's it. Everything else should be secondary to these elements. I don't care to put more emphasis on seeing younger, cheaper teams in preference to older, more expensive ones. There's a good chance that the latter have more star players building Hall of Fame cases. I'm not interested in whether the commentators are exceptionally good or exceptionally bad, because I've always been able to tune the bad ones out. I find home runs boring. Much better to watch two doubles than one dinger.

So, here's version 1 of my formula:

Team Score, step-by-step

Add together the following components.

1) Subtract a team's home runs from hits and runs. Divide remaining hits by remaining runs. Calculate a Z score. Multiply that times 4.

2) Find the hitters' Pitch F/X Swing percentage. Multiply that times a notional 100 pitches. (Eg, a 50% swing percentage would give you 50 pitches swung at.) Multiply that times the Pitch F/X Contact percentage. (Eg, a 50% contact percentage would give you 25 pitches actually struck.) Calculate another Z score and multiply that times 4.

3) Get the bullpen xFIPs for all teams. Calculate another Z score. Multiply that first by -1 so that the negative numbers become positive and vice versa, and then multiply that times 2.

4) Find the OOZ plays for each team in Revised Zone Rating on Fangraphs. Calculate a fourth Z score.

The sum of these four components will give a raw Team Score. One then needs to adjust it to scale from 1 to 30 by adding a constant. Take the lowest score and adjust it to equal 1. Adjust all other scores by the same amount. One has now arrived at the final Team Score.
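The four Team Score steps above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the field names (`H`, `R`, `HR`, `swing`, `contact`, `bullpen_xfip`, `ooz`) and the three sample teams are invented for illustration, and the Z score uses the population standard deviation, which the steps do not specify. Each component is coded exactly as written, including the hits-divided-by-runs ratio in step 1.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z score of each value against the whole list (population SD assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def shift_to_floor(scores, floor=1.0):
    """Add one constant to every score so that the lowest equals `floor`."""
    constant = floor - min(scores)
    return [s + constant for s in scores]


def team_scores(teams):
    # Step 1: non-home-run hits divided by non-home-run runs, weighted 4.
    hits_per_run = z_scores([(t["H"] - t["HR"]) / (t["R"] - t["HR"]) for t in teams])
    # Step 2: pitches struck out of a notional 100 (swing% x contact%), weighted 4.
    struck = z_scores([100 * t["swing"] * t["contact"] for t in teams])
    # Step 3: bullpen xFIP, sign flipped so that low xFIP scores well, weighted 2.
    bullpen = z_scores([t["bullpen_xfip"] for t in teams])
    # Step 4: out-of-zone plays from Revised Zone Rating, unweighted.
    ooz = z_scores([t["ooz"] for t in teams])

    raw = [4 * h + 4 * s - 2 * b + o
           for h, s, b, o in zip(hits_per_run, struck, bullpen, ooz)]
    return shift_to_floor(raw)


# Invented sample data, not real team totals.
SAMPLE_TEAMS = [
    {"name": "A", "H": 400, "HR": 50, "R": 200, "swing": 0.46,
     "contact": 0.80, "bullpen_xfip": 3.60, "ooz": 120},
    {"name": "B", "H": 380, "HR": 40, "R": 180, "swing": 0.48,
     "contact": 0.78, "bullpen_xfip": 4.10, "ooz": 100},
    {"name": "C", "H": 420, "HR": 60, "R": 210, "swing": 0.44,
     "contact": 0.82, "bullpen_xfip": 3.90, "ooz": 110},
]

for team, score in zip(SAMPLE_TEAMS, team_scores(SAMPLE_TEAMS)):
    print(team["name"], round(score, 2))
```

Because every component is a Z score, multiplying by the notional 100 pitches in step 2 does not change the result; it is kept only so the intermediate numbers match the worked example in the text.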

Starter Score:

First, get the following scores (I use Fangraphs): the speed at which starters work (Pace at Fangraphs), their Pitch F/X O-swing, Z-Swing and Z-Contact percentages.

1) Calculate a Z-score for Pace.

2) For the Z-Swing and Z-Contact, use the '100 pitches' method used in calculating Team Score (2) above, first calculating the number of Z-Swings out of 100 and then the number of Z-Contact out of that. Calculate a Z-Score for that, but multiply it by minus 1 so that the pitchers who miss bats have positive numbers, and those whose pitches are hit have negative numbers.

3) Do the same for the O-Swing percentage, but don't multiply your Z-score by minus 1.

Add all these up to arrive at a raw Starter Score. One needs to adjust these as well, but this requires a little more art than was required by the Team Score. The objective here is again to avoid negative scores, but one also has to allow for the minimum number of innings one thinks is necessary to rate a pitcher. At the moment, I use 14, because that's the minimum needed to include Robbie Ray in the list. Since Robbie Ray's raw score is -3.5, I add 4.5 to everybody (even those with fewer than 14 innings pitched). If I chose, instead, to adopt a minimum of 20 innings pitched, the constant would become 4.1, and Robbie Ray would have a negative score.
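The Starter Score and its innings-based constant can be sketched the same way. The field names and the four sample pitchers below are invented, the Z scores are taken over every pitcher in the list with a population standard deviation (the text specifies neither), and Pace is left unflipped because the steps above do not say to change its sign. The constant is whatever lifts the lowest-scoring pitcher who meets the innings minimum to 1, so a raw score of -3.5 gives a constant of 4.5, as in the Robbie Ray example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z score of each value against the whole list (population SD assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def floor_constant(raw_scores, floor=1.0):
    """Constant that lifts the lowest of `raw_scores` up to `floor`."""
    return floor - min(raw_scores)


def starter_scores(starters, min_ip=14.0):
    # Step 1: Pace, as written (no sign change).
    pace = z_scores([s["pace"] for s in starters])
    # Step 2: in-zone pitches struck per 100, flipped so missed bats score well.
    zone_struck = z_scores([100 * s["z_swing"] * s["z_contact"] for s in starters])
    # Step 3: swings at pitches outside the zone per 100, not flipped.
    chases = z_scores([100 * s["o_swing"] for s in starters])

    raw = [p - z + c for p, z, c in zip(pace, zone_struck, chases)]

    # The constant comes from qualifying pitchers only, but is added to everyone.
    # (Raises ValueError if nobody meets the innings minimum.)
    qualified = [r for r, s in zip(raw, starters) if s["ip"] >= min_ip]
    constant = floor_constant(qualified)
    return [r + constant for r in raw]


# Invented sample data, not real pitcher figures.
SAMPLE_STARTERS = [
    {"name": "P1", "ip": 60.0, "pace": 21.5, "z_swing": 0.64, "z_contact": 0.86, "o_swing": 0.31},
    {"name": "P2", "ip": 14.0, "pace": 23.0, "z_swing": 0.66, "z_contact": 0.90, "o_swing": 0.27},
    {"name": "P3", "ip": 45.0, "pace": 19.8, "z_swing": 0.62, "z_contact": 0.82, "o_swing": 0.34},
    {"name": "P4", "ip": 9.0, "pace": 24.1, "z_swing": 0.67, "z_contact": 0.92, "o_swing": 0.25},
]

for starter, score in zip(SAMPLE_STARTERS, starter_scores(SAMPLE_STARTERS)):
    print(starter["name"], round(score, 2))
```

Pitchers below the innings minimum still receive a score, and it may come out below 1, which matches the description of adding the constant "to everybody".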

Game Score:

For each game, add together the two Team Scores and divide by 4. Then add together the two Starter Scores and divide by 2, thus giving the Starters more weight than the Teams. Finally, add those two results together and divide the sum by 2. At theoretical extremes, you might get scores higher than ten or lower than one, but these can be capped at ten or one respectively. Everything else will fall between 10 and 1.
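As a minimal sketch, the Game Score arithmetic comes down to one small function. The function and parameter names are my own invention, and the sample inputs are made up; the only things taken from the text are the divisors and the caps at ten and one.

```python
def game_score(team_a: float, team_b: float,
               starter_a: float, starter_b: float) -> float:
    """Combine two Team Scores and two Starter Scores into one Game Score."""
    team_part = (team_a + team_b) / 4           # teams get half weight
    starter_part = (starter_a + starter_b) / 2  # starters get full weight
    combined = (team_part + starter_part) / 2
    # Cap the theoretical extremes at ten and one.
    return max(1.0, min(10.0, combined))


# Made-up inputs: Team Scores of 10 and 14, Starter Scores of 5 and 7.
print(game_score(10, 14, 5, 7))
```

With those made-up inputs the team part is 6 and the starter part is 6, so the game scores 6. Two Team Scores of 30 and two Starter Scores of 10 would come to 12.5 before the cap brings the result back to 10.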

I have only played around with this system a couple of days, but so far I like the results. However, it is somewhat time consuming, and unless someone starts paying me to supply the data, I'll only do it when I have time and am in the mood. Here are some scores for today's games:

NY Yankees vs St Louis Cardinals 8 (TOP GAME)

Colorado Rockies vs Philadelphia Phillies 7 (MLB.tv FREE GAME)

Detroit Tigers vs Oakland Athletics 7

San Diego Padres vs Arizona Diamondbacks 6

Miami Marlins vs Washington Nationals 4

Wednesday, 19 February 2014

Jeffrey Loria, Sabermetric Hero?

Last year I finally got round to becoming a big consumer of baseball podcasts. One of my favourites was The Baseball Show with Rany and Joe, sponsored by the Baseball Reference Play Index. This past winter, Rany Jazayerli decided to hang up his microphone, and Joe Sheehan declined to continue the podcast either alone or with a new partner. That's just my lot in life, always a day late and a dollar short.

The reason I liked it so much is the same reason I tend to read op-ed columns I disagree vehemently with. The 'contentious statement' quotient was high, and stimulated a great deal of thought and research. In this context, it is worth noting that Baseball Prospectus' Effectively Wild podcast has inherited this sponsorship. The reason is that a recent podcast presented thoughts about the 2014 fortunes of the Miami Marlins. Nothing was more liable to set off both Rany and Joe on flights of self-righteous dudgeon than mentioning the name of Jeffrey Loria. Indeed, listening to today's sabermetricians, one would think that Mr Loria occupies the frozen centre of Hell, with three faces and bat's wings, gnawing on the souls of Judas Iscariot, Brutus and Cassius. Or, if that is too august a position to assign to him, that Lucifer would have sprouted a fourth face just for Jeffrey.

One might think that an Expos' fan would share such sentiments, but in this case one would be wrong. I do not think Mr Loria served the franchise in Montréal well; but I do think he served himself well, and that by the standards of today's sabermetrics he should not be consigned to the centre of anything except perhaps some kind of sabermetric Hall of Fame.

Like a lot of Baseball Prospectus' content in this year's annual, the essay (which I have not yet read) was not actually written by a fan of the team. Instead, a Mets' fan, David Roth, a staff writer at SB Nation and co-founder and editor of The Classical, took the job. Apparently the essay has Latin in it, which given the title of this blog must come as a recommendation. He appears to characterise the Marlins' front office as a 'doof junta', and the organisation itself as a piece of 'towering crude pop art'. 'Do [the Marlins] have any positive role in the business of baseball?' asks Sam Miller, one of the podcast's co-hosts. Roth responds that it is not just the Age of Loria that has established this reputation, but that Wayne Huizenga also had his part to play. We'll come back to that towards the end.

That opening question announces that this podcast will be an assault on Mr Loria's management of the franchise. This becomes plain later on when Miller asks, 'Is he a bad dude?' Roth replies: 'He's a bad dude....Loria meddles. He cares. But he cares about all the wrong stuff, and he cares about it way too much.' Oh, he cares too much about stuff that Roth thinks is the wrong stuff. This just typifies how what Roth, Miller and Miller's co-host Ben Lindbergh actually say completely undermines the image they think they are constructing.

Such problematic discussion riddles the podcast. The image painted of the Marlins' ownership and administration is of a deceitful group. Some of this derives from the fire-sale trades that followed the unprofitable and underachieving 2012 team. 'I can't think of them as a team that is willing to deal in good faith,' says Roth, after describing how the Marlins will not give no-trade clauses to players, as opposed perhaps to giving them and then having to enter into negotiations about setting aside the clause when the opportunity to make a deal arises. The Marlins seem to take a fairly straightforward position here. If a no-trade clause is important to a player, he should tell his agent not to return calls from the Marlins.

The Marlins make a huge profit, according to Roth. He does not substantiate the opinion. It could well be the case, as before making a $7 million loss in 2012 according to Forbes, the Marlins averaged a profit of $42 million between 2006 and 2009. (BTW, it's worth noting that the team had to pay $155 million for the stadium, which is a little bit under the $168 million of profits made between 2006 and 2009.) Since then their profits gradually declined until a big jump in payroll to coincide with the opening of Marlins Park led to the loss. Chances are that, thanks to revenue sharing, the Marlins made around $40-50 million again. So let's grant Roth that point. Roth goes on to say, 'Loria's certainly a better businessperson than the Wilpons.' So he hasn't spent the money yet because the players the Marlins have at the moment are probably not ready to contend for a pennant. And this makes him a bad dude?

Well, Lindbergh seems to think that the obligation to make money is an onerous task to befall a baseball organisation, when he says 'You would think the typical GM would want to think of himself as something more than an enabler of the owner, or as someone who makes the owner more money.' Miller chimes in, '[Loria] goes behind his GM's back and signs Greg Dobbs...which feels like a weird way to meddle. He does this tear-down...this poorly received tear-down....And when he tore down, he tore down really incredibly effective immediately in building the farm system up. And he does seem to have some knack for bringing in free agents when he gets personally involved in wooing them....From a competitive perspective, is he better or worse than the median GM?' In answering, Roth has this to say about the Marlins under [Larry] Beinfest (and Mr Loria, if Miller's question is to be accepted in the form Miller posed it): 'They weren't cheap in the draft....They paid above slot. They did like all the things a team that was trying to draft well would do....Beinfest had free rein in this very limited very unnecessarily overactive way. I think he did more than a lot of other GMs would do.' So all the good things are to Beinfest's credit, while the bad are Mr Loria's fault?

And what about the fans? Lindbergh: 'I wonder to what extent it spoils the fan experience? Often the redeeming aspect of rooting for a bad team is that you can anticipate something good a few years down the road...you can imagine your team... building this core... and having a perennial contender. Whereas if you're the Marlins fan based on past precedent the most you can really hope for is that the strong farm system produces a bunch of guys all at once and that you can hang on to them for a year or two before you have to start talking about trading them.' Well, wait a minute. Those chaps traded in fire sales in 1998 and in 2012 were mostly free-agent signings or acquired in trade: Bonifacio, Buck, Buehrle, Infante and Reyes in 2012; or Alou, Bonilla, Brown, Leiter, Nen and Sheffield in 1998. I'll give Lindbergh a pass on Hanley Ramirez and Anibal Sanchez, who were acquired by the Marlins when they were still down on the farm.

The problem for Baseball Prospectus commenting in this way on the Marlins is tucked away in Baseball Between the Numbers, pp 306-25, 'Is Wayne Huizenga a Genius?', by Jonah Keri. Keri has this to say:

With apologies to sentimental types, the twin goals of a baseball team are: (a) to seize the opportunity when it arises and win the World Series and (b) to make money. Huizenga recognized that winners and losers are often separated by mere inches, the bat of an eyelash. He also saw a moneymaking opportunity, with a winning team likely to rake in the bucks and a dismantled winner likely to make more....Mensa has reserved a spot for Wayne Huizenga at the head of the table.

Connie Mack, beloved figure of bygone years, said much the same thing about the relationship between profits and winning.

What we have discussed here is the 'success cycle', a not entirely accepted concept of newfangled sabermetrics. Not one of Roth, Miller and Lindbergh really has an exact handle on the Marlins' application of this principle. Roth makes an important error when he says, 'Both of those World Series teams they burned down the next year.' Not true. The 2003 winner was not actually burned down until the Delgado signing during the 2004/5 offseason failed to carry them back into the playoffs. The tear-down came in 2005/6, when Lowell and Beckett were dealt to the Red Sox, Delgado to the Mets and Castillo to the Twins. This slip-up basically undermines his argument when he says in another place, 'They have this model that worked twice. I wonder if that entrenched that this is an OK way to do things.' This isn't quite right, because he misses the 2009 team, which was a further run at playoff contention. The missing link here is the lack of a big-name free agent signing. In 2010, the Marlins faded to .500, Fredi Gonzalez was fired in mid-season and the second-place finish of 2009 possibly seemed flukish. To me, the firing of Gonzalez is a sign that Mr Loria expected a repeat of 2009. Mr Loria, remember, cares.

Nothing illustrates the difficulty of taking seriously the analysis by the 'experts' in this podcast more than an incident near the end. There is then about a minute-and-a-half of mocking of the veteran free agent signings by the Marlins this off-season, with Sam and Ben giggling like a couple of schoolboys over whatever might amuse schoolboys. (The names of Casey McGehee and Brian Bogusevic excite especial mirth.) This leads to a bit of cruel mockery of old baseball fans feeling at home in their dotage watching familiar names of yesteryear. I think I'm missing some kind of in-joke here. Earlier on, Roth characterises the experience of the Marlins' front office in a mental picture of Jeffrey Loria driving up in a golf cart and calling you an arsehole before driving off again. By the time he puts Sam and Ben in stitches, one is inclined to wonder what the difference is between that and calling people a 'doof junta'.

This basically was an excuse once again for the sabermetric chattering classes to throw the contents of their chamber-pots over Mr Loria, an unpopular owner because he is a hard-nosed businessman who is committed to another of those sabermetric themes, that baseball is a business. Mr Loria exploits a system that benefits him, but unlike the other 'welfare queens' (another of those negative images trotted out to attack Mr Loria), the evidence suggests that Mr Loria actually wants to win the World Series. If you want to attack someone for being a stingy owner only interested in pocketing revenue sharing and publicly financed ballparks, I would suggest there are better targets than him. While his franchise may have eschewed the single-minded pursuit of OBP (Marlins' hitting prospects have tended to be sluggers), and not bought into the notion TINSTAAPP (there is no such thing as a pitching prospect), in other respects it is a model of sabermetric strategy:
1. draft and develop prospects
2. buy or trade for talent at the right time
3. PROFIT!

Tuesday, 13 April 2010

Dodgy Data?

Colin Wyers has written an interesting article at BaseballProspectus.com about differences in baseball batted-ball data depending on who is assembling it. Since I never regarded batted-ball data as being anything better than an approximation of longitude while at sea without a chronometer, I'm not undergoing quite the crisis of faith that Wyers is. Nonetheless, it is worth bearing in mind when reading about sabermetrics based on batted-ball data.

Sunday, 21 June 2009

Primer Cross Post: The Chaos Theory of Sabermetrics

I've decided to cross-post some of my contributions at Primer to this blog. That way I can keep better track of some of my important reflections.

Brian Joseph, who may have been involved with Baseball Prospectus Idol (which I didn't follow, I think the whole idea of 'Idol' is stupid), made a stab at attacking sabermetrics here. It's a pretty poor effort, to be brutally honest; but I think I see where he's going with it. He wants more granularity in sabermetrics. A Primer discussion broke out here.

The most misguided point of Joseph's argument is here:

The notion that sabermetrics is truly objective is silly when there are a number of ways to “objectively” look at a situation statistically depending on your subjectiveness toward the game.

This statement is, I believe, based on a misunderstanding of what it is to be objective. And all the rest of the article's problems arise from here. I suspect that if 'objective' was replaced with 'scientific', the author would not have misunderstood. 'Scientific' refers to a method, nothing more, so history can be scientific. Sabermetrics sometimes is not purely scientific. (Think of James's 'subjective factor' in the New Historical Abstract.) But that's rare.

Joseph then wanders into various specific examples, which unfortunately don't clarify the matter. One problem is that 'neo-sabermetrics', to borrow a term from Don Malcolm, is concerned with evaluating True Talent Level. Joseph is arguing that on a day-to-day level, True Talent Level doesn't actually explain very much. Well, anyone who thought about the matter probably knew that already. But True Talent Level isn't the only way to use sabermetric studies.

It's always worth reminding ourselves that Bill James didn't start from wanting to know how good players would be, but rather how good they had been. Malcolm and some other members of the Big Bad Annual (BBBA) crowd, which included Primer's own Jim Furtado, were sort of feeling around the theoretical foundation that the game, not the season, is the cornerstone of performance analysis. Then BPro's great success and certain unprofessional characteristics of BBBA strangled that initiative, not quite at birth, but in late childhood. However, many of those basic concepts are still out there. James himself gave us the Game Score for pitchers, but I don't find that helpful. I don't want a number in that way, I prefer the categories of the Quality Matrix. The same with the idea of Leverage for bullpens. Leverage, and the related Win Expectancy, can tell us everything we need to know about what succeeded in a victory or what failed in a loss. Start totting that data up in columns and there's a handy explanation of a team's strengths and weaknesses.

Pecota, Zips, Chone and Marcel are great tools, but they are literally only half the picture.

About fra paolo

fra paolo is a SABR member and occasional participant at what used to be known as "Baseball Primer".

He grew up in Detroit, going to Tiger Stadium, and cheering on the team through 1972. But the introduction of the designated hitter rule was too much for his eleven-year-old's traditionalism, and he began wandering in a wilderness of rooting for National League teams. For a time he was a fan of the Montréal Expos, because they were the closest team to London, England, where he lived from 1982 until 2008. With the Expos' move to Washington, he returned to his Tiger roots.

For now he lives in Florida, from where he keeps an eye on the Detroit Tigers and other baseballing matters.

(The title alludes to St Augustine of Hippo's De Civitate Dei contra Paganos, in case theology isn't on your curriculum vitae.)

Follow me on Twitter @frapaolotweets