Sunday, May 23, 2010

Understanding the Psychology of Crackpot Stats

To understand how mob psychology can make common wisdom of complete nonsense, consider these claims by Voros McCracken in 2001:

"There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play."
"The critical thing to understand is that major-league pitchers don't appear to have the ability to prevent hits on balls in play."

In support of that claim, McCracken goes on to cite the following about pitchers' Batting Average on Balls In Play (BABIP), calculated as (H-HR)/(BFP-HR-BB-SO-HB):

"The vast majority of pitchers who have pitched significant innings have career rates between .280 and .290."
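
As a minimal sketch, the BABIP formula given above can be written as a small Python function. The function name `babip` and the sample stat line are assumptions made up for illustration; they are not figures from McCracken's article.

```python
def babip(h, hr, bfp, bb, so, hb):
    """Batting average on balls in play: (H - HR) / (BFP - HR - BB - SO - HB)."""
    balls_in_play = bfp - hr - bb - so - hb
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line, invented for illustration only.
example = babip(h=200, hr=20, bfp=900, bb=60, so=180, hb=5)
print(round(example, 3))
```

Home runs come out of both the numerator and the denominator because they are hits that no fielder has a chance to turn into an out.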


Of course, McCracken's claim caused a sensation in the statistical community. It greatly simplified the troublesome task of accounting for team defense when evaluating pitching (FIPS, anyone?). It all but eliminated the need to consider pitching when evaluating fielding. As a result, the statistical community developed all sorts of new statistics working from his premise, and the premise has been extended to hitters.

Now you would expect some healthy skepticism of that claim. The obvious question is: what was the league BABIP in 2000, when McCracken did his "study"? If the vast majority of successful pitchers, those that got a significant number of outs, are all above average in getting hitters out on balls in play, you would probably conclude it is unlikely they have no control over it. And the league BABIP was and is around .300. That is 10-20 points higher than the figure cited for the "vast majority" of successful starting pitchers as identified by McCracken.

Now you might think this was overlooked all these years. It wasn't. That .300 average is cited on the Baseball Prospectus site as the "typical" BABIP for pitchers. In fact, that claim is itself inaccurate. The average (mean) for all pitchers is .300, but the typical (median) pitcher's BABIP is actually considerably higher since the best pitchers face more batters than those with higher BABIP. But either way, the best pitchers have the best BABIP, well above an average pitcher.
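
The mean-versus-median point can be illustrated with a short sketch. The seven pitchers below are entirely hypothetical (made-up hits and balls in play), chosen only to show the mechanism: when pitchers with lower BABIP face more batters, the pooled league figure sits below the median pitcher's figure.

```python
from statistics import median

# Hypothetical pitchers: (balls in play, hits on balls in play).
# The numbers are invented to show the effect, not real data.
pitchers = [
    (700, 189),  # .270, heavy workload
    (650, 182),  # .280
    (600, 174),  # .290
    (300, 93),   # .310
    (200, 66),   # .330
    (150, 51),   # .340
    (100, 35),   # .350, light workload
]

total_bip = sum(bip for bip, _ in pitchers)
total_hits = sum(hits for _, hits in pitchers)

# The league figure pools every ball in play, so busy pitchers dominate it.
league_babip = total_hits / total_bip

# The median treats each pitcher as one data point regardless of workload.
median_babip = median(hits / bip for bip, hits in pitchers)

print(f"pooled league BABIP: {league_babip:.3f}")
print(f"median pitcher BABIP: {median_babip:.3f}")
```

In this made-up sample the pooled figure comes out lower than the median pitcher's figure, which is the direction described above.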

As others looked at the numbers and raised troublesome questions about McCracken's basic premise, finding numerous examples that contradicted it, those examples were explained away as "outliers," ground ball pitchers, etc. Many pitchers had career BABIP far lower than the .280 McCracken claimed, and the range of career BABIP for pitchers goes from as low as .250 up to .350, very similar to the range in hitters' batting averages. Combined with McCracken's own data showing the vast majority of successful starting pitchers have above-average BABIP, you would think pitchers' influence over whether a ball goes for a hit had been proven. Successful pitchers are successful, in part, because they get hitters to put the ball in play in ways that make it easy for their fielders to turn them into outs. That certainly is traditional baseball wisdom. And the actual numbers support it.

Nonetheless, McCracken's basic conclusion has become an urban legend. Like any urban legend, once believed, no amount of facts will cause people to abandon their belief in it. None of us like to admit we were enthusiastically wrong.





18 comments:

David84 said...

But either way, the best pitchers have the best BABIP, well above an average pitcher.

Interesting discussion, particularly about the mean and median BABIP's and whatnot, but the statement I'm quoting is just flatly wrong. To show why, I'll compare ERA to BABIP of the league's top performer in each category. I use ERA because I know you don't believe in FIP or xFIP, but I have to make this caveat: a low BABIP almost inevitably leads to a depressed ERA. Therefore, this is a bit of a Sisyphean endeavor on my part, because a good BABIP will generally equal a good ERA. However, if you evaluate the "best" pitchers by their ERA (which you do, correct?), it is not true that those pitchers to a name have the best BABIP's, nor is it even true generally that they tend to have very good BABIP's. Look at the best pitchers for 2009, comparing their ERA to their BABIP.

2009 (Pitcher, ERA, BABIP)
Zack Greinke, 2.16, .313
Chris Carpenter, 2.24, .272
Tim Lincecum, 2.48, .297
Felix Hernandez, 2.49, .289
Jair Jurrjens, 2.60, .273
Adam Wainwright, 2.63, .309
Roy Halladay, 2.79, .313
Clayton Kershaw, 2.79, .274
Javier Vazquez, 2.87, .297
Matt Cain, 2.89, .268

You can see first that there is no semblance of a pattern showing BABIP increasing as you go down the list. In fact, the leader has the highest BABIP of anyone on the list - tied with Roy Halladay (who is generally considered one of the best pitchers in baseball over the last half decade, at least). One could argue that Greinke and Halladay were the two best pitchers in baseball last year - Halladay getting such consideration for his consistent excellence, and Greinke for his historic season. Those two players have the worst BABIP of anyone in the top 10, and at .313, they are well above the generally accepted average BABIP of .290 - .300. Now, look at the BABIP leaders and their corresponding ERA:

2009 (Player, BABIP, ERA)
Jarrod Washburn, .257, 3.78
Randy Wolf, .257, 3.23
Ross Ohlendorf, .265, 3.92
Matt Cain, .268, 2.89
JA Happ, .270, 2.93
Bronson Arroyo, .270, 3.84
Ted Lilly, .270, 3.10
Chris Carpenter, .272, 2.24
John Danks, .273, 3.77
Jair Jurrjens, .273, 2.60

As you can see, there is no general pattern of ERA increasing as the BABIP increases. In fact, one of the two leaders has the third highest ERA of anyone on that list. Now, obviously all of these pitchers have stellar ERA's, but that just begs the question that BABIP claims to answer; that is, to the question of, "What accounts for good ERA scores?" BABIP answers, "Oftentimes, though not always, it is due to good luck." When a pitcher outperforms expectations, it's usually due to an inordinate amount of luck, i.e. a low BABIP. Now, take a look at the leaders so far in 2010:

2010 (Player, ERA, BABIP)
Ubaldo Jimenez: .93, .229
Jaime Garcia: 1.47, .278
Roy Halladay: 2.03, .302
Adam Wainwright: 2.05, .259
Josh Johnson: 2.10, .286
Mike Leake: 2.22, .281
Livan Hernandez: 2.22, .242
David Price: 2.29, .248
Matt Cain: 2.36, .228
Mike Pelfrey: 2.39, .279

2010 (Player, BABIP, ERA)
Matt Cain, .228, 2.36
Ubaldo Jimenez, .229, .93
Tim Hudson, .233, 2.44
Jeff Niemann, .240, 2.79
Jamie Moyer, .240, 3.98
Doug Fister, .240, 2.45
Livan Hernandez, .242, 2.22
Jonathan Sanchez, .243, 2.63
Mat Latos, .247, 3.26
David Price, .248, 2.29

As you can see, while not as striking as in 2009, 2010 shows again that there is no pattern demonstrating that the best pitchers have the best BABIP, although BABIP leaders do have very good ERA's.

Claiming that a good pitcher will always have a good BABIP is just incorrect. A good pitcher is durable, strikes out at least 3 batters for every walk he issues, and maintains a healthy ground ball rate. Whether or not balls hit into play fall in or sneak through, or are converted into an out is largely out of the control of a pitcher.

TT said...

I am not sure what you are trying to show. But your data doesn't seem to show what you think it does.

I agree, BABIP is not the sole determining factor in how successful a pitcher is. The more outs a pitcher gets by striking people out, the less important it is. I think that is obvious. It has nothing to do with whether a pitcher influences the outcome on balls in play.

But the median and average BABIP for your sample of ten pitchers with the best ERA are both considerably better than the major leagues', and the ten pitchers with the best BABIP all have ERA's under 4.00. As you acknowledge, there is clearly a statistical relationship between ERA and BABIP.
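
One way to check that claim against the 2009 ERA-leader list posted earlier in this thread is a sketch like the one below. The BABIP values are copied from that list; the .300 league figure is the rough number used in the post and is treated here as an assumption, not an official league total.

```python
from statistics import mean, median

# BABIP of the ten 2009 ERA leaders, in the order listed in the thread.
era_leaders_babip = [0.313, 0.272, 0.297, 0.289, 0.273,
                     0.309, 0.313, 0.274, 0.297, 0.268]

LEAGUE_BABIP = 0.300  # rough league figure from the post (assumption)

sample_mean = mean(era_leaders_babip)
sample_median = median(era_leaders_babip)

# Lower BABIP is better for a pitcher, so count values under the league figure.
better_than_league = sum(1 for b in era_leaders_babip if b < LEAGUE_BABIP)

print(f"mean {sample_mean:.4f}, median {sample_median:.3f}")
print(f"{better_than_league} of {len(era_leaders_babip)} below {LEAGUE_BABIP:.3f}")
```

Ten pitchers from a single season is a very small sample, so this only checks the arithmetic of the claim and does not settle the larger argument.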

The second thing is that I don't think I have ever said you can determine the best pitcher by looking at ERA or any other single statistic. The best ERA's every year usually belong to relievers.

In addition, it is clear one of the misrepresentations by McCracken was based on mistaking the volatility of BABIP as a statistical measure for lack of meaning. It's really just a sample size issue that disappears when you look at career numbers.

The comment I made about "best pitchers" was not intended to imply that if you look at BABIP for a single season you will identify the pitchers who had the best seasons. You won't. That won't happen even if you look at career BABIP, since there are other factors that help determine a pitcher's success. But if you look at Hall of Fame pitchers, you aren't likely going to find any who consistently gave up more hits on balls in play than the typical pitcher. And if you do find one, it's because they are extreme outliers from what makes the typical pitcher successful.

It's a long way from saying that how often pitchers get people out on balls in play is not the sole determining factor in their relative success to saying that it doesn't matter at all. In fact, what your data shows is that it is very important.

TT said...

BTW

"use ERA because I know you don't believe in FIP or xFIP,"

BTW, you do understand how ridiculous it would be to determine how much control a pitcher had over balls in play, or how important it was, based on how well it correlated to Fielding Independent Pitching (FIP)?

David84 said...

I am not sure what you are trying to show.

I had a feeling you would say that. That's why I put it in bold:

it is not true that [the best] pitchers to a name have the best BABIP's, nor is it even true generally that they tend to have very good BABIP's.

I felt the need to make this point because you seemed to imply in your post that anyone who believes that pitchers have little to no "ability to prevent hits on balls in play" is simply subscribing to a belief in "complete nonsense" that has become "common wisdom" by virtue of a "mob psychology" in which "no amount of facts" will disprove an "urban legend." That is, you seem to be trashing the basic premise of BABIP - that it is outside the pitcher's control. In fact, you said, "Successful pitchers are successful, in part, because they get hitters to put the ball in play in ways that make it easy for their fielders to turn them into outs." Understanding this to be your argument, I understood your comment that "either way, the best pitchers have the best BABIP, well above an average pitcher," to be support for your argument that BABIP is an "urban legend." Basically, the claim that good pitchers control the result of batted balls is proved by the fact that the best pitchers have the best BABIP. My point then is that, year in and year out, great pitchers post high BABIP's, meaning there is probably only tepid support for your point.

TT said...

"it is not true that [the best] pitchers to a name have the best BABIP's, nor is it even true generally that they tend to have very good BABIP's."

That is why I am confused. Your own data shows the opposite. Seven of the ten pitchers you listed from last year have an annual BABIP above the league average and probably all 10 are above the league median.

The top ten pitchers in ERA or any other statistic are by definition outliers. You can't draw any conclusion about all pitchers based on that sample. But you especially can't start drawing conclusions about a statistic that is known to be volatile.

"That is, you seem to be trashing the basic premise of BABIP - that it is outside the pitcher's control."

Your own data trashes that idea.

"My point then is that, year in and year out, great pitchers post high BABIP's,"

But, in fact, NONE of the pitchers you listed would be said to have "high" BABIP. At worst, they are slightly above league average.

David84 said...

Seven of the ten pitchers you listed from last year have an annual BABIP above the league average

And you said...

the best pitchers have the best BABIP, well above an average pitcher.

Above the league average is one thing. A lot of mediocre pitchers last year were in the top 10 of BABIP.

As for the stuff about a small sample/top ten/etc., the point of BABIP is to evaluate INDIVIDUAL performance and project INDIVIDUAL performance. If individual pitcher A, like Jarrod Washburn, has an abnormally low (not just low) BABIP, it's safe to say he's pitching over his head. And, as I've been trying to point out, BABIP is not skill-based. All you have to do is look at the top pitchers for any year, and there are plenty w/ average BABIP's.

TT said...

David -

You listed the top ten pitchers in BABIP for a single season last year. They ALL had ERA's under 4.00.

"All you have to do is look at the top pitchers for any year, and there are plenty w/ average BABIP"

Where are they? And if there is no skill, and the results are pure chance, some of them ought to be WAY above average. Your data shows, to the contrary, that they are well below average for the most part.

As I have said repeatedly, annual BABIP is a volatile statistic. A pitcher's BABIP will fluctuate from year to year. That doesn't mean a pitcher has no control over what happens, any more than a batter going 2 for 4 one day and 0 for 5 the next means he has no control over his hits.

"If individual pitcher A, like Jarrod Washburn, has an abnormally low (not just low) BABIP, it's safe to say he's pitching over his head."

That is actually true for any statistical measure. Regression toward the mean is pretty typical. If your argument is that when an INDIVIDUAL pitcher has a higher or lower BABIP than he typically does, you can expect it to move closer to his own mean, then sure. But if you expect a pitcher with a career .275 BABIP to do worse next year because he had a .280 BABIP, you are being foolish.

David84 said...

What are you trying to say about BABIP?

TT said...

"What are you trying to say about BABIP?"

That it is clear evidence that pitchers have a lot of control over whether a ball goes for a hit. Which is the opposite of this:

"as I've been trying to point out, BABIP is not skill-based."

You have provided a lot of evidence to the contrary. Why do you think all the "best pitchers" in your own data are above average? Is this Lake Wobegon?

David84 said...

it is clear evidence that pitchers have a lot of control over whether a ball goes for a hit.

As I have said repeatedly, annual BABIP is a volatile statistic. A pitcher's BABIP will fluctuate from year to year.

These two statements are entirely contradictory.

TT said...

"These two statements are entirely contradictory."

They aren't even vaguely contradictory. But you and McCracken seem to think they are.

A player's batting average is highly volatile from game to game. Does that prove there is no skill involved? The fact that a number varies widely around its average has nothing to do with anything.

Over the course of pitchers' careers it is quite clear that some pitchers are much better than others at getting batters out on balls in play. That means there is a skill involved.
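
The difference between season-level volatility and career-level stability can be sketched with a simple binomial model. Everything here is an assumption for illustration: the fixed "true" rate of .290, the 500 balls in play per season, the 5,000 per career, and the idea that each ball in play is an independent trial.

```python
import math


def babip_standard_error(true_rate, balls_in_play):
    """Standard deviation of an observed rate under a simple binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / balls_in_play)


TRUE_RATE = 0.290   # hypothetical fixed skill level
SEASON_BIP = 500    # hypothetical balls in play in one season
CAREER_BIP = 5000   # hypothetical balls in play over a career

season_se = babip_standard_error(TRUE_RATE, SEASON_BIP)
career_se = babip_standard_error(TRUE_RATE, CAREER_BIP)

print(f"one season: +/- {season_se:.3f}")
print(f"career:     +/- {career_se:.3f}")
```

Under these assumptions a single season wobbles by roughly 20 points of BABIP from chance alone, while a career-length sample wobbles by roughly 6, so a noisy season figure and a real difference in career rates can coexist.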

What is strange is that you have a data set that shows the pitchers with the best ERA's and they have BABIP above the league average. You have pitchers with the best BABIP and they all have ERA's that are above average. You then conclude this proves the two have nothing to do with one another. I don't see how you arrive at that conclusion from that data.

David84 said...

A player's batting average is highly volatile from game to game. Does that prove there is no skill involved?

It doesn't mean there is no skill involved, but at the time when a player's BA is actually volatile (early in the season), often batting average is not a fair measure of skill. When Matt Tolbert hit .400 a few years ago over a month, that was not a fair indication of his skill. If Joe Mauer starts a season batting .150, that is not a fair indication of his skill. You could, in fact, look at their BABIP for some suggestion of why they are unexpectedly excelling or struggling.

When a pitcher has an abnormally high or low BABIP, that is NOT an adequate measure of their skill. Your point about it being so volatile is precisely why BABIP is valuable, but in your drive to discredit every statistic devised to further the understanding of baseball beyond BA and RBI, you completely miss the point. If a pitcher posts a .313 BABIP (like Halladay last year), you can either compare that against a general league average of .290-.300 and reasonably assume that he's probably not going to have many more balls fall in, or you could compare it to his career average of .299 and make a similar conclusion. (Interesting that Halladay, one of the best in the league, has a BABIP that is almost precisely "average"!) Either way, it's close enough to both of those averages to conclude that Halladay was not abnormally lucky or unlucky. But, if at the beginning of the year Halladay has a .400 BABIP, you can conclude that this is very high and expect it to regress, based both on his career average and the generally accepted league average. There's absolutely nothing "crackpot" about the concept of BABIP.

TT said...

"if at the beginning of the year Halladay has a .400 BABIP, you can conclude that this is very high and expect it to regress, based both on his career average and the generally accepted league average."

That is true of any statistic. Halladay's ERA last year was his career best, considerably better than his norm. You would expect it to regress toward his mean this year.

"There's absolutely nothing "crackpot" about the concept of BABIP."

I suggest you re-read the article. Because I clearly didn't call BABIP crackpot. I used BABIP to support my position several times throughout the article. In fact, it was about the only statistic I used.

What is crackpot is the conclusion that pitchers have no influence over how often balls in play go for hits. Any realistic analysis of your data shows that claim isn't true. You might as well claim hitters have no control over whether the balls they put in play go for hits.

David84 said...

I suggest you re-read the article. Because I clearly didn't call BABIP crackpot. I used BABIP to support my position several times throughout the article. In fact, it was about the only statistic I used.

An article entitled "understanding the psychology of crackpot stats" followed by a lengthy discussion in which BABIP "was about the only statistic [you] used" is an article that is not calling BABIP a crackpot stat. Got it.

TT said...

Like I said, David, try reading the article.

Then try reading Voros's original article, which goes on to develop a statistical measure of pitching based on his crackpot conclusion.

Statistics are just numbers; it's how they are used and the conclusions drawn from them that make them crackpot.

Clarence Thomas said...

Perhaps you should go back and re-title the article to "Understanding the psychology of crackpot interpretation of statistics" and then actually back it up with what you say. All I see is an attempted slam on BABIP without much supporting evidence, but hey, to each their own I guess.

TT said...

This is from the lead sentence Clarence.

"mob psychology can make common wisdom of complete nonsense"

I think your response just reinforces that point. I might as well try to convince the people who believe it that Obama isn't a Muslim born in Nigeria.

TT said...

One last note on Roy Halladay. Here are his BABIP since his breakout year in 2002:

.288
.284
.308
.263
.278
.302
.285
.306
.290

His BABIP since 2002 is .290.

His BABIP in his first four years, while facing 1500+ batters, was .308, a level he reached in only one season in the next 10 years.
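
The season figures above can be summarized with a short sketch. Note the assumption: this is a simple unweighted mean of the listed seasons, not a career figure weighted by balls in play (those counts are not given here), so it need not match the .290 figure exactly.

```python
from statistics import mean

# Halladay's season BABIP figures as listed above, 2002 onward.
halladay_babip = [0.288, 0.284, 0.308, 0.263, 0.278,
                  0.302, 0.285, 0.306, 0.290]

EARLY_CAREER_BABIP = 0.308  # first-four-years figure quoted above

# Unweighted: every season counts the same regardless of workload.
unweighted_mean = mean(halladay_babip)

# How many listed seasons were at or above his early-career level?
seasons_at_or_above_early = sum(
    1 for b in halladay_babip if b >= EARLY_CAREER_BABIP
)

print(f"unweighted mean of listed seasons: {unweighted_mean:.3f}")
print(f"seasons at or above {EARLY_CAREER_BABIP:.3f}: {seasons_at_or_above_early}")
```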

Of course, that doesn't account for all of Halladay's improvement. But his BABIP improved when he matured as a major league pitcher, right along with all his other results.

There is simply no evidence that hits on balls in play are any less a reflection of pitching skill than strikeouts, walks or home runs.
