baseball-databank · Baseball Databank
Re: Philosophical question about statistics from Retrosheet

"dsreyn" (Doug) wrote:

> I've been meaning to bring this up for a while, and Cliff's recent
> post reminded me again. My question is, how much value is being
> placed on data compiled through Retrosheet? I certainly don't mean
> this as a criticism of Retrosheet in any way; it's just that I
> don't think the overturning of official statistics (particularly
> from recent seasons) should be taken lightly.
>
> My primary concern is relatively subtle errors in play by play
> data. As an example, I found a problem not long ago (see post
> 3022) with a caught stealing that both Retrosheet and BDB had
> credited to Terry Pendleton in the 1991 World Series. Other
> sources I checked did not show a caught stealing. It turned out
> that the Retrosheet play by play data was (apparently) in error.

There is an important distinction to be made between PBP associated
with a post-season or all-star game (for which no official dailies
exist) and the PBP for regular-season games. Both Terry Pendleton's
CS and the case raised in post 2705 involved post-season games. Since
proofing these games is difficult, problems of the sort Doug mentions
might arise in them. Regular-season games, however, are rigorously
proofed against official statistics and dailies, so no unintentional
mistakes of the type he mentions should be there. For example, if we
had a sacrifice fly recorded as a normal fly-out in a regular-season
game, this would show up as a discrepancy in several places (batter
and pitcher at-bats and sacrifice flies) and would be resolved before
the season is released by Retrosheet.
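To make the sacrifice-fly example concrete, here is a minimal sketch of how one miscoded play surfaces in more than one place when play-by-play is tallied and compared against the official lines. It is not Retrosheet's actual proofing software: the `PlateAppearance` record, the toy outcome codes ("H", "OUT", "SF", "BB"), and the names `tally` and `discrepancies` are all hypothetical, and real event files use a much richer notation.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    batter: str
    pitcher: str
    outcome: str  # hypothetical toy codes: "H", "OUT", "SF", "BB"


def tally(pas):
    """Accumulate at-bats (AB) and sacrifice flies (SF) for each batter and pitcher."""
    batting, pitching = {}, {}
    for pa in pas:
        for table, who in ((batting, pa.batter), (pitching, pa.pitcher)):
            line = table.setdefault(who, {"AB": 0, "SF": 0})
            if pa.outcome == "SF":
                line["SF"] += 1  # a sacrifice fly is not an official at-bat
            elif pa.outcome in ("H", "OUT"):
                line["AB"] += 1  # walks ("BB") count as neither
    return batting, pitching


def discrepancies(derived, official):
    """List (player, stat, derived value, official value) for every mismatch."""
    found = []
    for player, line in official.items():
        got = derived.get(player, {})
        for stat, value in line.items():
            if got.get(stat, 0) != value:
                found.append((player, stat, got.get(stat, 0), value))
    return found


# Hypothetical game: the second plate appearance was really a sacrifice fly,
# but the play-by-play recorded it as an ordinary fly-out.
pas = [
    PlateAppearance("batter_a", "pitcher_b", "H"),
    PlateAppearance("batter_a", "pitcher_b", "OUT"),
]
batting, pitching = tally(pas)
print(discrepancies(batting, {"batter_a": {"AB": 1, "SF": 1}}))
# [('batter_a', 'AB', 2, 1), ('batter_a', 'SF', 0, 1)]
print(discrepancies(pitching, {"pitcher_b": {"AB": 1, "SF": 1}}))
# [('pitcher_b', 'AB', 2, 1), ('pitcher_b', 'SF', 0, 1)]
```

A single wrong code produces four mismatches (AB and SF, for both batter and pitcher), which is why an error like this is hard to miss when a regular season is proofed against the dailies.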

This is not to say that discrepancies will not still exist in the
released season. It is my feeling that Doug's primary concern should
not be with the relatively subtle errors in Retrosheet's data, but
rather with the not-so-subtle errors in the official statistics. For
example, when proofing the 1921 and 1922 NL official dailies, I
uncovered 241 errors in the 1921 dailies and 240 in the 1922 dailies.
Now these were not cases where Retrosheet's data disagreed with the
official accounts (for many of these games we don't have PBP to compare
against them); rather, these were cases where the official data
disagreed with itself. In other words, one team's batters' hits did
not equal the other team's pitchers' hits allowed; a team's batting
totals did not equal the sum of its batters' totals, and so on.
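The two internal checks just described can be sketched as follows. This is an illustration only, not the procedure used to proof the 1921 and 1922 dailies: the box-score layout (a dict per team with "batting", "batting_totals" and "pitching" entries) and the function name `check_box_score` are hypothetical, and the sample figures are invented.

```python
def check_box_score(box):
    """Run two internal consistency checks on a two-team box score.

    Hypothetical layout: box[team] has "batting" (list of batter lines),
    "batting_totals" (the team line) and "pitching" (list of pitcher lines);
    each line is a dict mapping a stat name to a count.
    """
    problems = []
    for team in box:
        opponent = next(t for t in box if t != team)
        batters = box[team]["batting"]
        # Check 1: the team batting line should equal the sum of its batters' lines.
        for stat, reported in box[team]["batting_totals"].items():
            summed = sum(line.get(stat, 0) for line in batters)
            if summed != reported:
                problems.append(
                    f"{team}: batters' {stat} sum to {summed}, team line says {reported}"
                )
        # Check 2: this team's batters' hits should equal the opposing pitchers' hits allowed.
        hits = sum(line.get("H", 0) for line in batters)
        allowed = sum(line.get("H", 0) for line in box[opponent]["pitching"])
        if hits != allowed:
            problems.append(
                f"{team} batters have {hits} H, but {opponent} pitchers allowed {allowed}"
            )
    return problems


# Invented box score in which one pitcher's hits-allowed figure was miscopied.
box = {
    "HOME": {
        "batting": [{"AB": 4, "H": 2}, {"AB": 3, "H": 1}],
        "batting_totals": {"AB": 7, "H": 3},
        "pitching": [{"H": 5}],
    },
    "AWAY": {
        "batting": [{"AB": 4, "H": 3}, {"AB": 4, "H": 2}],
        "batting_totals": {"AB": 8, "H": 5},
        "pitching": [{"H": 13}],
    },
}
for problem in check_box_score(box):
    print(problem)  # HOME batters have 3 H, but AWAY pitchers allowed 13
```

Neither check needs any play-by-play; both use only the official figures, which is why errors of this kind can be found even for games with no PBP to compare against.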

Some of these errors can be quite large. For example, Virgil Cheeves
is officially credited with allowing 18 hits against the Phillies in
the second game of the 7-29-1922 DH. He actually allowed only 8.
Instead of receiving credit for striking out 4 Brave batters in the
first game of the 6-17-1922 DH, Vic Aldridge was charged with hitting
4 batters. In the 11-0 loss to the Pirates on 9-5-1922, Clyde Barfoot
allowed 8 runs, 1 sacrifice hit, 3 walks and hit no one; these
statistics were entered into the wrong column in the dailies, so
officially he allowed 11 runs (his hits allowed total repeated), 8
sacrifices (probably an erroneous single game record), 1 walk and 3
hit batters. And so on. Smaller errors are often significant because
they involve famous players. In 1921, Grover Cleveland Alexander was
credited with 2/3 of an inning too many, one complete game too many,
and 5 strikeouts too few. In 1922, his IP total is 1 2/3 innings too high.
Luis Aparicio, who holds the MLB record for most games played by a
shortstop, was credited with 154 games played at SS in 1968; he
actually played in 156.

None of these problems even involves Retrosheet's accounts. They
are simply internal errors in the dailies themselves (or in the case
of Luis Aparicio, an error adding up or counting the daily lines).
When you start trying to reconcile our PBP accounts with the official
results, you typically run into hundreds and hundreds of differences in
the older seasons (prior to 1970 or so). In proofing 430 (or so)
games from the 1922 NL, I ran into nearly a thousand cases where it
appears from all accounts that the official statistics are in error.
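An adding-up error like the Aparicio one can be caught by re-summing the daily lines and comparing the result with the published season line. The sketch below assumes each daily line is a dict of stat counts and stores innings pitched as outs, so that figures like "2/3 of an inning too many" become exact integer differences. The names `ip_to_outs` and `recount` are hypothetical, and the pitcher data is invented for illustration (it is not Alexander's actual record). A games-at-position count such as games at SS works the same way, as a stat of 1 on each daily line where the player appeared there.

```python
def ip_to_outs(ip):
    """Convert innings-pitched notation such as '6 2/3', '2/3' or '9' into outs.

    Keeping IP as whole outs avoids fractional arithmetic when summing.
    """
    outs = 0
    for part in ip.split():
        if "/" in part:
            num, den = part.split("/")
            if den != "3" or int(num) not in (1, 2):
                raise ValueError(f"unexpected fraction in {ip!r}")
            outs += int(num)
        else:
            outs += 3 * int(part)
    return outs


def recount(daily_lines, published):
    """Re-add the daily lines and report stats whose sum differs from the published total.

    Returns {stat: (recounted, published)} for each mismatch.
    """
    totals = {}
    for line in daily_lines:
        for stat, value in line.items():
            totals[stat] = totals.get(stat, 0) + value
    return {
        stat: (totals.get(stat, 0), value)
        for stat, value in published.items()
        if totals.get(stat, 0) != value
    }


# Invented pitcher with two daily lines; the invented published season line
# carries one complete game too many and 2/3 of an inning too many.
daily = [
    {"G": 1, "CG": 1, "IP_outs": ip_to_outs("9"), "SO": 5},
    {"G": 1, "CG": 0, "IP_outs": ip_to_outs("6 2/3"), "SO": 3},
]
published = {"G": 2, "CG": 2, "IP_outs": ip_to_outs("16 1/3"), "SO": 8}
print(recount(daily, published))  # {'CG': (1, 2), 'IP_outs': (47, 49)}
```

A difference of 2 in `IP_outs` corresponds to 2/3 of an inning.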

Now, I am not suggesting that BDB generate their regular season
statistics from Retrosheet files. Even at Retrosheet, we display
"official" statistics on our player and team pages. Just be aware
that there are literally tens of thousands of errors in baseball's
official statistical record. Given the difficulty in fixing even a
single official mistake (see Hack Wilson's 1930 RBI total), it is not
surprising that there is not a lot of enthusiasm for "fixing" the
thousands of errors we already know about or suspect.

Retrosheet is currently in the middle of an effort to give greater
exposure to these discrepancies. In the next release of the web-site,
we hope to present lists of statistical discrepancies. Eventually, we
would like visitors to the site to be able to click on any disputed
statistic and be taken to a page describing the nature of the dispute.

Tom Ruane






Sat May 13, 2006 6:21 pm

tjruane