Friends:
Dave, Jamie and I (especially Dave) have spent considerable time over the last two days examining our stats collection and computing system. We have even corresponded with Baseball Reference, who gave us a helpful but incomplete response. Here are our most important findings:
- MLB routinely makes errors in its initial daily stats reports used by Baseball Reference, our source for daily stats. We always knew errors were possible, but we have discovered they are considerably more frequent than we had ever suspected.
- BR doesn’t catch those errors in the data they supply us. Apparently they do not correct errors themselves. Instead, MLB will catch and correct errors, and send those corrections along in separate reports to BR.
- We do not necessarily get those corrections under our current system. We just take each new daily report and lay it on top of previous days to get our running totals each month.
- The errors are generally small and definitely random, at least as to hitters, although GIDP may tend to go under-reported. Their effect on the standings is small, but it could come into play in very close races.
- Under our old Baseball Prospectus system these errors were occurring, but since we were using month-to-date stats, we would get the corrections, generally without noticing anything. We were still vulnerable to any errors on the last day of the month, but I don’t recall ever noticing one.
- Under our current system we are vulnerable to every day’s errors, and have been since we adopted it.
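To make the difference between the two approaches concrete, here is a toy sketch in Python. The numbers are made up for illustration, and the names (`daily_reports`, `month_to_date`, `running_total`) are hypothetical; this is not our actual spreadsheet logic, just the idea behind it.

```python
from collections import Counter

# Made-up daily reports for one hitter, as first published.
# Assume the 8/2 report is missing a GDP that MLB corrects later.
daily_reports = [
    {"PA": 4, "AB": 4, "H": 1, "GDP": 0},  # 8/1
    {"PA": 5, "AB": 4, "H": 2, "GDP": 0},  # 8/2 (should have shown 1 GDP)
    {"PA": 4, "AB": 3, "H": 0, "GDP": 0},  # 8/3
]

# Made-up month-to-date line pulled after the correction was issued.
month_to_date = {"PA": 13, "AB": 11, "H": 3, "GDP": 1}


def running_total(reports):
    """Current approach: lay each daily report on top of the previous ones."""
    total = Counter()
    for report in reports:
        total.update(report)  # Counter.update adds counts stat by stat
    return dict(total)


stacked = running_total(daily_reports)
print("sum of daily reports:", stacked)
print("month-to-date line:  ", month_to_date)
```

In this toy case the stacked total never picks up the corrected GDP, because the correction never appears in any daily report we pulled. The month-to-date line includes it without anyone having to notice.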
The upshot is this: We have always had errors in our data. We do not recommend going back to the BP system because we had to do wonky, highly unrealistic things to be able to modify our allocations mid-month — and they often had retroactive effect. The current system delivers more realistic results, even though we are all vulnerable to random errors.
We do not believe there is any way to avoid the daily errors. We are still working on whether we can, on the last day of the month, run an extra error-correction “day” to undo all the month’s accumulated errors. We want to avoid having to hand-check all 300+ EFL players for errors each month. That would be several hours of work. Even if we can avoid that problem, it won’t be perfect because the errors might have occurred under different player allocations than what prevails at the end of the month. And any errors in the last day of the month’s stats would be uncorrected (which has always been the case, it turns out, since the beginning of the league).
Dave did the most to discover this. Using the Pears as guinea pigs, he checked every player’s accumulated August stats, according to BR and BP, against our EFL totals. The EFL totals represent the sum of each daily BR report, without error correction. The monthly accumulated stats include any error corrections from MLB. Here is what he found for Peshastin’s hitters this month, as of yesterday:
Robles add 1 GDP 8/2
Soto add 1 H, 1 BB, 1 IBB, 1 GDP, 1 R
Mitch Haniger add 1 GDP 8/5 or 8/10
Yoan Moncada add 1 PA, 1 AB, 1 R, 2 H, 1 RBI, 3 SO
Willi Castro subtract 4 PA, 4 AB, 2 H, 1 CS
Jo Adell add 1 PA, 1 AB, 2 R, 1 3B, 4 RBI, 2 SO, 1 GDP
Jazz Chisholm subtract 1 PA, 1 AB, 1 SO, add 1 SB, 1 GDP
Mountcastle add 1 AB, 2 H, 2 HR, 2 RBI, 1 BB, 2 SO, 1 GDP
Zimmerman add 4 PA, 4 AB, 1 R, 1 H, 1 HR, 2 RBI, 2 SO, 1 GDP
Schrock subtract 1 PA, 1 AB, add 1 R
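A check like Dave's could in principle be automated along the following lines. This is only a sketch under assumptions: it supposes we can get both sets of totals into simple stat-to-count dictionaries, the function names `adjustments` and `describe` are hypothetical, and the sample numbers are invented rather than taken from any real player.

```python
def adjustments(efl_totals, corrected_totals):
    """Return {stat: corrected - efl} for every stat where the totals differ."""
    diff = {}
    for stat in sorted(set(efl_totals) | set(corrected_totals)):
        delta = corrected_totals.get(stat, 0) - efl_totals.get(stat, 0)
        if delta != 0:
            diff[stat] = delta
    return diff


def describe(player, diff):
    """Format a diff in the same 'subtract ..., add ...' style as the list above."""
    subs = [f"{-n} {stat}" for stat, n in diff.items() if n < 0]
    adds = [f"{n} {stat}" for stat, n in diff.items() if n > 0]
    parts = []
    if subs:
        parts.append("subtract " + ", ".join(subs))
    if adds:
        parts.append("add " + ", ".join(adds))
    if not parts:
        return f"{player} no change"
    return f"{player} " + ", ".join(parts)


# Invented totals for a hypothetical player.
efl = {"PA": 20, "AB": 18, "R": 2}        # sum of our daily reports
corrected = {"PA": 19, "AB": 17, "R": 3}  # month-to-date, with corrections

diff = adjustments(efl, corrected)
print(describe("Player X", diff))
```

The same per-player diff is what an extra error-correction "day" would need to contain: adding it to the sum of the daily reports brings our total in line with the corrected monthly line. It does nothing about the allocation problem, since it cannot tell which day each error happened on.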
EFL | ||||||
TEAM | WINS | LOSSES | PCT. | GB | RS | RA |
Old Detroit Wolverines | 89 | 39 | .693 | — | 734.3 | 489.3 |
Flint Hill Tornadoes | 83 | 45 | .649 | 5.5 | 676.3 | 495.9 |
D.C. Balk | 80 | 47 | .634 | 7.6 | 730.2 | 555.1 |
Kaline Drive | 79 | 49 | .615 | 9.9 | 691.5 | 546.0 |
Peshastin Pears | 78 | 50 | .612 | 10.3 | 631.8 | 507.4 |
Cottage Cheese | 73 | 56 | .566 | 16.1 | 728.8 | 655.7 |
Haviland Dragons | 71 | 57 | .553 | 17.8 | 664.3 | 617.4 |
Canberra Kangaroos | 70 | 57 | .551 | 18.2 | 655.4 | 603.2 |
Pittsburgh Alleghenys | 70 | 60 | .536 | 19.9 | 654.3 | 607.1 |
Bellingham Cascades | 67 | 63 | .518 | 22.3 | 560.2 | 540.8 |
Portland Rosebuds | 61 | 67 | .475 | 27.9 | 672.6 | 713.3 |
AL East | ||||
TEAM | WINS | LOSSES | PCT. | GB |
Old Detroit Wolverines | 89 | 39 | .693 | — |
Flint Hill Tornadoes | 83 | 45 | .649 | 5.5 |
Tampa Bay Rays | 80 | 48 | .625 | 8.6 |
New York Yankees | 76 | 52 | .594 | 12.6 |
Boston Red Sox | 74 | 56 | .569 | 15.6 |
Toronto Blue Jays | 66 | 61 | .520 | 22.1 |
Baltimore Orioles | 40 | 87 | .315 | 48.1 |
NL East | ||||
TEAM | WINS | LOSSES | PCT. | GB |
D.C. Balk | 80 | 47 | .634 | — |
Canberra Kangaroos | 70 | 57 | .551 | 10.5 |
Atlanta Braves | 69 | 58 | .543 | 11.5 |
Philadelphia Phillies | 64 | 64 | .500 | 17 |
New York Mets | 61 | 67 | .477 | 20 |
Washington Nationals | 55 | 72 | .433 | 25.5 |
Miami Marlins | 53 | 76 | .411 | 28.5 |
AL Central | ||||
TEAM | WINS | LOSSES | PCT. | GB |
Chicago White Sox | 75 | 55 | .577 | — |
Pittsburgh Alleghenys | 70 | 60 | .536 | 5.3 |
Bellingham Cascades | 67 | 63 | .518 | 7.7 |
Cleveland Indians | 63 | 63 | .500 | 10 |
Detroit Tigers | 62 | 67 | .481 | 12.5 |
Kansas City Royals | 58 | 70 | .453 | 16 |
Minnesota Twins | 56 | 72 | .438 | 18 |
NL Central | ||||
TEAM | WINS | LOSSES | PCT. | GB |
Milwaukee Brewers | 78 | 51 | .605 | — |
Cottage Cheese | 73 | 56 | .566 | 5 |
Cincinnati Reds | 71 | 59 | .546 | 7.5 |
St. Louis Cardinals | 65 | 62 | .512 | 12 |
Chicago Cubs | 56 | 74 | .431 | 22.5 |
Pittsburgh Pirates | 47 | 82 | .364 | 31 |
AL West | ||||
TEAM | WINS | LOSSES | PCT. | GB |
Kaline Drive | 79 | 49 | .615 | — |
Houston Astros | 76 | 52 | .594 | 2.8 |
Haviland Dragons | 71 | 57 | .553 | 7.9 |
Oakland A’s | 70 | 59 | .543 | 9.3 |
Seattle Mariners | 69 | 60 | .535 | 10.3 |
Los Angeles Angels | 63 | 67 | .485 | 16.8 |
Texas Rangers | 44 | 84 | .344 | 34.8 |
NL West | ||||
TEAM | WINS | LOSSES | PCT. | GB |
San Francisco Giants | 83 | 45 | .648 | — |
Los Angeles Dodgers | 81 | 48 | .628 | 2.5 |
Peshastin Pears | 78 | 50 | .612 | 4.7 |
San Diego Padres | 69 | 61 | .531 | 15 |
Portland Rosebuds | 61 | 67 | .475 | 22.2 |
Colorado Rockies | 59 | 69 | .461 | 24 |
Arizona Diamondbacks | 44 | 86 | .338 | 40 |
If there is no easy fix, then we can just ignore the errors, assuming they are random and thus affect all teams similarly. Ignorance is bliss.