baseball-databank · Baseball Databank

#3594 From: Tangotiger <tangotiger@...>
Date: Tue Sep 2, 2008 11:52 pm
Subject: Jim Busby - Catcher - data inconsistency
tangotiger
 
(Cross-posting to Retrolist and BDB)

This is to point out a discrepancy, without knowing which is correct.

The Lahman DB and B-R.com have Jim Busby with one game at catcher.  Retrosheet
has two games, with the additional one being in 1960:
http://www.retrosheet.org/boxesetc/B/Pbusbj101.htm

Tom

---------------------------------------------
Tim Raines, Hall of Fame 2008
http://www.raines30.com/










---------------------------------------------

#3595 From: Tangotiger <tangotiger@...>
Date: Wed Sep 3, 2008 12:58 am
Subject: Re: [RetroList] Jim Busby - Catcher - data inconsistency
tangotiger
 
Thank you for showing that the best answer is found in the easiest of places.

Tom


---------------------------------------------

--- On Tue, 9/2/08, J. G. Preston <jgpreston@...> wrote:
From: J. G. Preston <jgpreston@...>
Subject: Re: [RetroList] Jim Busby - Catcher - data inconsistency
To: RetroList@yahoogroups.com
Date: Tuesday, September 2, 2008, 8:00 PM

That box being:

http://www.retrosheet.org/boxesetc/1960/B07172CHA1960.htm

On 9/2/08, Tangotiger <tangotiger@yahoo.com> wrote:
>
> (Cross-posting to Retrolist and BDB)
>
> This is to point out a discrepancy, without knowing which is correct.
>
> The Lahman DB and B-R.com has Jim Busby with one game at catcher.
> Retrosheet has two games, with the additional being in 1960:
> http://www.retrosheet.org/boxesetc/B/Pbusbj101.htm
>
> Tom
>
> ---------------------------------------------
> Tim Raines, Hall of Fame 2008
> http://www.raines30.com/
>
> ---------------------------------------------
>
>
>




#3596 From: "Dave Carter" <terpsfan101@...>
Date: Thu Sep 25, 2008 6:45 pm
Subject: Retrosheet Misc Batting and Pitching Stats
terpsfan101
 
I have compiled additional batting and pitching statistics from
Retrosheet play-by-play data and wanted to share them here. I've
linked the Retro IDs to the BDB player IDs so you can easily join
them to your database. Two things are stopping me from posting them.
First, I don't know how to identify players with multiple stints on
the same team during one season. Second, I am uncertain which
statistical categories I should include. For pitchers, I really
would have liked to count pickoffs, but I didn't know an easy way
to separate catcher pickoffs from pitcher pickoffs. That'll
be on the way shortly. Stolen base and caught stealing data for
pitchers and catchers is already in the fielding tables.  Here's the
list of what I have compiled from Retrosheet from 1954 to 2007:

Batting:

Catcher's Interference
Reached on Error
Reached on Error Sacrifice Hit
Reached on Error Sacrifice Fly
Reached on Fielder's Choice (No Outs Recorded)
Reached on Fielder's Choice Sacrifice Hit
Pickoff Caught Stealing
Picked Off
Pickoff Error
Balk (Lead Baserunner)
Passed Ball (Lead baserunner that Advances)
Wild Pitch (Lead baserunner that Advances)
Defensive Indifference
Other Advance (Out Advancing)
Ground Balls
Fly Balls
Line Drives
Popups
GIDP Opportunities (Man on 1st, 0 or 1 out)
Retrosheet Plate Appearances
Missing Plate Appearances

Pitching

Doubles Allowed
Triples Allowed
GIDP Allowed
Sacrifice Hits Allowed (Complete from 1921 to the present)
Sacrifice Flies Allowed
Catcher's Interference Allowed
Reached on Error Allowed
Reached on Error Sacrifice Hit Allowed
Reached on Error Sacrifice Fly Allowed
Reached on Fielder's Choice Allowed
Reached on Fielder's Choice Sacrifice Hits Allowed
Ground Balls Against
Fly Balls Against
Line Drives Against
Popups Against
GIDP Against Opportunities
Retrosheet IP Outs
Missing IP Outs

Again, let me know what you'd like me to include. I can obviously
combine some of the categories.

#3597 From: "Matthew Gargano" <mgargano@...>
Date: Fri Sep 26, 2008 12:43 pm
Subject: Re: Retrosheet Misc Batting and Pitching Stats
tkestars
 
Dave - Thanks so much for your efforts from the baseball databank group!

I was wondering if you could show me a thing or two.  Also I was wondering if you knew of an easy way to get pitch by pitch, or just similar data to what you have provided plus the basics (offense: AB, H, 2B, 3B, RBI, R, etc. and pitching: W L IP SV ER R BB, H, etc.) data into a database.  I have found some sites that I could scrape with PHP - but most of them are problematic and are subject to changes in design which would render my scraping schema useless.

Thanks so much again Dave!

Mat











#3598 From: "Dave Carter" <terpsfan101@...>
Date: Fri Sep 26, 2008 9:05 pm
Subject: Re: Retrosheet Misc Batting and Pitching Stats
terpsfan101
 
Matt,

I don't know anything about scraping data. The basic data (AB, H, ...) is
already in the BDB database, unless you wanted to derive the basic
data for 2008. In that case it would probably be easiest to copy and
paste the basic data from Baseball Reference. Again, you'll have to
provide a little more detail about what you are trying to do here.

#3599 From: "Tangotiger" <tom@...>
Date: Fri Sep 26, 2008 9:05 pm
Subject: Re: Retrosheet Misc Batting and Pitching Stats
tom@...
 
> Dave - Thanks so much for your efforts from the baseball databank group!
>
> I was wondering if you could show me a thing or two.  Also I was wondering
> if you knew of an easy way to get pitch by pitch, or just similar data to
> what you have provided plus the basics (offense: AB, H, 2B, 3B, RBI, R,
> etc.

Join RetroSQL yahoo group.

Tom

#3600 From: "Charles Saeger" <rasputin@...>
Date: Sat Sep 27, 2008 3:23 am
Subject: Re: Retrosheet Misc Batting and Pitching Stats
rasputin443556
 
--- In baseball-databank@yahoogroups.com, "Dave Carter" <terpsfan101@...> wrote:

> Ground Balls
> Fly Balls
> Line Drives
> Popups

For pre-1980 years, the differences between these will blur in the data,
and you won't have hits. I'd just go for:

Air Outs - Pulled
Ground Outs - Pulled

Pitchers have just Air Outs and Ground Outs.

And a record of bunts:

Sacrifice Attempts - Sacrifices - Hits - Errors
Bunt for a Hit Attempts - Hits
Squeeze Attempts - Squeezes

Pitchers would have a mirrored record. I'd also consider something like
Runners Moved Up and some sort of RBI Opportunity.

For fielders, have Independent Putouts and Assists, those being ones the
fielder started, and Double Plays Started and Turned, and opportunities
for the Double Plays. Separate errors by those that put a runner on base,
those that allow an existing runner to advance, and other, mostly muffed
foul flies that don't do much.

#3601 From: "Dave Carter" <terpsfan101@...>
Date: Sat Sep 27, 2008 5:41 am
Subject: Re: Retrosheet Misc Batting and Pitching Stats
terpsfan101
 
Charles,

Most of your suggestions shouldn't be difficult to implement. I
had a feeling that I should have used outs for batted balls,
although separating "Outs on Base" from the batted-ball categories
would be a pain. For instance, a single where the batter gets thrown
out at second base would get counted as a ground-ball out. Would you
be OK with this? If you prefer, I can look at the baserunner fields
and remove these plays from GB and FB outs.

For Sacrifice Bunt Attempts, I'll use any bunt that occurs during a
sacrifice situation. Bunt Hits and Bunt Hit Attempts will be easy.
MGL uses infield hits, and I think this would be more useful than Bunt
Hits. I don't think I want to attempt to measure "Runners Moved Up."
Should RBI Opportunities include situations where there is only one
baserunner and he's on first base? In other words, any time there is
a player on base?

The defensive stats you suggested would be great to have. I can
categorize pickoffs for pitchers and catchers as well. This will
take some time. Give me at least a week to compile this stuff. If
there are any other categories you'd like to see, post them here or
email me.



#3602 From: "Dave Carter" <terpsfan101@...>
Date: Sat Sep 27, 2008 12:53 am
Subject: Re: Retrosheet Misc Batting and Pitching Stats
terpsfan101
 
I guess that I could throw out some of the baserunning categories.
Here's my new list:

Batting:

XI
ROE
ROE SH
ROE SF
RFC
RFC SH
PkO CS
PkO
PkO Error
GB
FB
LD
POP
GIDP Opp
Retro PA
Miss PA

Pitching:

2BA
3BA
GIDPA
SHA
SFA
XIA
ROEA
ROEA SH
ROEA SF
RFCA
RFCA SH
GBA
FBA
LDA
POPA
GIDPA Opp
Retro IPOuts
Miss IPOuts

#3603 From: "yonushonis" <yonushonis@...>
Date: Tue Sep 30, 2008 6:57 pm
Subject: When will the final .sql be available on the bbdatabank.org site?
yonushonis
 
I realize the season isn't over yet, but I always like the info.

#3604 From: "Sean Forman" <sean-forman@...>
Date: Tue Sep 30, 2008 7:01 pm
Subject: Re: When will the final .sql be available on the bbdatabank.org site?
sforman71
 
There is a chance it will be next week.

sean

On Tue, Sep 30, 2008 at 2:57 PM, yonushonis <yonushonis@...> wrote:

I realize the season isn't over yet, but I always like the info.




--
Sean Forman
President, Sports Reference LLC
http://www.sports-reference.com/

#3605 From: "hackersdienow" <hackersdienow@...>
Date: Tue Oct 7, 2008 8:51 am
Subject: 2007 records missing...
hackersdienow
 
Hey guys...I don't know if anyone else noticed this since it's not a
huge deal 99% of the time, but for some reason in the 2007 season
(most recent in the database version I have) the convention to include
all pitching seasons and fielding seasons in the batting table whether
or not a batting record for that player/season existed seems to have
stopped and there are missing names.  The guy building our search
feature relies on the batting table as the general player record
creator (each player has one record per teamID/stint in the batting
table, which is why he chose to do that), so we were kind of wondering
if any fixes were planned to fill in those missing records or not.
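
One way to locate the gaps described above is an anti-join from Pitching and Fielding against Batting. The following is only a minimal sketch using Python's built-in sqlite3 with tiny in-memory stand-ins for the three tables. The sample rows and the helper name `missing_batting_rows` are hypothetical, and only the key columns (playerID, yearID, stint, teamID) are modelled, on the assumption that all three tables share them.

```python
import sqlite3


def missing_batting_rows(conn, year):
    """Return (playerID, yearID, stint, teamID) keys that appear in Pitching
    or Fielding for `year` but have no matching row in Batting."""
    sql = """
        SELECT k.playerID, k.yearID, k.stint, k.teamID
        FROM (
            SELECT playerID, yearID, stint, teamID FROM Pitching
            UNION
            SELECT playerID, yearID, stint, teamID FROM Fielding
        ) AS k
        LEFT JOIN Batting AS b
               ON b.playerID = k.playerID
              AND b.yearID   = k.yearID
              AND b.stint    = k.stint
              AND b.teamID   = k.teamID
        WHERE k.yearID = ? AND b.playerID IS NULL
        ORDER BY k.playerID, k.stint
    """
    return conn.execute(sql, (year,)).fetchall()


# Hypothetical sample data: keys only, no statistics.
conn = sqlite3.connect(":memory:")
for table in ("Batting", "Pitching", "Fielding"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(playerID TEXT, yearID INTEGER, stint INTEGER, teamID TEXT)"
    )
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)",
                 [("aaaaa01", 2007, 1, "BOS")])
conn.executemany("INSERT INTO Pitching VALUES (?, ?, ?, ?)",
                 [("aaaaa01", 2007, 1, "BOS"), ("bbbbb01", 2007, 1, "NYA")])
conn.executemany("INSERT INTO Fielding VALUES (?, ?, ?, ?)",
                 [("bbbbb01", 2007, 1, "NYA"), ("ccccc01", 2007, 1, "SEA")])

gaps = missing_batting_rows(conn, 2007)
print(gaps)
```

The UNION removes the duplicate key for a pitcher who also has a fielding row, so each missing player/stint is reported once.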

Thanks in advance,

SABR Matt

#3606 From: "Tangotiger" <tom@...>
Date: Tue Oct 7, 2008 8:26 pm
Subject: Re: 2007 records missing...
tom@...
 
We've long discussed the idea of an APPEARANCES table, so that it
captures exactly what it's supposed to capture, rather than taking over
the BATTING table just because the official rules say to use it.  (DBAs
follow the data, not the legal definitions of names.)

The APPEARANCES table would give us the total number of games played,
games started, games finished, and potentially further broken down by
fielding position and lineup position.  So, you could have:

playerID, appearCode, appearSubcode, GS, GP, GF
raineti01, 0, 0, 141, 147, 142
raineti01, fld, 7, 141, 141, 140
raineti01, fld, 8, 0, 8, 2
raineti01, bat, 1, 140, 140, 139
raineti01, bat, 3, 1, 1, 1
raineti01, bat, 7, 0, 6, 2

Something like that.  So, for your purposes, you would select on
appearCode = 0.
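
To make the single-table layout concrete, here is a rough sketch in Python with the built-in sqlite3 module. It loads the sample rows from this message and runs the `appearCode = 0` selection. The column types and the primary key are assumptions, and the table carries no yearID or teamID because the example rows in this message do not include them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Appearances (
        playerID      TEXT    NOT NULL,
        appearCode    TEXT    NOT NULL,  -- '0' overall, 'fld' fielding, 'bat' lineup slot
        appearSubcode INTEGER NOT NULL,  -- position number or batting-order slot
        GS INTEGER,
        GP INTEGER,
        GF INTEGER,
        PRIMARY KEY (playerID, appearCode, appearSubcode)
    )
""")

# Sample rows copied from the message.
rows = [
    ("raineti01", "0",   0, 141, 147, 142),
    ("raineti01", "fld", 7, 141, 141, 140),
    ("raineti01", "fld", 8,   0,   8,   2),
    ("raineti01", "bat", 1, 140, 140, 139),
    ("raineti01", "bat", 3,   1,   1,   1),
    ("raineti01", "bat", 7,   0,   6,   2),
]
conn.executemany("INSERT INTO Appearances VALUES (?, ?, ?, ?, ?, ?)", rows)

# One row per player: the overall totals a search feature would key on.
overall = conn.execute(
    "SELECT playerID, GS, GP, GF FROM Appearances WHERE appearCode = '0'"
).fetchall()

# The same table answers split questions, e.g. games played in left field.
left_field_games = conn.execute(
    "SELECT GP FROM Appearances WHERE appearCode = 'fld' AND appearSubcode = 7"
).fetchone()[0]

print(overall, left_field_games)
```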

Or, if that's too complicated, we'd have an APPEARANCES table like this:
playerID, GS, GP, GF
raineti01, 141, 147, 142

And an APPEARANCES_SPLIT table like this:
playerID, appearCode, appearSubcode, GS, GP, GF
raineti01, fld, 7, 141, 141, 140
raineti01, fld, 8, 0, 8, 2
raineti01, bat, 1, 140, 140, 139
raineti01, bat, 3, 1, 1, 1
raineti01, bat, 7, 0, 6, 2

Regardless, I do not have an answer for your original question.

Tom

#3607 From: "hackersdienow" <hackersdienow@...>
Date: Wed Oct 8, 2008 5:02 am
Subject: Re: 2007 records missing...
hackersdienow
 
That's the way I would approach it as well if I were the guy doing the
building of the DB...I highly recommend Lahman/Forman work toward that
goal.  It sounds like you're saying, however, that there are no plans
to make the batting table rules consistent...which means I should be
telling our web search/sort builder to prepare other methods for
fixing the irregularities.  Do I have that correct?

#3608 From: "Tangotiger" <tom@...>
Date: Wed Oct 8, 2008 1:57 pm
Subject: Re: Re: 2007 records missing...
tom@...
 
> That's the way I would approach it as well if I were the guy doing the
> building of the DB...I highly recommend Lahman/Forman work toward that
> goal.  It sounds like you're saying, however, that there are no plans
> to make the batting table rules consistent...which means I should be
> telling our web search/sort builder to prepare other methods for
> fixing the irregularities.  Do I have that correct?
>
>

There are no current plans, that's correct.  How/when this will be
rectified is undetermined.

Tom


---------------------------------------------
The Book--Playing The Percentages In Baseball
http://www.InsideTheBook.com

#3609 From: "hackersdienow" <hackersdienow@...>
Date: Thu Oct 9, 2008 6:34 am
Subject: Re: 2007 records missing...
hackersdienow
 
OK...thanks Tom.

I'm going to ask our web coder to MacGyver an appearances
table...that's the correct way to normalize the database anyway.

#3610 From: "wyerscj" <PontifexExMachina@...>
Date: Sun Oct 19, 2008 3:56 am
Subject: Re: SCHEMA - Oversized MySQL fields?
wyerscj
 
--- In baseball-databank@yahoogroups.com, "Randy Fiato" <sysadmin@...> wrote:
>
> I realize that this is a relatively minor detail in relation to some of the
> other things being planned here, but I've noticed that the current MySQL
> schema has several fields that are oversized. That is to say, the fields are
> represented as an integer that is far too large for the possible range of
> values.
>
> For example, in the Master table, birthYear and deathYear are both int(4),
> which in MySQL is a 4-byte integer. smallint(4) unsigned (the same as yearID
> in the other tables) would be better, as this uses only 2 bytes. Similarly,
> the birth/death days and months could be tinyint(2) unsigned instead of
> int(2).
>
> Another example that occurs in several tables is that games in a season (or
> wins, losses, etc.) can be described as tinyint(3) unsigned (range of 0 -
> 255) instead of smallint(3). I don't think MLB is going to start playing
> 250-game seasons any time soon. :-)
>
> I've made these changes, among several others, to my copy and have been able
> to shrink the size of the database by at least a couple of megabytes (not
> sure of the exact amount).
>
> --
> Randy Fiato
> System Administrator, Big League Forums <http://www.bigleagueforums.net/>

And to think this is the source of most of my problems. (Well, no,
but I'm trying to add dramatic flair.)

The problem I'm running into is that, to use these values for a lot
of purposes I have to recast them as signed variables. I really
couldn't find any other discussion on these issues - would it be
really inconvenient to change the schema to not use unsigned
variables?

--CW

#3611 From: Paul DuBois <paul@...>
Date: Sun Oct 19, 2008 4:48 pm
Subject: Re: Re: SCHEMA - Oversized MySQL fields?
pdubois20
 
On Oct 18, 2008, at 10:56 PM, wyerscj wrote:

> And to think this is the source of most of my problems. (Well, no,
> but I'm trying to add dramatic flair.)
>
> The problem I'm running into is that, to use these values for a lot
> of purposes I have to recast them as signed variables. I really
> couldn't find any other discussion on these issues - would it be
> really inconvenient to change the schema to not use unsigned
> variables?

If the nature of the data is that it is unsigned, it should be stored
in an unsigned column.  If the nature of your application is that you
use the values in a signed way, *your application* should make the
necessary adjustments.

Otherwise, why should the next person not come along and say, my
purposes require the values to be floats, can the schema be changed to
make them floats?
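
The thread does not say which operations forced the recast. A common case is subtraction, where an unsigned result cannot represent a negative difference. The sketch below emulates that in plain Python and shows the application-side adjustment suggested above. The function names and the 64-bit width are assumptions for illustration, not a description of any particular database's behavior.

```python
UINT64_MODULUS = 2 ** 64


def unsigned_subtract(a, b):
    """Emulate a - b on unsigned 64-bit integers with wraparound."""
    return (a - b) % UINT64_MODULUS


def signed_subtract(a, b):
    """Application-side adjustment: read the stored values as ordinary
    signed integers before doing arithmetic on them."""
    return int(a) - int(b)


wins, losses = 59, 103  # hypothetical team record
wrapped = unsigned_subtract(wins, losses)
difference = signed_subtract(wins, losses)
print(wrapped, difference)
```

The stored columns stay unsigned. Only the code doing arithmetic on them decides to treat the values as signed.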

#3612 From: "James" <sogish@...>
Date: Sun Oct 19, 2008 10:24 pm
Subject: Re: SCHEMA - Oversized MySQL fields?
emiduplam
 
--- In baseball-databank@yahoogroups.com, Paul DuBois <paul@...> wrote:
>
> If the nature of the data is that it is unsigned, it should be stored
> in an unsigned column.  If the nature of your application is that you
> use the values in a signed way, *your application* should make the
> necessary adjustments.
>
> Otherwise, why should the next person not come along and say, my
> purposes require the values to be floats, can the schema be changed to
> make them floats?
>


Anything in baseball that could possibly be a negative number?

#3613 From: "Tangotiger" <tom@...>
Date: Wed Oct 22, 2008 1:05 am
Subject: Re: Re: SCHEMA - Oversized MySQL fields?
tom@...
 
Regarding the tinyint versus int issue: I disagree.  I'll give
you two good reasons:

1. An 8 GB flash drive costs $29.  A 500 GB backup drive costs $75.
Saving space should be the least of our concerns.  Indeed, I'd prefer
creating a data warehouse, which would mean having denormalized data.

2. If you do sum(wins) on a tinyint, some DBMSs will keep the data type as
tinyint.  You will get into overflow situations.

Whatever performance you can possibly gain pales in comparison to
potentially corrupt output.
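
The overflow concern in point 2 can be illustrated outside any database. This is only a sketch: it uses Python's ctypes to emulate an accumulator that stays one unsigned byte wide, and the win totals are made up. Which database systems actually keep the narrow type when summing is not stated in this thread.

```python
import ctypes


def sum_as_uint8(values):
    """Sum values in an accumulator that stays 8 bits wide (range 0-255),
    the way a tinyint-typed aggregate would."""
    total = ctypes.c_uint8(0)
    for value in values:
        # ctypes integer types do no overflow checking, so this wraps mod 256.
        total = ctypes.c_uint8(total.value + value)
    return total.value


def sum_widened(values):
    """Sum the same values after widening each one to a full integer."""
    return sum(int(value) for value in values)


season_wins = [98, 100, 95]  # hypothetical season totals
narrow_total = sum_as_uint8(season_wins)
wide_total = sum_widened(season_wins)
print(narrow_total, wide_total)
```

Each individual value fits in a tinyint. Only the aggregate goes wrong, which is why the problem is easy to miss.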

Tom

#3614 From: "Clem Comly" <ccomly@...>
Date: Fri Oct 24, 2008 2:05 am
Subject: small stat change
ccomly2003
 

1952 Bud Black for Detroit-- his season IP total should be 7.2 not 8.  Daily on microfilm has 7.2 for him and all the other Detroit Ps IPs are OK (seems to me I heard official totals went out with rounded IP season totals  and Pete Palmer has found and fixed a lot of them)
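
For readers less used to the notation: an innings-pitched figure of 7.2 means seven innings plus two outs, not seven and two tenths. A small sketch of the conversion to outs follows. The helper names are hypothetical and not part of any BDB tooling.

```python
def ip_to_outs(ip_text):
    """Convert baseball innings-pitched notation ('7.2' = 7 innings + 2 outs)
    to a count of outs."""
    whole, _, fraction = ip_text.partition(".")
    extra_outs = int(fraction) if fraction else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"not valid innings-pitched notation: {ip_text!r}")
    return int(whole) * 3 + extra_outs


def outs_to_ip(outs):
    """Convert a count of outs back to innings-pitched notation."""
    return f"{outs // 3}.{outs % 3}"


corrected_outs = ip_to_outs("7.2")
rounded_outs = ip_to_outs("8")
print(corrected_outs, rounded_outs, outs_to_ip(corrected_outs))
```

Under this reading the correction above changes the season total by a single out.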

 

Clem Comly


#3615 From: "Sean Forman" <sean-forman@...>
Date: Mon Nov 10, 2008 5:51 pm
Subject: 2008 Update this week or early next
sforman71
 
I've made good progress on the BDB update for this year and should have something soon.

I'm adding an Appearances table with the following schema. It will have data from 1973 on for the AL and 1974 on for the NL, corresponding to the start of the DH for the AL and to complete Retrosheet data for the NL.

I'm open to suggestions, but in Batting, I think I'm going to put an entry in for every player who played and then add a column G_bat that will show how many games they appeared in the lineup. Pre-interleague AL pitchers will then have their games pitched in G, 0 for G_bat, and nulls for the batting stats.

Feedback?

sean

CREATE TABLE `Appearances` (
  `yearID` smallint(4) NOT NULL default '0',
  `teamID` char(3) NOT NULL default '',
  `lgID` char(2) default NULL,
  `playerID` char(9) NOT NULL default '',
  `G_all` tinyint(3) unsigned default NULL,
  `G_batting` tinyint(3) unsigned default NULL,
  `G_defense` tinyint(3) unsigned default NULL,
  `G_p` tinyint(3) unsigned default NULL,
  `G_c` tinyint(3) unsigned default NULL,
  `G_1b` tinyint(3) unsigned default NULL,
  `G_2b` tinyint(3) unsigned default NULL,
  `G_3b` tinyint(3) unsigned default NULL,
  `G_ss` tinyint(3) unsigned default NULL,
  `G_lf` tinyint(3) unsigned default NULL,
  `G_cf` tinyint(3) unsigned default NULL,
  `G_rf` tinyint(3) unsigned default NULL,
  `G_of` tinyint(3) unsigned default NULL,
  `G_dh` tinyint(3) unsigned default NULL,
  `G_ph` tinyint(3) unsigned default NULL,
  `G_pr` tinyint(3) unsigned default NULL,
  PRIMARY KEY  (`yearID`,`teamID`,`playerID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;



--
Sean Forman
President, Sports Reference LLC
http://www.sports-reference.com/

#3616 From: KJOK <kjokbaseball@...>
Date: Tue Nov 11, 2008 7:06 am
Subject: Re: 2008 Update this week or early next
kjokbaseball
 
I think we've discussed having an appearance or roster table before, and talked about it as a very good idea.
 
I'm less clear about the need for the Batting Table Changes, given the Appearance Table seems to take care of the G_Bat issue for AL pitchers.
 
THANKS,
Kevin



#3617 From: "Sean Forman" <sean-forman@...>
Date: Tue Nov 11, 2008 12:15 pm
Subject: Re: 2008 Update this week or early next
sforman71
 


On Tue, Nov 11, 2008 at 2:06 AM, KJOK <kjokbaseball@...> wrote:

I think we've discussed having an appearance or roster table before, and talked about it as a very good idea.
 
I'm less clear about the need for the Batting Table Changes, given the Appearance Table seems to take care of the G_Bat issue for AL pitchers.
 
THANKS,
Kevin


 




I see this as backwards compatibility.  I know that lots of people use the tables as is, and historically G has been all games played.  This will leave Batting listing all players, but also give an easy means to filter out the players who never appeared in the batting lineup, if desired.
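
As a rough illustration of that filter, the sketch below builds a cut-down Batting table in Python's built-in sqlite3 and compares the two queries. The player IDs and numbers are invented, and the table keeps only a few columns. G_bat is the proposed column name from this thread, not something confirmed in a released schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Batting (
        playerID TEXT,
        yearID   INTEGER,
        teamID   TEXT,
        G        INTEGER,  -- all games played, as historically
        G_bat    INTEGER,  -- proposed: games in the batting lineup
        AB       INTEGER   -- NULL when the player never batted
    )
""")
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("pitch01", 1990, "BOS", 31, 0, None),    # hypothetical AL pitcher
        ("hitte01", 1990, "BOS", 150, 150, 560),  # hypothetical position player
    ],
)

# Existing queries keep working and still see every player.
all_players = [row[0] for row in conn.execute(
    "SELECT playerID FROM Batting ORDER BY playerID")]

# New queries can drop players who never appeared in the lineup.
lineup_players = [row[0] for row in conn.execute(
    "SELECT playerID FROM Batting WHERE G_bat > 0 ORDER BY playerID")]

print(all_players, lineup_players)
```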

sean
--
Sean Forman
President, Sports Reference LLC
http://www.sports-reference.com/

#3618 From: Tangotiger <tangotiger@...>
Date: Tue Nov 11, 2008 12:18 pm
Subject: Re: 2008 Update this week or early next
tangotiger
 
It is usually more desirable to go vertically than horizontally, like so:
http://sports.groups.yahoo.com/group/baseball-databank/message/3606

You can create more codes, without altering the number of columns.  So, you can include stuff like batting order and the like.  From there, you can always convert the subset you need to map horizontally.
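
A minimal sketch of that last step, converting a vertical subset into horizontal columns, is shown here in plain Python. The function name `pivot_games`, the row layout, and the `G_` column prefix are assumptions for illustration. The sample numbers follow the example rows in message 3606.

```python
from collections import defaultdict


def pivot_games(rows, code, column_prefix="G_"):
    """Turn vertical (playerID, appearCode, appearSubcode, games) rows for one
    appearCode into one dict of horizontal columns per player."""
    wide = defaultdict(dict)
    for player_id, appear_code, subcode, games in rows:
        if appear_code == code:
            wide[player_id][f"{column_prefix}{subcode}"] = games
    return dict(wide)


vertical_rows = [
    ("raineti01", "fld", 7, 141),
    ("raineti01", "fld", 8, 8),
    ("raineti01", "bat", 1, 140),
]

fielding_wide = pivot_games(vertical_rows, "fld")
print(fielding_wide)
```

Adding a new code, such as batting-order slots, adds rows to the vertical table and requires no new columns. Only the pivot that a particular report needs has to know about them.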

Tom


---------------------------------------------

--- On Mon, 11/10/08, Sean Forman <sean-forman@...> wrote:
From: Sean Forman <sean-forman@...>
Subject: [baseball-databank] 2008 Update this week or early next
To: "Baseball Databank" <baseball-databank@yahoogroups.com>
Date: Monday, November 10, 2008, 12:51 PM

I've made good progress on the BDB update for this year and should have something soon.

I'm adding an Appearances table with the following schema (this will have data from 1973 on for the AL and 1974 on for the NL).  Corresponding to complete retrosheet data for the NL and the start of the DH for the AL. 

I'm open to suggestions, but in Batting, I think I'm going to put an entry in for every player who played and then  add a column G_bat that will show how many games they appeared in the lineup, so pre-Interleague AL pitchers will have their games pitches in G and then 0 for G_bat and nulls for the values of the batting stats.

Feedback?

sean

CREATE TABLE `Appearances` (
  `yearID` smallint(4) NOT NULL default '0',
  `teamID` char(3) NOT NULL default '',
  `lgID` char(2) default NULL,
  `playerID` char(9) NOT NULL default '',
  `G_all` tinyint(3) unsigned default NULL,
  `G_batting` tinyint(3) unsigned default NULL,
  `G_defense` tinyint(3) unsigned default NULL,
  `G_p` tinyint(3) unsigned default NULL,
  `G_c` tinyint(3) unsigned default NULL,
  `G_1b` tinyint(3) unsigned default NULL,
  `G_2b` tinyint(3) unsigned default NULL,
  `G_3b` tinyint(3) unsigned default NULL,
  `G_ss` tinyint(3) unsigned default NULL,
  `G_lf` tinyint(3) unsigned default NULL,
  `G_cf` tinyint(3) unsigned default NULL,
  `G_rf` tinyint(3) unsigned default NULL,
  `G_of` tinyint(3) unsigned default NULL,
  `G_dh` tinyint(3) unsigned default NULL,
  `G_ph` tinyint(3) unsigned default NULL,
  `G_pr` tinyint(3) unsigned default NULL,
  PRIMARY KEY (`yearID`,`teamID`,`playerID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
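
Editor's sketch of one way the horizontal schema above could be queried, ported to SQLite for a self-contained example. The helper primary_position, the sample rows, and the player and team IDs are assumptions, not part of the release. Because the primary key includes teamID, a player traded mid-season has several rows per year, so the helper sums across teams first.

```python
import sqlite3

# Per-position columns from the schema above (G_of is left out because it
# overlaps with G_lf, G_cf and G_rf).
POSITION_COLUMNS = [
    "G_p", "G_c", "G_1b", "G_2b", "G_3b", "G_ss", "G_lf", "G_cf", "G_rf", "G_dh",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appearances ("
    "yearID INTEGER NOT NULL, teamID TEXT NOT NULL, lgID TEXT, "
    "playerID TEXT NOT NULL, G_all INTEGER, "
    + ", ".join(f"{col} INTEGER" for col in POSITION_COLUMNS)
    + ", PRIMARY KEY (yearID, teamID, playerID))"
)
# Hypothetical player who split 1985 between two made-up teams.
conn.execute(
    "INSERT INTO Appearances (yearID, teamID, lgID, playerID, G_all, G_c, G_1b) "
    "VALUES (1985, 'AAA', 'AL', 'player01', 120, 100, 25)"
)
conn.execute(
    "INSERT INTO Appearances (yearID, teamID, lgID, playerID, G_all, G_1b) "
    "VALUES (1985, 'BBB', 'AL', 'player01', 90, 90)"
)


def primary_position(conn, player_id, year_id):
    """Return the position column with the most games that season, or None."""
    sums = ", ".join(f"COALESCE(SUM({col}), 0)" for col in POSITION_COLUMNS)
    row = conn.execute(
        f"SELECT {sums} FROM Appearances WHERE playerID = ? AND yearID = ?",
        (player_id, year_id),
    ).fetchone()
    games = dict(zip(POSITION_COLUMNS, row))
    best = max(games, key=games.get)
    return best if games[best] > 0 else None
```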



--
Sean Forman
President, Sports Reference LLC
http://www.sports-reference.com/



#3619 From: "Matthew Gargano" <mgargano@...>
Date: Mon Nov 10, 2008 6:05 pm
Subject: Re: 2008 Update this week or early next
tkestars
Sean:

That schema sounds great to me.

Just a quick question: I did some research and searched the Yahoo! group but couldn't find the answer to this.  Some players are missing InnOuts in the Fielding table (Moises Alou in the late '90s, for example) or ZR (Jeff Keppinger in 2007, for example).

Are these data available in full?  Will they be in the newest release?  Did I just have a bad copy of the DB?  I noticed that (for Fielding) InnOuts are on Baseball-Reference, so they must be available somewhere.

Let me know.

I appreciate all of the work!

Thanks,

Matthew Gargano

#3620 From: robert bluestein <robertbluesteinphotography@...>
Date: Tue Nov 11, 2008 2:59 pm
Subject: Re: 2008 Update this week or early next
robertbluest...
You can do it that way, but SABRE prefers it to be the other way.
 
 



#3621 From: "Tangotiger" <tom@...>
Date: Tue Nov 11, 2008 4:05 pm
Subject: Re: 2008 Update this week or early next
tom@...
Who is "SABRE", and can you cite their reasoning?

Tom

> You can do it that way, but SABRE prefers it to be the other way.


---------------------------------------------
The Book--Playing The Percentages In Baseball
http://www.InsideTheBook.com

#3622 From: "Sean Forman" <sean-forman@...>
Date: Thu Nov 13, 2008 7:26 pm
Subject: 2008 pre-release
sforman71
I've uploaded a BDB update today.  Here are the release notes.  Please consider this a proposed release, as I would like plenty of review to point out any issues with it.

sean

Notes:

Added an Appearances table that goes from 1973-present for the AL and 1974 on for the NL. These years were chosen because they contain the entirety of the DH era for the AL and the seasons in the NL for which Retrosheet data is complete. This table summarizes games played, including a breakdown by position. It also lists, in G_batting, the number of games in which the player appeared in a batting order. In the DH-era AL, pitchers may well have zeros here, as they never appeared in the lineup.

Postseason batting, pitching and fielding tables were greatly expanded and improved from play-by-play data.

The games played data in Batting should now list all games played for all players, and all players will appear in this table regardless of whether they were in the lineup. I've added a G_batting column to show you when the player did not appear in the lineup that season and also nulled out their stats when they did not have batting stats for that year.
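
Editor's sketch of what nulled-out batting stats mean for queries. It assumes a cut-down Batting table in SQLite with made-up rows; AB and H stand in for the full set of batting columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Batting (
        playerID  TEXT NOT NULL,
        yearID    INTEGER NOT NULL,
        G         INTEGER,
        G_batting INTEGER,
        AB        INTEGER,   -- NULL when the player has no batting stats
        H         INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("hitter01", 1985, 150, 150, 500, 150),  # hypothetical regular
        ("pitchr01", 1985, 35, 0, None, None),   # hypothetical pitcher, stats nulled
    ],
)


def batting_averages(conn):
    """H/AB per player; None where stats are NULL or AB is zero."""
    query = """
        SELECT playerID, CAST(H AS REAL) / NULLIF(AB, 0)
        FROM Batting
        ORDER BY playerID
    """
    return dict(conn.execute(query).fetchall())


def ab_summary(conn):
    """(total AB, rows with AB recorded, all rows): aggregates skip NULLs."""
    return conn.execute("SELECT SUM(AB), COUNT(AB), COUNT(*) FROM Batting").fetchone()
```

A NULL AB propagates through arithmetic as NULL rather than zero, so a pitcher who never batted shows no average at all instead of a misleading .000.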

In FieldingOF, I deleted entries for seasons after 1956, since I have entered full LF-CF-RF entries for those seasons and the games played totals are now redundant.

Added an AllstarFull table that adds information such as starter and GP.



--
Sean Forman
President, Sports Reference LLC
http://www.sports-reference.com/

#3623 From: "Tangotiger" <tom@...>
Date: Thu Nov 13, 2008 9:28 pm
Subject: Re: 2008 pre-release
tom@...
Two hours without comment?  I hope that's because we are in shock and not
passe over this.  Anyway, a huge thanks to Sean for delivering the data.

I am going to work on updating my MS Access shell to automatically import
the data.  With new columns, I have to spend a bit of time to get that
straightened out.  I'll make that available once I've got it working,
hopefully no later than early next week.

Also, please treat this as an open invitation to everyone out there to
deliver whatever data they get their hands on, especially "ID mapping"
data across the various data sources. It's important to think of this
group as a dumping ground for any and all data.

Tom
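
Editor's sketch of how "ID mapping" data could be used to compare two sources. All table names, column names, IDs, and game counts below are invented for illustration; no real mapping file is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical mapping: one row per player, one ID per data source.
    CREATE TABLE IDMap (
        bdbID   TEXT PRIMARY KEY,
        retroID TEXT NOT NULL UNIQUE
    );
    -- Hypothetical career games at catcher, as reported by each source.
    CREATE TABLE BdbCatcher   (bdbID   TEXT PRIMARY KEY, G_c INTEGER);
    CREATE TABLE RetroCatcher (retroID TEXT PRIMARY KEY, G_c INTEGER);

    INSERT INTO IDMap VALUES ('playera01', 'playa101'), ('playerb01', 'playb101');
    INSERT INTO BdbCatcher VALUES ('playera01', 1), ('playerb01', 40);
    INSERT INTO RetroCatcher VALUES ('playa101', 2), ('playb101', 40);
""")


def catcher_discrepancies(conn):
    """Players whose games at catcher differ between the two sources."""
    query = """
        SELECT m.bdbID, m.retroID, b.G_c, r.G_c
        FROM IDMap AS m
        JOIN BdbCatcher   AS b ON b.bdbID   = m.bdbID
        JOIN RetroCatcher AS r ON r.retroID = m.retroID
        WHERE b.G_c <> r.G_c
        ORDER BY m.bdbID
    """
    return conn.execute(query).fetchall()
```

With a mapping table in place, a cross-source consistency check becomes a single join.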




---------------------------------------------
The Book--Playing The Percentages In Baseball
http://www.InsideTheBook.com
