baseball-databank · Baseball Databank

Messages 1721 - 1750 of 4385
#1721 From: "KJOK" <kjokbaseball@...>
Date: Sun Aug 17, 2003 6:43 am
Subject: Re: PitchingPost Table Problems
kjokbaseball
 
--- In baseball-databank@yahoogroups.com, Michael Mavrogiannis
<mmavrogi@o...> wrote:
> In the PitchingPost table, the fields GS, SHO, HR, and BAOpp are missing
> for most years, with sporadic values sprinkled in to confuse researchers.
> Did you know that Buck Becannon and Andy Pettitte started, apparently
> unopposed, the sole post-season games of the 1884 and 1995 seasons? You
> would if you were to believe the dirty lyin' database. As recently as
> 1999, there were an odd number of games started in the post-season.
>
> Hope this helps.

I've mostly concentrated on the 'regular season' tables, and had
never really looked at PitchingPost, but you're right, it's weak -
doesn't have GS, or even Runs Allowed, HR's allowed, etc. which are
all available for post season games (at least 20th century ones).  If
Sean will indulge me, I'd like to create a new PitchingPost table
that would have all of these plus BFP, GF, IBB, WP, GDP and even SH
and SF allowed, and would replace completely the current PitchingPost
Table.

THANKS,
Kevin

#1722 From: "tmasc@..." <tmasc@...>
Date: Sun Aug 17, 2003 12:00 pm
Subject: Re: Re: TO DO - updated list
tangotiger
 
--- Michael Westbay
<westbaystars@...> wrote:
> Also, I was wondering if the DB Design Committee is
> on a different
> channel (mailing list).  Not much in the way of
> design is discussed
> here, but that's where much of my interest is.
>

Michael,

Tom Lewis and I are the only members, and we delivered
the design in Jan, and we have not made any
changes/discussions since.

Starting a separate yahoo group sounds like a good
idea, and I'll send out a note Monday to that effect.
Anyone wishing to make contributions to the design
will be invited.

Things like keys and the like will be discussed
prominently, I'm sure.

I'll also look for past threads on this issue, so that
we'll all be on the same page.

Tom




__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

#1723 From: Paul Wendt <pgw@...>
Date: Sun Aug 17, 2003 2:22 pm
Subject: Re: LICENSING
pgw02472
 
17 Aug 2003, Michael Westbay wrote, in small part, near the end:

> For redistribution,

All of the paragraphs that I have deleted also pertain to redistribution,
not to modifications by those users who do not redistribute.

> provide reference to the
> original work (at BDB's web site?) in appropriate documentation.

> I don't mind people making a profit off of my work, so long as they
> don't forget to let people know where they got the information (credit
> and link) and provide feedback about any corrections and/or additions
> to the data.

> The GPL is the best way to enforce this behavior from people
> looking to distribute the data.

- reference to the original; credit and link
- provision of feedback

These requirements to provide credit and feedback are elements of a
different model.  Berkeley's model, I believe.
R.M. Stallman, founder/overseer of the Free Software model and author of
the GNU licenses, has argued vehemently against both.

P/\/ \/\/t
Paul Wendt, Watertown MA, USA <pgw@...>

#1724 From: "tangotiger" <tmasc@...>
Date: Sun Aug 17, 2003 2:49 pm
Subject: BDB DESIGN - group started
tangotiger
 
A BDB Design Committee yahoo group has been started here:

http://groups.yahoo.com/group/BDBDesign/

I will ask that Tom Lewis and Michael Westbay join this group when
they get the chance.  Anyone else wanting to join may do so, and be a
part of the design committee.

The message archives are available for all members, and therefore,
you do not need to join the group if you are a viewer only.

Note that this committee, like all the others, is best described
as a splinter group that makes recommendations to the BDB group.  Sean
and Sean, as owners of the database, would have the last word, I'd
guess (though this has not been formally specified either).

On Monday, I will cull the archives and make a list of messages where
we discussed design issues (though the TO DO list already provided is
probably a great starting point).

Thanks, Tom

#1725 From: "rbhs1980" <scotthberg@...>
Date: Sun Aug 17, 2003 5:23 pm
Subject: Re: Holes in the Historical Data
rbhs1980
 
I'm new to the Group, but I have a 1901 Reach baseball guide that
shows Stolen Bases that the Catchers allowed for 1900. The table
breaks it down by Games, Stolen bases by opponents, and Average by
Game. If needed I can send the information to whoever is heading this up.

Scott Berg


--- In baseball-databank@yahoogroups.com, "Jeff Burk"
<arkyvaughan@b...> wrote:
> It appears that most of the "holes" in the data are for statistics that
> were not kept at the time and have never been reconstructed. Taking a
> look at the batting data as it stands today, the data appears to include
> all available years for the following categories:
>
> AB, H, 2B, 3B, HR, R, RBI, BB, IBB, HBP, SO, SH, SF, SB, CS, GDP
>
> For pitching data, including what was uploaded on June 30, the data
> appears to include all available years for the following categories:
>
> G, W, L, GS, CG, ShO, Sv, GF, IP, BFP, R, ER, H, HR, BB, IBB, HB, SO, WP, Bk
>
> Perhaps the efforts of Retrosheet will eventually fill in many of these
> gaps, but some of it simply does not exist right now.
>
> The only two long-standing "official statistics," i.e., the statistics
> that are known to exist but are not in the database, are SH and SF
> allowed by pitchers.
>
> By the way, assuming the BFP, BB, and HB data are accurate, and SH and SF
> are eventually added, there is no longer reason to include BAOpp in the
> database. This is redundant, since it can be figured using
> H/(BFP-BB-HB-SH-SF). Nor is ERA now needed since the data now includes IP
> and ER.
>
> The fielding data was updated on June 27 with PB. I believe this was the
> last long-standing "official statistic" not included in the fielding data.
>
> So what we are really after now is stuff that has only been kept in the
> last 25 years or so or is being reconstructed by Retrosheet. Including
> what Tom listed, I would be interested in gathering as much of the
> following as possible:
>
> Batting: Pitches Seen, Groundballs, Flyballs
>
> Pitching: Save Opportunities, Holds, Inherited Runners, Inherited Runners
> Scored, Doubles Allowed, Triples Allowed, RBI Allowed, Stolen Bases
> Allowed, Caught Stealing Allowed, GDP Induced, Pickoffs, Pitches Thrown,
> Groundballs, Flyballs
>
> Fielding: Games Started, Innings Played, Triple Plays, Zone Chances, Zone
> Outs, Stolen Bases Allowed by Catchers, Caught Stealing Allowed by
> Catchers, Pickoffs by Catchers
>
> I have some of this from about 1989 to 1998, although it is not broken
> down by stint and, frankly, I'm not sure how such a breakdown could be
> obtained. I will rummage around in my harddrive and see if I can find
> this data and get it in some useful form.
>
> Also, I believe the LF/CF/RF data is going to be similarly limited. It's
> either going to be from the last 25 years or so or from some seasons in
> the 19th century when LF/CF/RF data was separately kept.
>
> Finally, home and road home run data is available for all seasons. Is the
> Tattersall log available somewhere that might make it possible for us to
> integrate this data into the database? I am not suggesting we try to
> create an entire separate situational database with home/road breakdowns,
> but the home run data might be interesting and is known to be available
> for all seasons.
>
> I'd be interested in working with folks to make these additions and
> improvements to the batting, pitching, and fielding databases.
>
> > > -----Original Message-----
> > > The "holes" in our data is as follows:
> > >
> > > Hitting:
> > > SB - 1876 to 1885
> > > CS - 1876 to 1919
> > > IBB - 1871 to 1954
> > > HBP - 1871 to 1883
> > > GIDP - 1871 to 1932
> > > SH - 1871 to 1894
> > > SF - 1871 to 1953
> > > (though I realize that the way the SH and SF rules
> > > and
> > > recording practices were at the time that it might
> > > not
> > > even be possible to separate things)
> > >
> > > Pitching:
> > > for any years where it was recorded:
> > > pitch counts, balls, strikes
> > >
> > > for all years, except 1972-1992 and 1999-2002
> > > WP, PB, BK, Pickoffs, SB, CS, HBP, IBB
> > >
> > > Fielding:
> > > for any years where it was recorded:
> > > "balls in zone"
> > >
> > > for all years, except 1972-1992 and 1999-2002
> > > - Innings Played by Position
> > > - Games played for LF,CF,RF as individual position
> > > - WP, PB, BK, Pickoffs, SB, CS for catchers
> > >
> > > Any of the voids that can be filled would be most
> > > appreciated.  I understand if you will only have
> > > time
> > > to look into this during a lull in the off-season.
> > >
> > > Thanks, Tom
> > >
> >
> >
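
For anyone wanting to try the redundancy argument in Jeff's quoted message (BAOpp and ERA being derivable from the other columns), here is a minimal Python sketch. It uses the formula exactly as he wrote it; the function names are made up for illustration, and it assumes IP is held as a true fraction of innings rather than in the ".1/.2" box-score notation.

```python
def opponent_batting_average(h, bfp, bb, hb, sh, sf):
    """BAOpp as given in the post: H / (BFP - BB - HB - SH - SF)."""
    at_bats_against = bfp - bb - hb - sh - sf
    if at_bats_against <= 0:
        return None  # cannot be computed from these inputs
    return h / at_bats_against


def earned_run_average(er, innings):
    """ERA = 9 * ER / IP, with IP as a real number of innings (e.g. 6 1/3 = 6.333...)."""
    if innings <= 0:
        return None
    return 9 * er / innings


# Made-up numbers, purely to exercise the functions.
print(opponent_batting_average(h=50, bfp=230, bb=20, hb=3, sh=4, sf=3))
print(earned_run_average(er=20, innings=60))
```

If any of the inputs is unknown for a given season, the derived value is unknown too, which is the practical catch with dropping the stored columns before SH and SF allowed are filled in.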

#1726 From: Sean Forman <sean-forman@...>
Date: Sun Aug 17, 2003 5:34 pm
Subject: POLICY - 001 - BDB ROADMAP
sforman71
 
I want to thank everyone for their comments and their feedback.  However
I want to urge people to hold off for now on discussion of the policy
issues I listed in my first message.  [licensing, normalization, etc.]

My goal is to set out a roadmap for BDB development over the next year,
and I fear that if we start hashing out 5 or 6 things at once:

1) we are going to be swamped with messages that the busier members of
this effort won't be able to follow and that important decisions will be
made without consensus.  This will make these policies far more
difficult to implement.

2) I (and perhaps others) will fizzle out with the sheer amount of
stuff that we would like to do.  I want to take a long view of this
project.  Keep in mind that Retrosheet just celebrated its tenth year in
existence.  We aren't going to get everything done we would like to get
done next year.

I would like us to take a few weeks or so and hash out where we want to
be in the next year or two.

I would like to see more discussion as to what the priorities are and I
will put together another document listing priorities and I hope
iteratively we can arrive at a consensus as to what needs to be done and
what needs to be decided before moving forward.

For example, it sounds like syncing with retro ID's is a big priority.

What are others from the list I posted earlier?
http://groups.yahoo.com/group/baseball-databank/message/1696

thank you for your enthusiasm,
sean

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#1727 From: "tangotiger" <tmasc@...>
Date: Sun Aug 17, 2003 6:07 pm
Subject: POLICY - 001 - Roadmap
tangotiger
 
In response to Sean's email regarding roadmap and priorities, the
message below is the "section titles" of the "TO DO" list of
January.  I would say this is what we should strive for.

As for the Nov. 30 launch date of the next release, this should contain
the latest BDB design, including the park data.

Tom

> > ==========================
> > PRIORITY 1 - Handle ASAP
> > Type:
> > (1) any item that contains errors in data
> > (2) data that existed, and is now missing
> > ==========================
> > ==========================
> > PRIORITY 2 - Handle Very Very Soon
> > Type:
> > (1) Organizational, procedural items for handling BDB
> > ==========================
> > ==========================
> > PRIORITY 3 - Handle Very Soon
> > Type:
> > (1) Primary Tables
> > ==========================
> > ==========================
> > PRIORITY 4 - Handle Soon
> > Type:
> > (1) Design Issues, normalization, keys
> > (2) XREF to other databases
> > (3) Standards
> > ==========================
> > ==========================
> > PRIORITY 5 - Handle At Some point
> > Type:
> > (1) Cool add-ons
> > (2) Secondary (tertiary?) data/tables
> > ==========================
> > ==========================
> > PRIORITY 6 - Unknown
> > ==========================

#1728 From: "KJOK" <kjokbaseball@...>
Date: Sun Aug 17, 2003 6:20 pm
Subject: Re: POLICY - 001 - BDB ROADMAP
kjokbaseball
 
--- In baseball-databank@yahoogroups.com, Sean Forman
<sean-forman@b...> wrote:
> .............
> I would like us to take a few weeks or so and hash out where we want to
> be in the next year or two.
>
> I would like to see more discussion as to what the priorities are and I
> will put together another document listing priorities and I hope
> iteratively we can arrive at a consensus as to what needs to be done and
> what needs to be decided before moving forward.
>
> For example, it sounds like syncing with retro ID's is a big priority.
>
> What are others from the list I posted earlier?

I'd like to go ahead and make updating the PitchingPost and
BattingPost Tables a priority.  I have the data, and can volunteer to
go ahead and get this taken care of, hopefully this week.

I also created a folder in the files section called "Data
Corrections" to post corrected tables or corrections needing to be
made to tables as I think this should also be considered "high
priority" for the next few weeks.  This folder is where I'll post the
new BattingPost and PitchingPost Tables, the new Teams Table (with
Park ID's), etc.

THANKS,
Kevin

#1729 From: "tjruane" <truane@...>
Date: Mon Aug 18, 2003 3:53 pm
Subject: Re: Retro ID Problems
tjruane
 
In response to my question about "\N" in the outfield fielding file,
Tom wrote:

> \N is probably "null".
>
> There's inconsistency in the files with those pesky
> escape characters (though \N is usually endline), and
> trying to get things into ascii format.

So should I interpret these fields as 0?  Or unknown?  Clearly, we
don't want endlines in these fields.

Also, any thoughts on the accuracy of this data?  I did two simple
checks (ensuring that the total LF-CF-RF sum to at least the OF games
played, and that no LF-CF-RF total exceeds OF games played) and found
quite a few errors.

Tom Ruane

#1730 From: "Jeff Burk" <arkyvaughan@...>
Date: Mon Aug 18, 2003 4:00 pm
Subject: Re: Retro ID Problems
arkyvaughan
 
What if a player started a game in center field but moved over to
right field later in the game?

Under a regime where outfield games are counted together, he would
have one game played.

Under a regime where left field, center field, and right field games
are counted separately, would he not have one game in center field
and one game in right field?

I am not saying the data may not have errors, but I think in this
case the sum of center field games and right field games would exceed
the total of outfield games, but this would not be erroneous.

--- In baseball-databank@yahoogroups.com, "tjruane" <truane@v...>
wrote:
> In response to my question about "\N" in the outfield fielding file,
> Tom wrote:
>
> > \N is probably "null".
> >
> > There's inconsistency in the files with those pesky
> > escape characters (though \N is usually endline), and
> > trying to get things into ascii format.
>
> So should I interpret these fields as 0?  Or unknown?  Clearly, we
> don't want endlines in these fields.
>
> Also, any thoughts on the accuracy of this data?  I did two simple
> checks (ensuring that the total LF-CF-RF sum to at least the OF games
> played, and that no LF-CF-RF total exceeds OF games played) and found
> quite a few errors.
>
> Tom Ruane

#1731 From: "tmasc@..." <tmasc@...>
Date: Mon Aug 18, 2003 4:16 pm
Subject: Re: Re: Retro ID Problems
tangotiger
 
--- tjruane <truane@...> wrote:
> So should I interpret these fields as 0?  Or
> unknown?  Clearly, we
> don't want endlines in these fields.
>

Unknown.  Any

,null,
,\null,
,\N,
,,

fields are considered unknown.
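
A minimal Python sketch of that rule, assuming a reader wants every one of those markers mapped to a single "unknown" value instead of 0. The name `parse_stat` and the sample row are hypothetical, not part of any BDB tool or file.

```python
# Markers that this thread treats as "unknown" (not zero).
UNKNOWN_MARKERS = {"", "null", "\\null", "\\N"}


def parse_stat(field):
    """Return an int for a numeric field, or None when the value is unknown."""
    text = field.strip()
    if text in UNKNOWN_MARKERS:
        return None
    return int(text)


# Made-up row: an ID, a year, then four stat fields using different markers.
row = "player01,1900,\\N,,null,60".split(",")
print([parse_stat(f) for f in row[2:]])
```

Keeping unknown distinct from 0 matters for any later sums or averages, since a real zero and a missing value mean different things.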

> Also, any thoughts on the accuracy of this data?  I
> did two simple
> checks (ensuring that the total LF-CF-RF sum to at
> least the OF games
> played, and that no LF-CF-RF total exceeds OF games
> played) and found
> quite a few errors.
>
> Tom Ruane
>

I agree with the other reply that the sums shouldn't
be a check.  Definitely the individual LF,CF,RF games
cannot exceed the OF games.
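
As a sketch of that per-position check in Python, assuming a fielding row carries OF, LF, CF, and RF game counts (hypothetical parameter names) and that unknown values are passed as None:

```python
def outfield_game_errors(of_games, lf, cf, rf):
    """List violations of the rule that no single LF/CF/RF count may exceed OF games.

    The sum LF + CF + RF is deliberately not checked against OF games: a player
    who moves between outfield spots in one game is credited at each position.
    """
    errors = []
    if of_games is None:
        return errors  # nothing to compare against
    for name, games in (("LF", lf), ("CF", cf), ("RF", rf)):
        if games is not None and games > of_games:
            errors.append(f"{name} games ({games}) exceed OF games ({of_games})")
    return errors


# Made-up counts: the second call has more LF games than OF games.
print(outfield_game_errors(10, 6, 7, 2))
print(outfield_game_errors(10, 11, None, 0))
```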

***

A similar issue is that nothing in the database shows exactly how
many games a player played, since we have "G" in the hitting,
fielding, and pitching tables.

However, I get the feeling that the "G" in the hitting
table is actually a player's "G" as a hitter and/or
fielder, so this should be corrected somehow.

I'm pretty sure we can't tell how many unique games
Babe Ruth actually played.

Tom



#1732 From: "Jeff Burk" <arkyvaughan@...>
Date: Mon Aug 18, 2003 4:19 pm
Subject: Pitching and Fielding Tables
arkyvaughan
 
I am trying to update the pitching and fielding tables in the next
several weeks so that they can be incorporated into the next release
of the database this fall.

In addition to the extra columns added by Derek Adair back in June, I
am trying to add SH and SF to the pitching table. I have some of this
data electronically for the last 15 years, but if anyone has access
to it electronically prior to 1987 please let me know.  I see on
Retrosheet that SH and SF for individual pitchers is available for
most seasons. Is there a source at Retrosheet to get this from?

With the post-1987 SH and SF data, I also have blown saves, holds,
and quality starts. I am trying to get this into the database as
well, but in the interest of not biting off more than I can chew, I
will hold off if I am unable to get it completed in time to be
error-checked.

For the fielding table, I have innings played, zone chances, zone
outs, and LF/CF/RF breakouts since 1987. I notice we have innings
played and LF/CF/RF breakouts for some recent seasons. I will try to
fill in holes for the last 15 years.

The problem I have in both cases is my data is not segregated for
players with multiple stints. Some of this can be resolved easily (a
pitcher with no relief appearances for Team A must have all his blown
saves and holds for Team B; a pitcher with no starts for Team A must
have all his quality starts for Team B; a player who only played
second base for Team A must have all his innings, zone chances, and
zone outs at other positions for Team B). In other cases, I have
printed sources that can help fill in holes. At worst, I can refer to
box scores and game logs to figure out blown saves, holds, and
quality starts.

The only statistics I am not confident in being able to get complete
breakouts for are zone chances and zone outs. It would be nice to have
this rather than simply zone rating, since it would make it possible
to add up over multiple seasons.

Finally, once we get the Retrosheet IDs fixed, shouldn't we be able
to figure out games started for the fielding table since 1972? The
new game logs have the starters at each position for every game. I
can probably do this with the game logs database I created, but the
IDs have to be accurate to do it.

#1733 From: "Jeff Burk" <arkyvaughan@...>
Date: Mon Aug 18, 2003 4:30 pm
Subject: Re: Pitching and Fielding Tables
arkyvaughan
 
--- In baseball-databank@yahoogroups.com, "Jeff Burk"
<arkyvaughan@b...> wrote:

> For the fielding table, I have innings played, zone chances, zone
> outs, and LF/CF/RF breakouts since 1987. I notice we have innings
> played and LF/CF/RF breakouts for some recent seasons. I will try to
> fill in holes for the last 15 years.

I just noticed that files with defensive innings played for 1972-1992
and 1999-2002 were posted in June. I will thus concentrate on
bridging the gap (1993-1998) in this aspect of the work that I'm
doing.

Since we have defensive innings for 1972-1992 and 1999-2002, does
this also mean we have LF/CF/RF breakouts for those seasons as well?
If so, I will concentrate just on 1993-1998 in that regard, too.

#1734 From: "tmasc@..." <tmasc@...>
Date: Mon Aug 18, 2003 4:34 pm
Subject: Re: Pitching and Fielding Tables
tangotiger
 
Jeff,

You may find this valuable:

http://groups.yahoo.com/group/baseball-databank/files/%20Positions/

================
InningPosition7292.zip
Number of outs recorded by team, while fielder in
game. 1972-1992. By retroid, team, year, pos. Divide
by 3 to get innings played.

InnningPosition9902.zip
Number of outs recorded by team, while fielder in
game. 1999-2002. By retroid, team, year, pos. Divide
by 3 to get innings played.
================
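
The "divide by 3" step can be sketched in Python as below. The column order (retroid, team, year, pos, outs) is an assumption based on the descriptions above, not the verified layout of the zip files, and the sample row is made up.

```python
import csv
import io


def innings_from_outs(outs):
    """Convert a count of outs into innings played as a real number."""
    return outs / 3


def format_innings(outs):
    """Render outs in conventional baseball notation, e.g. 25 outs -> '8.1'."""
    whole, remainder = divmod(outs, 3)
    return f"{whole}.{remainder}"


# Made-up sample row, purely for illustration.
sample = io.StringIO("xxxxx001,AAA,1972,ss,3901\n")
for retroid, team, year, pos, outs in csv.reader(sample):
    print(retroid, pos, innings_from_outs(int(outs)), format_innings(int(outs)))
```

Storing outs and converting only for display avoids the rounding trouble that comes from adding up ".1" and ".2" innings across stints or seasons.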

All data was retrieved using Ray Kerby's software at
www.astrosdaily.com/ass , which is the best software
freely available to parse the wonderful event files at
retrosheet, and I had to do a little manipulation.

As I mentioned, I'm kicking myself for not having run
it to include PO,A,E.  And, since you just brought it
up, I can also run it to include GS and G.

If you want to figure out this extra information, feel
free.  I might be able to get to it later on.

As for SH, SF, note that with the sublime retrosheet,
we can get the complete "opponent hitting lines" for
all pitchers, 1972-1992.  As well, we can include
WP,PB,BK, pickoffs by pitcher, pickoffs by catcher,
SB, CS (each by base) if we so desired.

The real issue with this is to guarantee the complete
compliance between the RetroID and the BDBid.  Derek
Adair has probably done the most to ensure that, but
I'm not sure if we've had someone else independently
confirm this (at least for the 1972-1992 time period).

The question is: how much of this stuff do we want to
add?  People are free to do the work they want, and
contribute what they want, but once people respond
with their priority lists, and Sean implements his
roadmap, the incorporation of the data into the
database will probably be subject to those deadlines.

Thanks, Tom



#1735 From: "Jeff Burk" <arkyvaughan@...>
Date: Mon Aug 18, 2003 9:41 pm
Subject: Re: Pitching and Fielding Tables
arkyvaughan
 
--- In baseball-databank@yahoogroups.com, "tmasc@y..." <tmasc@y...>
wrote:

> All data was retrieved using Ray Kerby's software at
> www.astrosdaily.com/ass , which is the best software
> freely available to parse the wonderful event files at
> retrosheet, and I had to do a little manipulation.
>
> As I mentioned, I'm kicking myself for not having run
> it to include PO,A,E.  And, since you just brought it
> up, I can also run it to include GS and G.
>
> If you want to figure out this extra information, feel
> free.  I might be able to get to it later on.
>
> As for SH, SF, note that with the sublime retrosheet,
> we can get the complete "opponent hitting lines" for
> all pitchers, 1972-1992.  As well, we can include
> WP,PB,BK, pickoffs by pitcher, pickoffs by catcher,
> SB, CS (each by base) if we so desired.
>
> The real issue with this is to guarantee the complete
> compliance between the RetroID and the BDBid.  Derek
> Adair has probably done the most to ensure that, but
> I'm not sure if we've had someone else independently
> confirm this (at least for the 1972-1992 time period).
>
> The question is: how much of this stuff do we want to
> add?  People are free to do the work they want, and
> contribute what they want, but once people respond
> with their priority lists, and Sean implements his
> roadmap, the incorporation of the data into the
> database will probably be subject to those deadlines.

Thanks for the tip, Tom.  I forgot Ray had that program and data
available.  (Too bad he is sworn to secrecy with respect to the
post-1992 data.)  I have set up the program and imported the data, and
will begin using it to fill in holes.  I agree with you that the
stuff we add right now should be limited to what can realistically be
compiled and error-checked for the next update.  Moreover, the A.S.S.
software and Retrosheet data are so versatile, someone could create
literally scores of new columns.  There has to be some kind of
limitation.  I will stick with columns that I can compile from the
1972-1992 Retrosheet data and supplement from what I already have
electronically from 1993-2002.  That way, the expanded data set will
run continuously from 1972 to 2002.

#1736 From: Michael Mavrogiannis <mmavrogi@...>
Date: Mon Aug 18, 2003 10:37 pm
Subject: Statement of Scope
mmavrogi
 
Has anyone ever considered writing a declaration of the Scope for the
Baseball Database and/or this Yahoo Group? A useful model of a Statement of
Scope can be found in the very first sentence at the top of Retrosheet's
website, supplemented by the few paragraphs to be found by following their
adjacent hyperlink.

On Sat, Aug 16, Sean L. wrote:
<<
My goal will be to do a release of the Access version of the
database by mid-November.  At that point, we should have been able to
integrate 2003 playing stats, postseason data, and award winners.
>>
That (plus maybe a Master table) is nearly everything I expect from the DB
I've come to know and love over the years.

On Sun, Aug 17, Sean F., perhaps using the future tense out of politeness,
observed:
<<
we are going to be swamped with messages that the busier members of
this effort won't be able to follow and that important decisions will be
made without consensus
>>
While lists of Japanese minor league pitching coaches, or whatever it is
being discussed in those messages I'm PgDn'ing past, are fine things to
compile and discuss on the Internet, the complete set of all such projects
should not weigh down our efforts here.

A succinct statement of What We Do might go a long way toward reducing the
swamping effect. Without it, there is no consensus as to what we don't do,
and we run the risk of becoming a depository of any and every research
project that someone with a modem started and worked on until they got tired
of it.

#1737 From: Michael Westbay <westbaystars@...>
Date: Tue Aug 19, 2003 12:04 am
Subject: NULL in CSV
westbaystars
 
I did an experiment to see how various values would be interpreted in MySQL.

The table:

     create table Test (
       sampleInt int,
       sampleString varchar(10)
     );

The data:

     1,text
     null,null
     ,
     \N,\N
     \null,\null

The results:

     +-----------+--------------+
     | sampleInt | sampleString |
     +-----------+--------------+
     |         1 | text         |
     |         0 | null         |
     |         0 |              |
     |      NULL | NULL         |
     |         0 |
     ull         |
     +-----------+--------------+

As you can see, only \N became NULL.  The text for line 2 became the
string "null."  Having a blank field (i.e. ",," in a CSV file) becomes a
blank string (i.e. "").  \null became a carriage return followed by "ull."

In all cases other than \N, the entry was considered non-numeric and
converted to 0 for the numeric field.

The only valid NULL value in CSV files, so far as MySQL is concerned, is \N.

Note, "\N" (with quotes) will be interpreted as a carriage return.

Hope this helps to clear things up.

--
Michael Westbay
Writer/System Administrator
http://JapaneseBaseball.com

#1738 From: "tmasc@..." <tmasc@...>
Date: Tue Aug 19, 2003 12:15 am
Subject: Re: NULL in CSV
tangotiger
 
Funny, but when I import an empty field, like
,,
in Oracle, it treats it as null.

In Access, depending on which method you choose to
import (they have 2 ways), you either have to set it
as
,,
or you set it as
,\null,
(or something like that).

We discussed this last year for Access specifically.
I'll see if I can find that post tomorrow.

I think ,, is always easier, and it's up to the
particular app, or person running it, to convert that
to whatever format that app needs it.  I think.

Tom


--- Michael Westbay
<westbaystars@...> wrote:
> I did an experiment to see how various values would
> be interpreted in MySQL.
>
> The table:
>
>     create table Test (
>       sampleInt int,
>       sampleString varchar(10)
>     );
>
> The data:
>
>     1,text
>     null,null
>     ,
>     \N,\N
>     \null,\null
>
> The results:
>
>     +-----------+--------------+
>     | sampleInt | sampleString |
>     +-----------+--------------+
>     |         1 | text         |
>     |         0 | null         |
>     |         0 |              |
>     |      NULL | NULL         |
>     |         0 |
>     ull         |
>     +-----------+--------------+
>
> As you can see, only \N became NULL.  The text for
> line 2 became the
> string "null."  Having a blank field (i.e. ",," in a
> CSV file) becomes a
> blank string (i.e. "").  \null became a carriage
> return followed by "ull."
>
> In all cases other than \N, the entry was considered
> non-numeric and
> converted to 0 for the numeric field.
>
> The only valid NULL value in CSV files, so far as
> MySQL is concerned, is \N.
>
> Note, "\N" (with quotes) will be interpreted as a
> carriage return.
>
> Hope this helps to clear things up.
>
> --
> Michael Westbay
> Writer/System Administrator
> http://JapaneseBaseball.com
>
>
>
> ------------------------ Yahoo! Groups Sponsor
>
> http://www.baseball-databank.org/
>
> To unsubscribe from this group, send an email to:
> baseball-databank-unsubscribe@yahoogroups.com
>
>
>
> Your use of Yahoo! Groups is subject to
> http://docs.yahoo.com/info/terms/
>
>



#1739 From: Michael Westbay <westbaystars@...>
Date: Tue Aug 19, 2003 4:17 am
Subject: Re: NULL in CSV
westbaystars
 
Tom wrote:

>Funny, but when I import an empty field, like
>,,
>in Oracle, it treats it as null.
>
>In Access, [...]
>

I guess that a formal convention accepted by the group as a whole would
be in order.  A rule to follow for all CSV files.

I like the ",," personally.  I can then run it through sed to make it
",\N," before importing.  If null is not allowed in a column, then that
column should have an appropriate DEFAULT value (such as '' for strings
or 0 for numeric fields).
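
A rough Python equivalent of that preprocessing step, offered as a sketch only. It assumes plain comma-separated lines with no quoted fields or embedded commas, and the function name is hypothetical. Splitting on commas instead of substituting ",," in a single pass means runs of consecutive empty fields (",,,") and empty first or last fields are all converted.

```python
def empty_to_mysql_null(line):
    """Return the CSV line with every empty field replaced by \\N."""
    stripped = line.rstrip("\r\n")
    if stripped == "":
        return ""  # leave blank lines alone
    fields = stripped.split(",")
    return ",".join(field if field != "" else "\\N" for field in fields)


# Made-up rows: two adjacent empty fields, then empty first and last fields.
for raw in ["player01,1900,,,12\n", ",1900,5,\n"]:
    print(empty_to_mysql_null(raw))
```

Files that do contain quoted fields would need a real CSV parser in place of the simple split.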

Consistency is what would be needed to allow scripts to prepare the data
for import into one's favorite database.  And the rule should be clearly
stated in a FAQ.

--
Michael Westbay
Writer/System Administrator
http://JapaneseBaseball.com

#1740 From: Sean Forman <sean-forman@...>
Date: Tue Aug 19, 2003 11:22 am
Subject: Re: Statement of Scope
sforman71
 
Michael Mavrogiannis wrote:
> Has anyone ever considered writing a declaration of the Scope for the
> Baseball Database and/or this Yahoo Group? A useful model of a Statement of
> Scope can be found in the very first sentence at the top of Retrosheet's
> website, supplemented by the few paragraphs to be found by following their
> adjacent hyperlink.


I agree that something like this would be useful.  This is poorly
stated, but here is what I put out in my first statement.

The focus of Baseball-Databank.org is to provide baseball information.
Just like the Linux kernel which backs a large number of
distributions, this DataBank will strive to be the backbone that can
be freely used as others see fit.

This is definitely too vague.



> That (plus maybe a Master table) is nearly everything I expect from the DB
> I've come to know and love over the years.


There is a dichotomy between the end user issues and the people who
distribute to the end users.  I see this project as being for the people
who distribute to the end users.  Just like in Retrosheet, I don't see a
lot of people using bevent.exe and the raw pbp files to find data.  I
think this project is going to be best served if it keeps to producing a
normalized database in the form of a series of CSV (comma-separated
values) files.  Then someone like Sean Lahman can take that and create
an easy to use product for a larger audience.


> While lists of Japanese minor league pitching coaches, or whatever it is
> being discussed in those messages I'm PgDn'ing past, are fine things to
> compile and discuss on the Internet, the complete set of all such projects
> should not weigh down our efforts here.


Another concept that we've discussed quite a bit is modules and I think
a table of Japanese Minor League Pitching Coaches (JMLPC) would be
appropriate within this module structure.  It wouldn't be in the core
DB, but if Sean Lahman or others decided there was a demand for this
they could package it into their product.



> A succinct statement of What We Do might go a long way toward reducing the
> swamping effect. Without it, there is no consensus as to what we don't do,
> and we run the risk of becoming a depository of any and every research
> project that someone with a modem started and worked on until they got tired
> of it.

I think professional baseball would be the main one here.  I think if
someone wanted to do College data they could start their own project to
match it up with our data.

sean

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#1741 From: "tmasc@..." <tmasc@...>
Date: Tue Aug 19, 2003 11:38 am
Subject: Re: Statement of Scope
tangotiger
 
--- Sean Forman <sean-forman@...> wrote:
> values) files.  Then someone like Sean Lahman can
> take that and create
> an easy to use product for a larger audience.
>
>

I've at times thought about creating some cool stuff
with this, that includes work that I do.  For example,
if you wanted all players who played 250 games over a
2-year span and wanted to know how they performed over
the next 2 years, or some crazy sh-t like that.  Or
custom LWTS or BaseRuns or Leverage Index, etc, etc.

Since Sean Lahman, I'd guess, compiled most of this,
I'm always apprehensive of trying to cut into his pie.
Is there a position that Sean L takes on this?

If I tell users: "go download the csv files from BDB,
and come back to my site, and you can download these
custom reports, and I'll create an installer for you
to merge it all", is that ok?

Tom





#1742 From: Sean Forman <sean-forman@...>
Date: Tue Aug 19, 2003 11:51 am
Subject: BDB Development Roadmap
sforman71
 
This document lays out the direction of BDB development over the next
three months.  This document is a compromise between all that we would
like to accomplish and what can be safely done in that time period.
Eventually, we will include all of the items on this list.  I'll post
the first policy topic tomorrow.

Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/
---------------------------------------------------------

Baseball-Databank.org Technology Plan
http://www.baseball-databank.org/
http://groups.yahoo.com/group/baseball-databank/

The focus of Baseball-Databank.org is to provide baseball information.
Just like the Linux kernel which backs a large number of
distributions, this DataBank will strive to be the backbone that can
be freely used as others see fit.  A diverse group of people work on
the BDB.

This roadmap will state the next year's goals for the BDB and will
list the current holdings in the BDB, both those that have been
implemented and those that have not.  It will also state the priority
additions for the future.

See the most recent 'def' file in
http://www.baseball-databank.org/files/ for the current state of the
DB and a list of all the columns and tables in the release BDB.

The current tables are as follows.

TABLE                     =>   ROWS
Master                    =>  15965
Teams                     =>   2415
TeamsFranchises           =>    120
TeamsHalf                 =>     52
Batting                   =>  83284
Pitching                  =>  34888
Fielding                  => 119417
FieldingOF                =>  21600
Salaries                  =>  14785
Managers                  =>   2965
ManagersHalf              =>     93
Allstar                   =>   3907
AwardsPlayers             =>   1424
AwardsSharePlayers        =>   5658
AwardsManagers            =>     41
AwardsShareManagers       =>    241
HallOfFame                =>   3283
BattingPost               =>   8280
PitchingPost              =>   3195
SeriesPost                =>    208

----------------------------------------------------------------------

--------------------------
Policies to be discussed.
--------------------------
Since there are a number of these, I'm proposing that we do the
following.  I will post the topic for discussion.  We discuss it for a
week or two in the discussion group and then arrive at a consensus, at
which point someone writes up a policy statement.

Please keep in mind that this will take some time.  We are not going
to be able to discuss all of these at once.  Dates are when the
discussion of these topics will begin.  Two weeks should be sufficient
time to come to a consensus.

Aug. 15, Create a statement of purpose for this project.

Sept. 1, Clarify copyright and licensing issues with the BDB.
Determine the core database and then define modules built around that
database.

Sept. 15, Reiterate the convention for naming posts.  Perhaps, create an
FAQ for the site that can be posted to the mailing list as needed.

Sept. 20, Description of how to incorporate new data into the DB.

Oct. 1, Some technique of versioning control to allow others to
update parts of the database.  Perhaps CVS.  Allow decentralization of
DB control.

Oct. 15, Further normalization of the tables.
     Settle numeric vs. non-numeric keys once and for all. Refine the
     primary keys used in the existing tables.  Should lg_ID be in the keys?
     Determine policy on handling player names.

Nov. 1, Check on a CSV standard for DB data and follow that standard.
Direct people to the MS Access Shells and other utilities.
    Tools for placing a standard DB into a variety of different formats.
    Most notably Access.  Clarify use of NULL and escape characters.
    Determine when things are actually NULL and when they are just 0.


Nov. 15, Set up a detailed changelog allowing others to sync their DBs.

Dec. 1, Provide a plan for annual organization of ideas and taking
stock.  Hash out direction for the upcoming year and what data should
be added when.

Dec. 15, Create mechanism for citing who did what and where things
came from.

Jan. 15, Create a means for stating the quality of the data within the
tables.  Perhaps with Alpha, Beta, Supported.

Feb. 1, Keep bio data synced with the state of the art from the
Biographical committee.  Are they our final arbiter?

Feb. 15, Discuss place names using ISO standards.



-----------------------------------------------------------

--------------------------------------
Immediate data incorporation
--------------------------------------

There will be at least two new database releases prior to Nov. 15.
The first will incorporate at least the following items by
Sept. 20. This will be done primarily by Sean Forman and will be
checked by other members of the BDB.

Correct Xref with retrosheet IDs

Incorporation of completed additional pitching data.

Franchise/Park changes introduced in the DB by KJOK

Correction of any known errors.

All-Star Batting, All-Star Pitching, All-Star Fielding, All-Star Rosters

Perhaps some other small things like Passed Balls, completion of first
and last games.

On Sept. 20, we will begin to assess what is and is not needed for the
2003 update.  This data will be divided up among the members (with
redundancy in some important areas) and will begin to be assimilated
on Oct. 5.  Completion and proofing of this data should occur by
Nov. 4.  A description of the tasks to be performed and the proofing
done should be created as we are creating the new update.  This
description will guide us in future updates.

-----------------------------------------------------------

------------------------------------------------
Long and Medium-term data incorporation
------------------------------------------------

Below are listed additional data sets.  Following the 2003 update's
release, we will take stock of what we have here and then begin to
incorporate the completed portions into the BDB.

(Incorporate Mike Crain's data)
Non-playing personnel: Coaches
Non-playing personnel: Executives
Non-playing personnel: Umpires


Clear up games played issues in the batting table when relating to
pinch hitters, pinch runners and pitchers.  A games played table?


Innings played for the fielding records from retrosheet data.

Team data, xref data, attendance data, home road data is added

Accomplishments and assorted awards

Hitting streaks

retired numbers

Scouting Data

College Data (Sean Holtz has generously offered this)

HOF Voting

Baseball Magazine Awards

Japanese baseball Database.

shell of minor league data

19th century transactions

1900 AL from Paul Wendt

Retrosheet Trade Data
Retrosheet Transaction Data
Retrosheet Gamelog data
MLB schedules as scheduled  from schedulejunkie

Banned and blacklisted players

Starts for Sandy Koufax

Other pitcher starts

Win Probability added by Doug Drinen

Review the managerial data for pre-1900.  Make certain that it is
correct.

Look at MLB's stats and see if we can work with them on adding their
additional data.

DB front ends both on the web and off web.

Add splits data and other breakouts from Retrosheet as additional
modules.

post-season managerial data

post-season game logs.

Uniform Numbers

Match the pitchingPost and pitchingAllStar data with what we have in
the regular pitching tables

Post-season Umpires

Military Service

Commissioners

Use retrosheet data to add LF, RF, CF data for more years in addition
to OF for those years.

Further pitching, batting and fielding data.

Batting: Pitches Seen, Groundballs, Flyballs
Pitching: Save Opportunities, Holds, Inherited Runners, Inherited
Runners Scored, Doubles Allowed, Triples Allowed, RBI Allowed, Stolen
Bases Allowed, Caught Stealing Allowed, GDP Induced, Pickoffs, Pitches
Thrown, Groundballs, Flyballs
Fielding: Games Started, Innings Played, Triple Plays, Zone Chances,
Zone Outs, Stolen Bases Allowed by Catchers, Caught Stealing Allowed
by Catchers, Pickoffs by Catchers


Thank you to all for the considerable work you've contributed to this
project.

#1743 From: "tangotiger" <tmasc@...>
Date: Tue Aug 19, 2003 7:24 pm
Subject: STINTS - process
tangotiger
 
This is probably a question more for Derek, who I think did all the
grunt work.

I find that the stintID is a lot of work for little gain.  As we start to
add more data, say from the minor leagues, it won't be clear which
team a player played first, but stintID is part of the key.

So, you might have
player1,2003,1,Harr
player1,2003,2,Calg

Did he really play in that order, or are we simply fudging it to get
uniqueness?  And of course, you can have the minor league version of
Rob Ducey of players going from team1 to team2 back to team1 (or Jeff
Manto...twice!).  Or, how about stints between MLB and minors, as the
Yankees players of old are famous for?

Is the stintID going to be an obstacle?  Should we do away with it,
and send it back to its own table?  Are we going to have a minor
league table for batting with a different key than that for MLB (one
with stint, and one without)?  Are we going to fudge in stints for
the minors?

Tom

#1744 From: "tjruane" <truane@...>
Date: Tue Aug 19, 2003 10:07 pm
Subject: Re: Retro ID Problems
tjruane
 
Jeff Burk wrote:

> What if a player started a game in center field but moved over to
> right field later in the game?
>
> Under a regime where outfield games are counted together, he would
> have one game played.
>
> Under a regime where left field, center field, and right field games
> are counted separately, would he not have one game in center field
> and one game in right field?
>
> I am not saying the data may not have errors, but I think in this
> case the sum of center field games and right field games would exceed
> the total of outfield games, but this would not be erroneous.

This was in response to the following that I wrote:

> Also, any thoughts on the accuracy of this data.  I did two simply
> checks (ensuring that the total LF-CF-RF sum to at least the OF games
> played, and that no LF-CF-RF total exceeds OF games played) and found
> quite a few errors.

Both Jeff (and Tom, who agreed with Jeff) must not have carefully read
what I wrote.  Read it again.  Regardless of how many outfield
positions a fielder occupies during a game, the TOTAL sum must add to
AT LEAST the OF games played.  So it's fine if Joe Blow has a 50-50-50
LF-CF-RF split and has 100 games in the OF, but it is NOT fine if Joe
Blow has a 10-10-10 split and 100 games in the OF.  The three numbers
must sum to at least 100 to make sense.  That was the example I had
provided two weeks ago, and there are several cases of it in the
fielding OF data.
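
For anyone who wants to rerun the two checks, here is a minimal sketch in Python.  The function and argument names are hypothetical (they are not BDB column names), and it assumes one row of games-by-position counts per player-season.

```python
def check_outfield_games(g_of, g_lf, g_cf, g_rf):
    """Return a list of problems found in one player-season's OF splits."""
    problems = []
    # Check 1: the three positions together must cover every OF game.
    if g_lf + g_cf + g_rf < g_of:
        problems.append("LF+CF+RF is less than OF games")
    # Check 2: no single position can have more games than OF as a whole.
    for name, games in (("LF", g_lf), ("CF", g_cf), ("RF", g_rf)):
        if games > g_of:
            problems.append(f"{name} games exceed OF games")
    return problems


print(check_outfield_games(100, 50, 50, 50))  # the acceptable Joe Blow split
print(check_outfield_games(100, 10, 10, 10))  # the impossible Joe Blow split
```

The 50-50-50 case returns an empty list, while the 10-10-10 case is flagged by the first check.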

Tom Ruane

#1745 From: Paul Wendt <pgw@...>
Date: Tue Aug 19, 2003 9:20 pm
Subject: Re: STINTS - process
pgw02472
 
On Tue, 19 Aug 2003, tangotiger wrote:

> So, you might have
> player1,2003,1,Harr
> player1,2003,2,Calg
>
> Did he really play in that order, or are we simply fudging it to get
> uniqueness?
. . .
> Is the stintID going to be an obstacle?

If the key is {Player, Year, Stint}, then yes, an obstacle.

I suppose that numerical order of stint IDs can feasibly represent
chronological order only within league (or within all major leagues).
It will generally be unreasonable to demand more than that before
incorporating data for a minor league, because the cost in original
research will be too great, more than the ml data is worth.

In Tom's example: If Harr and Calg are in different leagues, then each
should be stint 1 within its league-season.  At least, the design should
permit that.  In effect, league is part of the key.
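
To illustrate, here is a small Python sketch of Tom's two rows with each stint numbered within its league.  The league codes LgA and LgB and the field names are placeholders, not real BDB values.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Tom's example, with each team's stint numbered within its own league.
rows = [
    {"player": "player1", "year": 2003, "league": "LgA", "stint": 1, "team": "Harr"},
    {"player": "player1", "year": 2003, "league": "LgB", "stint": 1, "team": "Calg"},
]

# Without league in the key the two rows collide ...
print(duplicate_keys(rows, ("player", "year", "stint")))
# ... and with league in the key they do not.
print(duplicate_keys(rows, ("player", "year", "league", "stint")))
```

The first call reports the key (player1, 2003, 1) as duplicated; the second reports nothing.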

--Paul

#1746 From: Sean Forman <sean-forman@...>
Date: Tue Aug 19, 2003 11:22 pm
Subject: Re: Re: Retro ID Problems
sforman71
 
> Both Jeff (and Tom, who agreed with Jeff) must not have carefully read
> what I wrote.  Read it again.  Regardless of how many outfield
> positions a fielder occupies during a game, the TOTAL sum must add to
> AT LEAST the OF games played.  So it's fine if Joe Blow has a 50-50-50
> LF-CF-RF split and has 100 games in the OF, but it is NOT fine if Joe
> Blow has a 10-10-10 split and 100 games in the OF.  The three numbers
> must sum to at least 100 to make sense.  That was the example I had
> provided two weeks ago, and there are several cases of it in the
> fielding OF data.
>
> Tom Ruane

These need to be fixed.  If I recall, some of this shows up in TB as well.

Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#1747 From: Derek Adair <dadair@...>
Date: Wed Aug 20, 2003 4:21 am
Subject: Re: STINTS - process
D_Adair
 
On Tue, 19 Aug 2003, tangotiger wrote:

> This is probably a question more for Derek, who I think did all the
> grunt work.
>
> I find that the stintID is a lot of work for little gain.  As we start to
> add more data, say from the minor leagues, it won't be clear which
> team a player played first, but stintID is part of the key.
>
> So, you might have
> player1,2003,1,Harr
> player1,2003,2,Calg
>
> Did he really play in that order, or are we simply fudging it to get
> uniqueness?  And of course, you can have the minor league version of
> Rob Ducey of players going from team1 to team2 back to team1 (or Jeff
> Manto...twice!).  Or, how about stints between MLB and minors, as the
> Yankees players of old are famous for?
>
> Is the stintID going to be an obstacle?  Should we do away with it,
> and send it back to its own table?  Are we going to have a minor
> league table for batting with a different key than that for MLB (one
> with stint, and one without)?  Are we going to fudge in stints for
> the minors?

I'm on vacation and will be back in a week; I imagine I'll be answering a
lot of these threads when I return.

Something to keep in mind is that minor league stints obviously don't
demarcate separate major league stints. If we were to maintain stints
across all of the leagues, I think as both you and Paul W. state, we're
biting off more than we can chew, and for not too much gain.

Ideally, we'd have all information with stint ID's for new leagues we'd
add in. I agree with Paul's assessment that stints should be unique only
within their own league. Perhaps an enterprising soul at some point will
create a player tracker database, including all of the transaction data we
currently have, plus a whole lot more. For now, though, I think we'll stop
a lot of forward progress by requiring cross-league stint info.

If the data isn't there to denote stints for new data, I'd suggest rather
than fudging with 1's and 2's, perhaps use an alpha key. X, Y, Z,
what-have-you, could be used to denote stints we know are separate but
don't know in which order they happened. This would need to be documented,
however.
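
As a rough illustration of how a script might treat such a mixed key (a sketch only, assuming stint IDs are stored as strings and that any non-numeric ID means "order unknown"):

```python
def split_stints(stints):
    """Separate stint IDs with a known order from those without one.

    Numeric IDs ("1", "2", ...) are taken as chronological; alphabetic
    IDs ("X", "Y", ...) mark separate stints whose order is unknown.
    """
    ordered = sorted((s for s in stints if s.isdigit()), key=int)
    unordered = sorted(s for s in stints if not s.isdigit())
    return ordered, unordered


print(split_stints(["2", "X", "1", "Y"]))
```

Here the numeric IDs come back in chronological order and the alphabetic ones are only grouped, with no order implied.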

Regards,
Derek

#1748 From: Sean Forman <sean-forman@...>
Date: Wed Aug 20, 2003 10:54 am
Subject: POLICY - 002 - Statement of Purpose
sforman71
 
POLICY - 002 - Statement of Purpose

The goal here is to create a succinct statement of purpose for this
project.

Questions that need to be answered.

Who are the intended users of this project?

What data should be included?  At what level (career, season, game,
play-by-play, pitch-by-pitch, anything), and from what leagues (majors,
minors, college, foreign, international, etc.)?

How much should we repackage retrosheet data? (which we are free to do
under their license)

Would we consider buying data at some point?

Should we look at forming a not for profit corporation?

Other issues?
--
Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#1749 From: "tangotiger" <tmasc@...>
Date: Wed Aug 20, 2003 6:22 pm
Subject: Re: Retro ID Problems
tangotiger
 
--- In baseball-databank@yahoogroups.com, "tjruane" <truane@v...>
wrote:
> Both Jeff (and Tom, who agreed with Jeff) must not have carefully read
> what I wrote.  Read it again.  Regardless of how many outfield
> positions a fielder occupies during a game, the TOTAL sum must add to
> AT LEAST the OF games played.


Tom, my mistake.

I read your statement:

"I did two simply checks (ensuring that the total LF-CF-RF sum to at
least the OF games played, and ...)"

as
"I did two simple [sic] checks (that at least ensuring that the total
LF-CF-RF sum to the OF games played, and ...)"

Kinda like saying:
"The league batting hits has to match, at least, to the league
pitching hits...if you can't even get that, then we've got a
problem.".  I must have seen those commas in your sentence.

Sorry for the confusion.

Tom

#1750 From: "tangotiger" <tmasc@...>
Date: Wed Aug 20, 2003 6:25 pm
Subject: Re: POLICY - 002 - Statement of Purpose
tangotiger
 
--- In baseball-databank@yahoogroups.com, Sean Forman
<sean-forman@b...> wrote:

Sean,

I don't know about anyone else, but my head is going to explode from
info-overload!

I'd be happy if you do a "rapid-fire" thing on these items.  Start an
email like
POLICY - Statement of Purpose - Item 1 - Intended Users
put in your question, give 24 hours for replies, and start the next
one.

This way, we can clear things out, and we can concentrate on things
one at a time.

Just my 1 cent.

Thanks, Tom


Copyright © 2010 Yahoo! Inc. All rights reserved.