baseball-databank · Baseball Databank

#4048 From: "KJOK" <kjokbaseball@...>
Date: Wed Feb 2, 2011 3:58 am
Subject: Re: BFP issues
kjokbaseball
 
This is from Total Baseball - I believe it also applies to the Lahman database
and therefore likely applies to BDB:


       BATTERS FACING PITCHER Unavailable before 1903 in the National
       League. The 1903 and 1908 data was not published and has been
       reconstructed.

       BFP was unavailable for the American League of 1901-1907. Excepting
       the NL of 1876-1888 and the AA of 1882 and 1884-1887, for which John
       Tattersall calculated BFP from box scores, earlier years in both
       leagues have had their BFP constructed from available data in this
       manner:

       subtract league base hits from league at-bats, divide by league
       innings pitched, multiply by the pitcher's innings, and add his hits
       allowed, walks, hit by pitch, and sacrifice hits, if available.

THANKS,
KJOK
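
The reconstruction described in the Total Baseball passage can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic, not the code anyone actually used: the function name, the rounding, and the league totals in the example are assumptions. The pitcher line (503 IP, 493 hits, 149 walks) is Sadie McMahon's 1891 line quoted below.

```python
def estimate_bfp(lg_ab, lg_h, lg_ip, ip, h, bb, hbp=0, sh=0):
    """Estimate batters facing pitcher (BFP) from league and pitcher totals.

    Follows the quoted recipe: (league AB - league H) / league IP gives the
    batters retired per inning, which is scaled by the pitcher's innings and
    then increased by the batters who reached or sacrificed. HBP and SH
    default to 0 when they are not available.
    """
    retired_per_inning = (lg_ab - lg_h) / lg_ip
    return round(retired_per_inning * ip + h + bb + hbp + sh)


# Hypothetical league totals (40000 AB, 10000 H, 10000 IP), chosen only to
# make the arithmetic easy to follow: 3.0 batters retired per inning.
print(estimate_bfp(40000, 10000, 10000, ip=503, h=493, bb=149))
```

With those made-up league totals the estimate is 3.0 * 503 + 493 + 149 = 2151.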

--- In baseball-databank@yahoogroups.com, David Carter <terpsfan101@...> wrote:
>
> Clem,
>
> I spent a lot of time cleaning up the BFP data earlier this year. I worked from
> multiple sources (Spalding/Reach Guides, newspapers, and Palmer's data) and
> estimated BFP when it wasn't available. The BFP/IP for any player/season is well
> within an acceptable range. Like I said, I spent about a month going over the
> BFP data and deciding what sources to use. Check out my custom Baseball Databank
> database that I made available earlier this year:
>
> http://www.mediafire.com/?newmuztnwww
>
> --- On Thu, 10/7/10, Clem Comly <ccomly@...> wrote:
>
> From: Clem Comly <ccomly@...>
> Subject: [baseball-databank] BFP issues
> To: baseball-databank@yahoogroups.com
> Date: Thursday, October 7, 2010, 8:29 AM
>
> I was using the data from last spring to look at BFP/IP.  Doing this I
> found a few impossible BFP numbers and a few inconsistencies with
> Retrosheet/Palmer data where the latter's data looks better.  I don't have
> the Retro data en masse, just was manually looking up the best and worst
> BFP/IP seasons per BDB on the Retro site.
>
> I had figured out post-1900 leaders but when I checked against Retrosheet
> the BFPs did not match so I didn't show them:
>
>                                           BDB  Retro
> BFP/IP  Pitcher          Year  Team  Lg   BFP    BFP
> 3.64    Dickson, Walt    1914  PIT   FL   935   1035
> 3.65    Willett, Ed      1913  DET   AL   883   1002
> ==============================
> 5.08    Vangilder, Elam  1927  SLA   AL  1031    931
>
> When I extended my search to 19th century, lowest BFP/IP were:
>  BFP/IP best and worst season (minimum 150 IP)
>  2.36 Healy, Egyptian  1891 Balt AA
>  2.79 Madden, Kid      1891 Balt AA
>  3.18 McMahon, Sadie   1891 Balt AA
>
> 1891 is even uglier than I thought.
> 1891 Baltimore AA only had 7 pitchers--2 were below 10 games pitched with
> 20-60 IP: those 2 had BFPs of 0.
> Of the other 5, 4 have BFP/IP below 3.  The last, Sadie McMahon had
> 1601 BFP in 503 IP with 493 hits and 149 walks.
>
> 1891 NL has 2 teams with 0 BFP.  If you subtract those teams' IPs from
> league totals the other 6 have about 7300 IP and 6400 BFP!
>
> Clem Comly
>
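
The kind of screen Clem describes (looking for BFP totals too low for the innings pitched) might be sketched as follows. It is a rough heuristic under stated assumptions, not anyone's actual method: a pitcher needs 3 outs per inning, and batters who reached on a hit or walk were not retired at the plate, so BFP minus hits and walks should come close to 3 * IP. Outs made on the bases (double plays, caught stealing) allow some shortfall, so the 15% tolerance below is an arbitrary, hypothetical cutoff.

```python
def bfp_shortfall(bfp, ip, h, bb, hbp=0):
    """Outs required (3 per inning) minus batters not credited with H/BB/HBP."""
    batters_possibly_retired = bfp - h - bb - hbp
    return 3 * ip - batters_possibly_retired


def is_suspect(bfp, ip, h, bb, hbp=0, tolerance=0.15):
    """Flag a line whose shortfall exceeds `tolerance` of the outs required.

    `tolerance` is an assumed allowance for outs made on the bases.
    """
    return bfp_shortfall(bfp, ip, h, bb, hbp) > tolerance * 3 * ip


# Sadie McMahon, 1891, as listed in BDB: 1601 BFP, 503 IP, 493 H, 149 BB.
print(round(1601 / 503, 2))                 # BFP/IP
print(bfp_shortfall(1601, 503, 493, 149))   # outs unaccounted for
print(is_suspect(1601, 503, 493, 149))
```

For McMahon the listed BFP leaves 550 of the 1509 required outs unaccounted for, which is why a BFP/IP of 3.18 alongside 493 hits and 149 walks cannot be right.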

#4049 From: "Clem Comly" <ccomly@...>
Date: Wed Mar 2, 2011 5:13 pm
Subject: Data inconsistencies
ccomly2003
 
Bill Nicholson (nichobi01) in fielding for PHI in 1949 has 91 games in OF while breakdown (LF-CF-RF) has 0-0-92.
The fielding s/b 92 games per Retrosheet.
 
Joe Lafata (lafatjo01) in fielding for 1947 has 19 games in OF while breakdown has 20-1-0.
The fielding s/b 20 games per Retrosheet.
 
Paul Waner (wanerpa01) in fielding for 1938 has 147 games in OF while breakdown has 0-1-148.
The fielding breakdown s/b 0-1-147 games per Retrosheet.
 
Tex Vache (vachete01) in fielding for 1925 has 53 games in OF while breakdown has 57-0-1.
The fielding breakdown s/b 52-0-1 per Retrosheet.
 
Myron "Moose" Grimshaw (grimsmy01) for 1907 has 18 games in OF while breakdown has 1-0-22.
The fielding breakdown s/b 1-0-17 per Retrosheet.
 
Bill Donovan (donovbi01) for  1904 has 1 game in OF while breakdown has 4-0-0.
The fielding breakdown s/b 0-0-1 per Retrosheet.
 
Clem Comly
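
A check for this kind of mismatch could look like the sketch below. It assumes the breakdown is written LF-CF-RF and relies on one observation: a player can appear at more than one outfield spot in the same game, so the combined OF games total should be at least the largest single-position count and at most the sum of the three. The function names are made up for illustration.

```python
def parse_breakdown(text):
    """Split a breakdown such as '0-0-92' into (LF, CF, RF) game counts."""
    lf, cf, rf = (int(part) for part in text.split("-"))
    return lf, cf, rf


def is_consistent(of_games, breakdown):
    """True when the combined OF total is compatible with the breakdown."""
    lf, cf, rf = parse_breakdown(breakdown)
    return max(lf, cf, rf) <= of_games <= lf + cf + rf


# Values taken from the list above.
print(is_consistent(91, "0-0-92"))   # Nicholson 1949 as stored
print(is_consistent(92, "0-0-92"))   # Nicholson 1949 after the fix
print(is_consistent(1, "4-0-0"))     # Donovan 1904 as stored
print(is_consistent(1, "0-0-1"))     # Donovan 1904 after the fix
```

Passing this check does not prove a record is right; it only catches totals that cannot be reconciled with the breakdown.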

#4050 From: "Clem Comly" <ccomly@...>
Date: Fri Mar 4, 2011 12:12 am
Subject: Data inconsistencies part 2
ccomly2003
 
This group is similar but not the same.  These players have records in Fielding for the individual OF positions as well as the official combined OF record, but an inconsistent FieldingOF record.
Joe Frazier (frazijo01) in 1954 had 11 total games in Fielding with a 2-0-9 breakdown in Fielding but a 2-0-8 breakdown in FieldingOF.
Retrosheet has him with a 2-0-9 breakdown.
Dusty Rhodes (rhodedu01) in 1954 had 37 total games in Fielding with a 34-2-1 breakdown in Fielding but a 33-2-1 breakdown in FieldingOF.
Retrosheet has him with a 34-2-1 breakdown.
George Shuba (shubage01) in 1954 had 11 total games in Fielding with a 7-0-6 breakdown in Fielding but a 7-0-5 breakdown in FieldingOF.
Retrosheet has him with a 7-0-6 breakdown.

#4051 From: "Clem Comly" <ccomly@...>
Date: Sun Mar 6, 2011 1:48 am
Subject: Data inconsistencies part 3
ccomly2003
 
These players have only a combined OF record in Fielding that is inconsistent with matching FieldingOF record.
For mantlmi01 1953, the Fielding total games was 121 versus FieldingOF's breakdown of 0-116-4. Per Retrosheet it should be 121 and 1-114-6.
dusaker01 1952 11 vs. 2-4-3 when per Retrosheet should be 9 and 3-3-3.
lowrepe01 1952 106 vs. 63-35-6 per Retrosheet s/b 69-35-7
maxwech01 1950 2 vs. 0-0-1 per Retrosheet s/b 0-0-2
olmolu01 1950 55 vs. 18-11-24 per Retrosheet s/b 19-13-24
reisepe01 1950 24 vs. 16-6-0 per Retrosheet s/b 16-7-1
walkeha01 1950 46 vs. 9-30-3 per Retrosheet s/b 9-36-3
cavarph01 1949 25 vs. 3-0-21 per Retrosheet s/b 6-0-20
racklma01 1949 stint 1 3 vs 0-0-0 per Retrosheet s/b 3-0-0
racklma01 1949 stint 3 44 vs 4-39-0 per Retrosheet s/b total 43 39-4-0
gordosi01 1948 23 vs. 18-0-2 per Retrosheet s/b 21-0-2
laytole01 1948 20 vs. 12-3-3 per Retrosheet s/b 12-4-4
muelldo01 1948 22 vs. 18-1-1 per Retrosheet s/b 18-1-3
haasbe01 1947 69 vs. 8-58-0 per Retrosheet s/b 9-61-0
litwhda01 1947 66 vs. 64-0-1 per Retrosheet s/b 66-0-0
rikarcu01 1947 79 vs. 5-28-45 per Retrosheet s/b 5-29-45
snidedu01 1947 25 vs. 4-13-7 per Retrosheet s/b 5-13-7
walkeha01 1947 stint 1 10 vs 2-7-0 per Retrosheet s/b total 9
grahaja01 1946 62 vs. 1-0-60 per Retrosheet s/b 1-0-61
hermage01 1946 34 vs. 12-4-17 per Retrosheet s/b 13-5-16
rosengo01 1946 stint 2 85 vs. 0-30-52 per Retrosheet s/b 1-34-58
wasdeji01 1946 stint 1 11 vs. 4-0-6 per Retrosheet s/b total 10
 
Clem Comly
 

#4052 From: Clay Dreslough <cjd@...>
Date: Tue Mar 8, 2011 1:33 am
Subject: Korea Baseball Organization
dreslough
 
Does anyone know if a statistical database exists for the KBO, ideally
in a format similar to the CSV files for Major League Baseball in the
Baseball Databank?

Or perhaps in some format where I could use something like Excel to
massage the data as needed...

Thanks!

Clay

#4053 From: KJOK <kjokbaseball@...>
Date: Tue Mar 8, 2011 1:49 am
Subject: Re: Korea Baseball Organization
kjokbaseball
 
Yes, there is a KBO database, but it does not include 2010 season data.
 
I will post it in the FILES section.
 
THANKS,
Kevin



#4054 From: baseball-databank@yahoogroups.com
Date: Tue Mar 8, 2011 2:05 am
Subject: New file uploaded to baseball-databank
baseball-databank@yahoogroups.com
 
Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the baseball-databank
group.

   File        : /International Data/KBO_1982_2009.zip
   Uploaded by : kjokbaseball <kjokbaseball@...>
   Description : KBO Historical Database

You can access this file at the URL:
http://groups.yahoo.com/group/baseball-databank/files/International%20Data/KBO_1982_2009.zip

To learn more about file sharing for your group, please visit:
http://help.yahoo.com/l/us/yahoo/groups/original/members/web/index.html
Regards,

kjokbaseball <kjokbaseball@...>

#4055 From: baseball-databank@yahoogroups.com
Date: Tue Mar 8, 2011 2:38 am
Subject: New file uploaded to baseball-databank
baseball-databank@yahoogroups.com
 
Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the baseball-databank
group.

   File        : /International Data/2009_1982-kbo.xlsx
   Uploaded by : kjokbaseball <kjokbaseball@...>
   Description : KBO Historical Database Excel Version

You can access this file at the URL:
http://groups.yahoo.com/group/baseball-databank/files/International%20Data/2009_1982-kbo.xlsx

To learn more about file sharing for your group, please visit:
http://help.yahoo.com/l/us/yahoo/groups/original/members/web/index.html
Regards,

kjokbaseball <kjokbaseball@...>

#4056 From: Bryan Walko <bryanwalko@...>
Date: Tue Mar 8, 2011 5:12 am
Subject: Re: Korea Baseball Organization
bryanwalko
 
I haven't been able to maintain the database so far this year.  I may find the time over the next few weeks to update it for 2010, as it has been on my mind.

Bryan



#4057 From: KJOK <kjokbaseball@...>
Date: Tue Mar 8, 2011 3:44 pm
Subject: Re: Korea Baseball Organization
kjokbaseball
 
Bryan:
 
Thank you for what you've already done, which is fantastic.  Look forward to 2010.
 
THANKS,
Kevin



#4058 From: "Clem Comly" <ccomly@...>
Date: Mon Mar 14, 2011 12:04 pm
Subject: Data inconsistencies part 5
ccomly2003
 
These players have only a combined OF record in Fielding that is inconsistent with matching FieldingOF record as did those of part 3.  These are for seasons in span 1930-1945.  All should be data per Retrosheet.
 
adamsbu01 1945 stint 1 14 vs. 13-0-0 s/b total 13
colmafr01 1945 12 vs. 6-0-5 s/b 7-0-5
crabtes01 1943 65 vs. 2-44-17 s/b 2-44-18
koyer01   1940  stint 1 19 vs. 11-5-2 s/b 11-5-3
reisepe01 1940 17 vs. 5-1-9 s/b 0-1-17
chiozlo01 1938 16 vs. 5-10-0 s/b 5-11-0
bordafr01 1937 28 vs. 3-7-15 s/b 3-7-18
cuyleki01 1937 106 vs. 41-47-16 s/b 45-47-15
daviski01 1937 stint 1 37 vs 11-25-0 s/b 11-26-0
padgedo01 1937 109 vs. 0-6-102  s/b 0-6-103
hafeybu01 1936 29 vs. 0-18-10 s/b 28 vs. 0-19-10
johnsro02 1936 33 vs 28-0-3 s/b 28-0-5
kowalfa01 1936 PHI stint 2 4 vs 0-0-0 s/b 3 vs  0-1-2
kowalfa01 1936 BSN stint 3 4 vs 0-1-2 s/b 1 vs 1-0-0
boylebu01 1933 90 vs 45-34-10 s/b 48-35-10
clarkea01 1932 16 vs 6-3-6 s/b 15
fredejo01 1932 88 vs 1-47-39 s/b  1-48-41
watkige01 1932 120 vs 38-19-62 s/b 42-24-56
bressru01 1931 35 vs 23-7-3 s/b 33 vs 23-8-3
cuyleki01 1931 153 vs 0-66-84 s/b  0-69-84
fullich01 1931 68 vs 1-65-0 s/b total 66
tayloda01 1931 67 vs 39-22-4 s/b total 65
walkege02 1931 44 vs 1-41-1 s/b 3-40-1
fishesh01 1930 67 vs 24-0-42 s/b total 66 vs 26-0-40
fothebo01 1930 31 vs 8-0-22 s/b 9-0-22
leeha01   1930 12 vs 10-0-1 s/b  11-1-1
mosolji01 1930 12 vs 1-1-9 s/b total  11 vs 2-0-9
watwojo01 1930 52 vs 1-36-14 s/b 1-38-15
 
Clem Comly

#4059 From: "Clem Comly" <ccomly@...>
Date: Mon Mar 14, 2011 11:06 am
Subject: Data inconsistencies part 4
ccomly2003
 
 

These two players have only a combined OF record in Fielding that is inconsistent with matching FieldingOF record. Retrosheet currently has the same inconsistency as BBDB.


For mitchcl01 1916, the Fielding total games was 3 versus FieldingOF's breakdown of 5-0-0. I looked at his 1916 daily and the correct total OF games is 5, not 3.

For wingoiv01 1915, the Fielding total games was 1 versus FieldingOF's breakdown of 2-0-0. I looked at his 1915 daily and the correct total OF games is 2, not 1.

Clem Comly


#4060 From: "Clem Comly" <ccomly@...>
Date: Mon Mar 14, 2011 1:09 pm
Subject: Data inconsistencies part 6
ccomly2003
 
These players have only a combined OF record in Fielding that is inconsistent with matching FieldingOF record as did those of part 3.  These are for seasons in span 1919-1929.  All "should be" data per Retrosheet.
 
 
kleinch01 1929 149 vs 0-25-123 s/b 0-25-124
applepe01 1928 1 vs 0-0-0 s/b 0-0-1
ottme01 1928 115 vs 5-1-108 s/b 5-1-109
wingoal01 1928 71 vs 35-28-6 s/b 34-28-10
douthta01 1927 125 vs 1-122-1 s/b 1-124-0
willile03 1926 1 vs 0-0-0 s/b 0-0-1
wingaer01 1926 2 vs 0-0-0 s/b 0-0-2
zitzmbi01 1926 31 vs 22-5-2 s/b 24-5-2
carroow01 1925 1 vs 0-0-0 s/b 0-1-0
wingaer01 1925 1 vs 0-0-0 s/b 0-0-1
kerrjo01 1924 2 vs 0-0-0 s/b 1 vs 0-0-1
luquedo01 1924 1 vs 0-0-0 s/b 1-0-0
southbi01 1924 75 vs 1-51-19 s/b 2-55-19
clarksu01 1923 1 vs 0-0-0 s/b 0-0-1
cunnibi02 1922 71 vs 2-68-0 s/b 2-69-0
hoodwa01 1921 20 vs 3-9-7 s/b 4-9-7
jeanete01 1921 5 vs 1-2-1 s/b 4 vs 1-3-1
mokanjo01 1921 15 vs 6-0-7 s/b 6-1-8
seech01 1921 30 vs 0-11-18 s/b total 29
sullijo04 1921 66 vs 63-1-1 s/b 63-2-1
bigbely01 1920 13 vs 11-0-1 s/b 12-0-1
johnsed01 1920 4 vs 0-0-2 s/b 0-0-4
rommeed01 1920 1 vs 0-0-0 s/b 1-0-0
shottbu01 1920 51 vs 41-2-6 s/b 49 vs 42-2-6
bailege01 1919 3 vs 0-1-1 s/b 0-1-3
kingle02 1919 7 vs 2-1-3 s/b 6 vs 2-1-4
paulege01 1919 stint 2 10 vs 0-2-7 s/b 11 vs 0-7-4
powelra01 1919 122 vs 0-1-120 s/b 3-1-118
rehgwa01 1919 5 vs 0-1-3 s/b total 4
Clem Comly

#4061 From: "Clem Comly" <ccomly@...>
Date: Mon Mar 14, 2011 2:41 pm
Subject: Data inconsistencies part 7
ccomly2003
 
These players have only a combined OF record in Fielding but their matching FieldingOF record has no info--not even 0-0-0. All "should be" data per Retrosheet.
willeed01 1910 ,,, s/b 0-0-1
willeed01 1909 ,,, s/b 0-0-1
wiltsho01 1907 ,,, s/b 0-1-0
wiltsho01 1906 ,,, s/b 2-0-0
wickebo01 1905 ,,, s/b 0-3-0
wilheka01 1905 ,,, s/b 1-3-0
wiltsho01 1905 ,,, s/b 1-0-0
wiltsho01 1904 ,,, s/b 0-1-0
wilheka01 1903 ,,, s/b 1-0-0
willipo01 1903 ,,, s/b 2-0-0
wilsohi01 1903 ,,, s/b 1-0-0
wickebo01 1902 ,,, s/b 0-3-0
willipo01 1902 ,,, s/b 0-0-6
wiltssn01 1902 ,,, s/b 1-1-2
 
Clem Comly
 

#4062 From: "Sean" <sean-forman@...>
Date: Mon Mar 28, 2011 9:03 pm
Subject: Re: 2010 BETA release
sforman71
 
Newly discovered player Chubby Snyder.

--- In baseball-databank@yahoogroups.com, "Tangotiger" <tom@...> wrote:
>
> Sean,
>
> In appearances table:
> 1914,"BUF","FL","",1,1,,1,1,0,1,0,0,0,0,0,0,0,0,,,
>
> Record is missing the player ID.
>
> Tom
>
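
A scan for records like the one Tom quotes might look like this sketch. It assumes the player ID is the fourth column (index 3) of the appearances file, which matches the quoted row; the function name is hypothetical.

```python
import csv
import io

# The row quoted above, exactly as reported.
SAMPLE = '1914,"BUF","FL","",1,1,,1,1,0,1,0,0,0,0,0,0,0,0,,,\n'


def rows_missing_player_id(csv_text, id_column=3):
    """Return every row whose player ID field is empty.

    `id_column` is an assumption about the column order of the file.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row[id_column].strip() == ""]


for row in rows_missing_player_id(SAMPLE):
    print(row[:3])   # year, team, league of the orphaned record
```

On a real file, `open(path, newline="")` would be passed to `csv.reader` instead of the in-memory sample.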

#4063 From: Sean Forman <sean-forman@...>
Date: Mon Mar 28, 2011 9:18 pm
Subject: 2010 Final Update and Final all-time update.
sforman71
 
I attempted to incorporate as many of the fixes as possible from user suggestions into the database, so please enjoy.  Thanks to Clem for the OF stint updates, and Tango for various id updates.  I also incorporated the BFP data provided by a user whose name escapes me.

I will no longer be supporting the databank (not that I was doing a killer job earlier).  I'm hopeful that another group will step into the fray (perhaps SABR, hint, hint), but as of right now I have no plans to update the db further.  Thank you for your help and support over the years.

sean
---
Sean Forman
Sports Reference LLC, President
http://www.sports-reference.com/


#4064 From: "Tangotiger" <tom@...>
Date: Wed Mar 30, 2011 3:54 pm
Subject: The Future
tom@...
 
The question is now how to ensure the transition to a new ownership group
so that the public continues to have access to a quality database.

I put up a thread on my blog (start at post #5):
http://www.insidethebook.com/ee/index.php/site/comments/future_baseball_databank_updates/

And SABR came up as a possible alternative.  Please read that thread.  In
responding to FX, I said this:

"If the issue was just to provide an annual update (say the 2011 season
data), then dozens of people here can do that in a matter of minutes, by
using the latest BDB build plus using Retrosheet.

There are two main areas that SABR provides a huge benefit, and that is
the bio data and minor league data.  Another strong area is that there is
a (presumably) dedicated machine [note: as in a well-oiled process] in
place to ensure continued viability.

What is unclear to me from your post, and perhaps you can elaborate, is
what is this “community of interest”, and does this guarantee that the
public will have unfettered access to the data?  And if not, what
limitations are imposed on the public? "

Here are the questions that you have to answer for yourselves:
1. Do I really care about getting all the non-Retrosheet (basically
pre-1950) data corrected in the future? (e.g., some new guy in 1914 was
just found; some guy's birth year was changed from 1888 to 1890, etc.)

2. Do I care at all about getting minor league data?

3. Do I care about having one place that I know will be around for the
next 5 years?

If the answer is "no" to all of these, then it seems that the members here
are happy to just wing it year-by-year, as some fine folks will simply
supply the annual  updates, without caring much about prior seasons.

If the answer is "yes" to any of these, then it becomes a question of a
more dedicated organization (like SABR) to control the data.

What is it that  you guys want?

Tom

---------------------------------------------
The Book--Playing The Percentages In Baseball
http://www.InsideTheBook.com

#4065 From: "anson2995" <slahman@...>
Date: Wed Mar 30, 2011 5:43 pm
Subject: Re: The Future
anson2995
 
Sorry I'm late to the conversation. I have many thoughts on this, but foremost
is to express my desire to keep an open source database of baseball stats
available to all who want it. That goal is at odds with the folks who are
commercial stat providers, and I suspect that's why Palmer (and now Forman)
aren't active proponents.  That's part of the reason why it's been hard to
garner much support for the Databank effort, both because a free database costs
them licensing opportunities and because it creates competitors.  That's
an entirely reasonable view, just not one that helps the idea of an open source
database.

I've always thought SABR was a natural home for such a project since an open
source database fosters new research, but have always met with resistance to
that idea. How could SABR help the databank project? By providing access to and
allowing integration of its datasets... such as the biographical data.  The
databank doesn't need SABR's help with infrastructure support or storage space.

The fact is that the databank project can survive without Forman's support.
99% of the work of maintaining it since the mid-1990s has been done by three or
four people, and there's no reason that model couldn't continue.  Mechanisms
already exist for folks who want to integrate outside datasets -- Retrosheet,
F/X, etc.  I think the core audience for this database is not interested in
those things.

Regards,
Sean Lahman

#4066 From: "F. X. Flinn" <fxflinn@...>
Date: Wed Mar 30, 2011 5:56 pm
Subject: Re: Re: The Future
FXFlinn
 
SABR's thinking has been along the lines of a Creative Commons type license, where non-commercial use is OK but commercial use requires striking a deal. That approach has facilitated monetizing Pete's efforts, the efforts of various SABR committees working on demographics, minor leagues, etc., while at the same time enabling members to use the data. So it's a win-win all around (the sums aren't large, by the way).

I guess my bottom line is that sentiment inside SABR exists for providing an equivalent to BDB, and if we could somehow join forces that would be important for the long term. I'd rather see SABR do something with you than suddenly show up online one day saying "hey, guys, here's BDB+" without any prior attempt to bring you aboard.

BTW I do want to stress that I am only speaking for myself, albeit as a long time board member.

F. X. Flinn
802-369-0069 | fb:f.x.flinn | t:fxflinn | fxflinn@...




#4067 From: "paulriker" <paulriker@...>
Date: Wed Mar 30, 2011 5:32 pm
Subject: Re: The Future
paulriker
 
I had a plan to one day build a transformable online database that we could all
share. I was thinking of something like Wikipedia, but a database. Transformable
because we could add different datasets and relate them to existing datasets
with ease, and expand the data to include pitch-by-pitch data through the
season. My hope was that, if I built the foundation, everyone would contribute
to the administration of the data, just like Wikipedia. There is a lot more
interesting data out there on top of what BDB has provided that I'd love for us
to share, specifically contract and team transactional data.

I think if we build a central database that we can share we can eliminate
duplicate efforts. We can focus on the analysis and tool development instead of
the administration of the data.

If this is something people are interested in please let me know and I can move
forward with it. I'd hate to develop something like this and not have it used.

Paul



#4068 From: Sean Forman <sean-forman@...>
Date: Wed Mar 30, 2011 6:50 pm
Subject: Re: Re: The Future
sforman71
 
On Wed, Mar 30, 2011 at 1:43 PM, anson2995 <slahman@...> wrote:

Sorry I'm late to the conversation. I have many thoughts on this, but foremost is to express my desire to keep an open source database of baseball stats available to all who want it. That goal is at odds with the folks who are commercial stat providers, and I suspect that's why Palmer (and now Foreman) aren't active proponents. That's part of the reason why it's been hard to garner much support for the Databank effort, both because a free database costs them licensing opportunities, but also because it creates competitors. That's an entirely reasonable view, just not one that helps the idea of an open source database.


Sean (Lahman) ,

It appears you misread my initial note regarding my stepping back from support of the BDB, and took it to mean that I was scared about the BDB creating competition for Baseball-Reference.com.  

That isn't the case.  And if it was, I really wish I would have come to that conclusion much earlier and saved the countless hours from the last nine years I spent maintaining the Databank and producing datasets for you to use in the Baseball Archive and others to use in their iphone apps and other places.  The reason I'm stopping support is that I'm tired and believe it's a good time to make a change for the future.

So let me clarify what my position is vis a vis the BDB.  I believe strongly in this project, but I lack the time to manage it single-handedly as I have for the last nine years.

What I hope will happen is that SABR and Palmer/Gillette will make a non-commercial open sourced version of the major league and bio data (which I currently license from them for use on B-R (I don't use core bdb data on b-r. I maintain it separately)) available to the general public.  I will gladly pay my fee to them and gladly point the hobbyist or enthusiast to the full data they can use for free.

I have been trying to put this bug in their ear for many years now.  I'm only a lowly SABR member and have no pull beyond my bully pulpit and relationship as a customer of theirs.

My personal view (and the view of probably anyone who knows anything about this) is that the Palmer DB is the gold standard for the encyclopedia numbers and the SABR bio committee is the gold standard for bio.  I would urge Palmer and SABR to make the db open source for non-commercial use, and provide a way for the community to create new datasets with this as a platform instead of continually having to stamp out LF-CF-RF discrepancies or update death dates.

sean forman
and steward of The Baseball DataBank since July 2002

#4069 From: Jacob Drew <jacobbdrew@...>
Date: Wed Mar 30, 2011 6:11 pm
Subject: Re: Re: The Future
jacobbdrew
 
I'm interested. If you get rolling, I can help with development, possibly even offer a mirror host.

On 3/30/2011 10:32 AM, paulriker wrote:
 

I had a plan to one day build a transformable online database that we could all share. I was thinking wikipedia but a database. Transformable because we could add different datasets and relate them to existing datasets with ease. Expand data to include pitch by pitch data through the season. My hopes were, if I built the foundation, that everyone would contribute to the administration of the data, just like wikipedia. There is a lot of more interesting data out there on top of what BDB has provided that I'd love for us to share, specifically contract and team transactional data.

I think if we build a central database that we can share we can eliminate duplicate efforts. We can focus on the analysis and tool development instead of the administration of the data.

If this is something people are interested in please let me know and I can move forward with it. I'd hate to develop something like this and not have it used.

Paul



#4070 From: Paul Riker <paulriker@...>
Date: Wed Mar 30, 2011 6:37 pm
Subject: RE: Re: The Future
paulriker
 

One thing I want to add to my previous post. I would develop the data structure so it would support multiple sports. Instead of AB being a field, AB would be a record. So if we needed to add a stat or a different sport it would just be a matter of adding records and not redesigning the database. This wouldn’t affect the end user at all because a query or view would exist to present the data to the users in a tabular format.
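
To make the "AB would be a record" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name (stat_value), the view name (batting), the column names, and the sample rows are all made up for illustration; this is one possible layout under those assumptions, not a finished design.

```python
import sqlite3

# In-memory database, just for the sketch.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- One row per (sport, player, season, stat): "AB" is a value in the
-- stat column rather than a column of its own. Names are hypothetical.
CREATE TABLE stat_value (
    sport     TEXT    NOT NULL,
    player_id TEXT    NOT NULL,
    season    INTEGER NOT NULL,
    stat      TEXT    NOT NULL,
    value     REAL    NOT NULL,
    PRIMARY KEY (sport, player_id, season, stat)
);

-- The view pivots the records back into the familiar tabular layout,
-- so end users never see the record-per-stat storage.
CREATE VIEW batting AS
SELECT player_id,
       season,
       MAX(CASE WHEN stat = 'AB' THEN value END) AS AB,
       MAX(CASE WHEN stat = 'H'  THEN value END) AS H
FROM stat_value
WHERE sport = 'baseball'
GROUP BY player_id, season;
""")

# Made-up sample data. Adding a new stat or a new sport is just more rows.
rows = [
    ("baseball", "player01", 2010, "AB", 500),
    ("baseball", "player01", 2010, "H", 150),
    ("hockey", "skater01", 2010, "G", 30),
]
conn.executemany("INSERT INTO stat_value VALUES (?, ?, ?, ?, ?)", rows)

batting = conn.execute(
    "SELECT player_id, season, AB, H FROM batting ORDER BY player_id, season"
).fetchall()

for row in batting:
    print(row)
```

The trade-off is that each new stat that should appear as a column still needs a line in the view, but the underlying table never has to be redesigned.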

 

Paul

 

From: baseball-databank@yahoogroups.com [mailto:baseball-databank@yahoogroups.com] On Behalf Of F. X. Flinn
Sent: Wednesday, March 30, 2011 1:57 PM
To: baseball-databank@yahoogroups.com
Subject: Re: [baseball-databank] Re: The Future

 

 

SABR's thinking has been along the lines of the creative commons type license, where non-commercial use is OK but if put to commercial uses then you have to strike a deal. That approach has facilitated monetizing Pete's efforts, the efforts of various SABR committees working on demographics, minor leagues, etc., while at the same time enabling members to use the data. So it's a win-win all around (the sums aren't large, by the way).

I guess my bottom line is that sentiment inside SABR exists for providing an equivalent to BDB, and if we could somehow join forces that would be important for the long term. I'd rather see SABR do something with you than suddenly show up online one day saying "hey, guys, here's BDB+" without any prior attempt to bring you aboard.

BTW I do want to stress that I am only speaking for myself, albeit as a long time board member.

F. X. Flinn
802-369-0069 | fb:f.x.flinn | t:fxflinn | fxflinn@...

On Wed, Mar 30, 2011 at 1:43 PM, anson2995 <slahman@...> wrote:

 

Sorry I'm late to the conversation. I have many thoughts on this, but foremost is to express my desire to keep an open source database of baseball stats available to all who want it. That goal is at odds with the folks who are commercial stat providers, and I suspect that's why Palmer (and now Forman) aren't active proponents. That's part of the reason why it's been hard to garner much support for the Databank effort, both because a free database costs them licensing opportunities and because it creates competitors. That's an entirely reasonable view, just not one that helps the idea of an open source database.

I've always thought SABR was a natural home for such a project since an open source database fosters new research, but have always met with resistance to that idea. How could SABR help the databank project? By providing access to and allowing integration of its datasets... such as the biographical data. The databank doesn't need SABR's help with infrastructure support or storage space.

The fact is that the databank project can survive without Forman's support. 99% of the work of maintaining it since the mid-1990s has been done by three or four people, and there's no reason that model couldn't continue. Mechanisms already exist for folks who want to integrate outside datasets -- Retrosheet, F/X, etc. I think the core audience for this database is not interested in those things.

Regards,
Sean Lahman




--
F. X. Flinn
FXFlinn@gmail | 802-369-0069


#4071 From: Theodore Turocy <drarbiter@...>
Date: Wed Mar 30, 2011 8:03 pm
Subject: Re: Re: The Future
arb1ter
 
On 30 Mar 2011, at 19:50 , Sean Forman wrote:

> What I hope will happen is that SABR and Palmer/Gillette will make a non-commercial open sourced version of the major league and bio data (which I currently license from them for use on B-R (I don't use core bdb data on b-r. I maintain it separately)) available to the general public. I will gladly pay my fee to them and gladly point the hobbyist or enthusiast to the full data they can use for free.
>
> I have been trying to put this bug in their ear for many years now. I'm only a lowly SABR member and have no pull beyond my bully pulpit and relationship as a customer of theirs.
>
> My personal view (and the view of probably anyone who knows anything about this) is that the Palmer DB is the gold standard for the encyclopedia numbers and the SABR bio committee is the gold standard for bio. I would urge Palmer and SABR to make the db open source for non-commercial use, and provide a way for the community to create new datasets with this as a platform instead of continually having to stamp out LF-CF-RF discrepancies or update death dates.


I'm not sure how widely it's known, but for several years now I have been
serving as SABR's "data czar," with remit to manage all the various datasets SABR
has or licenses.

My personal position on the matter matches Sean F.'s. SABR ought to be releasing
datasets under, e.g., a Creative Commons license, and ought to be providing
resources to maintain datasets for the benefit of the community as a whole.

In addition to sharing Sean's assessment about the quality of the SABR
demographics and the Palmer/Gillette MLB statistics, a further argument for SABR
being involved is logistical. I already maintain the equivalent of basically all
of baseball-databank's data - plus significantly more - on a day-to-day basis,
as part of a regularized workflow, with tools I've developed over several years
of experience.  To output the data in the format of baseball-databank or
similar, would take maybe two hours to write and test the queries as a one-off. 
In other words, the ongoing cost of me managing this data would be essentially
zero.

Where I have been able to, SABR has already started making datasets available
under a CC license, for instance, as part of the Baseball ID Working Group.

The only thing stopping me from volunteering to take on providing the
baseball-databank under the same terms straightaway is that the MLB statistics,
which are no doubt the core of the dataset, aren't currently mine to release.  I
firmly support Open Source and Open Data principles.  It is worth remembering
that one of those underlying principles is respect for copyright and licensing
terms.

I am actively working to try to make what Sean F. is proposing a reality.  Those
of you in the group who are SABR members, I encourage you to write your Board
members and tell them the same. :)  The baseball community really needs to be
spending its time on doing analysis and discovering new information -- not the
grunge work of putting together clean datasets.

Ted

#4072 From: Derek Adair <dadair@...>
Date: Wed Mar 30, 2011 10:21 pm
Subject: Re: Re: The Future
D_Adair
 
Gang,

I have been rather quiet in this community for a long time for personal
reasons, but this issue is something I feel very strongly about. Plus this
thread is full of throw-back names, so I had to chime in :-)

Over the years, I have been involved in a number of efforts, and witnessed
several more where grass roots labor built up some store of data or a
product of value, and then that data or project got rerouted to something
commercial and/or closed to the public. Two obvious examples that come to
mind are CDDB (a store of compact disc information) and ICS (Internet
Chess Server). I don't want that to happen to this data, and I don't think
it has to.

One of the reasons for the success of this databank is its resilience. The
data source has had multiple shepherds over the years, but once it went
public, it hasn't looked back. Strides towards more inclusive formats have
been the norm, and each of us can currently download the entire dataset to
munge to our heart's content. Each new shepherd (usually named Sean) added
a layer of support for the data, and it fleshed out over time to what it
is today.

There's absolutely further work that could be done with the data. Each
spring, grand plans and ideas are raised and some bear fruit while others
die off. That's fine and good and natural. But it all comes back to the
data being available to all of us.

A disclaimer is necessary. I am not a fan of SABR, primarily because of
the way it has handled its data. The SABR I am familiar with (five plus
years ago when I was a member) had closed committees with NDAs, datasets
only viewable record by record on the web, and the exact opposite of the
spirit this group has had.

True, that was a long while ago. The fact that SABR has a "data czar" with
the approach that Ted has goes a long way. They have done some great work
releasing data sets. But still, with the history there, I can't help but
feel like handing over the proverbial keys to the data, including the
ability to determine licensing back to us, is a scary step in the wrong
direction. "We" own this data now. I can download the data set and munge
away. None of us can say for sure that we will be able to do that in two
years if we give that away. If we take that risk, the gain must be
overwhelmingly worth it. I personally don't see it.

I understand this may be a bit of a doom-and-gloom view of where we're
going, but as I mentioned, my viewpoint is one of someone who has seen
their contributions to CDDB turn write-only. The impact here would be
worse, because of the reporting and research use for the data we have
collected over the years.

Regards,
Derek



#4073 From: "F. X. Flinn" <fxflinn@...>
Date: Wed Mar 30, 2011 10:32 pm
Subject: Re: Re: The Future
FXFlinn
 
Derek, I never heard of any committee having any kind of NDA, and I've been on the board since July 2001. The lack of accessibility of the data was a problem we first tried to address by contracting with XMLTeam to build out a system that would make the data truly useful to a larger audience without DBA skills, but that didn't work out. Meanwhile bbref had become the de facto place to go, so we felt less compelled to compete with them or with BDB.

Bottom line is that SABR could start producing BDB-type products tied to a creative commons license in fairly short order, and it's definitely something we have in the hopper once the dust settles on the new move, new staff, and new website that are all rolling out as this discussion takes place. If we just went ahead and did that, would all be forgiven?

FXF



--
F. X. Flinn
FXFlinn@gmail | 802-369-0069


#4074 From: Theodore Turocy <drarbiter@...>
Date: Wed Mar 30, 2011 10:47 pm
Subject: Re: Re: The Future
arb1ter
 
On 30 Mar 2011, at 23:21 , Derek Adair wrote:

>
>
> True, that was a long while ago. The fact that SABR has a "data czar" with
> the approach that Ted has goes a long way. They have done some great work
> releasing data sets. But still, with the history there, I can't help but
> feel like handing over the proverbial keys to the data, including the
> ability to determine licensing back to us, is a scary step in the wrong
> direction.

To move the discussion forward: How would having the data licensed under a
Creative Commons license not address this concern -- remembering that licenses
cannot be retroactively changed?

TLT

#4075 From: Derek Adair <dadair@...>
Date: Wed Mar 30, 2011 11:47 pm
Subject: Re: Re: The Future
D_Adair
 
FX,

Well, it's obviously not my call to forgive SABR. The good thing is you
can go ahead and release those BDB-like products anyway (at least to my
understanding; I-Am-Not-a-Lawyer). There are multiple ways to do what you
said successfully, and there are a number of ways you can botch it. There
are a half-dozen variants of the creative commons license, and there are
varying kinds of "products" you could provide back. There's also the open
question of whether this data would be hosted just for SABR members, or
for everyone. I am honestly hoping for your efforts here to be successful,
but that's in your organization's hands.

For what it's worth (not much), I would recommend you not get too hung up
on meeting the needs of those missing dba skills, and stick with making
the data available as the first pass. Zipped csv/xls file storage is much
less expensive and much quicker to market than an ambitious contract
project.

My advice to the current crop of databank contributors is to be open about
contributing but guarded about risk. If something does go wrong, and the
new maintainer takes it closed, being able to roll back and only do a year
of updates, vs. catch up from this point in time is critical. I'd like to
avoid a fork, but we owe it to current and future consumers of the data to
ensure the set is available to the masses.

Regards,
Derek

P.S. On the note re: NDAs, the committee in question was the Negro
Leagues committee. In order to access their data, you were asked to sign
an NDA because they were working on a book at the time. I can't remember
exact dates, but it could have been earlier than your board service.


#4076 From: Jeff Zimmerman <wydiyd@...>
Date: Thu Mar 31, 2011 1:55 am
Subject: RE: Re: The Future
wydiyd
 
I feel the data should be available with absolutely no restrictions.  I am not a lawyer and others aren't either.  I am just a baseball fan and just want to use the data.  I don't want to have to find out if I can/can't use any/all of the data in anything I do that may/may not make a dollar.  I would just like to know what the Royals' winning percentage was from 1986 to current and be able to post it in an article and not worry if any laws were/weren't broken.  No restrictions, no worries.  

Jeff Zimmerman




To: baseball-databank@yahoogroups.com
From: dadair@...
Date: Wed, 30 Mar 2011 19:47:51 -0400
Subject: Re: [baseball-databank] Re: The Future

 
FX,

Well, it's obviously not my call to forgive SABR. The good thing is you
can go ahead and release those BDB-like products anyway (at least to my
understanding; I-Am-Not-a-Lawyer). There are multiple ways to do what you
said successfully, and there are a number of ways you can botch it. There
are a half-dozen variants of the creative commons license, and there are
varying kinds of "products" you could provide back. There's also the open
question of whether this data would be hosted just for SABR members, or
for everyone. I am honestly hoping for your efforts here to be successful,
but that's in your organization's hands.

For what it's worth (not much), I would recommend you not get too hung up
on meeting the needs of those missing dba skills, and stick with making
the data available as the first pass. Zipped csv/xls file storage is much
less expensive and much quicker to market than an ambitious contract
project.

My advice to the current crop of databank contributors is to be open about
contributing but guarded about risk. If something does go wrong, and the
new maintainer takes it closed, being able to roll back and only do a year
of updates, vs. catch up from this point in time is critical. I'd like to
avoid a fork, but we owe it to current and future consumers of the data to
ensure the set is available to the masses.

Regards,
Derek

P.S. On the note re: NDAs, the committee in question was the Negro
Leagues committee. In order to access their data, you were asked to sign
an NDA because they were working on a book at the time. I can't remember
exact dates, but it could have been earlier than your board service.

On Wed, 30 Mar 2011, F. X. Flinn wrote:

> Derek, I never heard of any committee having any kind of NDA agreement, and
> I've been on the board since July 2001. The lack of accessibility of the
> data was a problem we first tried to address by contracting with XMLTeam to
> build out a system that would make the data truly useful to a larger
> audience without dba skills, but that didn't work out. Meanwhile bbref had
> become the de facto place to go, so we felt less compelled to compete with
> them or with BDB.
>
> Bottom line is that SABR could start producing BDB type products tied to a
> creative commons license in fairly short order, and it's definitely
> something we have in the hopper once the dust settles on the new move, new
> staff, new website that's all rolling out as this discussion takes place. If
> we just went ahead and did that, would all be forgiven?
>
> FXF
>
> On Wed, Mar 30, 2011 at 6:21 PM, Derek Adair <dadair@...> wrote:
>
>>
>>
>> Gang,
>>
>> I have been rather quiet in this community for a long time for personal
>> reasons, but this issue is something I feel very strongly about. Plus this
>> thread is full of throw-back names, so I had to chime in :-)
>>
>> Over the years, I have been involved in a number of efforts, and witnessed
>> several more where grass roots labor built up some store of data or a
>> product of value, and then that data or project got rerouted to something
>> commercial and/or closed to the public. Two obvious examples that come to
>> mind are CDDB (a store of compact disc information) and ICS (Internet
>> Chess Server). I don't want that to happen to this data, and I don't think
>> it has to.
>>
>> One of the reasons for the success of this databank is its resilience. The
>> data source has had multiple shepherds over the years, but once it went
>> public, it hasn't looked back. Strides towards more inclusive formats have
>> been the norm, and each of us can currently download the entire dataset to
>> munge to our heart's content. Each new shepherd (usually named Sean) added
>> a layer of support for the data, and it fleshed out over time to what it
>> is today.
>>
>> There's absolutely further work that could be done with the data. Each
>> spring, grand plans and ideas are raised and some bear fruit while others
>> die off. That's fine and good and natural. But it all comes back to the
>> data being available to all of us.
>>
>> A disclaimer is necessary. I am not a fan of SABR, primarily because of
>> the way it has handled its data. The SABR I am familiar with (five plus
>> years ago when I was a member) had closed committees with NDAs, datasets
>> only viewable record by record on the web, and the exact opposite of the
>> spirit this group has had.
>>
>> True, that was a long while ago. The fact that SABR has a "data czar" with
>> the approach that Ted has goes a long way. They have done some great work
>> releasing data sets. But still, with the history there, I can't help but
>> feel like handing over the proverbial keys to the data, including the
>> ability to determine licensing back to us, is a scary step in the wrong
>> direction. "We" own this data now. I can download the data set and munge
>> away. None of us can say for sure that we will be able to do that in two
>> years if we give that away. If we take that risk, the gain must be
>> overwhelmingly worth it. I personally don't see it.
>>
>> I understand this may be a bit of a doom-and-gloom view of where we're
>> going, but as I mentioned, my viewpoint is one of someone who has seen
>> their contributions to CDDB turn write-only. The impact here would be
>> worse, because of the reporting and research use for the data we have
>> collected over the years.
>>
>> Regards,
>> Derek
>>
>>
>> On Wed, 30 Mar 2011, Theodore Turocy wrote:
>>
>>>
>>> On 30 Mar 2011, at 19:50 , Sean Forman wrote:
>>>
>>>> What I hope will happen is that SABR and Palmer/Gillette will make a
>> non-commercial open sourced version of the major league and bio data (which
>> I currently license from them for use on B-R (I don't use core bdb data on
>> b-r. I maintain it separately)) available to the general public. I will
>> gladly pay my fee to them and gladly point the hobbyist or enthusiast to the
>> full data they can use for free.
>>>>
>>>> I have been trying to put this bug in their ear for many years now. I'm
>> only a lowly SABR member and have no pull beyond my bully pulpit and
>> relationship as a customer of theirs.
>>>>
>>>> My personal view (and the view of probably anyone who knows anything
>> about this) is that the Palmer DB is the gold standard for the encyclopedia
>> numbers and the SABR bio committee is the gold standard for bio. I would
>> urge Palmer and SABR to make the db open source for non-commercial use, and
>> provide a way for the community to create new datasets with this as a
>> platform instead of continually having to stamp out LF-CF-RF discrepancies
>> or update death dates.
>>>
>>>
>>> I'm not sure how widely it's known, but for several years now I have been
>> serving as SABR's "data czar," with remit to manage all the various datasets
>> SABR has or licenses.
>>>
>>> My personal position on the matter matches Sean F.'s. SABR ought to be
>> releasing datasets under, e.g., a Creative Commons license, and ought to be
>> providing resources to maintain datasets for the benefit of the community as
>> a whole.
>>>
>>> In addition to sharing Sean's assessment about the quality of the SABR
>> demographics and the Palmer/Gillette MLB statistics, a further argument for
>> SABR being involved is logistical. I already maintain the equivalent of
>> basically all of baseball-databank's data - plus significantly more - on a
>> day-to-day basis, as part of a regularized workflow, with tools I've
>> developed over several years of experience. To output the data in the format
>> of baseball-databank or similar, would take maybe two hours to write and
>> test the queries as a one-off. In other words, the ongoing cost of me
>> managing this data would be essentially zero.
>>>
>>> Where I have been able to, SABR has already started making datasets
>> available under a CC license, for instance, as part of the Baseball ID
>> Working Group.
>>>
>>> The only thing stopping me from volunteering to take on providing the
>> baseball-databank under the same terms straightaway is that the MLB
>> statistics, which are no doubt the core of the dataset, aren't currently
>> mine to release. I firmly support Open Source and Open Data principles. It
>> is worth remembering that one of those underlying principles is respect for
>> copyright and licensing terms.
>>>
>>> I am actively working to try to make what Sean F. is proposing a reality.
>> Those of you in the group who are SABR members, I encourage you to write
>> your Board members and tell them the same. :) The baseball community really
>> needs to be spending its time on doing analysis and discovering new
>> information -- not the grunge work of putting together clean datasets.
>>>
>>> Ted
>>>
>>>
>>>
>>>
>>> ------------------------------------
>>>
>>> http://www.baseball-databank.org/
>>>
>>>
>>>
>>>
>>
>>
>
>
>
> --
> F. X. Flinn
> FXFlinn@gmail | 802-369-0069
>


#4077 From: Peter Kreutzer <askrotoman@...>
Date: Thu Mar 31, 2011 12:40 pm
Subject: The BdB purpose
pkreutzer
 
I call everyone's attention to the BdB Statement of Purpose: http://www.baseball-databank.org/purpose.txt. I'm sorry to say that's my main contribution here over many years, but I am very appreciative of the hard work of others.

It sounds to me like the issue here is how best to advance and protect that purpose for those of us who have served some function at Baseball Databank over the years.

Are our energies best spent maintaining and correcting datasets and records ourselves, or are they better poured into supporting SABR's efforts to do the same?

It sounds as if SABR wants to release the major league dataset under a Creative Commons license for non-commercial use. That's more restrictive than the BdB license, I believe. Are we okay with that?

More important, to my mind: what happens to the other datasets? Will (or can) SABR release, under a similar license, the Negro and Minor league sets? Their biographical database? Is the all-important BBID project going to be available for everyone's free use?

I don't know the answer to these questions, but it seems to me that SABR is the perfect home for our efforts if our overall goals can be furthered there. But if the restrictions SABR already has and has to live with are going to prevent free public development of and access to ancillary datasets, then we're better off maintaining the core BdB on our own, so that some time in the future developers and volunteers can compile the missing and ancillary data.

I don't mean this as a slight to SABR, and I think we all hope they release all the data they can whether BdB joins up or not, but we should be as pragmatic as possible in protecting the ideal of getting all the data out there, one way or the other.

Cheers,
Peter

