baseball-databank · Baseball Databank

#2121 From: Derek Adair <dadair@...>
Date: Fri Apr 30, 2004 3:59 pm
Subject: Re: New file uploaded to baseball-databank
D_Adair
 
On Fri, 30 Apr 2004, Paul Wendt wrote:

> Beside the issues covered by Derek and Tom yesterday, this exhibit may be
> useful in discussion of U ("User") step 0, as I call them for now:

> U - what should User do before or concurrently with giving data to Admin?
> Multiple issues.  Scope includes: data format, source notes, proofing,
> documentation of all three, (other) use of bb-db Messages and Files

Data format - anything rational.

Source notes - following from Tom's proposals, we'd want to know your
source(s). The admin might ask for clarification here since two people
might refer to the same source by a different name. We should come up with
a standardized naming scheme for our sources.

Proofing - I don't know enough about the current process to provide
ideas/answers on this one. I'm all ears.

Documentation on all three - explanatory text on the first is needed, and
was provided with this example. In some cases there might be justification
needed. For example, if the change was to make SF null before 1890
(totally made up dates here), the justification might be something along
the lines of "SF were not tracked previous to 1890; we currently have
zeroes which isn't accurate." I don't think much documentation is needed
on the source notes unless there were discrepancies and we're picking one
source as more reliable than another. The only documentation I can think
of for proofing is helpful hints - "duceyro01 should be looked at during
proofing - I believe I got the splitting up of his stats over stints
correct, but he should be verified."

bbdb messages - I think we've always been encouraged to post when we're
working on something so we can avoid duplication of effort. This should
probably continue. In addition, bbdb can be used by a User who wants to
submit his work. Send it out to the list (just as you did for this one),
posting a file if necessary. Then an Admin with spare time can post a note
saying, "I'll take care of this one," very much like Sean F. does now.

> 0 - What should/must Admin do after receipt from User, perhaps partly in
> exchange with User, before proceeding?
> >
> > 1 - checkout the Batting.csv file,
> > 2 . . .

Obviously there might be some flow back and forth between the Admin and
the User for clarification, but after that it should go as Tom previously
outlined.

Regards,
Derek

#2122 From: Sean Forman <sean-forman@...>
Date: Sat May 1, 2004 10:29 am
Subject: Re: Re: Means of Updating and Correcting BDB Data
sforman71
 
Derek Adair wrote:
> On Fri, 30 Apr 2004, tmasc@... wrote:
>
>  >
>  > --- Derek Adair <dadair@...> wrote:
>  > > I like this table, and I like the sources table. I
>  > > think it makes sense to
>  > > have an admin table as well. Generally that will be
>  > > the person who
>  > > committed the file, but I'd rather keep the data in
>  >
>  > I was considering it, but then figured that CVS should
>  > have the person who checked in that version of the
>  > file.
>
> This might be something you'd want to display elsewhere, though - release
> notes, for example.
>
>  > > DB format than depend
>  > > on CVS to track it. On the other hand, I'd leave CVS
>  > > version out, since
>  > > CVS will be much better at tracking revision numbers
>  >
>  > I agree.  That field I have would come *from* CVS.
>  > This DATACHANGE table, through the CVSversion field,
>  > will map directly to CVS.  Hence, no need to have the
>  > admin field in DATACHANGE.
>
> This is possible, but a couple of things to keep in mind here. First,
> versions are file-based. So the version number you'd want is the one
> associated with the table modified. Second, version numbers aren't added
> until a file is committed. So you'll need to run an update on that file
> after making the changes but before committing, check the version number,
> add one, make the change to the changes table and commit. Not saying it's
> a bad idea at all, just pointing out there's some overhead.
>
> Regards,
> Derek



This is starting to look a bit onerous.

The current conversation is about the mechanics of how to make the
changes and how to store the changes, which is probably the first thing
that needs to be discussed.

I currently have a cvs version of the BDB on my computer and I make
alterations in mysql and then dump the tables out in text format and
then run cvs commit to log the changes.  It takes a few minutes time for
each commit, but it works easily.  I think it would be doable to write a
script that works the other way and loads the db from the tables checked
out from CVS.  Again, the only issue I see would be speed.  I also don't
know how easy it would be to do this sort of thing in Windows.  In
linux, I can write one script that would update my text files with the
latest version of the db, load those text files into CVS.  Then another
script to reverse the process.  As part of this, we could add a CHANGES
file to the CVS repository where it would be incumbent upon the updater to
explicitly state the changes made to the files.  Naturally, the cvs log
messages would need to be explicit as well.   We'd probably develop a
standard for this.
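A rough sketch of the two scripts described above, one dumping a table to text and one loading it back. This is only an illustration under stated assumptions: sqlite3 stands in for MySQL so the example runs anywhere, the function names dump_table and load_table are made up, and the birthCity column name is hypothetical.

```python
import csv
import io
import sqlite3


def dump_table(conn, table, key, out):
    """Write one table as CSV, header first, rows ordered by `key`."""
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"')
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


def load_table(conn, table, src):
    """Recreate `table` from a CSV dump whose first row is the header."""
    reader = csv.reader(src)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


# Tiny stand-in for the Master table (sqlite3 used here instead of MySQL).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Master (playerID, birthCity)")
db.execute("INSERT INTO Master VALUES ('mccarfr01', 'Middletown')")

text = io.StringIO()
dump_table(db, "Master", "playerID", text)

# Load the dump into a fresh database and dump again; the text should match.
fresh = sqlite3.connect(":memory:")
load_table(fresh, "Master", io.StringIO(text.getvalue()))
again = io.StringIO()
dump_table(fresh, "Master", "playerID", again)
print(text.getvalue() == again.getvalue())
```

Ordering the rows by a key on every dump is a design choice, not something CVS requires; it keeps the text files stable so that a commit shows only the rows that really changed.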

Also MySQL does have good update logging tools, so one possibility would
be to have all changes go through mysql and then dump the tables and
then cvs commit on the update logs and the tables in order to properly
capture the changes.  It is possible to use comments in mysql, but I
don't know if they are properly logged or not within the log files.

I think the main thing that we want to capture is how to track who is
doing what, what changes have been made, and also the ability to rewind
easily when a disastrous mistake is made.  Are there other data projects
that use CVS? MySQL?  a combination?




I like Paul's idea of framing this using real data.

Here is an e-mail I got yesterday.  Ignoring for the moment that we
would probably wait for Bill Carle's approval before making this
correction, what would be the steps for making this change?  Be explicit
in describing the steps.


-------- Original Message --------
Subject: BR-A Suggestion
Date: Fri, 30 Apr 2004 01:02:08 -0400
From: Michael Timmons
To: Baseball-Reference Comment <feedback@...>


This comment regards:
http://www.baseball-ref.com/admin/player_edit.cgi?player_ID=mccarfr01

Comment:
Frank McCarton was my great-grandfather.  According to the 1880 US
Census, he was born in New York City, not Middletown, CT.  Also, in the
1920 & 1930 US Census, his daughter Sarah (my grandmother) stated her
father was born in New York.  I just like to see the record
straight...*smile*
Thanks
Michael Timmons
--------------------------------------------



--
Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#2123 From: Derek Adair <dadair@...>
Date: Sat May 1, 2004 8:02 pm
Subject: Re: Re: Means of Updating and Correcting BDB Data
D_Adair
 
On Sat, 1 May 2004, Sean Forman wrote:

> This is starting to look a bit onerous.

What specifically is looking onerous?

> The current conversation is about the mechanics of how to make the
> changes and how to store the changes, which is probably the first thing
> that needs to be discussed.
>
> I currently have a cvs version of the BDB on my computer and I make
> alterations in mysql and then dump the tables out in text format and
> then run cvs commit to log the changes.  It takes a few minutes time for
> each commit, but it works easily.  I think it would be doable to write a
> script that works the other way and loads the db from the tables checked
> out from CVS.  Again, the only issue I see would be speed.  I also don't
> know how easy it would be to do this sort of thing in Windows.  In
> linux, I can write one script that would update my text files with the
> latest version of the db, load those text files into CVS.  Then another
> script to reverse the process.  As part of this, we could add a CHANGES
> file to the CVS repository where it would be incumbent upon the updater to
> explicitly state the changes made to the files.  Naturally, the cvs log
> messages would need to be explicit as well.   We'd probably develop a
> standard for this.

I strongly agree we should have a standard for cvs log messages.

> Also MySQL does have good update logging tools, so one possibility would
> be to have all changes go through mysql and then dump the tables and
> then cvs commit on the update logs and the tables in order to properly
> capture the changes.  It is possible to use comments in mysql, but I
> don't know if they are properly logged or not within the log files.

My thoughts on this approach are already public, but just to summarize....
This doesn't work well for multiple users. If two people are trying to
make changes at the same time and any two changes hit the same piece of
data, whoever commits first will lose their changes. CVS will make the
second person resolve the conflict. Also, this means every admin will have
to do their changes via MySQL, even if they're more comfortable using
another tool for that change.

> I think the main thing that we want to capture is how to track who is
> doing what, what changes have been made, and also the ability to rewind
> easily when a disastrous mistake is made.  Are there other data projects
> that use CVS? MySQL?  a combination?

I have looked on the web, without much luck. It's not the easiest thing
in the world to search for. If someone else has better luck, I'd be
interested in taking a look.

> I like Paul's idea of framing this using real data.
>
> Here is an e-mail I got yesterday.  Ignoring for the moment that we
> would probably wait for Bill Carle's approval before making this
> correction, what would be the steps for making this change?  Be explicit
> in describing the steps.
>
>
> -------- Original Message --------
> Subject: BR-A Suggestion
> Date: Fri, 30 Apr 2004 01:02:08 -0400
> From: Michael Timmons
> To: Baseball-Reference Comment <feedback@...>
>
>
> This comment regards:
> http://www.baseball-ref.com/admin/player_edit.cgi?player_ID=mccarfr01
>
> Comment:
> Frank McCarton was my great-grandfather.  According to the 1880 US
> Census, he was born in New York City, not Middletown, CT.  Also, in the
> 1920 & 1930 US Census, his daughter Sarah (my grandmother) stated her
> father was born in New York.  I just like to see the record
> straight...*smile*
> Thanks
> Michael Timmons
> --------------------------------------------

Here are the steps I would take under the proposed plan:

1. Go to my working directory with my checked out copy of the database (or
run "cvs checkout" to create a local copy).

2. Run "cvs update" to get the freshest copy from the repository.

3. Make the change to the text file. This could either be done by text
editor at this point, or by loading into MySQL, running the update command
on that data point, and extracting the data back out.

4. For the tracking we suggested, we want a datasource and a datasupplier.
I'm just taking stabs at these, but here's what my first attempt at an
entry in DATASOURCES would be:

001,"BR Comment",2004

5. I'd add a row to DATASUPPLIER:
001,"Michael Timmons"

6. If we wanted to put CVSversion in the DATACHANGE table, I'd want to run
"cvs status -v Master.txt" to see the current revision.

7. I'd add a row to DATACHANGE to link:
changeid,sourceid,supplierid,table,CVSversion,comment
195,001,001,Master,2.08,"changed birth place of mccarfr01 to NYC from Middletown, CT"

8. I'd then run "cvs commit" and add a descriptive log message:

Changed birth place of mccarfr01 to NYC from Middletown, CT per BR comment
from great-grandson Michael Timmons.

---

There are probably two separate things to decide on here. 1-3 and 8 is how
we would use CVS. 4-7 is how we would track changes. We have choices on
either of those projects. I do think 4-7 might need to change at least a
little, since I realized when I was doing this sample that it doesn't
support multiple data sources.
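One possible way to let a single change cite several sources is a link table between DATACHANGE and DATASOURCES. The sketch below is only an assumption about how that could look: the DATACHANGE_SOURCE table, the renamed tablename column, and source 002 are all hypothetical, and sqlite3 is used so the example is self-contained.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE DATASOURCES  (sourceid TEXT PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE DATASUPPLIER (supplierid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE DATACHANGE (
    changeid   INTEGER PRIMARY KEY,
    supplierid TEXT REFERENCES DATASUPPLIER(supplierid),
    tablename  TEXT,   -- "table" is an SQL keyword, so it is renamed here
    CVSversion TEXT,
    comment    TEXT
);
-- Hypothetical link table: one row per (change, source) pair.
CREATE TABLE DATACHANGE_SOURCE (
    changeid INTEGER REFERENCES DATACHANGE(changeid),
    sourceid TEXT REFERENCES DATASOURCES(sourceid),
    PRIMARY KEY (changeid, sourceid)
);
""")

# Source 002 is illustrative only; it is not part of the sample rows above.
db.executemany("INSERT INTO DATASOURCES VALUES (?, ?, ?)",
               [("001", "BR Comment", 2004), ("002", "1880 US Census", 1880)])
db.execute("INSERT INTO DATASUPPLIER VALUES ('001', 'Michael Timmons')")
db.execute("INSERT INTO DATACHANGE VALUES (195, '001', 'Master', '2.08', ?)",
           ("changed birth place of mccarfr01 to NYC from Middletown, CT",))
db.executemany("INSERT INTO DATACHANGE_SOURCE VALUES (?, ?)",
               [(195, "001"), (195, "002")])

# List every source cited by change 195.
rows = db.execute(
    """SELECT s.name
       FROM DATACHANGE_SOURCE AS cs
       JOIN DATASOURCES AS s ON s.sourceid = cs.sourceid
       WHERE cs.changeid = ?
       ORDER BY s.sourceid""",
    (195,),
).fetchall()
sources = [name for (name,) in rows]
print(sources)
```

The cost is one more file to keep in CVS; the benefit is that sourceid no longer has to be a single value on each DATACHANGE row.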

Regards,
Derek

#2124 From: "tjruane" <truane@...>
Date: Sat May 1, 2004 9:27 pm
Subject: Re: Data Additions
tjruane
 
A few days ago, I wrote:

> So if you take the RF,CF,LF errors, PO and A data and add
> it together, it will bear little relation to the official error, PO
> and A data for OF.

And Tom replied:

> I can see this is true for "G", but not the case for GS, Inn,
> PO, A, E.  Those fields would be summable from RF/CF/LF into OF.
> Am I misreading you here?

I think so.  I am assuming you are going to continue using the
official fielding data for outfielders.  I would certainly
recommend that since in many instances there is no way of telling
which account of a play is correct (and the benefit of the doubt
has to go to the official account in these circumstances).  As
a result, Retrosheet's RF/CF/LF data will certainly sum to what
we would present as OF statistics, but this sum will bear little
relationship to the official OF data.
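To make the comparison concrete, here is a small sketch that sums LF/CF/RF lines and reports where the sum departs from an official OF line. The function name and all of the numbers are made up for illustration; nothing here comes from actual Retrosheet or official data.

```python
def outfield_discrepancies(by_position, official_of, stats=("PO", "A", "E")):
    """Return {stat: summed - official} for every stat where the two differ."""
    diffs = {}
    for stat in stats:
        summed = sum(by_position[pos][stat] for pos in ("LF", "CF", "RF"))
        if summed != official_of[stat]:
            diffs[stat] = summed - official_of[stat]
    return diffs


# Made-up numbers for one player-season, purely for illustration.
by_position = {
    "LF": {"PO": 40, "A": 2, "E": 1},
    "CF": {"PO": 210, "A": 5, "E": 3},
    "RF": {"PO": 25, "A": 1, "E": 0},
}
official_of = {"PO": 279, "A": 8, "E": 4}

diffs = outfield_discrepancies(by_position, official_of)
print(diffs)
```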

Tom Ruane

#2125 From: "tmasc@..." <tmasc@...>
Date: Sat May 1, 2004 11:07 pm
Subject: Re: Re: Data Additions
tangotiger
 
--- tjruane <truane@...> wrote:
> circumstances).  As
> a result, Retrosheet's RF/CF/LF data will certainly
> sum to what
> we would present as OF statistics, but this sum will
> bear little
> relationship to the official OF data.
>
> Tom Ruane


Hmmm... this would imply then that any nonofficial
breakdown that we present, say splitting data between
starts and reliefs, or vs LH/RH (can't think of a good
example), etc would fall under a similar category.

Therefore, it would be necessary that we continue to
carry redundant data, because the official data is our
true checkpoint.  Interesting...

Tom







#2126 From: "tjruane" <truane@...>
Date: Sun May 2, 2004 12:35 am
Subject: Re: Data Additions
tjruane
 
I wrote:

> As a result, Retrosheet's RF/CF/LF data will certainly sum to
> what we would present as OF statistics, but this sum will
> bear little relationship to the official OF data.

And Tom replied:

> Hmmm... this would imply then that any nonofficial
> breakdown that we present, say splitting data between
> starts and reliefs, or vs LH/RH (can't think of a good
> example), etc would fall under a similar category.

Not at all.  There's a light year of difference between fielding
statistics and the batting/pitching stats.  There is close to 100%
agreement between the data people typically care about and the
official data.  Simply put, the official scorers are careful
when dealing with homers and hits and runs but pay much less
attention to such things as what fielder actually caught the ball.

We typically have a handful of discrepancies between the
Retrosheet data and official data for batting and pitching each
season.  Almost all of these deal with things like batter
strikeouts, intentional walks and so on.  Even in these cases,
I'm confident that the Retrosheet data is far more accurate than the
official data.  When you derive your statistics from the event
files, for example, you can't put a strikeout in the caught
stealing column, or give a GIDP to the wrong batter.  Every year
I proof I run into a few cases where a batter has an impossible
stat line (1-2 with a strikeout and a GIDP, for example).
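The "impossible stat line" test can be written down as a one-line check. This is a minimal sketch, assuming that every strikeout and every GIDP is a separate at bat that did not end in a hit; the function name is hypothetical and real proofing would cover many more cases.

```python
def impossible_line(ab, h, so, gidp):
    """True if hits exceed at bats, or SO + GIDP exceed the hitless at bats."""
    return h > ab or so + gidp > ab - h


# 1-for-2 with a strikeout and a GIDP: only one hitless at bat is available.
print(impossible_line(ab=2, h=1, so=1, gidp=1))
# 1-for-4 with the same strikeout and GIDP is fine.
print(impossible_line(ab=4, h=1, so=1, gidp=1))
```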

Such is not the case for defensive statistics.  Instead of dozens
of discrepancies as we have with batting and pitching stats, we
typically have a hundred or more fielding discrepancies--and
that's only dealing with defensive games played, errors and
passed balls.  What's more, while I'm confident that the
overwhelming majority of the games played discrepancies are
official errors (it's not uncommon for there to be two official
second basemen in a game, for example, and no shortstop), I think
that many of the discrepancies in errors and passed balls are
problems with our data.  It is not that uncommon for a person
scoring a game to forget to mark down a passed ball or a dropped
foul ball, especially if the miscue did not result in a run.

As for putouts and assists, I suspect that there are both
official errors and event file errors in just about every game,
especially when you get back into the 1970s and 1960s.  When I
proofed the 1963 AL (the only time I even attempted to reconcile
this data), it was extremely rare to find two scoresheets in
complete agreement about what happened.  When there was agreement,
it often seemed as if the official scorer was watching a different
game entirely.  One typical mistake is for scorers (both ours and
the official ones) to confuse the numbers for the right and left
fielder in a game.

> Therefore, it would be necessary that we continue to
> carry redundant data, because the official data is our
> true checkpoint.  Interesting...

I wasn't under the impression that you were REPLACING any data
with the stuff that I provide.  Rather, I thought I was adding
data that you didn't previously have.  And just keep in mind
that while official data may be your true checkpoint, that data
is far from correct.  About the best you could say for it is
that it is official.

Sorry for the length of this note.
Tom Ruane

#2127 From: "Mike Emeigh" <piratefan1@...>
Date: Sun May 2, 2004 1:43 am
Subject: Re: Re: Data Additions
mwemeigh
 
Tom Ruane wrote:
(snip)
>
> And just keep in mind
> that while official data may be your true checkpoint, that data
> is far from correct.

and that is *especially* true with respect to fielding data. Fielding data
is especially error-prone, because no one bothered to proof it at the end of
the season. Even into the '40s and '50s you find examples of team putout and
assist totals that do not match the official totals (although they usually
manage to get the error totals correct), and there is absolutely no
assurance that defensive replacements who didn't bat were properly recorded
as having played.

With the availability of online archives like ProQuest and Paper of Record,
we have a chance of addressing some of these problems, at least for the
period when putouts and assists were being recorded in the box scores
(keeping in mind that newspapers boxscores weren't always official, either).

Mike Emeigh
piratefan1@...

#2128 From: Paul Wendt <pgw@...>
Date: Mon May 3, 2004 1:59 pm
Subject: proofing, proof notes, sources, source notes --not limited to Documentation in the database
pgw02472
 
"Re: Means of Updating and Correcting" focuses on the revision control
system to be implemented by Admins and Sean F, using CVS, SQL, etc.
It seems to me that that is enough scope for one thread.
--

Fri, 30 Apr 2004 11:59:47 -0400 (EDT) Derek Adair <dadair@...>
replied "Re: [baseball-databank] New file uploaded to baseball-databank"
(my poor choice of subjectline).

Here I have deleted almost all substance of Derek's reply, although
I am responding to him by providing illustration.

> 30 Apr 2004, Paul Wendt wrote:
>
> > Beside the issues covered by Derek and Tom yesterday, this exhibit may be
> > useful in discussion of [steps U ("User") and 0 (before Admin step 1)]
> > as I call them for now:

In Tom's example, Admin received new data or revisions directly from User.

Inevitably, Admin has some responsibility for that documentation which is
included in the database.  (I'll be happy to call that Documentation and
refer to more general documentation in other terms.  See next sentence.)

Does Admin have responsibility for proofing, source notes not in the DB,
or other use of bb-db messages and files?  If so, should Admin merely ask
and remind, or undertake more substantial or forceful quality control?

Examples:
asks "Have you proofed this data?",
asks "How have you proofed this data?" and assesses adequacy of reply;
reminds "Be sure to send a notice to bb-db",
etc

Derek:
> Proofing - I don't know enough about the current process to provide
> ideas/answers on this one. I'm all ears.

1.
Examples:
Suppose transcription from a printed source of player-level data.

  - If source includes team- or league-level data, each level may be
transcribed and all sums checked.  What next?  It depends ;-)

  - A second person may check the work by reference to a copy of the
printed source.

Suppose new data that should fit some data already in the db.
(Eg, games and complete games are in the db and games finished is new.
There should be a fit at level of team sums.)
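A sketch of what such a team-level fit might look like for games finished. The identity used here is an assumption on my part: in each team game either the starter completes the game or exactly one reliever finishes it, so team CG plus team GF should equal team games. The column names (yearID, teamID, CG, GF), the function name, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def gf_misfits(pitching_rows, team_games):
    """Return the (year, team) keys where summed CG + GF != team games."""
    totals = defaultdict(lambda: {"CG": 0, "GF": 0})
    for row in pitching_rows:
        key = (row["yearID"], row["teamID"])
        totals[key]["CG"] += row["CG"]
        totals[key]["GF"] += row["GF"]
    return sorted(
        key for key, t in totals.items()
        if t["CG"] + t["GF"] != team_games[key]
    )


# Made-up pitcher lines for two three-game "seasons".
pitching_rows = [
    {"yearID": 1900, "teamID": "AAA", "CG": 1, "GF": 0},
    {"yearID": 1900, "teamID": "AAA", "CG": 0, "GF": 2},
    {"yearID": 1900, "teamID": "BBB", "CG": 1, "GF": 0},
    {"yearID": 1900, "teamID": "BBB", "CG": 0, "GF": 1},
]
team_games = {(1900, "AAA"): 3, (1900, "BBB"): 3}

misfits = gf_misfits(pitching_rows, team_games)
print(misfits)
```

A misfit only says that something at the team level needs a second look; it does not say which pitcher's line is wrong.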

2.
In any case:
What should User report to bb-db? what report to Admin? what include in
documentation?
What should Admin *foster* by asking/reminding? what *enforce* and how?


--Paul

P/\/ \/\/t
Paul Wendt, Watertown MA, USA <pgw@...>

#2129 From: "tmasc@..." <tmasc@...>
Date: Mon May 3, 2004 3:14 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
tangotiger
 
--- Paul Wendt <pgw@...> wrote:
> Does Admin have responsibility for proofing, source
> notes not in the DB,
> or other use of bb-db messages and files?  If so,
> should Admin merely ask
> and remind, or undertake more substantial or
> forceful quality control?
>


We should separate people from roles.  Take me, for
example.  I would only be admin.  I have no interest
to be anything else.  Sean Forman may enjoy the role
of proofer and admin.  And, maybe Paul would only want
to be a proofer.

So, what you have to do is create the role of proofer,
and all of the user requests would have to go to that
role.  Now, it may be that the decision will be that
all admins must be proofers too.  But, that really is
irrelevant, because that person still needs to do a
proofer role.

You also have to establish what requirements a proofer
has to satisfy.  Does he just
1 - do his proofing, and
2 - make his write-up saying that "I think we should
do this"
3 - some other proofer second his change
4 - then send this request to the db admin (which
could be himself)
?

Thinking of things in terms of roles, rather than
people, will make things clearer.

So, what we should try to establish are all the roles
required.

Tom












#2130 From: Paul Wendt <pgw@...>
Date: Mon May 3, 2004 9:55 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
pgw02472
 
I agree that roles should be distinguished from people, and I intended
that by capitalizing Admin and User (Tom's term for the provider of new or
revised data to Admin, last week).

Tom implies that there is a role between User and Admin --Proofer, or
Middle for generality?  He is right, of course.  Someone may serve as both
User and Middle, someone else as both Admin and Middle, someone as all
three (eg, Derek Adair).  But the roles are distinct; indeed, some Users
will be merely sources of leads.

3 May 2004, tmasc@... wrote:

> --- Paul Wendt <pgw@...> wrote:
> > Does Admin have responsibility for proofing, source
> > notes not in the DB, or other use of bb-db messages and files?

"responsibility for" should not imply full responsibility to do it all.

Eg, Admin's responsy for proofing might be to hold commitment of a data
change until a description of proofing is received.  Admin's responsy for
Documentation might be to hold commitment of a data change until Doc
provided by someone else is good enough (Admin as Doc editor).  Admin's
responsy for use of bb-db might be to send the notice that a change has
been committed, or merely to ask Middle to do so.

Illustrative Documentation of sources recently provided by Derek implies
that Admin must coach provider on format, at least.  Eg, Admin might
undertake coding and wording.

PP [abbrev. Pete Palmer]
"Pete Palmer's database <date>" or
"email communication with Pete Palmer <date>"

Retro [abbrev. Retrosheet; very common, maybe should be subdivided]


--Paul
P/\/ \/\/t
Paul Wendt, Watertown MA, USA <pgw@...>

#2131 From: Derek Adair <dadair@...>
Date: Mon May 3, 2004 10:16 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
D_Adair
 
On Mon, 3 May 2004, Paul Wendt wrote:

> I agree that roles should be distinguished from people, and I intended
> that by capitalizing Admin and User (Tom's term for the provider of new or
> revised data to Admin, last week).
>
> Tom implies that there is a role between User and Admin --Proofer, or
> Middle for generality?  He is right, of course.  Someone may serve as both
> User and Middle, someone else as both Admin and Middle, someone as all
> three (eg, Derek Adair).  But the roles are distinct; indeed, some Users
> will be merely sources of leads.

For the most part, I'm simply listening in the discussion of the functions
of the Middle role, since I'm not sure what works best and I could go a
number of ways myself with the process. However, I did want to point out
that while, in theory, any Admin can work with any User, this isn't true
for Middles and Users. In most cases, proofing will require a copy of the
source materials. Also, in a number of cases, we'd want to consider a
secondary source for checking. I could see where it'd be very beneficial
to have a list of Middles and sources to which they have access.

Regards,
Derek

#2132 From: Paul Wendt <pgw@...>
Date: Tue May 4, 2004 1:05 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
pgw02472
 
Upon reading my prose and checking my archive one day later:

3 May 2004, Paul Wendt wrote, in part:
> Illustrative Documentation of sources recently provided by Derek implies
> that Admin must coach provider on format, at least.  Eg, Admin might
> undertake coding and wording.
>
> PP [abbrev. Pete Palmer]
> "Pete Palmer's database <date>" or
> "email communication with Pete Palmer <date>"

Make that:
Documentation of sources as recently illustrated by *Tom* (quoted below my
signature) uses two new tables and several new fields in the database,
which implies that Admin must coach Middle on Doc format, at least.
Perhaps Admin must typically write the Documentation himself, presumably
derived from Middle's description to bb-db and from followup private
communication with Middle.

--Paul
P/\/ \/\/t
Paul Wendt, Watertown MA, USA <pgw@...>


On Fri, 30 Apr 2004, tangotiger wrote:

> This will mean that we need some sort of control process regarding if
> the data being given to us is actually accurate.  I touched on it
> with a "DATASOURCES" table.  Say,
> 001,Total Baseball 1,1994
> 002,Spalding Guide, 1918
>
> Now that I think about it, I'd want a "DATASUPPLIER" table too.  Say,
> 001,Michael Mavrogiannis,mmavro@...
> 002,Paul Wendt,pgw@...
>
> Finally, a table that joins all this:
>
> DATACHANGE
> changeid,sourceid,supplierid,table,CVSversion,comment
> 194,002,002,pitching,2.08,added balks to 1871

#2133 From: Sean Forman <sean-forman@...>
Date: Thu May 6, 2004 1:45 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
sforman71
 
> For the most part, I'm simply listening in the discussion of the functions
> of the Middle role, since I'm not sure what works best and I could go a
> number of ways myself with the process. However, I did want to point out
> that while, in theory, any Admin can work with any User, this isn't true
> for Middles and Users. In most cases, proofing will require a copy of the
> source materials. Also, in a number of cases, we'd want to consider a
> secondary source for checking. I could see where it'd be very beneficial
> to have a list of Middles and sources to which they have access.
>
> Regards,
> Derek


I think it is going to be VERY hard to figure out how to structure this
process before actually having to use it.  I know that is circular, but
I'm more of the mind that we should start looking at data and then hash
out what process will work best.

1) set up the revision system with a current or recent DB

2) start considering data and as we do this figure out what steps need
to be taken.

3) codify those steps

I'm also warming up to the idea of a data source, supplier, change
table, but I'm not sure how to implement it.

Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#2134 From: Sean Forman <sean-forman@...>
Date: Thu May 6, 2004 1:45 pm
Subject: Re: Re: Means of Updating and Correcting BDB Data
sforman71
 
> 1. Go to my working directory with my checked out copy of the database (or
> run "cvs checkout" to create a local copy).
>
> 2. Run "cvs update" to get the freshest copy from the repository.
>
> 3. Make the change to the text file. This could either be done by text
> editor at this point, or by loading into MySQL, running the update command
> on that data point, and extracting the data back out.
>
> 4. For the tracking we suggested, we want a datasource and a datasupplier.
> I'm just taking stabs at these, but here's what my first attempt at an
> entry in DATASOURCES would be:
>
> 001,"BR Comment",2004
>
> 5. I'd add a row to DATASUPPLIER:
> 001,"Michael Timmons"
>
> 6. If we wanted to put CVSversion in the DATACHANGE table, I'd want to run
> "cvs status -v Master.txt" to see the current revision.
>
> 7. I'd add a row to DATACHANGE to link:
> changeid,sourceid,supplierid,table,CVSversion,comment
> 195,001,001,Master,2.08,"changed birth place of mccarfr01 to NYC from Middletown, CT"
>
> 8. I'd then run "cvs commit" and add a descriptive log message:
>
> Changed birth place of mccarfr01 to NYC from Middletown, CT per BR comment
> from great-grandson Michael Timmons.


Steps 4, 5, 6, 7 are where it is starting to look onerous, though I may
be overreacting.  I'm just trying to imagine running through Bill
Carle's bimonthly newsletters.  There are probably 100 corrections per
newsletter.  I guess we would make all of the corrections and then make
the following data entries into the DB.

002,"SABR Biographical Committee Newsletter, March/April 2004",2004

002,"SABR Biographical Committee, Bill Carle, Chair"

196,002,002,Master,2.09,"made dozens of corrections to many player bio
data, place and date of birth, place and date of death, debut date and
removed two players who were found to be duplicates.  foobarr01 was
folded into foobarr02 and zippo01 was folded into ziper01."
197,002,002,Pitching,2.09,"made dozens of corrections to many player bio
data, place and date of birth, place and date of death, debut date and
removed two players who were found to be duplicates.  foobarr01 was
folded into foobarr02 and zippo01 was folded into ziper01."
198,002,002,Fielding,2.09,"made dozens of corrections to many player bio
data, place and date of birth, place and date of death, debut date and
removed two players who were found to be duplicates.  foobarr01 was
folded into foobarr02 and zippo01 was folded into ziper01."


I think that we might also put a blob field at the end, so that you can
attach the entire e-mail to the datachange file.  However, you run into
issues entering that into a text file with line returns, etc.  You would
have to enter that into a db and then dump the db for that to work.
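
A minimal Python sketch of the line-return concern, assuming the
DATACHANGE file is ordinary quoted CSV (an assumption, the format has not
been settled in this thread).  The standard csv module quotes a field that
contains line returns when writing and reassembles it when reading, so a
multi-line comment can survive in a text file as long as every tool reads
it with a CSV parser rather than line by line.  The row values are the
made-up ones from the example above.

```python
import csv
import io

# Hypothetical DATACHANGE row; the comment field contains a line return.
row = ["196", "002", "002", "Master", "2.09",
       "made dozens of corrections\nto player bio data"]


def dump_rows(rows):
    """Write rows as CSV text, quoting any field that needs it."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def load_rows(text):
    """Parse CSV text back into rows; quoted line returns stay in the field."""
    return list(csv.reader(io.StringIO(text)))


text = dump_rows([row])
restored = load_rows(text)
```

The catch is that one record now spans two physical lines, so any
line-oriented step (sorting, line-based diffs) has to treat the file with
more care.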

What if we set up a web form for tracking the changes and updates that
are made?


Another concern I have in the loading and dumping from and to the DB and
the text files is ordering of the lines.  It would be a real pain to
have an admin load and then dump the files in some other order and then
everyone have to update all of their files, but that is probably a minor
issue.  I know how I would handle it in linux (I'd dump and then sort),
but I'm not sure how to handle it in windows or on a mac.
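
One way to do the "dump and then sort" step the same way on linux,
windows and a mac is a small script rather than a platform sort command.
This is only a sketch under two assumptions that are not from the thread:
each record sits on exactly one line, and the file has no header row.
The function names are hypothetical.

```python
def sort_dump(text):
    """Return the dump with its lines in a fixed order and LF line endings."""
    lines = text.splitlines()
    # Plain string comparison: the same input gives the same order on any OS.
    lines.sort()
    return "".join(line + "\n" for line in lines)


def sort_dump_file(path):
    """Rewrite a dump file in place, sorted (path is supplied by the caller)."""
    with open(path, newline="", encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", newline="\n", encoding="utf-8") as f:
        f.write(sort_dump(text))
```

Because the sort compares whole lines as text, it matches key order only
when the key columns come first in each line.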

I've looked for other projects trying to do this as well, but haven't
found any.  I sort of chose the name based on the ProteinDatabank, but
that isn't really the same thing.

The other advantage of doing it directly in mysql rather than to a text
file is that the formatting is done for free.  We won't have to be as
careful for column counts etc., but I guess CVS will allow us to undo
any issues that might arise.

Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#2135 From: "tmasc@..." <tmasc@...>
Date: Thu May 6, 2004 2:22 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
tangotiger
 
--- Sean Forman <sean-forman@...>
wrote:
> I'm more of the mind that we should start looking at
> data and then hash
> out what process will work best.

It might be beneficial if you can list out how you get
your corrections.  For example, I know nothing about
the newsletter that you refer to.  And, I see from
another note that people write in changes based on
what they perceive as wrong at baseball-reference.com.

Can you list say the last several corrections you made
to the DB, and how those changes came about?

***

As well, for the comments section, even if you don't
go into too much detail, the admin can always run a
CVS delta to see what was changed.  I'd treat the
comments section as a "headline": just enough info for
the admin to know that this is the revision that he
should run the CVS-delta on.

Tom





#2136 From: Sean Forman <sean-forman@...>
Date: Thu May 6, 2004 2:41 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
sforman71
 
tmasc@... wrote:
>
> --- Sean Forman <sean-forman@...>
> wrote:
>  > I'm more of the mind that we should start looking at
>  > data and then hash
>  > out what process will work best.
>
> It might be beneficial if you can list out how you get
> your corrections.  For example, I know nothing about
> the newsletter that you refer to.  And, I see from
> another note that people write in changes based on
> what they perceive as wrong at baseball-reference.com.
>
> Can you list say the last several corrections you made
> to the DB, and how those changes came about?

Tom, why haven't you joined SABR?  I just don't get it.  You would be a
natural.

There have been about 500-600 biographical corrections since the last db
update.  The SABR Biographical committee has people poring over census
records and newspaper archives to find the place of birth, name, place
of death, etc. for every major league player and manager.  Every two
months, Bill Carle, the committee chair, sends out a list of all the
corrections found.  This includes the recently deceased.  I then update
their records in the DB.  I haven't updated BR since Feb, so they
haven't all shown up there yet.  I implement those en masse without any
doublechecking.


I get a number of e-mails pointing out things like Pedro Martinez's 1999
AL Division Series record being wrong.  I then look at a reference book
(Total Baseball, Baseball Guide, etc.) to figure out the correct numbers
and fix them when possible.

I got an e-mail today from someone who believes Duffy Lewis batted right
and threw right.  On that I check all my encyclopedias and then usually
ask Bill Carle what his records say.

If someone has a nickname I'm missing, I'll usually just add it.

If someone writes and says they are someone's grandwhatever, I'll
forward it to Carle and usually wait to see if he includes it.

Some of the stats corrections that have appeared in TB or the
Encyclopedias have been implemented in a rather hit or miss manner, so
we are starting with an imperfect document, and not what you might
consider a clean slate.


> ***
>
> As well, for the comments section, even if you don't
> go into too much detail, the admin can always run a
> CVS delta to see what was changed.  I'd treat the
> comments section as a "headline": just enough info for
> the admin to know that this is the revision that he
> should run the CVS-delta on.
>
> Tom

It would be nice to know why these decisions were made and how they were
incorporated into the DB.  Just looking ahead ten years, we will still
be getting e-mails like "My encyclopedia says Ty Cobb has 4191 hits, why
is yours wrong?", but about really obscure players, so it would be very
helpful to have an audit trail.  It will also prevent cases where we go
in a circle, fixing and breaking the same number over and over.

Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#2137 From: "tmasc@..." <tmasc@...>
Date: Thu May 6, 2004 3:33 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
tangotiger
 
--- Sean Forman <sean-forman@...>
wrote:
> Tom, why haven't you joined SABR?  I just don't get
> it.  You would be a
> natural.
>

Thanks, but I'm trying to clean my hobbies plate as
much as possible.  (Not that I'm having much luck, but
lately, I'm getting there.)  I know that I'd spend too
many hours with the various SABR pubs and things, and
I get my quick fixes around the web already.  On top
of which, I already subscribe to (too) many computer
mags, that I feel like I'm drowning.  Anyway...

> deceased.  I then update
> their records in the DB.  I haven't updated BR since
> Feb, so they
> haven't all shown up there yet.  I implement those
> en masse without any
> doublechecking.
>

You mentioned in an earlier email that you make your
changes directly to the DB, otherwise, if you
download, and reupload, you have sorting issues.
What's that about?  Are you saying that the numeric ID
field you have is an autonum that gets regened each
time you re-upload?  If that's the case, then I'd
appreciate some background as to how MySQL does its
sequencing.

Tom







#2138 From: Derek Adair <dadair@...>
Date: Thu May 6, 2004 3:52 pm
Subject: Re: Re: Means of Updating and Correcting BDB Data
D_Adair
 
On Thu, 6 May 2004, Sean Forman wrote:

> Steps 4, 5, 6, 7 are where it is starting to look onerous, though I may
> be overreacting.  I'm just trying to imagine running through Bill
> Carle's bimonthly newsletters.  There are probably 100 corrections per
> newsletter.  I guess we would make all of the corrections and then make
> the following data entries into the DB.

I definitely think you'd want to group the corrections. I think of them
more as a change set than a change to a single value. For example, "Added
ROE column" would be one "change" even though it affected a number of
rows. In the case of scattered data points, like you have with the Bio
Committee stuff, the change source and note, as well as the cvs diff,
provide you with all the tracking info you'd need.

> 002,"SABR Biographical Committee Newsletter, March/April 2004",2004
>
> 002,"SABR Biographical Committee, Bill Carle, Chair"
>
> 196,002,002,Master,2.09,"made dozens of corrections to many player bio
> data, place and date of birth, place and date of death, debut date and
> removed two players who were found to be duplicates.  foobarr01 was
> folded into foobarr02 and zippo01 was folded into ziper01."
> 197,002,002,Pitching,2.09,"made dozens of corrections to many player bio
> data, place and date of birth, place and date of death, debut date and
> removed two players who were found to be duplicates.  foobarr01 was
> folded into foobarr02 and zippo01 was folded into ziper01."
> 198,002,002,Fielding,2.09,"made dozens of corrections to many player bio
> data, place and date of birth, place and date of death, debut date and
> removed two players who were found to be duplicates.  foobarr01 was
> folded into foobarr02 and zippo01 was folded into ziper01."

I think you'd only need/want one of the entries in the Changes table.
Change 196 was "(making) dozens of corrections...."

> I think that we might also put a blob field at the end, so that you
> attach the entire e-mail to the datachange file.  However, you run into
> issues entering that into a text file with line returns, etc.  you would
> have to enter that into a db and then dump the db for that to work.

I agree that having a blob would make working with text difficult.
Personally, I'd rather see a web archive than have the db bloated with the
info.

> What if we set up a web form for tracking the changes and updates that
> are made?

That's definitely possible.

> Another concern I have in the loading and dumping from and to the DB and
> the text files is ordering of the lines.  It would be a real pain to
> have an admin load and then dump the files in some other order and then
> everyone have to update all of their files, but that is probably a minor
> issue.  I know how I would handle it in linux (I'd dump and then sort),
> but I'm not sure how to handle it in windows or on a mac.

I echo Tom here - what happens that your lines go screwy? I've always got
lines out in the order I put them in. I admit, I do a lot more "select *
from Batting into outfile '/dump/Batting'" type queries than I do actual
DB dumps. I do think the probable admins are tech-savvy enough that this
won't be an issue.

> I've looked for other projects trying to do this as well, but haven't
> found any.  I sort of chose the name based on the ProteinDatabank, but
> that isn't really the same thing.
>
> The other advantage of doing it directly in mysql rather than to a text
> file is that the formatting is done for free.  We won't have to be as
> careful for column counts etc., but I guess CVS will allow us to undo
> any issues that might arise.

CVS does allow you to roll back fairly easily. Hopefully this wouldn't be
an issue, though. I know before I send out any file, I import and make
sure it doesn't give errors or warnings.

Regards,
Derek

#2139 From: Sean Forman <sean-forman@...>
Date: Thu May 6, 2004 8:58 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
sforman71
 
> You mentioned in an earlier email that you make your
> changes directly to the DB, otherwise, if you
> download, and reupload, you have sorting issues.
> What's that about?  Are you saying that the numeric ID
> field you have is an autonum that gets regened each
> time you re-upload?  If that's the case, then I'd
> appreciate some background as to how MySQL does its
> sequencing.
>
> Tom



I'm not necessarily saying it would be a problem in a system with one
user, but imagine that I download the text CVS file from the server.

Aardsma
Aaron
Aaron
...
Zernial
Zwilling


I import it into mysql or access or something else.  I then make a bunch
of corrections.  Perhaps as a result of these corrections the table data
is stored in a different order, and in the course of working on
the data, the order of the data is flipped.


Zwilling
Zernial
...
Aaron
Aaron
Aardsma

I then dump this data from my DB.  When I try to commit it to the CVS
file, CVS will think that everything in the file has changed.  Not just
the one or two line change you made, but everything.   Then when you
update your cvs copy, it will think that it needs to download and update
your files to the zwilling -> aardsma version.

I'm not sure how legitimate this problem would be in the various DBs,
but the ordering of the lines, which is irrelevant to the DB, is
considered important by the CVS, so some care would need to be taken, so
that CVS doesn't keep making new versions based solely on orderings.
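
The effect can be illustrated without CVS itself.  The sketch below uses
Python's difflib as a stand-in for a line-based diff (CVS's own algorithm
may report different numbers), and the rows are made-up placeholders, not
real Master.txt lines.

```python
import difflib


def changed_line_count(old_lines, new_lines):
    """Count the added and removed lines a line-based diff would report."""
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=0)
    return sum(
        1 for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))
    )


original = ["Aardsma", "Aaron,A", "Aaron,B", "Zernial", "Zwilling"]
# One real correction...
corrected = ["Aardsma", "Aaron,A2", "Aaron,B", "Zernial", "Zwilling"]
# ...and the same corrected data dumped in the opposite order.
reversed_dump = list(reversed(corrected))

one_fix = changed_line_count(original, corrected)
flipped = changed_line_count(original, reversed_dump)
resorted = changed_line_count(original, sorted(reversed_dump))
```

Here one_fix counts only the corrected row (one line removed, one added),
flipped counts most of the file as changed, and resorted is back to the
same small count once the dump is put into a fixed order before comparing.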

--
Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#2140 From: "tmasc@..." <tmasc@...>
Date: Thu May 6, 2004 9:36 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
tangotiger
 
--- Sean Forman <sean-forman@...>
wrote:
> file, CVS will think that everything in the file has
> changed.  Not just
> the one or two line change you made, but everything.

That's a great point.

Having every table dumped into text files by key order
would seem to be the best solution.  Not sure how
MySQL does the dumps, but an "ORDER BY [list,of,keys]"
should be added.
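
A sketch of a key-ordered export, using Python's built-in sqlite3 module
as a stand-in because it needs no server; the same ORDER BY idea would
apply to a MySQL export.  The miniature Batting table, its columns and
its rows are hypothetical, and export_sorted is an illustrative name.

```python
import csv
import io
import sqlite3


def export_sorted(conn, table, key_columns):
    """Dump a table as CSV text, ordered by the given key columns.

    table and key_columns are interpolated into the SQL, so they must
    come from trusted code, never from user input.
    """
    order_by = ", ".join(key_columns)
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}")
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(cursor)
    return buf.getvalue()


# Hypothetical miniature table, inserted out of key order on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, HR INTEGER)")
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?)",
    [("zzz01", 1950, 29), ("aaa01", 1955, 27), ("aaa01", 1954, 13)],
)
dump = export_sorted(conn, "Batting", ["playerID", "yearID"])
```

With the keys in the ORDER BY, the text file comes out the same no matter
what order the rows were loaded or edited in.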

Tom





#2142 From: Sean Forman <sean-forman@...>
Date: Thu May 6, 2004 9:41 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
sforman71
 
tmasc@... wrote:
>
> --- Sean Forman <sean-forman@...>
> wrote:
>  > file, CVS will think that everything in the file has
>  > changed.  Not just
>  > the one or two line change you made, but everything.
>
> That's a great point.
>
> Having every table dumped into text files by key order
> would seem to be the best solution.  Not sure how
> MySQL does the dumps, but an "ORDER BY [list,of,keys]"
> should be added.
>
> Tom



That would solve the problem.  Is there a chance that this will be a
problem in any other db system?  For instance, if an admin uses access,
etc.?

Also, the newline, eol thing could be problematic as well.

I think it is doable, but I just want to think through the issues.

--
Sincerely,
Sean Forman

Baseball Stats!   http://www.Baseball-Reference.com/
Baseball Analysis!    http://www.BaseballPrimer.com/

#2143 From: "tmasc@..." <tmasc@...>
Date: Fri May 7, 2004 1:36 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
tangotiger
 
--- Sean Forman <sean-forman@...>
wrote:
> That would solve the problem.  Is there a chance
> that this will be a
> problem in any other db system?  For instance, if an
> admin uses access,
> etc.?

With Access and Oracle, it won't be a problem.  For
any DB, just create a query/view that will export
exactly as you want it.

>
> Also, the newline, eol thing could be problematic as
> well.
>
> I think it is doable, but I just want to think
> through the issues.
>

Hmmm.... that would mean running some dos2unix or
unix2dos and ftp-ing as ascii or binary.  Yes, we
should be careful here.
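
As an alternative to dos2unix/unix2dos, the line endings could be
normalized by a short script that runs the same on every platform.  A
minimal sketch, assuming LF is picked as the one agreed ending and that no
field legitimately contains a bare carriage return; both function names
are hypothetical.

```python
def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF (DOS/Windows) and lone CR (classic Mac) endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path):
    """Rewrite a file in place with LF endings (path supplied by the caller)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_eol(data))
```

Working on bytes avoids any automatic newline translation by the
platform while the file is being read or written.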


Tom





#2144 From: Paul Wendt <pgw@...>
Date: Fri May 7, 2004 2:04 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
pgw02472
 
On Thu, 6 May 2004, Sean Forman wrote:
> > Having every table dumped into text files by key order
> > would seem to be the best solution.  Not sure how
> > MySQL does the dumps, but an "ORDER BY [list,of,keys]"
> > should be added.
. . .
> I think it is doable, but I just want to think through the issues.

If technically doable with the necessary variety of DB management programs,
yet a practical issue:
If Admin neglects this step 5% of the time, what is the cost?

--Paul

#2145 From: "tmasc@..." <tmasc@...>
Date: Fri May 7, 2004 2:29 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
tangotiger
 
--- Paul Wendt <pgw@...> wrote:
> On Thu, 6 May 2004, Sean Forman wrote:
> > > Having every table dumped into text files by key order
> > > would seem to be the best solution.  Not sure how
> > > MySQL does the dumps, but an "ORDER BY [list,of,keys]"
> > > should be added.
> . . .
> > I think it is doable, but I just want to think through the issues.
>
> If technically doable with the necessary variety of DB management programs,
> yet a practical issue:
> If Admin neglects this step 5% of the time, what is the cost?
>
> --Paul
>

The cost is that the cvs-delta would be useless.
However, you can get the "wrong" cvs version, load it
in your DB, export it out, and, either replace the
"wrong" version, or branch out of that version into a
new version.  Pain in the butt, but doable.  It would
look like this:

1.0
|
1.1  --- 1.1.1
|
1.2
|
1.3

So, in this case, 1.1 was checked-in the "wrong" way.
So, when you do a delta between 1.3 and 1.1, it'll
look like lots of stuff changed.  You'd need to
compare 1.3 to 1.1.1.

Not sure if CVS allows this kind of branching.

Tom







#2146 From: Derek Adair <dadair@...>
Date: Fri May 7, 2004 2:50 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
D_Adair
 
On Fri, 7 May 2004, tmasc@... wrote:

>
> --- Sean Forman <sean-forman@...>
> wrote:
> > That would solve the problem.  Is there a chance
> > that this will be a
> > problem in any other db system?  For instance, if an
> > admin uses access,
> > etc.?
>
> With Access and Oracle, it won't be a problem.  For
> any DB, just create a query/view that will export
> exactly as you want it.
>
> >
> > Also, the newline, eol thing could be problematic as
> > well.
> >
> > I think it is doable, but I just want to think
> > through the issues.
> >
>
> Hmmm.... that would mean running some dos2unix or
> unix2dos and ftp-ing as ascii or binary.  Yes, we
> should be careful here.

This would not be as troublesome as the line order, since cvs has options
to ignore whitespace while doing diffs (so you wouldn't lose differences).
There wouldn't be any ftp-ing involved, though.

Derek

#2147 From: Derek Adair <dadair@...>
Date: Fri May 7, 2004 2:56 pm
Subject: Re: proofing, proof notes, sources, source notes --not limited to Documentation in the database
D_Adair
 
On Fri, 7 May 2004, tmasc@... wrote:

> The cost is that the cvs-delta would be useless.
> However, you can get the "wrong" cvs version, load it
> in your DB, export it out, and, either replace the
> "wrong" version, or branch out of that version into a
> new version.  Pain in the butt, but doable.  It would
> look like this:
>
> 1.0
> |
> 1.1  --- 1.1.1
> |
> 1.2
> |
> 1.3
>
> So, in this case, 1.1 was checked-in the "wrong" way.
> So, when you do a delta between 1.3 and 1.1, it'll
> look like lots of stuff changed.  You'd need to
> compare 1.3 to 1.1.1.
>
> Not sure if CVS allows this kind of branching.

It does. You wouldn't necessarily need to use it, though. At any time, you
could check out an older version, order it how you wanted and then compare
the file to the current version as text. Either approach would work, and
nothing would be "lost."

If it's caught immediately after, then a reorder and a recommit is
probably the best approach to take.
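
The "compare as text" approach can be sketched in Python, assuming the
older and the current revision have already been saved as text and that
each record is one line.  The function name is hypothetical.  It reports
which record lines differ regardless of the order either file was dumped
in, so a badly ordered check-in does not hide the real changes.

```python
from collections import Counter


def order_independent_diff(old_text, new_text):
    """Return (removed, added) record lines, ignoring line order."""
    old = Counter(old_text.splitlines())
    new = Counter(new_text.splitlines())
    # Counter subtraction keeps only the lines that are surplus on one side.
    removed = sorted((old - new).elements())
    added = sorted((new - old).elements())
    return removed, added
```

Counters are used instead of sets so that a duplicated line that was
added or dropped still shows up.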

Regards,
Derek

#2148 From: baseball-databank@yahoogroups.com
Date: Sat May 8, 2004 3:00 am
Subject: New file uploaded to baseball-databank
baseball-databank@yahoogroups.com
 
Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the baseball-databank
group.

   File        : /newdat.zip
   Uploaded by : tjruane <truane@...>
   Description : Proposed Data Additions

You can access this file at the URL

http://groups.yahoo.com/group/baseball-databank/files/newdat.zip

To learn more about file sharing for your group, please visit

http://help.yahoo.com/help/us/groups/files

Regards,

tjruane <truane@...>

#2149 From: "tjruane" <truane@...>
Date: Sat May 8, 2004 3:17 am
Subject: Proposed Data Additions
tjruane
 
I have just uploaded a file containing a first pass at some of the
data additions we have discussed, both here and over at Retrolist,
during the last few months.  There are three files in newdat.zip.

ofdata.txt contains the batting records.
They have the format:

id,year,team,g,gsl,gsr,cg,poff,roe,rof,gb,fb,miss

Where:

id - is the retrosheet player ID
team - the retrosheet team ID
g - games appeared in
gsl - starts against LHP
gsr - starts against RHP
cg - complete games
poff - times picked off
roe - times reached on errors
rof - times reached on failed fielder's choices
gb - ground balls
fb - fly balls
miss - the number of games without play-by-play data

pidata.txt contains the pitching records.
They have the format:

id,year,team,g,2b,3b,roe,rof,gb,fb,miss

Where:

2b - doubles allowed
3b - triples allowed
the rest the same as above

dedata.txt contains the fielding records.
They have the format:

id,year,team,pos,g,gs,cg,inn,iout,upo,idp,itp,pb,wp,sb,csc,csp,pko,xi,miss

Where:

pos - position (1-10, where 10 is the DH)
inn - innings played in the field
iout - plays initiated by this fielder
upo - unassisted putouts
idp - DPs initiated by this fielder
itp - TPs initiated by this fielder
pb - passed balls
wp - wild pitches
sb - stolen bases
csc - caught stealing initiated by the catcher
csp - caught stealing initiated by the pitcher
pko - pickoffs
xi - interference
the rest the same as above
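
A minimal Python sketch for reading the three files with the column lists
given above.  It assumes the files have no header row and are plain
comma-separated text, which the description does not state either way.
Fields made up only of digits are converted to integers and everything
else (player and team IDs) is left as text.  parse_records is a
hypothetical helper name.

```python
import csv
import io

# Column lists copied from the format lines in the message.
BATTING_COLUMNS = "id,year,team,g,gsl,gsr,cg,poff,roe,rof,gb,fb,miss".split(",")
PITCHING_COLUMNS = "id,year,team,g,2b,3b,roe,rof,gb,fb,miss".split(",")
FIELDING_COLUMNS = ("id,year,team,pos,g,gs,cg,inn,iout,upo,idp,itp,pb,wp,sb,"
                    "csc,csp,pko,xi,miss").split(",")


def parse_records(text, columns):
    """Yield one dict per line, checking the field count against columns."""
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} fields, got {len(fields)}"
            )
        yield {
            name: int(value) if value.isdigit() else value
            for name, value in zip(columns, fields)
        }
```

The field-count check is there to catch a line that was wrapped or
truncated before it gets loaded anywhere.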

iout is the only thing we hadn't discussed on either list.
It contains the number of times a player either made an
unassisted putout or fielded a batted ball and received an
assist.  It does not include caught stealing or assists
resulting from strikeouts.  It does include assists granted
on plays where no outs were recorded (for example, "5E3").

Questions, comments and suggestions are welcomed.
Tom Ruane

#2150 From: "tangotiger" <tmasc@...>
Date: Sat May 8, 2004 1:22 pm
Subject: Re: Proposed Data Additions
tangotiger
 
Great stuff!  My comments are interspersed...

--- In baseball-databank@yahoogroups.com, "tjruane" <truane@v...>
wrote:
> gsl - starts against LHP
> gsr - starts against RHP

Now this is an interesting category.  One thing that I do that others
may like is simply %PA against LH and RH.  (You only need one of them
of course).  This applies for batters and pitchers *and* fielders,
and shows how much the platoon effect is working for or against the
player.  For fielders, it should only be on batted balls in park
(excludes HR), excluding bunts.

However, this might be too much work for the payoff, and I'm not sure
if many others will find the benefit out of this that I have.
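
For concreteness, the %PA figure is just a share of plate appearances.
A small sketch, with hypothetical argument names and assuming the PA
counts against each hand are already available:

```python
def pct_pa_vs_lhp(pa_vs_lhp, pa_vs_rhp):
    """Share of plate appearances that came against left-handers.

    Returns None when there are no plate appearances at all.
    """
    total = pa_vs_lhp + pa_vs_rhp
    if total == 0:
        return None
    return pa_vs_lhp / total
```

The share against right-handers is one minus this value, which is why
only one of the two needs to be stored.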

> gb - ground balls
> fb - fly balls

Are these all batted balls, or just outs?  (Batted balls would be
better, if you've got it.)  As well, Retro makes 5 classifications:
- gb
- fb
- pops
- liners
- bunts

Are pops and liners included in fb?  If they are, I'd rename the
field to "air balls".  (You also have bunts that are popped to 1B.)
Otherwise, I would keep them as 5 separate fields.

Tom
