baseball-databank · Baseball Databank

Messages 4241 - 4270 of 4385
#4241 From: Mike Emeigh <mwe55innc@...>
Date: Mon Mar 19, 2012 4:35 pm
Subject: Re: Stints
mwemeigh
This goes back to Tom's original question - how are people using stints?

In a volunteer effort one has to evaluate the cost of maintaining and validating the information vs the value of providing it. If split data across multiple stints takes several people several weeks to validate each year, and isn't of value to anyone except a couple of people, does it really make sense to keep it? (Not saying those numbers are correct, just presenting it as a for-example).

Sent from my iPhone

On Mar 19, 2012, at 12:20, Sean Forman <sean-forman@...> wrote:

 

Unless you want to show the stat lines for a player like Rob Ducey or Matt Luke who had multiple stints with a single team.


sean
---
Sean Forman
Sports Reference LLC, President
http://www.sports-reference.com/



On Mon, Mar 19, 2012 at 12:09 PM, Mike Emeigh <mwe55innc@...> wrote:
 

I agree that stint data should be in a separate table - it's not going to vary from, say, batting to pitching to fielding, I wouldn't think.

Where it's known, it might also be useful to have a start date and end date for each stint.

Sent from my iPhone

On Mar 19, 2012, at 11:55, "Tangotiger" <tom@...> wrote:

 

I agree with Kevin that it should appear in some other table.

Unless, well, do people need to track the batting line of someone who
played with the Mets in 1986 in Apr-May, with the Expos in Jun-Aug, then
again with the Mets in Sept 1986, so that the two 1986 Mets lines are
distinct? If you have an online DB like BR.com, or BaseballCube, or
Fangraphs, sure, maybe. For the rest of us though?

If ever we were to create a "splits" table, say, performance by home/away,
we wouldn't REALLY do an Apr-May Home split AND a Sept Home split, would
we? (I mean, if you wanted to do that, you'd want it for every player, so
you have a Home-month split, and you wouldn't want it only for those who
have two distinct stints.)

We've seen already that it's tripped up Lahman for the 2011 data, and it's
a lot of extra effort to get it right (and based on the recent post, it's
still not right). (Not criticizing Sean, just pointing out that it trips
up even the most diligent of researchers.) Imagine the rest of us who
wouldn't necessarily always remember to link the stint IDs. (I fell into
this trap recently.)

So, I go back to "is it worth it"? Is it worth Sean's time to get it
right, is it worth our time to validate it, and is it worth our time to
first sum at the player-year-team level 99.99% of the time because we just
don't need that stint ID?

Note: The stint ID predates my involvement with the DB, so I'm
sure this has been covered many years ago.

Tom
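
Tom's "first sum at the player-year-team level" step amounts to a single GROUP BY. A minimal sketch using Python's built-in sqlite3 module, assuming a Lahman-style Batting table with playerID, yearID, stint, teamID, AB and H columns; the rows are hypothetical, not real player data:

```python
import sqlite3

# Hypothetical stint-level rows: (playerID, yearID, stint, teamID, AB, H).
# player01 goes T1 -> T2 -> back to T1 in the same season.
ROWS = [
    ("player01", 1986, 1, "T1", 40, 10),
    ("player01", 1986, 2, "T2", 90, 25),
    ("player01", 1986, 3, "T1", 20, 6),
    ("player02", 1986, 1, "T2", 300, 80),
]


def player_year_team_totals(rows):
    """Collapse stint rows to one row per (playerID, yearID, teamID)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, stint INTEGER,"
        " teamID TEXT, AB INTEGER, H INTEGER)"
    )
    con.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?, ?, ?)", rows)
    totals = con.execute(
        """
        SELECT playerID, yearID, teamID, SUM(AB), SUM(H)
        FROM Batting
        GROUP BY playerID, yearID, teamID
        ORDER BY playerID, teamID
        """
    ).fetchall()
    con.close()
    return totals


for row in player_year_team_totals(ROWS):
    print(row)
```

The two T1 stints merge into a single line here; that merged detail is exactly what the stint column preserves, which is the trade-off under discussion.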



#4242 From: Rod Nelson <rodericnelson@...>
Date: Mon Mar 19, 2012 4:44 pm
Subject: Re: Stints
rockymtnsabr
Zactly.  We agree. That's the ideal format.

Rod

On Mon, Mar 19, 2012 at 12:19 PM, KJOK <kjokbaseball@...> wrote:


Rod - I didn't say it wasn't useful, just that it should NOT be in batting, pitching, fielding tables.  Same applies for the minor league data - there should be a separate table with something like:
 
ID          Team   Stint  StartDate  EndDate
JoeBlowID   Team1  1      19480401   19480530
JoeBlowID   Team2  2      19480601   19480615
JoeBlowID   Team1  3      19480616   19480825
Putting stints in batting, pitching, fielding is just a bad kluge. 
 
THANKS,
Kevin
From: Rod Nelson <rodericnelson@...>
To: baseball-databank@yahoogroups.com
Sent: Monday, March 19, 2012 11:07 AM
Subject: Re: [baseball-databank] Stints

 
I'm really surprised that Kevin would say that stints are not useful in batting, pitching, fielding tables since I know that he has dealt with historical minor league seasons. For many years, the guides showed players in some league who appeared for multiple clubs as a single entry and their performance stats were summed. That integration is very problematic and should be avoided forevermore - no matter what Tom Tango might argue - because it makes team and league totals suspect, for one reason.   This is NOT something that should ever be contemplated. Same as with breaking out outfield by position. You won't miss it until it's gone.

Is it worth it?  Of course it is.  But then again, we have access to a superior dataset.
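
If stints lived only in a separate table like Kevin's, the stint number could be derived from the dates rather than stored in Batting, Pitching and Fielding. A sketch with sqlite3, assuming the table is named Stints and dates are stored as YYYYMMDD integers (both assumptions); the rows are Kevin's illustrative ones:

```python
import sqlite3

# Kevin's illustrative rows: (playerID, teamID, StartDate, EndDate), YYYYMMDD.
STINTS = [
    ("JoeBlowID", "Team2", 19480601, 19480615),
    ("JoeBlowID", "Team1", 19480401, 19480530),
    ("JoeBlowID", "Team1", 19480616, 19480825),
]


def numbered_stints(rows):
    """Number each player's stints within a season by StartDate."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Stints (playerID TEXT, teamID TEXT,"
        " StartDate INTEGER, EndDate INTEGER)"
    )
    con.executemany("INSERT INTO Stints VALUES (?, ?, ?, ?)", rows)
    # stint = 1 + number of the same player's stints that began earlier in the
    # same year (integer division by 10000 extracts the year from YYYYMMDD).
    result = con.execute(
        """
        SELECT s.playerID, s.teamID,
               1 + (SELECT COUNT(*) FROM Stints e
                    WHERE e.playerID = s.playerID
                      AND e.StartDate / 10000 = s.StartDate / 10000
                      AND e.StartDate < s.StartDate) AS stint,
               s.StartDate, s.EndDate
        FROM Stints s
        ORDER BY s.playerID, s.StartDate
        """
    ).fetchall()
    con.close()
    return result


for row in numbered_stints(STINTS):
    print(row)
```

A correlated subquery is used instead of ROW_NUMBER() so the sketch does not depend on the SQLite build supporting window functions.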


#4243 From: "Tangotiger" <tom@...>
Date: Mon Mar 19, 2012 4:54 pm
Subject: Re: Stints
tom@...
Right, Mike has the spirit of my question.

When I ask if it's worth it, I'm not asking if it's useful.  Clearly, it
has many uses, be it the chrono-order of listing teams, or, for those who
need it, splitting the stints of guys who left a team and came back in the
same year.

I'm asking if the cost of that use is a justifiable cost.

We had bad data in the first release, and we have bad data according to
the recent post, and it's traced to the stint ID.

One alternative is what Kevin suggested: simply having a separate table
that lists the stints for the players.  It keeps it away from that
batting, pitching, fielding table.  And, you can even enhance it by
including dates.  So, this handles the chrono-order need that some have.

That leaves us with the guys who left a team and came back in the same
year, while playing for some other MLB team in the same year.  Rob Ducey
and whatever other players are similarly affected.  Do we need to see his
batting and fielding line split between stints?  If not, then a separate
STINTS table handles that as well.  Indeed, if you really really need it,
you can have a StintsBatting table as an offshoot of the Batting table.

Think of the StintsBatting table as a split table: just as you might do a
HomeawayBatting table, you have a StintsBatting table, and it'll be
comprised of Rob Ducey and the other handful of players around.

Everyone gets what they want here.  Without the headache of having to
remember to sum the Batting, Pitching, Fielding tables.  And, without what
we've seen so far, of having to contend with bad data, because of the
time-cost associated with it.

Tom
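
Tom's split-table idea can be sketched in plain Python: Batting keeps one row per player-year-team, and a hypothetical StintsBatting child table holds stint-level lines only where a player had more than one stint with the same team. The column choice (AB and H only) and the sample rows are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical stint-level rows: (playerID, yearID, stint, teamID, AB, H).
ROWS = [
    ("player01", 1986, 1, "T1", 40, 10),
    ("player01", 1986, 2, "T2", 90, 25),
    ("player01", 1986, 3, "T1", 20, 6),
    ("player02", 1986, 1, "T2", 300, 80),
]


def split_tables(rows):
    """Return (batting, stints_batting) for the proposed two-table layout."""
    totals = defaultdict(lambda: [0, 0])  # (player, year, team) -> [AB, H]
    by_team = defaultdict(list)           # (player, year, team) -> stint lines
    for player, year, stint, team, ab, h in rows:
        key = (player, year, team)
        totals[key][0] += ab
        totals[key][1] += h
        by_team[key].append((stint, ab, h))

    batting = sorted((*key, ab, h) for key, (ab, h) in totals.items())
    # Child rows exist only for players who left a team and came back.
    stints_batting = sorted(
        (*key, stint, ab, h)
        for key, lines in by_team.items()
        if len(lines) > 1
        for stint, ab, h in lines
    )
    return batting, stints_batting


batting, stints_batting = split_tables(ROWS)
print(batting)
print(stints_batting)
```

Only player01's two T1 stints land in the child table; everyone else is fully described by the summed Batting rows.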

#4244 From: KJOK <kjokbaseball@...>
Date: Mon Mar 19, 2012 4:55 pm
Subject: Re: Stints
kjokbaseball
Exactly. For example, LH/RH splits are probably much more useful, but we don't have those in this database, primarily I suspect because they can be derived from Retrosheet data.  Same applies for the Rob Ducey two teams in one year guys - for any player post-1950 at least, you can derive their batting, fielding, pitching stint-segregated info from Retrosheet.
THANKS,
Kevin
From: Mike Emeigh <mwe55innc@...>
To: "baseball-databank@yahoogroups.com" <baseball-databank@yahoogroups.com>
Sent: Monday, March 19, 2012 11:35 AM
Subject: Re: [baseball-databank] Stints

 





#4245 From: "railsplitter_44" <danielghirsch@...>
Date: Mon Mar 19, 2012 4:41 pm
Subject: Re: Stints
railsplitter_44
Just for the sake of a survey, I also use the stints in the Batting, Fielding,
and Pitching tables for my website.  Not only to correctly order the teams in
each season, but to also correctly join the tables for players who played for
the same team in 2 different stints.

I may be biased, but changing the structure of the database may cause many of us
to have to change our existing code.

I don't imagine it would be too difficult to make a Master Stint table that (for
example) shows Mike MacDougal CHA 1, Mike MacDougal WAS 2 and then apply it to
Batting, Pitching, and Fielding tables.

Dan Hirsch

#4246 From: "Tangotiger" <tom@...>
Date: Mon Mar 19, 2012 5:15 pm
Subject: Re: Stints
tom@...
Another advantage of having a STINTS table is that you can incorporate
minor leagues as well.  If someone has Mets-Expos-Mets needs for
same-year players, then surely someone else would have
Boston-Pawtucket-Boston needs for same-year players.  Not that a STINTS
table *must* do that, but it *can* be used to accommodate that requirement.
If some guy gets sent down for three weeks, say JJ Hardy, then it might be
good to know when that happened.

And, as I said, a StintsBatting table will give the user what he needs, if
he needs it at that detail.

Basically, it's a question of doing a small redesign, and this will
localize any data quality issues in a very tight manner that will affect
only those people who need that data.

Tom

#4247 From: Theodore Turocy <drarbiter@...>
Date: Mon Mar 19, 2012 5:57 pm
Subject: Re: Stints
arb1ter

Without giving away (too many) trade secrets, I can state the following about the way I manage statistical data.

There is only one statistical table, 'players', which incorporates all batting, pitching, and fielding totals, as well as relevant metadata.  These include stints (within leagues), stints (global across all professional leagues), and first and last played dates.  These all have UUIDs assigned (and published, as those of you with Baseball ID working group affiliations know) which are guaranteed to be stable.

In my own experience, managing 'batting', 'pitching', and 'fielding' as separate tables is of dubious value - all you can do is screw things up.  Further, absence of a record is not necessarily indicative of zeroes; these are logically different.  Within the very limited scope of MLB statistics, it is not a bad assumption that a missing, e.g., pitching record means the player did not pitch; outside that, the assumption ranges from questionable to dubious.  So schemata based on this assumption do not scale well.

Similarly, having separate 'postseason' tables adds complexity without value.  A 'season_phase' column accomplishes the same much more elegantly.  Similar scoping principles can accommodate e.g., splits if desired.

TLT
--
Dr Theodore L Turocy
Chadwick Baseball Bureau
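
A rough illustration of two points above: a single table with a season_phase column, and NULL rather than zero for a line that was never recorded. The table name, column names and rows are hypothetical and are not the Chadwick Baseball Bureau's actual schema:

```python
import sqlite3


def build_players_table():
    """Create a hypothetical single 'players' table with sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """
        CREATE TABLE players (
            playerID TEXT, yearID INTEGER, teamID TEXT, stint INTEGER,
            season_phase TEXT,   -- e.g. 'regular' or 'post'
            AB INTEGER,          -- NULL means "not recorded", not zero
            IPouts INTEGER
        )
        """
    )
    con.executemany(
        "INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("player01", 2011, "T1", 1, "regular", 500, None),  # no pitching record
            ("player02", 2011, "T1", 1, "regular", 0, 600),     # pitched, AB known to be 0
            ("player02", 2011, "T1", 1, "post", 0, 30),
        ],
    )
    return con


con = build_players_table()
# Phases are filtered with a WHERE clause instead of a separate postseason table.
regular_ipouts = con.execute(
    "SELECT SUM(IPouts) FROM players WHERE season_phase = 'regular'"
).fetchone()[0]
# An unrecorded pitching line stays distinguishable from a known zero.
unrecorded = con.execute(
    "SELECT COUNT(*) FROM players WHERE IPouts IS NULL"
).fetchone()[0]
print(regular_ipouts, unrecorded)
```

SQL's SUM skips NULLs, so the unrecorded line neither adds to nor masquerades as a zero in the regular-season total.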











#4248 From: "anson2995" <slahman@...>
Date: Tue Mar 20, 2012 1:39 pm
Subject: Re: Stints
anson2995
"Tangotiger" <tom@...> wrote:
> I'm asking if the cost of that use is a justifiable cost.
> We had bad data in the first release, and we have bad data
> according to the recent post, and it's traced to the stint ID.

The bad data problems don't have anything to do with the database design. It's
100% attributable to me, the person who processed most of the updates. It's the
first time in several years that I made the offseason updates rather than Sean
Forman, and the scripts I used to make and check the updates were outdated. It
shouldn't be a problem in the future.

I think it's much more labor intensive to use and maintain a table that lists
start and end dates in a separate transaction file, especially if we continue to
maintain batting/pitching/fielding as separate files.

But I'm certainly open to further discussion, on this or other design issues.

Regards,
Sean Lahman

#4249 From: Paul Golba <pgolba2@...>
Date: Sat Mar 24, 2012 9:28 pm
Subject: Lahman 5.9.1 Park Factors
pgolba2
I'm not sure what the source of the park factors for the database are, but I was under the assumption that it was the same as the multi-year park factors as found on baseball-reference.com.  This is true for 2008 and 2009.  However, for 2010 and 2011 the park factors do not match for the most part, in some cases being off by as much as 6 points. 

For 2011 it appears that the database has the one-year park factors on B-R.com with the exception of Milwaukee which has the three-year park factors.  For 2010 they match neither the B-R.com one-year nor multi-year.

Is this an error or was the source of the park factors changed?

Paul Golba

#4250 From: Paul Golba <pgolba2@...>
Date: Sat Mar 24, 2012 9:12 pm
Subject: Lahman 5.9.1 Error
pgolba2
Andy Phillips (phillan01), 2008, stint 1.  Lahman 5.9.1 has 31 ABs.  According to baseball-reference.com and Retrosheet, he only had 21 ABs.  MLB.com has him with 73 ABs for CIN that season, which is consistent with only 21 ABs in stint 1.

Paul Golba

#4251 From: Paul Golba <pgolba2@...>
Date: Sat Mar 24, 2012 8:34 pm
Subject: Re: Re: Stints
pgolba2
My vote is to keep the stints as is. 

From a database perspective, it is much, much easier to take a stint divided table and sum it up to get the overall numbers than it is to try to take a combined table and then split it back up using a separate stint table.  I suspect it would be harder for the administrator to maintain two separate tables.

From a baseball perspective, the stint field is valuable to determine how a player moved from team to team during a season.  This is pretty basic information.  Does everyone need this information?  No.  Is it useful for people who do need this information?  Yes.

Also, along with the team discrepancies on the stints that I noted to start this (unexpected) thread, there are also several hundred pitching stints in the years 2009-2011 that did not have a Batting record at all.  Almost all of them are in the AL and I suspect none of the pitchers involved ever batted.  This is not a huge deal, except that for all other seasons if a playerID had a Pitching record he always had a Batting record, even if he never batted.  You may already be aware of it at this point, but I mention it anyway.

Paul Golba
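
Paul's check (pitching stints with no matching Batting record) is a standard anti-join. A sketch with sqlite3, assuming both tables are keyed on playerID, yearID, stint and teamID; the rows are hypothetical:

```python
import sqlite3


def pitching_without_batting(pitching_rows, batting_rows):
    """Return Pitching keys that have no Batting row with the same key."""
    con = sqlite3.connect(":memory:")
    for table in ("Pitching", "Batting"):
        con.execute(
            f"CREATE TABLE {table} (playerID TEXT, yearID INTEGER,"
            " stint INTEGER, teamID TEXT)"
        )
    con.executemany("INSERT INTO Pitching VALUES (?, ?, ?, ?)", pitching_rows)
    con.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)", batting_rows)
    # LEFT JOIN keeps every Pitching row; unmatched ones have NULL Batting keys.
    missing = con.execute(
        """
        SELECT p.playerID, p.yearID, p.stint, p.teamID
        FROM Pitching p
        LEFT JOIN Batting b
          ON b.playerID = p.playerID AND b.yearID = p.yearID
         AND b.stint = p.stint AND b.teamID = p.teamID
        WHERE b.playerID IS NULL
        ORDER BY p.yearID, p.playerID
        """
    ).fetchall()
    con.close()
    return missing


print(pitching_without_batting(
    [("pitch01", 2010, 1, "T1"), ("pitch02", 2010, 1, "T2")],
    [("pitch01", 2010, 1, "T1")],
))
```

Because the join also matches on stint, the same query would surface rows whose stint numbers disagree between the two tables.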


From: anson2995 <slahman@...>
To: baseball-databank@yahoogroups.com
Sent: Tuesday, March 20, 2012 9:39 AM
Subject: [baseball-databank] Re: Stints

 




#4252 From: "Tangotiger" <tom@...>
Date: Sat Mar 24, 2012 11:28 pm
Subject: Re: Re: Stints
tom@...
Send Email Send Email
 
Paul,

Your post makes clear why we do *not* want to keep the stints as-is.

The requirement about chrono-stints can already be addressed by a Stints
table that shows the stint order.  Several posters have already responded
positively to this.

The "batting" table's dual role has already caused problems in the past.
I think the official MLB position is that the "batting" table is the "all
appearances" table, so that any game gets recorded with a "batting"
record, even if he didn't bat.  (I'm not exactly sure about this; I'm
going on memory here, but it's consistent with players not batting still
having a batting record.)

Anyway, from a database perspective, we don't need to have a stint denoted
in the batting *and* pitching *and* fielding tables, and ensure it
matches.  The key fields of a table are supposed to identify records in a
way that stays stable, so that you never have to alter a key field when
the data needs to be updated.  That's why playerID fields should never
change, even if a pitcher's name gets changed.  You don't want to have the
stint as a key field, if it means that it may change if we have new
information.  Imagine we introduce minor league data.  Now, you've got
MASSIVE changes in key fields for tons of players across multiple tables.
(Think of players like JJ Hardy.)

At least, with a Stints table, it will be localized to a single table,
whose entire purpose is to track that.  Indeed, the stintID wouldn't even
need to be a key field.

Had we started with a clean slate, the Stints would be treated
equivalently to Home/Away splits or Inning splits or Starter/Relief
splits.  They'd be part of a child table.

Tom
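
One way to read Tom's point about key fields: give each stint a stable surrogate key and derive the order at query time, so discovering a new stint (say, a minor league assignment) adds a row without renumbering anything. A sketch with sqlite3; the schema, the stint_uid and level columns, and the rows are all hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- stint_uid is a surrogate key that is never renumbered.
    CREATE TABLE Stints (
        stint_uid INTEGER PRIMARY KEY, playerID TEXT, yearID INTEGER,
        teamID TEXT, level TEXT, start_date TEXT
    );
    CREATE TABLE Batting (stint_uid INTEGER REFERENCES Stints (stint_uid), AB INTEGER);
    INSERT INTO Stints VALUES (1, 'player04', 2009, 'T1', 'MLB', '2009-04-06');
    INSERT INTO Stints VALUES (2, 'player04', 2009, 'T2', 'MLB', '2009-08-01');
    INSERT INTO Batting VALUES (1, 300), (2, 150);
    """
)


def stint_order(player, year):
    """Derive the season's stint order from ISO dates at query time."""
    return con.execute(
        "SELECT stint_uid, teamID, level FROM Stints"
        " WHERE playerID = ? AND yearID = ? ORDER BY start_date",
        (player, year),
    ).fetchall()


batting_before = con.execute("SELECT * FROM Batting ORDER BY stint_uid").fetchall()
# Later, minor league data reveals a stint between the two MLB stints.
con.execute(
    "INSERT INTO Stints VALUES (3, 'player04', 2009, 'T1AAA', 'AAA', '2009-06-15')"
)
batting_after = con.execute("SELECT * FROM Batting ORDER BY stint_uid").fetchall()

print(batting_before == batting_after)  # Batting keys untouched
print(stint_order("player04", 2009))
```

This keeps existing keys stable; it does not remove the need to split a stat line when a player is sent down and recalled by the same club.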





---------------------------------------------
The Book--Playing The Percentages In Baseball
http://www.InsideTheBook.com

#4253 From: "chrislambrou" <chrislambrou@...>
Date: Mon Mar 26, 2012 3:37 pm
Subject: Re: Stints
chrislambrou
Does anyone have a comment on the reply below?

I'm always running into problems with appearances and have to use the fielding
table.  Not the best table to JOIN with since almost every player has multiple
records per season.

Thanks,
-Chris

--- In baseball-databank@yahoogroups.com, "Tangotiger" <tom@...> wrote:
>
> The "batting" table's dual role has already caused problems in the past.
> I think the official MLB position is that the "batting" table is the "all
> appearances" table, so that any game gets recorded with a "batting"
> record, even if he didn't bat. (I'm not exactly sure about this, but I'm
> going on memory here, but it's consistent with players not batting still
> having a batting record.)

#4254 From: "Tangotiger" <tom@...>
Date: Mon Mar 26, 2012 6:47 pm
Subject: Re: Re: Stints
tom@...
There's an APPEARANCES table in the Lahman DB.  I haven't verified it, but
that might help you.

Tom
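
A small illustration of Chris's JOIN problem and Tom's suggestion: joining Batting to Fielding repeats the batting line once per position, while a table with one row per player-year-team does not. The Appearances column name G_all is assumed to follow the Lahman layout; treat the schema and the hypothetical rows below as assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Batting (playerID TEXT, yearID INTEGER, teamID TEXT, AB INTEGER);
    CREATE TABLE Fielding (playerID TEXT, yearID INTEGER, teamID TEXT,
                           POS TEXT, G INTEGER);
    CREATE TABLE Appearances (playerID TEXT, yearID INTEGER, teamID TEXT,
                              G_all INTEGER);
    -- Hypothetical player who played both 2B and SS for one team.
    INSERT INTO Batting VALUES ('player05', 2011, 'T1', 450);
    INSERT INTO Fielding VALUES ('player05', 2011, 'T1', '2B', 100),
                                ('player05', 2011, 'T1', 'SS', 40);
    INSERT INTO Appearances VALUES ('player05', 2011, 'T1', 135);
    """
)
KEYS = "USING (playerID, yearID, teamID)"

# One Fielding row per position, so the batting line is counted twice.
ab_via_fielding = con.execute(
    f"SELECT SUM(b.AB) FROM Batting b JOIN Fielding f {KEYS}"
).fetchone()[0]

# One Appearances row per player-year-team keeps the join one-to-one.
ab_via_appearances, games = con.execute(
    f"SELECT SUM(b.AB), SUM(a.G_all) FROM Batting b JOIN Appearances a {KEYS}"
).fetchone()

print(ab_via_fielding, ab_via_appearances, games)
```

Summing Fielding's G across positions (140 here) can also overstate games played, since a player can appear at two positions in one game.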


#4255 From: Nicholas Miceli <micelin01@...>
Date: Fri Apr 6, 2012 7:20 pm
Subject: Thank you to all.
nsmiceli
Dear Members,

Thank you to all who responded to my request for help for readers.

As I get into the planning process for research, I'm sure that list members will have a great deal of constructive things to say.

I hope everyone has a great weekend.

Regards,

Nick Miceli

#4256 From: Clay Dreslough <cjd@...>
Date: Fri Apr 6, 2012 10:33 pm
Subject: Stints
upa2112
I don't fully understand what is being proposed, but I just wanted to
take a moment to speak out in favor of backwards compatibility.

For example, people bought "Puresim 4" (a baseball simulation game by
Shaun Sullivan). It loads data in the "Lahman" format. Even though the
game was published in 2010, it can still load last year's database and
this year's database. If the stint column goes away, many users will be
unable to load next year's database.

If it's too difficult to ensure that the stint data is correct, I'd
rather see it 90% correct, but with documentation that (for example)
stint data after 2010 may not be 100% accurate.

Clay

#4257 From: "Clem Comly" <ccomly@...>
Date: Sat Apr 7, 2012 5:06 am
Subject: Re: Stints
ccomly2003
I don't currently have access to a good relational database.  If I did, the solution for stints in Fielding.csv is easy.  For all 2011 fielding stints for pitchers, update the stints to match stint in Pitching.csv for same player, year and team.  For all 2011 fielding stints for non-pitchers, update the stints to match stint in Batting.csv for same player, year and team. 
 
This will work for future years except when a player has 2 separate stints with the same team in the same year.  The fielding stints for those rare players will have to be handled manually.
 
Clem Comly
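
Clem's rule can be sketched without a relational database. This Python version assumes a fielding row with POS equal to "P" is a pitcher's row (an assumption; the post doesn't define "pitcher"), takes the stint from Pitching or Batting for the same player, year and team, and sets aside any row with zero or several candidate stints for manual handling. Tuple layouts and sample rows are hypothetical:

```python
from collections import defaultdict


def stint_lookup(rows):
    """Map (playerID, yearID, teamID) -> stints found in a source table."""
    lookup = defaultdict(list)
    for player, year, stint, team in rows:
        lookup[(player, year, team)].append(stint)
    return lookup


def fix_fielding_stints(fielding, batting, pitching):
    """Return (fixed, manual) lists of (playerID, yearID, stint, teamID, POS)."""
    bat, pit = stint_lookup(batting), stint_lookup(pitching)
    fixed, manual = [], []
    for player, year, stint, team, pos in fielding:
        source = pit if pos == "P" else bat
        candidates = source.get((player, year, team), [])
        if len(candidates) == 1:
            fixed.append((player, year, candidates[0], team, pos))
        else:
            # Two stints with the same team (or no match at all): do it by hand.
            manual.append((player, year, stint, team, pos))
    return fixed, manual


batting = [("p1", 2011, 1, "T1"), ("p1", 2011, 2, "T2"),
           ("p2", 2011, 1, "T1"), ("p2", 2011, 2, "T2"), ("p2", 2011, 3, "T1")]
pitching = [("p3", 2011, 1, "T2"), ("p3", 2011, 2, "T1")]
fielding = [("p1", 2011, 2, "T1", "SS"), ("p1", 2011, 1, "T2", "SS"),
            ("p3", 2011, 1, "T1", "P"), ("p2", 2011, 1, "T1", "LF")]

fixed, manual = fix_fielding_stints(fielding, batting, pitching)
print(fixed)
print(manual)
```

p2, who left T1 and came back, is exactly the rare same-team case Clem says must be handled manually.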

#4258 From: "Tangotiger" <tom@...>
Date: Sun Apr 8, 2012 3:05 pm
Subject: Re: Stints
tom@...
BDB or Lahman DB would be "backward compatible" with the proposed change,
or at least have that as a potential.

If we do this with a STINTS table that shows it the way we've been talking
about it, then you could join the Batting, Pitching, Fielding table to the
STINTS table, and get the stintID for virtually every record.  There's no
reason that it has to be "backward compatible" out of the box.  But, one
extra join on each table will make it backward compatible.

The only issue will be the Rob Ducey's of the world, who leave and come
back to the same team in the same year, while interrupted by a different
MLB team.  Again, that can also be handled with a StintSplits table,
similar to having a splits table for Batting v Pitch Hand if we wanted to
eventually go there.

Anyway, this is really in Lahman's hands, as the keeper of the DB.  He
says that the data quality issues came from a script he needed to update.
If that's the case, and we won't experience data quality issues again as
a result of stints, then, fine, the issue is mostly moot.  Those of us who
have no need for the stintID field can do the appropriate matching and
summing after-the-fact.

I will say that once you incorporate minor league data, we're going to
revisit this all over again.  As it currently stands, we're going to have
massive key changes when that happens.  JJ Hardy getting sent down to the
minors and being called up will have stintID 1 and 3 in 2009, whereas
right now, he has only 1 record.  And it'll get worse for guys that get
sent down and called up multiple times in the same year.  But, we're not
there yet.

Tom
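
Tom's "one extra join" for backward compatibility might look like the view below: the redesigned tables hold totals per player-year-team plus a Stints table, and a view named Batting reproduces the old stint column. The table names BattingTotals and Stints, and the rows, are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Stints (playerID TEXT, yearID INTEGER, teamID TEXT, stint INTEGER);
    CREATE TABLE BattingTotals (playerID TEXT, yearID INTEGER, teamID TEXT, AB INTEGER);
    INSERT INTO Stints VALUES ('player03', 2009, 'T1', 1), ('player03', 2009, 'T2', 2);
    INSERT INTO BattingTotals VALUES ('player03', 2009, 'T1', 120),
                                     ('player03', 2009, 'T2', 200);
    -- The extra join restores the old (playerID, yearID, stint, teamID) layout.
    CREATE VIEW Batting AS
        SELECT b.playerID, b.yearID, s.stint, b.teamID, b.AB
        FROM BattingTotals b
        JOIN Stints s
          ON s.playerID = b.playerID AND s.yearID = b.yearID
         AND s.teamID = b.teamID;
    """
)
legacy_rows = con.execute("SELECT * FROM Batting ORDER BY stint").fetchall()
print(legacy_rows)
```

A player with two stints for the same team would match two Stints rows and have his totals repeated; that is the Rob Ducey case Tom sets aside for a separate split table.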

#4259 From: "KJOK" <kjokbaseball@...>
Date: Tue Apr 10, 2012 8:06 pm
Subject: Re: 2012 Marlins teamID
kjokbaseball
Retrosheet changed their minds and will be using TeamID MIA - I think we should
do the same...

THANKS,
Kevin

--- In baseball-databank@yahoogroups.com, Matthew Gargano <mgargano@...> wrote:
>
> Using standard database protocol, franchise ID should *never* change.
> Regardless of where the team moves.
>
> On Thu, Mar 1, 2012 at 9:53 AM, Sean Lahman <sl@...> wrote:
>
> >
> >
> > My plan is to use "MIA" as the teamID for the Miami Marlins. Don't think
> > BBRef or Retrosheet have weighed in yet.
> >
> > As far as the FranchiseID, my inclination is to leave it as is for now.
> >
> > Regards,
> > Sean
> >
> > ---
> > Sean Lahman
> > http://seanlahman.com
>

#4260 From: "chrislambrou" <chrislambrou@...>
Date: Tue Apr 17, 2012 1:34 pm
Subject: Re: Beta version of database available
chrislambrou
I didn't see a response on this... Will DH no longer be included in the fielding
table?  Thanks.

-Chris

--- In baseball-databank@yahoogroups.com, John Rickert <rickert@...> wrote:
>
> In the Fielding.csv file previous seasons have totals for games played at DH,
> but 2011 does not.
>

#4261 From: "N. S. Miceli, Ph.D." <micelin01@...>
Date: Thu Apr 19, 2012 2:52 pm
Subject: Question re how to treat data, statistically.
nsmiceli
Dear group members,

Please excuse the cross posting. When examining more than one season's
worth of data, do you think that there is a need to examine the data
using time-series methods?

If this is too far off topic for the general group discussion, please
feel free to respond to me directly.

Regards,

Nick Miceli

#4262 From: "David J Wheeler" <dj.wheeler@...>
Date: Tue Apr 24, 2012 4:22 am
Subject: Newbie Help
dj.wheeler
 
Being new to databases, I am wondering if someone might help me with
instructions on how to query career stats from the CSV data provided recently.

I have tried opening the db file itself, but my computer locks up (as if the
file is too large to process). Is there a trick to this I don't know of?

All help is greatly appreciated.
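If you'd rather skip a database entirely, career totals can be computed straight from the CSV files. This is a minimal sketch using only Python's standard library, assuming Batting.csv has a header row with the Lahman column names listed in `STATS`; `career_batting` and `batting_average` are hypothetical helper names. Every row for a player (all seasons and all stints) is summed, and blank cells, such as stats not recorded in some early seasons, are counted as 0, which you may not want for rate stats.

```python
import csv
from collections import defaultdict

# Counting columns to total; names assumed to match the Lahman Batting.csv header.
STATS = ["G", "AB", "R", "H", "2B", "3B", "HR", "RBI", "BB", "SO"]


def career_batting(batting_lines):
    """Sum each player's rows (all seasons, all stints) into career totals."""
    careers = defaultdict(lambda: dict.fromkeys(STATS, 0))
    for row in csv.DictReader(batting_lines):
        totals = careers[row["playerID"]]
        for stat in STATS:
            value = (row.get(stat) or "").strip()
            totals[stat] += int(value) if value else 0  # blank cell counted as 0
    return dict(careers)


def batting_average(totals):
    """Career H / AB, or None when the player has no at-bats."""
    return totals["H"] / totals["AB"] if totals["AB"] else None


# Usage, assuming Batting.csv is in the current directory:
#   with open("Batting.csv", newline="") as f:
#       careers = career_batting(f)
#   print(careers["aaronha01"]["HR"], batting_average(careers["aaronha01"]))
```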

#4263 From: "Tangotiger" <tom@...>
Date: Tue Apr 24, 2012 1:36 pm
Subject: Re: Newbie Help
tom@...
 
You should download the Lahman MS Access database:

http://www.seanlahman.com/baseball-archive/statistics/

It'll make life easier for you.

Tom


---------------------------------------------
The Book--Playing The Percentages In Baseball
http://www.InsideTheBook.com

#4264 From: Matthew Gargano <mgargano@...>
Date: Tue Apr 24, 2012 1:03 pm
Subject: Re: Newbie Help
tkestars
 
The easy way: use Microsoft Access.

If you are using Windows, the cheaper-and-not-as-easy-but-probably-better way is to install XAMPP (or LAMP if you have a spare computer or want to use a virtual machine) and use something like phpMyAdmin, which provides an easy queryable interface. I'd recommend learning PHP; IMO it makes it a helluva lot easier to process the data. With PHP you can also use the baseball tools library I created (shameless plug): https://github.com/matstars/baseball-tools

Net/net, if you don't want to spend a fair amount of time, Access is your best bet.



#4265 From: "FrankPereiro" <franpereiro@...>
Date: Thu May 3, 2012 7:52 pm
Subject: Working with different Databases
franpereiro
 
Hi there,

I would like to merge (unite, join, or whatever it's called) different databases.

We have a baseball site that highlights the stats of Latino players, and we have
found some databases that we'd like to work with. We downloaded the MLB and NPB
databases, and we're also working on Venezuelan, Italian and Dutch baseball
databases.

But the problem is the IDs of players, teams, managers, etc. The question is:
is there a way to create one "main", batting, pitching, etc. table other than
doing it by hand?

I'm sorry if the question is unclear; as you can probably tell, English is not
my first language.

Thanks in advance for any tips.

Greetings,
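A common way to combine sources whose ID schemes differ is a crosswalk: a table mapping each (source, local ID) pair to one unified ID, through which every stat table is then rewritten. The sketch below is a hypothetical illustration, not an existing tool: the source names, the `id`/`name`/`birth` fields and the `p000001`-style unified IDs are all made up, and matching on accent-stripped name plus birth date is only a heuristic. Players without a birth date are never auto-matched, and two different players who share a name and birth date would be wrongly merged, so the result still needs checking by hand.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, drop accents and punctuation: 'José Pérez' -> 'jose perez'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch for ch in no_accents.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def build_crosswalk(sources):
    """Map (source, local_id) -> unified ID.

    `sources` is {source_name: [{"id": ..., "name": ..., "birth": ...}, ...]}.
    Same normalized name + same non-empty birth date => same unified ID;
    everyone else gets a fresh ID.
    """
    crosswalk = {}
    by_key = {}  # (normalized name, birth date) -> unified ID
    counter = 0
    for source, players in sources.items():
        for player in players:
            key = (normalize_name(player["name"]), player["birth"])
            if player["birth"] and key in by_key:
                unified = by_key[key]
            else:
                counter += 1
                unified = f"p{counter:06d}"
                if player["birth"]:
                    by_key[key] = unified
            crosswalk[(source, player["id"])] = unified
    return crosswalk


def remap(rows, source, crosswalk, id_field="playerID"):
    """Copy stat rows from one source, swapping local IDs for unified ones."""
    return [{**row, id_field: crosswalk[(source, row[id_field])]} for row in rows]
```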

#4266 From: "Clem Comly" <ccomly@...>
Date: Thu May 17, 2012 5:31 am
Subject: player IDs in Master but not in Appearances
ccomly2003
 
The latest consistency check finds 5 where 2 are expected. Two of the 5 appear to be cases where the same player is accidentally in Master twice. It looks to me that obrieje01 was merged into obriepe01 on the Retrosheet site. Similarly, whitecb01 was merged into whitebi02. For each mergee there are 1 or 2 Fielding rows and 1 Batting row. I suspect smith02, who debuted 5/31/1886, was merged into another player, but I am not sure who.
 
Clem Comly
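The Master-versus-Appearances check is a set difference on playerID, and the same idea shows how many rows in another table (for example the 1 or 2 Fielding rows per mergee) still point at an orphaned ID. A minimal sketch, assuming Lahman-style CSV files with a header row and a `playerID` column; the helper names are hypothetical, and this is not the posted query 5 itself.

```python
import csv
from collections import Counter


def player_ids(lines, column="playerID"):
    """Set of IDs appearing in one CSV table (open file or iterable of lines)."""
    return {row[column] for row in csv.DictReader(lines)}


def master_not_in_appearances(master_lines, appearances_lines):
    """playerIDs listed in Master that never appear in Appearances."""
    return sorted(player_ids(master_lines) - player_ids(appearances_lines))


def rows_referencing(lines, ids, column="playerID"):
    """How many rows of another table (e.g. Fielding) use each of `ids`."""
    wanted = set(ids)
    return Counter(row[column] for row in csv.DictReader(lines) if row[column] in wanted)


# Usage, assuming the CSV files are in the current directory:
#   with open("Master.csv", newline="") as m, open("Appearances.csv", newline="") as a:
#       orphans = master_not_in_appearances(m, a)
#   with open("Fielding.csv", newline="") as f:
#       print(rows_referencing(f, orphans))
```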

#4267 From: KJOK <kjokbaseball@...>
Date: Fri May 18, 2012 5:00 am
Subject: Re: player IDs in Master but not in Appearances
kjokbaseball
 
I think smith02 is now smithre01 - Rex Smith?
 
THANKS,
Kevin



#4268 From: "Clem Comly" <ccomly@...>
Date: Fri May 18, 2012 7:48 am
Subject: Re: player IDs in Batting but not in Appearances
ccomly2003
 
I downloaded the Access version of the DB early this week and hope it is in sync with the other versions.

Alberto reported [query 09]: there are 78 rows (season 2011) that are in the Batting table but not in the Appearances table.

I also found 78 rows for 2011 alone.
The 78 rows appear to be players who played for 2 or more teams in 2011. J.C. Romero has no PHI row for 2011 in Appearances, but his COL row in Appearances has his combined totals for both teams. Looking quickly, it appears that most or all traded players are missing an Appearances row.

BTW, all of Youkilis’ Appearances rows except 2011 have his old player ID. The only 20th-century problem is snydeal01 in Appearances, which s/b snydeja01.
There are 10 problem rows in the 1880s.
 
Clem Comly
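Because a player can have several Batting rows in one season, this check joins on the composite key (yearID, teamID, playerID) rather than on playerID alone. The sketch also flags whether each missing row belongs to a player with Batting rows for more than one team that year, which is a quick way to test the traded-player theory. It assumes Lahman-style column names and header rows in both files; `batting_not_in_appearances` is a hypothetical helper, not Alberto's query 09.

```python
import csv
from collections import defaultdict

KEY = ("yearID", "teamID", "playerID")


def batting_not_in_appearances(batting_lines, appearances_lines, year=None):
    """Batting (yearID, teamID, playerID) keys with no matching Appearances row.

    Returns sorted (key, multi_team) pairs; multi_team is True when the player
    has Batting rows for more than one team that season.
    """
    batting = list(csv.DictReader(batting_lines))
    have = {tuple(r[k] for k in KEY) for r in csv.DictReader(appearances_lines)}

    teams = defaultdict(set)  # (yearID, playerID) -> teams seen in Batting
    for r in batting:
        teams[(r["yearID"], r["playerID"])].add(r["teamID"])

    report = set()  # a set, since several stints with one team share a key
    for r in batting:
        key = tuple(r[k] for k in KEY)
        if key in have or (year is not None and r["yearID"] != str(year)):
            continue
        report.add((key, len(teams[(r["yearID"], r["playerID"])]) > 1))
    return sorted(report)


# Usage, assuming the CSV files are in the current directory:
#   with open("Batting.csv", newline="") as b, open("Appearances.csv", newline="") as a:
#       missing = batting_not_in_appearances(b, a, year=2011)
#   print(len(missing), sum(multi for _, multi in missing))
```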

#4269 From: "Clem Comly" <ccomly@...>
Date: Fri May 18, 2012 6:02 am
Subject: Re: player IDs in Master but not in Appearances
ccomly2003
 
Yes, Rex Smith. Thanks, Kevin. But all existing smithre01 records (Fielding, Pitching, Batting) will need stint = 2 before smith02 is changed to smithre01.

These changes will reduce the count for query 5 from 5 to 2. The 2 are Kiger (whose only ML experience was in the postseason) and Hemond (a Branch Rickey Award winner, and so in AwardsPlayers).
 
Clem Comly
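The underlying rule is that stints must stay unique per player and season once two IDs collapse into one. A minimal sketch of that merge, assuming (as with smith02 and smithre01) that the old ID's stints come first in any season both IDs share: the target ID's stints in those seasons are shifted up, the old rows are renamed, and the result is checked for duplicate keys. `merge_player_id` is a hypothetical helper; for Fielding, pass `key_fields` including POS, since that table has one row per position.

```python
def merge_player_id(rows, old_id, new_id, key_fields=("playerID", "yearID", "stint")):
    """Rename old_id to new_id, renumbering new_id's stints in shared seasons.

    `rows` are dicts (e.g. from csv.DictReader); the input is not modified.
    Raises ValueError if the merged rows contain a duplicate key.
    """
    # Highest stint number the old ID uses in each season.
    old_max = {}
    for row in rows:
        if row["playerID"] == old_id:
            year = row["yearID"]
            old_max[year] = max(old_max.get(year, 0), int(row["stint"]))

    merged = []
    for row in rows:
        row = dict(row, stint=int(row["stint"]))  # copy, with stint as an int
        if row["playerID"] == new_id and row["yearID"] in old_max:
            row["stint"] += old_max[row["yearID"]]  # push after the old ID's stints
        elif row["playerID"] == old_id:
            row["playerID"] = new_id
        merged.append(row)

    seen = set()
    for row in merged:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            raise ValueError(f"duplicate key after merge: {key}")
        seen.add(key)
    return merged
```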
 

#4270 From: "Clem Comly" <ccomly@...>
Date: Sat May 19, 2012 9:40 pm
Subject: Can someone run consistency checks against 2012 version of DB?
ccomly2003
 
I downloaded the Access version, and it doesn’t come with the consistency queries; each of the posted consistency queries needs manual updating to run in Access.

For instance, I found an Appearances row with a bad playerID: for NYA in 2011, playerID “burneaj” s/b “burneaj01”. I found no 2011 Fielding row for either player ID.
 
Clem Comly
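Until the posted queries are ported to Access, a rough format check can catch typos like “burneaj” without any database. The sketch assumes, based only on the IDs quoted in this thread, that a well-formed playerID is lowercase letters (a few punctuation characters allowed) followed by exactly two digits; real IDs that break that pattern would show up as false positives, so treat the output as a list to eyeball. `malformed_player_ids` and the `PLAYER_ID` pattern are hypothetical.

```python
import csv
import re

# Assumed shape of a playerID: lowercase letters (plus . ' -), then two digits.
PLAYER_ID = re.compile(r"[a-z.'\-]+\d{2}")


def malformed_player_ids(lines, field="playerID"):
    """(yearID, teamID, playerID) for rows whose ID doesn't fit PLAYER_ID."""
    flagged = set()
    for row in csv.DictReader(lines):
        if not PLAYER_ID.fullmatch(row[field]):
            flagged.add((row.get("yearID", ""), row.get("teamID", ""), row[field]))
    return sorted(flagged)


# Usage, assuming Appearances.csv is in the current directory:
#   with open("Appearances.csv", newline="") as f:
#       for year, team, pid in malformed_player_ids(f):
#           print(year, team, pid)
```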

Copyright © 2010 Yahoo! Inc. All rights reserved.