I am about to engage in an analysis of baseball stats. The database
that I am using is Lahman-52. We will be analyzing 1921 to 2004.
Is this a decent choice for the data?
In that DB there are columns for singles, doubles, triples, etc. as well
as at-bats.
The sum of such columns (including various ways to be out) is not the
same as the number of at-bats. However the numbers do not include
decimal points--so I assume that they represent counts.
Should I divide those columns by the at-bats to generate a string of
numbers that will total to the batting average? That is, a set of
columns that will sum to unity and a subset (hits, etc.) that will sum
to the batting average?
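A minimal sketch of the division described above, in Python. It assumes the column names H, 2B, 3B, HR and AB from the Lahman Batting table, and it assumes singles have to be derived as H - 2B - 3B - HR. The function name and the sample counts are made up for illustration. Note that at-bats exclude walks, hit-by-pitch and sacrifices, so these shares describe at-bats only.

```python
def hit_shares(row):
    """Return per-at-bat shares for one batting record (a dict of counts)."""
    ab = row["AB"]
    if ab == 0:
        return None  # no at-bats, so the shares are undefined

    # Assumption: singles are hits that were not doubles, triples or home runs.
    singles = row["H"] - row["2B"] - row["3B"] - row["HR"]
    shares = {
        "1B": singles / ab,
        "2B": row["2B"] / ab,
        "3B": row["3B"] / ab,
        "HR": row["HR"] / ab,
    }
    # Every at-bat that did not end in a hit is lumped into one share here.
    shares["OUT"] = (ab - row["H"]) / ab
    return shares


# Made-up counts, not a real player.
example = {"AB": 500, "H": 150, "2B": 30, "3B": 5, "HR": 20}
shares = hit_shares(example)

# The hit shares sum to the batting average; all shares together sum to one.
batting_average = sum(shares[k] for k in ("1B", "2B", "3B", "HR"))
total = sum(shares.values())
```

If the different ways of making an out are wanted as separate columns, the single OUT share would be split further (strikeouts, for instance), with the remainder kept as "other outs" so the row still sums to unity.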
We are planning to eliminate all records where the number of games is
less than 20.
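One way the 20-game cutoff might be applied, sketched in Python. The column names playerID, yearID, stint and G are assumed to match the Lahman Batting table, where a player who changed teams in mid-season has more than one row per year. Whether the threshold should apply to each row or to the season total is a choice the plan above does not settle; this sketch assumes the season total. The function names and the rows are hypothetical.

```python
from collections import defaultdict

MIN_GAMES = 20  # threshold taken from the plan above


def season_games(rows):
    """Sum games over all rows sharing a (playerID, yearID) pair."""
    games = defaultdict(int)
    for row in rows:
        games[(row["playerID"], row["yearID"])] += row["G"]
    return games


def filter_rows(rows, min_games=MIN_GAMES):
    """Keep rows whose player-season total reaches min_games."""
    games = season_games(rows)
    return [r for r in rows if games[(r["playerID"], r["yearID"])] >= min_games]


# Made-up rows, not real players.
rows = [
    {"playerID": "aaa01", "yearID": 1950, "stint": 1, "G": 12},
    {"playerID": "aaa01", "yearID": 1950, "stint": 2, "G": 15},
    {"playerID": "bbb01", "yearID": 1950, "stint": 1, "G": 10},
]
kept = filter_rows(rows)
```

Here the first player's two rows are kept because they total 27 games, even though neither row reaches 20 on its own. A per-row filter would drop all three.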
Depending on the complexity of the output, we may eliminate pitchers
from the input so as not to waste a degree of freedom.
Lastly we would like to have the names of potential collaborators in
interpreting and writing up this data.
Our analytical procedure is something called "Polytopic Vector Analysis"
(PVA) and we have already done a few trial runs with interesting results.
In response to both responses to my response... I completely agree that Access is blown out of the water - as is Paradox, FileMaker, and every other consumer...
... Much of what you are asking for here is available in the book "Baseball Hacks" by Joseph Adler, which has already been mentioned. I don't think it ...
... I am presently writing an article that shows how to get the BDB information into PostgreSQL and load the Retrosheet data into MySQL and PostgreSQL. This...
... I think another option is SQLite[1], which is very fast and perfect for small queries. I did the conversion a couple months back, and have an SQLite...
Someone else mentioned using Access as a front-end. This is absolutely true, and yet another great usability feature of Access. It's a snap to make an...
... If you write 100 queries (and save them), how do you keep them straight? ... Is this like "working" with Oracle all the time but "using" a mailreader most...
... Ahhh... Access has a "properties" button, which you can put comments and documentation for each view, which you can then see. That column is also...
... I considered that once long ago; maybe I should consider again. Now I have Query names as two "dimensions" of organization because they are both...
... Just to give you some ideas. When I was working on my project, I tackled many topics and subtopics. So, when I worked on Relievers and Leverage, I have a...
What is it that you are trying to accomplish? What is the question that you are seeking an answer for? Tom ... THE BOOK -- Playing The Percentages In Baseball...
Question: Can the data vectors for each player be used to create a classification of ball players? First the number of player-types has to be determined from...
If by classification you mean a profile or style of player, that's a good project. I look forward to seeing what you have. If you want some ideas as to how...
... Of course. ... As Tom Tango suggested, that reveals mixed motives. I believe that it must get in the way of classifying players --which you might do by ...
... I guess that you are inclined to denominate by at bats, ignoring bases on balls and hits by pitch, sacrifice bunts and flies, because popular baseball uses...
Paul: Thanks for the advice--I will include the other stats. The hitting example was a feeble attempt to describe what I am doing. The procedure is robust in...
Kristin Campbell <kcampbell53@...> is a colleague, I suppose? ... The particular article by Jim Albert, advocating four rates understood sequentially,...
... Voros first described this process when he developed DIPS. He would break up the stat line into binary components: HBP, no HBP. Of the no HBP, walk or no ...
... Based on what I know of the project, I might have guessed 1939 because in the Batting table I find 26106 records with null GIDP (ground into double play)...
How was Oracle making it complicated to move data from one Oracle database to another? ... From: Tangotiger To: baseball-databank@yahoogroups.com Sent:...