I'm working on baseball right now, starting off with a model of wins and
losses, and I have found another problem with modeling wins and losses
directly.
Teams that haven't won a game yet, or haven't lost one yet, are difficult to
rate, especially with logistic regression, because a perfect record pushes the
maximum-likelihood estimate off toward infinity.
Colley's Matrix has a nice little adjustment, courtesy of Laplace, that
alleviates that problem, but the model itself can't handle home-field
advantage or change over time.
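
Here is a rough sketch of the Colley system as it is usually described: solve
C r = b, where C[i][i] = 2 + games played by team i, C[i][j] = -(games between
i and j), and b[i] = 1 + (wins - losses) / 2. The 2 and the 1 are the Laplace
adjustment. The teams A, B, C and their results are made up for illustration,
and the little solver is only there to keep the example self-contained.

```python
from fractions import Fraction


def colley_ratings(teams, games):
    """Colley ratings from a list of (winner, loser) pairs."""
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    # Laplace prior: every team starts at 2 on the diagonal and 1 on the
    # right-hand side, i.e. a rating of 1/2 before any games are played.
    C = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(1) for _ in range(n)]
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w][w] += 1
        C[l][l] += 1
        C[w][l] -= 1
        C[l][w] -= 1
        b[w] += Fraction(1, 2)
        b[l] -= Fraction(1, 2)
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if C[r][col] != 0)
        C[col], C[pivot] = C[pivot], C[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and C[r][col] != 0:
                factor = C[r][col] / C[col][col]
                C[r] = [x - factor * y for x, y in zip(C[r], C[col])]
                b[r] -= factor * b[col]
    return {team: float(b[idx[team]] / C[idx[team]][idx[team]]) for team in teams}


# Hypothetical round robin: A is undefeated, C is winless.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
for team, rating in ratings.items():
    print(team, round(rating, 3))
```

The undefeated and winless teams both get finite ratings, which is the point
of the adjustment. Note that nothing in the system knows where or when a game
was played.
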
In baseball you get this problem with starting pitchers. Albie Lopez started
four games for the Braves, and the Braves lost all four. I tried merging his
games in with the two games started by AAA call-ups, but the Braves lost those
two games as well.
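
To see why an 0-4 or 0-6 record is a problem, here is a minimal sketch using
an intercept-only logistic model for a single pitcher, P(win) = 1 / (1 +
exp(-beta)). That is a big simplification of a real ratings model, and the
function names are mine. The log-likelihood keeps improving as beta gets more
negative, so there is no finite best estimate. The Laplace-style rate
(wins + 1) / (games + 2) is shown next to it for comparison.

```python
import math


def log_likelihood(beta, wins, losses):
    """Log-likelihood of a win/loss record when P(win) = 1 / (1 + exp(-beta))."""
    log_p_win = -math.log1p(math.exp(-beta))
    log_p_loss = -math.log1p(math.exp(beta))
    return wins * log_p_win + losses * log_p_loss


def laplace_rate(wins, losses):
    """Laplace's rule of succession: add one win and one loss to the record."""
    return (wins + 1) / (wins + losses + 2)


# Merged record from the post: Lopez's four starts plus two call-up starts.
betas = [-1.0, -2.0, -4.0, -8.0]
merged = [log_likelihood(beta, 0, 6) for beta in betas]
for beta, ll in zip(betas, merged):
    print(f"beta={beta:5.1f}  log-likelihood={ll:.5f}")

print("Laplace rate, 0-4:", laplace_rate(0, 4))
print("Laplace rate, 0-6:", laplace_rate(0, 6))
```

Merging the records does not help the logistic fit, since 0-6 is still a
perfect record, whereas the adjusted rate stays finite and just shrinks a
little as the losses pile up.
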
In baseball it seems to help if you take the square root of the runs scored by
each team and then fit the regression for runs scored on the transformed
values. It reduces heteroscedasticity and keeps runaway scores from skewing
the estimates.
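
A quick way to see the variance-stabilizing effect is a sketch under the
assumption that runs behave like a Poisson count. Real run totals are not
exactly Poisson, and the two scoring rates below are made up. The code
computes the variance of the raw count and of its square root directly from
the Poisson probabilities.

```python
import math


def poisson_variances(mean_runs, k_max=200):
    """Return (Var[X], Var[sqrt(X)]) for X ~ Poisson(mean_runs)."""
    p = math.exp(-mean_runs)          # P(X = 0)
    e_x = e_x2 = e_sqrt = 0.0
    for k in range(k_max + 1):
        if k > 0:
            p *= mean_runs / k        # P(X = k) from P(X = k - 1)
        e_x += p * k
        e_x2 += p * k * k
        e_sqrt += p * math.sqrt(k)
    var_raw = e_x2 - e_x ** 2
    var_sqrt = e_x - e_sqrt ** 2      # E[sqrt(X)^2] is just E[X]
    return var_raw, var_sqrt


# Hypothetical weak and strong offenses, in runs per game.
low_raw, low_sqrt = poisson_variances(3.0)
high_raw, high_sqrt = poisson_variances(6.0)
print("raw variances: ", round(low_raw, 3), round(high_raw, 3))
print("sqrt variances:", round(low_sqrt, 3), round(high_sqrt, 3))
```

Under that assumption the raw variance grows with the scoring rate, while the
variance of the square roots stays much closer to constant, which is what the
regression wants.
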
The thing I like about modeling wins directly is that you get an estimate of
the probability of one team beating another. Modeling margin of victory, you
get a predicted winner, but usually you have to do some simulation to get an
estimate of the probability of winning.
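
One way that simulation step could look, as a sketch only: assume the margin
model gives an expected margin and a residual standard deviation, and assume
the residuals are roughly normal. Both numbers below are hypothetical, and the
normal assumption is mine. Draw a lot of margins and count how often the
favored team comes out ahead.

```python
import math
import random


def simulated_win_probability(expected_margin, residual_sd, n_sims=100_000, seed=0):
    """Share of simulated games in which the margin comes out positive."""
    rng = random.Random(seed)
    wins = sum(
        1 for _ in range(n_sims) if rng.gauss(expected_margin, residual_sd) > 0
    )
    return wins / n_sims


def normal_win_probability(expected_margin, residual_sd):
    """Closed form for the same quantity when residuals are exactly normal."""
    return 0.5 * (1.0 + math.erf(expected_margin / (residual_sd * math.sqrt(2.0))))


# Hypothetical matchup: favored by half a run, residual SD of four runs.
p_sim = simulated_win_probability(0.5, 4.0)
p_exact = normal_win_probability(0.5, 4.0)
print("simulated:", round(p_sim, 3), " closed form:", round(p_exact, 3))
```

With normal residuals the closed form makes the simulation unnecessary. The
simulation earns its keep when the residuals are skewed or the margin has been
transformed, for example with the square-root trick.
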
Alan Jordan
On Sun, 4 Aug 2002 04:49:04 -0400 Brad Kiser <rkiser@...> wrote:
> I do use Margin of Victory. I am of the opinion that ignoring MOV is ignoring
> a very informative chunk of data, which is a statistical no-no. My MOV is
> capped, so SOS is more important to a team's rating, but it can make the
> difference between two similar teams.
>
> Who here does/doesn't use MOV?
>
> - Brad
>