sportscience · The Science of Sport and Exercise
More on magnitude of correlation when controlling for something
Message #2647 of 3439

A couple of weeks ago I sent a summary to the list about how to express
magnitudes of effects when you control for something. See
http://sports.groups.yahoo.com/group/sportscience/message/2625 . I have
since been exchanging messages with Ian Shrier and others, and thinking
about the issues in connection with other related analyses. As far as I
can see, I got it right in that summary, but I did not provide practical
advice on how to calculate the effects and their confidence limits. Hence
this message. I have some suggestions, and I would appreciate feedback
from the handful of people who understand or care about this stuff.

Recall that, according to me, you can't use the regression coefficients to
express magnitude, because the coefficients aren't controlled properly for
collinearity. For example, in the extreme case, when one predictor X1 is
effectively the same as another predictor X2, the coefficients for X1 and
X2 are identical and equal to half what either would be on its
own. Therefore the analysis implies that X1 could still be predicting
something in the presence of X2, and vice versa. With a large amount of
collinearity the confidence intervals for the coefficients would be wide,
but the argument still applies. In short, collinearity produces bias in
the regression coefficients.
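
A small numerical illustration of the extreme case described above, as a sketch only. The data are made up, and the way the tie between identical predictors is broken is an assumption: NumPy's lstsq returns the minimum-norm least-squares solution when the design matrix is rank-deficient, which splits the slope equally. Many stats packages would instead drop one predictor or report an error.

```python
import numpy as np

# Hypothetical data: y depends on a single predictor x1; x2 is an exact copy.
x1 = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x1 + 0.1 * np.sin(7.0 * x1)
x2 = x1.copy()

def fit(*predictors):
    """Least-squares slopes (intercept fitted but not returned)."""
    design = np.column_stack([np.ones_like(y), *predictors])
    # For a rank-deficient design, lstsq returns the minimum-norm solution.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

b_alone = fit(x1)[0]      # slope with X1 on its own
b1, b2 = fit(x1, x2)      # slopes with X1 and its copy X2 together

print("X1 alone:", b_alone)
print("X1 with X2:", b1, b2)
```

Under this convention each coefficient comes out at half the single-predictor slope, so neither looks redundant even though X2 adds nothing.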

Partial or semi-partial correlations, according to me, give a more
realistic picture. Think about the correlations in terms of variance
explained. If you have only X1 in the model, you explain a certain amount
of variance in the dependent variable. If you include X2, and X2 is the
same as X1, you explain no extra variance. Therefore the partial
correlation for X2 is zero. That's the right kind of answer. In the
analysis, you would find that the residual or unexplained variation or
standard error of the estimate or root mean square error (all mean the same
thing) did not get any smaller as a result of including X2. That's right,
too. In fact, we can use the change in the error term to calculate the
partial correlations. Here's how I think it should be done.

SAS, and I presume most or all other packages, calculates the partial
correlations from sums of squares for the dependent variable. As such,
these correlations are not adjusted for degrees of freedom. They need to
be. When you do a multiple linear regression with a small sample size or
with a relatively large number of predictors, the correlations are biased
high. Adjustment for degrees of freedom removes the bias. The degrees of
freedom is the sample size minus the number of parameters in the model (the
predictors plus one for the intercept). I've done a rough calculation to
show that the sample size needs to be more than about 100 times the number
of predictors (not counting the intercept) for the bias to be
trivial. (See below. I presume there is a vast but forgotten
literature on this topic.) So for most of the studies we do, adjustment of
the correlation is important.

The adjusted multiple correlation is the square root of the fraction of
variance explained: sqrt((SD^2-SEE^2)/SD^2), where SD is the standard
deviation of the dependent variable and SEE is the standard error of the
estimate for the model with all predictors. Now I will assume, possibly
wrongly, that this formula can be generalized to semi-partial and partial
correlations, as follows:
semi-partial R for X = sqrt((SEEX^2-SEE^2)/SD^2),
where SEEX is the SEE with all predictors except X in the model;
and partial R for X = sqrt((SEEX^2-SEE^2)/SEEX^2).
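
A minimal sketch of the three formulas above in Python with NumPy, assuming ordinary least squares with an intercept. The function names (see, adjusted_correlations) are hypothetical. Negative variance differences are clamped to zero, following the suggestion later in this message for cases where sampling variation gives the square root of a negative number.

```python
import numpy as np

def see(y, X):
    """Standard error of the estimate: sqrt(SSE / (n - p)).

    X is an (n, k) array of predictors; p = k + 1 counts the intercept.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(np.sqrt(resid @ resid / (n - design.shape[1])))

def adjusted_correlations(y, X, j):
    """Adjusted multiple, semi-partial and partial R for predictor column j."""
    sd2 = float(np.var(y, ddof=1))                    # SD^2 of the dependent variable
    see2 = see(y, X) ** 2                             # SEE^2, all predictors
    seex2 = see(y, np.delete(X, j, axis=1)) ** 2      # SEEX^2, all predictors except X
    gain = max(seex2 - see2, 0.0)                     # clamp: assumption for negative values
    return {
        "multiple_R": float(np.sqrt(max(sd2 - see2, 0.0) / sd2)),
        "semi_partial_R": float(np.sqrt(gain / sd2)),
        "partial_R": float(np.sqrt(gain / seex2)),
    }

# Hypothetical data: x2 is an exact copy of x1, so it should explain nothing extra.
x1 = np.linspace(0.0, 1.0, 30)
y = 2.0 * x1 + np.sin(9.0 * x1)
single = adjusted_correlations(y, x1.reshape(-1, 1), 0)
duplicate = adjusted_correlations(y, np.column_stack([x1, x1]), 1)
print(single)
print(duplicate)
```

With a single predictor, SEEX is just the SD of the dependent variable, so all three values coincide. With the duplicated predictor, the partial and semi-partial correlations for the copy come out as zero.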

The confidence limits for the above would come from the p value provided by
the stats program for X in the multiple linear regression. Assume the
Fisher z transform of the correlation is normally distributed, put the
transformed value into my spreadsheet for confidence limits (in the first
bit, for normally distributed effects), insert the p value, insert 1000 or
some such very large number as the degrees of freedom, then back transform
the confidence limits.
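
The steps above can be sketched in Python without the spreadsheet. This is one reading of the procedure, not a copy of the spreadsheet's formulas: it assumes a two-sided p value and uses the normal distribution in place of a t distribution with very large degrees of freedom. The function name correlation_limits is hypothetical.

```python
from math import atanh, tanh
from statistics import NormalDist

def correlation_limits(r, p_value, level=0.95):
    """Confidence limits for a correlation from its value and a two-sided p value.

    Requires 0 < p_value < 1 and r != 0; otherwise the implied standard
    error is undefined or zero.
    """
    z = atanh(r)                                          # Fisher z transform
    z_stat = NormalDist().inv_cdf(1 - p_value / 2)        # |test statistic| implied by p
    se = abs(z) / z_stat                                  # implied standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)    # e.g. about 1.96 for 95%
    return tanh(z - z_crit * se), tanh(z + z_crit * se)   # back-transform

lower, upper = correlation_limits(0.3, 0.05)
print(lower, upper)
```

As a sanity check, a p value of exactly 0.05 puts the lower 95% limit at zero, as it should.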

A problem can arise when a partial correlation and sample size are
small. Because of sampling variation, you will sometimes get the square
root of a negative number. I presume you then set the partial correlation
to zero and get its upper and lower limits as above. That won't be perfect,
but it will be near enough, and I can't see what else you could do.

Some will argue that correlations don't give a good idea of magnitude,
because they depend on the between-subject SD. The trouble is, the generic
measure of magnitude requires the between-subject SD. Furthermore, regression
works only when there is a substantial SD, so you can't get away from the
SD. Also, for weak relationships, the regression coefficient depends on
the SD, so you still can't get away from the SD. But collinearity-related
bias is the real kiss of death for me. Maybe someone knows of a formula
that uses the variance-inflation factor (a measure of collinearity) to
adjust bias out of the regression coefficient, and its confidence limits.


Here's the working for the estimate of sample size needed to avoid
adjustment for degrees of freedom. The formula for the adjusted R^2 from
the SAS manual is:
adjR^2 = 1 - (1-R^2)(n-1)/(n-p),
where R is the unadjusted R (the Pearson R for simple linear regression), n
is the sample size, and p is the number of parameters in the model,
including the intercept. (Believe it or not, this formula boils down to
the variance-explained version above.) Worst-case scenario is where
effects are smallest. In this case, the adjusted R^2 is zero. Rearranging
the above, unadjusted R^2 = (p-1)/(n-1). For the adjustment to be trivial,
the unadjusted R needs to be <0.1 (Cohen's smallest correlation), so
unadjusted R^2 needs to be <0.01. It follows that n > 100(p-1)+1, as
stated above.
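
The working above can be checked numerically with a short Python sketch. The function names are hypothetical; the formulas are the ones given in the working, with p counting the intercept.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)

def null_unadjusted_r2(n, p):
    """Unadjusted R^2 at which the adjusted R^2 is zero: (p - 1)/(n - 1)."""
    return (p - 1) / (n - 1)

def sample_size_bound(p, smallest_r=0.1):
    """n must exceed this for the null unadjusted R to stay below smallest_r."""
    return (p - 1) / smallest_r ** 2 + 1

# Two predictors plus the intercept, so p = 3.
print(sample_size_bound(3))                            # approximately 201
print(adjusted_r2(null_unadjusted_r2(50, 4), 50, 4))   # approximately 0
```

With two predictors and an intercept, the sample size has to exceed about 201 before the adjustment can be ignored.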

Will







Mon Apr 4, 2005 9:23 am

willhopkinsnz
Will Hopkins