sportscience · The Science of Sport and Exercise
Comparing reliability of two measures
Message #3406 of 3439

Someone recently sent me a question about reliability.  She said that it would be OK to send her question and my reply to this list. (Arenda, I have added a bit more to the reply I sent you.) The question is about comparing the reliability of two measures.  See below for the original question, then my answer.  Please feel free to post a different point of view.  I should send such messages to the list more often… 

I am playing with the format of my email messages to try to eliminate the double-line spacing that somehow gets inserted sometimes.  If this message is hard to read on your mailer, please let me know.

Will

    From: A.vanBeek@... [mailto:A.vanBeek@...]
    Sent: Sunday, 12 July 2009 1:02 a.m.
    To: will@...
    Subject: realibility measurements

    Dear Professor Hopkins,

    …I'm a PhD student interested in the regulation of cerebral blood flow in patients with Alzheimer's disease. For my measurements, I've first performed a study on reproducibility of my measurement technique in old subjects.  Therefore, I've used the information on your website, your article in SportScience, and your excel sheet.

    For my measurements, I've two methods of which I tested reproducibility in a small group, using the typical error expressed as coefficient of variation and the intraclass correlation.

    I'd like to ask you the following question: how can I test if the difference between the coefficient of variations (and the intraclass correlations) of the two methods is significant?

    I'm looking forward to your reaction.

    Yours sincerely,

    Arenda van Beek
    PhD student
    Department of Geriatric Medicine
    Radboud University Nijmegen Medical Center

Arenda, hi.  How to compare the reliability of two measures is a good basic question.  I have addressed this question in various places for the situation when the measures come from different subjects, but I may have left people without guidance for when they come from the same subjects.

First, let’s get validity vs reliability out of the way.  When you have two measures and one of them is a criterion, their appropriate comparison is a validity study, in which you regress the criterion (on the Y axis) against the other one (on the X).  A major problem here is what to do about random error in the criterion, and it’s a problem I have solved only recently.  I presented the solution at a conference last month, and I have the spreadsheets for dealing with it.  As yet they are unpublished, but I am happy to send them.

But let’s assume that both measures are measuring what you are interested in and that the only difference between the measures is the magnitude of the random error.  They will therefore both give the right answer on average across the range of values (that is, neither has proportional bias), so a simple comparison of their errors of measurement and/or intraclass correlation coefficients derived from a reliability study is appropriate.  Fine, but how do you do it?

If you have taken the measurements of each measure with different subjects (one group of subjects for Measure A, another group for Measure B), then the estimates of error are independent, so you can use the section of my confidence-limits spreadsheet (Confidence limits & clinical chances) for comparing two standard deviations.  You can compare the intraclass correlation coefficients with my spreadsheet Combine/compare effects.

If you have taken the measurements on the same subjects in separate trials (e.g., two trials for Measure A then two for Measure B, or maybe A, B, A, B), then you can be reasonably but not overly confident that the change scores for A are uncorrelated with the change scores for B, so it’s probably still OK to use those spreadsheets.  But if the measurements for both measures were taken in the same two trials, you have a problem, because change scores for A will be correlated with change scores for B, so you CAN’T use my spreadsheets.  There are two solutions: mixed modeling and bootstrapping.

With mixed modeling you set up a simple repeated-measures model in which you allow for extra error variance with one of the measures, then get confidence limits for the extra variance.  This approach has three drawbacks: (1) you can derive confidence limits only for the extra variance and not for any other way of comparing the reliability (the other two are ratio of errors and difference in intraclass correlation coefficients), (2) you need access to a package that does mixed modeling, and (3) the confidence limits for the extra variance are only approximate when the sample is small.  I’m not sure what “small” is here, and it depends on the magnitude of the extra variance, but in my experience a sample size of 30 is usually adequate.

With bootstrapping you can use any statistic you like to compare the measures (difference in errors, ratio of errors, difference in correlation coefficients).  You derive the statistic from each of several thousand bootstrapped samples, which are samples of subjects drawn at random with replacement from your original sample and with the same number of subjects as in your original sample.  The confidence limits come directly from the percentiles of the statistic across the bootstrapped samples (e.g., 90% confidence limits are the 5th and 95th percentiles). This method works no matter what’s going on in your data, and it has only two drawbacks: (1) you have to generate and process the bootstrapped samples somehow, and (2) as with the mixed-modeling approach, the confidence limits are approximate for small sample sizes.  “Small” here is anything less than ~30.
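
For readers who want to try the bootstrap without a spreadsheet, here is a minimal sketch in Python (standard library only). It is an illustration under stated assumptions, not the method used in any of the spreadsheets mentioned in this message. It assumes two trials per measure on the same subjects, computes the typical error as the standard deviation of the change scores divided by the square root of 2, and uses the ratio of typical errors (A/B) as the comparison statistic. The function names, the default of 5000 resamples, and the simulated data are all hypothetical choices made for the example.

```python
import math
import random
import statistics

def typical_error(trial1, trial2):
    """Typical error: SD of the change scores divided by sqrt(2)."""
    changes = [y - x for x, y in zip(trial1, trial2)]
    return statistics.stdev(changes) / math.sqrt(2)

def bootstrap_error_ratio(a1, a2, b1, b2, n_boot=5000, level=0.90, seed=1):
    """Percentile confidence limits for (typical error of A) / (typical error of B).

    a1, a2 are trials 1 and 2 for Measure A; b1, b2 are the same for Measure B.
    All four lists must hold the same subjects in the same order.
    """
    rng = random.Random(seed)
    n = len(a1)
    ratios = []
    for _ in range(n_boot):
        # Resample subjects with replacement; each subject's four values stay
        # together, which preserves any correlation between A and B changes.
        idx = [rng.randrange(n) for _ in range(n)]
        te_a = typical_error([a1[i] for i in idx], [a2[i] for i in idx])
        te_b = typical_error([b1[i] for i in idx], [b2[i] for i in idx])
        if te_b > 0:  # skip degenerate resamples with no variation in B
            ratios.append(te_a / te_b)
    ratios.sort()
    tail = (1 - level) / 2
    lower = ratios[int(tail * len(ratios))]
    upper = ratios[min(int((1 - tail) * len(ratios)), len(ratios) - 1)]
    observed = typical_error(a1, a2) / typical_error(b1, b2)
    return observed, lower, upper

# Simulated data for illustration only: 30 subjects, with Measure A given
# twice the random error of Measure B. Real data would replace these lists.
sim = random.Random(42)
true_values = [sim.gauss(50, 10) for _ in range(30)]
a1 = [t + sim.gauss(0, 2) for t in true_values]
a2 = [t + sim.gauss(0, 2) for t in true_values]
b1 = [t + sim.gauss(0, 1) for t in true_values]
b2 = [t + sim.gauss(0, 1) for t in true_values]

observed, lower, upper = bootstrap_error_ratio(a1, a2, b1, b2)
print(f"Ratio of typical errors (A/B): {observed:.2f}; "
      f"90% confidence limits {lower:.2f} to {upper:.2f}")
```

The key design choice is that the resampling is done on subjects, not on individual measurements, so the correlation between the change scores for A and B carries through into every bootstrapped sample. Swapping in a different statistic (difference in errors, difference in intraclass correlations) only requires changing the line that computes the ratio.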

Finally, please don’t “test if the difference… is significant”.  Instead, decide how big or small the difference could be.

OK if I send this to the Sportscience mailing list? Let me know if you want me to anonymize it.

Will

Will G Hopkins, PhD FACSM
Contact info:
http://sportsci.org/will
Sportscience:
http://sportsci.org
Statistics:
http://newstats.org
Be creative: break rules



Tue Jul 14, 2009 7:09 pm

willhopkinsnz