HSC Section 6 Nov2016 Green Book

Fig. 2. Inter-rater reliability as determined by Fleiss’ kappa. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
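For readers unfamiliar with the statistic named in this caption, the following is a minimal sketch of the standard Fleiss' kappa calculation. It is not the authors' analysis code. The function name `fleiss_kappa` and the small count table are hypothetical and chosen only for illustration; the study's actual rating categories and counts are not reproduced here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned subject i to category j.

    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per subject are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")

    # Observed agreement: for each subject, the proportion of rater
    # pairs that agree, averaged over all subjects.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    total_ratings = n_subjects * n_raters
    proportions = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)

    if p_chance == 1.0:
        # All ratings fall in one category; kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical example: 3 exams, 3 raters, 2 categories
# (column 0 = "abnormal", column 1 = "normal").
example_counts = [[3, 0], [2, 1], [0, 3]]
print(round(fleiss_kappa(example_counts), 2))
```

With this toy table the observed agreement is 7/9 and the chance agreement is 41/81, so kappa works out to 0.55. A value of 1 indicates perfect agreement, and values near 0 indicate agreement no better than chance.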

each individual examiner varied between 44% and 100%, with an average internal consistency of 75% to 84%, depending on the statistical method used (Table I). As measured by all three statistics, 18 of 20 examiners (90%) showed > 60% internal consistency (Table I). The intra-rater reliability for each stroboscopic criterion had, for the most part, a very similar range of 44% to 100% (Table II). The single exception was volitional adduction, a category in which not a single examiner rated an exam as normal on both viewings. As a result, despite a 90% rate of intra-rater agreement, this category was found to have near-zero intra-rater correlation by both Spearman and Pearson correlation coefficients. Overall, height mismatch, vocal fold shortening, and vocal process contact had the lowest intra-rater reliability, whereas the ratings of salivary pooling, glottic insufficiency, ventricular contraction, and vocal fold tone were generally consistent. Inter-rater reliability for each stroboscopic criterion was determined by kappa analysis. As represented in Figure 2, these kappa values ranged from 0.10 (poor

analysis. This exclusion assured a constant denominator in all the statistical calculations. Intra-rater reliability for each examiner was determined by comparing the 12 laryngoscopic criteria in each of the three repeated patients, for a total of 36 comparison points. Intra-rater reliability for each criterion was determined in a similar fashion, with the denominator determined by adding up the 20 examiners’ three repeated tests, by criterion, for a total of 60 comparison points. Three complementary methods were used to assess intra-rater reliability, both for each examiner and for each laryngoscopic criterion. The overall percent agreement was calculated simply as the number of points of agreement divided by the total number of comparison points. This was compared to two known measures of correlation, the Pearson product-moment coefficient and the Spearman corrected rank correlation coefficient. This investigation was approved by the institutional review board of Weill Cornell Medical College.
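The three intra-rater measures described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the ratings are assumed to be numerically coded (here 0 = abnormal, 1 = normal), the Spearman coefficient is assumed to be the usual tie-corrected form (Pearson correlation of average ranks), and the data are invented.

```python
from math import sqrt


def percent_agreement(first, second):
    """Points of agreement divided by the total number of comparison points."""
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)


def pearson(x, y):
    """Pearson product-moment correlation; NaN if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: 20 paired ratings of one criterion on repeated viewings.
# No pair is rated "normal" (1) on both viewings.
first_viewing = [0] * 18 + [1, 0]
second_viewing = [0] * 18 + [0, 1]

print(percent_agreement(first_viewing, second_viewing))
print(round(pearson(first_viewing, second_viewing), 3))
print(round(spearman(first_viewing, second_viewing), 3))
```

In this invented example the percent agreement is 90%, yet both correlation coefficients are about -0.05. Agreement and correlation can diverge sharply when nearly all ratings fall in a single category, which is one reason to report the measures side by side.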

RESULTS

Twenty of 22 examiners returned the survey, for a 91% response rate. The overall intra-rater reliability for

Fig. 3. Vocal process contact impaired. This case generated the most consistent rating for impaired vocal process contact. All ratings were made from dynamic examinations. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Fig. 4. Arytenoid position displaced. This case generated the most consistent rating for displaced arytenoid position. All ratings were made from dynamic examinations. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Laryngoscope 120: July 2010

Rosow and Sulica: Laryngoscopy of Vocal Fold Paralysis

