analysis. This exclusion ensured a constant denominator in all
the statistical calculations.
Intra-rater reliability for each examiner was determined
by comparing the 12 laryngoscopic criteria in each of the three
repeated patients, for a total of 36 comparison points. Intra-rater
reliability for each criterion was determined in a similar
fashion, with the denominator determined by summing the 20
examiners’ three repeated tests, by criterion, for a total of 60
comparison points. Three complementary methods were used to
assess intra-rater reliability, both for each examiner and for
each laryngoscopic criterion. The overall percent agreement was
calculated simply as the number of points of agreement divided
by the total. This was compared to two established measures of
correlation, the Pearson product-moment coefficient and the
Spearman corrected rank correlation coefficient.
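The three complementary measures described above can be sketched in a few lines of Python. This is an illustrative example on hypothetical paired ratings (it is not the authors' actual analysis pipeline, and the rating values are invented): percent agreement is the number of identical ratings across the two viewings divided by the total number of comparisons, while the two correlation coefficients come from `scipy.stats`.

```python
# Sketch of the three intra-rater reliability measures on
# hypothetical paired ratings from two viewings of the same exams.
# (Illustrative data only; not drawn from the study.)
from scipy.stats import pearsonr, spearmanr

first_viewing  = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical binary ratings
second_viewing = [1, 0, 1, 0, 0, 1, 1, 1]

# Overall percent agreement: points of agreement / total comparisons
agreement = sum(a == b for a, b in zip(first_viewing, second_viewing)) \
    / len(first_viewing)

# Pearson product-moment and Spearman rank correlation coefficients
pearson_r, _ = pearsonr(first_viewing, second_viewing)
spearman_rho, _ = spearmanr(first_viewing, second_viewing)

print(f"percent agreement: {agreement:.0%}")
print(f"Pearson r: {pearson_r:.2f}, Spearman rho: {spearman_rho:.2f}")
```

Note that on binary ratings the two correlation coefficients coincide (ranking two distinct values is an affine transformation), which is why a category with no variation on one viewing, like volitional adduction below, can show high percent agreement yet near-zero correlation.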
This investigation was approved by the institutional
review board of Weill Cornell Medical College.
RESULTS
Twenty of 22 examiners returned the survey, for a
91% response rate. The overall intra-rater reliability for
each individual examiner varied between 44% and
100%, with an average internal consistency of 75% to
84%, depending on the statistical method used (Table I).
As measured by all three statistics, 18 of 20 examiners
(90%) showed >60% internal consistency (Table I).
The intra-rater reliability for each stroboscopic
criterion had, for the most part, a very similar range of
44% to 100% (Table II). The single exception was volitional
adduction, a category in which not a single
examiner rated an exam as normal on both viewings. As
a result, despite a 90% rate of intra-rater agreement,
this category was found to have near-zero intra-rater
correlation by both the Spearman and Pearson correlation
coefficients. Overall, height mismatch, vocal fold shortening,
and vocal process contact had the lowest intra-rater
reliability, whereas the ratings of salivary pooling,
glottic insufficiency, ventricular contraction, and vocal
fold tone were generally consistent.
Inter-rater reliability for each stroboscopic criterion
was determined by kappa analysis. As represented in
Figure 2, these kappa values ranged from 0.10 (poor
Fig. 2. Inter-rater reliability as determined by Fleiss’ kappa.
Fig. 3. Vocal process contact impaired. This case generated the
most consistent rating for impaired vocal process contact. All
ratings were made from dynamic examinations.
Fig. 4. Arytenoid position displaced. This case generated the most
consistent rating for displaced arytenoid position. All ratings were
made from dynamic examinations.
Laryngoscope 120: July 2010
Rosow and Sulica: Laryngoscopy of Vocal Fold Paralysis
47