analysis. This exclusion ensured a constant denominator in all
the statistical calculations.
Intra-rater reliability for each examiner was determined
by comparing the 12 laryngoscopic criteria in each of the three
repeated patients, for a total of 36 comparison points. Intra-rater
reliability for each criterion was determined in a similar
fashion, with the denominator determined by summing the 20
examiners’ three repeated tests, by criterion, for a total of 60
comparison points. Three complementary methods were used to
assess intra-rater reliability, both for each examiner and for
each laryngoscopic criterion. The overall percent agreement was
calculated simply as the number of points of agreement divided
by the total. This was compared to two established measures of
correlation, the Pearson product-moment coefficient and the
Spearman corrected rank correlation coefficient.
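The three complementary measures described above can be sketched in a few lines of Python. This is an illustrative example on hypothetical paired ratings (it is not the authors' actual analysis pipeline, and the rating values are invented): percent agreement is the number of identical ratings across the two viewings divided by the total number of comparisons, while the two correlation coefficients come from `scipy.stats`.

```python
# Sketch of the three intra-rater reliability measures on
# hypothetical paired ratings from two viewings of the same exams.
# (Illustrative data only; not drawn from the study.)
from scipy.stats import pearsonr, spearmanr

first_viewing  = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical binary ratings
second_viewing = [1, 0, 1, 0, 0, 1, 1, 1]

# Overall percent agreement: points of agreement / total comparisons
agreement = sum(a == b for a, b in zip(first_viewing, second_viewing)) \
    / len(first_viewing)

# Pearson product-moment and Spearman rank correlation coefficients
pearson_r, _ = pearsonr(first_viewing, second_viewing)
spearman_rho, _ = spearmanr(first_viewing, second_viewing)

print(f"percent agreement: {agreement:.0%}")
print(f"Pearson r: {pearson_r:.2f}, Spearman rho: {spearman_rho:.2f}")
```

Note that on binary ratings the two correlation coefficients coincide (ranking two distinct values is an affine transformation), which is why a category with no variation on one viewing, like volitional adduction below, can show high percent agreement yet near-zero correlation.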
This investigation was approved by the institutional
review board of Weill Cornell Medical College.
RESULTS
Twenty of 22 examiners returned the survey, for a
91% response rate. The overall intra-rater reliability for
each individual examiner varied between 44% and
100%, with an average internal consistency of 75% to
84%, depending on the statistical method used (Table I).
As measured by all three statistics, 18 of 20 examiners
(90%) showed >60% internal consistency (Table I).
The intra-rater reliability for each stroboscopic
criterion had, for the most part, a very similar range of
44% to 100% (Table II). The single exception was volitional
adduction, a category in which not a single
examiner rated an exam as normal on both viewings. As
a result, despite a 90% rate of intra-rater agreement,
this category was found to have near-zero intra-rater
correlation by both the Spearman and Pearson correlation
coefficients. Overall, height mismatch, vocal fold shortening,
and vocal process contact had the lowest intra-rater
reliability, whereas the ratings of salivary pooling,
glottic insufficiency, ventricular contraction, and vocal
fold tone were generally consistent.
Inter-rater reliability for each stroboscopic criterion
was determined by kappa analysis. As represented in
Figure 2, these kappa values ranged from 0.10 (poor
Fig. 2. Inter-rater reliability as determined by Fleiss’ kappa.
Fig. 3. Vocal process contact impaired. This case generated the
most consistent rating for impaired vocal process contact. All
ratings were made from dynamic examinations.
Fig. 4. Arytenoid position displaced. This case generated the most
consistent rating for displaced arytenoid position. All ratings were
made from dynamic examinations.
Laryngoscope 120: July 2010
Rosow and Sulica: Laryngoscopy of Vocal Fold Paralysis
47