high intra-rater reliability. By randomizing the repeat exams
across all reviewers, this potential source of bias was eliminated.
Each list of exams was then burned to a DVD with all identifying
information removed; the file names on the disc simply appeared
in order as 01.avi, 02.avi, and so on.
Selection of Reviewers
Twenty-two fellowship-trained laryngologists were asked
to participate in the study. A package containing the disc of videos,
25 prelabeled rating sheets, an institutional review board
waiver, and a return mailer was sent to each. Results from
returned rating sheets were entered into Excel spreadsheets
(Microsoft Corp., Redmond, WA) for analysis.
Examination Rating
Each stroboscopic exam was rated with respect to 12 features
(Fig. 1). A standard vocabulary to describe the paralyzed
vocal fold does not exist. Therefore, criteria were selected from
a literature review of the diagnosis and treatment of VFP,
including those articles cited above [4–8] and others [9,10], and informal
consultation with colleagues. No formal definition of each
term was presented; the study relied on common clinical usage
to inform raters’ perception of the meaning of each term rather
than any formal training. We also acknowledge that some terms
might be in part redundant or overlapping—for example, tone,
atrophy and bowing—but sought to evaluate the utility of each
of these concurrently. Responses to 11 of 12 categories were binary
(normal/not normal or present/not present); the exception
was glottic insufficiency, which was rated on a four-point scale
of none/mild/moderate/severe. However, for the purposes of statistical
analysis, answers in this category were grouped in
binary fashion into none/mild or moderate/severe. Reviewers
were instructed to view and rate examinations sequentially;
individual exams could be reviewed an unlimited number of
times, but once a new examination was started, examiners were
instructed not to return to any prior examinations or alter ratings.
Raters were therefore blinded to clinical history, each
other’s ratings, and their own previous ratings. Reviewers were
also asked not to include written justification for their answers
on the rating forms.
Statistical Analysis
Inter-rater reliability was determined using the kappa statistic
as described by Fleiss [11,12]; values closer to 0 represent
poor agreement, whereas those close to 1 represent near-perfect
agreement. Although no uniformly agreed-upon scale exists for
Fleiss’ kappa, Fleiss described a scale where values >0.75 represent
excellent agreement, 0.40 to 0.75 represent fair to good
agreement, and values <0.40 represent poor agreement [12].
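
For readers who want to see the calculation, the following is a minimal Python sketch of the standard Fleiss kappa formula for a single rating category, together with the interpretive scale quoted above. It assumes every examination was rated by the same number of examiners; the function names, labels, and toy data are hypothetical, and this is not the authors' spreadsheet implementation.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for one criterion.

    ratings: one list per examination, holding one label per rater.
    categories: all labels that may occur (e.g. ["present", "absent"]).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each examination needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters assigning examination i to category j
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])

    total = n_subjects * n_raters
    # Proportion of all ratings that fall in each category
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    # Observed agreement per examination, then averaged
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_fleiss(kappa):
    """Scale attributed to Fleiss in the text above."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Toy data: 4 examinations, 3 raters each (not study data)
toy = [
    ["present", "present", "present"],
    ["present", "present", "absent"],
    ["absent", "absent", "absent"],
    ["present", "absent", "absent"],
]
kappa = fleiss_kappa(toy, ["present", "absent"])
```

On this scale, a kappa of 0.550 would be read as fair to good agreement and a kappa of 0.335 as poor agreement.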
When an examiner omitted a rating for one of the 12 categories
for an examination (accidental or otherwise), all of that examination’s
ratings in that category were excluded from kappa
TABLE I. Examiners and Their Intra-Rater Reliabilities Determined Via the Three Repeat Examinations Given to All Participants.

| Examiner | % Agreement | Pearson | Spearman |
|---|---|---|---|
| A | 0.944 | 0.932 | 0.898 |
| B | 0.806 | 0.794 | 0.775 |
| C | 0.889 | 0.836 | 0.839 |
| D | 0.833 | 0.849 | 0.768 |
| E | 0.917 | 0.906 | 0.906 |
| F | 0.861 | 0.846 | 0.764 |
| G | 1.000 | 1.000 | 1.000 |
| H | 0.944 | 0.949 | 0.957 |
| I | 0.667 | 0.616 | 0.446 |
| J | 0.778 | 0.700 | 0.678 |
| K | | | |
| L | 0.694 | 0.605 | 0.526 |
| M | 0.778 | 0.711 | 0.661 |
| N | 0.778 | 0.752 | 0.652 |
| O | | | |
| P | 0.833 | 0.773 | 0.681 |
| Q | 0.861 | 0.893 | 0.775 |
| R | 0.750 | 0.700 | 0.623 |
| S | 0.889 | 0.890 | 0.803 |
| T | 0.750 | 0.675 | 0.650 |
| U | 1.000 | 1.000 | 1.000 |
| V | 0.833 | 0.631 | 0.622 |
| Average | 0.840 | 0.803 | 0.751 |

Examiners K and O did not participate in the study.
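
As a rough illustration of how the agreement and Pearson columns in Table I might be computed, the Python sketch below compares one examiner's first and repeat ratings of the same examinations. It assumes ratings are coded numerically (0/1 for binary items) and paired item by item; the function names and toy data are hypothetical, and this is not the authors' spreadsheet procedure. The tabulated agreement values are consistent with 36 paired ratings per examiner (3 repeat examinations × 12 features); for instance, 34/36 ≈ 0.944.

```python
from math import sqrt


def percent_agreement(first, repeat):
    """Fraction of paired ratings that are identical."""
    if len(first) != len(repeat) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(first, repeat))
    return matches / len(first)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / sqrt(sxx * syy)


# Toy data (not study data): 36 paired binary ratings, 2 disagreements
first_pass = [1] * 18 + [0] * 18
repeat_pass = list(first_pass)
repeat_pass[0] = 0
repeat_pass[35] = 1

agreement = percent_agreement(first_pass, repeat_pass)
r = pearson(first_pass, repeat_pass)
```

Spearman's coefficient would apply the same correlation formula to rank-transformed ratings.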
TABLE II. Laryngoscopic Criteria and Their Associated Inter- and Intra-Rater Reliability Scores.

| Laryngoscopic Criterion | Inter-Rater: Fleiss | Intra-Rater: % Agreement | Intra-Rater: Pearson | Intra-Rater: Spearman |
|---|---|---|---|---|
| Volitional adduction | 0.335 | 0.900 | 0.053 | 0.053 |
| Vocal process contact | 0.303 | 0.817 | 0.445 | 0.445 |
| Arytenoid stability (jostle) | 0.097 | 0.833 | 0.615 | 0.615 |
| Arytenoid position | 0.119 | 0.817 | 0.629 | 0.629 |
| Vocal fold tone | 0.310 | 0.900 | 0.744 | 0.744 |
| Vocal fold atrophy | 0.326 | 0.867 | 0.726 | 0.726 |
| Vocal fold bowing | 0.488 | 0.883 | 0.714 | 0.714 |
| Vocal fold shortening | 0.225 | 0.817 | 0.610 | 0.610 |
| Salivary pooling | 0.454 | 0.900 | 0.762 | 0.762 |
| Height mismatch | 0.123 | 0.733 | 0.457 | 0.457 |
| Ventricular contraction | 0.217 | 0.883 | 0.756 | 0.756 |
| Glottic insufficiency | 0.550 | 0.733 | 0.818 | 0.798 |
Laryngoscope 120: July 2010
Rosow and Sulica: Laryngoscopy of Vocal Fold Paralysis