
high intra-rater reliability. By randomizing the repeat exams across all reviewers, this potential source of bias was eliminated. Each list of exams was then burned to a DVD with all identifying information removed; the files on the disc were simply named in sequence as 01.avi, 02.avi, and so on.
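The randomization and de-identification steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the helper `build_disc_order`, the exam identifiers, and the use of a per-reviewer random seed are all hypothetical. It appends the repeat exams to the exam list, shuffles the combined list independently for each reviewer, and assigns zero-padded sequential file names, so that nothing on the disc reveals which exam is which or which ones are repeats.

```python
import random


def build_disc_order(exam_ids, repeat_ids, seed):
    """Build one reviewer's de-identified playlist (hypothetical helper).

    exam_ids:   exams every reviewer sees once
    repeat_ids: exams shown a second time to measure intra-rater reliability
    seed:       per-reviewer seed, so each disc has its own random order
    Returns a dict mapping sequential file names to the hidden exam IDs.
    """
    unknown = set(repeat_ids) - set(exam_ids)
    if unknown:
        raise ValueError(f"repeat exams not in exam list: {sorted(unknown)}")

    playlist = list(exam_ids) + list(repeat_ids)
    random.Random(seed).shuffle(playlist)  # independent order per reviewer

    # Zero-padded names (01.avi, 02.avi, ...) reveal nothing about the exam.
    width = max(2, len(str(len(playlist))))
    return {f"{i:0{width}d}.avi": exam for i, exam in enumerate(playlist, start=1)}


# Usage with hypothetical IDs: 10 exams, 3 of which are repeated.
exams = [f"exam{n:02d}" for n in range(1, 11)]
disc = build_disc_order(exams, ["exam02", "exam05", "exam09"], seed=7)
for name, exam in disc.items():
    print(name, exam)  # investigators' key; only the file names go on the disc
```

With a different seed per reviewer, every disc carries the same exams and the same repeats, but in a different order, so a repeat's position on the disc is not predictable.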

Selection of Reviewers

Twenty-two fellowship-trained laryngologists were invited to participate in the study. Each was sent a package containing the disc of videos, 25 prelabeled rating sheets, an institutional review board waiver, and a return mailer. Results from the returned rating sheets were entered into Excel spreadsheets (Microsoft Corp., Redmond, WA) for analysis.

Examination Rating

Each stroboscopic exam was rated with respect to 12 features (Fig. 1). Because no standard vocabulary exists to describe the paralyzed vocal fold, criteria were selected on the basis of a literature review of the diagnosis and treatment of VFP, including the articles cited above^4–8 and others,^9,10 and of informal consultation with colleagues. No formal definition of each term was provided; the study relied on common clinical usage, rather than formal training, to inform raters' understanding of each term. We also acknowledge that some terms (for example, tone, atrophy, and bowing) may be partly redundant or overlapping, but we sought to evaluate the utility of each of them concurrently. Responses to 11 of the 12 categories were binary (normal/not normal or present/not present); the exception was glottic insufficiency, which was rated on a four-point scale of none/mild/moderate/severe. For the purposes of statistical analysis, however, answers in this category were grouped in binary fashion into none/mild or moderate/severe. Reviewers were instructed to view and rate examinations sequentially; individual exams could be reviewed an unlimited number of times, but once a new examination was started, examiners were instructed not to return to any prior examination or alter its ratings. Raters were thus blinded to clinical history, to each other's ratings, and to their own previous ratings. Reviewers were also asked not to include written justification for their answers on the rating forms.
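The coding scheme described above (11 binary features plus a four-point glottic insufficiency scale collapsed to none/mild versus moderate/severe) can be expressed as a small encoding step before analysis. The sketch below is hypothetical: the feature names follow Table II, but the choice of 1 for the abnormal or present response, the helper names, and the use of `None` for an omitted rating are assumptions made for illustration.

```python
# Hypothetical coding of one reviewer's rating sheet for a single exam.
# Assumption: 1 = abnormal/present, 0 = normal/absent, None = omitted.
BINARY_CODES = {"normal": 0, "not present": 0, "not normal": 1, "present": 1}
FOUR_POINT = ("none", "mild", "moderate", "severe")


def collapse_glottic_insufficiency(grade):
    """Group the four-point scale for analysis: none/mild -> 0, moderate/severe -> 1."""
    grade = grade.strip().lower()
    if grade not in FOUR_POINT:
        raise ValueError(f"unknown glottic insufficiency grade: {grade!r}")
    return 0 if grade in ("none", "mild") else 1


def encode_sheet(sheet):
    """Convert {feature: answer} to {feature: 0, 1, or None}."""
    coded = {}
    for feature, answer in sheet.items():
        if answer is None:
            coded[feature] = None  # omitted rating, handled at analysis time
        elif feature == "glottic insufficiency":
            coded[feature] = collapse_glottic_insufficiency(answer)
        else:
            # Raises KeyError on an unexpected answer instead of guessing.
            coded[feature] = BINARY_CODES[answer.strip().lower()]
    return coded


example = {
    "vocal fold bowing": "present",
    "salivary pooling": "not present",
    "vocal fold tone": "not normal",
    "glottic insufficiency": "Moderate",
    "height mismatch": None,
}
print(encode_sheet(example))
```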

Statistical Analysis

Inter-rater reliability was determined using the kappa statistic as described by Fleiss^11,12; values near 0 indicate poor (chance-level) agreement, whereas values close to 1 represent near-perfect agreement. Although no uniformly agreed-upon scale exists for Fleiss' kappa, Fleiss described a scale in which values >0.75 represent excellent agreement, values from 0.40 to 0.75 represent fair to good agreement, and values <0.40 represent poor agreement.
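As a worked illustration of Fleiss' kappa, the sketch below computes it from per-examination rating lists using the standard formulation: for each examination i, P_i is the proportion of agreeing rater pairs; P_mean is the average of the P_i; P_e, the chance agreement, is the sum of the squared overall category proportions; and kappa = (P_mean - P_e) / (1 - P_e). This is a plain-Python sketch, not the software used in the study, and the function names are hypothetical. It is meant to be called once per criterion. In keeping with the exclusion rule described in this section, examinations with any missing rating are dropped so that every remaining examination has the same number of raters, as the formula requires. `fleiss_scale` maps a value onto the descriptive bands quoted above.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for one criterion.

    ratings: one list per examination, holding each rater's category label
             (None = omitted). Examinations with any omission are dropped so
             every remaining examination has the same number of raters n.
    """
    complete = [r for r in ratings if None not in r]
    if not complete:
        raise ValueError("no examination has a complete set of ratings")
    n = len(complete[0])
    if n < 2 or any(len(r) != n for r in complete):
        raise ValueError("need the same number (>= 2) of raters per examination")
    N = len(complete)
    counts = [Counter(r) for r in complete]  # n_ij for each examination i
    categories = {c for r in complete for c in r}

    # P_i: proportion of agreeing rater pairs for examination i
    p_i = [(sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts]
    p_mean = sum(p_i) / N
    # p_j: share of all ratings in category j; P_e = sum of p_j squared
    p_e = sum((sum(c[cat] for c in counts) / (N * n)) ** 2 for cat in categories)
    if p_e == 1:
        raise ValueError("kappa undefined: every rating is in one category")
    return (p_mean - p_e) / (1 - p_e)


def fleiss_scale(kappa):
    """Fleiss' bands: > 0.75 excellent, 0.40-0.75 fair to good, < 0.40 poor."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical data: 3 raters, binary codes; the last exam has an omission.
demo = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, None, 0]]
k = fleiss_kappa(demo)
print(round(k, 3), fleiss_scale(k))  # prints 0.333 poor
```

Dropping the incomplete examination, rather than scoring it with fewer raters, keeps n constant across examinations, which the P_i formula assumes.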

When an examiner omitted a rating in one of the 12 categories for an examination (accidentally or otherwise), all of that examination's ratings in that category were excluded from kappa

TABLE I. Examiners and Their Intra-Rater Reliabilities Determined Via the Three Repeat Examinations Given to All Participants.

Examiner

% Agreement

Pearson

Spearman

0.944

0.932

0.898

0.806

0.794

0.775

0.889

0.836

0.839

0.833

0.849

0.768

0.917

0.906

0.861

0.846

0.764

1.000

0.944

0.949

0.957

0.667

0.616

0.446

0.778

0.700

0.678

0.694

0.605

0.526

0.778

0.711

0.661

0.778

0.752

0.652

0.833

0.773

0.681

0.861

0.893

0.775

0.750

0.700

0.623

0.889

0.890

0.803

0.750

0.675

0.650

1.000

0.833

0.631

0.622

Average

0.840

0.803

0.751

Examiners K and O did not participate in the study.
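The three intra-rater measures in Table I compare each examiner's first and repeat ratings of the same examinations. A minimal sketch follows, assuming an examiner's ratings for the repeated exams are flattened into two equal-length numeric vectors, with binary features coded 0/1 and glottic insufficiency kept on an ordinal 0 to 3 scale; that layout and the function names are hypothetical. Percent agreement is the share of identical pairs, Pearson's r is computed on the raw codes, and Spearman's rho is Pearson's r computed on average ranks, so ties are handled in the usual way.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def intra_rater(first, repeat):
    """Return (% agreement, Pearson r, Spearman rho) for first vs repeat ratings."""
    pairs = [(a, b) for a, b in zip(first, repeat) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete rating pairs")
    x = [a for a, _ in pairs]
    y = [b for _, b in pairs]
    agreement = sum(a == b for a, b in pairs) / len(pairs)
    return agreement, pearson(x, y), pearson(average_ranks(x), average_ranks(y))


# Hypothetical examiner: binary features as 0/1, glottic insufficiency as 0-3.
first = [0, 1, 1, 0, 1, 0, 2]
repeat = [0, 1, 0, 0, 1, 0, 3]
print(intra_rater(first, repeat))
```

For purely binary data the Spearman and Pearson coefficients coincide, because average ranks are then a linear rescaling of the 0/1 codes; the two can differ only when ordinal ratings such as glottic insufficiency are included.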

TABLE II. Laryngoscopic Criteria and Their Associated Inter- and Intra-Rater Reliability Scores.

Laryngoscopic Criterion: Inter-Rater (Fleiss' kappa); Intra-Rater (% Agreement, Pearson, Spearman)

Volitional adduction: 0.335; 0.900, 0.053
Vocal process contact: 0.303; 0.817, 0.445
Arytenoid stability (jostle): 0.097; 0.833, 0.615
Arytenoid position: 0.119; 0.817, 0.629
Vocal fold tone: 0.310; 0.900, 0.744
Vocal fold atrophy: 0.326; 0.867, 0.726
Vocal fold bowing: 0.488; 0.883, 0.714
Vocal fold shortening: 0.225; 0.817, 0.610
Salivary pooling: 0.454; 0.900, 0.762
Height mismatch: 0.123; 0.733, 0.457
Ventricular contraction: 0.217; 0.883, 0.756
Glottic insufficiency: 0.550; 0.733, 0.818, 0.798

Laryngoscope 120: July 2010

Rosow and Sulica: Laryngoscopy of Vocal Fold Paralysis