HSC Section 6 Nov2016 Green Book

high intra-rater reliability. By randomizing the repeat exams across all reviewers, this potential source of bias was eliminated. Each list of exams was then burned to a DVD with all identifying information removed; the file names on the disc simply appeared in order as 01.avi, 02.avi, and so on.

Selection of Reviewers

Twenty-two fellowship-trained laryngologists were asked to participate in the study. A package containing the disc of videos, 25 prelabeled rating sheets, an institutional review board waiver, and a return mailer was sent to each. Results from returned rating sheets were entered into Excel spreadsheets (Microsoft Corp., Redmond, WA) for analysis.

Examination Rating

Each stroboscopic exam was rated with respect to 12 features (Fig. 1). A standard vocabulary to describe the paralyzed vocal fold does not exist. Therefore, criteria were selected from a literature review of the diagnosis and treatment of VFP, including those articles cited above [4–8] and others [9,10], and informal consultation with colleagues. No formal definition of each term was presented; the study relied on common clinical usage to inform raters’ perception of the meaning of each term rather than any formal training. We also acknowledge that some terms might be in part redundant or overlapping (for example, tone, atrophy, and bowing) but sought to evaluate the utility of each of these concurrently.

Responses to 11 of 12 categories were binary (normal/not normal or present/not present); the exception was glottic insufficiency, which was rated on a four-point scale of none/mild/moderate/severe. However, for the purposes of statistical analysis, answers in this category were grouped in binary fashion into none/mild or moderate/severe.

Reviewers were instructed to view and rate examinations sequentially; individual exams could be reviewed an unlimited number of times, but once a new examination was started, examiners were instructed not to return to any prior examinations or alter ratings. Raters were therefore blinded to clinical history, each other’s ratings, and their own previous ratings. Reviewers were also asked not to include written justification for their answers on the rating forms.

Statistical Analysis

Inter-rater reliability was determined using the kappa statistic as described by Fleiss [11,12]; values closer to 0 represent poor agreement, whereas those close to 1 represent near-perfect agreement. Although no uniformly agreed-upon scale exists for Fleiss’ kappa, Fleiss described a scale where values > 0.75 represent excellent agreement, 0.40 to 0.75 represent fair to good agreement, and values < 0.40 represent poor agreement [12]. When an examiner omitted a rating for one of the 12 categories for an examination (accidental or otherwise), all of that examination’s rankings in that category were excluded from kappa

TABLE I. Examiners and Their Intra-Rater Reliabilities Determined Via the Three Repeat Examinations Given to All Participants.

Examiner    % Agreement    Pearson    Spearman
A           0.944          0.932      0.898
B           0.806          0.794      0.775
C           0.889          0.836      0.839
D           0.833          0.849      0.768
E           0.917          0.906      0.906
F           0.861          0.846      0.764
G           1.000          1.000      1.000
H           0.944          0.949      0.957
I           0.667          0.616      0.446
J           0.778          0.700      0.678
K           -              -          -
L           0.694          0.605      0.526
M           0.778          0.711      0.661
N           0.778          0.752      0.652
O           -              -          -
P           0.833          0.773      0.681
Q           0.861          0.893      0.775
R           0.750          0.700      0.623
S           0.889          0.890      0.803
T           0.750          0.675      0.650
U           1.000          1.000      1.000
V           0.833          0.631      0.622
Average     0.840          0.803      0.751

Examiners K and O did not participate in the study.

TABLE II. Laryngoscopic Criteria and Their Associated Inter- and Intra-Rater Reliability Scores.

                                Inter-Rater    Intra-Rater
Laryngoscopic Criterion         Fleiss         % Agreement    Pearson    Spearman
Volitional adduction            0.335          0.900          0.053      0.053
Vocal process contact           0.303          0.817          0.445      0.445
Arytenoid stability (jostle)    0.097          0.833          0.615      0.615
Arytenoid position              0.119          0.817          0.629      0.629
Vocal fold tone                 0.310          0.900          0.744      0.744
Vocal fold atrophy              0.326          0.867          0.726      0.726
Vocal fold bowing               0.488          0.883          0.714      0.714
Vocal fold shortening           0.225          0.817          0.610      0.610
Salivary pooling                0.454          0.900          0.762      0.762
Height mismatch                 0.123          0.733          0.457      0.457
Ventricular contraction         0.217          0.883          0.756      0.756
Glottic insufficiency           0.550          0.733          0.818      0.798
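
As an illustration only, the inter-rater (Fleiss) column of Table II could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' actual procedure: the function names (`binarize_insufficiency`, `drop_incomplete`, `fleiss_kappa`, `interpret`) and the toy data are hypothetical, each examination is assumed to be stored as a list holding one rating per examiner (with `None` marking an omitted rating), and the exclusion step reflects our reading of the rule that an examination with an omitted rating in a category is left out of that category's kappa.

```python
from collections import Counter


def binarize_insufficiency(rating):
    """Collapse the four-point glottic insufficiency scale into two groups.

    Hypothetical helper: the labels mirror the none/mild vs. moderate/severe
    grouping described in the text. A missing rating (None) is passed through.
    """
    if rating is None:
        return None
    if rating in ("none", "mild"):
        return "none/mild"
    if rating in ("moderate", "severe"):
        return "moderate/severe"
    raise ValueError(f"unexpected rating: {rating!r}")


def drop_incomplete(ratings_by_exam):
    """Keep only examinations for which every examiner supplied a rating."""
    return [row for row in ratings_by_exam if all(r is not None for r in row)]


def fleiss_kappa(ratings_by_exam):
    """Fleiss' kappa for one category.

    ratings_by_exam: list of examinations, each a list with one categorical
    rating per examiner. All examinations must have the same number of ratings.
    """
    n_exams = len(ratings_by_exam)
    if n_exams == 0:
        raise ValueError("no examinations to analyze")
    n_raters = len(ratings_by_exam[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(len(row) != n_raters for row in ratings_by_exam):
        raise ValueError("every examination needs the same number of ratings")

    category_totals = Counter()
    agreement_per_exam = []
    for row in ratings_by_exam:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this examination.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_per_exam.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    observed = sum(agreement_per_exam) / n_exams
    total_ratings = n_exams * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Label a kappa value using the scale attributed to Fleiss in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


if __name__ == "__main__":
    # Toy data: 5 examinations rated by 3 examiners (1 = present, 0 = absent).
    toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, None, 0]]
    complete = drop_incomplete(toy)  # the last examination is excluded
    kappa = fleiss_kappa(complete)
    print(round(kappa, 3), interpret(kappa))  # prints: 0.333 poor
```

In the toy data, two of the four complete examinations show full agreement and two show a 2-to-1 split, giving an observed agreement of 2/3 against a chance agreement of 1/2, hence a kappa of 1/3.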

Laryngoscope 120: July 2010

Rosow and Sulica: Laryngoscopy of Vocal Fold Paralysis
