Quantitative Methods 2017

Quantitative Methods in Radiation Oncology: Models, Trials And Clinical Outcomes

2017 ESTRO SCHOOL LIVE COURSE

Hypothesis testing

Outline

• Introduction • Statistical inference

• Assumptions, potential, limitations

• Quantifying uncertainty • Parametric vs. non-parametric statistics • Breakdown of basic assumptions • Multiple testing and post-hoc hypothesis

Why statistics?

• Answer questions where measurements are subject to random variation:

• Is the improvement in local control a coincidence or a ‘real’ improvement?

• Is the decrease in toxicity a coincidence or a ‘real’ improvement?

• Is the improvement in plan conformity a coincidence or a ‘real’ improvement?

Statistical inference: basic assumption

• Improbable events do not occur

Improbable events do not occur

• Assume a null hypothesis

• Find probability of same or ‘more extreme’ result by chance • p-value

• If the probability is small, we conclude that the null hypothesis is wrong

Example: Contingency tables

Observed

              Toxicity   No Toxicity   Total
Treatment A      30          70         100
Treatment B      40          60         100
Total            70         130           -

Assume null hypothesis: Equal toxicity

Expected

              Toxicity   No Toxicity   Total
Treatment A      35          65         100
Treatment B      35          65         100
Total            70         130           -

$\chi^2 = 2.2$, $p = 0.14$

http://www.medcalc.org/manual/chi-square-table.php

http://statpages.org/ctab2x2.html
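The contingency-table example can be checked with a few lines of code. The sketch below assumes a Pearson chi-square test without continuity correction (the slides do not say which variant was used); the function name `chi_square_2x2` is hypothetical. For one degree of freedom the p-value can be written with the complementary error function, so only the standard library is needed.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected counts under the null hypothesis of equal toxicity
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

# Rows: treatment A, treatment B; columns: toxicity, no toxicity
observed = [[30, 70], [40, 60]]
expected, chi2, p_value = chi_square_2x2(observed)
print(expected)
print(round(chi2, 1), round(p_value, 2))
```

Under these assumptions the expected counts, test statistic and p-value agree with the values on the slide.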

Limitations and caveats

• The opposite is not true: • Two treatment arms are not proven equal by showing that the p-value is high! • Statistics can only reject a null hypothesis – not prove it

Statistical power

• Number of patients needed to see effect
• The p value depends on effect size and sample size.

                 Toxicity   No toxicity
New technique        5          15         p=0.45
Old technique       10          15

                 Toxicity   No toxicity
New technique       20          60         p=0.05
Old technique       40          60
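The two tables above have the same toxicity proportions (25% versus 40%) and differ only in size. A minimal sketch of that dependence on sample size, assuming a continuity-corrected (Yates) chi-square test, which gives p-values close to the ones quoted; the slides do not state which test was used, and the function name is hypothetical.

```python
import math

def yates_chi_square_p(a, b, c, d):
    """p-value of a continuity-corrected chi-square test for [[a, b], [c, d]]."""
    n = a + b + c + d
    # Yates correction: shrink |ad - bc| by n/2 before squaring
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

# Same proportions, four times as many patients in the second table
p_small = yates_chi_square_p(5, 15, 10, 15)
p_large = yates_chi_square_p(20, 60, 40, 60)
print(round(p_small, 2), round(p_large, 2))
```

With an unchanged effect size, the larger table moves the p-value from well above 0.05 to roughly 0.05.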

Confidence intervals

• More patients increase reliability, decrease uncertainty

                 Toxicity   No toxicity   Proportion
New technique        5          15        25% (9-49%)
Old technique       10          15        40% (21-61%)
p=0.45

                 Toxicity   No toxicity   Proportion
New technique       20          60        25% (16-36%)
Old technique       40          60        40% (30-50%)
p=0.05
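To illustrate how the interval narrows with more patients, here is a sketch of an exact (Clopper-Pearson) binomial confidence interval found by bisection. This is an assumption about the method: the slides do not say how the intervals were computed, although the bounds returned here should lie close to the ones in the tables. The helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_increasing(g, lo=0.0, hi=1.0, iterations=60):
    """Bisection for the root of a function that increases from negative to positive."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k events in n patients."""
    # Lower bound: P(X >= k) equals alpha/2
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= k) equals alpha/2
    upper = 1.0 if k == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

small = exact_ci(5, 20)     # new technique, small study
large = exact_ci(20, 80)    # new technique, large study
print([round(100 * x) for x in small], [round(100 * x) for x in large])
```

Both studies estimate the same proportion (25%), but the interval from the larger study is markedly narrower.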

Quantifying uncertainty: confidence intervals

• A point estimate alone is meaningless! • We need to assess the uncertainty

• Example:

Nutting et al Lancet Oncol 2011

Non-inferiority studies

• It is impossible to show equivalence • But it is possible to reject a hypothesis of inferiority • By omitting (part of) the radiation, the local control does not decrease more than xx%

• Non-inferiority studies aim at a sufficiently narrow CI to rule out a clinically relevant detriment

A note of caution

[Figure: dose levels 50 Gy/25 fractions, 48 Gy/24 fractions, 46 Gy/23 fractions, 2 Gy/1 fraction and no radiotherapy, with the steps between them labelled as non-inferiority studies and contrasted with a standard superiority study.]

Parametric vs. non-parametric statistics

Parametric statistics

• Assume a parametric distribution of data • Make test based on this distribution

• Examples:

• Paired and unpaired t-test • Chi-squared test for contingency tables • Binomial tests • Regression models • Cox proportional hazards

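What is under the hood of your statistical software?

• Compare two normally distributed series with equal variance:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

Test quantity. Note common variance $s^2$.

$$ \varepsilon = 2 \left[ 1 - F_{t(df_1 + df_2)}(|t|) \right] $$

Two sided test: $\varepsilon$ is the test probability (p-value) and $F$ is the cumulative distribution function.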

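What is under the hood of your statistical software?

• Compare two normally distributed series with equal variance:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

$$ \mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{0.975}(df_1 + df_2) \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} $$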
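A minimal sketch of the test quantity for two samples with a common variance. It assumes the usual pooled estimate of $s^2$ weighted by degrees of freedom (the slides only say "common variance"), and the data are made up for illustration. Turning t into a p-value or a confidence interval additionally needs the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic assuming equal variance in both groups."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    # Common (pooled) variance estimate s^2, weighted by degrees of freedom
    s2 = (df1 * variance(sample1) + df2 * variance(sample2)) / (df1 + df2)
    standard_error = sqrt(s2 * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, df1 + df2, s2

# Made-up illustration data, not taken from the course material
group1 = [5.1, 4.9, 6.2, 5.8, 5.5]
group2 = [4.2, 4.8, 4.1, 5.0, 4.4]
t, df, s2 = pooled_t_statistic(group1, group2)
print(round(t, 2), df, round(s2, 4))
```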

Parametric statistics

• Necessary to validate assumption • Example: test for deviation from normality • Remember: p-value >0.05 not enough

[Figure: histogram of GTVrad (frequency) with fitted normal curve; Mean = 58.1428, Std. Dev. = 43.04318, N = 85. Normal P-P plot of GTVrad (expected versus observed cumulative probability).]

Parametric statistics

• Necessary to validate assumption • Example: test for deviation from normality • Remember: p-value >0.05 not enough

[Figure: histogram of lnGTVrad (frequency) with fitted normal curve; Mean = 3.7738, Std. Dev. = .83951, N = 85. Normal P-P plot of lnGTVrad (expected versus observed cumulative probability).]

Parametric statistics

• Decisions... decisions...

[Figure: two normal P-P plots of TotalSUVmax (expected versus observed cumulative probability); caption: Transforms: natural logarithm.]

Non-parametric statistics

• No assumption about distribution • Example: rank tests

- Rank all values - Under null hypothesis: group 1 beats group 2 as often as vice versa - Compare with coin flip: less than 5% probability of result? - p-value is the probability of that or a ‘more extreme’ result

Non-parametric statistics

• Examples

• Wilcoxon rank sum (signed or unsigned) • Log-rank test for survival • Cox proportional hazards

Parametric or non parametric?

                 Positive                                       Negative
Parametric       Provides description of distribution of data   Relies on assumption about distribution
Non-parametric   No assumption on distribution                  No information on distribution

                 Output
Parametric       P-value + distribution of data
Non-parametric   P-value

Random sampling techniques

• Non-parametric methods to quantify uncertainty

• All patients give the thick mean line
• Variability in patients is transferred to variability in mean risk
• Random sampling with replacement -> large number of means
• 2.5%-97.5% of the histogram is interpreted as the 95% CI of the mean

Random sampling techniques

• Bootstrap • Jackknife

• How to implement?

• In standard statistical packages • Matlab functions (bootstrp, bootci)
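Outside a statistical package, the percentile bootstrap described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation behind the slides: the function name, the number of resamples, the fixed seed and the data are all made up.

```python
import random
from statistics import mean

def bootstrap_ci_of_mean(values, n_resamples=5000, seed=0):
    """Percentile bootstrap 95% confidence interval of the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample n values with replacement and record the mean each time
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[n_resamples * 25 // 1000]        # 2.5th percentile
    upper = means[n_resamples * 975 // 1000 - 1]   # 97.5th percentile
    return lower, upper

# Made-up per-patient risks, for illustration only
risks = [0.12, 0.30, 0.25, 0.41, 0.18, 0.35, 0.22, 0.28, 0.15, 0.33]
lower, upper = bootstrap_ci_of_mean(risks)
print(round(mean(risks), 3), round(lower, 3), round(upper, 3))
```

The spread of the resampled means stands in for the uncertainty of the mean, without assuming any particular distribution for the data.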

Types of error in hypothesis testing

Two types of errors in statistical inference

• The null hypothesis is true, but you reject it - Type I error - False positive • An improbable event has occurred • The probability is controlled through the significance level • The probability is α (usually set to 5%)
• The null hypothesis is false, but you fail to reject it - Type II error - False negative • The result was not ‘extreme enough’ • Sample size not sufficient • The probability is denoted β • It is closely related to power, 1-β

Type I error and multiple testing

• The significance level is typically set to 5% (p=0.05) • The risk of falsely rejecting the null hypothesis in one try is 5% • The chance of accepting a true null hypothesis in both of two tries is 0.95^2=90.25% - The risk of falsely rejecting a true hypothesis in at least one of two tries is 1-0.95^2=9.75% • In 10 tries: 1-0.95^10=40% • The confidence is eaten up by multiple testing • The effective α increases rapidly as we perform multiple tests • High probability of type I error
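The arithmetic above generalises to any number of tests. A short sketch, assuming the tests are independent and every null hypothesis is true; the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false rejection among n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 2, 10):
    print(n_tests, round(familywise_error_rate(n_tests), 4))
```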

Bonferroni correction

• Adjust your significance level to account for multiple testing: • 1 test: p=0.05 is significant • 2 tests: p=0.025 is significant • N tests: p=0.05/N is significant • Probability of a type I error maintained at 5%, BUT • The probability of type II error increases!
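The correction amounts to dividing the threshold by the number of tests. A minimal sketch with made-up p-values; the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant at the threshold alpha / N."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Five made-up p-values: the threshold becomes 0.05 / 5 = 0.01
p_values = [0.03, 0.004, 0.20, 0.012, 0.049]
flags = bonferroni_significant(p_values)
print(flags)
```

Three of the five values are below 0.05, but only one survives the correction, which is the loss of power (higher type II error) mentioned above.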

http://prefrontal.org/files/posters/Bennett-Salmon-2009.jpg

Breakdown of the basic assumption: improbable events sometimes do occur

Clausen et al, in preparation

Bias

Definition

• An estimate is called biased if it is systematically different from the population based parameter of interest, regardless of sample size

Accurate, but not precise (unbiased):

Last but not least: Just Think!

Look at your data! Pattern of metastasis

Stage             Pelvic   Primary Para Aortic   Para Aortic Relapse   Total
1A                   0              1                     1               2
1B                   5              5                     2              12
2A                   1              0                     0               1
2B                  12             10                     4              26
3A                   2              0                     0               2
3B                  11             16                     7              34
4A                   2              2                     0               4
4B                   0              2                     0               2
No. of Patients     33             36                    14              83

Data from Henrik Hansen, MD

p=0.00002?

p=0.80?

Take home

• The basic assumption behind statistical inference is that improbable events do not occur • This is violated in multiple testing and post-hoc hypothesis unless corrected for • Point estimates without confidence intervals are meaningless • Wrong use of statistics can change a conclusion • It is not just details

Power and sample size

Ivan Vogelius

Background

• Hypoxic cell sensitizer trials in HNSCC

Estimated statistical resolution of 20 randomized controlled trials of hypoxic cell sensitizer in HNSCC. A meta-analysis showed a statistically significant absolute improvement in local control of 8.3%

Bentzen R&O 32: 1 (1994)

Statistical power

• Example: Two sample t-test (remember?) • Procedure for test • Verify normal distribution • Verify same variance in the two samples • Calculate test statistic

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

Statistical power

• Example: Two sample t-test • Procedure for test • Verify normal distribution • Verify same variance in the two samples • Calculate test statistic

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

• $s^2$: sample estimate of variance, $\sigma^2$
• $n_1$, $n_2$: number of subjects
• Denominator: the smaller, the better
• $\bar{x}_1 - \bar{x}_2$: difference in means (effect size)

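Statistical power

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

• The larger the effect size, the better • The smaller the variance, the better

[Figure: two pairs of distributions. Easy: $\bar{x}_1 - \bar{x}_2 = 2$, $s_1 = 0.75$. Hard: $\bar{x}_1 - \bar{x}_2 = 2$, $s_1 = 1.5$.]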

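Statistical power

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

• The larger the effect size, the better • The smaller the variance, the better

[Figure: two pairs of distributions. Easy: $\bar{x}_1 - \bar{x}_2 = 7$, $s_1 = 1.5$. Hard: $\bar{x}_1 - \bar{x}_2 = 2$, $s_1 = 1.5$.]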

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$

Easy                 Hard
Large effect size    Small effect size
Small variance       Large variance

Choice of endpoint affects power

Bentzen et al, Sem. Rad. Oncol. 2003

What do we need to estimate power before starting a trial?

• Effect size • Variance

Design study

Conduct trial

Effect size

Terminology

• Null hypothesis, H₀ • The hypothesis of no difference
• Alternative hypothesis, H₁ • The hypothesis that there is a difference • The expected difference is used to calculate β
• Significance level, α • The risk of rejecting the null hypothesis if it is true • Normally 5% (p=0.05 is significant)
• Statistical power, 1-β • The probability of rejecting H₀ if H₁ is true • Often 80%, recommended 90% • Depends on α and assumed effect size
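These four quantities are enough for a first sample size estimate. The sketch below uses the normal approximation for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² σ²/Δ², which is an assumption made here for illustration (dedicated tools use the t distribution and give slightly larger numbers for small samples). The function name is hypothetical; the effect size of 2 and standard deviations of 0.75 and 1.5 are borrowed from the "easy" and "hard" examples earlier in the lecture.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, sd, alpha=0.05, power=0.90):
    """Approximate patients per arm to detect a difference in means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power, 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / effect_size**2)

print(n_per_group(2, 0.75))            # small variance: few patients needed
print(n_per_group(2, 1.5))             # same effect, larger variance
print(n_per_group(2, 1.5, power=0.80)) # accepting lower power reduces n
```

Doubling the standard deviation roughly quadruples the required number of patients, which is why reducing variance matters as much as enlarging the effect.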

Tools for estimating power

• Commercial tools

• PASS (www.ncss.com) • Matlab (sampsizepwr in statistics toolbox)
• Free software • http://dceg.cancer.gov/tools/design/POWER • http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
• Online tools • http://www.dssresearch.com/KnowledgeCenter/toolkitcalculators/statisticalpowercalculators.aspx

Comparing proportions

• The variance is given from binomial statistics

$$ \mathrm{var} = np(1 - p) $$

Example: Xerostomia with 3DCRT vs. IMRT

Nutting et al, Lancet Oncol. 2011

• Null hypothesis, H₀ • Alternative hypothesis, H₁ • Significance level, α • Statistical power, 1-β

We are planning a study of independent cases and controls with 1 control(s) per case. Prior data indicate that the failure rate among controls is 0.9 . If the true failure rate for experimental subjects is 0.6 , we will need to study 48 experimental subjects and 48 control subjects to be able to reject the null hypothesis that the failure rates for experimental and control subjects are equal with probability (power) 0.9. The Type I error probability associated with this test of this null hypothesis is 0.05. We will use a continuity-corrected chi-squared statistic or Fisher’s exact test to evaluate this null hypothesis.

Try p0=80%, p1=50%

Try p0=90%, p1=70%
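The sample size statement above can be approximated by hand. The sketch below uses a textbook normal-approximation formula for two proportions with a Fleiss-type continuity correction; this is an assumption, since the program that produced the quoted text may use a slightly different formula, so small differences in the last patient are to be expected. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sample_size_two_proportions(p0, p1, alpha=0.05, power=0.90):
    """Per-group sample size: (uncorrected, continuity-corrected)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    delta = abs(p0 - p1)
    # Binomial variances under the null (pooled) and the alternative hypothesis
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / delta**2
    # Continuity correction inflates the uncorrected estimate
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return n, n_corrected

n, n_corrected = sample_size_two_proportions(0.9, 0.6)
print(round(n, 1), round(n_corrected, 1))
```

For failure rates of 0.9 and 0.6 the corrected estimate lands close to the 48 subjects per group quoted above; the other pairs of rates suggested on the slide can be passed in the same way.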

New challenge: Design trial for non-inferiority of IMRT on local control

• Assumptions • Local control 3DCRT: 75%@3 years (m1=7.23)

• Accrual time: 5 years • Additional FU: 2 years

It is HARD to demonstrate clinically relevant non-inferiority!

Common challenge:

• The number of patients is given

• Example: My PhD is based on 600 NSCLC patients

• How to write the statistical section for • Ethics/IRB • Funding • PhD enrollment

A good solution

Sample size requirements in Cox regression

Rule of thumb: At least 10 events per predictor in the multivariate model

Design process

Clinical question

Relevant Effect size

Feasible sample size

Not feasible sample size

Not feasible sample size

Option I • Convince yourself that the effect size is larger • Convince yourself that the variance is smaller • Use a one-sided test

Option II • Change the design to have larger expected effect • Implement methods to reduce variance

Reduction of variance - pairing

Mean heart dose, standard fractionation

• Impossible to see difference • Wilcoxon rank sum: p=0.17

Mean heart dose, hypofractionation

Data from

Reduction of variance - pairing

Focus on difference

• Clear difference • Wilcoxon signed rank: p<0.0001

Data from

Example

• Heterogeneity leads to loss of power

Trial design requires clinical input

The ‘right’ choice depends on the clinical consequence of a type I or type II error

Endpoints of treatment effect

ESTRO Course: Quantitative Research in Radiation Oncology Maastricht, 8 October 2017

Hans Langendijk Department of Radiation Oncology University Medical Center Groningen GRONINGEN The Netherlands

Introduction

Classification of endpoints

• Continuous endpoints: • E.g.: laboratory value (e.g. Hb)
• Categorical endpoints: • Binary endpoint: e.g. dead or alive, yes or no • Ordinal endpoint (logical order): e.g. toxicity grading • Nominal endpoint (no specific order): e.g. tumour site
• Survival endpoints: • Binary endpoint with time interval/censored data

Types of endpoints • Endpoints related to treatment efficacy • E.g. locoregional tumour control • E.g. response • Endpoints related to adverse effects • Acute and late toxicity • Patient-reported outcome measures (PROM) • Symptoms and quality of life • Endpoints related to disease status • Diagnostic procedures (e.g. metastases present or not)

Endpoints related to efficacy

Endpoints of treatment efficacy

Endpoint                         Definition of event
Local control                    No evidence of disease at the primary site (T-position)
Local failure rate               Recurrence in T-position
Locoregional control             No evidence of disease in T- and N-position
(Overall) survival               Death irrespective of cause
Cause-specific survival          Death of cancer
Disease-free survival            Any recurrence or death from any cause, whichever comes first
Local recurrence-free survival   Local recurrence or death from any cause, whichever comes first
Disease-free rate                Any recurrence
Local relapse free rate          Local recurrence

Bentzen et al. Radiotherapy Oncology 1998; 46: 5-18

Endpoints of treatment efficacy

[Table: for each endpoint (rows: local control, regional control, locoregional control, (overall) survival, cause-specific survival, disease-free survival, disease-free rate, local relapse free rate), whether T-failure, N-failure, M-failure, death (tumour), death (other) and last follow up alive count as an event (E) or are censored (C).]

Bentzen et al. Radiotherapy Oncology 1998; 46: 5-18

Key points

• Consistent use of endpoints for treatment efficacy

• Main questions:

• Which events are relevant for which endpoints? • When should patients be censored?

Endpoints related to toxicity

Toxicity endpoints in radiotherapy

[Figure: timeline with markers at 90 days and 5 years, showing early effects, late effects, very late effects, consequential late effects and surrogate endpoints.]

Toxicity grading systems

Toxicity grading systems • RTOG/EORTC Acute Radiation Morbidity Scoring Criteria • RTOG/EORTC Late Radiation Morbidity Scoring Criteria • SOMA-LENT scoring system • Common Terminology Criteria for Adverse Events (CTCAE v4.0)

CTCAEv4.0 • Descriptive terminology for adverse events of cancer treatment • Features: • Adverse events independent of treatment modality • No difference between acute and late adverse event • Adverse effect may or may not be considered related to the medical treatment/procedure

• E.g. cardiac events after left-sided breast cancer RT • E.g. secondary tumour after CRT Hodgkin lymphoma

CTCAEv4.0

Grading   Descriptions
Grade 1   Mild; asymptomatic or mild symptoms; clinical or diagnostic observations only; intervention not indicated
Grade 2   Moderate; minimal, local or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
Grade 3   Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limited self care ADL
Grade 4   Life-threatening consequences; urgent intervention indicated
Grade 5   Death related to adverse event

Instrumental ADL: • Activities performed by a person who is living independently • During the course of a normal day • Examples: managing money, shopping, telephone use, travel in community, housekeeping, preparing meals, and taking medications correctly • More complex and are learned during teens

Activities of Daily Living (self care ADL) • Activities usually performed in the course of a normal day in a person's life • Examples: eating, toileting, dressing, bathing, or brushing the teeth. Generally these activities are rather simple and learned during childhood.

Grading according to CTCAEv4.0

Simple and relatively straightforward endpoints

EXAMPLE: Anemia

Grading   Descriptions
Grade 1   Hemoglobin (Hgb)
Grade 2   Hgb <10.0 - 8.0 g/dL; <6.2 - 4.9 mmol/L; <100 - 80g/L
Grade 3   Hgb <8.0 g/dL; <4.9 mmol/L; <80 g/L; transfusion indicated
Grade 4   Life-threatening consequences; urgent intervention indicated
Grade 5   Death

Grading according to CTCAEv4.0

Endpoint with potential interobserver variability

EXAMPLE: cheilitis

Grading   Descriptions
Grade 1   Asymptomatic; clinical or diagnostic observations only; intervention not indicated
Grade 2   Moderate symptoms; limiting instrumental ADL
Grade 3   Severe symptoms; limiting self care ADL; intervention indicated
Grade 4   -
Grade 5   -

RTOG/EORTC vs. LENT/SOMA

Van der Laan et al. Int J Radiat Oncol Biol Phys 2008;70:1138-1145

RTOG/EORTC vs. LENT/SOMA

Multivariate NTCP-models for Grade II or more rectal toxicity relative to the V70

Van der Laan et al. Int J Radiat Oncol Biol Phys 2008;70:1138-1145

Key points • Definitions of toxicity grading in different toxicity grading systems do not correspond: • May lead to different frequency measures • May lead to different NTCP-models • May lead to different dose thresholds • May lead to different dose constraints

Specificity versus relevance

[Figure: endpoint types (analytic, objective signs, subjective symptoms, quality of life) placed along two axes, specificity and patient relevance.]

Bentzen et al. Seminars in Radiation Oncology 2003; 13: 189-202

Specificity versus relevance

Salivary dysfunction and xerostomia

[Figure: diagram relating parotid gland dose, submandibular gland dose and minor salivary gland dose to parotid flow and submandibular flow, to xerostomia and sticky saliva, and to quality of life, together with other prognostic factors; axis label: specificity.]

Beetz et al. Radiotherapy Oncology 2012

Clinical relevance

One factor model 1

Multi- factor model 2

Independent variables

Wilks' Lambda

p-value

Wilks' Lambda

p-value

RTOG late toxicity RTOG xerostomia

0.897 0.923 0.798 0.922 0.934 0.956 0.973 0.950 0.956 0.915 0.965

p<0.001 p=0.019 p<0.001 p=0.018

0.948 0.965 0.859 0.974 0.971 0.984 0.973 0.940 0.983 0.969 0.945

p=0.003

RTOG mucosal RTOG swallowing RTOG subcutanuous

ns

p<0.001

ns ns ns

RTOG larynx RTOG skin

ns ns

Other variables Sex

ns

ns

Age

p<0.001

p=0.001

UICC stage

ns

ns ns

Primary tumor site Treatment modality

p=0.022

ns

p=0.002

Langendijk et al. J Clin Oncol 2008;26:3770-3776

Clinical relevance xerostomia

                          Toxicity grading
Quality of life scale     Grade 0   Grade 1   Grade 2   Grade 3-4   P-value
Physical functioning        81        82        75         71       P-0.001
Role functioning            74        74        67         67       P=0.044
Emotional functioning       84        80        74         69       P=0.001
Social functioning          88        85        79         64       P<0.001
Global quality of life      73        75        65         55       P<0.001
Fatigue                     25        28        36         42       P<0.001

Legend (cell shading): little effect, moderate effect, strong effect

Langendijk et al. J Clin Oncol 2008;26:3770-3776

Clinical relevance dysphagia

Toxicity grading

Quality of life scale

P-value

Grade 0

Grade 1

Grade 2

Grade 3-4

Physical functioning

81 4

82

75 2

71 68

P-0.001 <

Role functioning

74 7

74 6

67 2

67 55

P=0.044 < 01

Emotional functioning

84

80

74 0

69

P=0.001 <

Social functioning

88 91

85 3

79 3

64 8

P<0.001

Global quality of life

73 8

75 2

65 3

55 6

P<0.001

Fatigue

25 3

28 9

36 41

42 3

P<0.001

Little effect Moderate effect Strong effect

Langendijk et al. J Clin Oncol 2008;26:3770-3776

Key points

• Choice of endpoints – Strongly depends on research question: • Biological modelling: more specific • Clinical relevance: less specific
• Relevance of endpoints not always acknowledged – E.g. rectal bleeding versus stool frequency – E.g. xerostomia versus dysphagia
• Even grade I toxicity may have impact on QoL

Composed endpoints

• Endpoints that include different toxicity states in one endpoint • EXAMPLE: RTOG late small/large intestine

Grade 1   Mild diarrhea; mild cramping; bowel movement 5 times daily; slight rectal discharge or bleeding
Grade 2   Moderate diarrhea and colic; bowel movement >5 times daily; excessive rectal mucus or intermittent bleeding
Grade 3   Obstruction or bleeding requiring surgery
Grade 4   Necrosis/perforation; fistula
Grade 5   Death

• PROBLEM: different endpoints in corresponding grade may reflect different biological mechanisms and/or organs at risk

Composed endpoints

Peeters et al. Int J Radiat Oncol Biol Phys 2006;66:11-19

• Endpoints that include different toxicity states in one endpoint • EXAMPLE: CTCAEv4.0 Salivary duct inflammation

Grade 1   Slightly thickened saliva; slightly altered taste (e.g., metallic)
Grade 2   Thick, ropy, sticky saliva; markedly altered taste; alteration in diet indicated; secretion-induced symptoms; limiting instrumental ADL
Grade 3   Acute salivary gland necrosis; severe secretion-induced symptoms (e.g., thick saliva/oral secretions or gagging); tube feeding or TPN indicated; limiting self care ADL; disabling
Grade 4   Life-threatening consequences; urgent intervention indicated

Composed endpoints

Patient-rated STICKY SALIVA versus patient-rated ALTERED TASTE

               No       A bit    Quite a lot   Very much
No            30.8%     8.2%        2.4%         1.9%
A bit         13.3%     9.7%        5.8%         1.2%
Quite a lot    3.9%     5.3%        7.5%         1.5%
Very much      1.7%     2.4%        2.2%         2.2%

Overall agreement: 75.4% (2 categories)
Overall agreement: 50.2% (4 categories)

Source: Prospective Data Registration Program Head and Neck UMCG

Early side effects

Frequency measures • Peak prevalence • The proportion of cases with an event in a given population at a specific time point • Period prevalence • The proportion of cases with an event in a given population in a certain period of time • Incidence • The proportion of NEW cases with an event in a given population in a certain period of time • Cumulative incidence • = incidence with censored data

Acute toxicity scoring • Incidence grade 4: YES • Prevalence grade 4: YES (W7)

Acute toxicity scoring • Incidence grade 4: YES • Prevalence grade 4: YES (W4, W5, W6, W7, W8)

Acute toxicity scoring • Incidence grade 4: YES • Prevalence grade 4: YES (W4, W7, W8)

Acute toxicity scoring • Incidence grade 4: NO • Prevalence grade 4: YES (T0, W1, W2)

Key points

• Peak prevalence at different time points (e.g. weekly) during RT provides most accurate information on acute side effects • Essential information may be lost by using incidence and/or period prevalence • Assessment of baseline “toxicity” • Modelling studies on risk factors in general: • Include baseline “toxicity” as potential risk factor, OR • Exclude patients with baseline “toxicity”

Late side effects

Cumulative incidence

• Each patient that ever had the relevant endpoint is considered an event • Even if the event is not present anymore

Cumulative incidence/prevalence

Vergeer et al. Int J Radiat Oncol Biol Phys 2010;78:682-688

Patterns of toxicity

• Irreversible persistent • Remains at same level
• Irreversible progressive • Already detectable or clinically manifest with further progression into higher grades
• Transient persistent • Partial recovery after peak severity
• Complete recovery
• Intermittent

Surrogate endpoints

• Definition surrogate marker: • Measurement of physical sign • Substitute for a clinically meaningful endpoint • Predict the effect of therapy • Example: • Tumour shrinkage as surrogate for survival
• Definition biomarker • Measurement that reflects the current activity of a disease process

Example

• Confluent mucositis

• Head and neck radiotherapy • Good indicator for overall acute morbidity (biomarker) • Poor indicator for late effect (bad surrogate marker for late effects)

Example of confluent mucositis (Grade III acute mucosal reaction)

Example

[Figure: acute toxicity, proportion of patients (0% to 100%) with no mucositis, patchy mucositis and confluent mucositis, CHART versus conventional.]

Late toxicity (observed/expected)

Toxicity                                   CHART   Conventional
Superficial and deep mucosal ulceration     0.64       1.53

CHART results in significant increase in acute mucosal reactions but protects against late mucosal reactions: Acute toxicity is NO surrogate marker for late mucosal reactions

Dische et al. Radiotherapy Oncology 1997; 44: 123-136

Clinical trials in radiation oncology

ESTRO Course: Quantitative Research in Radiation Oncology Maastricht, 8 November 2017

Hans Langendijk Department of Radiation Oncology University Medical Center Groningen GRONINGEN The Netherlands

Study designs

All studies

Descriptive

Analytic

• Descriptive studies

– What is happening in a population? • e.g. the prevalence or incidence of a group
– Investigate feasibility of modelling studies • e.g. power analysis

Descriptive studies (registry)

[Figure: bar chart, y-axis 0 to 10000, x-axis 1990 to 2015 in five-year steps; bar labels as extracted: 9546, 8874, 7458, 6354, 6739, 6684.]

Lung cancer incidence in the Netherlands

Source: National Cancer Registry (The Netherlands)

Study designs

All studies

Descriptive

Analytic

Survey (cross sectional)

Qualitative

Qualitative study

[Figure: patient timelines along a time axis with four events marked.]

Incidence rate: 4/16 (25%)

Cross sectional study

[Figure: patient timelines along a time axis with four events marked.]

Point prevalence: 3/12 (25%)

Cross sectional study

[Figure: patient timelines along a time axis with four events marked.]

Period prevalence: 4/12 (33%)

Study designs

All studies

Descriptive

Analytic

• Analytic studies – Attempts to quantify the relationship between factors: • Intervention • Exposure • Outcome

[Figure: study design tree. Analytic studies split into observational analytic and experimental; experimental leads to randomized controlled trial.]

Randomized controlled trial

[Figure: random allocation (R) to a standard arm and an experimental arm; events are marked and outcomes for both groups are measured.]

Randomized controlled trial

• Advantages

– Prevent bias between treatment arms • Similar settings in both arms ! – Prospective assessment of predictors and endpoints – Quality assurance – May allow for identifying predictive factors

Prognostic vs. predictive factor

• Prognostic factor: – Factor that is associated with outcome
• Predictive factor: – Factor that predicts whether certain treatment approach is beneficial

EXAMPLE: predictive factor

The effect of age on the added value of concomitant chemoradiation in head and neck cancer

Pignol, et al. Radiother Oncol 2011

Prognostic vs. predictive factor

• Prognostic factor: – Factor that is associated with outcome
• Predictive factor: – Factor that predicts whether certain treatment approach is beneficial • Obtained from RCTs (hypothesis generating: power!) • Preferably obtained from meta-analysis – Factor that predicts whether certain prognostic factor is associated with endpoint • May negatively affect power and required number of patients in prognostic factor studies

EXAMPLE: predictive factor

The effect of the mean parotid dose on RTOG xerostomia grade 2 or more (1 year) depends on RT technique (unilateral versus bilateral RT)

[Figure: NTCP (%), axis 0% to 100%, plotted against mean dose to the parotid glands, axis 0 to 70, with separate curves for bilateral irradiation and unilateral irradiation.]

Source: Prospective Data Registration program UMCG
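One way to read this example: if the dose effect depends on technique, a regression model needs a dose-by-technique interaction term. The sketch below assumes a logistic NTCP model. The function name `ntcp` and every coefficient are hypothetical placeholders chosen for illustration; they are not fitted values from the UMCG registry.

```python
import math

def ntcp(mean_dose, bilateral,
         b0=-3.0, b_dose=0.05, b_bilateral=0.2, b_interaction=0.04):
    """Logistic NTCP with a dose-by-technique interaction.

    All coefficients are hypothetical placeholders, not fitted values.
    """
    x = 1.0 if bilateral else 0.0
    s = (b0 + b_dose * mean_dose + b_bilateral * x
         + b_interaction * mean_dose * x)
    return 1.0 / (1.0 + math.exp(-s))

def dose_slope_on_logit(bilateral, d_low=10.0, d_high=50.0):
    """Change in log-odds per unit of mean dose for one technique."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return (logit(ntcp(d_high, bilateral)) - logit(ntcp(d_low, bilateral))) / (d_high - d_low)

for dose in (10, 30, 50):
    print(dose, round(ntcp(dose, False), 3), round(ntcp(dose, True), 3))
```

With a non-zero interaction coefficient the dose slope differs between techniques, which is what makes technique a modifier of the dose effect rather than only a prognostic factor.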

Randomized controlled trial

• Advantages

– Prevent bias between treatment arms
  • Similar settings in both arms!
– Prospective assessment of predictors and endpoints
– Quality assurance
– May allow for identifying predictive factors

• Disadvantage

– Generalisability
  • Very strict entry criteria
  • Burdensome assessments

Study designs

All studies

Descriptive

Analytic

• Modelling studies are typically observational analytic studies
  – Retrospective cohort study
  – Prospective cohort study
  – Cross sectional study
  – Case control study

Observational analytic

Experimental

Randomized controlled trial

Cohort study

Cross sectional (analytic) Case control study

Retrospective cohort study

Past

Present

Exposed

Incidence: 6/14

Association measure? = Relative risk = (6/14) / (3/15) = 2.14

Unexposed

Population without event

Incidence: 3/15
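The quantity (6/14) / (3/15) on this slide is a ratio of two cumulative incidences, i.e. a risk ratio. The odds ratio from the same counts is a different number, because odds compare events with non-events rather than with the group total. A minimal sketch using the slide counts; the function names are illustrative.

```python
def risk_ratio(events_exp, n_exp, events_unexp, n_unexp):
    """Ratio of cumulative incidences (exposed / unexposed)."""
    return (events_exp / n_exp) / (events_unexp / n_unexp)

def odds_ratio(events_exp, n_exp, events_unexp, n_unexp):
    """Ratio of odds, where odds = events / non-events."""
    odds_exp = events_exp / (n_exp - events_exp)
    odds_unexp = events_unexp / (n_unexp - events_unexp)
    return odds_exp / odds_unexp

rr = risk_ratio(6, 14, 3, 15)     # (6/14) / (3/15)
or_ = odds_ratio(6, 14, 3, 15)    # (6/8) / (3/12)

print(f"risk ratio = {rr:.2f}")   # 2.14
print(f"odds ratio = {or_:.2f}")  # 3.00
```

With incidences this high (43% and 20%) the odds ratio clearly overstates the risk ratio; the two only agree closely when the event is rare.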

Retrospective cohort study

• Most common type:

– Single center retrospective study

• Typical design:

– Identification from hospital records
– Fulfill predefined eligibility criteria
– Specific treatment period
– Followed over time for a certain period of time

• Advantages:

– Simplicity and high feasibility
– Relatively cheap (data available)

Retrospective cohort study

• Disadvantages:

– Correct identification of patients (retrospective)
  • Incorrect recording
  • Missing data
– Assessment of outcome:
  • Straightforward endpoints → e.g. survival
    – Missing data
  • Difficult retrospective endpoints:
    – Functional status (e.g. performance status): “is OK”
    – Toxicity
– Limited numbers (low power)

Missing data and bias

EXAMPLE

Patient chart (retroperitoneal sarcoma): OK. No changes.

6 months earlier:

OK. No changes.

CTCAE v4.0: Diarrhea

• Grade 1: Increase of <4 stools per day over baseline; mild increase in ostomy output compared to baseline
• Grade 2: Increase of 4 - 6 stools per day over baseline; moderate increase in ostomy output compared to baseline
• Grade 3: Increase of >=7 stools per day over baseline; incontinence; hospitalization indicated; severe increase in ostomy output compared to baseline; limiting self care ADL
• Grade 4: Life-threatening consequences; urgent intervention indicated
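The grading criteria show why a chart entry such as "OK. No changes." cannot be scored afterwards: grading needs a recorded stool count relative to baseline. A minimal sketch of the stool-count criterion only, assuming whole-number counts and assuming that no increase maps to grade 0. The function name is hypothetical; the other criteria (ostomy output, incontinence, hospitalization, ADL) and Grade 4 cannot be derived from a count and are deliberately not covered.

```python
def diarrhea_grade_from_stool_increase(increase_per_day):
    """Grade by stool-count criterion alone (CTCAE v4.0 thresholds from the slide).

    increase_per_day: whole-number increase in stools per day over baseline.
    Returns 0 for no increase (an assumption; not stated on the slide).
    Grade 4 is never returned because it is not defined by stool count.
    """
    if increase_per_day < 0:
        raise ValueError("increase over baseline cannot be negative")
    if increase_per_day == 0:
        return 0
    if increase_per_day < 4:
        return 1
    if increase_per_day <= 6:
        return 2
    return 3

for increase in (0, 2, 5, 8):
    print(increase, "->", diarrhea_grade_from_stool_increase(increase))
```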

Retrospective cohort study

Past

Present

Exposed

Incidence: 6/14

Association measure? = Relative risk = (6/14) / (3/15) = 2.14

Unexposed

Population without event

Incidence: 3/15

Prospective cohort study

Present

Future

Exposed

Incidence: 6/14

Association measure? = Relative risk = (6/14) / (3/15) = 2.14

Unexposed

Population without event

Incidence: 3/15

Prospective cohort study

• Advantages:

– Better control of inclusion and exclusion criteria
– Predefined and consistent definitions of candidate predictors
– Predefined and consistent assessment of endpoints:
  • Fixed time points including baseline assessment
  • Additional guidelines/training for “difficult” assessments
  • Additional diagnostic procedures
– Predefined guidelines for radiotherapy:
  • e.g. OAR delineation / fractionation / treatment planning
– Permits quality assurance programs

Rapid Learning Health Care

Knowledge stage

Multivariable NTCP model

Most relevant DVH parameters

Data stage

Prospective data registration

IMRT dose optimisation

Evaluation stage

Application stage

IMRT photons

Based on: Lambin, et al. Acta Oncol 2015

Single RCT

• Within framework of RLHC system

Single RCT: EXAMPLE

• Double blind RCT: standard IMRT versus stem cell sparing IMRT

Case control study

Past

Present

Exposed

Non-exposed

Non-cases

Exposed

Population

Non-exposed

Cases

Case control study

• Compare patients who have the outcome of interest (cases) with patients who do not have the outcome (non cases)
• Compare how frequently the exposure to a risk factor is present in each group
• Objective:
  – To retrospectively determine the exposure to the risk factor of interest from each of the two groups of individuals: cases and controls

Case control study: EXAMPLE

                             Major Coronary Events
Breast radiotherapy          Cases (n=963)    Controls (n=1205)    Total number
Left-sided breast cancer     543 (a)          601 (b)              1144
Right-sided breast cancer    420 (c)          604 (d)              1024
Total number                 963              1205                 2168

• Association can be tested by the Odds Ratio (OR): OR = ad / bc = (543 × 604) / (601 × 420) = 1.30
• Rare disease assumption (<5%): the OR then approximates the relative risk
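The odds ratio from the 2x2 table can be checked directly, and a confidence interval gives a sense of its uncertainty. A minimal sketch, assuming the standard log-OR (Woolf) approximation for the 95% interval. This is a crude, unadjusted calculation on the four cell counts and is not the analysis reported by Darby et al.; the function name is illustrative.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate CI on the log scale.

    a: exposed cases,   b: exposed controls
    c: unexposed cases, d: unexposed controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Cell counts from the table: left-sided = exposed, right-sided = unexposed
or_value, ci_low, ci_high = odds_ratio_with_ci(543, 601, 420, 604)
print(f"OR = {or_value:.2f}, approx. 95% CI {ci_low:.2f} to {ci_high:.2f}")
```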

Darby, et al. New Engl J Med 2013: 987-998

Case control study Advantages

• Good for studying rare conditions or diseases
• Less time needed to conduct the study because the condition or disease has already occurred
• Multiple risk factors
• Useful as initial studies to establish an association
• Can answer questions that could not be answered through other study designs

Case control study Disadvantages

• Retrospective studies
  – Problems with data quality
  – Risk exposure assessment: risk of recall bias
• Not good for evaluating diagnostic tests
  – Clear that the cases have the condition and the controls do not
• Difficult to find a suitable control group
• Incidence (absolute risk) cannot be obtained

The importance of quality assurance of radiation

RCT Chemoradiation vs. chemoradiation + tirapazamine

Rischin, et al. J Clin Oncol 2010; 28: 2989-2995

Sources of non-compliance

• GTV not properly defined and therefore inadequate dose coverage • Treatment planning itself was inappropriate and therefore inadequate dose to targets

• Inappropriate prescribed dose • Protracted overall treatment time

Peters, et al. J Clin Oncol 2010; 28: 2996-3001

Risks of non-compliance

Peters, et al. J Clin Oncol 2010; 28: 2996-3001

Risks of non-compliance

Enrolment bracket     Number of patients   Number with major adverse impact   Percent
1-4 (26 centers)      57                   17                                 29.8%
5-9 (22 centers)      130                  28                                 21.5%
10-19 (22 centers)    279                  33                                 11.8%
≥ 20 (11 centers)     352                  19                                 5.4%

Better few centres with many patients, than many centres with few patients
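The percentages in the table follow directly from the two count columns, and the downward trend with enrolment is what supports the statement above. A minimal sketch that recomputes them from the slide counts; the variable and function names are illustrative.

```python
# (enrolment bracket, number of patients, number with major adverse impact)
brackets = [
    ("1-4", 57, 17),
    ("5-9", 130, 28),
    ("10-19", 279, 33),
    ("20 or more", 352, 19),
]

def percent_with_impact(n_patients, n_impact):
    """Share of patients whose deviation had a major adverse impact."""
    return 100.0 * n_impact / n_patients

percentages = [percent_with_impact(n, k) for _, n, k in brackets]

for (label, n, k), pct in zip(brackets, percentages):
    print(f"{label:>10}: {k}/{n} = {pct:.1f}%")
```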

Peters, et al. J Clin Oncol 2010; 28: 2996-3001

RCT Chemoradiation vs. chemoradiation + tirapazamine

Rischin, et al. J Clin Oncol 2010; 28: 2989-2995

Guidelines: Do they work?

Unilateral or bilateral elective irradiation

Primary tumour site is: - Floor of mouth - Lateral tongue - Retromolar trigonum

NO

- Cheek - Tonsil

YES

Tumour extension across the midline?

YES

NO

Ipsilateral pN0 neck

Ipsilateral pN+ neck

Contralateral pN0 neck

Contralateral pN+ neck

Contralateral cN0 neck

Contralateral pN0 neck

Tumour extension less than 1 cm from the midline?

NO

NO

YES

YES

Bilateral irradiation of the neck is mandatory

Unilateral irradiation of the neck is mandatory

Guidelines: Do they work?

Unilateral or bilateral elective irradiation

Oral cavity carcinoma

Rows: stage and primary site. Columns: ipsilateral neck and contralateral neck, if indicated (see figure x); for each side Level Ia, Ib, II, III, IV, V, VI and RP.

pN0 and pN1

R1

R1

pN2a-pN2b and pN3

R1

R1

pN2c

R1

R5

R1

R5

Oropharyngeal carcinoma

Rows: stage and primary site. Columns: ipsilateral neck and contralateral neck, if indicated (see figure x); for each side Level Ia, Ib, II, III, IV, V, VI and RP.

pN0 and pN1

R2

pN2a-pN2b and pN3 pN2c

R5

R5

R5

R5

Hypopharyngeal carcinoma

Rows: stage and primary site. Columns: ipsilateral neck and contralateral neck, if indicated (see figure x); for each side Level Ia, Ib, II, III, IV, V, VI and RP.

pN0

R3 R2

R3 R2

pN1-pN2a-pN2b

R3

R3 R2

pN2c

R5

R5 R3

R5

R5

pN3

R3

Laryngeal carcinoma

Rows: stage and primary site. Columns: ipsilateral neck and contralateral neck, if indicated (see figure x); for each side Level Ia, Ib, II, III, IV, V, VI and RP.

pN0 and pN1 pN2a - pN2b

R4

R4

R4

R4

pN2c

R5

R5 R4

R5

R5 R4

pN3

R4

R4

R1: include level Ia only in case of anterior tongue or anterior floor of mouth extension.
R2: include retropharyngeal nodes for posterior pharyngeal wall tumour extension.
R3: include level VI in case of extension to the apex of the piriform sinus or esophageal extension.
R4: include level VI in case of trans- or subglottic extension.
R5: according to N-stage on each side of the neck.
RP: retropharyngeal nodes.

NOTE: LEVELS ADJACENT TO POSITIVE LYMPH NODE AREAS SHOULD ALWAYS BE INCLUDED IN THE ELECTIVE CTV.
