
laboratory reproducibility of a method as measured by the relative standard deviation for reproducibility [RSD(R)];
2. provides or confirms the accuracy (trueness; when a certified reference material is used) and repeatability (precision) characteristics of a method;
3. determines if the instructions for a method are clear and can be followed by analysts who are not affiliated with the method developer; and
4. determines that the method has been designed so that the operating parameters that might affect the performance of the method are truly known and under control (robustness).
Most of a method evaluation can be completed in a single laboratory. For example, accuracy, repeatability, and ruggedness can be determined in just one laboratory; AOAC has a well-described procedure, the Youden ruggedness procedure (6), to determine the ruggedness of a candidate method. (Ruggedness can be determined in a single laboratory. Robustness is demonstrated in a collaborative study.) Method instruction clarity could be determined using an established review procedure. Interlaboratory reproducibility is the only parameter that requires collaborators.
The obvious question to ask when assessing the traditional collaborative study design is: Are eight valid data sets really required? Clearly, 10 valid data sets are better than eight, and 12 better than 10, but how many valid data sets are really needed to satisfy the purposes of a collaborative study to quantify "reproducibility"? It is mainly a question of the confidence associated with the calculated RSD(R). It may not be immediately obvious, but organizations such as AOAC indirectly establish a confidence interval around the calculated RSD(R) by the simple act of requiring a minimum number of data sets. This has been the paradigm of method validation for more than 50 years. (AOAC has been operating for over 125 years, but for much of its history there was not an agreed-upon minimum number of valid data sets. That didn't happen until the 1980s.)
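To make that implied confidence concrete, the sketch below is an illustration only, not an AOAC calculation: it treats the calculated RSD(R) as a standard deviation with roughly one degree of freedom fewer than the number of valid data sets (a simplification that ignores the within-laboratory component of a real collaborative-study design) and shows how the usual chi-square interval narrows as laboratories are added.

```python
from scipy.stats import chi2

def rsd_r_interval(rsd_r, n_labs, confidence=0.95):
    """Approximate confidence interval for a reproducibility RSD.

    Illustration only: treats RSD(R) as a standard deviation with
    n_labs - 1 degrees of freedom, a deliberate simplification of a
    real collaborative-study analysis.
    """
    dof = n_labs - 1
    alpha = 1.0 - confidence
    lower = rsd_r * (dof / chi2.ppf(1.0 - alpha / 2.0, dof)) ** 0.5
    upper = rsd_r * (dof / chi2.ppf(alpha / 2.0, dof)) ** 0.5
    return lower, upper

# A calculated RSD(R) of 6% looks quite different with 8 vs. 16 data sets.
for labs in (8, 12, 16):
    lo, hi = rsd_r_interval(6.0, labs)
    print(f"{labs} labs: RSD(R) = 6.0%, 95% CI approx. {lo:.1f}% to {hi:.1f}%")
```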
There is another paradigm that is generally called "fitness-for-purpose." Instead of forcing method developers and users to accept a confidence level derived as a consequence of the minimum number of collaborators, it is also possible to allow method developers to determine the appropriate confidence level and then find the necessary number of collaborators. The key to a fitness-for-purpose validation model is that a method developer would be required to report the target confidence interval. A target interval is not normally calculated or reported because there is an implied target interval with the current eight-laboratory minimum collaborative study model.
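Under a fitness-for-purpose model the calculation simply runs in the other direction: the developer states how tight the interval around RSD(R) must be and then solves for the number of collaborators. The sketch below uses the same chi-square approximation as the earlier illustration; the target and the helper name are assumptions introduced here for the example, not part of any AOAC procedure.

```python
from scipy.stats import chi2

def labs_needed(target_upper_factor, confidence=0.95, max_labs=50):
    """Smallest number of valid data sets for which the upper confidence
    bound on RSD(R) stays within target_upper_factor times the estimate.

    Same simplification as before: degrees of freedom are taken as the
    number of data sets minus one.
    """
    alpha = 1.0 - confidence
    for n_labs in range(3, max_labs + 1):
        dof = n_labs - 1
        upper_factor = (dof / chi2.ppf(alpha / 2.0, dof)) ** 0.5
        if upper_factor <= target_upper_factor:
            return n_labs
    return None  # target not reachable within max_labs

# How many data sets keep the upper 95% bound within 50% of the estimate?
print(labs_needed(1.5))
# A tighter 25% target needs substantially more collaborators.
print(labs_needed(1.25))
```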
A fitness-for-purpose model has two advantages: 1. potential method users can decide if the reported reproducibility and confidence level are good enough for their purposes, much as a potential user can now assess the recovery, accuracy, LOQ, and range of applicability; and 2. in some cases, notably government-sponsored validation projects, the number of data sets far exceeds the eight-laboratory minimum. In these admittedly rare cases, the estimate of the reproducibility is known with much greater confidence, and this could be reported to potential users.
There is a new benefit to the fitness-for-purpose model in that the acceptance criteria for the method validation can be clearly and quantitatively stated using target measurement uncertainty. A paper by Weitzel and Johnson (7) describes a process using decision rules and probability to determine a target measurement uncertainty that is then used to set the acceptance criteria for a method validation. Target measurement uncertainty is defined as "measurement uncertainty specified as an upper limit and decided on the basis of the intended use of measurement results" (8). The target measurement uncertainty can be used to decide appropriate values for validation criteria, such as bias, precision, LOD, and LOQ, thus directly linking the SMPR to fitness-for-purpose.
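One common way to apply such a decision rule, shown below as a minimal sketch rather than a reproduction of the Weitzel and Johnson procedure, is to combine a bias estimate and a precision estimate in quadrature, expand with a coverage factor, and compare the result against the stated target measurement uncertainty; the function and parameter names here are hypothetical.

```python
def meets_target_uncertainty(bias, precision_sd, target_mu, coverage_factor=2.0):
    """Check a candidate method against a target measurement uncertainty.

    Sketch only: combines a bias estimate and a precision standard
    deviation in quadrature, expands with a coverage factor, and compares
    the result with target_mu, the upper limit set by intended use.
    """
    combined = (bias ** 2 + precision_sd ** 2) ** 0.5
    expanded = coverage_factor * combined
    return expanded <= target_mu

# Hypothetical numbers, all in the same units as the measurand:
print(meets_target_uncertainty(bias=0.3, precision_sd=0.5, target_mu=1.5))  # True
print(meets_target_uncertainty(bias=0.8, precision_sd=0.9, target_mu=1.5))  # False
```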
Proficiency Testing
Proficiency testing (PT) is a widely recognized practice for monitoring analytical performance, and in some ways the PT process is very similar to the process of a collaborative study. Test materials are prepared and distributed by a program/project coordinator. Each participating laboratory analyzes a common set of blind test samples and reports its results back to the coordinator. The coordinator then analyzes the data. Of course, there are several differences between PT programs and collaborative studies: 1. the aim of PT is to assess the performance of the laboratory, not the method; 2. laboratories may use any appropriate method they choose for PT; and 3. the data are analyzed to determine how the individual laboratory performs in relation to the whole group of laboratories (as sketched below).
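That within-group comparison is conventionally summarized with a z-score. The short sketch below assumes the coordinator supplies an assigned value and a standard deviation for proficiency assessment; the specific numbers are hypothetical.

```python
def z_score(result, assigned_value, sigma_pt):
    """Classic proficiency-testing z-score: how many standard deviations
    a laboratory's result sits from the assigned value."""
    return (result - assigned_value) / sigma_pt

# Conventionally, |z| <= 2 is satisfactory and |z| >= 3 is unsatisfactory.
z = z_score(result=10.8, assigned_value=10.0, sigma_pt=0.5)
print(f"z = {z:.1f}")
```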
For many years, it has been strictly forbidden to even suggest that PT data might be used for the purposes of evaluating a method. However, in 2010, Ellison et al. published a paper proposing that there might be a role for proficiency testing data in method validation under certain conditions. They concluded that a properly implemented PT program provides very similar information to a traditional collaborative study, and should be given equal weight in appraising methods for suitability (9).