Monitoring skeletal changes by radiological techniques

JOURNAL OF BONE AND MINERAL RESEARCH Volume 14, Number 11, 1999 Blackwell Science, Inc. 1999 American Society for Bone and Mineral Research
ABSTRACT

The longitudinal sensitivity of a technique, i.e., its ability to monitor skeletal changes, is affected by two parameters: the long-term precision error (PE) and the subject group-specific response rate (i.e., annual rates of change). Both need to be considered to avoid misinterpretation of measured changes. A new concept to aid clinical decision making for longitudinal measurements is proposed which is based on three types of measures: (1) criteria for detecting changes: the “least significant change” (LSC) is the smallest change to be considered statistically significant, but for certain clinical questions a smaller margin, the “trend assessment margin” (TAM), can be sufficient for decision making; (2) follow-up time intervals: for follow-up exams the patient should be called in at about the time interval specified by the (population specific) “monitoring time interval” (MTI) or, about one-third of the time earlier, after the “trend assessment interval” (TAI), depending on whether the decision can be based on the LSC or the TAM; and (3) the standard precision error (stdPE): the smaller the stdPE, the more sensitive the technique to monitor skeletal changes. Together, these three measures yield a good characterization of a technique’s ability to monitor skeletal changes. Compared with previous concepts, the proposed standardization by a response ratio instead of measures of spread or response rates makes the stdPE substantially less subject group dependent. It allows comparison of stdPE across different studies and could replace the misleading concept of expressing precision as a coefficient of variation. Application of this concept should facilitate the interpretation of measured skeletal changes. (J Bone Miner Res 1999;14:1952–1962)

INTRODUCTION
For the evaluation of disease progression, response to treatment, and the estimation of fracture risk it is important to interpret measured changes in bone mineral density (BMD) and other skeletal parameters in a sensible fashion. Bone densitometry is an accurate and precise method, but due to limitations of the technique, the measured results only approximate the true changes. The longitudinal sensitivity of a technique, defined as the ability to monitor changes in skeletal status,(1) is limited by technique imprecision. To allow a comparison of the imprecision of techniques specified in different units, precision errors are commonly reported on a percentage basis, calculated, e.g., as a coefficient of variation (CV) of repeated measurements. However, it is known that the apparent comparability of percentage units can be misleading and therefore different ways of standardizing precision errors have been proposed(2–8): division of precision errors by population variance, 10–90% range, normal age-related decline per annum, etc. To date, none of these methods for standardization has been investigated thoroughly, compared with other approaches, let alone been unanimously adopted. There is a lack of standardization of the methods of how to standardize precision. To judge the advantages and limitations of the competing approaches, one needs to define the problems and point out the goals for standardization: What sense is there in standardization of precision errors?

When evaluating longitudinal changes over time, the following three issues frequently need to be addressed:

● The interpretation of measured changes: Are the changes calculated meaningful and clinically relevant? Random fluctuations are sometimes mistaken for true changes.
● Scheduling of a follow-up visit to determine rates of change: What time interval is required to allow accurate assessment of response to treatment or progression of disease? Follow-up measurements performed too early do not allow one to judge the significance of measured changes.
● Comparison of techniques: Which technique is suited best to detect changes accurately and quickly? A confusing number of insufficient methods for standardizing precision errors have been used.

To motivate the concept proposed, a few explanations regarding the difficulty to answer the third question will be given. The simple answer that the technique with the best reproducibility or smallest precision error would be best suited for monitoring changes over time is flawed. First of all, it is obvious that precision errors of different measurement parameters cannot be directly compared when specified in absolute units (e.g., in g/cm² vs. mg/cm³ vs. dB/MHz vs. m/s). Expressing the precision error on a percentage basis is quite popular but it does not solve this problem. Quite the contrary: superficially implying that the results are now readily comparable, this common percentage unit can be highly misleading; just change the definition of a parameter (e.g., by adding an offset) and any desired level of percentage precision can be achieved. For example, original directly calculated broadband ultrasound attenuation (BUA) values typically range between 30 dB/MHz and 80 dB/MHz and a precision error of, e.g., 2 dB/MHz would yield percentage precision errors of 2.5% (2/80 × 100) to 6.7% (2/30 × 100). On some quantitative ultrasound (QUS) devices, the original values are subjected to an offset. If this were, for simplicity’s sake, taken to be 50, the resulting range would be 80–130 dB/MHz. This simple manipulation would reduce the precision error to a range of 1.5% (2/130 × 100) to 2.5% (2/80 × 100), without any true improvement in longitudinal sensitivity. These examples also illustrate a second problem, that percentage precision errors appear to be better (i.e., lower) in healthy subjects, simply because of the larger denominator (e.g., 2.5% vs. 6.7%). Again, this does not reflect any true difference in longitudinal sensitivity.

More fundamentally, however, one has to recognize that precision errors by themselves tell little about the ability of a technique to monitor changes: to characterize longitudinal sensitivity the responsiveness of the monitoring parameter needs to be considered as well. QUS parameters represent good examples to demonstrate what happens if this aspect is neglected. Changes in speed of sound (SOS) are typically in the range of only a few meters per second per annum out of perhaps 1500–2000 m/s, reflecting responses of less than 1% per annum; paralleling changes in BUA amount to a few decibels per megahertz per annum out of perhaps 50–100 dB/MHz, reflecting responses of several percentage points. Not surprisingly, the percentage precision error for SOS is typically smaller than that of BUA by at least an order of magnitude. This does not, however, necessarily represent an equivalent advantage in the ability of SOS to monitor changes because the lower responsiveness of SOS has not been taken into account. Dividing percentage errors by some measure of responsiveness to obtain a measure of longitudinal sensitivity called standardized precision error thus appears to be useful, and various methods have been proposed. So, what is the problem with this?

One issue is that different measures of responsiveness result in differently standardized precision errors, most of which are still substantially biased, e.g., affected by the cohort effect. Let’s take, just as a typical example, one of the common methods to standardize precision errors, i.e., division by the SD of the readings of the subject group. In case of a narrowly defined subject group (e.g., young normals) the resulting “standardized precision error” will be larger than that calculated for a mixed group of healthy and osteoporotic individuals even if the technique has identical precision (expressed in absolute units) for healthy and osteoporotic subjects. Sample sizes for precision studies are typically quite small. Subject selection therefore can easily introduce a substantial bias. In fact, one could easily “improve” the standardized precision by simply adding a few more extreme cases (very healthy or very osteoporotic) to the subject group. As long as one limits the comparison of precision of techniques to measurements all obtained in the same subject group, this kind of standardization is helpful. But the moment this “standardized” precision is regarded as a universal characteristic that describes the ability of a technique to monitor changes in any subject group, there is room for misinterpretation. Clearly, standardized precision errors calculated from different subject groups cannot be directly compared; they have in fact not been standardized.

What needs to be done to resolve this problem? Four requirements for useful ways of standardization can be named: the measure(s) should reflect both imprecision and responsiveness; the measure(s) should make it possible to directly compare the performance of techniques tested in different studies; the measure(s) should have an intuitive clinical meaning; and the measure(s) should be as insensitive to subject selection bias as possible.

Responsiveness varies for different genders, age groups, and therapies, and reproducibility may also differ. Thus, one needs to investigate those different cohorts separately in order to determine the respective levels of standardized precision. Consequently, standardized precision as a measure that reflects reproducibility as well as responsiveness can no longer be represented by a single number.

Finally, the quality of the estimate of standardized precision will depend on the type of study design employed. As usual, results derived from longitudinal studies are preferable.

Keeping these caveats in mind, the following proposed concepts should facilitate an objective assessment of a technique’s ability to monitor longitudinal changes.

Arbeitsgruppe Medizinische Physik, Klinik für Diagnostische Radiologie, Universitätsklinikum an der Christian-Albrechts-Universität

MATERIALS AND METHODS

The interpretation of measured changes: Introducing change criteria

For clinical decision making it is important to know what magnitude of measured change is required to be sure that
the patient has truly lost bone. In other words, which change is statistically significant, taking into account the limitations of instrument performance? As other authors have previously shown,(9) for two point measurements over time only changes exceeding 2.8 times the precision errors of a technique can be considered as a criterion for true changes (with 95% confidence). The corresponding change criterion has been termed “least significant change” (LSC):

$$\mathrm{LSC} = 2.8 \times \mathrm{PE}$$

where PE is the largest precision error of the technique.

However, clinicians have to balance the desire to attain statistical certainty with the patient’s need to get treated as quickly as possible if there is a valid indication to do so. To not withhold potentially important medication, the clinician may be satisfied with confidence levels < 95%.(10) Indeed, it is conceivable that varying confidence limits may be appropriate under different clinical situations. For example, when identifying someone who has indeed responded to therapy in a situation where response is expected, the required confidence may be somewhat less. However, in a situation where a change in a course of therapy is being considered, the clinician may require the 95% confidence in order to change the intervention. Statistically, intervals for any level of confidence can be defined. Avoiding a plethora of different confidence levels, we propose to introduce one additional, less stringent change criterion, the “trend assessment margin” (TAM):

$$\mathrm{TAM} = 1.8 \times \mathrm{PE}$$

It can be considered as a criterion for true changes at a confidence level of 80% for two-sided tests or a level of 90% for single-sided tests. The word “trend” should imply that less strict requirements have to be met than for the test for significance at the 95% confidence level.

Both change criteria, LSC and TAM, should be calculated using the long-term, not the short-term, precision error specified for measurements in vivo in a comparable subject group.

Scheduling of follow-up visits: Introducing follow-up time intervals

After establishing the baseline status of a parameter (e.g., BMD), the rate of change of that parameter needs to be determined to assess progression of disease or response to treatment. What time interval between that baseline and a follow-up measurement is sufficient to allow for an accurate and valid assessment? When answering this question one will face the dilemma of having to settle for either a quick answer at an early follow-up visit associated with greater statistical uncertainty when estimating the true change from the measured change or a more solid answer at a later visit with the risk of substantial bone loss and fractures in the meantime. Therefore, analogous to the preceding section, two different follow-up time intervals can be defined.

The “monitoring time interval” for assessment of disease progression (MTI) is an estimate of the time period after which half of the patients with normal bone loss will show a measured change exceeding the change criterion LSC. It is given by:

$$\mathrm{MTI} = \frac{\mathrm{LSC}}{\text{median annual rate of change}} = \frac{2.8 \times \mathrm{PE}}{\text{median annual rate of change}}$$

Similarly, the “trend assessment interval” (TAI) is an estimate of the (shorter) follow-up time period, after which half of the patients with normal bone loss will demonstrate a change exceeding the change criterion TAM. It is given by:

$$\mathrm{TAI} = \frac{\mathrm{TAM}}{\text{median annual rate of change}} = \frac{1.8 \times \mathrm{PE}}{\text{median annual rate of change}}$$

For example, for a technique with a long-term precision error of PE = 1.5% and a patient for whom an annual change of 1% per annum could be expected, the TAI and MTI would be 2.7 years and 4.2 years, respectively. For a subject with a faster expected annual loss rate of 3%, the TAI and MTI would be 0.9 years and 1.4 years, respectively.

When scheduling a patient to assess response to treatment, a similar strategy could be followed. In the previous section, the criteria by which patients can be considered to have responded positively to treatment were established: those for whom the measured change was larger than the normal pretreatment loss by at least TAM or LSC. At what point in time is that expected to happen for the majority of the treated patients? MTI and TAI for treatment could be defined as:

$$\mathrm{MTI}_T = \frac{2.8 \times \mathrm{PE}}{\text{median improvement vs. placebo}}$$

$$\mathrm{TAI}_T = \frac{1.8 \times \mathrm{PE}}{\text{median improvement vs. placebo}}$$

The index “T” stands for treatment but it should be specified according to the treatment investigated. For example, if estrogen is expected to improve bone by 3% per annum (median), while untreated individuals would lose bone at a median rate of –1% per annum, the median treatment effect would be 4% per annum, and for a technique with a 1.5% precision error the recommended TAI and MTI would be 8.1 months and 12.6 months, respectively. After these time periods, the median gain in BMD will be 2.02% and 3.15%, respectively, which represents the levels at which one can have 80% or 95% confidence that the subject is indeed losing bone at less than the normal rate (2.02% = –0.68% + 2.7% and 3.15% = –1.05% + 4.2%).

Comparison of techniques: Introducing redefined standardized precision errors

Both the MTI and the TAI defined in the previous section would be measures appropriate to characterize a technique’s ability to monitor skeletal changes: the shorter the MTI and the TAI, the better the longitudinal sensitivity.
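The change criteria and follow-up intervals described above can be retraced with a few lines of arithmetic. The following is a minimal sketch, assuming LSC = 2.8 × PE, TAM = 1.8 × PE, and intervals obtained by dividing these criteria by the median annual change; the function names are illustrative and do not come from the paper.

```python
# Minimal sketch of the change criteria and follow-up intervals.
# Function names are hypothetical; all inputs are in percent (per annum for rates).

def change_criteria(pe_percent):
    """Return (TAM, LSC) for a long-term precision error given in percent."""
    tam = 1.8 * pe_percent  # trend assessment margin (80% confidence, two-sided)
    lsc = 2.8 * pe_percent  # least significant change (95% confidence, two-sided)
    return tam, lsc

def follow_up_intervals(pe_percent, annual_change_percent):
    """Return (TAI, MTI) in years for a median annual change in percent."""
    tam, lsc = change_criteria(pe_percent)
    rate = abs(annual_change_percent)  # only the magnitude of the change matters
    return tam / rate, lsc / rate

# Disease progression: PE = 1.5%, expected loss of 1% or 3% per annum.
tai_slow, mti_slow = follow_up_intervals(1.5, 1.0)
tai_fast, mti_fast = follow_up_intervals(1.5, 3.0)

# Treatment: +3% per annum on estrogen vs. -1% untreated gives a 4% effect.
tai_t, mti_t = follow_up_intervals(1.5, 3.0 - (-1.0))

print(round(tai_slow, 1), round(mti_slow, 1))      # years
print(round(tai_fast, 1), round(mti_fast, 1))      # years
print(round(tai_t * 12, 1), round(mti_t * 12, 1))  # months
```

Rounded to one decimal, the three printed pairs correspond to the 2.7 and 4.2 years, 0.9 and 1.4 years, and 8.1 and 12.6 months quoted in the text.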
Alternatively, longitudinal sensitivity could also be expressed by precision errors if these are corrected for differences in responsiveness. Such a standardization procedure would allow one to stay with the familiar concept of expressing precision errors in percentage units (instead of the units of years for TAI and MTI). This facilitates the interpretation since it is the kind of measure most researchers are familiar with.

Standardization can be achieved by correcting the precision error of the technique A investigated by the response ratio (rr); rr is given as the ratio of the response rate of the reference technique R divided by the response rate of the technique A:

$$\mathrm{rr}(A\ \mathrm{vs}\ R) = \frac{\text{response rate}(R)}{\text{response rate}(A)}$$

The “standardized precision error,” sPE, of a technique A that has been standardized relative to the reference technique R is then:

$$\mathrm{sPE}(A\ \mathrm{vs}\ R) = \mathrm{PE}(A) \times \mathrm{rr}(A\ \mathrm{vs}\ R)$$

Once standardized in this fashion, the standardized precision error can now directly be compared with the precision error of the reference technique.

This method of standardization transforms the precision error of technique A to the scaling of the reference technique R. The multiplication by the rr makes standardized precision errors truly comparable across techniques. All precision errors of techniques A, B, C, ... that have been standardized in this fashion can now directly be compared among each other and also with the precision error of the reference technique (which, by definition, is equal to the standardized precision error because it is standardized to itself).

For example, if a QUS device has a (long-term) precision error for SOS of 0.3%, and if one wishes to compare this with the reported BUA performance of this device, in this example set to 1.5%, one would standardize one or the other parameter versus the second parameter. Let us (arbitrarily) denote BUA as the reference technique. The precision error of SOS would be standardized by multiplication with the rr of BUA versus SOS. If this were, for example, found to be equal to 5 (i.e., the annual change of BUA is five times larger than that for SOS), the standardized precision error of SOS would be 1.5%, i.e., equal to the precision error of BUA. Both devices would, in this example, have the same longitudinal sensitivity.

If we had instead set SOS as the reference technique, the precision error of BUA would have to be standardized. In this case, the rr is 0.2 and the standardized precision error of BUA would be equal to 1.5 × 0.2 = 0.3%, i.e., again equal to the precision error of SOS. No matter which technique was selected as the reference technique, the result “equal standardized precision” remains the same. However, the scaling of the standardized precision error depends on the selection of the reference technique. In the first example, we calculated a standardized precision error of 1.5% for both techniques, whereas, if we switched the reference technique, the standardized precision error was 0.3%.

Consequently, one has to agree on the choice of a universal reference technique to really make techniques comparable across studies. We propose to use posterior–anterior dual-energy X-ray absorptiometry of the lumbar spine (DXAsp) as the reference technique because it is most widely used in longitudinal studies. To denote clearly this choice, we propose to call a standardized precision error of a technique A that has been standardized versus DXAsp the “standard precision error,” stdPE(A):

$$\mathrm{stdPE}(A) = \mathrm{PE}(A) \times \mathrm{rr}(A\ \mathrm{vs}\ \mathrm{DXAsp})$$

Use of standard precision error should be preferred over standardized precision errors whenever possible, i.e., when [...] measured in the same subjects. The response rate of the technique [...]

If the techniques A and R have different units (e.g., m/s and g/cm²), both the precision error and the response rates need to be expressed on a percentage basis. If the techniques A and R have the same units, standardized precision could alternatively also be evaluated in absolute units, but then both the precision errors and the response rates need to be expressed consistently in absolute units. To make all equations as universally applicable as possible, the precision errors are all expressed on a percentage basis throughout the remainder of this manuscript.

Response ratios are likely to be less subject group dependent than response rates (part of the cohort-bias cancels out). Still, gender and ethnic group, health status (healthy, osteopenic, osteoporotic, etc.), and, if applicable, type and dosage of treatment may have an impact and should therefore be specified. Standard precision errors thus may differ and the ranking of longitudinal sensitivity could depend on the cohort. Therefore, the following scenarios should be investigated before a generic statement on the [...]

● Longitudinal sensitivity for detecting normal aging [...]
● Longitudinal sensitivity for detecting disease progression [...]
● A standardized longitudinal precision error for detecting changes due to treatment which is treatment specific and thus type of treatment and dosage need to be specified.

To illustrate their utility, the concepts derived are being applied using data from the literature. Short-term precision errors (since long-term precision errors are not yet established for QUS) and typical response rates have been gathered for two DXA and two QUS parameters. Since it is not the focus of this paper to compare techniques but to present the concept, the numbers given should only be taken as an example of the application of the concept, not an assessment of the longitudinal sensitivity of the four parameters.
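The SOS versus BUA standardization example from the section on standardized precision errors can be retraced numerically. The sketch below assumes sPE = PE × rr with rr = response rate (reference) / response rate (technique); the function names and the two response rates are hypothetical, chosen only so that BUA changes five times faster than SOS as in the text.

```python
# Minimal sketch of precision error standardization by the response ratio.
# Function names and response rates are illustrative assumptions.

def response_ratio(rate_reference, rate_technique):
    """rr = response rate of the reference / response rate of the technique."""
    return rate_reference / rate_technique

def standardized_pe(pe_technique, rate_technique, rate_reference):
    """Scale the technique's precision error to the reference technique."""
    return pe_technique * response_ratio(rate_reference, rate_technique)

# Precision errors in percent, taken from the example in the text.
pe_sos, pe_bua = 0.3, 1.5
# Hypothetical annual response rates in percent; only their ratio of 5 matters.
rate_sos, rate_bua = 0.2, 1.0

spe_sos_vs_bua = standardized_pe(pe_sos, rate_sos, rate_bua)  # BUA as reference
spe_bua_vs_sos = standardized_pe(pe_bua, rate_bua, rate_sos)  # SOS as reference

print(round(spe_sos_vs_bua, 2), "vs. PE(BUA) =", pe_bua)
print(round(spe_bua_vs_sos, 2), "vs. PE(SOS) =", pe_sos)
```

In both directions the standardized value matches the precision error of the chosen reference, which is the "equal standardized precision" result; only the scaling (1.5% or 0.3%) depends on the reference.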
TABLE 1. HYPOTHETICAL EXAMPLE FOR THE CONCEPTS DERIVED
Skeletal parameters include bone mineral density (BMD) measured by posterior–anterior dual-energy X-ray absorptiometry (DXA) [...] of the calcaneus. Despite large differences in uncorrected short-term precision errors (PE) and response rates, parameters reflecting the longitudinal sensitivity, such as trend assessment interval (TAI), monitoring time interval (MTI), and standard short-term precision error (stdPE), can be compared directly across techniques. Change criteria such as the trend assessment margin (TAM) and the least significant change (LSC) provide threshold levels for assessing whether significant changes at the 80% and 95% confidence level, respectively (two-sided tests), have occurred.
The concepts proposed have been applied to hypothetical performance for two DXA approaches (BMD of posterior–anterior DXA of the lumbar spine, BMD [...]) and two QUS approaches (SOS and BUA of the calcaneus) [...]

DISCUSSION

“The search for difference seems to be, for current research, what the search for the philosophers’ stone was for alchemy, or the Holy Grail for the knights of legend – beguiling, elusive and, all too often, illusory.”(11) A dozen years after Robert Heaney raised the issue, his assessment remains largely true. Important contributions have been made in the meantime, but in clinical practice considerable confusion still remains about the interpretation of measured changes and the comparative performance of techniques. With the increasingly widespread use of ultrasound techniques, these problems are amplified since the precision of QUS and bone densitometry techniques cannot easily be compared because of different units and the fallacies of the expression on a percentage basis. In addressing these issues, a new concept was developed to aid the clinician in making decisions when following and treating individual patients. Researchers should benefit from getting a tool for more objective ways of comparing the longitudinal responsiveness of techniques. The concept centers around the three issues listed in the introduction section. Those issues and the components of the concept proposed are:

● The interpretation of measured changes: Are the changes calculated meaningful and clinically relevant? Proposed answer: yes, if they exceed the change criteria.
● Scheduling of a follow-up visit to determine rates of change: What time interval is required to allow accurate assessment of response to treatment or progression of disease? Proposed answer: re-examine patient after the follow-up time intervals MTI or TAI.
● Comparison of techniques: Which technique is suited best to detect changes accurately and quickly? Proposed answer: the technique with the lowest standard precision error.

Also, the four requirements for useful ways of standardization listed in the introduction are largely fulfilled. Subject selection bias is still an issue for the MTI but only because it is meant to be specific for populations with differing rates of changes. For stdPE, this problem is minimal as long as the response rates used to calculate the rr have been obtained on the same individuals for both techniques. If this is not the case, care has to be taken to compare similar populations. stdPE is best suited for direct comparisons of different techniques, even across different studies.

All three parameters have fairly intuitive meanings. A change less than the TAM cannot be interpreted as clinically relevant; a change less than the LSC is not a statistically proven change. A follow-up time interval shorter than the TAI or MTI, respectively, will yield such insufficient changes in the majority of cases. The stdPE can be easily interpreted since the scaling is simple and familiar: a performance of a stdPE of 1–1.5% is to be considered as fairly good. This is similar to the level of precision reported in many studies for DXA, which is familiar to most researchers.

The concepts derived are not limited to radiographic diagnostic approaches. Change criteria, follow-up time intervals, and standard precision errors could, for example, also be calculated for markers of bone turnover. The huge difference in the response rates and precision errors for markers versus radiographic parameters does not represent a hurdle, since they cancel out when calculating follow-up time intervals or standard precision errors. Therefore, the standard precision errors of a marker of bone resorption can be put in perspective directly with the corresponding results for radiographic parameters.
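The cancellation mentioned for markers of bone turnover can be illustrated with a small calculation. This is a sketch under stated assumptions: the precision errors and response rates below are invented purely for illustration (a marker with a tenfold larger precision error and a tenfold larger response than the densitometric reference), and the helper names are hypothetical.

```python
# Hypothetical illustration: large precision errors and large response rates
# cancel in the MTI and in the standard precision error.
# All numbers are invented for illustration and are not data from the paper.

def mti_years(pe_percent, annual_change_percent):
    """Monitoring time interval: 2.8 x PE divided by the annual change."""
    return 2.8 * pe_percent / abs(annual_change_percent)

def std_pe(pe_percent, rate_technique, rate_reference):
    """Precision error scaled by the response ratio versus the reference."""
    return pe_percent * (rate_reference / rate_technique)

reference = {"pe": 1.0, "rate": 2.0}   # assumed densitometric reference
marker = {"pe": 10.0, "rate": 20.0}    # assumed bone turnover marker

mti_reference = mti_years(reference["pe"], reference["rate"])
mti_marker = mti_years(marker["pe"], marker["rate"])
std_pe_marker = std_pe(marker["pe"], marker["rate"], reference["rate"])

print(mti_reference, mti_marker)       # identical follow-up intervals
print(std_pe_marker, reference["pe"])  # marker PE on the reference scale
```

Under these assumed values the marker and the reference would have the same longitudinal sensitivity, even though their raw percentage precision errors differ by a factor of ten.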
The interpretation of the MTI (or TAI) as a measure of longitudinal sensitivity is intuitive and simple: it represents the follow-up time required to test whether clinically relevant changes have occurred. The shorter the MTI, the better the longitudinal sensitivity.

Still, a few caveats should be noted. First, there is no single MTI (or TAI) for each technique. The magnitude of the parameter is likely to be different for studies on disease progression (and again between normal and fast losers) and response to treatment (here it may also depend on the type of treatment investigated). When pursuing the latter issue, one should also note that response to treatment is quite variable even for an established effective medication like estrogen.(12,13) By definition, half of the patients will show a response that is less than the median response, and, consequently, their measured improvement during the MTI (or TAI) will be smaller than the LSC (or TAM). Patients that do not reach the level of change expected after the MTI (or TAI) may still have benefited from treatment, albeit at a somewhat lower level. How do we interpret such a “negative” insufficient response? How do we detect true nonresponders? As long as a patient’s measured change is “better” than the loss expected without treatment, the patient is more likely to benefit from the treatment than not. However, the statistical uncertainty would be unacceptably high. Depending on the health status of the patient, one could still take the upward trend as encouraging and schedule another follow-up visit at twice the MTI (or TAI). At this point in time, even patients with only half the median response rate (comparing treated and untreated patients) can be expected to show a change that exceeds the LSC (or TAM). According to published studies, this would be met by ∼60% of the patients on estrogen(12,14) and ∼80% of the patients on alendronate,(15) assuming normal distributions of the response. Further reductions in the change criteria appear to be clinically questionable, not only because the response is smaller, but because the follow-up time intervals required to test responsiveness would become prohibitively long.

Alternatively, it may also be justified to schedule follow-up visits at time intervals shorter than the TAI or MTI for the purpose of identifying patients that continue to lose bone at a rapid rate. Bone losses exceeding the TAM or LSC would represent appropriate test criteria.

(Re-)Defined standardized precision errors

In the appendices, a number of different definitions for standard precision errors have been developed. To avoid confusion, one should use the term stdPE only if the standard precision error has been obtained from truly longitudinal data. stdPE is preferable to other approaches. If normative data of all manufacturers would be of equally good quality, the stdPE might be a good estimate of longitudinal sensitivity to detect aging changes. However, it is known that differences have been reported recently for DXA,(16) and discrepancies again may be encountered for newly introduced devices and methods. Therefore, this type of standardization should be used carefully.

Compared with previously proposed approaches, the new definitions of standard (and standardized) precision errors presented here offer the advantages of ease of interpretation (all parameters), suitability for comparison of any two techniques (standardized precision errors), comparability across different studies (standard precision errors), minimal cohort bias (corrections by rr’s rather than response rates), and applicability to radiographic as well as [...]

These advantages will be discussed, and afterward the limitations of definitions previously proposed by other authors [...]

Why introduce two concepts of standard and standardized precision errors?

The advantage of the concept of the standardized precision error is that it can readily be used to compare the precision errors of any two techniques, provided that the uncorrected precision errors and response rates are known for both of the techniques. This will allow comparisons in a variety of research situations, whereas the concept of standard precision errors requires researchers to determine both precision errors and response rates of DXAsp in the same population, which may not always be feasible. However, agreeing on a common reference standard (as required for the standard precision error) yields a well defined robust measure for comparison of the longitudinal sensitivity of techniques, even across different studies.

To facilitate assessment of standard precision errors for a large number of techniques, publication of the rr’s themselves would be helpful. Such data would provide researchers with a methodology to determine the stdPE for a new technique, even if no direct comparison with posterior–anterior dual-energy X-ray absorptiometry (PA-DXA) of the lumbar spine can be carried out at the center. It would only be necessary to compare the new technique with a reference technique for which rr versus PA-DXA of the spine is already available from the literature.

Why use BMD of PA-DXA of the spine as the reference technique?

Previous standardization approaches failed to achieve the goal of standardization because the result was still very subject-group dependent. The proposed concept reduces the impact of this error source. Still, other forms of bias need to be considered. BMD of PA-DXA of the spine is substantially affected by degenerative changes. Subjects affected by degenerative changes need to be excluded when calculating standard precision errors, specifically when evaluated from cross-sectional data. The choice of DXAsp as the reference technique for calculation of the standardized precision error does not mean that DXAsp is the technique with the best longitudinal sensitivity; it was only considered to be the best reference standard.

Why correct sPE and stdPE using response ratios
time intervals between follow-up measurements of 1 year
or longer. Therefore, the reproducibility of techniques has to be based on long-term precision errors, which are usually
Correcting precision errors by division by response rates
larger than short-term precision errors.(17) There are addi-
yields a good measure of longitudinal sensitivity but this
tional error sources (e.g., long-term stability of equipment,
measure is very sensitive to the population sample studied
variability of body temperature for SOS measurements,
(cohort bias). This approach was used to define the MTI (or
etc.) that can only be determined from longitudinal data.(18)
TAI) because in the context of estimating follow-up times
Precision errors derived from short-term repeat measure-
the impact of the population is of critical importance. For a
ments only approximate true reproducibility errors. Still,
generic comparison of techniques, a more robust measure
their calculation can be helpful, particularly if one can as-
like the stdPE is preferable. As long as different techniques
sume that the ratio of short-term and long-term precision
measure a similar aspect of bone, their response rates will
errors (i.e., the precision error ratio) of the technique in-
be partially correlated. Therefore, a substantial fraction of
vestigated and that of the reference technique would be
the impact of the population studied is eliminated when
similar. Then, the ranking of the sensitivities of the tech-
using rr’s instead of response rates. Moreover, multiplica-
niques would not be affected (but the absolute magnitudes
tion by the rr, which will be unity for the reference tech-
of longitudinal sensitivity will be overestimated).
nique, leaves percentage precision errors in the range ofvalues (typically 1–5%) that researchers and clinicians arefamiliar with, increasing the likelihood of acceptance and
facilitating the interpretation. This is not the case for most
The previously published concepts of standardization all
definitions of standardized precision errors proposed pre-
have some of the aforementioned problems. Miller et al.
introduced the standardized CV based on normalization bythe dynamic range given by 90% interpercentile range,(3)
Why adjust for annual rates of change and not for a
and Greenspan et al. used a similar approach but standard-
measure of intersubject variability?
ized with the 95% interpercentile range.(8) Both measuresof population spread depend on subject selection criteria
When calculating response rates for standardized preci-
and thus are affected by the noted cohort and precision
sion errors, the measure “annual rates of change” was pro-
biases. Langton proposed the concept of ZSD, i.e., the stan-
posed. If standardized precision errors are meant to be used
dard deviation of the Z score which is taken as a measure of
as a measure of longitudinal sensitivity, it seems logical that
standardized precision.(5) Here, the problems with sam-
response should be defined as a change over time. This
pling bias are less severe since the population variance used
should be the most intuitive approach to quantitate a tech-
to calculate the Z score is usually obtained from large popu-
nique’s ability to monitor longitudinal changes. For longi-
lations measured to derive normative data. However, the
tudinal studies it is the obvious choice anyway, but for
current debate about the validity and comparability of nor-
cross-sectional estimates of longitudinal sensitivity one
mative data provided by the manufacturers puts some ques-
might consider other measures of responsiveness. However,
tion marks on this approach. More importantly, however,
standardization by annual rates of change was selected here
rather than being a good measure of responsiveness, the
as well, in order to make the (short-term, cross-sectional)
larger population variance could also be due to technique
definition of stdPE as similar as possible to that of (the
problems (precision bias) and to diversity in subjects which
is unrelated to osteoporosis (accuracy bias). In fact, a tech-
should note that any measure of spread or dynamic range
nique with a large age-related decline relative to its popu-
includes an error component caused by the precision (and
lation variance is more likely to allow monitoring of skeletal
accuracy) errors. Therefore, for two techniques of compa-
changes compared with a technique that—in the extreme—
rable true responsiveness, the one with the larger precision
would show no age-related change, even if that second tech-
error would show the larger apparent responsiveness. Con-
nique had an equally large or even larger population vari-
sequently, estimates of stdPE that are based on measures of
ance. Population variance does not appear to be a reliable
spread underestimate the differences in longitudinal sensi-
measure of responsiveness over time, and the ZSD may be
tivity between techniques. Techniques with poorer preci-
more suitable for characterizing diagnostic sensitivity.
sion will demonstrate an artificially enlarged dynamic range
In another approach, Blumsohn et al. have proposed the
and, consequently, their calculated standardized precision
index of individuality(4) which is affected by the noted sam-
error looks better than it really is (precision bias). One can
pling bias because it incorporates a measure of intersubject
correct for this, i.e., remove the precision error from the
variability. The problems are similar to those noted by
measure of spread, by two-way nested analysis of variance.
Quan and Shih for another measure of standardized preci-sion, the intraclass CV.(19) Both of these measures are per-
Why should parameters of longitudinal sensitivity
haps better suited to assess diagnostic sensitivity. Machado
be based on long-term rather than short-term
and colleagues have standardized precision by dividing pre-cision errors by the average difference between healthy and
osteoporotic individuals.(7) This measure is affected by co-
The assessment of skeletal changes via radiological tech-
hort bias due to the ambiguities in the degree of osteopo-
niques such as bone densitometry or QUS usually requires
rosis, which makes it impossible to compare standardized
precision errors across different studies. A cross-sectional comparison of subjects with and without osteoporosis is problematic for assessing longitudinal sensitivity for another reason: the osteoporotic individuals may have had a low peak skeletal status to start with, and under these circumstances standardization based on the average difference of healthy and osteoporotic individuals would overestimate true longitudinal responsiveness.

While the proposed concepts avoid a number of the problems addressed above, a few caveats need to be noted. First of all, it is impossible to characterize longitudinal sensitivity by a single universally applicable figure of merit. Only together will the change criteria LSC (or TAM), the follow-up time intervals MTI (or TAI), and the standardized precision error provide the answers sought. More fundamentally, one could criticize the proposed approach because it does not consider whether the observed change in a bone parameter, even if highly significant, would relate to a relevant change in fracture risk. Ross et al. have alluded to this problem.(20) This does represent a limitation; however, the relationships between changes in a bone parameter and subsequent changes in fracture risk have not been well established to date. Increased bone loss can be a risk factor in itself or because of the expected extrapolated long-term reductions in BMD. Moreover, such a concept would reduce the relevance of bone loss measurements to simply the risk-related aspect, whereas for clinical decision making, assessment of the efficacy of therapy or compliance

Using the proposed concept involves a number of assumptions. The underlying bone parameters are considered to be normally distributed. For calculating long-term precision errors and response rates, the changes are assumed to be linear with time. For response to treatment, this is usually not the case. However, the proposed concepts could be easily adapted. Nonlinear changes can be divided into piecewise linear segments. Compared with later responses, the large early response to treatment would result in shorter MTIs (or TAIs). Long-term precision errors could also be calculated from nonlinear models, should this make biological and statistical sense. Whether this offers advantages remains to be seen. Also, one needs to acknowledge that, irrespective of the type of model selected, the standard error of the estimate (SEE; see Appendix 1) includes two components of variability, i.e., technique imprecision and true deviations from the fit. Therefore, prospectively defined standardized precision errors do not solely represent true technique limitations. In this regard, the term “precision error” may be considered misleading and the alternative term “longitudinal sensitivity” may be preferable. However, for most clinical applications, this ambiguity does not represent a problem. If one is, for example, interested in estimating the follow-up time required to establish success of treatment, the power to detect this will depend both on the technique’s imprecision and the true variability over time.(21) Thus, the SEE can be considered to represent a good approximation of overall diagnostic, biological, and

Statistical tests like the ones proposed in this paper might be incorporated in the device’s operating software. For example, in serial measurements, an automatic indication of whether a change from previous exams is significant could aid the clinician in the process of decision making.

The above mentioned advantages and disadvantages of the parameters of the concept are demonstrated by the data shown in Table 1. As can be seen, the performance (i.e., ability to monitor changes) of the technique cannot be judged based on uncorrected precision errors since the response rates vary substantially. The change criteria can be used directly to determine which changes reflect trends (TAM) or significant changes (LSC). The follow-up times TAI and MTI and stdPE reflect both precision errors as well as responsiveness to changes. stdPE is less dependent on the subject group than MTI (or TAI) and thus is closer to the goal of defining a single parameter that characterizes the overall performance of a technique. MTI (TAI) will usually be different for each subject group and technique since they are meant to be direct indicators of follow-up times and will be subject group dependent.

The results of Table 1 are based on short-term precision errors and thus need to be interpreted with caution since they will likely underestimate long-term stdPE. In this hypothetical example, the longitudinal sensitivity of

CONCLUSIONS

A comprehensive assessment of the longitudinal sensitivity of a technique should be based on calculation of a change criterion (like TAM or LSC), a follow-up time interval (like TAI or MTI), and a standard precision error (stdPE). Together these three measures yield a good characterization of a technique’s ability to monitor skeletal changes: LSC is the smallest change to be considered statistically significant, the patient should be called in at about the time interval specified by the (population specific) MTI, and the smaller the stdPE the more sensitive the technique. For matters of clinical decision making that require or allow earlier judgement at lower levels of statistical significance, i.e., trend assessment, shortening the follow-up time interval by 36% (next visit after TAI instead of MTI) may be

Some of the previous methods of standardization of precision have been shown to represent cases of flawed application (amplification of the cohort effect) of a useful concept (standardization) to a parameter that has sometimes been misinterpreted in the past (precision, as a parameter that for the purposes discussed here is not valuable in itself but only in conjunction with good responsiveness). The presented concept should improve the ability to investigate, characterize, and compare the ability of techniques to monitor skeletal changes.
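The change criteria and follow-up intervals summarized above lend themselves to the kind of automatic indication suggested for device software. The following Python sketch is illustrative only: it takes the factors 2.8 and 1.8 and the division by the median response rate from the glossary in Appendix 1, while the function names, the classification labels, and the example numbers are hypothetical and not part of the proposed concept.

```python
from statistics import median

LSC_FACTOR = 2.8  # least significant change: 95% confidence, two-sided (Appendix 1)
TAM_FACTOR = 1.8  # trend assessment margin: 80% confidence, two-sided (Appendix 1)


def change_criteria(pe_lt):
    """Return (LSC, TAM) in percent for a long-term precision error given in percent."""
    return LSC_FACTOR * pe_lt, TAM_FACTOR * pe_lt


def follow_up_intervals(pe_lt, annual_changes):
    """Return (MTI, TAI) in years.

    annual_changes holds the individual response rates (% change per year)
    of the subject group; their median is used, as in the glossary.
    """
    lsc, tam = change_criteria(pe_lt)
    median_rate = abs(median(annual_changes))
    if median_rate == 0:
        raise ValueError("median response rate is zero; no interval is defined")
    return lsc / median_rate, tam / median_rate


def classify_change(percent_change, pe_lt):
    """Label a measured change (hypothetical labels, chosen for illustration)."""
    lsc, tam = change_criteria(pe_lt)
    magnitude = abs(percent_change)
    if magnitude >= lsc:
        return "significant change"
    if magnitude >= tam:
        return "trend"
    return "no detectable change"


if __name__ == "__main__":
    # Assumed example: PE_lt of 1% and a group losing about 1% per year.
    mti, tai = follow_up_intervals(1.0, [-0.8, -1.0, -1.2, -1.4, -0.6])
    print(f"MTI = {mti:.1f} years, TAI = {tai:.1f} years")
    print(f"TAI is {100 * (1 - tai / mti):.0f}% shorter than MTI")
    print(classify_change(-2.0, 1.0))
```

Because both intervals share the same denominator, the ratio TAI/MTI equals 1.8/2.8 regardless of the subject group, which is where the 36% shortening of the follow-up interval comes from.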
ACKNOWLEDGMENT

I would like to acknowledge the helpful discussions with Richard Eastell, M.D., Sheffield, U.K., Ying Lu, Ph.D., San Francisco, CA, U.S.A., and Harry Genant, M.D., San Francisco, CA, U.S.A.

REFERENCES

1. Genant HK, Engelke K, Fuerst T, Glüer CC, Grampp S, Harris ST, Jergas M, Lang T, Lu Y, Majumdar S, Mathur A, Takada M 1996 Noninvasive assessment of bone mineral and structure: State of the art. J Bone Miner Res 11:707–730.
2. Davis JW, Ross PD, Wasnich RD, MacLean CJ, Vogel JM 1991 Long-term precision of bone loss rate measurements among postmenopausal women. Calcif Tissue Int 48:311–318.
3. Miller CG, Herd RJM, Ramalingam T, Fogelman I, Blake GM 1993 Ultrasonic velocity measurements through the calcaneus: Which velocity should be measured? Osteoporos Int 3:31–35.
4. Blumsohn A, Hannon RA, Al-Dehaimi AW, Eastell R 1994 Short-term intraindividual variability of markers of bone turnover in healthy adults. J Bone Miner Res 9 (Suppl 1):S153.
5. Langton CM 1997 ZSD: A universal parameter for precision in the ultrasonic assessment of osteoporosis. Physiol Meas 18:67–72.
6. Glüer CC, Blunt B, Engelke K, Jergas M, Grampp S, Genant HK 1994 ‘Characteristic follow-up time’: A new concept for standardized characterization of a technique’s ability to monitor longitudinal changes. Bone Miner 25 (Suppl 2):S40.
7. Machado ABC, Hannon R, Henry Y, Eastell R 1997 Standardized coefficient of variation for dual energy x-ray absorptiometry (DXA), quantitative ultrasound (QUS) and markers of bone turnover. J Bone Miner Res 12 (Suppl 1):S258.
8. Greenspan SL, Bouxsein ML, Melton ME, Kolodny AH, Clair JH, DeLuca PT, Stek M Jr, Faulkner KG, et al. 1997 Precision and discriminatory ability of calcaneal bone assessment techniques. J Bone Miner Res 12:1303–1313.
9. Cummings SR, Black D 1986 Should perimenopausal women be screened for osteoporosis? Ann Intern Med 104:817–823.
10. Genant HK, Block JE, Steiger P, Glüer CC, Ettinger B, Harris ST 1989 Appropriate use of bone densitometry. Radiology 170:817–822.
11. Heaney RP 1986 En recherche de la différence (P < .05). Bone Miner 1:99–114.
12. Lufkin EG, Wahner HW, O’Fallon WM 1992 Treatment of postmenopausal osteoporosis with transdermal estrogen. Ann Intern Med 117:1–9.
13. Riis BJ, Thomsen K, Strøm V, Christiansen C 1987 The effect of percutaneous estradiol and natural progesterone on postmenopausal bone loss. Am J Obstet Gynecol 156:61–65.
14. Riis B, Christiansen C 1987 Prevention of postmenopausal osteoporosis by estrogen/gestagen substitution therapy. Med Klin 82:238–241.
15. Liberman UA, Weiss SR, Bröll J, Minne HW, Quan H, Bell NH, Rodriguez-Portales J, Downs RW, et al. 1995 Effect of oral alendronate on bone mineral density and the incidence of fractures in postmenopausal osteoporosis. N Engl J Med 333:1437–1443.
16. Faulkner KG, Roberts LA, McClung MR 1996 Discrepancies in normative data between Lunar and Hologic DXA systems. Osteoporos Int 6:432–436.
17. Fuleihan GE-H, Testa M, Angell JE, Porrino N, LeBoff MS 1995 Reproducibility of DEXA absorptiometry: A model for bone loss estimates. J Bone Miner Res 10:1004–1014.
18. Nguyen TV, Sambrook PN, Eisman JA 1997 Sources of variability in bone mineral density measurements: Implications for study design and analysis of bone loss. J Bone Miner Res 12:124–135.
19. Quan H, Shih WJ 1996 Assessing reproducibility by the within-subject coefficient of variation with random effects models. Biometrics 52:1195–1203.
20. Ross PD, Davis JW, Wasnich RD, Vogel JM 1991 The clinical application of serial bone mass measurements. Bone Miner 12:189–199.
21. Blake GM, Fogelman I 1997 Technical principles of dual energy x-ray absorptiometry. Semin Nucl Med 27:210–228.
22. Glüer CC, Blake G, Lu Y, Blunt BA, Jergas M, Genant HK 1995 Accurate assessment of precision errors: How to measure the reproducibility of bone densitometry techniques. Osteoporos Int 5:262–270.

Klinik für Diagnostische Radiologie
Christian-Albrechts-Universität zu Kiel

Received in original form May 11, 1998; in revised form April 2,

APPENDIX 1. GLOSSARY OF TERMS, ABBREVIATIONS, AND DEFINITIONS

LSC: Least significant change: LSC = 2.8 × PElt
Criterion for smallest change in measurement results that can be considered to be statistically significant with 95% confidence (two-sided test). For statistical assumptions

MTI: Monitoring time interval: LSC/median response rate
Follow-up time interval after which the majority of patients can be expected to show a change exceeding the LSC, i.e., time interval recommended between follow-up visits if a high (95%) confidence level (two-sided test) is required. MTI is a characteristic of a technique but it depends on the subject group, e.g., disease progression (MTI), response to treatment

PEst: Short term precision error, expressed on a percentage basis
Derived from two or more measurements repeated at short time intervals and obtained on i = 1..m individuals; see

PElt: Long term precision error, expressed on a percentage basis
Derived from longitudinal studies of i = 1..m individuals with a minimum of three repeated measurements per individual

RMS: Root-mean-square average; averaging method appropriate for averaging of variances (e.g., precision errors) which are not normally distributed, but according to the

rr: Response ratio: rr(AvsR) = response rate (reference technique R) / response rate (technique A investigated), where response rates reflect %changes per annum or %changes per year of age for a given technique.

SD: Standard deviation of repeated measurements; measure of short term precision. Compare SEE.

SEE: Standard error of the estimate: measure of scatter around the regression line and, therefore, of long term precision. Compare SD.
sPE: Standardized precision error: sPE = PE × rr(AvsR)
Precision error adjusted for response ratio rr of reference technique versus technique A investigated. Expressed on a percentage basis. As a result of the standardization procedure, the scaling of the standardized precision error is now equivalent to the scaling of the precision error of the reference technique. Consequently, standardized precision errors of both techniques can now directly be compared.

stdPE: Standard precision error: stdPE = PE × rr(PA-
Standardized precision error for which PA-DXA of the spine was selected as the reference technique. Expressed on a percentage basis.

TAI: Trend assessment interval: TAM/median response rate
Follow-up time interval after which the majority of patients can be expected to show a change exceeding the TAM, i.e., time interval recommended if somewhat relaxed tests for change are sufficient. TAI is a characteristic of a technique but it depends on the subject group, e.g., disease progression (TAI), response to treatment (e.g., TAI

TAM: Trend assessment margin: TAM = 1.8 × PElt
Criterion for smallest change in measurement results that can be considered to be statistically significant with 80% confidence (two-sided test) or 90% confidence (single-sided test). For statistical assumptions see.(9) Compare

APPENDIX 2. CALCULATION OF STANDARDIZED PRECISION ERRORS

Short term precision errors are calculated in the following way. For an individual, the absolute precision error is given by the standard deviation (SD) of repeated measurements. Expressed on a percentage basis, the short-term precision error for the ith individual is given by:

\[ PE_{st,i} = \frac{100\%}{\bar{x}_i} \sqrt{\frac{\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2}{n-1}} \]

where $x_{ij}$ is the bone parameter from the jth measurement of the ith individual and $\bar{x}_i$ the mean of n repeated measurements.

The average short-term precision error of a group of m individuals is not given by the arithmetic mean but rather by the root-mean-square average (RMS) of the precision errors:

\[ PE_{st} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} PE_{st,i}^2} \]

Long-term precision errors can be calculated from linear regression analysis of an individual’s measurements over time. The standard error of the estimate ($SEE_i$), which reflects the deviations of repeated measurements from the fitted curve, can be taken as a measure of the absolute long-term precision error of the ith individual. The individual’s long-term precision error, $PE_{lt,i}$, when expressed on a percentage basis, is given by:

\[ PE_{lt,i} = \frac{100\%}{\bar{x}_i} \sqrt{\frac{\sum_{j=1}^{n} (x_{ij} - \hat{x}_{ij})^2}{n-2}} \]

where $\hat{x}_{ij} = a + b\,t_{ij}$ is the predicted value of the jth measurement in the ith individual at the time t according to the fitted line with intercept a and slope b.

For a group of m individuals the average long term precision

Standard (or standardized) precision errors are derived from the precision errors defined above by first multiplying the individual’s short- or long-term precision error by the response ratio and then calculating the RMS average across all subjects. The response ratio, rr, can be derived in one of the following ways:

(1) Longitudinal studies (preferred approach):

\[ rr_i = \frac{(\%\text{slope per annum of reference technique})_i}{(\%\text{slope per annum of technique investigated})_i} \]

%slope is the slope of the regression line (i.e., the percent change per annum) of an individual’s measurements over time. As a measure of the response observed for this individual, it is calculated for the technique investigated and the reference technique, both obtained in this individual, to calculate the response ratio. For this method, unlike for the two following ones, the response ratio is specific to each individual and it can be used for estimating longitudinal sensitivity for response to treatment.

(2) Normative data:

Here, the response rates are based on the cross-sectional fit of age-related changes in normative data.

(3) Cross-sectional data (least desirable approach):

\[ rr = \frac{\%\text{slope per year of age for reference technique}}{\%\text{slope per year of age for technique investigated}} \]

This approach can be used if neither longitudinal response studies nor normative data are available. If the technique investigated and the reference technique have been obtained in the same individuals one would, separately for each of the two techniques, regress the parameter of the technique versus the age of the subjects included in order to obtain the slope per year of age as a measure of the response.

Standard precision errors are then calculated from either

The standard precision error averaged across a group of m

Depending on the data type, i.e., short-term or long-term precision errors, longitudinal or cross-sectional study design, different types of stdPE could be calculated. The preferred approach is the one in which both long-term precision errors and response rates are derived from longitudinal studies.
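The calculations described in Appendix 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the SD is taken with n − 1 degrees of freedom, the SEE with n − 2, and percentages are formed relative to the individual’s mean, all of which are conventional choices assumed here. The function names and the plain least-squares helper are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square average, used instead of the arithmetic mean."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def short_term_pe(measurements):
    """Percent short-term precision error of one individual (SD / mean * 100)."""
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two repeated measurements are needed")
    mean = sum(measurements) / n
    variance = sum((x - mean) ** 2 for x in measurements) / (n - 1)
    return 100.0 * math.sqrt(variance) / mean


def fit_line(times, values):
    """Ordinary least-squares line; returns (intercept a, slope b)."""
    n = len(times)
    t_mean = sum(times) / n
    x_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, values))
    slope = sxy / sxx
    return x_mean - slope * t_mean, slope


def long_term_pe(times, values):
    """Percent long-term precision error of one individual (SEE / mean * 100)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three measurements are needed")
    a, b = fit_line(times, values)
    sse = sum((x - (a + b * t)) ** 2 for t, x in zip(times, values))
    see = math.sqrt(sse / (n - 2))
    return 100.0 * see / (sum(values) / n)


def percent_slope(times, values):
    """Response rate in % per year, relative to the individual's mean (assumption)."""
    _, b = fit_line(times, values)
    return 100.0 * b / (sum(values) / len(values))


def response_ratio(reference_rate, technique_rate):
    """rr = response rate of reference technique / response rate of technique."""
    if technique_rate == 0:
        raise ValueError("technique shows no response; rr is undefined")
    return reference_rate / technique_rate


def standardized_pe(precision_errors, response_ratios):
    """RMS average across subjects of each individual's PE multiplied by rr."""
    return rms([pe * rr for pe, rr in zip(precision_errors, response_ratios)])
```

For the reference technique itself every response ratio is unity, so `standardized_pe` reduces to the RMS average of the uncorrected precision errors. A technique that responds twice as strongly as the reference (rr = 0.5) has its precision error halved by the standardization.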
