Monitoring skeletal changes by radiological techniques

Volume 14, Number 11, 1999
Blackwell Science, Inc.
1999 American Society for Bone and Mineral Research

ABSTRACT
The longitudinal sensitivity of a technique, i.e., its ability to monitor skeletal changes, is affected by two parameters:
the long-term precision error (PE) and the subject group-specific response rate (i.e., annual rates of change). Both
need to be considered to avoid misinterpretation of measured changes. A new concept to aid clinical decision
making for longitudinal measurements is proposed which is based on three types of measures: criteria for detecting
changes—the “least significant change” (LSC) is the smallest change to be considered statistically significant, but
for certain clinical questions a smaller margin, the “trend assessment margin” (TAM), can be sufficient for decision
making; follow-up time intervals—for follow-up exams the patient should be called in at about the time interval
specified by the (population specific) “monitoring time interval” (MTI) or, about one-third of the time earlier, after
the “trend assessment interval” (TAI), depending on whether the decision can be based on the LSC or the TAM;
and the standard precision error (stdPE)—the smaller stdPE, the more sensitive the technique to monitor skeletal
changes. Together, these three measures yield a good characterization of a technique’s ability to monitor skeletal
changes. Compared with previous concepts, the proposed standardization by a response ratio instead of measures
of spread or response rates makes the stdPE substantially less subject group dependent. It allows comparison of
stdPE across different studies and could replace the misleading concept of expressing precision as a coefficient of
variation. Application of this concept should facilitate the interpretation of measured skeletal changes. (J Bone
Miner Res 1999;14:1952–1962)

Arbeitsgruppe Medizinische Physik, Klinik für Diagnostische Radiologie, Universitätsklinikum an der Christian-Albrechts-Universität

INTRODUCTION

For the evaluation of disease progression, response to treatment, and the estimation of fracture risk it is important to interpret measured changes in bone mineral density (BMD) and other skeletal parameters in a sensible fashion. Bone densitometry is an accurate and precise method, but due to limitations of the technique, the measured results only approximate the true changes. The longitudinal sensitivity of a technique, defined as the ability to monitor changes in skeletal status,(1) is limited by technique imprecision. To allow a comparison of the imprecision of techniques specified in different units, precision errors are commonly reported on a percentage basis, calculated, e.g., as a coefficient of variation (CV) of repeated measurements. However, it is known that the apparent comparability of percentage units can be misleading and therefore different ways of standardizing precision errors have been proposed(2–8): division of precision errors by population variance, 10–90% range, normal age-related decline per annum, etc. To date, none of these methods for standardization has been investigated thoroughly, compared with other approaches, let alone been unanimously adopted. There is a lack of standardization of the methods of how to standardize precision. To judge the advantages and limitations of the competing approaches, one needs to define the problems and point out the goals for standardization: What is the purpose of standardizing precision errors?

When evaluating longitudinal changes over time, the three following issues frequently need to be addressed:

● The interpretation of measured changes: Are the changes calculated meaningful and clinically relevant? Random fluctuations are sometimes mistaken for true changes.
● Scheduling of a follow-up visit to determine rates of
change: What time interval is required to allow accurate assessment of response to treatment or progression of disease? Follow-up measurements performed too early do not allow one to judge the significance of measured changes.
● Comparison of techniques: Which technique is suited best to detect changes accurately and quickly? A confusing number of insufficient methods for standardizing precision errors have been used.

To motivate the concept proposed, a few explanations regarding the difficulty of answering the third question will be given. The simple answer that the technique with the best reproducibility or smallest precision error would be best suited for monitoring changes over time is flawed. First of all, it is obvious that precision errors of different measurement parameters cannot be directly compared when specified in absolute units (e.g., in g/cm² vs. mg/cm³ vs. dB/MHz vs. m/s). Expressing the precision error on a percentage basis is quite popular but it does not solve this problem. Quite the contrary: superficially implying that the results are now readily comparable, this common percentage unit can be highly misleading; just change the definition of a parameter (e.g., by adding an offset) and any desired level of percentage precision can be achieved. For example, original directly calculated broadband ultrasound attenuation (BUA) values typically range between 30 dB/MHz and 80 dB/MHz and a precision error of, e.g., 2 dB/MHz would yield percentage precision errors of 2.5% (2/80 × 100) to 6.7% (2/30 × 100). On some quantitative ultrasound (QUS) devices, the original values are subjected to an offset. If this were, for simplicity’s sake, taken to be 50, the resulting range would be 80–130 dB/MHz. This simple manipulation would reduce the precision error to a range of 1.5% (2/130 × 100) to 2.5% (2/80 × 100), without any true improvement in longitudinal sensitivity. These examples also illustrate a second problem, that percentage precision errors appear to be better (i.e., lower) in healthy subjects, simply because of the larger denominator (e.g., 2.5% vs. 6.7%). Again, this does not reflect any true difference in longitudinal sensitivity.

More fundamentally, however, one has to recognize that precision errors by themselves tell little about the ability of a technique to monitor changes: to characterize longitudinal sensitivity the responsiveness of the monitoring parameter needs to be considered as well. QUS parameters represent good examples to demonstrate what happens if this aspect is neglected. Changes in speed of sound (SOS) are typically in the range of only a few meters per second per annum out of perhaps 1500–2000 m/s, reflecting responses of less than 1% per annum; paralleling changes in BUA amount to a few decibels per megahertz per annum out of perhaps 50–100 dB/MHz, reflecting responses of several percentage points. Not surprisingly, the percentage precision error for SOS is typically smaller than that of BUA by at least an order of magnitude. This does not, however, necessarily represent an equivalent advantage in the ability of SOS to monitor changes because the lower responsiveness of SOS has not been taken into account. Dividing percentage errors by some measure of responsiveness to obtain a measure of longitudinal sensitivity called standardized precision error thus appears to be useful, and various methods have been proposed. So, what is the problem with this?

One issue is that different measures of responsiveness result in differently standardized precision errors, most of which are still substantially biased, e.g., affected by the cohort effect. Let’s take, just as a typical example, one of the common methods to standardize precision errors, i.e., division by the SD of the readings of the subject group. In case of a narrowly defined subject group (e.g., young normals) the resulting “standardized precision error” will be larger than that calculated for a mixed group of healthy and osteoporotic individuals even if the technique has identical precision (expressed in absolute units) for healthy and osteoporotic subjects. Sample sizes for precision studies are typically quite small. Subject selection therefore can easily introduce a substantial bias. In fact, one could easily “improve” the standardized precision by simply adding a few more extreme cases (very healthy or very osteoporotic) to the subject group. As long as one limits the comparison of precision of techniques to measurements all obtained in the same subject group, this kind of standardization is helpful. But the moment this “standardized” precision is regarded as a universal characteristic that describes the ability of a technique to monitor changes in any subject group, there is room for misinterpretation. Clearly, standardized precision errors calculated from different subject groups cannot be directly compared; they have in fact not been standardized.

What needs to be done to resolve this problem? Four requirements for useful ways of standardization can be named: the measure(s) should reflect both imprecision and responsiveness; the measure(s) should make it possible to directly compare the performance of techniques tested in different studies; the measure(s) should have an intuitive clinical meaning; and the measure(s) should be as insensitive to subject selection bias as possible.

Responsiveness varies for different genders, age groups, and therapies, and reproducibility may also differ. Thus, one needs to investigate those different cohorts separately in order to determine the respective levels of standardized precision. Consequently, standardized precision as a measure that reflects reproducibility as well as responsiveness can no longer be represented by a single number.

Finally, the quality of the estimate of standardized precision will depend on the type of study design employed. As usual, results derived from longitudinal studies are preferable.

Keeping these caveats in mind, the following proposed concepts should facilitate an objective assessment of a technique’s ability to monitor longitudinal changes.

MATERIALS AND METHODS

The interpretation of measured changes: Introducing change criteria

For clinical decision making it is important to know what magnitude of measured change is required to be sure that
the patient has truly lost bone. In other words, which change is statistically significant, taking into account the limitations of instrument performance? As other authors have previously shown,(9) for two point measurements over time only changes exceeding 2.8 times the precision error of a technique can be considered as a criterion for true changes (with 95% confidence). The corresponding change criterion has been termed “least significant change” (LSC):

LSC = 2.8 × PE

where PE is the largest precision error of the technique.

However, clinicians have to balance the desire to attain statistical certainty with the patient’s need to get treated as quickly as possible if there is a valid indication to do so. To not withhold potentially important medication, the clinician may be satisfied with confidence levels < 95%.(10) Indeed, it is conceivable that varying confidence limits may be appropriate under different clinical situations. For example, when identifying someone who has indeed responded to therapy in a situation where response is expected, the required confidence may be somewhat less. However, in a situation where a change in a course of therapy is being considered, the clinician may require the 95% confidence in order to change the intervention. Statistically, intervals for any level of confidence can be defined. Avoiding a plethora of different confidence levels, we propose to introduce one additional, less stringent change criterion, the “trend assessment margin” (TAM):

TAM = 1.8 × PE

It can be considered as a criterion for true changes at a confidence level of 80% for two-sided tests or a level of 90% for single-sided tests. The word “trend” should imply that less strict requirements have to be met than for the test for significance at the 95% confidence level.

Both change criteria, LSC and TAM, should be calculated using the long-term, not the short-term, precision error specified for measurements in vivo in a comparable subject group.

Scheduling of follow-up visits: Introducing follow-up time intervals

After establishing the baseline status of a parameter (e.g., BMD), the rate of change of that parameter needs to be determined to assess progression of disease or response to treatment. What time interval between that baseline and a follow-up measurement is sufficient to allow for an accurate and valid assessment? When answering this question one will face the dilemma of having to settle for either a quick answer at an early follow-up visit associated with greater statistical uncertainty when estimating the true change from the measured change or a more solid answer at a later visit with the risk of substantial bone loss and fractures in the meantime. Therefore, analogous to the preceding section, two different follow-up time intervals can be defined.

The “monitoring time interval” for assessment of disease progression (MTI) is an estimate of the time period after which half of the patients with normal bone loss will show a measured change exceeding the change criterion LSC. It is given by:

MTI = 2.8 × PE / median annual change

Similarly, the “trend assessment interval” (TAI) is an estimate of the (shorter) follow-up time period, after which half of the patients with normal bone loss will demonstrate a change exceeding the change criterion TAM. It is given by:

TAI = 1.8 × PE / median annual change

For example, for a technique with a long-term precision error of PE = 1.5% and a patient for whom an annual change of 1% per annum could be expected, the TAI and MTI would be 2.7 years and 4.2 years, respectively. For a subject with a faster expected annual loss rate of 3%, the TAI and MTI would be 0.9 years and 1.4 years, respectively.

When scheduling a patient to assess response to treatment, a similar strategy could be followed. In the previous section, the criteria by which patients can be considered to have responded positively to treatment were established: those for whom the measured change was larger than the normal pretreatment loss by at least TAM or LSC. At what point in time is that expected to happen for the majority of the treated patients? MTI and TAI for treatment could be defined as:

MTI_T = 2.8 × PE / median improvement vs. placebo

TAI_T = 1.8 × PE / median improvement vs. placebo

The index “T” stands for treatment but it should be specified according to the treatment investigated. For example, if estrogen is expected to improve bone by 3% per annum (median), while untreated individuals would lose bone at a median rate of –1% per annum, the median treatment effect would be 4% per annum, and for a technique with a 1.5% precision error the recommended TAI_T and MTI_T would be 8.1 months and 12.6 months, respectively. After these time periods, the median gain in BMD will be 2.02% and 3.15%, respectively, which represents the levels at which one can have 80% or 95% confidence that the subject is indeed losing bone at less than the normal rate (2.02% = –0.68% + 2.7% and 3.15% = –1.05% + 4.2%).

Comparison of techniques: Introducing redefined standardized precision errors

Both the MTI and the TAI defined in the previous section would be measures appropriate to characterize a technique’s ability to monitor skeletal changes: the shorter the MTI and the TAI, the better the longitudinal sensitivity.
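
As a hedged illustration, the change criteria and follow-up time intervals defined in the two preceding sections can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions and do not appear in the paper; the factors 2.8 and 1.8 and the example inputs (a 1.5% precision error with expected changes of 1% and 3% per annum, and a 4% per annum treatment effect) are taken from the text.

```python
# Minimal sketch of the change criteria (TAM, LSC) and follow-up intervals (TAI, MTI).
# Function and parameter names are illustrative assumptions, not from the paper.


def change_criteria(pe):
    """Return (TAM, LSC) for a long-term precision error `pe` given in %."""
    tam = 1.8 * pe  # trend assessment margin, ~80% confidence (two-sided)
    lsc = 2.8 * pe  # least significant change, ~95% confidence (two-sided)
    return tam, lsc


def follow_up_intervals(pe, annual_change):
    """Return (TAI, MTI) in years for an expected median change in % per annum."""
    if annual_change == 0:
        raise ValueError("expected annual change must be nonzero")
    tam, lsc = change_criteria(pe)
    rate = abs(annual_change)  # only the magnitude of the expected change matters
    return tam / rate, lsc / rate


if __name__ == "__main__":
    # Normal loss (1% per annum) and fast loss (3% per annum), PE = 1.5%
    for rate in (1.0, 3.0):
        tai, mti = follow_up_intervals(1.5, rate)
        print(f"{rate}% per annum: TAI = {tai:.1f} years, MTI = {mti:.1f} years")

    # Treatment: +3% per annum on estrogen vs. -1% untreated gives a 4% effect
    tai_t, mti_t = follow_up_intervals(1.5, 3.0 - (-1.0))
    print(f"treatment: TAI = {tai_t * 12:.1f} months, MTI = {mti_t * 12:.1f} months")
```

With these inputs the sketch reproduces the intervals quoted in the text (2.7 and 4.2 years, 0.9 and 1.4 years, and 8.1 and 12.6 months).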
Alternatively, longitudinal sensitivity could also be expressed by precision errors if these are corrected for differences in responsiveness. Such a standardization procedure would allow one to stay with the familiar concept of expressing precision errors in percentage units (instead of the units of years for TAI and MTI). This facilitates the interpretation since it is the kind of measure most researchers are familiar with.

Standardization can be achieved by correcting the precision error of the technique A investigated by the response ratio (rr); rr is given as the ratio of the response rate of the reference technique R divided by the response rate of the technique A:

rr(AvsR) = response rate (R) / response rate (A)

The “standardized precision error,” sPE, of a technique A that has been standardized relative to the reference technique R is given by:

sPE(A) = PE(A) × rr(AvsR)

Once standardized in this fashion, the standardized precision error can now directly be compared with the precision error of the reference technique.

This method of standardization transforms the precision error of technique A to the scaling of the reference technique R. The multiplication by the rr makes standardized precision errors truly comparable across techniques. All precision errors of techniques A, B, C... that have been standardized in this fashion can now directly be compared among each other and also with the precision error of the reference technique (which, by definition, is equal to the standardized precision error because it is standardized to itself).

For example, if a QUS device has a (long-term) precision error for SOS of 0.3%, and if one wishes to compare this with the reported BUA performance of this device, in this example set to 1.5%, one would standardize one or the other parameter versus the second parameter. Let us (arbitrarily) denote BUA as the reference technique. The precision error of SOS would be standardized by multiplication with the rr of BUA versus SOS. If this were, for example, found to be equal to 5 (i.e., the annual change of BUA is five times larger than that for SOS), the standardized precision error of SOS would be 1.5%, i.e., equal to the precision error of BUA. Both devices would, in this example, have the same longitudinal sensitivity.

If we had instead set SOS as the reference technique, the precision error of BUA would have to be standardized. In this case, the rr is 0.2 and the standardized precision error of BUA would be equal to 1.5 × 0.2 = 0.3%, i.e., again equal to the precision error of SOS. No matter which technique was selected as the reference technique, the result “equal standardized precision” remains the same. However, the scaling of the standardized precision error depends on the selection of the reference technique. In the first example, we calculated a standardized precision error of 1.5% for both techniques, whereas, if we switched the reference technique, the standardized precision error was 0.3%.

Consequently, one has to agree on the choice of a universal reference technique to really make techniques comparable across studies. We propose to use posterior–anterior dual-energy X-ray absorptiometry of the lumbar spine (DXAsp) as the reference technique because it is most widely used in longitudinal studies. To denote clearly this choice, we propose to call a standardized precision error of a technique A that has been standardized versus DXAsp the “standard precision error,” stdPE(A):

stdPE(A) = PE(A) × rr(AvsDXAsp)

Use of standard precision error should be preferred over standardized precision errors whenever possible, i.e., when [...] measured in the same subjects. The response rate of the technique [...]

If the techniques A and R have different units (e.g., m/s and g/cm²), both the precision error and the response rates need to be expressed on a percentage basis. If the techniques A and R have the same units, standardized precision could alternatively also be evaluated in absolute units, but then both the precision errors and the response rates need to be expressed consistently in absolute units. To make all equations as universally applicable as possible, the precision errors are all expressed on a percentage basis throughout the remainder of this manuscript.

Response ratios are likely to be less subject group dependent than response rates (part of the cohort bias cancels out). Still, gender and ethnic group, health status (healthy, osteopenic, osteoporotic, etc.), and, if applicable, type and dosage of treatment may have an impact and should therefore be specified. Standard precision errors thus may differ and the ranking of longitudinal sensitivity could depend on the cohort. Therefore, the following scenarios should be investigated before a generic statement on the longitudinal sensitivity of a technique is made:

● Longitudinal sensitivity for detecting normal aging
● Longitudinal sensitivity for detecting disease progression
● A standardized longitudinal precision error for detecting changes due to treatment, which is treatment specific and thus type of treatment and dosage need to be specified.

To illustrate their utility, the concepts derived are being applied using data from the literature. Short-term precision errors (since long-term precision errors are not yet established for QUS) and typical response rates have been gathered for two DXA and two QUS parameters. Since it is not the focus of this paper to compare techniques but to present the concept, the numbers given should only be taken as an example of the application of the concept, not an assessment of the longitudinal sensitivity of the four parameters.
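
The SOS versus BUA standardization example from the Methods can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the two response rates are placeholders chosen only so that their ratio equals the factor of 5 used in the text; the precision errors (0.3% and 1.5%) are the ones quoted there.

```python
# Minimal sketch of standardization by the response ratio (rr).
# Function names and the individual response rates are illustrative assumptions;
# only their ratio (5) and the precision errors come from the text.


def response_ratio(rate_reference, rate_a):
    """rr(AvsR): response rate of reference R divided by response rate of A."""
    if rate_a == 0:
        raise ValueError("response rate of technique A must be nonzero")
    return rate_reference / rate_a


def standardized_pe(pe_a, rate_a, rate_reference):
    """sPE(A): precision error of A rescaled to the reference technique R."""
    return pe_a * response_ratio(rate_reference, rate_a)


PE_SOS, PE_BUA = 0.3, 1.5      # percentage precision errors from the text
RATE_SOS, RATE_BUA = 0.2, 1.0  # % per annum, hypothetical; ratio BUA/SOS = 5

# BUA as reference: SOS is rescaled to the BUA scale
spe_sos_vs_bua = standardized_pe(PE_SOS, RATE_SOS, RATE_BUA)
# SOS as reference: BUA is rescaled to the SOS scale
spe_bua_vs_sos = standardized_pe(PE_BUA, RATE_BUA, RATE_SOS)

print(f"sPE(SOS vs BUA) = {spe_sos_vs_bua:.1f}%  (PE of BUA = {PE_BUA}%)")
print(f"sPE(BUA vs SOS) = {spe_bua_vs_sos:.1f}%  (PE of SOS = {PE_SOS}%)")
```

Either choice of reference leads to the same conclusion (equal longitudinal sensitivity), while the scale of the result changes, which is the motivation for fixing a single reference technique.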
TABLE 1. HYPOTHETICAL EXAMPLE FOR THE CONCEPTS DERIVED
Skeletal parameters include bone mineral density (BMD) measured by posterior–anterior dual-energy X-ray absorptiometry (DXA) [...] of the calcaneus. Despite large differences in uncorrected short-term precision errors (PE) and response rates, parameters reflecting the longitudinal sensitivity, such as trend assessment interval (TAI), monitoring time interval (MTI), and standard short-term precision error (stdPE), can be compared directly across techniques. Change criteria such as the trend assessment margin (TAM) and the least significant change (LSC) provide threshold levels for assessing whether significant changes at the 80% and 95% confidence level, respectively (two-sided tests), have occurred.
The concepts proposed have been applied to hypothetical performance for two DXA approaches (BMD of posterior–anterior DXA of the lumbar spine, BMD [...]) and two QUS approaches (SOS and BUA of the calcaneus) [...].

DISCUSSION

“The search for difference seems to be, for current research, what the search for the philosophers’ stone was for alchemy, or the Holy Grail for the knights of legend–beguiling, elusive and, all too often, illusory.”(11) A dozen years after Robert Heaney raised the issue, his assessment remains largely true. Important contributions have been made in the meantime, but in clinical practice considerable confusion about the interpretation of measured changes and the comparative performance of techniques still remains. With the increasingly widespread use of ultrasound techniques, these problems are amplified since the precision of QUS and bone densitometry techniques cannot easily be compared because of different units and the fallacies of the expression on a percentage basis. In addressing these issues, a new concept was developed to aid the clinician in making decisions when following and treating individual patients. Researchers should benefit from getting a tool for more objective ways of comparing the longitudinal responsiveness of techniques. The concept centers around the three issues listed in the introduction section. Those issues and the components of the concept proposed are:

● The interpretation of measured changes: Are the changes calculated meaningful and clinically relevant? Proposed answer: yes, if they exceed the change criteria.
● Scheduling of a follow-up visit to determine rates of change: What time interval is required to allow accurate assessment of response to treatment or progression of disease? Proposed answer: re-examine patient after the follow-up time intervals MTI or TAI.
● Comparison of techniques: Which technique is suited best to detect changes accurately and quickly? Proposed answer: the technique with the lowest standard precision error.

Also, the four requirements for useful ways of standardization listed in the introduction are largely fulfilled. Subject selection bias is still an issue for the MTI but only because it is meant to be specific for populations with differing rates of changes. For stdPE, this problem is minimal as long as the response rates used to calculate the rr have been obtained on the same individuals for both techniques. If this is not the case, care has to be taken to compare similar populations. stdPE is best suited for direct comparisons of different techniques, even across different studies.

All three parameters have fairly intuitive meanings. A change less than the TAM cannot be interpreted as clinically relevant; a change less than the LSC is not a statistically proven change. A follow-up time interval shorter than the TAI or MTI, respectively, will yield such insufficient changes in the majority of cases. The stdPE can be easily interpreted since the scaling is simple and familiar: a performance of a stdPE of 1–1.5% is to be considered as fairly good. This is similar to the level of precision reported in many studies for DXA, which is familiar to most researchers.

The concepts derived are not limited to radiographic diagnostic approaches. Change criteria, follow-up time intervals, and standard precision errors could, for example, also be calculated for markers of bone turnover. The huge difference in the response rates and precision errors for markers versus radiographic parameters does not represent a hurdle, since they cancel out when calculating follow-up time intervals or standard precision errors. Therefore, the standard precision errors of a marker of bone resorption can be put in perspective directly with the corresponding results for radiographic parameters.
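
The cancellation of scale differences in the follow-up time intervals can be checked numerically. The sketch below reuses the BUA offset example from the introduction (a precision error of 2 dB/MHz, a reading of 80 dB/MHz, and an offset of 50). The annual change of 1 dB/MHz and all function names are assumptions made purely for illustration.

```python
# Minimal sketch: an offset changes the percentage precision error, but not the
# monitoring time interval, because precision error and response rate are
# rescaled by the same factor. The annual change below is a hypothetical value.


def percent(value, reference):
    """Express `value` as a percentage of `reference`."""
    return 100.0 * value / reference


def mti_years(pe, annual_change):
    """Monitoring time interval: 2.8 x PE divided by the annual change."""
    return 2.8 * pe / abs(annual_change)


ABS_PE = 2.0      # dB/MHz, from the BUA example in the text
ABS_CHANGE = 1.0  # dB/MHz per annum, hypothetical assumption

results = {}
for label, reading in (("raw", 80.0), ("offset +50", 130.0)):
    pe_pct = percent(ABS_PE, reading)          # percentage precision error
    change_pct = percent(ABS_CHANGE, reading)  # percentage response rate
    results[label] = (pe_pct, mti_years(pe_pct, change_pct))
    print(f"{label}: PE = {pe_pct:.2f}%, MTI = {results[label][1]:.1f} years")
```

The percentage precision error looks better after the offset, while the MTI is unchanged; the same reasoning applies when both quantities are rescaled by the response ratio.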
The interpretation of the MTI (or TAI) as a measure of longitudinal sensitivity is intuitive and simple: it represents the follow-up time required to test whether clinically relevant changes have occurred. The shorter the MTI, the better the longitudinal sensitivity.

Still, a few caveats should be noted. First, there is no single MTI (or TAI) for each technique. The magnitude of the parameter is likely to be different for studies on disease progression (and again between normal and fast losers) and response to treatment (here it may also depend on the type of treatment investigated). When pursuing the latter issue, one should also note that response to treatment is quite variable even for an established effective medication like estrogen.(12,13) By definition, half of the patients will show a response that is less than the median response, and, consequently, their measured improvement during the MTI (or TAI) will be smaller than the LSC (or TAM).

Patients that do not reach the level of change expected after the MTI (or TAI) may still have benefited from treatment, albeit at a somewhat lower level. How do we interpret such a “negative” insufficient response? How do we detect true nonresponders? As long as a patient’s measured change is “better” than the loss expected without treatment, the patient is more likely to benefit from the treatment than not. However, the statistical uncertainty would be unacceptably high. Depending on the health status of the patient, one could still take the upward trend as encouraging and schedule another follow-up visit at twice the MTI (or TAI). At this point in time, even patients with only half the median response rate (comparing treated and untreated patients) can be expected to show a change that exceeds the LSC (or TAM). According to published studies, this would be met by ∼60% of the patients on estrogen(12,14) and ∼80% of the patients on alendronate,(15) assuming normal distributions of the response. Further reductions in the change criteria appear to be clinically questionable, not only because the response is smaller, but because the follow-up time intervals required to test responsiveness would become prohibitively long.

Alternatively, it may also be justified to schedule follow-up visits at time intervals shorter than the TAI or MTI for the purpose of identifying patients that continue to lose bone at a rapid rate. Bone losses exceeding the TAM or LSC would represent appropriate test criteria.

(Re-)Defined standardized precision errors

In the appendices, a number of different definitions for standard precision errors have been developed. To avoid confusion, one should use the term stdPE only if the standard precision error has been obtained from truly longitudinal data. stdPE is preferable to other approaches. If normative data of all manufacturers would be of equally good quality, the stdPE might be a good estimate of longitudinal sensitivity to detect aging changes. However, it is known that differences have been reported recently for DXA,(16) and discrepancies again may be encountered for newly introduced devices and methods. Therefore, this type of standardization should be used carefully.

Compared with previously proposed approaches, the new definitions of standard (and standardized) precision errors presented here offer the advantages of ease of interpretation (all parameters), suitability for comparison of any two techniques (standardized precision errors), comparability across different studies (standard precision errors), minimal cohort bias (corrections by rr’s rather than response rates), and applicability to radiographic as well as nonradiographic parameters.

These advantages will be discussed, and afterward the limitations of definitions previously proposed by other authors.

Why introduce two concepts of standard and standardized precision errors?

The advantage of the concept of the standardized precision error is that it can readily be used to compare the precision errors of any two techniques, provided that the uncorrected precision errors and response rates are known for both of the techniques. This will allow comparisons in a variety of research situations, whereas the concept of standard precision errors requires researchers to determine both precision errors and response rates of DXAsp in their population, which may not always be feasible. However, agreeing on a common reference standard (as required for the standard precision error) yields a well defined robust measure for comparison of the longitudinal sensitivity of techniques, even across different studies.

To facilitate assessment of standard precision errors for a large number of techniques, publication of the rr’s themselves would be helpful. Such data would provide researchers with a methodology to determine the stdPE for a new technique, even if no direct comparison with posterior–anterior dual-energy X-ray absorptiometry (PA-DXA) of the lumbar spine can be carried out at the center. It would only be necessary to compare the new technique with a reference technique for which rr versus PA-DXA of the spine is already available from the literature.

Why use BMD of PA-DXA of the spine as the reference technique?

Previous standardization approaches failed to achieve the goal of standardization because the result was still very subject-group dependent. The proposed concept reduces the impact of this error source. Still, other forms of bias need to be considered. BMD of PA-DXA of the spine is substantially affected by degenerative changes. Subjects affected by degenerative changes need to be excluded when calculating standard precision errors, specifically when evaluated from cross-sectional data. The choice of DXAsp as the reference technique for calculation of the standardized precision error does not mean that DXAsp is the technique with the best longitudinal sensitivity; it was only considered to be the best reference standard.
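
The rescheduling argument in the caveats on the MTI (a second visit at twice the MTI for patients with only half the median response) can be made explicit with a short worked equation. This is a sketch in notation introduced here for illustration only: r_med denotes the median treatment effect per annum and Δ(t) the expected measured change after time t, assuming the change accumulates linearly over time.

```latex
\begin{align*}
\mathrm{MTI}_T &= \frac{2.8 \times \mathrm{PE}}{r_{\mathrm{med}}},
  \qquad \Delta(t) = r\,t \\
\Delta(2\,\mathrm{MTI}_T)\Big|_{r = r_{\mathrm{med}}/2}
  &= \frac{r_{\mathrm{med}}}{2} \times \frac{2 \times 2.8 \times \mathrm{PE}}{r_{\mathrm{med}}}
   = 2.8 \times \mathrm{PE} = \mathrm{LSC}
\end{align*}
```

Replacing the factor 2.8 by 1.8 gives the corresponding statement for the TAI and the TAM.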
Why correct sPE and stdPE using response ratios

Correcting precision errors by division by response rates yields a good measure of longitudinal sensitivity, but this measure is very sensitive to the population sample studied (cohort bias). This approach was used to define the MTI (or TAI) because in the context of estimating follow-up times the impact of the population is of critical importance. For a generic comparison of techniques, a more robust measure like the stdPE is preferable. As long as different techniques measure a similar aspect of bone, their response rates will be partially correlated. Therefore, a substantial fraction of the impact of the population studied is eliminated when using rr's instead of response rates. Moreover, multiplication by the rr, which will be unity for the reference technique, leaves percentage precision errors in the range of values (typically 1–5%) that researchers and clinicians are familiar with, increasing the likelihood of acceptance and facilitating the interpretation. This is not the case for most definitions of standardized precision errors proposed previously.

Why adjust for annual rates of change and not for a measure of intersubject variability?

When calculating response rates for standardized precision errors, the measure “annual rates of change” was proposed. If standardized precision errors are meant to be used as a measure of longitudinal sensitivity, it seems logical that response should be defined as a change over time. This should be the most intuitive approach to quantitate a technique’s ability to monitor longitudinal changes. For longitudinal studies it is the obvious choice anyway, but for cross-sectional estimates of longitudinal sensitivity one might consider other measures of responsiveness. However, standardization by annual rates of change was selected here as well, in order to make the (short-term, cross-sectional) definition of stdPE as similar as possible to that of (the

One should note that any measure of spread or dynamic range includes an error component caused by the precision (and accuracy) errors. Therefore, for two techniques of comparable true responsiveness, the one with the larger precision error would show the larger apparent responsiveness. Consequently, estimates of stdPE that are based on measures of spread underestimate the differences in longitudinal sensitivity between techniques. Techniques with poorer precision will demonstrate an artificially enlarged dynamic range and, consequently, their calculated standardized precision error looks better than it really is (precision bias). One can correct for this, i.e., remove the precision error from the measure of spread, by two-way nested analysis of variance.

Why should parameters of longitudinal sensitivity be based on long-term rather than short-term precision errors?

The assessment of skeletal changes via radiological techniques such as bone densitometry or QUS usually requires time intervals between follow-up measurements of 1 year or longer. Therefore, the reproducibility of techniques has to be based on long-term precision errors, which are usually larger than short-term precision errors.(17) There are additional error sources (e.g., long-term stability of equipment, variability of body temperature for SOS measurements, etc.) that can only be determined from longitudinal data.(18) Precision errors derived from short-term repeat measurements only approximate true reproducibility errors. Still, their calculation can be helpful, particularly if one can assume that the ratio of short-term and long-term precision errors (i.e., the precision error ratio) of the technique investigated and that of the reference technique would be similar. Then, the ranking of the sensitivities of the techniques would not be affected (but the absolute magnitudes of longitudinal sensitivity will be overestimated).

The previously published concepts of standardization all have some of the aforementioned problems. Miller et al. introduced the standardized CV based on normalization by the dynamic range given by the 90% interpercentile range,(3) and Greenspan et al. used a similar approach but standardized with the 95% interpercentile range.(8) Both measures of population spread depend on subject selection criteria and thus are affected by the noted cohort and precision biases. Langton proposed the concept of ZSD, i.e., the standard deviation of the Z score, which is taken as a measure of standardized precision.(5) Here, the problems with sampling bias are less severe since the population variance used to calculate the Z score is usually obtained from large populations measured to derive normative data. However, the current debate about the validity and comparability of normative data provided by the manufacturers puts some question marks on this approach. More importantly, however, rather than being a good measure of responsiveness, the larger population variance could also be due to technique problems (precision bias) and to diversity in subjects which is unrelated to osteoporosis (accuracy bias). In fact, a technique with a large age-related decline relative to its population variance is more likely to allow monitoring of skeletal changes compared with a technique that, in the extreme, would show no age-related change, even if that second technique had an equally large or even larger population variance. Population variance does not appear to be a reliable measure of responsiveness over time, and the ZSD may be more suitable for characterizing diagnostic sensitivity.

In another approach, Blumsohn et al. have proposed the index of individuality(4) which is affected by the noted sampling bias because it incorporates a measure of intersubject variability. The problems are similar to those noted by Quan and Shih for another measure of standardized precision, the intraclass CV.(19) Both of these measures are perhaps better suited to assess diagnostic sensitivity. Machado and colleagues have standardized precision by dividing precision errors by the average difference between healthy and osteoporotic individuals.(7) This measure is affected by cohort bias due to the ambiguities in the degree of osteoporosis, which makes it impossible to compare standardized precision errors across different studies. A cross-sectional comparison of subjects with and without osteoporosis is problematic for assessing longitudinal sensitivity for another reason: the osteoporotic individuals may have had a low peak skeletal status to start with and therefore under these circumstances standardization based on the average difference of healthy and osteoporotic individuals would overestimate true longitudinal responsiveness.

The above mentioned advantages and disadvantages of the parameters of the concept are demonstrated by the data shown in Table 1. As can be seen, the performance (i.e., ability to monitor changes) of the technique cannot be judged based on uncorrected precision errors since the response rates vary substantially. The change criteria can be used directly to determine which changes reflect trends (TAM) or significant changes (LSC). The follow-up times TAI and MTI and stdPE reflect both precision errors as well as responsiveness to changes. stdPE is less dependent on the subject group than MTI (or TAI) and thus is closer to the goal of defining a single parameter that characterizes the overall performance of a technique. MTI (TAI) will usually be different for each subject group and technique since they are meant to be direct indicators of follow-up times and will be subject group dependent.

The results of Table 1 are based on short-term precision errors and thus need to be interpreted with caution since they will likely underestimate long-term stdPE. In this hypothetical example, the longitudinal sensitivity of

There are a number of assumptions to using the proposed concept. The underlying bone parameters are considered to be normally distributed. For calculating long-term precision errors and response rates, the changes are assumed to be linear with time. For response to treatment, this is usually not the case. However, the proposed concepts could be easily adapted. Nonlinear changes can be divided into piecewise linear segments. Compared with later responses, the large early response to treatment would result in shorter MTIs (or TAIs). Long-term precision errors could also be calculated from nonlinear models, should this make biological and statistical sense. Whether this offers advantages remains to be seen. Also, one needs to acknowledge that, irrespective of the type of model selected, the standard error of the estimate (SEE; see Appendix 1) includes two components of variability, i.e., technique imprecision and true deviations from the fit. Therefore, prospectively defined standardized precision errors do not solely represent true technique limitations. In this regard, the term “precision error” may be considered misleading and the alternative term “longitudinal sensitivity” may be preferable. However, for most clinical applications, this ambiguity does not represent a problem. If one is, for example, interested in estimating the follow-up time required to establish success of treatment, the power to detect this will depend both on the technique’s imprecision and the true variability over time.(21) Thus, the SEE can be considered to represent a good approximation of overall diagnostic, biological, and

Statistical tests like the ones proposed in this paper might be incorporated in the device’s operating software. For example, in serial measurements, an automatic indication whether a change from previous exams is significant could aid the clinician in the process of decision making.

While the proposed concepts avoid a number of the problems addressed above, a few caveats need to be noted. First of all, it is impossible to characterize longitudinal sensitivity by a single universally applicable figure of merit. Only together will the change criteria LSC (or TAM), the follow-up time intervals MTI (or TAI), and the standardized precision error provide the answers sought. More fundamentally, one could criticize the proposed approach because it does not consider whether the observed change in a bone parameter, even if highly significant, would relate to a relevant change in fracture risk. Ross et al. have alluded to this problem.(20) This does represent a limitation; however, the relationships between changes in a bone parameter and subsequent changes in fracture risk have not been well established to date. Increased bone loss can be a risk factor in itself or because of the expected extrapolated long-term reductions in BMD. Moreover, such a concept would reduce the relevance of bone loss measurements to simply the risk-related aspect, whereas for clinical decision making, assessment of the efficacy of therapy or compliance

CONCLUSIONS

A comprehensive assessment of the longitudinal sensitivity of a technique should be based on calculation of a change criterion (like TAM or LSC), a follow-up time interval (like TAI or MTI), and a standard precision error (stdPE). Together these three measures yield a good characterization of a technique’s ability to monitor skeletal changes: LSC is the smallest change to be considered statistically significant, the patient should be called in at about the time interval specified by the (population specific) MTI, and the smaller the stdPE the more sensitive the technique. For matters of clinical decision making that require or allow earlier judgement at lower levels of statistical significance, i.e., trend assessment, shortening the follow-up time interval by 36% (next visit after TAI instead of MTI) may be

Some of the previous methods of standardization of precision have been shown to represent cases of flawed application (amplification of the cohort effect) of a useful concept (standardization) to a parameter that has sometimes been misinterpreted in the past (precision, as a parameter that for the purposes discussed here is not valuable in itself but only in conjunction with good responsiveness). The presented concept should improve the ability to investigate,
characterize, and compare the ability of techniques to

ACKNOWLEDGMENT

I would like to acknowledge the helpful discussions with Richard Eastell, M.D., Sheffield, U.K., Ying Lu, Ph.D., San Francisco, CA, U.S.A., and Harry Genant, M.D., San Francisco, CA, U.S.A.

REFERENCES

1. Genant HK, Engelke K, Fuerst T, Glüer CC, Grampp S, Harris ST, Jergas M, Lang T, Lu Y, Majumdar S, Mathur A, Takada M 1996 Noninvasive assessment of bone mineral and structure: State of the art. J Bone Miner Res 11:707–730.
2. Davis JW, Ross PD, Wasnich RD, MacLean CJ, Vogel JM 1991 Long-term precision of bone loss rate measurements among postmenopausal women. Calcif Tissue Int 48:311–318.
3. Miller CG, Herd RJM, Ramalingam T, Fogelman I, Blake GM 1993 Ultrasonic velocity measurements through the calcaneus: Which velocity should be measured? Osteoporos Int 3:31–35.
4. Blumsohn A, Hannon RA, Al-Dehaimi AW, Eastell R 1994 Short-term intraindividual variability of markers of bone turnover in healthy adults. J Bone Miner Res 9 (Suppl 1):S153.
5. Langton CM 1997 ZSD: A universal parameter for precision in the ultrasonic assessment of osteoporosis. Physiol Meas 18:67–
6. Glüer CC, Blunt B, Engelke K, Jergas M, Grampp S, Genant HK 1994 ‘Characteristic follow-up time’: A new concept for standardized characterization of a technique’s ability to monitor longitudinal changes. Bone Miner 25 (Suppl 2):S40.
7. Machado ABC, Hannon R, Henry Y, Eastell R 1997 Standardized coefficient of variation for dual energy x-ray absorptiometry (DXA), quantitative ultrasound (QUS) and markers of bone turnover. J Bone Miner Res 12 (Suppl 1):S258.
8. Greenspan SL, Bouxsein ML, Melton ME, Kolodny AH, Clair JH, DeLuca PT, Stek M Jr, Faulkner KG, et al. 1997 Precision and discriminatory ability of calcaneal bone assessment techniques. J Bone Miner Res 12:1303–1313.
9. Cummings SR, Black D 1986 Should perimenopausal women be screened for osteoporosis? Ann Intern Med 104:817–823.
10. Genant HK, Block JE, Steiger P, Glüer CC, Ettinger B, Harris ST 1989 Appropriate use of bone densitometry. Radiology 170:817–822.
11. Heaney RP 1986 En recherche de la différence (P < .05). Bone Miner 1:99–114.
12. Lufkin EG, Wahner HW, O’Fallon WM 1992 Treatment of postmenopausal osteoporosis with transdermal estrogen. Ann Intern Med 117:1–9.
13. Riis BJ, Thomsen K, Strøm V, Christiansen C 1987 The effect of percutaneous estradiol and natural progesterone on postmenopausal bone loss. Am J Obstet Gynecol 156:61–65.
14. Riis B, Christiansen C 1987 Prevention of postmenopausal osteoporosis by estrogen/gestagen substitution therapy. Med Klin 82:238–241.
15. Liberman UA, Weiss SR, Bröll J, Minne HW, Quan H, Bell NH, Rodriguez-Portales J, Downs RW, et al. 1995 Effect of oral alendronate on bone mineral density and the incidence of fractures in postmenopausal osteoporosis. N Engl J Med 333:1437–1443.
16. Faulkner KG, Roberts LA, McClung MR 1996 Discrepancies in normative data between Lunar and Hologic DXA systems. Osteoporos Int 6:432–436.
17. Fuleihan GE-H, Testa M, Angell JE, Porrino N, LeBoff MS 1995 Reproducibility of DEXA absorptiometry: A model for bone loss estimates. J Bone Miner Res 10:1004–1014.
18. Nguyen TV, Sambrook PN, Eisman JA 1997 Sources of variability in bone mineral density measurements: Implications for study design and analysis of bone loss. J Bone Miner Res
19. Quan H, Shih WJ 1996 Assessing reproducibility by the within-subject coefficient of variation with random effects models. Biometrics 52:1195–1203.
20. Ross PD, Davis JW, Wasnich RD, Vogel JM 1991 The clinical application of serial bone mass measurements. Bone Miner 12:189–199.
21. Blake GM, Fogelman I 1997 Technical principles of dual energy x-ray absorptiometry. Semin Nucl Med 27:210–228.
22. Glüer CC, Blake G, Lu Y, Blunt BA, Jergas M, Genant HK 1995 Accurate assessment of precision errors: How to measure the reproducibility of bone densitometry techniques. Osteoporos Int 5:262–270.

Klinik für Diagnostische Radiologie
Christian-Albrechts-Universität zu Kiel

Received in original form May 11, 1998; in revised form April 2,

APPENDIX 1. GLOSSARY OF TERMS, ABBREVIATIONS, AND DEFINITIONS

LSC: Least significant change: LSC = 2.8 × PElt
Criterion for smallest change in measurement results that can be considered to be statistically significant with 95% confidence (two-sided test). For statistical assumptions

MTI: Monitoring time interval: LSC/median response
Follow-up time interval after which the majority of patients can be expected to show a change exceeding the LSC, i.e., time interval recommended between follow-up visits if high 95% confidence level (two-sided test) is required. MTI is a characteristic of a technique but it depends on the subject group, e.g., disease progression (MTI), response to treatment

PEst: Short-term precision error, expressed on a percentage basis
Derived from two or more measurements repeated at short time intervals and obtained on i = 1…m individuals; see

PElt: Long-term precision error, expressed on a percentage basis
Derived from longitudinal studies of i = 1…m individuals with a minimum of three repeated measurements per individual

RMS: Root-mean-square average; averaging method appropriate for averaging of variances (e.g., precision errors) which are not normally distributed, but according to the

rr: Response ratio: rr(AvsR) = response rate (reference technique R) / response rate (technique A investigated), where response rates reflect %changes per annum or %changes per year of age for a given technique.
SD: Standard deviation of repeated measurements; measure of short-term precision. Compare SEE.

SEE: Standard error of the estimate: measure of scatter around the regression line and, therefore, of long-term precision. Compare SD.
sPE: Standardized precision error: sPE = PE × rr(AvsR)
Precision error adjusted for response ratio rr of reference technique versus technique A investigated. Expressed on a percentage basis. As a result of the standardization procedure, the scaling of the standardized precision error is now equivalent to the scaling of the precision error of the reference technique. Consequently, standardized precision errors of both techniques can now directly be compared.

stdPE: Standard precision error: stdPE = PE × rr(PA-
Standardized precision error for which PA-DXA of the spine was selected as the reference technique. Expressed on a percentage basis.

TAI: Trend assessment interval: TAM/median response
Follow-up time interval after which the majority of patients can be expected to show a change exceeding the TAM, i.e., time interval recommended if somewhat relaxed tests for change are sufficient. TAI is a characteristic of a technique but it depends on the subject group, e.g., disease progression (TAI), response to treatment (e.g., TAI

TAM: Trend assessment margin: TAM = 1.8 × PElt
Criterion for smallest change in measurement results that can be considered to be statistically significant with 80% confidence (two-sided test) or 90% confidence (single-sided test). For statistical assumptions see (9). Compare

APPENDIX 2. CALCULATION OF

Short-term precision errors are calculated in the following way. For an individual, the absolute precision error is given by the standard deviation (SD) of repeated measurements. Expressed on a percentage basis, the short-term precision error for the ith individual is given by:

\[ \mathrm{PE}_{st,i} = \frac{100\%}{\bar{x}_i} \sqrt{\frac{\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2}{n_i - 1}} \]

where x_{ij} is the bone parameter from the jth measurement of the ith individual and \bar{x}_i the mean of the n_i repeated measurements. The average short-term precision error of a group of m individuals is not given by the arithmetic mean but rather by the root-mean-square average (RMS) of the precision errors:

\[ \mathrm{PE}_{st} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \mathrm{PE}_{st,i}^2} \]

Long-term precision errors can be calculated from linear regression analysis of an individual’s measurements over time. The standard error of the estimate (SEE_i), which reflects the deviations of repeated measurements from the fitted curve, can be taken as a measure of the absolute long-term precision error of the ith individual. The individual’s long-term precision error, PE_{lt,i}, when expressed on a percentage basis, is given by:

\[ \mathrm{PE}_{lt,i} = \frac{100\%}{\bar{x}_i} \sqrt{\frac{\sum_{j=1}^{n_i} \left(x_{ij} - \hat{x}_{ij}\right)^2}{n_i - 2}} \]

where \hat{x}_{ij} = a + bt is the predicted value of the jth measurement in the ith individual at the time t according to the fitted line with intercept a and slope b. For a group of m individuals the average long-term precision error is again the RMS average:

\[ \mathrm{PE}_{lt} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \mathrm{PE}_{lt,i}^2} \]

Standard (or standardized) precision errors are derived from the precision errors defined above by first multiplying the individual’s short- or long-term precision error with the response ratio, and then, second, calculating the RMS average across all subjects. The response ratio, rr, can be derived from:

(1) Longitudinal studies (preferred approach):

\[ rr_i = \frac{(\%\text{slope per annum of reference technique})_i}{(\%\text{slope per annum of technique investigated})_i} \]

%slope is the slope of the regression line (i.e., the percent change per annum) of an individual’s measurements over time. As a measure of the response observed for this individual, it is calculated for the technique investigated and the reference technique (both obtained in this individual) to calculate the response ratio. For this method, unlike for the two following ones, the response ratio is specific to each individual and it can be used for estimating longitudinal sensitivity for response to treatment.

(2) Here, the response rates are based on the cross-sectional fit of age-related changes in normative data.

(3) Cross-sectional data (least desirable approach):

\[ rr = \frac{\%\text{slope per year of age for reference technique}}{\%\text{slope per year of age for technique investigated}} \]

This approach can be used if neither longitudinal response studies nor normative data are available. If the technique investigated and the reference technique have been obtained in the same individuals one would, separately for each of the two techniques, regress the parameter of the technique versus the age of the subjects included in order to obtain the slope per year of age as a measure of the response

Depending on the data type, i.e., short-term or long-term precision errors, longitudinal or cross-sectional study design, different types of stdPE could be calculated. The preferred approach, in which both long-term precision errors and response rates are derived from longitudinal studies, is de-

Standard precision errors are then calculated from either

The standard precision error averaged across a group of m
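The calculations described in the two appendices can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the author's software: all function names are hypothetical, the SD uses n - 1 and the SEE uses n - 2 degrees of freedom (the usual definitions), the %slope is expressed relative to the individual's mean value, and the numbers in the usage section are invented for demonstration.

```python
import math


def percent_sd(values):
    """Short-term precision error (%) of one individual: SD / mean * 100."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean * 100.0


def percent_see(times, values):
    """Long-term precision error (%) of one individual from a linear fit.

    Returns (SEE / mean * 100, slope / mean * 100), i.e. the percentage
    precision error and the %slope per unit time (assumed relative to the mean).
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least three measurements per individual are needed")
    t_mean = sum(times) / n
    x_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, values))
    b = sxy / sxx                      # slope of the fitted line
    a = x_mean - b * t_mean            # intercept of the fitted line
    sse = sum((x - (a + b * t)) ** 2 for t, x in zip(times, values))
    see = math.sqrt(sse / (n - 2))     # standard error of the estimate
    return see / x_mean * 100.0, b / x_mean * 100.0


def rms(values):
    """Root-mean-square average, used instead of the arithmetic mean."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def follow_up(pe_lt, median_response_rate):
    """Change criteria (%) and follow-up intervals (years) from the glossary."""
    rate = abs(median_response_rate)   # % change per annum
    lsc = 2.8 * pe_lt                  # least significant change
    tam = 1.8 * pe_lt                  # trend assessment margin
    return {"LSC": lsc, "TAM": tam, "MTI": lsc / rate, "TAI": tam / rate}


def standardized_pe(individual_pes, response_ratios):
    """Multiply each individual's PE by its response ratio, then RMS-average."""
    return rms([pe * rr for pe, rr in zip(individual_pes, response_ratios)])


if __name__ == "__main__":
    # Invented example: 1% long-term precision error, 1% loss per annum.
    result = follow_up(pe_lt=1.0, median_response_rate=-1.0)
    print(result)
    # TAI / MTI equals 1.8 / 2.8, i.e. the interval is shortened by about 36%.
    print(1.0 - result["TAI"] / result["MTI"])
```

Because TAI and MTI share the same denominator, their ratio depends only on the two multipliers (1.8 and 2.8), which is where the roughly 36% shorter interval mentioned in the conclusions comes from.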

