November 30, 2005

Overview

Scale stability is an important quality for any large-scale computer-adaptive test (CAT) program, and it should be maintained through ongoing research evaluating scale drift in CAT operations. However, there is little literature on evaluating scale drift in CAT using both observed and simulated data. This paper outlines and illustrates a method for evaluating scale drift. A special online data collection design was implemented for the GMAT™ Quantitative measure. A modified root mean squared difference statistic was used to measure the difference between item parameters, and an empirical baseline for evaluating that difference was established through simulation. The results showed no scale drift in the GMAT™ Quantitative measure: the observed differences between the two sets of item parameters, calibrated at two time points, were random variation.

Related Items

- Expected Classification Accuracy Using Latent Distributions
- Exploring Potential Quantitative Item Bias across Groups
- Guess What? Score Differences with Rapid Replies versus Omissions on a Computerized Adaptive Test
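The comparison described in the overview rests on a root mean squared difference between two calibrations of the same items. The sketch below is a hypothetical illustration of an unmodified RMSD (the paper's modified statistic is not specified here), with made-up item difficulty values; the function name `rmsd` and all data are assumptions for illustration only.

```python
import math

def rmsd(params_t1, params_t2):
    """Root mean squared difference between two sets of item
    parameter estimates for the same items (illustrative only;
    not the paper's modified statistic)."""
    assert len(params_t1) == len(params_t2)
    squared_diffs = [(a - b) ** 2 for a, b in zip(params_t1, params_t2)]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))

# Hypothetical item difficulty estimates calibrated at two time points.
b_time1 = [0.12, -0.55, 1.03, 0.40]
b_time2 = [0.10, -0.60, 1.10, 0.35]
print(rmsd(b_time1, b_time2))
```

In the design the overview describes, an observed value like this would be judged against an empirical baseline: the distribution of RMSD values obtained from simulated recalibrations under a no-drift assumption, so that sampling variation alone is not mistaken for drift.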