Overview
A variety of educational constituencies are increasingly using assessments that evaluate students’ analytical writing abilities. The complexity of these assessments makes it challenging to evaluate the reliability of task ratings, and many performance assessments do not lend themselves to standard generalizability (G) theory designs or to conventional inter-rater reliability estimation procedures. This paper applied a set of methods, including G theory, inter-rater reliability, and test-retest reliability, to estimate the consistency of scores for one writing assessment, the Graduate Management Admission Test™ (GMAT™) Analytical Writing Assessment (AWA). The results of these analyses are compared, and suggestions are offered for estimating the reliability of nonconventional writing assessment designs.
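For context, one standard index from G theory is the generalizability coefficient; the persons-by-raters design shown below is an illustrative assumption for readers unfamiliar with the framework, not necessarily the design used in this study:

\[
E\rho^2 \;=\; \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{pr,e}}{n_r}},
\]

where \(\sigma^2_p\) is the variance component for persons (examinees), \(\sigma^2_{pr,e}\) is the person-by-rater interaction confounded with residual error, and \(n_r\) is the number of raters whose scores are averaged. Assessments whose rating plans do not fit such fully crossed designs are the ones that motivate the alternative reliability procedures examined in this paper.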