[NIFL-ASSESSMENT:154] RE: norm vs criterion

From: John Sabatini (sabatini@literacy.upenn.edu)
Date: Tue Jul 16 2002 - 22:49:48 EDT



         Reply to:   RE: [NIFL-ASSESSMENT:152] RE: norm vs criterion
Hi John,

I expect you'll get a lot of advice on how to evaluate tests and which one to use.  A couple of thoughts.

I would put the most weight on aligning your assessments with your curriculum and instructional approaches.  While many tests purport to be general achievement tests, somewhat independent of instruction, this is rarely true except in the broadest sense.
If you are teaching using a basic skills approach, then a TABE-style basic skills test should measure whether you've been effective at doing it.  If you are aiming at the kinds of competencies covered on the CASAS, then choose it as your outcome measure.  The DAR (which was used in the NCSALL ARCS study, so there should be data on several hundred adults who have taken it) measures component reading skills like oral reading fluency and decoding.  If you teach those skills, or want to diagnose an individual's proficiency in order to decide whether to address those learning needs, then the DAR is a choice.  For GED, the GED practice tests and ultimately passing the GED are the most valid outcome measures.

It is not only appropriate to use different outcome measures to assess different instructional programs, it's critical.  I don't mean to downplay reliability and validity; I just think assessment serves instruction, not the other way around.  Use a portfolio if you want a developmental record that shows real-world proficiencies.  It can be as valid as the norm- or criterion-referenced tests.

Finally, the fact is that very few, if any, of the assessments available for adults would get high marks on the more stringent measurement criteria you list below.  I would contend that there is not sufficient agreement on the content or constructs of adult literacy to support the validity claims we'd like to make, and that you unduly restrict the range of assessments that might serve your needs by setting the bar too high.
Most measures for adults are highly correlated with each other (across the full range of abilities) because the rank order of adults on ability is often stable.  Why?  Because most poor-reading adults are still poor performers relative to better-reading adults, whether one uses a CASAS, a TABE, or a math test for that matter.  Some individuals show divergent patterns (good at math, poor at reading; good at basic skills, poor at competency tests), and some perform better under some test conditions than others, but not as many as you might think and not enough to lower the correlations significantly.
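
To make the rank-order point concrete, here is a minimal Python sketch of a rank (Spearman) correlation between two tests.  The scores are invented for illustration and do not come from CASAS, TABE, or any other real data; the names ranks, pearson, spearman, test_a, test_b_same_order, and test_b_divergent are hypothetical as well.  It only shows that when one pair of adults out of ten trades places, the rank correlation drops a little but stays high.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented scores for ten adults on two different tests (assumption, not data).
test_a = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# Same rank order on the second test.
test_b_same_order = [12, 25, 31, 44, 52, 63, 71, 85, 88, 99]
# Same scores, but the 3rd and 6th adults trade places (a divergent pair).
test_b_divergent = [12, 25, 63, 44, 52, 31, 71, 85, 88, 99]

rho_same = spearman(test_a, test_b_same_order)
rho_divergent = spearman(test_a, test_b_divergent)
print(f"same rank order: rho = {rho_same:.2f}")
print(f"one divergent pair: rho = {rho_divergent:.2f}")
```

With real data you would want far more than ten people, and you would check the correlation within ability bands as well as across the full range.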
Well, this post is too long already.  Sorry.

Best,

John
         jsabatini@ets.org   or  sabatini@literacy.upenn.edu

John Makay wrote:
>Thanks for the clarification on norm- and criterion-referenced tests.  My concern, though, is selecting a test for my school to use for placement.  I have some questions in this area.  Is the CASAS, which is a criterion-referenced test, appropriate for placing students in different levels of a program if the curriculum of the program is not predominantly a CASAS-based curriculum?
>
>Also, in evaluating a test to see which test has the best psychometric variables, how important is the population (sampling) factor if the test is a criterion-referenced test such as the CASAS?  We have been considering giving the Diagnostic Assessment of Reading (DAR) to place students in the varying levels of our Pre-GED program, but discovered that it was intended for a K-12 population.  What are the issues in using a test like this for an adult population?
>
>Also, what are the strongest variables in assessing a test?  In other words, are some variables not as important as others?  Are some of the test-quality criteria more important than other criteria?  For example, several reviewers in the Buros Mental Measurements Yearbook point out that many test manuals are lax in providing information in the area of content validity; therefore, giving great emphasis to content validity when judging tests may not be the best way to judge them since this information is rarely available.
>
>I have found a few sources that identify criteria for test evaluation.
>
>According to Popham (2000, p. 195-196), the following factors should be considered when reviewing a set of comparative data for norm-referenced tests.
>
>1. Sample size.  Is the sample in the norm group large enough to assure a reasonable degree of stability in the database from which educators must draw interpretations?
>
>2. Representativeness.  Is the sample drawn in such a way as to represent the kinds of students for whom interpretations must be made?
>
>3. Recency.  Were the normative data gathered in the last few years or is the information out-of-date because it was collected too long ago?
>
>4. Description of procedures.  Are the procedures associated with the gathering of the normative data sufficiently well described so that those procedures can be properly evaluated?
>
>In addition to the criteria above, “key standards should be considered from the Standards for Educational and Psychological Testing established by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement” (Rudner, 2000, v-13).  These seem to have a broader range and appear to cover both norm- and criterion-referenced tests.
>
>Assessment Standards for Selection of a Test
>
>Test Coverage and Usage - There must be a clear statement of recommended uses and a description of the population for which the test is intended.  The use intended by the test developer must be justified by the publisher on technical grounds.
>
>Appropriate Samples for Test Validation and Norming - The samples used for test validation and norming must be of adequate size and representative of the group for which the test is intended in terms of age, experience, and background.
>
>Reliability - Test publishers should be able to demonstrate that the test is sufficiently reliable to permit stable estimates of individual ability.
>
>Predictive Validity - Evidence of the predictive validity of the test must include a comparison of performance on the test being validated against performance on some outside criteria such as course grades, class rank, other tests, teacher ratings, or other related criteria.
>
>Content Validity - Content validity can be evaluated by examining the plan and procedures reportedly used in the construction of the test.
>
>Construct Validity - Test publishers are in a position to demonstrate that the test adequately measures a particular construct.
>
>Test Administration - All test administration specifications, such as instructions to test takers, time limits, use of reference materials, use of calculators, lighting, equipment, assigning seats, monitoring, room requirements, testing sequence, and time of day, should be fully described.
>
>Test Reporting - Test publishers are responsible for fully describing the methods used to report test results, including scaled scores, subtest results and combined test results.
>
>Test and Item Bias - Test developers are expected to exhibit a sensitivity to the demographic characteristics of test takers, and steps should be taken during test development, validation, standardization, and documentation to minimize the influence of cultural factors on individual test scores.
>
>So, which of the variables above are more important, and for which situations?  Are there any books or references out there that answer this question, or can someone share their experience on this one?  Also, more specifically, what kind of test is appropriate as a placement instrument in an adult basic education program with many levels?  Can both criterion-referenced and norm-referenced tests do the job?  If anyone can give me some insight into how best to evaluate a test for broad use in our school for placement purposes, please, please step forward.
>
>John Makay
>Literacy Instructor
>Baltimore City Community College
>
>REFERENCES
>
>Mueller, R. O. and Freitag, P. K. (n.d.). Comprehensive Adult Student Assessment System [Review of the CASAS test]. Mental Measurements Yearbook (13th ed.). Lincoln: University of Nebraska Press.
>
>Popham, W. J. (2000). Modern educational measurement: Practical guidelines for the education leader. Needham: Allyn & Bacon.
>
>Rudner, L. E. (2000). Assessing Student Learning. Newark: Delaware Education Research and Development Center. (Originally from ERIC/AE 12/93)


