[NIFL-ASSESSMENT:156] RE: norm vs criterion

From: John Sabatini (sabatini@literacy.upenn.edu)
Date: Thu Jul 18 2002 - 09:29:23 EDT


         Reply to:   RE: [NIFL-ASSESSMENT:155] RE: norm vs criterion
Ok, these questions are getting tougher every round.  I think I can give this one a shot as well.  
Both norm- and criterion-referenced tests are essentially developed the same way.  Here's how.

Tests are typically developed based on a 'construct', which may be a theory of how the knowledge, skills, and abilities are organized in the person's head, but more often is operationalized as a matrix of skills, knowledge, and abilities.  A typical basic skills reading matrix may have skills as one dimension (literal meaning, inferences, main idea, drawing conclusions) and types of print genres as the other (fiction texts, poetry, nonfiction, etc.).  The 'construct' of the CASAS is based on identifying proficiencies in doing real-world tasks such as 'Complete a personal information form' or 'Identify product containers and interpret weight and volume'.
The matrix represents the universe of skills. Test development then consists of sampling from this universe so that most or all cells in the matrix are covered, and writing test items that address each proficiency or skill-knowledge cell one at a time.
The tricky part is that you cannot predict the difficulty of an item strictly from the proficiency or matrix cell.  You might think that literal questions are easier than inferential ones, but that's only true sometimes, and you can manipulate items to make them harder or easier.  Same for proficiencies: pick a tricky or complex info form and it's harder than a user-friendly form.
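
To make the matrix-and-sampling step concrete, here is a minimal sketch in Python. The skill and genre labels come from the reading example above; the item pool, the item ids, and the helper names (build_blueprint, draft_form) are made up for illustration and are not how any particular publisher does it.

```python
import random
from itertools import product

# Dimensions of the reading matrix named in the message above.
SKILLS = ["literal meaning", "inferences", "main idea", "drawing conclusions"]
GENRES = ["fiction", "poetry", "nonfiction"]

def build_blueprint(skills, genres):
    """Return every (skill, genre) cell in the matrix."""
    return list(product(skills, genres))

def draft_form(item_pool, cells, items_per_cell=1, seed=0):
    """Pick items_per_cell items for each cell; report cells the pool cannot cover."""
    rng = random.Random(seed)
    form, uncovered = [], []
    for cell in cells:
        candidates = [item for item in item_pool if item["cell"] == cell]
        if len(candidates) < items_per_cell:
            uncovered.append(cell)
            continue
        form.extend(rng.sample(candidates, items_per_cell))
    return form, uncovered

cells = build_blueprint(SKILLS, GENRES)

# Hypothetical item pool: two items per cell, but none written yet
# for the (inferences, poetry) cell.
pool = [
    {"id": f"{skill[:3]}-{genre[:3]}-{n}", "cell": (skill, genre)}
    for (skill, genre) in cells
    for n in (1, 2)
    if (skill, genre) != ("inferences", "poetry")
]

form, uncovered = draft_form(pool, cells)
print(len(cells), "cells;", len(form), "items drafted; uncovered:", uncovered)
```

Note that nothing in the sketch says how hard each item is. As described above, the cell an item belongs to does not determine its difficulty, so difficulty has to be handled separately.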

So you mix items of varying difficulty, weighted toward items that most people in the middle ability ranges will find challenging but not too easy or too hard.  If all the items were too easy or too hard, then everybody gets the same score and nobody buys the test.  Then you put in a few easy items and a few hard items for those of high or low ability (you have some knowledge of who might take your tests).  The consequence is a distribution that can 'discriminate' among people in the middle, though less so on either end (where there are fewer items).  Of course, there are fewer people in the 'tails', so you need fewer items there to get some spread or variance or different scores between individuals.
In principle, a criterion-referenced test wouldn't care whether everybody got all the items right for a proficiency.  If a proficiency was easy, so be it, then everybody gets it right. Just put in as many items as you need to be sure the person really has the skills or knowledge you care about.  In practice, no one buys a test in which you can't tell the difference between individuals, so developers make sure items come in a range of difficulties anyway.
As a consequence, the criterion- vs. norm-referenced distinction in the tests we are talking about comes after test development and has more to do with interpretation.  Scoring high on a criterion-referenced test suggests you have the proficiency, i.e., reached criterion, and therefore can 'complete a personal information form'.  This is most assuredly an overstatement of what the test has told us about any given individual's ability.  What we know is better stated as 'there is a high probability that this individual can complete the average personal information forms of the difficulty we had on this test'.
Norm referencing merely states your relative standing without committing to a claim about what makes a proficient reader or what constitutes proficiency in math.  In most cases, we can make the same kind of statement, that is, a person scoring in the top 20% is 'highly likely to be able to comprehend texts of the average difficulty of the texts we included on this test'.
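
To illustrate that the difference lies in interpretation and not in the test itself, here is a small sketch that reads one raw score both ways. The norm-group scores and the cut score are invented for the example; real tests publish their own norm tables and cut scores.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below
    this score, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

def reached_criterion(score, cut_score):
    """Criterion-referenced reading: did the score reach the cut score?"""
    return score >= cut_score

# Hypothetical data, for illustration only.
norm_group = [12, 15, 18, 20, 21, 23, 24, 26, 28, 30]
CUT_SCORE = 22

score = 24
print("percentile rank:", percentile_rank(score, norm_group))
print("reached criterion:", reached_criterion(score, CUT_SCORE))
```

Same items, same raw score; only the comparison changes (against other people in one case, against a fixed standard in the other).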
Another long-winded explanation.  I hope it's helpful.

Best,

John     
 


jsabatini@ets.org   or  sabatini@literacy.upenn.edu

Ira Yankwitt wrote:
>I'm finding this discussion quite helpful.  Thank you all.
>
>I feel like I have a good sense of the difference between norm-referenced
>and criterion-referenced tests in terms of how they differ in use for
>evaluation purposes.  My question is how they differ in the way they are
>created.  For example, my understanding is that some norm-referenced tests
>are designed (and sampled) to ensure that performance falls on a bell
>curve.  Is this true for all norm-referenced tests?  Is it true for some or
>all criterion-referenced tests?  Can someone shed some light on the test
>creation process?  Thanks.
>
>At 02:13 PM 7/16/02 -0400, John Makay wrote:
>>Thanks for the clarification on norm- and criterion-referenced tests.  My concern, though, is selecting a test for my school to do placement.  I have some questions in this area.  Is the CASAS, which is a criterion-referenced test, appropriate to place students in different levels of a program if the curriculum of the program is not predominantly a CASAS-based curriculum?
>>
>>Also, in evaluating a test to see which test has the best psychometric variables, how important is the population (sampling) factor if the test is a criterion-referenced test such as the CASAS?  We have been considering giving the Diagnostic Assessment of Reading (DAR) to place students in the varying levels of our Pre-GED program, but discovered that it was intended for a K-12 population.  What are the issues in using a test like this for an adult population?
>>
>>Also, what are the strongest variables in assessing a test?  In other words, are some variables not as important as others?  Are some of the test-quality criteria more important than other criteria?  For example, several reviewers in the Buros Mental Measurements Yearbook point out that many test manuals are lax in providing information in the area of content validity; therefore, giving great emphasis to content validity when judging tests may not be the best way to judge them since this information is rarely available.
>>
>>I have found a few sources that identify criteria for test evaluation.
>>
>>According to Popham (2000, pp. 195-196), the following factors should be considered when reviewing a set of comparative data for norm-referenced tests.
>>
>>1.	Sample size.  Is the sample in the norm group large enough to assure a reasonable degree of stability in the database from which educators must draw interpretations?
>>
>>2.	Representativeness.  Is the sample drawn in such a way as to represent the kinds of students for whom interpretations must be made?
>>
>>3.	Recency.  Were the normative data gathered in the last few years or is the information out-of-date because it was collected too long ago?
>>
>>4.	Description of procedures.  Are the procedures associated with the gathering of the normative data sufficiently well described so that those procedures can be properly evaluated?
>>
>>In addition to the criteria above, “key standards should be considered from the Standards for Educational and Psychological Testing established by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement” (Rudner, 2000, v-13).  These seem to have a broader range and appear to cover both norm- and criterion-referenced tests.
>>
>>Assessment Standards for Selection of a Test
>>
>>Test Coverage and Usage - There must be a clear statement of recommended uses and a description of the population for which the test is intended.  The use intended by the test developer must be justified by the publisher on technical grounds.
>>
>>Appropriate Samples for Test Validation and Norming - The samples used for test validation and norming must be of adequate size and representative of the group for which the test is intended in terms of age, experience, and background.
>>
>>Reliability - Test publishers should be able to demonstrate that the test is sufficiently reliable to permit stable estimates of individual ability.
>>
>>Predictive Validity - Evidence of the predictive validity of the test must include a comparison of performance on the test being validated against performance on some outside criteria such as course grades, class rank, other tests, teacher ratings, or other related criteria.
>>
>>Content Validity - Content validity can be evaluated by examining the plan and procedures reportedly used in the construction of the test.
>>
>>Construct Validity - Test publishers are in a position to demonstrate that the test adequately measures a particular construct.
>>
>>Test Administration - All test administration specifications, such as instructions to test takers, time limits, use of reference materials, use of calculators, lighting, equipment, assigning seats, monitoring, room requirements, testing sequence, and time of day, should be fully described.
>>
>>Test Reporting - Test publishers are responsible for fully describing the methods used to report test results, including scaled scores, subtest results and combined test results.
>>
>>Test and Item Bias - Test developers are expected to exhibit a sensitivity to the demographic characteristics of test takers, and steps should be taken during test development, validation, standardization, and documentation to minimize the influence of cultural factors on individual test scores.
>>
>>So, which of the variables above are more important, and for which situations?  Are there any books or references out there to answer this question, or can someone share their experience on this one?  Also, more specifically, what kind of test is appropriate for a placement instrument in an adult basic education program with many levels?  Can criterion-referenced and norm-referenced tests both do the job?  If anyone can give me some insight into how to best evaluate a test for broad use in our school for placement purposes, please, please step forward.
>>
>>John Makay
>>Literacy Instructor
>>Baltimore City Community College
>>
>>REFERENCES
>>
>>Mueller, R. O. and Freitag, P. K. (n.d.). Comprehensive Adult Student Assessment System [Review of the CASAS test]. Mental Measurements Yearbook (13th ed.). Lincoln: University of Nebraska Press.
>>
>>Popham, W. J. (2000). Modern educational measurement: Practical guidelines for the education leader. Needham: Allyn & Bacon.
>>
>>Rudner, L. E. (2000). Assessing Student Learning. Newark: Delaware Education Research and Development Center. (Originally from ERIC ERIC/AE 12/93)
>>
>
>
>Ira Yankwitt
>Director of Adult Literacy Services
>Literacy Assistance Center
>32 Broadway, 10th Floor
>NY, NY 10004
>(212) 803-3356
>



This archive was generated by hypermail 2b30 : Fri Jan 17 2003 - 14:46:25 EST