[NIFL-ASSESSMENT:161] RE: norm vs criterion

From: Dianna Baycich (dbaycich@archon.educ.kent.edu)
Date: Wed Jul 24 2002 - 16:21:11 EDT



So, here's my understanding of what standardized tests tell us: how well a
student performed on the test items but not necessarily how well the student
will perform on a real life task. To me this means that portfolio assessment
makes more sense than standardized testing because you can see how well a
student performs real life tasks. But that doesn't solve the problem of the
best way to place a student if we want to sort them by level. So it seems
there is still an important role for standardized testing to play, yes?

-----Original Message-----
From: John Sabatini [mailto:sabatini@literacy.upenn.edu]
Sent: Thursday, July 18, 2002 9:29 AM
To: Multiple recipients of list
Subject: [NIFL-ASSESSMENT:156] RE: norm vs criterion


         Reply to:   RE: [NIFL-ASSESSMENT:155] RE: norm vs criterion
Ok, these questions are getting tougher every round.  I think I can give
this one a shot as well.
Both norm- and criterion-referenced tests are essentially developed the same
way.  Here's how.

Tests are typically developed based on a 'construct', which may be a theory
of how the knowledge, skills, and abilities are organized in the person's
head, but more often is operationalized as a matrix of skills, knowledge,
and abilities.  A typical basic skills reading matrix may have skills as one
dimension (literal meaning, inferences, main idea, drawing conclusions) and
types of print genres as the other (fiction texts, poetry, nonfiction, etc.).
The 'construct' of the CASAS is based on identifying proficiencies in doing
real world tasks such as 'Complete a personal information form' or
'Identify product containers and interpret weight and volume'.
The matrix represents the universe of skills. Test development then consists
of taking a sample of items from this universe that covers most or all cells
in the matrix, and then writing test items that address each proficiency or
skill-knowledge cell one at a time.
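
A rough sketch of the matrix idea in Python, for anyone who finds code
easier to follow. The skill and genre names come from the example above; the
draft item pool, the `blueprint` and `coverage` helpers, and the one missing
cell are all made up for illustration and do not describe how any real test
is built.

```python
from collections import Counter
from itertools import product

# Dimensions taken from the reading-matrix example in the message.
SKILLS = ["literal meaning", "inferences", "main idea", "drawing conclusions"]
GENRES = ["fiction", "poetry", "nonfiction"]


def blueprint(skills, genres):
    """Return every skill-by-genre cell in the matrix."""
    return list(product(skills, genres))


def coverage(items, cells):
    """Count items per cell and list the cells that have no item yet."""
    counts = Counter(item["cell"] for item in items)
    missing = [cell for cell in cells if counts[cell] == 0]
    return counts, missing


cells = blueprint(SKILLS, GENRES)

# Hypothetical draft pool: one item per cell, except that nobody has
# written an inference item about a poem yet.
items = [
    {"id": i, "cell": cell}
    for i, cell in enumerate(cells)
    if cell != ("inferences", "poetry")
]

counts, missing = coverage(items, cells)
print(len(cells), "cells in the matrix; cells still without an item:", missing)
```

A check like this only says whether each cell has been sampled. It says
nothing about how hard the items in a cell turn out to be.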
The tricky part is that you cannot predict the difficulty of an item strictly
from the proficiency or matrix cell.  You might think that literal questions
are easier than inferential, but that's only true sometimes and you can
manipulate items to make them harder or easier.  Same for proficiencies. Pick
a tricky or complex info form and it's harder than a user-friendly form.

So you just use items with a mix of difficulties, with more items that
most people in the middle ability ranges will find challenging, but not too
easy or too hard. If all the items were too easy or too hard, then everybody
gets the same score and nobody buys the test.  Then you put in a few easy
items and a few hard items for those of high or low ability (you have some
knowledge of who might take your tests). The consequence is a distribution
that can 'discriminate' people in the middle, though less so on either end
(where there are fewer items). Of course, there are fewer people in the
'tails' so you need fewer items there to get some spread or variance or
different scores between individuals.
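
One way to see why a test discriminates best where its items are
concentrated is the small Python sketch below. It assumes a Rasch-style
logistic model, which the message itself does not mention, and the 20 item
difficulties are invented. Under that assumption, the sum of p*(1-p) over
the items is a common measure of how well the test separates people near a
given ability level.

```python
import math


def p_correct(ability, difficulty):
    """Rasch-style probability of a correct answer (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Expected number of items answered correctly."""
    return sum(p_correct(ability, d) for d in difficulties)


def information(ability, difficulties):
    """Sum of p*(1-p): larger values mean the test separates people
    near this ability level more sharply."""
    total = 0.0
    for d in difficulties:
        p = p_correct(ability, d)
        total += p * (1.0 - p)
    return total


# Hypothetical 20-item test on a logit scale:
# 3 easy items, 14 items near the middle, 3 hard items.
difficulties = [-3.0] * 3 + [-0.5, 0.0, 0.5] * 4 + [-0.25, 0.25] + [3.0] * 3

for ability in (-3.0, 0.0, 3.0):
    print(
        f"ability {ability:+.1f}: "
        f"expected score {expected_score(ability, difficulties):5.2f}, "
        f"information {information(ability, difficulties):4.2f}"
    )
```

With this made-up item set the information figure is highest at the middle
ability level and lower at both ends, which matches the point about having
fewer items in the tails.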
In principle, a criterion-referenced test wouldn't care whether everybody got
all the items right for a proficiency.  If a proficiency was easy, so be it,
then everybody gets it right. Just put as many items as you need to be sure
the person really has the skills or knowledge you care about.  In practice,
no one buys a test in which you can't tell the difference between
individuals, so they make sure items come in a range of difficulties anyway.
As a consequence, the criterion- vs. norm-referenced distinction in the tests
we are talking about comes after the test development and has more to do with
interpretation.  Scoring high on a criterion-referenced test suggests you have
the proficiency, i.e., reached criterion, and therefore can 'complete a
personal information form'.  This is most assuredly an overstatement of what
the test has told us about any given individual's ability.  What we know is
better stated as 'there is a high probability that this individual can
complete the average personal information forms of the difficulty we had on
this test'.
Norm referencing merely states your relative standing without committing to
a claim about what makes a proficient reader or what constitutes proficiency
in math.  In most cases, we can make the same kind of statement, that is, a
person scoring in the top 20% is 'highly likely to be able to comprehend
texts of the average difficulty of the texts we included on this test'.
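
To make the "same test, different interpretation" point concrete, here is a
minimal Python sketch. The norm-group scores and the cut score are invented,
and `percentile_rank` and `meets_criterion` are illustrative helpers, not
part of any published scoring procedure. Percentile rank is taken here as
the percent of the norm group scoring strictly below a given score; other
definitions exist.

```python
def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring
    strictly below this score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: did the score reach the cut?"""
    return score >= cut_score


# Hypothetical raw scores from a norm group of ten test takers.
norm_group = [8, 10, 11, 12, 12, 13, 14, 14, 15, 17]
# Hypothetical cut score standing in for "has the proficiency".
cut = 14

for raw in (12, 15):
    print(
        f"raw score {raw}: percentile rank {percentile_rank(raw, norm_group):.0f}, "
        f"meets criterion: {meets_criterion(raw, cut)}"
    )
```

Both readings come from the same raw score. The only things that differ are
what the score is compared with (other people or a cut score) and the claim
made afterward.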
Another longwinded explanation.  I hope it's helpful.

Best,

John



jsabatini@ets.org   or  sabatini@literacy.upenn.edu

Ira Yankwitt wrote:
>I'm finding this discussion quite helpful.  Thank you all.
>
>I feel like I have a good sense of the difference between norm-referenced
>and criterion-referenced tests in terms of how they differ in use for
>evaluation purposes.  My question is how they differ in the way they are
>created.  For example, my understanding is that some norm-referenced tests
>are designed (and sampled) to ensure that performance falls on a bell
>curve.  Is this true for all norm-referenced tests?  Is it true for some or
>all criterion-referenced tests?  Can someone shed some light on the test
>creation process.  Thanks.
>
>At 02:13 PM 7/16/02 -0400, John Makay wrote:
>>Thanks for the clarification on norm and criterion referenced tests.
>>Although, my concern is selecting a test for my school to do placement.  I
>>have some questions in this area.  Is the CASAS, which is a criterion
>>referenced test, appropriate to place students in different levels of a
>>program if the curriculum of the program is not predominantly a
>>CASAS-based curriculum?
>>
>>Also, in evaluating a test to see which test has the best psychometric
>>variables, how important is the population (sampling) factor if the test
>>is a criterion-referenced test such as the CASAS?  We have been
>>considering giving the Diagnostic Assessment of Reading (DAR) to place
>>students in the varying levels of our Pre-GED program, but discovered that
>>it was intended for a K-12 population.  What are the issues in using a
>>test like this for an adult population?
>>
>>Also, what are the strongest variables in assessing a test?  In other
>>words, are some variables not as important as others?  Are some of the
>>test-quality criteria more important than other criteria?  For example,
>>several reviewers in the Buros Mental Measurements Yearbook point out that
>>many test manuals are lax in providing information in the area of content
>>validity; therefore, giving great emphasis to content validity when
>>judging tests may not be the best way to judge them since this information
>>is rarely available.
>>
>>I have found a few sources that identify criteria for test evaluation.
>>
>>According to Popham (2000, p. 195-196), the following factors should be
>>considered when reviewing a set of comparative data for norm-referenced
>>tests.
>>
>>1.	Sample size.  Is the sample in the norm group large enough to assure a
>>reasonable degree of stability in the database from which educators must
>>draw interpretations?
>>
>>2.	Representativeness.  Is the sample drawn in such a way as to represent
>>the kinds of students for whom interpretations must be made?
>>
>>3.	Recency.  Were the normative data gathered in the last few years or is
>>the information out-of-date because it was collected too long ago?
>>
>>4.	Description of procedures.  Are the procedures associated with the
>>gathering of the normative data sufficiently well described so that those
>>procedures can be properly evaluated?
>>
>>In addition to the criteria above, “key standards should be considered
>>from the Standards for Educational and Psychological Testing established
>>by the American Educational Research Association, the American
>>Psychological Association, and the National Council on Measurement in
>>Education” (Rudner, 2000, v-13).  These seem to have a broader range and
>>appear to cover both norm- and criterion-referenced tests.
>>
>>Assessment Standards for Selection of a Test
>>
>>Test Coverage and Usage - There must be a clear statement of recommended
>>uses and a description of the population for which the test is intended.
>>The use intended by the test developer must be justified by the publisher
>>on technical grounds.
>>
>>Appropriate Samples for Test Validation and Norming - The samples used for
>>test validation and norming must be of adequate size and representative of
>>the group for which the test is intended in terms of age, experience, and
>>background.
>>
>>Reliability - Test publishers should be able to demonstrate that the test
>>is sufficiently reliable to permit stable estimates of individual ability.
>>
>>Predictive Validity - Evidence of the predictive validity of the test must
>>include a comparison of performance on the test being validated against
>>performance on some outside criteria such as course grades, class rank,
>>other tests, teacher ratings, or other related criteria.
>>
>>Content Validity - Content validity can be evaluated by examining the
>>plan and procedures reportedly used in the construction of the test.
>>
>>Construct Validity - Test publishers are in a position to demonstrate that
>>the test adequately measures a particular construct.
>>
>>Test Administration - All test administration specifications, such as
>>instructions to test takers, time limits, use of reference materials, use
>>of calculators, lighting, equipment, assigning seats, monitoring, room
>>requirements, testing sequence, and time of day, should be fully
>>described.
>>
>>Test Reporting - Test publishers are responsible for fully describing the
>>methods used to report test results, including scaled scores, subtest
>>results and combined test results.
>>
>>Test and Item Bias - Test developers are expected to exhibit a sensitivity
>>to the demographic characteristics of test takers, and steps should be
>>taken during test development, validation, standardization, and
>>documentation to minimize the influence of cultural factors on individual
>>test scores.
>>
>>So, which variables above are more important, and for which situations?
>>Are there any books or references out there to answer this question, or can
>>someone share their experience on this one?  Also, more specifically, what
>>kind of test is appropriate for a placement instrument in an adult basic
>>education program with many levels?  Can both criterion-referenced and
>>norm-referenced tests do the job?  If anyone can give me some insight
>>into how best to evaluate a test for broad use in our school for placement
>>purposes, please, please step forward.
>>
>>John Makay
>>Literacy Instructor
>>Baltimore City Community College
>>
>>REFERENCES
>>
>>Mueller, R. O. and Freitag, P. K. (n.d.) Comprehensive Adult Student
>>Assessment System [Review of the CASAS test]. Mental Measurements Yearbook
>>(13th ed.). Lincoln:  University of Nebraska Press.
>>
>>Popham, W. J. (2000). Modern educational measurement. Practical
>>guidelines for the education leader. Needham: Allyn & Bacon.
>>
>>Rudner, L. E. (2000). Assessing Student Learning. Newark: Delaware
>>Education Research and Development Center. (Originally from ERIC/AE,
>>12/93)
>>
>
>
>Ira Yankwitt
>Director of Adult Literacy Services
>Literacy Assistance Center
>32 Broadway, 10th Floor
>NY, NY 10004
>(212) 803-3356
>
>


