[NIFL-ASSESSMENT:536] RE: To Standardize or Not To Standardize

From: Eileen Eckert (eileeneckert@hotmail.com)
Date: Fri May 07 2004 - 19:40:56 EDT



Hi Marie and others,
I'd like to pick up on one of the points Marie summarized:

"You can standardize any type of assessment."

If I seem to contradict things I've said before, then consider it a result 
of my own learning from this discussion.

I don't know if you can standardize any type of assessment, and I'm not sure 
I want to think about that question long enough to answer it, but one point 
I'd like to consider is that if you're going to standardize performance 
assessment, it's crucial to choose the "right" things to standardize to get 
the quality of results you need. I'm substituting the word "quality" because I 
think that's what V & R (or trustworthiness, in the framework I propose) are 
supposed to ensure.

Standardizing conditions and tasks increases the likelihood that the results 
you get can be attributed to the instruction and learning, and can be 
compared from person to person, or from class to class, or program to 
program, depending on the extent of the assessment. If conditions are 
standardized, you're trying to hold constant as much as possible so that the 
student performance on the task can be attributed to what they've learned 
instead of to differences in conditions (e.g., people given more time, or a 
quieter environment, do better than those with less time or a noisier 
environment).

Standardizing the assessment criteria (e.g., performance indicators or 
scoring rubric) increases the likelihood that the results mean the same 
thing from person to person, class to class, etc. You could still have 
different raters using the criteria differently, or one rater using the 
criteria differently as s/he scores more student work, gets tired, etc., but 
that's why you re-norm the scoring periodically.
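
One way to see whether raters are drifting apart is to have two of them score the same set of student work and compute a chance-corrected agreement figure. The sketch below uses Cohen's kappa, which is my choice for illustration and not something named in this discussion; the function name, the 1-4 rubric scale, and the scores themselves are all hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same work."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of work")
    n = len(rater_a)
    # Observed agreement: share of pieces that received the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave one identical score to everything
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) for ten pieces of student writing.
rater_1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_2 = [3, 2, 3, 3, 1, 2, 4, 4, 2, 2]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

With these made-up scores the raters agree on 7 of 10 pieces, and kappa comes out lower than that raw 70% because some agreement would happen by chance. A figure that falls from one scoring session to the next would be one signal that it is time to re-norm.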

I've avoided the words so far, but basically I think standardizing 
conditions mainly addresses reliability, while standardizing the criteria 
mainly addresses validity. Correct me if I'm wrong (if you don't use the 
terms and concepts all the time, it gets fuzzy fast).

If you decide that certain characteristics of performance are important, 
then those become the criteria against which you can judge lots of different 
examples of work. For example, if my faculty group thinks it is important 
that students write with an awareness of the intended audience, then we can 
make that a criterion, describe it at different levels, and use it to 
evaluate performance on emails, postings to the discussion board of the 
online component of our classes, homework, essays, a copy of the note the 
student wrote to her child's teacher, etc., etc. We can use real-life 
writing to assess how well the student meets this criterion, and we don't 
have to standardize conditions to do so.

I guess the question is, what information do we need to get from the 
assessments we use?

Eileen


From: "Marie Cora" <mariecora@hotmail.com>
Reply-To: nifl-assessment@nifl.gov
To: Multiple recipients of list <nifl-assessment@literacy.nifl.gov>
Subject: [NIFL-ASSESSMENT:535] To Standardize or Not To Standardize
Date: Thu, 6 May 2004 11:13:55 -0400 (EDT)

Hi everyone,

I want to get us back to this valuable conversation regarding V&R. Briefly, 
for the past couple of weeks, I note that folks have discussed:

•	Some definitions of validity and reliability and described how they are 
used in some contexts;

•	That a few folks feel we need some universally understood terms in 
order to help us all learn and conduct assessment correctly and similarly;

•	That if not understood well, V&R can be a principal problem in terms of 
test misuse or test invalidation;

•	That perhaps there should be some work on alternatives to V&R

To try and pick up where we left off, I have some statements to post here 
(maybe true, maybe false) that should generate some further discussion in 
this area:



4.	There are presently no commercial standardized performance assessments 
appropriate for the ABE/ESOL system.

5.	Performance assessment (known as “constructed response”) is always 
preferable to a multiple choice test format (known as “selected response”).

What do you think about this?  Please feel free to comment on my summary of 
what we’ve been discussing these past couple of weeks, or, of course, any of the 
5 statements that I’ve posted for your thoughts.  And certainly, if you've 
got questions on top of all this, we want to hear them!

marie cora
NIFL Assessment List Moderator




This archive was generated by hypermail 2b30 : Thu Dec 23 2004 - 09:46:15 EST