Return-Path: <nifl-assessment@literacy.nifl.gov>
Received: from literacy (localhost [127.0.0.1]) by literacy.nifl.gov (8.10.2/8.10.2) with SMTP id i47Netm18508; Fri, 7 May 2004 19:40:56 -0400 (EDT)
Date: Fri, 7 May 2004 19:40:56 -0400 (EDT)
Message-Id: <SEA2-F33YmKgWAFZMUy00012651@hotmail.com>
Errors-To: listowner@literacy.nifl.gov
Reply-To: nifl-assessment@literacy.nifl.gov
Originator: nifl-assessment@literacy.nifl.gov
Sender: nifl-assessment@literacy.nifl.gov
Precedence: bulk
From: "Eileen Eckert" <eileeneckert@hotmail.com>
To: Multiple recipients of list <nifl-assessment@literacy.nifl.gov>
Subject: [NIFL-ASSESSMENT:536] RE: To Standardize or Not To Standardize
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: text/plain; format=flowed
Status: O
Content-Length: 4629
Lines: 102

Hi Marie and others,

I'd like to pick up on one of the points Marie summarized: "You can standardize any type of assessment." If I seem to contradict things I've said before, then consider it a result of my own learning from this discussion.

I don't know if you can standardize any type of assessment, and I'm not sure I want to think about that question long enough to answer it, but one point I'd like to consider is that if you're going to standardize performance assessment, it's crucial to choose the "right" things to standardize to get the quality of results you need. I'm substituting the word quality because I think that's what V & R (or trustworthiness in the framework I propose) is supposed to ensure.

Standardizing conditions and tasks increases the likelihood that the results you get can be attributed to the instruction and learning, and can be compared from person to person, or from class to class, or program to program, depending on the extent of the assessment.
If conditions are standardized, you're trying to hold constant as much as possible so that the student performance on the task can be attributed to what they've learned instead of to differences in conditions (e.g., people given more time, or a quieter environment, do better than those with less time or a noisier environment).

Standardizing the assessment criteria (e.g., performance indicators or scoring rubric) increases the likelihood that the results mean the same thing from person to person, class to class, etc. You could still have different raters using the criteria differently, or one rater using the criteria differently as s/he scores more student work, gets tired, etc., but that's why you re-norm the scoring periodically.

I've avoided the words so far, but basically I think standardizing conditions mainly addresses reliability, while standardizing the criteria mainly addresses validity. Correct me if I'm wrong (if you don't use the terms and concepts all the time, it gets fuzzy fast).

If you decide that certain characteristics of performance are important, then those become the criteria against which you can judge lots of different examples of work. For example, if my faculty group thinks it is important that students write with an awareness of the intended audience, then we can make that a criterion, describe it at different levels, and use it to evaluate performance on emails, postings to the discussion board of the online component of our classes, homework, essays, a copy of the note the student wrote to her child's teacher, etc., etc. We can use real-life writing to assess how well the student meets this criterion, and we don't have to standardize conditions to do so.

I guess the question is, what information do we need to get from the assessments we use?
Eileen

From: "Marie Cora" <mariecora@hotmail.com>
Reply-To: nifl-assessment@nifl.gov
To: Multiple recipients of list <nifl-assessment@literacy.nifl.gov>
Subject: [NIFL-ASSESSMENT:535] To Standardize or Not To Standardize
Date: Thu, 6 May 2004 11:13:55 -0400 (EDT)

Hi everyone,

I want to get us back to this valuable conversation regarding V&R. Briefly, for the past couple of weeks, I note that folks have discussed:

• Some definitions of validity and reliability, and how they are used in some contexts;
• That a few folks feel we need some universally understood terms in order to help us all learn and conduct assessment correctly and similarly;
• That if not understood well, V&R can be a principal problem in terms of test misuse or test invalidation;
• That perhaps there should be some work on alternatives to V&R.

To try and pick up where we left off, I have some statements to post here (maybe true, maybe false) that should generate some further discussion in this area:

4. There are presently no commercial standardized performance assessments appropriate for the ABE/ESOL system.
5. Performance assessment (known as “constructed response”) is always preferable to a multiple choice test format (known as “selected response”).

What do you think about this? Please feel free to comment on my summary of what we’ve been discussing these past couple of weeks, or, of course, any of the 5 statements that I’ve posted for your thoughts. And certainly, if you've got questions on top of all this, we want to hear them!

marie cora
NIFL Assessment List Moderator