Return-Path: <nifl-assessment@literacy.nifl.gov>
Received: from literacy (localhost [127.0.0.1])
        by literacy.nifl.gov (8.10.2/8.10.2) with SMTP id i3AI3Hm24598;
        Sat, 10 Apr 2004 14:03:18 -0400 (EDT)
Date: Sat, 10 Apr 2004 14:03:18 -0400 (EDT)
Message-Id: <Sea2-F61nHs0vbtqFZf00013308@hotmail.com>
Errors-To: listowner@literacy.nifl.gov
Reply-To: nifl-assessment@literacy.nifl.gov
Originator: nifl-assessment@literacy.nifl.gov
Sender: nifl-assessment@literacy.nifl.gov
Precedence: bulk
From: "Eileen Eckert" <eileeneckert@hotmail.com>
To: Multiple recipients of list <nifl-assessment@literacy.nifl.gov>
Subject: [NIFL-ASSESSMENT:494] why "valid and reliable"?
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: text/plain; format=flowed
Status: O
Content-Length: 4468
Lines: 74

Hi Robert and David,

You’ve both given definitions, but those don’t get to the underlying assumptions. And it looks to me like you’re both assuming that if I really understood validity and reliability, then their appropriateness would be self-evident. If that’s the case, could you try to set that assumption aside for a while if you continue reading? I do know about validity and reliability, and I still don’t think they are meaningful or useful criteria for judging assessment of adult learning.

Robert, you said: “A valid assessment tests what you want it to test: the specific skills and knowledge being taught.”

Underlying that definition of validity is the assumption that you can isolate knowledge and skills and test them apart from the context in which they are used, and that doing so will tell you something meaningful about what the student has learned.
Robert gave the example, “If I teach a class in number skills to a group of limited English speakers and then assess their skills with a series of word problems that use vocabulary the students don't understand, then the assessment is not valid.” Using that example, in order to have valid test items, you’d have to either construct word problems using vocabulary the students know, or test the number skills without the words to confound the issue.

If you construct word problems using vocabulary the students know, then you likely introduce problems of reliability: will students in every testing situation share that particular vocabulary, so that the test functions the same from group to group? Then, to address that, you need to have standardized teaching: make sure the students in every test situation share the same vocabulary by teaching that same vocabulary.

Or you could test the math skills without the words, but we like to use word problems because they address the concern that students don’t encounter numbers in isolation; they encounter them in context. But there’s the rub. They encounter numbers in different contexts, and transfer of skills from one context to another is one of the most difficult issues in teaching and learning, so being able to choose the “right” answer on a test doesn’t necessarily mean they can use the skill to do something that matters to them personally. And being able to use the skill in a way that matters to them personally doesn’t necessarily mean they can de-contextualize that skill and re-contextualize it to pick the right answer on a test. So what does the test score mean?

In order to get a valid assessment, you have to isolate the knowledge and/or skill you are trying to assess and remove other, confounding knowledge and skills (or else make sure you have taught all the knowledge and skills being tested by standardizing the teaching). This leads to narrowly focused test items (or performance tasks).
In order to get a reliable assessment, you have to make sure the items are not open to different interpretations; they have to function the same way with every group of students tested. Essentially, you have to separate what’s been learned from the person (or people) who have learned it, and the more we know about how people learn, the less sense this makes. To do valid and reliable assessment, you’re always trying to make up for the fact that each person is a unique individual.

The criteria of validity and reliability in educational assessment share their roots with the criteria of validity and reliability in social sciences research; they’re open to the same critiques and you can find those critiques in any number of books and articles on grounded theory, qualitative research, and naturalistic research. There are possible alternatives to this view of assessment, as there are to the positivist view of research.

There are also all sorts of other issues we haven't talked about, like using a single measure to assess learning, as we do with mandates to use CASAS, TABE, or other standardized tests, but this message is probably too long already.

I think that to choose validity and reliability as the standard we endorse, we have to look at the assumptions on which that standard is based, how well it matches and represents what we know about learning, how it works in practice, and alternatives to it.

Eileen