From: David Rosen <DJRosen@TheWorld.com>
To: Multiple recipients of list <nifl-assessment@literacy.nifl.gov>
Date: Sat, 10 Apr 2004 15:19:42 -0400 (EDT)
Subject: [NIFL-ASSESSMENT:495] Re: why "valid and reliable"?

Eileen and others,

The definition I gave of validity, perhaps oversimplified, was "If an assessment is valid it measures what it says it measures, not something else."

I did not assume anything about your understanding of validity. I see nothing in this definition which requires an "isolat(ion) of knowledge and skills and test(ing) them apart from the context in which they are used." The definition I gave says nothing about standardized testing, or even testing, which is only one kind of assessment. The definition I gave does not preclude direct, "authentic" assessment -- it's often what is needed to reach the gold standard in validity.

I also said the problem is that holding this standard requires more resources than our field -- including programs -- often has. I might add that I am concerned that holding to these assessment standards, validity and reliability, without significantly increasing resources sacrifices instruction time, compromises the standards in practice, and drives teachers nuts. Raising standards, especially assessment standards, has a cost.
Congress has raised the standards but so far has not paid for the cost. It's an unfunded mandate.

David J. Rosen
djrosen@comcast.net

On Saturday, April 10, 2004, at 02:02 PM, Eileen Eckert wrote:

> Hi Robert and David,
> You’ve both given definitions, but those don’t get to the underlying
> assumptions. And it looks to me like you’re both assuming that if I
> really understood validity and reliability, then their appropriateness
> would be self-evident. If that’s the case, could you try to set that
> assumption aside for a while if you continue reading? I do know about
> validity and reliability, and I still don’t think they are meaningful
> or useful criteria for judging assessment of adult learning.
>
> Robert, you said: A valid assessment tests what you want it to test:
> the specific skills and knowledge being taught.
>
> Underlying that definition of validity is the assumption that you can
> isolate knowledge and skills and test them apart from the context in
> which they are used, and that doing so will tell you something
> meaningful about what the student has learned. Robert gave the
> example, “If I teach a class in number skills to a group of limited
> English speakers and then assess their skills with a series of word
> problems that use vocabulary the students don't understand, then the
> assessment is not valid.”
>
> Using that example, in order to have valid test items, you’d have to
> either construct word problems using vocabulary the students know, or
> test the number skills without the words to confound the issue. If you
> construct word problems the students know, then you likely introduce
> problems of reliability -- will students in every testing situation
> share that particular vocabulary so that the test functions the same
> from group to group? Then, to address that, you need to have
> standardized teaching -- make sure the students in every test situation
> share the same vocabulary by teaching that same vocabulary.
> Or you could test the math skills without the words, but we like to
> use word problems because they address the concern that students don’t
> encounter numbers in isolation; they encounter them in context.
>
> But there’s the rub. They encounter numbers in different contexts, and
> transfer of skills from one to another is one of the most difficult
> issues in teaching and learning, so being able to choose the “right”
> answer on a test doesn’t necessarily mean they can use the skill to do
> something that matters to them personally. And being able to use the
> skill in a way that matters to them personally doesn’t necessarily
> mean they can de-contextualize that skill and re-contextualize it to
> pick the right answer on a test. So what does the test score mean?
>
> In order to get a valid assessment, you have to isolate the knowledge
> and/or skill you are trying to assess and remove other, confounding
> knowledge and skills (or else make sure you have taught all the
> knowledge and skills being tested by standardizing the teaching). This
> leads to narrowly focused test items (or performance tasks). In order
> to get a reliable assessment, you have to make sure the items are not
> open to different interpretations; they have to function the same way
> with every group of students tested. Essentially, you have to separate
> what’s been learned from the person (or people) who have learned it,
> and the more we know about how people learn, the less sense this
> makes. To do valid and reliable assessment, you’re always trying to
> make up for the fact that each person is a unique individual.
>
> The criteria of validity and reliability in educational assessment
> share their roots with the criteria of validity and reliability in
> social sciences research; they’re open to the same critiques and you
> can find those critiques in any number of books and articles on
> grounded theory, qualitative research, and naturalistic research.
> There are possible alternatives to this view of assessment, as there
> are to the positivist view of research. There are also all sorts of
> other issues we haven't talked about, like using a single measure to
> assess learning, as we do with mandates to use CASAS, TABE, or other
> standardized tests, but this message is probably too long already. I
> think that to choose validity and reliability as the standard we
> endorse, we have to look at the assumptions on which that standard is
> based, how well it matches and represents what we know about learning,
> how it works in practice, and alternatives to it.
>
> Eileen
This archive was generated by hypermail 2b30 : Thu Dec 23 2004 - 09:46:14 EST