[NIFL-ASSESSMENT:494] why "valid and reliable"?

From: Eileen Eckert (eileeneckert@hotmail.com)
Date: Sat Apr 10 2004 - 14:03:18 EDT



Hi Robert and David,
You’ve both given definitions, but those don’t get to the underlying 
assumptions. And it looks to me like you’re both assuming that if I really 
understood validity and reliability, then their appropriateness would be 
self-evident. If that’s the case, could you try to set that assumption aside 
for a while if you continue reading? I do know about validity and 
reliability, and I still don’t think they are meaningful or useful criteria 
for judging assessment of adult learning.

Robert, you said: A valid assessment tests what you want it to test: the 
specific skills and knowledge being taught.

Underlying that definition of validity is the assumption that you can 
isolate knowledge and skills and test them apart from the context in which 
they are used, and that doing so will tell you something meaningful about 
what the student has learned. Robert gave the example, “If I teach a class 
in number skills to a group of limited English speakers and then assess 
their skills with a series of word problems that use vocabulary the students 
don't understand, then the assessment is not valid.”

Using that example, in order to have valid test items, you’d have to either 
construct word problems using vocabulary the students know, or test the 
number skills without the words to confound the issue. If you construct word 
problems using vocabulary the students know, then you likely introduce 
problems of reliability: will students in every testing situation share that 
particular vocabulary, so that the test functions the same from group to 
group? To address that, you need standardized teaching: make sure the 
students in every test situation share the same vocabulary by teaching that 
same vocabulary. Or you could test the math skills without the words, but we like 
to use word problems because they address the concern that students don’t 
encounter numbers in isolation; they encounter them in context.

But there’s the rub. They encounter numbers in different contexts, and 
transfer of skills from one to another is one of the most difficult issues 
in teaching and learning, so being able to choose the “right” answer on a 
test doesn’t necessarily mean they can use the skill to do something that 
matters to them personally. And being able to use the skill in a way that 
matters to them personally doesn’t necessarily mean they can 
de-contextualize that skill and re-contextualize it to pick the right answer 
on a test. So what does the test score mean?

In order to get a valid assessment, you have to isolate the knowledge and/or 
skill you are trying to assess and remove other, confounding knowledge and 
skills (or else make sure you have taught all the knowledge and skills being 
tested by standardizing the teaching). This leads to narrowly focused test 
items (or performance tasks). In order to get a reliable assessment, you 
have to make sure the items are not open to different interpretations; they 
have to function the same way with every group of students tested. 
Essentially, you have to separate what’s been learned from the person (or 
people) who learned it, and the more we know about how people learn, 
the less sense this makes. To do valid and reliable assessment, you’re 
always trying to make up for the fact that each person is a unique 
individual.

The criteria of validity and reliability in educational assessment share 
their roots with the criteria of validity and reliability in social sciences 
research; they’re open to the same critiques and you can find those 
critiques in any number of books and articles on grounded theory, 
qualitative research, and naturalistic research. There are possible 
alternatives to this view of assessment, as there are to the positivist view 
of research. There are also all sorts of other issues we haven't talked 
about, like using a single measure to assess learning, as we do with 
mandates to use CASAS, TABE, or other standardized tests, but this message 
is probably too long already. I think that to choose validity and 
reliability as the standard we endorse, we have to look at the assumptions 
on which that standard is based, how well it matches and represents what we 
know about learning, how it works in practice, and alternatives to it.

Eileen



