[NIFL-ASSESSMENT:501] RE: why "valid and reliable"?

From: Eileen Eckert (eileeneckert@hotmail.com)
Date: Mon Apr 12 2004 - 13:47:32 EDT


Hi Allan,
I'm structuring your statements and my responses as a kind of dialogue. I 
hope that doing so will make it easier to follow.

Allan: Eileen, I believe that your examples do not negate the usefulness of 
the characteristics validity and reliability, but show the limitations of 
each.

Eileen: It's the limitations that concern me. In order to accommodate or 
overcome the limitations, I think we move in a direction that is 
counterproductive for students. I think this is illustrated in David's 
example of performance assessment in voc ed, in which the circumstances of 
the task are controlled to increase reliability. While his example 
illustrates a "good" assessment, I don't think controlling the environment 
serves any purpose <except> increasing reliability, and to me, that's not a 
compelling reason to do it when there are other quality criteria that could 
be used (read on, please, I will get to that).

Allan: Validity and reliability need to be considered, not as absolutes, but 
in relative terms, or in degrees.

Eileen: Agreed.

Allan: Assessment instruments are not simply valid or invalid, but more or 
less valid in comparison to each other, and in comparison to some ideal of 
those terms or some model assessment.

Eileen: Yes, and they are not only valid or invalid, but they are valid or 
not valid for the use to which they are put. Say, for example, someone comes 
to a literacy program with a specific goal of passing the written portion of 
the test for a driver's license. The student takes a CASAS test upon entry. 
Then the tutor works with the student on the goal of passing the written 
test for a driver's license, and the student passes the test. Then the 
program coordinator administers the "correct" level CASAS test to assess 
progress, and the pre- to post-test scores indicate very little progress. 
The CASAS hasn't changed, it hasn't changed in relation to the TABE or BEST, 
and its validity as an assessment of the "competencies" it measures is the 
same. But it is not a valid assessment of what that person has learned. In 
this case, the written driver's license exam the student passed is the valid 
assessment--it measures what we want to measure, that is, did the student 
meet her or his goal? But that assessment would only be valid for students 
who shared that goal, not for others who want to enter vocational training, 
pass the GED, or read to their kids.

Allan: Still, they are necessary and useful criteria for choosing or 
creating assessment tools.

Eileen: No, in many cases they're not. I think that instead of pursuing 
validity and reliability we should be looking at the criteria for building 
trustworthiness. Trustworthiness encompasses credibility, dependability, 
transferability, and confirmability. In research, it's an alternative to 
the positivist paradigm (from which validity and reliability come). I think 
it can be adapted to assessment, and that to do so would be more worthwhile, 
more meaningful, and more consistent with the aims of adult literacy and 
basic education than pursuing validity and reliability. I will try to make 
that case, but probably not all in one message!

Allan: Surely, you cannot contest the argument that a test of math ability 
which gives low scores to Vietnamese students because they can't read the 
English is not an accurate (i.e., valid) assessment of the math skills of 
those test-takers.  Some level of validity is an important and even 
necessary characteristic for any assessment, but it is not the only 
characteristic and it is not an absolute one.

Eileen: Allan, I'm not even trying to make the argument that the test you 
posit above is valid. When you say "validity is not the only characteristic 
and it is not an absolute one," I think your concern for quality is 
consistent with the idea of credibility and dependability. That is, the 
assessment has "truth value" to the subjects and the assessment reflects 
their math skills, not their language. But whereas a "valid" assessment 
would be one that tries to be free of bias, a credible and dependable one 
might be one in which the students document their math skills, and that 
documentation might take various forms. This would bring us back to 
assessments such as portfolios, which have kind of dropped off the radar 
with the focusing of accountability requirements on standardized tests.

Allan: Many have argued that assessment instruments are only estimates of 
the knowledge and abilities of those they assess.  In assessing human 
endeavors, any assessment is fraught with uncontrollable and nonquantifiable 
conditions and variables.**  But that doesn't mean we don't assess or we 
don't make judgements of our students' abilities.  And it doesn't mean we 
simply reject what is not perfect, but try in many cases to make the 
imperfect better, and to keep the discussion going about what constitutes 
good assessment.

Eileen: Again, I agree, but I think that "better" in terms of validity and 
reliability is not "better" in terms of education and learning. As your note 
at the end illustrates, validity and reliability are better suited to 
scientific experiments in which the variables can be controlled to a much 
greater extent than people and their learning environments can (or should 
be). "Better" in terms of trustworthiness <is> consistent with improving 
learning and education. The "uncontrollable and nonquantifiable conditions 
and variables" you speak of are a huge concern, but instead of seeing them 
as barriers to validity and reliability, we can see them as part of "complex 
systems"* and make our assessments work with them, instead of trying to 
control for them.

I think I'll stop there. There's much more to your message, and to this 
discussion, but this is getting too long.

Eileen

*A good article on complex systems is Davis, B. & Sumara, D. (2001). 
Learning communities: Understanding the workplace as a complex system. In 
Imel, S. (Series Ed.) & Fenwick, T. (Vol. Ed.), Sociocultural perspectives 
on learning through work: No. 92. New Directions for Adult and Continuing 
Education (pp. 85-95). San Francisco: Jossey-Bass.



