[NIFL-4EFF:1229] RE: Performance assessment validity

From: Judith Lashof (jlashof@rcsu.org)
Date: Mon Oct 23 2000 - 15:05:34 EDT



Question/info for Regie.

The State of Vermont has just begun work on the Performance Quality
Indicators for our Even Start program.  These are really learner outcome
measures.  For adults they must align with the NRS.  Until this year
standardized tests were not necessary in Vermont ABE.  This year they have
begun to use the AIMES and the TABE, because there were no locally developed
authentic measures with any significant validity.

Vermont is unique.  We hope that the Even Start Adult PQI group may
develop/select some assessment measures which will be adopted by the ABE
community as a whole.  In our Even Start program we are piloting using a
writing prompt and assessing levels and progress on a rubric which spells
out entry and completion skills at each level of the NRS.  I think we have
the resources to develop this measure and its implementation sufficiently to
assure reasonable validity.  My question is: are there EFF standards, and
measures of them, that we might be able to consider for our PQI when we meet
next month?  Also, are there authentic or holistic approaches or tools for
reading that could be used for Adult PQI in Even Start or in ABE as a whole?

Judith Lashof
Rutland Region Even Start Coordinator
RCSU/257 S. Main St./Rutland, VT 05701
802-775-4342
jlashof@rcsu.org


-----Original Message-----
From: nifl-4eff@nifl.gov [mailto:nifl-4eff@nifl.gov]On Behalf Of Regie
Stites
Sent: Saturday, October 21, 2000 5:50 PM
To: Multiple recipients of list
Subject: [NIFL-4EFF:1226] Performance assessment validity


Hello again,
I apologize for the scrambling of my last message.  In case you missed
it, the message was continued below my name and contact information.
I'll try to do better this time.

This message is in response to questions that were asked about the pros
and cons of using performance assessment and portfolios for internal
(instructional) and external (accountability) purposes.  I think Amy
Trawick hit the nail on the head when she noted that "(m)ultiple
customers will need to buy in to a new way of thinking about assessment"
to make EFF's reform vision a reality.  That new thinking includes an
understanding of the value of aligning standards and assessments at the
program, state, and national levels -- for reasons that I described in
the last post.  It also includes the goal of making assessment an
integral part of instruction and learning, rather than -- as is too
often the case in using standardized tests -- a separate (and often
painful) event that interrupts learning and instruction and distances
adult learners from their instructors and from their motivation to
learn.

Amy went on to ask my opinion on the factors that affect the utility of
a performance-based assessment system for "learner, teacher, program,
funder, *and* state/federal purposes."  In my view, this is a question
about the validity of performance-based assessment and it is exactly the
right way to frame such a question.  The validity of any assessment
should be judged in terms of the purpose of the assessment.  For
example, I would argue that the methods used to assess certain types of
reading and numeracy skills in the last National Adult Literacy Survey
(NALS) are valid for the purpose of profiling the distribution of
various levels of those reading and numeracy skills in the adult
population of the U.S.  On the other hand, I would argue that the NALS
measures are not valid for the purpose of assessing the overall impact
of the adult language and literacy educational system on literacy levels
among U.S. adults.  The primary reason that the NALS is not valid for
the latter purpose is the narrow range and poor alignment of the skills
it measures relative to what is being taught and learned in the adult
language and literacy educational system.

Mary Hannaman pointed out a similar validity problem (and one that is
closer to home) in her questioning of the appropriateness of using
standardized tests that have little connection to standards that "have
been developed based on the needs or goals of the state."   Bringing the
issue of validity even closer to home was Donna Curry's question about
whether teachers need to be concerned about validity in informal and
day-to-day assessment.  I think teachers should always be concerned
about the validity of any assessment, formal or informal.  I would also
argue that validity is much easier to achieve when assessment is closely
aligned with instructional goals and integrated into instructional
activities.  This is why many assessment specialists see
performance-based assessment as a potentially more valid alternative to
traditional testing.

To understand why alternative assessment systems (performance tasks,
portfolio assessments, and other integrated measures of knowledge and
skills) may be more valid than traditional forms of testing
(multiple-choice, fill-in-the-blank, and other discrete measures of
knowledge and skills) for various purposes you may want to look back at
the three general areas of validity that I described in my "Introductory
remarks" -- construct validity, consequential validity, and face
validity.

In terms of construct validity, the advantage of alternative assessment
is the opportunity to create more direct and more authentic measures of
desired knowledge, skills, and abilities than is typically possible with
traditional testing.  On the other hand, construct validity also
includes concerns about reliability (consistency of scores/ratings over
time and among raters).  Standardized tests are strong on reliability.
Performance-based assessments are scored more subjectively and therefore
reliability must be strengthened by use of well-structured scoring
guidelines and training of teachers to make effective and consistent use
of scoring guidelines.
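
One common way to check consistency among raters is Cohen's kappa, which
compares the agreement two raters actually reach with the agreement
expected by chance.  A minimal sketch in Python follows; the function
name, the rubric levels, and the two teachers' scores are all
hypothetical and are there only to illustrate the idea.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own level frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical level for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric levels (1-4) given by two teachers to eight writing samples.
teacher_1 = [1, 2, 2, 3, 3, 4, 2, 1]
teacher_2 = [1, 2, 3, 3, 3, 4, 2, 2]
kappa = cohen_kappa(teacher_1, teacher_2)
print(f"kappa = {kappa:.2f}")
```

A value near 1 suggests the scoring guidelines are being applied
consistently; a value near 0 suggests agreement is no better than chance
and that the guidelines or the rater training may need work.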

In terms of consequential validity, alternative assessment systems again
have the advantage over traditional testing in many situations because
performance-based assessments, and especially portfolio assessment
systems, give learners more opportunities (in more "real-world"
contexts) to demonstrate desirable knowledge, skills, and abilities.  I
recently heard Sri Ananda make the argument that the relatively high
costs of using performance assessment (because of the training and
process needed to achieve reliability) are justified in cases where only
direct measures of performance will do.  She used the example of the
behind-the-wheel test required to get a driver's license.  A
paper-and-pencil test alone will clearly not suffice to
guide this high-stakes decision.  Even in low-stakes testing situations,
performance assessment (particularly when results are collected and
regularly reviewed in a portfolio) has the advantage of providing more
feedback to learners and instructors that is more directly applicable to
improving learning activities and opportunities than the guidance that
standardized tests can provide.

On the issue of face validity, performance assessment is again a clear
winner.  Scoring criteria used in performance-based assessment are more
easily communicated (and often more meaningful) to learners and teachers
than is the case in traditional forms of assessment.  Within an
alternative assessment system, learning and assessment activities are
combined.  A well-structured performance task should also be a learning
activity.  Good use of a portfolio is one way to capitalize on the
potential for strong face validity in performance assessment.  The
primary purpose of the portfolio should be to aid communication between
the learner and the instructor so that learning goals and progress can
be reviewed and evaluated in an ongoing dialogue.  In the end, the
portfolio can become a richly textured and substantial piece of evidence
of learning achievement.  The challenge is to convince policy makers and
funding agencies that such evidence is as valid (and reliable) as
standardized test results.  Basically, this means changing the ways that
policy makers think about validity.  The face validity of a standardized
test rests mostly on the authority of the experts who design the test
and analyze its results.  This is seen as having advantages for
high-stakes testing because the judgements of experts are viewed as
legitimate (even though the bases for arriving at judgements are not
widely understood).  In my opinion, education is a different sort of
system than law or medicine.  In medicine and to some extent in law, the
public puts its trust in authoritative judgements above popular
understanding.  In education, it is relatively more important to work
toward achieving a balance (and making connections) between expertise
and popular understanding.

Regie

Regie Stites, Ph.D.
Education Researcher
Center for Education and Human Services
SRI International
Menlo Park, CA
e-mail: regie.stites@sri.com
voice: (650) 859-3768


