Return-Path: <nifl-assessment@literacy.nifl.gov>
Received: from literacy (localhost [127.0.0.1]) by literacy.nifl.gov (8.10.2/8.10.2) with SMTP id g9MKf1X03346; Tue, 22 Oct 2002 16:41:01 -0400 (EDT)
Date: Tue, 22 Oct 2002 16:41:01 -0400 (EDT)
Message-Id: <3DB5B89A.17D5DB1C@doe.state.vt.us>
Errors-To: listowner@literacy.nifl.gov
Reply-To: nifl-assessment@literacy.nifl.gov
Originator: nifl-assessment@literacy.nifl.gov
Sender: nifl-assessment@literacy.nifl.gov
Precedence: bulk
From: Sandra Robinson <srobinson@doe.state.vt.us>
To: Multiple recipients of list <nifl-assessment@literacy.nifl.gov>
Subject: [NIFL-ASSESSMENT:221] RE: Level 3 SPL
Kevin,
Thanks so much for taking the time to respond. You did just what you hoped. You're not a gripe and do offer understandable constructive feedback to the test makers. Even I understood the points you make. Thanks, again. Sandra
Kevin O'Connor wrote:
> Sandra, I based this on what I learned from the Standards for Educational and Psychological Testing. It is the Rosetta Stone for psycholinguistics. Sorry it's so long, but I really wanted to make this more than another gripe piece without turning it into an impenetrable mass of "assessment-speak".
>
> Questioning validity: Realigning the BEST test with the NRS descriptors
>
> The BEST Oral Test was developed in the 1980s through the Office of Refugee Resettlement and the Department of Health and Human Services to help measure the life skills of Asian immigrants who were coming to the U.S. Since then, it has been adopted as a placement tool by many programs across the country. In recent years, several states have mandated it through their Departments of Education as a measure of educational gains. The BEST is a venerable and respected tool, but I believe that its use has grown beyond the original framework. Today it is being used to test linguistic domains different from those it was designed for, and with a population far beyond the designers' expectations. I understand that there is a revision being developed by the Center for Applied Linguistics, and I hope that they will consider some of the following concerns to help produce a valid version of the test.
>
> I am the Assessment Specialist for a large ESL and ABE program in Massachusetts (over 600 students in ESL). The Mass. Department of Education has mandated all DOE-funded programs to use the BEST Oral Test, Long Form, to test all students within the range of SPL 1-4. As a result, we have tested over 300 people during this pre-test cycle. As Assessment Coordinator, I have been closely involved in the testing, and I wanted to share some of our concerns.
>
> The BEST is similar to a curriculum-based test that we had been evolving and using at our program for over 10 years. As a result, we were very familiar with interview-formatted performance assessments and how to build and use them. We have found that there are some major problems with the BEST test that should be addressed in any revision. I understand that there is a new version of the BEST test being developed, and I wanted to forward some of the biggest concerns to arise from our experience with the BEST.
>
> I understand that it was the Mass. DOE's decision to use this test, not CAL's. However, many states are now using the BEST for this purpose (as stated on the CAL website). I believe that there are many people out there who would benefit from a revision aimed at validly measuring the domain reportedly being measured: oral communication skills. Those of us who have been using this test may be of help in revising it.
>
> Test usage has expanded beyond the intended use and domain
> I have read the history of the BEST test. When the test was developed, its conceptual framework did not incorporate high-stakes testing of oral proficiency based on the scores derived. To begin with, some items do not test the domain that we (Massachusetts ESL programs) are looking for. Specifically, here in Massachusetts (and in other states, I imagine), the BEST is being used to measure oral proficiency in English. However, several of the test items do not seem valid for this purpose (counting money, following maps, pointing to clocks...). Neither the NRS Speaking/Listening Descriptors nor the Massachusetts Curriculum Frameworks Oral Communication Strand incorporates these kinds of life skill constructs in this category. The items are there because the BEST Test was designed to assess not only grammar but also "Topic areas identified as crucial to 'survival level' competency in English..." (BEST Test Manual, p.53).
>
> When developing a test, it is important to clearly state the intended use and design it accordingly. The BEST was not originally built for this purpose, and its validity suffers because of this. The potential uses of a test should shape its conceptual framework. "Validation logically begins with an explicit statement of the proposed interpretation of test scores.... The proposed interpretation refers to the constructs or concepts the test is intended to measure" (Standards for Educational and Psychological Testing, p.9). Therefore, a test designed to place students in relation to the domain represented by the NRS Speaking/Listening Descriptors should not incorporate money, maps and clocks, since they are not included in the construct measured to report educational gain in NRS Speaking/Listening skills.
>
> Test has expanded beyond the normed population
> I have been told that the original BEST series was normed on 987 test-takers. This group encompassed speakers of five Asian language groups along with one Latin-based and one Slavic language. No statistics gave percentages for each language (BEST Test Manual, p.54). This does not seem to be an adequate sample size considering the tens of thousands of students now taking the test. Our program alone has just finished giving it to over 300 students, and by June, we will have tested more than 950 who speak 32 different languages.
>
> I am not second-guessing the test designers. They designed a good tool for their intended purpose. It would have been impossible for them to adequately sample every possible language group. In addition, the test designers never intended this tool for such wide use of such a narrow domain. However, the BEST has been brought out into the wider world for a larger, more high-stakes use. If its use is to be continued (as indicated by the revision), the developers should know of the information out here in the field that could inform test revision.
>
> SUGGESTIONS
> Redesign the order/difficulty of test items
> Everyone I have talked to believes that the order of the test could benefit from revision. The cutoff point ("cut score") of four correct answers before #14 leads to too many low-SPL students needing to complete the whole test. Some of the questions that occur before #14 are just too easy or, as I have said, not relevant to the construct of speaking/listening. The fact that someone can say their name and point to two clock pictures does not indicate that they are ready to move on to a 19-word question about the price of apples. Because of this, many students have been forced to complete the whole test and still end up at an SPL 0. That is 15 minutes of feeling like a failure. This is crushing to a student's morale.
>
> Some have noted:
> "...We must continue the test because there are some easier questions later in the test, and test takers may score points on them. It is unfair to quit too early and deprive the student of a chance to score those extra points and show the full extent of their abilities."
> I agree that students should have the chance to answer all the easy questions, but I DO NOT AGREE that they should be raked over the coals on all the hard questions to get there. Almost every assessment tool I have examined graduates test items based on their difficulty, so students at low levels can find test items that suit their skill level BEFORE they are demoralized by test items that are too difficult.
>
> By the time SPL 1 students get to the relatively easy item #42, many have given up. The items that a low-SPL learner could answer are placed too far into the test. CAL should move the easy questions up and move the clock questions beyond the "cut score" point (currently #14), or better yet, remove them entirely.
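
The reordering suggested above can be sketched in a few lines of Python. This is only an illustration: the item numbers are borrowed from the message, but the facility values (the proportion of test takers who answer an item correctly) are hypothetical placeholders, not BEST data, and `order_easiest_first` is a made-up helper name.

```python
# Hypothetical facility values (proportion answering correctly), keyed by
# item number. These figures are invented for illustration only.
facility_by_item = {1: 0.997, 9: 0.41, 14: 0.35, 42: 0.88}


def order_easiest_first(facilities):
    """Return item numbers sorted from highest to lowest facility."""
    return sorted(facilities, key=lambda item: facilities[item], reverse=True)


# Under these assumed values, item 42 would move ahead of items 9 and 14.
print(order_easiest_first(facility_by_item))
```

With real response data in place of the placeholders, the same sort would show which easy items currently sit late in the test.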
>
> Technical concerns with specific items
> Construct-irrelevant variance:
> "The test scores may be systematically influenced to some extent by components that are not part of the construct.... construct-irrelevant components might include an emotional reaction to the test content"
> Standards for Educational and Psychological Testing p.13
> "An attempt is generally made to avoid words or topics that may offend or otherwise disturb some test takers, if less offensive material is equally useful"
> Ibid, p.39
>
> Please consider these excerpts when selecting test items and pictures. Many students react badly to the picture of the child who has been struck by a car. This "emotional reaction" may be affecting their test scores.
>
> Question #1 and the standard error of measurement
> "My name is..." is not a discriminating question. Out of 300 students, only one was not able to answer this question. That is 0.3% of our testing sample. Any question that 99.7% of the students can answer correctly is not providing you with any real data about what students in general can and cannot do. This item does not "discriminate among test takers of different standing on the scale" (ibid, p.39); therefore it should NOT be included in the "cut score" of four correct before #14.
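
The percentages above can be checked with a short Python sketch. The counts (300 students tested, one unable to answer) come from the message; the function name `item_facility` is just an illustrative label for the proportion of test takers who answer an item correctly.

```python
def item_facility(num_correct, num_tested):
    """Proportion of test takers who answered the item correctly."""
    if num_tested <= 0:
        raise ValueError("num_tested must be positive")
    return num_correct / num_tested


# Figures reported in the message: 300 students, 1 could not answer item #1.
tested = 300
missed = 1
facility = item_facility(tested - missed, tested)

# Prints the share who answered correctly and the share who did not.
print(f"answered correctly: {facility:.1%}")
print(f"could not answer:   {missed / tested:.1%}")
```

An item that nearly everyone answers correctly leaves almost no room to separate stronger from weaker test takers, which is the point being made about item #1.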
>
> Non-computer based BEST needed.
> I understand the possibilities that computer-based testing (CBT) allows. I am impressed with the research, theory and mathematical data that go into creating a "computer-adaptive test". These tests gradually increase the difficulty of the test items to best challenge the test taker (again, by graduating difficulty as I have proposed before).
>
> However, computer-based testing poses construct-irrelevant difficulties for an immigrant population. Many of my students have never used a computer before, and this presents serious challenges, difficulty and psychological intimidation. Studies are currently being conducted to determine how much variance this causes.
>
> In addition, not all programs have the facilities to test all their students on a computer. I hope that the BEST revisions will appear in a hard copy form as well, thereby allowing all programs to be able to use a valid test, not just those who can afford the hardware.
>
> As I have said, I understand how this test was developed. I respect the work done through ORR and DHHS, and I appreciate the role that the Center for Applied Linguistics has played in supporting and distributing the BEST. There are many things going for this test, which is why so many states are choosing it. However, it needs to be adapted to better fill the niche it has found in the last 20 years. I hope the revision team at CAL will consider some of our concerns and suggestions and give us a valid, hard copy version of the BEST that will help us measure our students' speaking/listening skills.
>
> Kevin O'Connor
> ESL Teacher and
> Assessment Specialist
> Framingham Adult ESL
> 508-626-4282
> koconnor@framingham.k12.ma.us
>
> -----Original Message-----
> From: Sandra Robinson [mailto:srobinson@doe.state.vt.us]
> Sent: Thursday, October 17, 2002 3:51 PM
> To: Multiple recipients of list
> Subject: [NIFL-ASSESSMENT:216] RE: Level 3 SPL
>
> Kevin,
> I wonder if you could share those suggestions with the list serve. We would certainly welcome the help.
> Sandra Robinson
>
> Kevin O'Connor wrote:
>
> > Hello, Carol. I work with a large Massachusetts ESL program and we have quite a few BEST test revision suggestions, based on our program's concerns about validity. Who could I forward them to?
> >
> > Kevin O'Connor
> > ESL Teacher and
> > Assessment Specialist
> > Framingham Adult ESL
> > 508-626-4282
> > koconnor@framingham.k12.ma.us
> >
> > -----Original Message-----
> > From: Carol Van Duzer [mailto:carol@cal.org]
> > Sent: Thursday, October 17, 2002 2:02 PM
> > To: Multiple recipients of list
> > Subject: [NIFL-ASSESSMENT:214] RE: Level 3 SPL
> >
> > Hi Cheryl,
> >
> > Part of the problem is that the BEST Oral interview assesses oral skills and the REEP writing rubric assesses writing skills. For most learners, oral and writing skills do not develop at the same pace--perhaps one is used (or needed immediately) more than the other or instruction focuses on the most needed language skill. Placement should reflect what is happening instructionally so learners are placed in levels that best meet their needs (and proficiency). It frequently happens that learners have high speaking skills, but lower writing skills.
> >
> > What did you use for exiting learners from your Level 3 before you began using these assessments? I assume that would still be valid. Then what you want to do is increase writing skills so that the learners can advance on the REEP writing rubric to meet the state's NRS requirements. Perhaps a stronger writing component will need to be added to the Level 3 curriculum.
> >
> > Carol
> > Carol H. Van Duzer
> > National Center for ESL Literacy Education
> > Center for Applied Linguistics
> > 202-362-0700
> > carol@cal.org
> >
> > visit our website at www.cal.org/ncle
> >
> > -----Original Message-----
> > From: Cheryl Pyburn [mailto:cpyburn7@yahoo.com]
> > Sent: Wednesday, October 16, 2002 2:52 PM
> > To: Multiple recipients of list
> > Subject: [NIFL-ASSESSMENT:209] Level 3 SPL
> >
> > Hi everyone.
> >
> > I have a question about ESOL levels with regards to
> > the REEP and BEST tests...I'll explain...
> >
> > We're having some trouble at our learning center
> > trying to decide where our ESOL level 3 class should
> > end. We have just started using the REEP, and we're
> > running into the problem of level 3 students who have
> > an SPL 7 (BEST oral), but their REEP score is quite
> > low: 2 or 3. When would this student move on? At what
> > point does a student 'finish'?
> >
> > Right now we have 3 ESOL levels. Level 1 is SPL 0-3;
> > Level 2 is SPL 4-5; Level 3 is SPL 6+; however, with
> > the REEP assessment, it seems that students could end
> > up being in level 3 forever. Any thoughts?
> > Suggestions? What do other learning centers do?
> >
> > Thank you.
> >
> > =====
> > Cheryl Pyburn
> > Operation Bootstrap
> >
This archive was generated by hypermail 2b30 : Fri Jan 17 2003 - 14:46:26 EST