Computerized Adaptive Testing: Effective Measurement for All Students
by Linda Clark, Ph.D.
Most educators today would agree that the mission of our profession has become more complex and much more challenging than ever before. Changes in district demographics, societal pressures that require more personalized student attention, ever more stringent governmental demands, and constant budgetary concerns all have an impact on our daily routines. As director of student achievement for the Meridian School District in Idaho, I've seen firsthand how these and other issues, while controversial at times and always challenging, have forced all of us to find new and creative ways to reach what we consider the ultimate goal of education: fostering continuous growth in every student to prepare him or her for the future.
Community Support
The Meridian School District experiences steady growth across the full spectrum of its students - from highly gifted children to kids with special needs. While we benefit from unusually strong community support and a nationally recognized reputation for high student achievement, a large part of our success is due to the quality and amount of data that guides us.
One reason we're successful is that our testing methods assure more effective teaching, more substantive learning, and better-prepared students. Our initial search in this area, which began seven years ago, led us to partner with the Portland, Ore.-based Northwest Evaluation Association (NWEA, www.nwea.org) to initiate an achievement-level test - first in paper-and-pencil form and later via computer.
Uniquely adaptive, this computerized test automatically presents each student with different items based on ability level and prior responses. When the student answers a question correctly, subsequent questions become more difficult, while incorrect answers lead to easier questions. The tests help eliminate student frustration and boredom, and offer results that provide a solid foundation of quality data delivered in days, not months.
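The adaptive behavior described above can be illustrated with a minimal sketch. It assumes a simple "staircase" rule on a hypothetical 1-10 difficulty scale; operational adaptive tests generally rely on more elaborate statistical models, and nothing here is taken from NWEA's actual software. The function names, the item bank layout, and the simulated student are all illustrative assumptions.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Move one step harder after a correct answer, one step easier otherwise."""
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_adaptive_session(item_bank, answer_fn, start=5, num_items=6):
    """Administer up to num_items questions, adapting after each response.

    item_bank maps a difficulty level to a list of unused items at that level.
    answer_fn(item, difficulty) returns True if the student answers correctly.
    Returns a list of (difficulty, item, was_correct) tuples in order.
    """
    difficulty = start
    history = []
    for _ in range(num_items):
        pool = item_bank.get(difficulty, [])
        if not pool:
            break  # no unused items left at this level
        item = pool.pop(0)
        was_correct = answer_fn(item, difficulty)
        history.append((difficulty, item, was_correct))
        difficulty = next_difficulty(difficulty, was_correct)
    return history


# Hypothetical item bank and a simulated student who can handle levels 1-7.
bank = {level: [f"item-{level}-{i}" for i in range(10)] for level in range(1, 11)}
session = run_adaptive_session(bank, lambda item, level: level <= 7)
print([level for level, _, _ in session])
```

In this simulation the difficulty climbs until the student starts missing items and then hovers around the level the student can just manage, which is the intuition behind presenting each student with different items.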
Unlike traditional standardized tests that measure a student's status compared to others, computerized adaptive tests (CATs) enable us to track the growth of each student in specific subjects over time. This allows us to see and foster ongoing individualized improvement. Besides delivering years of strong, continuous growth, the testing system has won unanimous support from teachers, administrators, parents and even students. The test was so successful in our district and many others statewide that Idaho contracted with NWEA to develop our state test, the Idaho Standards Achievement Test (ISAT). This test is a blended solution of a CAT and a fixed-form test designed to meet No Child Left Behind mandates.
Are standardized tests fair and helpful evaluation tools?
Not really. Standardized tests are tests on which all students answer the same questions, usually in multiple-choice format, and each question has only one correct answer. They reward the ability to quickly answer superficial questions that do not require real thought. They do not measure the ability to think or create in any field. Their use encourages a narrowed curriculum, outdated methods of instruction, and harmful practices such as retention in grade and tracking. They also assume all test-takers have been exposed to a white, middle-class background. (See "How Standardized Testing Damages Education," a FairTest fact sheet.)
Are standardized tests objective?
The only objective part of most standardized tests is the scoring, when it is done by machine. What items to include on the test, the wording and content of the items, the determination of the "correct" answer, choice of test, how the test is administered, and the uses of the results are all decisions made by subjective human beings.
Are test scores "reliable"?
A test is completely reliable if you would get exactly the same results the second time you administered it. All existing tests have "measurement error." This means an individual's score may vary from day to day due to testing conditions or the test-taker's mental or emotional state. As a result, many individuals' scores are frequently wrong. Test scores of young children and scores on sub-sections of tests are much less reliable than test scores of adults or scores on whole tests.
Do test scores reflect real differences among people?
Not necessarily. To construct a norm-referenced test (a test on which half the test-takers score above average, the other half below), test makers must make small differences among people appear large. Because item content differs from one test to another, even tests that claim to measure the same thing often produce very different results. Because of measurement error, two people with very different scores on one test administration might get the same scores on a second administration. On the SAT, for example, the test-makers admit that two students' scores must differ by at least 144 points (out of 1600) before they are willing to say the students' measured abilities really differ.
Don't test-makers remove bias from tests?
Most test-makers review items for obvious biases, such as offensive words. But this is inadequate, since many forms of bias are not superficial. Some test-makers also use statistical bias-reduction techniques. However, these techniques cannot detect underlying bias in the test's form or content. As a result, biased cultural assumptions built into the test as a whole are not exposed or removed by test-makers.
Do IQ tests measure intelligence?
IQ tests assume that intelligence is one thing that can be easily measured and put on a scale, rather than a variety of abilities. They also assume intelligence is fixed and permanent. However, psychologists cannot agree whether there is one thing that can be called intelligence, or whether it is fixed, let alone meaningfully measure "it." Studies have shown that IQ scores can be changed by training, nutrition, or simply by having more friendly people administer the test. In reality, IQ tests are nothing more than a type of achievement test which primarily measures knowledge of standard English and exposure to the cultural experiences of middle class whites.
Do tests reflect what we know about how students learn?
No. Standardized tests are based on behaviorist psychological theories from the nineteenth century. While our understanding of the brain and how people learn and think has progressed enormously, tests have remained the same. Behaviorism assumed that knowledge could be broken into separate bits and that people learned by passively absorbing these bits. Today, cognitive and developmental psychologists understand that knowledge is not made up of separable bits and that people (including children) learn by connecting what they already know with what they are trying to learn. If they cannot actively make meaning out of what they are doing, they do not learn or remember. But most standardized tests do not incorporate the modern theories and are still based on recall of isolated facts and narrow skills.
Do multiple-choice tests measure important student achievement?
Multiple-choice tests are a very poor yardstick of student performance. They do not measure the ability to write, to use math, to make meaning from text when reading, to understand scientific methods or reasoning, or to grasp social science concepts. Nor do these tests adequately measure thinking skills or assess what people can do on real-world tasks.
Are test scores helpful to teachers?
Standardized, multiple-choice tests were not originally designed to provide help to teachers. Classroom surveys show teachers do not find scores from standardized tests very helpful, so they rarely use them. The tests do not provide information that can help a teacher understand what to do next in working with a student because they do not indicate how the student learns or thinks. Good evaluation would provide helpful information to teachers.
Are readiness or screening tests helpful?
Readiness tests, used to determine if a child is ready for school, are very inaccurate and unsound. They encourage overly academic, developmentally inappropriate primary schooling. Screening tests for disabilities are often not adequately validated; they also promote a view of children as having deficits to be corrected, rather than having individual differences and strengths on which to build.
Are there better ways to evaluate student achievement or ability?
Yes. Good teacher observation, documentation of student work, and performance-based assessment, all of which involve the direct evaluation of student effort on real learning tasks, provide useful material for teachers, parents, the community and the government.
A C-Test is an integrative
written test of general language proficiency based on the concept of reduced
redundancy. A C-Test consists of five to six short authentic texts, each
complete as a sense unit in itself. In these texts the first sentence is left
standing. Then the 'rule of two' is applied: beginning at word two in sentence
two the second half of every second word is deleted. Numbers and proper names
are usually left undamaged, but otherwise the deletion is entirely mechanical.
The process is continued until the required number of blanks has been produced
(in the canonical C-Test either 20 or 25). Then the text is allowed to run on
to its natural conclusion. The instructions for the examinees say something
like 'In this test parts of some of the words have been damaged. Please replace
the missing parts.' Texts are arranged in order of difficulty with the easiest
text first.
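Because the deletion is mechanical, the 'rule of two' can be sketched in a few lines of code. The following is a rough illustration under stated assumptions, not a reference implementation: sentences are split naively on end punctuation, a capitalised word that is not sentence-initial is treated as a proper name, one-letter words and words containing digits, hyphens or apostrophes are left undamaged, and a word with an odd number of letters loses its larger half. The function name `make_c_test` and the sample text are hypothetical.

```python
import re


def make_c_test(text, blanks_wanted=20):
    """Apply the 'rule of two' to one text and return (damaged_text, answers)."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    out = [sentences[0]]          # the first sentence is left standing
    answers = []
    position = 0                  # running word count from sentence two onward
    for sentence in sentences[1:]:
        damaged = []
        for index, word in enumerate(sentence.split()):
            position += 1
            core = word.rstrip(".,;:!?\"')")   # word without trailing punctuation
            tail = word[len(core):]
            # Heuristic: a capitalised, non-initial word counts as a proper name.
            is_name = index > 0 and core[:1].isupper()
            eligible = (
                position % 2 == 0              # every second word, from word two
                and len(answers) < blanks_wanted
                and len(core) > 1
                and core.isalpha()             # leaves numbers undamaged
                and not is_name
            )
            if eligible:
                keep = len(core) // 2          # odd lengths lose the larger half
                answers.append(core[keep:])
                damaged.append(core[:keep] + "_" * (len(core) - keep) + tail)
            else:
                damaged.append(word)
        out.append(" ".join(damaged))
    # Once blanks_wanted is reached, the remaining text runs on unchanged.
    return " ".join(out), answers


sample = "The cat sat. It was a very sunny day in Paris. The dog slept soundly."
damaged_text, answer_key = make_c_test(sample, blanks_wanted=4)
print(damaged_text)
print(answer_key)
```

A full C-Test would apply this to each of the five or six texts and then order them from easiest to hardest. That ordering is a judgment about the texts themselves and is not something this sketch attempts.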
Placement exams determine your skill and knowledge of English, reading, and mathematics. Their results will not affect your admission to NYIT, so don't worry about how you'll do on them. You might even "place out" of some introductory-level courses, or you may require additional assistance in certain academic areas. Here are the answers to some frequently asked questions about placement tests.
When are placement exams offered? In general, placement exams are offered during the summer, just before classes begin, and during the first week of each semester. When you are accepted to NYIT, you will receive information regarding specific dates for tests.
What should I bring to the test? Be sure to bring a photo ID (driver’s license, passport, or student ID card). All placement exams are computer-based. You can bring a dictionary to the English exam. A calculator is provided for you at the mathematics exam. Scrap paper is provided, but you should bring a pen and pencil to work out math problems.
How do I study? Placement exams are not pass-fail tests, and you do not need to study for them since they are designed to measure your current knowledge and abilities. Sample questions are available at the testing company's site.
How long is the test? The NYIT COMPASS placement test is computer-based and not timed. The math portion is divided into five areas: pre-algebra, algebra, college algebra, geometry, and trigonometry. Your answers throughout the test will determine how many questions you will receive and the areas in which you will be tested.
Even though the test is not timed, please allow at least two hours to complete it. The exam will stop on its own once it has determined the appropriate placement for you based on your answers. You must complete the test to receive a score.
An achievement test is a test of developed skill or
knowledge. The most common type of achievement test is a standardized test
developed to measure skills and knowledge learned in a given grade level, usually
through planned instruction, such as training or classroom instruction. Achievement
tests are often contrasted with tests that measure aptitude, a more general and
stable cognitive trait.
Achievement test scores are often used in an educational
system to determine the level of instruction for which a student is prepared. High
achievement scores usually indicate a mastery of grade-level material, and the
readiness for advanced instruction. Low achievement scores can indicate the
need for remediation or repeating a course grade.
Under No Child Left Behind, achievement tests have taken on
an additional role of assessing proficiency of students. Proficiency is defined
as the amount of grade-appropriate knowledge and skills a student has acquired
up to the point of testing. Better teaching practices are expected to increase
the amount learned in a school year, and therefore to increase achievement
scores, and yield more "proficient" students than before.
When writing achievement test items, writers usually begin
with a list of content standards (either written by content specialists or
based on state-created content standards) which specify exactly what students
are expected to learn in a given school year. The goal of item writers is to
create test items that measure the most important skills and knowledge attained
in a given grade-level. The number and type of test items written is determined
by the grade-level content standards. Content validity is determined by the
representativeness of the items included on the final test.
We now examine the types of assessment from a general point of view.
Keep in mind that these are not the same as the types of tests.
Assessments can be classified in many different ways. The
most important distinctions are: (1) formative and summative; (2) objective and
subjective; (3) referencing (criterion-referenced, norm-referenced, and
ipsative); and (4) informal and formal.
Formative and summative
There are two main types of assessment:
* Summative
assessment - Summative assessment is generally carried out at the end of a
course or project. In an educational setting, summative assessments are
typically used to assign students a course grade.
* Formative assessment
- Formative assessment is generally carried out throughout a course or project.
Formative assessment, also referred to as educative assessment, is used to aid
learning. In an educational setting, formative assessment might take the form of a
teacher, a peer, or the learner providing feedback on a student's work, and would not
necessarily be used for grading purposes.
Summative and formative assessment are referred to in a
learning context as "assessment of learning" and "assessment for
learning" respectively.
A common form of formative assessment is diagnostic
assessment. Diagnostic assessment measures a student's current knowledge and
skills for the purpose of identifying a suitable program of learning. Self-assessment
is a form of diagnostic assessment which involves students assessing themselves.
Forward-looking assessment asks those being assessed to consider themselves in
hypothetical future situations. Assessments can also be done on pieces of
legislation.
Performance-based assessment is similar to formative
assessment, as it focuses on achievement. It is often aligned with the
standards-based education reform and outcomes-based education movement. Though
ideally such assessments differ significantly from a traditional multiple-choice
test, they are most commonly associated with standards-based assessments, which
use free-form responses to standard questions scored by human scorers on a
standards-based scale (meeting, falling below, or exceeding a performance
standard) rather than being ranked on a curve.
A well-defined task is identified and students are asked to
create, produce, or do something, often in settings that involve real-world
application of knowledge and skills. Proficiency is demonstrated by providing
an extended response. Performance formats are further differentiated into
products and performances. The performance may result in a product, such as a
painting, portfolio, paper, or exhibition, or it may consist of a performance, such
as a speech, athletic skill, musical recital, or reading.
Objective and subjective
Assessment (either summative or formative) can be either objective or subjective.
Objective assessment is a form of questioning which has a single correct answer.
Subjective assessment is a form of questioning which may have more than one
correct answer (or more than one way of expressing the correct answer). There
are various types of objective and subjective questions. Objective question
types include true/false answers, multiple choice, multiple-response and
matching questions. Subjective questions include extended-response questions
and essays. Objective assessment is becoming more popular due
to the increased use of online assessment (e-assessment) since this form of
questioning is well-suited to computerisation.
Informal and formal
Assessment can be either formal or informal. Formal
assessment usually involves a written document, such as a test, quiz, or
paper. A formal assessment is given a numerical score or grade based on student
performance, whereas an informal assessment does not contribute to a student's
final grade. An informal assessment usually occurs in a more casual manner and
may include observation, inventories, checklists, rating scales, rubrics, performance
and portfolio assessments, participation, peer and self evaluation, and
discussion.
Internal and external
Internal assessment is set and marked by the school (i.e. teachers).
Students get the mark and feedback regarding the assessment. External
assessment is set by the governing body, and is marked by non-biased personnel.
With external assessment, students receive only a mark. Therefore, they have no
idea how they actually performed (i.e., which parts they answered correctly).