Analysis
Using the released items from the Spring 1999 administration of the AIMS
mathematics portion, High School Form A (Released January 26, 2001, nearly two
years later), I have analyzed each item for mathematical accuracy, potential
for multiple interpretations (which tends to cause confusion in children in
a high-stakes situation, unrelated to their degree of understanding of the
content), and realism in terms of any pragmatic context within which an item
may have been embedded. Of the 38 Core items, fully 17 (45%) had some problem
that could have caused a consistent measurement error, meaning that the score
a student received for that item may not reflect his or her actual level of
understanding of, or skill in, the content. Of
those 17, ten have problems significant enough to warrant their removal from
the assessment. This analysis indicates that over 1/4 of the AIMS mathematics
assessment, if the released items are a representative sample, provides
incorrect data to the state department of education, school districts, parents,
and children anxious to graduate. If the AIMS test were subjected to the
same level of rigor as I apply to my students, it would receive a C-
grade--enough to warrant academic probation at any collegiate institution in
the country, suspension from academic activities at any high school, and
dismissal from any corporation that commissioned an employee to oversee
its quality control.
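As a quick check of the proportions quoted above, here is a minimal Python sketch. The counts come straight from the analysis; the variable names are illustrative only.

```python
# Counts taken from the analysis above; names are illustrative.
core_items = 38   # Core items on the released form
flagged = 17      # items with some problem
serious = 10      # items judged flawed enough to remove

flagged_share = flagged / core_items
serious_share = serious / core_items

print(f"Items with some problem:  {flagged_share:.1%}")
print(f"Items warranting removal: {serious_share:.1%}")
print("More than one quarter removed?", serious_share > 0.25)
```

The second share is what the "over 1/4" claim rests on.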
To the lay eye, it may appear that I am being picky,
criticizing the minutest detail of the exam. The lay eye is perceptive.
I am being picky. Any first semester student of psychometrics (the
statistical study of test design, administration, and analysis) could
tell you that if a test is to provide reliable and valid data, its items
must be designed well, reflect the standards of the content, and
clearly allow students who understand the content to demonstrate that
understanding. All standardized assessments are subjected to rigorous
developmental cycles to perfect the tool and make it useful for the
purposes of the assessment. At this time, the AIMS test has not undergone
enough work to conform to these standards of quality. This analysis points
out the flaws in the released items, critiquing the instrument, and
by extension the attenuated time frame and conceptual framework of its
development. It does not make the case for completely dismantling the
process, nor does it reproach any individual or agency that might be held
accountable for these mistakes. Instead, it suggests that the people of
Arizona have resources at their disposal in a multitude of institutions
that should work together in designing a useful and cost-effective program
of assessment for Arizona's children.
The trouble begins on page 1, the AIMS
Reference Sheet, on which are placed potentially useful formulas and
theorems for the students to use in taking the test. Unfortunately, the
students cannot trust the Reference Sheet, as the formula for the Volume of
a Sphere is incorrect. Instead of (4/3) pi r^2, the stated formula,
the actual formula should be (4/3) pi r^3. Moreover, even if the
student caught the mistake, he or she might not remember the value of pi, since
the Key on page one suggests that students use 3.14 or 22/7 as the value
for p, the Greek symbol for rho, not pi.
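To see how much the exponent matters, the following Python sketch compares the formula as printed on the Reference Sheet with the correct one. The function names and the sample radius are assumptions made for illustration.

```python
import math

def sphere_volume(r):
    """Correct volume of a sphere: (4/3) * pi * r^3."""
    return 4.0 / 3.0 * math.pi * r ** 3

def reference_sheet_formula(r):
    """Formula as printed on the Reference Sheet: (4/3) * pi * r^2."""
    return 4.0 / 3.0 * math.pi * r ** 2

r = 3.0  # sample radius, chosen only for illustration
correct = sphere_volume(r)
printed = reference_sheet_formula(r)

print(f"correct formula: {correct:.2f}")
print(f"printed formula: {printed:.2f}")
# The two differ by a factor of r, so they agree only when r = 1 (or r = 0).
print(f"ratio: {correct / printed:.2f}")
```

A student who trusted the sheet would be off by a factor equal to the radius, and the result would not even carry units of volume.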
It gets worse from there...
The items on the exam with potential problems are listed here: 1, 2,
5, 12, 16, 17, 18, 20, 21, 23,
24, 25, 27, 29, 33, 34, and 36. Those in
plainface type have some problems, but could be salvaged and used if they
undergo some revision. Items in boldface are those that I determine to
be seriously flawed. Below, I cite a few egregious examples. The figures
and text for the items are redrawn and retyped. The intent is to faithfully
reproduce the items as well as the author's computer will allow in a
short time frame for quick turnaround of this paper. Where there are
differences, these are not mathematically relevant, nor do they relate to
the context within which the problems are situated. For the actual text
of the released items, download the PDF version from the State Department
website:
http://www.ade.state.az.us/AIMSReleaseSummary1-26.pdf.
Examples
Problem 16:
Alex is building a ramp for a bike competition. He has two rectangular boards.
One board is 6 meters long and the other is 5 meters long. If the ramp has to
form a right triangle, what should its height be?
A 3 meters
B 4 meters
C 3.3 meters
D 7.8 meters
In this item, none of the answers is correct. The student is expected to
use the Pythagorean Theorem (Hypotenuse^2 = Side1^2 + Side2^2).
So, (6 m)^2 = (5 m)^2 + (EF)^2. To maintain a right triangle, the only
correct answer is (11)^(1/2) meters, one that is cumbersome in real life, and so
requires rounding off to an acceptable level of accuracy. Depending on the
convention for rounding, a reasonable height could be 3 meters (if the
convention is rounding to the nearest meter), 3.3 meters (if the convention
is rounding to the nearest decimeter), 3.32 meters (if the convention is
rounding to the nearest centimeter), and so on.
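The dependence on rounding convention can be sketched in a few lines of Python. This assumes, as the item apparently intends, that the 6 meter board is the hypotenuse and the 5 meter board is the base; the variable names are illustrative.

```python
import math

hypotenuse = 6.0  # meters (assumed to be the ramp surface)
base = 5.0        # meters (assumed to lie along the ground)

# Pythagorean Theorem solved for the unknown side: sqrt(6^2 - 5^2) = sqrt(11)
height = math.sqrt(hypotenuse ** 2 - base ** 2)

conventions = [
    ("nearest meter", 0),
    ("nearest decimeter", 1),
    ("nearest centimeter", 2),
]
rounded = {label: round(height, digits) for label, digits in conventions}

print(f"exact height: {height:.6f} m")
for label, value in rounded.items():
    print(f"{label}: {value} m")
```

Each convention yields a different "correct" answer, and the item never says which one applies.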
The answer marked as correct, 3.3 meters, is actually about 1.7 centimeters
off (about 2/3 inch). Any carpenter worth his or her salt would not make an
error of 2/3 inch given a tape measure that is precise to 1/32 inch.
Moreover, as a male, I cringe at the thought of a bike competition that
requires riders to jump off 3.3 meter heights (between 10 and 11 feet, ouch!).
Or if the rider is to ride down the ramp, a slope of 66% (33.5 degrees) is
steep enough to scare the bejeebers out of me.
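The slope and angle quoted above can be reproduced with a short Python sketch, again assuming the 6 meter board is the hypotenuse and the 5 meter board is the horizontal run. The conversion factor 0.3048 meters per foot is exact; the names are illustrative.

```python
import math

hypotenuse = 6.0  # meters (assumed ramp surface)
run = 5.0         # meters (assumed horizontal base)
rise = math.sqrt(hypotenuse ** 2 - run ** 2)

slope = rise / run                            # rise over run
angle_deg = math.degrees(math.atan(slope))    # incline of the ramp
height_feet = 3.3 / 0.3048                    # keyed answer converted to feet

print(f"slope: {slope:.1%}")
print(f"angle: {angle_deg:.1f} degrees")
print(f"3.3 m in feet: {height_feet:.1f}")
```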
Lastly, a 6 m board? Come on! When was the last time you found a board of
20 feet at Home Depot? In short, the context within which the problem is
embedded shows a lack of the everyday sense for numbers that is required in
the elementary standards for Arizona children.
Problem 18:
Which of the following is a secant of circle P?
A (line) AB
B (line) CE
C (line segment) GP
D (line segment) FD
(I use parenthetical terms in reconstructing this problem because my word
processor has difficulties with the mathematical symbols--j.m.)
In this problem, there are three correct answers. Only answer C is
incorrect. Line AB is a secant of circle P because it is tangent, and all
tangents are defined in elementary calculus courses as degenerate secants
using the epsilon-delta definition of a derivative at a point. Line segment
FD is a secant, as it is the diameter of the circle, and therefore intersects
the circle in two points. Line CE (the "right answer") is a secant to P since
it intersects the circumference at two points.
Which answer should the student choose? The definition of a secant is
"a straight line that intersects a curve in two points." There may be some
argument over whether FD is a secant, as it is a line segment, and
therefore only lies on the secant that contains FD. It may surprise Americans
to realize that the term for straight objects in much of the world is the
equivalent of "line", and a special designation of "infinite" or
"unending" is placed before the word to denote what Euclid termed
"breadthless length." While this kind of argument over terms may be useful
to establish norms for communication among people who speak different
languages, it is unclear whether all high school graduates need to be so well
versed in specific definitions that could be looked up in any mathematics
dictionary. An advanced student would NOT want to choose the obvious answer,
as the cases of AB and FD are much more interesting mathematically than CE.
Unfortunately, the AIMS test is scored so that only one answer can be counted
as correct. What about a student who was unsure, seeing three examples of
secants but being able to choose only one? "Do I remember the definition
correctly?" "What if it is something else?" These kinds of questions in a
high stakes exam throw the marginal student into unnecessary confusion, often
leading to frustration and unnecessary errors merely as a result of taking the
test.
Problem 23:
Aaron used the Pythagorean Theorem to find the height of a tree.
He calculated that the tree was square-root(625) feet tall. Which of the
following should be used to write the height of the tree?
A +- 25 feet
B 25 feet
C - 25 feet
D 25^2 feet
This problem illustrates lack of attention to the context within which the
intended content is situated. Though it is not technically impossible to use
the Pythagorean Theorem to calculate the height of a tree, it is absurdly
impractical. To calculate the height using Pythagoras, one must first have
the distance from the tree, and the length of the hypotenuse of the right
triangle (the length of a wire if it were stretched from the tip top of the
tree to the point where the observer, Aaron, is standing, see below).
Why go to the trouble of climbing the tree, stringing a wire and pacing off
the distance, when a simple use of the tangent ratio can calculate the height
with just the distance to the tree and the angle of elevation of the top? The
tangent of the angle of elevation (tan alpha) is equal to the height of the
tree divided by the distance (h/d). So, the height is equal to the tangent of
the angle multiplied by the distance (h = d tan alpha). This is a common
middle school geometry activity.
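The tangent method can be written as a one-line function. The sketch below is only an illustration: the function name and the sample measurements are hypothetical, and, like the formula above, it ignores the height of the observer's eye above the ground.

```python
import math

def tree_height(distance, elevation_deg):
    """Height from h = d * tan(alpha), with alpha given in degrees."""
    return distance * math.tan(math.radians(elevation_deg))

# Hypothetical measurements: standing 25 feet from the trunk and
# sighting the top of the tree at a 45 degree angle of elevation.
h = tree_height(25.0, 45.0)
print(f"estimated height: {h:.1f} feet")
```

Only two measurements are needed, and neither requires leaving the ground.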
Another reasonable method would be to hold up your thumb in front of your
face and walk to or away from the tree until the tree appears to be the height
of the tip of your thumb to the first knuckle (~ 1 inch). Then pace off the
distance to the tree. The height of the tree is found using similar triangles
where the ratio of the distance from your eye to your thumb : size of your
thumb (here the ~ 1 inch becomes useful) is equivalent to the distance
from your original position to the tree : height of the tree.
Any Scout could tell you this.
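The thumb method is a similar-triangles proportion, sketched here in Python. The eye-to-thumb distance of 24 inches and the paced distance of 600 feet are made-up values for illustration; only the roughly 1 inch thumb length comes from the text.

```python
def height_by_similar_triangles(eye_to_thumb, thumb_size, distance_to_tree):
    """Solve eye_to_thumb : thumb_size = distance_to_tree : height for height.

    eye_to_thumb and thumb_size must share one unit; the result
    is in the same unit as distance_to_tree.
    """
    return distance_to_tree * thumb_size / eye_to_thumb

# Assumed values: arm held so the thumb is 24 inches from the eye,
# thumb tip to first knuckle about 1 inch, 600 feet paced off to the tree.
estimate = height_by_similar_triangles(24.0, 1.0, 600.0)
print(f"estimated height: {estimate:.1f} feet")
```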
What the test designers are looking for is for students to find the
positive square root of 625. Why not just ask, "What is the positive square
root of 625?" Alternatively, if knowledge of the Pythagorean Theorem is
desired, one could ask, "The dimensions of a rectangular parking lot are 25m
by 15m. What is the length of the diagonal?"
This lack of attention to the details of context is indicative of the
generally shoddy engineering of the AIMS items.
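For reference, both of the alternative questions suggested above have unambiguous answers, as this short Python sketch shows. The variable names are illustrative only.

```python
import math

# The radical sign denotes the principal (non-negative) square root,
# so square-root(625) is a single number, not "+- 25".
tree = math.sqrt(625)

# Suggested replacement item: diagonal of a 25 m by 15 m rectangle,
# i.e. sqrt(25^2 + 15^2) by the Pythagorean Theorem.
diagonal = math.hypot(25.0, 15.0)

print(f"square root of 625: {tree}")
print(f"parking lot diagonal: {diagonal:.2f} m")
```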
Problem 29:
The graph depicts a real-world situation. Which of the following situations
could it depict?
A A person dove into the water
B A person jumped from a tree to the grass below
C A plane landed safely
D A plane crashed into the runway
This problem is just awful. First, the problem brazenly states that the
graph depicts a real-world situation. The authors of the test then go on
to provide a graph that doesn't reasonably depict any of the situations
presented as answers.
As any student of physics knows, the relationship between height and time
for a body in freefall is curvilinear (parabolic, actually). This means that
the first two answers (A and B) are both impossible (assuming a continuous
time scale), as a jumping person does not reach terminal velocity in the short
heights people can safely jump from.
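A minimal Python sketch of the freefall relationship makes the point concrete. It assumes the standard idealization h(t) = h0 - (1/2) g t^2 with air resistance ignored; the 5 meter starting height and the time steps are invented for illustration.

```python
G = 9.81  # acceleration of gravity in m/s^2 (standard approximation)

def height_at(t, h0):
    """Height after t seconds of freefall from rest, ignoring air resistance."""
    return h0 - 0.5 * G * t ** 2

h0 = 5.0  # assumed starting height in meters
times = [0.0, 0.25, 0.5, 0.75, 1.0]
heights = [height_at(t, h0) for t in times]

# Distance fallen during each equal time step.
drops = [heights[i] - heights[i + 1] for i in range(len(heights) - 1)]

for t, h in zip(times, heights):
    print(f"t = {t:.2f} s, height = {h:.2f} m")
print("drop per step:", [round(d, 2) for d in drops])
```

Each equal time step covers more distance than the one before, whereas a straight-line graph of height against time would require equal drops in equal times.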
We don't know from the graph what the scale is for either height or time.
Does the graph depict the first millisecond, second, ten seconds, minute? Is
the height astronomical? Infinitesimal? Reasonable? Are the scales equal
interval or are they logarithmic? Are they idealized or do they depict actual
data? Without these bits of information (which are necessary for the
interpretation of any graph that depicts a real-world situation and not a
purely mathematical one), we really cannot tell whether the last two
answers (C and D) are plausible.
Is the Zero point for height the altitude of the runway? If so, D could be
the most reasonable response: Because a plane has an engine, it could,
conceivably, put the engine in reverse to eliminate the acceleration of
gravity or alternatively speed up to a velocity greater than (or equal to)
terminal velocity. Either way, a plane could hit the earth after traveling at a
constant velocity for a period of time. The plane could have then plowed into
the runway, where it hit a layer underneath the ground so elastic it took no
appreciable time for the plane to bounce back up to ground level at
approximately the same rate as it entered it.
C could also be the correct answer if the plane (in this case a small plane
of the kind still thrown, I am told by first hand sources, in classes where
the AIMS test is administered) dips below the Zero point, and then
bounces up off an object to be caught in a net.
The "correct" answer, A, has the following plausible shape for the scenario
proposed (again, forgive my computer's lack of attention to perfect drawings),
and can therefore be eliminated as a reasonable answer to the item:
At any rate, again, the lack of attention to the realism of the context,
and to the ways in which problems may be interpreted, shows that the design
of the test itself (the mathematical content as well as the context within
which the problems are situated) is fundamentally flawed, causing the flawed
items to be inaccurate indicators of student learning and achievement.
So, should AIMS be scrapped?
I would like to state up front that I am not against a statewide assessment
of mathematics achievement. In fact, I advocate the development and
administration of high quality assessments to assist schools in providing the
best quality curriculum and instruction to all of our children--this is what
they deserve. The key here is that the assessment must be designed to
provide detailed information to students, teachers, administration, and the
state (in that order), as to how they are achieving high quality standards,
and especially how they can improve. The current furor over AIMS is due, to a
large extent, to the fact that the results of the test (disregarding any case
that might be made about the dubious validity of the test itself) do nothing
for anyone. So we found out that 48% of the high school students who took
AIMS could sketch a cone (problem 24 in Form A). What does that say about
instruction? What if a student did not sketch a cone successfully? Does that
mean he or she didn't know what a cone was? Does it mean that the teacher
did not provide enough "cone" experiences? Should we now focus our
instruction on coneness? Hold on, 37% of the students tested did not even
fill out the response. We don't know if they could answer the question or not,
just that they chose not to. Hold on again, students who drew a "net" of a
cone (a 2-d map of the figure that could be folded up to form a cone if cut
out) only got 1 point instead of 2 possible points. If my understanding of
development holds true (and it does), the ability to reallot shape is much
more sophisticated, both cognitively and mathematically, than being able to
draw a figure from memory. Why isn't this taken into account?
The point here is that even if we know students do not perform up to our
minimum standards on the AIMS assessment, the information the test provides to
those who hold a stake in public education is virtually useless except to
berate the system for failing again. What would happen if we designed an
assessment system that supports high quality learning for all students, and
constitutes an important piece of a feedback loop for the continual
improvement of the system? What would such an assessment look like?
First,
such an assessment must be embedded in both the quality mathematics that can
be applied across pragmatic situations and those pragmatic situations that
prove to all concerned that the mathematics being taught is, in fact, related
to future life success. The superficial contextualization of problems as
embodied by the current AIMS test does neither.
Second, the assessment must
have a "high ceiling," so that as subsequent cohorts of students take
the exam, the ability to show improvement is built in. Currently, the level
of content of the AIMS test is pretty good. I anticipate that all children
who pass through our high schools should be able to reason with the level of
algebra, geometry, statistics, and discrete mathematics that the designers of
the AIMS test have chosen--in time, as school districts adjust to a more
rigorous mathematics curriculum, coupled with teaching methods with sound
empirical data to back them up. I think the items measuring this level of
content could be made to illustrate more useful and important situations that
all informed citizens should be aware of, but the actual level of content as
is, is NOT too difficult.
Despite this, however, the question of what an appropriate "cut score"
is for such an assessment is problematic. Suppose 60 percent correct is
determined to be an appropriate minimum standard. That means that an 18 year
old student, who has gone through 13 years of public instruction in good
faith, can be denied graduation (and all of those benefits of graduation such
as a decent job, decent housing and self-respect) only by reason of not
passing a sit-down math test. This in spite of 13 years of passing
grades. Whose failure is this? The student's? Sure, he/she didn't meet
minimum standards. I'll buy that. However, the system must also be
held responsible, because the child and his/her parents have kept their
part of the bargain that is entered into when a child first sets foot in
the public schools. Furthermore, who determines the level to which the test
measures potential for contribution to a vital economy? Isn't
this what compulsory education is about, ostensibly? Does a 60 percent
correct student really contribute while a 59 percenter does not? What
recourse do the child and parents have, should the student fail?
Third, a tight feedback loop must be built from the results of the AIMS,
back to the teaching and learning of the student. This is, after all, the
only real justification for such an exam, that teaching and learning improve,
continually, consistently. To do this, the design of the test itself must
reflect the development of reasoning within the areas of algebra, geometry,
statistics, and discrete mathematics, such that when a student's responses are
scored, a defensible indicator of his or her level of understanding of
fundamental concepts is produced. The teacher and student could then use this
information to bolster areas where understanding is lacking. Without such a
loop, only broad policy-level changes can be made, changes that may or may not benefit
any individual student (see the past 10 years or so of standards-setting, etc.
to reveal a national lesson in futility). The point here is, if you want to
affect education where the rubber hits the road, you need information
that helps you redesign either the tire or the road, or both. Without the
information at the right level of detail, the public will be kept wondering,
"What can we do to help education?" With the information, the public can
answer the more telling question, "How effective have our reforms been?"
Last, the articulation of our education system must begin in the preschool
years, where the quantitative and spatial bases for subsequent mathematics are
laid down, and continue through university work and corporate training to
ensure that what happens early on really does affect the impact that a worker
has on the economy of Arizona and the nation. Currently, in our state, the
public school system, the teachers' unions, the universities, and the state
government are working at cross purposes. Some individuals have been able
to cross boundaries, but institutionally, we do not support each other under
the banner of "Whatever it takes for our children." No single entity is to
blame for the difficulties we face. However, all are to blame if we fail our
students. If an assessment system is to be developed that provides solid
information about the health of public education, in whatever content is
deemed necessary to sustain the New Economy, the answer lies in using the
distributed expertise residing in the state, to design a coherent and
consistent educational system of which the assessment is a small but integral
part.
Should AIMS be scrapped? If AIMS is defined as the attempt to design a
quality assessment system by which the stakeholders in education, the most
important of which are children, their parents, and teachers, can come to
understand where the system flourishes, where it is flawed, and what steps can
be taken to improve it, then I say, "No." Such a system ensures that
accountability is based on evidence, and that changes in the system are
designed to fill an identified need. If, however, AIMS is considered the
current instrument, of which the released items must be considered a
representative sample (otherwise, why were they released?), then I say, "Yes,
of course," because to continue to use a fundamentally flawed instrument
would continue to waste the time of students, teachers, and administrators,
the effort and attention of the State Department of Education, Board of
Education, and various constituent groups, and the money of the taxpayers of
the State of Arizona. Enough has been wasted already.
Editor's Note: While searching for some additional information,
we discovered a second version of the released AIMS items, which
contains some additional items as well as a few that are slightly different
from those presented in the version this article discusses. In particular,
in the second form, the graph for item 29 is curvilinear, which raises the
question of which version of this item ultimately appeared on the exam taken by
students.