Summary Notes of "Student Ratings of Teaching: The Research
Revisited," by William E. Cashin, (Referred by Nancy Simpson),
IDEA Paper No. 32, Center for Faculty Evaluation & Development,
Kansas State University, September 1995, 9 pages.
Summarized by James T. P. Yao 8/26/99
"There are now [as of 1995] more than 1,500 references dealing
with research on student evaluation of teaching. This paper
will attempt to summarize the conclusions of the major reviews of
the student rating literature from 1971 to the present."
"The ERIC descriptor for student ratings is student
evaluation of teacher performance. I suggest that the term
student ratings is preferable to student evaluation.
Evaluation has a definitive and terminal connotation;
it suggests that we have an answer. Rating implies that
we have data which need to be interpreted.
Viewing student ratings as data rather than as evaluation may also help to put them
in proper perspective. Writers on faculty evaluation are almost
universal in recommending the use of multiple sources of data.
Further, there are important aspects of teaching that students are
not competent to rate."
"[There are] six factors commonly found in student rating forms:
- Course organization and planning.
- Clarity, communication skills.
- Teacher-student interaction, rapport.
- Course difficulty, workload.
- Grading and examinations.
- Student self-rated learning.
[Other factors include] learning/value, enthusiasm, group
interaction, breadth of coverage, assignments.
When interpreting student rating data, we must distinguish among
the various items and their dimensions to ensure that all the appropriate
dimensions are rated. Averaging dissimilar items is not
appropriate. [O]ne or a few global or summary-type items might
provide sufficient student rating data for personnel decisions."
"In the educational measurement literature, reliability
covers consistency, stability, and generalizability of items. For
student rating items, reliability refers most often to consistency
or interrater agreement. Reliability varies depending upon
the number of raters, i.e., the more raters, the more reliable.
As a rule of thumb, I recommend that items with fewer than
ten raters be interpreted with particular caution."
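The relation between the number of raters and reliability can be illustrated with the Spearman-Brown formula, a standard psychometric result that is not named in the summarized paper. The sketch below is only an illustration under that assumption; the function name and the single-rater reliability of 0.2 are hypothetical and are not figures from the paper.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability of the mean rating of n_raters raters.

    Uses the Spearman-Brown formula r_k = k*r / (1 + (k - 1)*r), where r is
    the reliability of a single rater. Assumes raters are interchangeable.
    """
    r = single_rater_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("single_rater_reliability must be in [0, 1]")
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return n_raters * r / (1 + (n_raters - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-rater reliability, chosen only for illustration.
    for k in (5, 10, 20, 40):
        print(k, round(spearman_brown(0.2, k), 2))
```

Under this assumption, predicted reliability rises steadily as raters are added, which is consistent with the advice to treat items with fewer than ten raters cautiously.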
"Stability is concerned with agreement between raters
over time. In general, ratings of the same instructor tend to be
similar over time."
"Generalizability is concerned with how confident we
can be that our data accurately reflect the instructor's general
teaching effectiveness, not just how effective he or she was in
that particular course that term. The instructor, not the
course, is the primary determinant of the student rating items.
If the instructor teaches only one course, consistent
ratings from two different terms may be sufficient. For most instructors,
however, use ratings from a variety of courses, for two or more
courses from every term for at least two years, totaling at least
five courses."
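The rule of thumb for most instructors (two or more courses per term, at least two years, at least five courses in total) can be written as a simple check. This is a minimal sketch of one reading of that rule: the function name, the (year, term, course) record format, and the choice to count each course offering separately are assumptions, and the single-course case mentioned in the quotation is not handled.

```python
from collections import defaultdict


def has_enough_ratings(offerings):
    """Check rated course offerings against the quoted rule of thumb.

    offerings: iterable of (year, term, course) tuples, one per rated course.
    Assumption: only terms that appear in the data are checked, and the same
    course taught in two terms counts as two offerings.
    """
    courses_per_term = defaultdict(set)
    for year, term, course in offerings:
        courses_per_term[(year, term)].add(course)

    years = {year for year, _term in courses_per_term}
    total_offerings = sum(len(c) for c in courses_per_term.values())

    return (
        len(years) >= 2                                          # at least two years
        and all(len(c) >= 2 for c in courses_per_term.values())  # two or more per term
        and total_offerings >= 5                                 # at least five courses
    )


if __name__ == "__main__":
    sample = [
        (1998, "fall", "CE 101"), (1998, "fall", "CE 205"),
        (1999, "spring", "CE 101"), (1999, "spring", "CE 310"),
        (1999, "fall", "CE 205"), (1999, "fall", "CE 310"),
    ]
    print(has_enough_ratings(sample))
```

The sample course names and terms are invented for the example.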
"Validity: to what extent do student rating items
measure some aspect of teaching effectiveness? Unfortunately there
is no agreed-upon definition of effective teaching,
nor any single, all-embracing criterion. The best one can
do is to try various approaches, collecting data that either support
or contest the conclusion that student ratings reflect effective
teaching."
"Approach One: Student Learning
Theoretically, the best criterion of effective teaching is student learning. Other
things being equal, the students of more effective teachers should
learn more. [T]he classes in which the students gave
the instructor higher ratings tended to be the classes where the
students learned more, i.e., scored higher on the external exam.
On the other hand, the correlations are far from perfect, in part
because many of the variables that relate to student learning
will be related to student characteristics (e.g., motivation or
ability), not to instructor characteristics."
"Approach Two: Instructor's Self-Ratings
Researchers have sought a criterion of effective teaching that
would be acceptable to faculty. One possibility is the self-ratings
of the instructor. [S]tudies provide further support
for the validity of the students' ratings."
"Approach Three: The Ratings of Others
Administrators' Ratings: Student ratings correlate
with administrators' ratings.
Colleagues' Ratings -
Some faculty question
whether the students have an appropriate conception of what effective
teaching is.
Students tended to place more weight on the
instructor being interesting, having good speaking skills, and being
available to help; students also focused more on the outcomes of
instruction, e.g., what they learned. Faculty placed relatively
more weight on intellectual challenge, motivating students, setting
high standards, and fostering student self-initiated learning.
Alumni Ratings: Student ratings correlate with alumni
ratings. This belies the conventional wisdom that the students
will come to appreciate our teaching after they get into the real
world as working adults.
Trained Observers: A few studies have used external
observers who were trained. [T]he median reliability for trained
observers was .76. This suggests that peer ratings based on classroom
observation would be reliable if the observers were trained."
"Approach Four: Comparison with Student Comments
Some faculty question the value of student ratings but accept student
written comments to open-ended questions. These studies suggest that,
for personnel decisions, the information from student ratings overlaps
considerably with the information in student comments."
"Approach Five: Possible Sources of Bias
Some writers have suggested that bias be defined as anything not
under the control of the instructor. [B]ias in student ratings
should be restricted to variables NOT related to teaching effectiveness.
By this definition, the correlations between student ratings and
class size, or the students' interest in the course, are not
biases, [because] classes of students who are interested in the subject
matter actually do learn more. I suggested an even narrower
definition when using ratings for personnel decisions or the instructor's
improvement. I suggested restricting bias to variables not a function
of the instructor's teaching effectiveness. Thus, student motivation
or class size might impact teaching effectiveness, but instructors
should not be faulted if they were less effective teaching large
classes of unmotivated students than their colleagues who were teaching
small classes of motivated students."
"Variables Not Requiring Control
- Instructor variables not related to student ratings:
  - age and teaching experience -
  - gender of the instructor -
  - race -
  - personality -
  - research productivity -
- Student variables not related to student ratings:
  - age of the student -
  - gender of the student -
  - level of the student -
  - student's GPA -
  - student's personality -
- Course variables not related to student ratings:
  - class size -
  - time of the day -
- Administrative variables not related to student ratings:
  - time during the term -"
"Variables Possibly Requiring Control
- Instructor variables related to student ratings:
  - faculty rank: regular faculty tend to receive higher ratings
    than graduate teaching assistants. This variable does NOT
    require control because regular faculty as a group tend to be
    more effective teachers than GTAs as a group.
  - expressiveness - Expressiveness tends to enhance learning
    and does NOT require control.
- Student variables related to student ratings:
  - student motivation -
  - expected grades - Three possible hypotheses have been
    proposed for these correlations.
    - validity hypothesis: the students who learn more earn
      higher grades and give higher ratings
    - grading leniency
    - student characteristics -
    To control for the possibility of grade leniency, my recommendation
    is to have peers review the course material, particularly
    exams, computer-scored test results, graded samples of essays,
    projects, etc., and judge whether grades are inflated.
- Course variables related to student ratings:
  - level of the course
  - academic field
  - workload/difficulty - contrary to faculty belief, they
    are correlated positively, i.e., students give higher ratings
    in difficult courses where they have to work hard. Although positive,
    the correlations are not large [less than .29].
- Administrative variables related to student ratings:
  - non-anonymous ratings -
  - instructor present while students complete ratings -
  - purpose of the ratings: some studies have found that if
    the directions say the ratings will be used for personnel decisions,
    the ratings tend to be higher than if they will be used only by
    the instructor for improvement."
"Many faculty will grant the usefulness of student ratings
for personnel decisions, but question their usefulness for improvement,
preferring to rely on students' open-ended comments.
If an institution really intends to use student ratings to improve
teaching, it needs to provide some kind of consultation to the instructors."
"There are probably more studies of student ratings than of
all the other data used to evaluate college teaching combined.
In general, student ratings tend to be statistically reliable, valid,
and relatively free from bias or the need for control; probably
more so than any other data for evaluation. Nevertheless, student
ratings are only one source of data about teaching and must be used
in combination with multiple sources of data if one wishes to make
a judgment about all of the components of college teaching. Further,
student ratings are data that must be interpreted. We should not
confuse a source of data with the evaluators who use student rating
data in combination with other kinds of data to make
their judgments about an instructor's teaching effectiveness."