Summary Notes of "Student Ratings of Teaching: The Research Revisited," by William E. Cashin, (Referred by Nancy Simpson), IDEA Paper No. 32, Center for Faculty Evaluation & Development, Kansas State University, September 1995, 9 pages.

Summarized by James T. P. Yao – 8/26/99

"There are now [as of 1995] more than 1,500 references dealing with research on student evaluation of teaching. … This paper will attempt to summarize the conclusions of the major reviews of the student rating literature from … 1971 to the present. …"

"The ERIC descriptor for student ratings is ‘student evaluation of teacher performance.’ I suggest that the term ‘student ratings’ is preferable to ‘student evaluation.’ ‘Evaluation’ has a definitive and terminal connotation; it suggests that we have an answer. ‘Rating’ implies that we have data which need to be interpreted. … Viewing student ratings as data rather than as evaluation may also help to put them in proper perspective. Writers on faculty evaluation are almost universal in recommending the use of multiple sources of data. … Further, there are important aspects of teaching that students are not competent to rate …"

"… [There are] six factors commonly found in student rating forms:

  1. Course organization and planning.
  2. Clarity, communication skills.
  3. Teacher-student interaction, rapport.
  4. Course difficulty, workload.
  5. Grading and examinations.
  6. Student self-rated learning.

[Other factors include] learning/value, enthusiasm,…, group interaction, …, breadth of coverage, …, assignments, … When interpreting student rating data, we must distinguish among the various items and their dimensions to insure that all the appropriate dimensions are rated. Averaging dissimilar items is not appropriate. … one or a few global or summary type items might provide sufficient student rating data for personnel decisions. …"

"In the educational measurement literature, reliability covers consistency, stability, and generalizability of items. For student rating items, reliability refers most often to consistency or interrater agreement… Reliability varies depending upon the number of raters, i.e., the more raters, the more reliable. … As a rule of thumb, I recommend that items with fewer than ten raters … be interpreted with particular caution."

"Stability is concerned with agreement between raters over time. In general, ratings of the same instructor tend to be similar over time. …"

"Generalizability is concerned with how confident we can be that our data accurately reflect the instructor’s general teaching effectiveness, not just how effective he or she was in that particular course that term. … The instructor, not the course, is the primary determinant of the student rating items. … If the instructor teaches only one course,…, consistent ratings from two different terms may be sufficient. For most instructors, however, use ratings from a variety of courses, for two or more courses from every term for at least two years, totaling at least five courses. …"

"Validity … to what extent do student rating items measure some aspect of teaching effectiveness? Unfortunately, there is no agreed-upon definition of ‘effective teaching’ nor any single, all-embracing criterion. … The best one can do is to try various approaches, collecting data that either support or contest the conclusion that student ratings reflect effective teaching."

"Approach One – Student Learning

Theoretically, the best criterion of effective teaching is student learning. Other things being equal, the students of more effective teachers should learn more. … … the classes in which the students gave the instructor higher ratings tended to be the classes where the students learned more, i.e., scored higher on the external exam. On the other hand, the correlations are far from perfect, in part because many of the variables that relate to students’ learning are related to student characteristics (e.g., motivation or ability), not to instructor characteristics."

"Approach Two – Instructor’s Self-Ratings

Researchers have sought a criterion of effective teaching that would be acceptable to faculty. One possibility is the self-ratings of the instructor. … … studies provide further support for the validity of the students’ ratings."

"Approach Three – The Rating of Others

Administrators’ Ratings – Student ratings correlate with administrators’ ratings …

Colleagues’ Ratings - … Some faculty question whether the students have an appropriate conception of what effective teaching is. … Students tended to place more weight on the instructor being interesting, having good speaking skills, and being available to help; students also focused more on the outcomes of instruction, e.g., what they learned. Faculty placed relatively more weight on intellectual challenge, motivating students, setting high standards, and fostering student self-initiated learning.

Alumni Ratings – Student ratings correlate with alumni ratings … This belies the conventional wisdom that the students will come to appreciate our teaching after they get into the real world as working adults.

Trained Observers – A few studies have used external observers who were trained … the median reliability for trained observers was .76. This suggests that peer ratings based on classroom observation would be reliable if the observers were trained."

"Approach Four – Comparison with Student Comments

Some faculty question the value of student ratings but accept students’ written comments to open-ended questions. … These studies suggest that, for personnel decisions, the information from student ratings overlaps considerably with the information in student comments."

"Approach Five – Possible Sources of Bias

… Some writers have suggested that bias be defined as anything not under the control of the instructor. … bias in student ratings should be restricted to variables NOT related to teaching effectiveness. By this definition, the correlations between student ratings and class size, or the students’ interest in the course, are not biases, because classes of students who are interested in the subject matter actually do learn more. … I suggested an even narrower definition when using ratings for personnel decisions or the instructor’s improvement. I suggested restricting bias to variables not a function of the instructor’s teaching effectiveness. Thus, student motivation or class size might impact teaching effectiveness, but instructors should not be faulted if they were less effective teaching large classes of unmotivated students than their colleagues who were teaching small classes of motivated students. …"

"Variables Not Requiring Control …

A. Instructor variables not related to student ratings:

  1. age and teaching experience - …
  2. gender of the instructor - …
  3. race - …
  4. personality - …
  5. research productivity - …

B. Student variables not related to student ratings:

  1. age of the student - …
  2. gender of the student - …
  3. level of the student - …
  4. student’s GPA - …
  5. student’s personality - …

C. Course variables not related to student ratings:

  1. class size - …
  2. time of the day - …

D. Administrative variables not related to student ratings:

  1. time during the term - …"

"Variables Possibly Requiring Control …

A. Instructor variables related to student ratings:

  1. faculty rank – regular faculty tend to receive higher ratings than graduate teaching assistants (GTAs) … This variable does NOT require control because regular faculty as a group tend to be more effective teachers than GTAs as a group.
  2. expressiveness - … Expressiveness tends to enhance learning and does NOT require control.

B. Student variables related to student ratings:

  1. student motivation - …
  2. expected grades - … Three possible hypotheses have been proposed for these correlations. …
     • validity hypothesis – the students who learn more earn higher grades and give higher ratings …
     • grading leniency – …
     • student characteristics - …

  … To control for the possibility of grading leniency, my recommendation is to have peers (…) review the course material, particularly exams, computer-scored test results, graded samples of essays, projects, etc., and judge whether grades are inflated.

C. Course variables related to student ratings:

  1. level of the course – …
  2. academic field – …
  3. workload/difficulty - … contrary to faculty belief, they are correlated positively, i.e., students give higher ratings in difficult courses where they have to work hard. Although positive, the correlations are not large [less than .29]. …

D. Administrative variables related to student ratings:

  1. non-anonymous ratings - …
  2. instructor present while students complete ratings - …
  3. purpose of the ratings – some studies have found that if the directions say the ratings will be used for personnel decisions, the ratings tend to be higher than if they will be used only by the instructor for improvement …"

"Many faculty will grant the usefulness of student ratings for personnel decisions, but question their usefulness for improvement, preferring to rely on students’ open-ended comments. … If an institution really intends to use student ratings to improve teaching, it needs to provide some kind of consultation to the instructors."

"There are probably more studies of student ratings than of all the other data used to evaluate college teaching combined. … In general, student ratings tend to be statistically reliable, valid, and relatively free from bias or the need for control; probably more so than any other data for evaluation. Nevertheless, student ratings are only one source of data about teaching and must be used in combination with multiple sources of data if one wishes to make a judgment about all of the components of college teaching. Further, student ratings are data that must be interpreted. We should not confuse a source of data with the evaluators who use student rating data – in combination with other kinds of data – to make their judgments about an instructor’s teaching effectiveness."


© 2001 The Lohman Professorship. All rights reserved.