Rutgers Teaching and Curriculum Evaluation Grant: Summary of Readings

EXTENDING THE DOMAIN OF TEACHING EFFECTIVENESS ASSESSMENT
PART III: SELECTED READINGS SUMMARIZED

We are grateful for the $4800 Rutgers University Teaching and Curriculum Evaluation grant to Dr. Ronald E. Rice, Dr. Lea Stewart, Dr. Linda Lederman, and Dr. Brent Ruben, awarded by Dr. Susan Forman, Office of the Vice President for Undergraduate Education, 18 Bishop Place, College Ave. Campus.

RONALD E. RICE
Initial reading summaries by MICHELE HUJBER
October, 1998

CONTENTS

Aleamoni, L. M. (Ed.). (1987). Techniques for evaluating and improving instruction. San Francisco: Jossey-Bass.

1. "Can Evaluating Instruction Improve Teaching?", by W. J. McKeachie, 3-7.

2. "Toward Excellence in Teaching," by Robert C. Wilson, 9-24.

3. "Typical Faculty Concerns about Student Evaluation of Teaching," by Lawrence Aleamoni, 25-31.

4. "Using Student Ratings to Improve Instruction," by Joseph Stevens, 33-38.

6. "Formative and Summative Evaluation: Parody or Paradox?" by John Centra, 47-55.

7. "Instructional Evaluation as a Feedback Process," by Doron Gill, 57-64.

8. "A Faculty Evaluation Model for Community and Junior Colleges," by Raoul Arreola, 65-74.

9. "Some Practical Approaches for Faculty and Administrators," by Lawrence Aleamoni, 75-78.

10. "Concluding Comments" by Lawrence Aleamoni, 79-81.

Allen, Mary J., Armstrong, C. A., & Gutierrez, D. M. (1990). Alumni vs. faculty opinion on undergraduate psychology programs. Paper presented at the Annual Convention of the American Psychological Association, Boston, August.

Belcher, Marcia J. (1996). BSU’s impact on skills valued by graduates. Boise: Boise State University, Idaho. (tables only)

Belcher, Marcia J. (1996). In their own words: BSU graduates tell of best and hardest and recommend changes. Boise: Boise State University, Idaho. (alumni survey only)

Braskamp, L. A., Brandenburg, D. C., & Ory, J. C. (1984). Evaluating teaching effectiveness: A practical guide. Newbury Park, CA: Sage.

Carson, B. H. (1996). Thirty years of stories: The professor’s place in student memories. Change, Nov./Dec., 11-17.

Donald, Janet G. & Denison, D. B. (1996). Evaluating undergraduate education: The use of broad indicators. Assessment & Evaluation in Higher Education, 21, 23-39.

Perry, R. P. & Smart, J. C. (Eds.). (1997). Effective teaching in higher education: Research and practice. New York: Agathon Press.

5. "Teaching Effectively: Which Students? What Methods?" by Raymond P. Perry, 154-168.
6. "Effective Teaching Behaviors in the College Classroom," by Harry G. Murray, 171-204.
9. "The Dimensionality of Student Ratings of Instruction: What We Know and What We Do Not," by Philip C. Abrami, Sylvia d’Apollonia, and Steven Rosenfeld, 321-367.
10. "Identifying Exemplary Teachers and Teaching: Evidence from Student Ratings," by Kenneth A. Feldman, 368-395.
11. "Good Teaching Makes a Difference—And We Know What It Is," by W. J. McKeachie, 396-408.
12. "Exploring the Implications: From Research to Practice," by Maryellen Weimer, 411-435.

Krahn, H. and Silzer, B. J. (1995). A study of exit surveys: The Graduand survey at the University of Alberta. College & University, 71(1), 12-23.

Aleamoni, L. M. (Ed.). (1987). Techniques for evaluating and improving instruction. San Francisco: Jossey-Bass.

1. "Can Evaluating Instruction Improve Teaching?", by W. J. McKeachie, 3-7.

This article addresses the uses of instructional evaluation, making a distinction between instructional evaluation used for improvement and instructional evaluation used for personnel decisions. When used for improvement, instructional evaluation serves as a diagnostic tool. The more precise the definitions of particular areas of difficulty, the more likely it is that an appropriate prescription for change can be developed. (p.4) By asking students for examples to support the comments or criticisms they make (in an interview of class members), the specialist is able to obtain more concrete data than are usually found in student ratings. (p.5)

Comment
McKeachie expresses the opinion that responses to student instructional rating questionnaires do little (or nothing) to provide data useful for pinpointing areas where teaching could improve. He advocates interviewing students in the classroom to learn why they chose the responses they gave on the questionnaires. I believe that our initial phone interviews of students should accomplish the same goal of eliciting detailed descriptions of specific qualities of excellent teaching. McKeachie’s assumption is that the elements students identify as good teaching are elements that can be taught to teachers. But can some things, such as charisma and intrinsic enthusiasm for teaching, be taught?

2. "Toward Excellence in Teaching," by Robert C. Wilson, 9-24.

Wilson describes an approach similar to McKeachie’s: he starts with the responses from a standardized evaluation form, but instead of asking students to expand on their responses, he talks to the faculty member about whom the questionnaire was administered (each chosen because they had won at least one teaching excellence award). The faculty members were asked what they thought it was about their teaching that caused students to give them such high ratings. Again, the result was more specific data about what makes a teacher effective.

3. "Typical Faculty Concerns about Student Evaluation of Teaching," by Lawrence Aleamoni, 25-31.

In this article, Aleamoni outlines faculty concerns with student evaluations of faculty, and gives a rebuttal to each concern:

"Students cannot make consistent judgments concerning the instructor and instruction because of their immaturity, lack of experience, and capriciousness." Aleamoni’s rebuttal: If we examine the research and concentrate only on the studies that used reliable and valid instruments, then we find evidence that students’ judgments tend to be pretty stable.

"Only colleagues with excellent publication records and experience are qualified to evaluate their peers’ instruction." Aleamoni’s rebuttal: Aleamoni and Yimer (1973) asked faculty members, whom they had rated according to number of scholarly publications or appropriate creative work, to rate the instructional effectiveness of their colleagues in their departments; they also gathered information from students on the various courses, instructors, and so on. There were no significant correlations between colleague ratings of instructional effectiveness and research productivity, nor was there a significant relationship between student ratings and faculty productivity. Furthermore, the students and faculty gave very similar ratings to each of the faculty members.

"Most student rating schemes are nothing more than a popularity contest with the warm, friendly, humorous, easy-grading instructor emerging as the winner every time." Aleamoni’s rebuttal: He cites several studies, including one of his own, that show that there is a very low correlation between a student’s subjective opinion of the instructor’s personality and their objective opinion of their instructional excellence.

"Students are not able to make accurate judgments concerning either instruction or instructor until they have been away from the course, and possibly away from the institution, for several years." Aleamoni’s rebuttal: He cites several studies, including one of his own, that show that there is a very high positive relationship between the judgments made by students who had been away and those made by students who were currently taking the course.

"Student rating forms are unreliable and invalid." Aleamoni’s rebuttal: While it is true that student evaluation forms will be unreliable if they have not been professionally constructed and tested, there are instruments with reliability measuring 0.90 and above. As for validity, Aleamoni poses the question, "how highly related is student learning to the way students rate?" Generally, in studies where objective measures of student learning have been obtained, those studies have reported a fairly high positive relationship between the objective measures of learning and the way students rate.

"Extraneous variables, or conditions, could affect student ratings. Some of these conditions include: the size of the class; the gender of the student; the time of day that the course is offered; whether the students are taking the course as a requirement or as an elective; whether the student is a major or nonmajor in the field; the term or semester that the course is offered; the level of the course; and the rank of the teacher, ranging from instructor to full professor." Aleamoni’s rebuttal: The majority of research Aleamoni has looked at indicates little or no relationship between the way in which students rate a course or instructor and such variables as class size, gender of the student or gender of the instructor, the time of day that the class is offered, the major or nonmajor status of the student, or the term or semester that the course is offered. Although the differences come close to statistical significance in some studies, there is no real pattern showing that full professors are rated more positively than lecturers or assistant professors. The variables that distinguish a required course from an elective and that identify courses by level (freshman, sophomore, and so on) do seem to generate significant differences in student ratings.

4. "Using Student Ratings to Improve Instruction," by Joseph Stevens, 33-38.

The use of student evaluations for instructional improvement is the focus of this article. The research literature reports inconsistent results when student ratings are used to provide feedback for instructional improvement. Thus, we need to determine how to maximize the effects of feedback for instructional improvement and how to identify factors that either constrain or facilitate improvement. (p. 34) Instructor improvement after student-ratings feedback is inconsistent. This is due in part to the cognitive state of the instructor, which will vary greatly from one individual to another, and, as a result, the manner in which feedback information is received by the instructor will also vary greatly. However, the inconsistencies of improvement in feedback studies do not necessarily argue against the utility of student ratings for instructional improvement; rather, it is more likely that the reported results demonstrate the complexity of the instructional milieu and the inadequacy of the "treatment" design of instructional intervention studies.

Stevens proposes that feedback in combination with consultation is more effective than feedback alone, but writes that most studies have not reported or controlled experimentally the specific components of consultation.

6. "Formative and Summative Evaluation: Parody or Paradox?" by John Centra, 47-55.

This article discusses six topics:

student ratings:
When ratings were first used in the thirties and forties, they were used on a voluntary basis, but their use has shifted from the formative, which encourages instructional development, to the summative, where they play an important role in personnel decisions.

Many studies in the last ten years have looked at the validity of student ratings by comparing the ratings to how much students learn from a particular professor (studies of different sections of the same lower-level course taught by ten or twenty instructors). Such studies have found that student ratings are reasonably correlated with student learning. But "we really cannot be sure to what extent these findings will generalize to upper-level courses where only one or two teachers are teaching a section."

Centra asks, "Is there a point of diminishing returns if the same form is used term after term? My guess is that their formative impact (on the instructor) diminishes considerably and that the ratings are then used only for personnel decisions, after all."

colleague evaluation:
Centra lists attempts that have been made at colleague evaluation programs, all of which had disappointing results. He notes one program at the University of Cincinnati that "developed the notion of peer triads, where three faculty members get together and share materials and objectives, visit each other’s classes, and then make suggestions to each other. It’s a nice idea, but, other than at Cincinnati, it has yet to catch on elsewhere."

what is good teaching:
Surveys of faculty members, students, and administrators, in which the question "What are the characteristics of good teachers?" was asked, have resulted in the following list of nine characteristics:

communication skills

favorable attitudes toward students

knowledge of subject

good organization of subject matter and course

enthusiasm about subject

fairness in examinations and grading

willingness to experiment

encouragement of students to think

good speaking ability

Centra notes that a good teacher has some of the characteristics on the list, but there would be very few people who would exhibit every one of them, and that, unfortunately, some colleges simply take a sum total of ratings on the variety of characteristics to evaluate a person’s overall teaching. He goes on to argue that good teaching occurs when the instructor uses a method that is best suited to his or her abilities and also best suited to accomplishing what the course should accomplish.

whether teacher-designed examinations should be used as a measure of teaching

Centra argues that teacher-designed examinations are generally so poorly constructed that they do not reflect what students actually learned in the course.

evaluation of research scholarship

the politics of evaluation

7. "Instructional Evaluation as a Feedback Process," by Doron Gill, 57-64.

Gill lists a "variety of purposes" for teacher evaluation:

allocation of teaching or faculty resources

development of awareness, sensitivity, and appreciation of teaching

improvement of instruction

tenure and promotion decisions

program evaluation

research on teaching

He argues that "instructional evaluation, as it is currently conducted and practiced, has little, if any, effect on college instruction." Feedback, instead of evaluation, needs to be the main technique used in faculty development, and its primary focus should be on instructional improvement. Feedback is defined as information provided to instructors about their performance that includes recommendations for future improvement. The focus is on the instructor, not the measurement. However, simply evaluating teaching does not necessarily have an effect on teaching improvement or student achievement. He proposes improvements similar to those suggested by Stevens, e.g., "facilitating conditions must also exist for improvement to occur."

Gill cites James Bess when he discusses the integration of faculty and student satisfaction:

Many (instructors) … do research in the belief that its career rewards will provide higher personal satisfactions…. They are thus seduced into giving relatively less attention through teaching to meeting student needs—an activity which might, under different conditions, yield them profound satisfactions in great abundance.

Students at most institutions under present circumstances are also not able to fulfill their most important needs, particularly those which involve their developing personalities. Colleges and universities usually give greater attention to establish structures designed to help students acquire cognitive knowledge, in service of broad liberal education and/or, presumably, career preparation. Satisfaction of student needs for emotional and interpersonal growth and for self-knowledge are, at best, by-products of the college experience. They are rarely explicit goals of the institution.

A study showed that psychology instructors classified as "facilitator-person" were more effective than teachers regarded as "expert" and "authority" (effectiveness judged in terms of student motivation for taking additional psychology courses). The study concluded that the effect of the teacher is enhanced if the teacher is seen as a person, rather than as someone who only teaches content.

Comment
Bess’ quote explains how the evaluation and feedback process could possibly improve instruction by encouraging faculty to place a higher value on those activities that deal more with the holistic development of their students. By showing that personality development during their undergraduate education is considered very important by alumni, faculty and administration might be more likely to respond to, trust, and use holistic measurements. Personality development is not measured at all in the RU Student Instructional Rating Form (nor anything else about the student’s own intellectual or personal development, for that matter). As Gill concludes, "Feedback can help teachers not only improve their teaching practices, but also change their attitude toward the act (or art) of teaching, so that they can perceive of it as a challenging activity."

8. "A Faculty Evaluation Model for Community and Junior Colleges," by Raoul Arreola, 65-74.

Arreola states that, "In contrast to four-year colleges and universities, the community and junior colleges seem much better able to focus on the evaluation of teaching and to incorporate it into their overall decision making—in particular, into their promotion and tenure structures. Apparently this is because teaching, at community and junior colleges, is considered an important mission in and of itself. The linkage of faculty evaluation with faculty development is more readily accepted in a unionized situation, perhaps because clearly defined agreements about many aspects of faculty employment have already reached the bargaining table."

Ten out of twelve community and junior colleges that implemented faculty evaluation programs considered the following:

Most institutions assume that teaching is to be evaluated, but they have not always given careful attention to what teaching includes and to the other responsibilities, such as advising, community service, and research that faculty are expected to carry out.

After agreeing on the different roles teachers perform, the institution must determine how much value or weight should be placed on each role.

Arreola defines good teaching as composed of instructional delivery skill, knowledge of subject, enthusiasm, and concern for students. Teaching can also be defined to include instructional design skills, such as test construction, development of syllabi, and course organization. Various measures of the content expertise of the instructor could be added, as well as dimensions of record keeping and management.

He suggests constructing a data-gathering specification matrix that lists the teaching roles that are being evaluated, and the different sources from which evaluations of each role will be collected (the sources will be different for different roles).
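The weighting of roles and the specification matrix described above can be illustrated with a small sketch. This is only one possible reading of the idea, not Arreola's own procedure: the role names, weights, sources, rating scale, and function names below are all hypothetical and chosen purely for illustration.

```python
# Hypothetical role weights; an institution would negotiate its own values.
ROLE_WEIGHTS = {"teaching": 0.6, "advising": 0.2, "community service": 0.2}

# Hypothetical specification matrix: which sources rate which role.
SOURCES_BY_ROLE = {
    "teaching": ["students", "peers", "self"],
    "advising": ["students", "department head"],
    "community service": ["department head"],
}


def role_score(ratings_by_source, allowed_sources):
    """Average the ratings from the sources the matrix assigns to one role."""
    used = [ratings_by_source[s] for s in allowed_sources if s in ratings_by_source]
    if not used:
        raise ValueError("no ratings from any assigned source")
    return sum(used) / len(used)


def composite_score(ratings, weights=ROLE_WEIGHTS, matrix=SOURCES_BY_ROLE):
    """Weight each role's score by its agreed value and sum the results."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("role weights must sum to 1")
    return sum(
        weights[role] * role_score(ratings[role], matrix[role]) for role in weights
    )


# Made-up ratings on an assumed 1-5 scale, for illustration only.
example = {
    "teaching": {"students": 4.0, "peers": 3.0, "self": 5.0},
    "advising": {"students": 4.0, "department head": 2.0},
    "community service": {"department head": 5.0},
}
print(round(composite_score(example), 2))
```

Averaging the sources within a role is a simplifying assumption; the sources for a role could just as well be weighted differently from one another.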

9. "Some Practical Approaches for Faculty and Administrators," by Lawrence Aleamoni, 75-78.

In order to measure and evaluate instructional effectiveness accurately, one needs to set up criteria and guidelines for that evaluation. This is accomplished most effectively at the departmental level. Departments should be able to come up with 25 or more criteria that the faculty can agree on; then the departmental faculty should develop guidelines to use in evaluating those criteria. It is imperative that departmental faculty be aware that, if they do not develop their own standards, someone else will impose his or her own.

If one wants faculty to take seriously any comprehensive instructional evaluation system, then faculty will have to be convinced of the administration’s commitment to the system.

10. "Concluding Comments" by Lawrence Aleamoni, 79-81.

Common themes run through all articles in this book:

A comprehensive system of instructional evaluation needs to be established with various components differentially weighted at the departmental level (effective uses of student evaluations).

Student ratings should be one of the components of a comprehensive system but should not be expected to carry 100 percent (or even 80 percent) of the weight (developing a list of the dimensions of teaching evaluation that are important to alumni).

Experienced instructional development consultants should be used to provide evaluative feedback to the teaching faculty and to guide them in their use of that feedback in their instructional improvement efforts (effective uses of student evaluations).

University and college administrations must make a stated commitment to instruction and formally place it in the promotion-tenure reward system if it is to be taken seriously by faculty (effective uses of student evaluations).

Student governments have a role to play in the instructional development and evaluation process.

Allen, Mary J., Armstrong, C. A., and Gutierrez, D. M. (1990). Alumni vs. faculty opinion on undergraduate psychology programs. Paper presented at the Annual Convention of the American Psychological Association, Boston, August.

Many measures of institutional quality (e.g., campus size, library contents, prestige of funding sources) have little relationship to what students find of critical value: how much they learned.

The authors mention different assessment techniques:

management perspective--gauges success by numbers of graduates and job placements or resource counts (library size, computer availability), and

standardized exams to evaluate student learning

These methods are insufficient because they don’t measure what’s important from the student perspective, e.g., how much they learned.

Alumni responses were analyzed and yielded the following factors:

Personal Growth (adaptability, interest in lifelong learning, self-understanding)

Cultural Diversity

Basic Skills

Scientific Principles

Program Quality

A concurrent survey of faculty opinions of program quality was conducted for comparison.

Belcher, Marcia J. (1996). BSU’s impact on skills valued by graduates. Boise: Boise State University, Idaho. (tables only)

This report presents the findings of a 1995 survey of 1992-93 and 1993-94 graduates of Boise State University (BSU), Idaho, which sought to identify what students valued in a college education and the extent to which they felt BSU had helped them grow in these areas. The survey listed 17 skills or abilities that individuals might hope to attain from going to college, then asked graduates to rate the extent of impact BSU had on attaining these skills. The 17 skills, listed from highest to lowest impact on alumni, were:

Using effective written skills

Defining and solving problems

Using effective oral communication

Developing skills employers need

Committing to lifelong learning

Living life by own standard

Thinking objectively re. beliefs

Working cooperatively in groups

Getting along with different people

Using effective leadership skills

Developing original ideas/products

Accessing a variety of info sources

Drawing conclusions from data

Suggesting solutions to employers

Learning about career options

Understanding humans and environment

Understanding international issues.

Results showed that the skills graduates valued most differed depending upon the major chosen by the graduate.

Belcher, Marcia J. (1996). In their own words: BSU graduates tell of best and hardest and recommend changes. Boise: Boise State University, Idaho. (alumni survey only)

This paper contains the results of a 1995 survey of 1992-93 and 1993-94 graduates of BSU. The report presents respondents’ answers to three open-ended questions of the survey concerning "what they liked most" about their BSU experience, "the hardest thing they had to do" in order to complete their education at BSU, and suggestions for improvement. The paper also reports the common themes that emerged:

people, usually faculty, were cited most often as the aspect of the BSU experience that they liked most.

after people, the learning process was most cited

the hardest thing they had to do most commonly involved academic issues, administrative issues such as class scheduling, personal issues such as balancing school work, or coping with financial difficulties.

advice for improvement fell into categories including more support, understanding, respect, and communication; administration; and advisement.

Braskamp, L. A., Brandenburg, D. C., and Ory, J. C. (1984). Evaluating teaching effectiveness: A practical guide. Newbury Park, CA: Sage.

This book is a practical guide for faculty and administrators in the critique, design, and implementation of the evaluation of teaching. Teaching is defined as encompassing classroom activities, organizing a course, developing a curriculum, and advising students (this would eliminate some of the out-of-classroom experiences described in the Carson article, "Thirty Years of Stories," which is summarized below). The authors, like several authors in Aleamoni’s Techniques for Evaluating and Improving Instruction, stress that teaching should be assessed from a variety of perspectives; no single piece of evidence (e.g., ratings) collected from one source (e.g., students) is sufficient to judge the competence of a teacher. The authors refer to this approach as the "multiple purpose, criteria, source, method approach." A second major principle expressed in this book is that, in evaluating teaching effectiveness, the purpose of the evaluation, such as personnel decisions or improvement, needs to be taken into account.

The authors list three major emphases for defining good teaching:

1. Input (what students and teachers bring to the classroom)

If input is emphasized, the basis of judging excellence is much of what has occurred before the course even begins. Although input factors need to be taken into account since they may and can influence student ratings and learning, information focusing on these factors will yield a rather incomplete portrayal or assessment of teacher performance.

2. Process (what students and teachers do in a course)

If process is emphasized, the basis for judging effective instruction centers around teacher rather than student behaviors. However, the linkage between what an instructor does and amount learned by students is not always clear, and thus sole reliance on process factors is also not recommended.

3. Product (what students learn or accomplish in the course)

If product is emphasized, the basis for judging effective teaching is amount of student learning. There are two major problems in linking student learning to conclusions about effective teaching:

questionable validity of test results: do the tests adequately tap what students learn in a course? (This limitation was also pointed out by Centra in Aleamoni’s Techniques for Evaluating and Improving Instruction. Alumni can provide an additional evaluation of learning by reporting whether or not their education is helping them in the workplace.)

factors not related to quality of instruction, including student ability, motivation, and prior knowledge, can affect student learning.

When selecting criteria for evaluating teaching effectiveness, it is helpful to distinguish among the three types—the importance put on input, process, or product will depend upon the values of the discipline and an institution’s view of teaching effectiveness.

Criteria also vary in the extent to which they are specified, described, and measurable. Both explicit (quantitative) and implicit (qualitative) criteria are often needed.

Multiple sources. Information about an instructor can be collected from a number of different sources, since not everyone judges an instructor in the same way. Sources include self, alumni, students, records, and colleagues.

The major dimensions of questionnaire items that have been identified through research (omnibus form) are:

communication skill

rapport with students

course organization

student self-rated accomplishments

course difficulty

grading and examinations

Major dimensions of questions could also be based on stated course objectives (goal-based form), or on the objectives of a department, to evaluate a program rather than a single course.

There are two types of reliability that are relevant to examining the trustworthiness of student ratings:

Agreement—The extent of agreement among students within a class rating the instructor and course

Stability—The extent to which the same students using the same student rating form would rate the instructor and course similarly at two different times.

Generalizations about the reliability of student ratings (p.42):

Student agreement on global ratings is sufficiently high if the class has over 15 students.

Students are consistent in their global ratings of the same instructor at different times in the course.

An instructor’s overall teaching performance in a course can be generalized from ratings from five or more classes taught by the instructor in which at least 15 students are enrolled in each class.

The same instructor teaching different sections of the same course receives similar global ratings from each section.
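As a rough illustration of how the third generalization might be applied, the sketch below checks whether a set of rated classes meets the stated thresholds (five or more classes, each enrolling at least 15 students). The function and constant names are hypothetical, and the enrollment figures are invented for the example.

```python
# Thresholds as stated in the generalizations above; names are hypothetical.
MIN_CLASSES = 5
MIN_ENROLLMENT = 15


def eligible_classes(enrollments):
    """Return the enrollments of classes with at least MIN_ENROLLMENT students."""
    return [n for n in enrollments if n >= MIN_ENROLLMENT]


def can_generalize(enrollments):
    """True when at least MIN_CLASSES rated classes meet the enrollment minimum."""
    return len(eligible_classes(enrollments)) >= MIN_CLASSES


# Invented enrollments for illustration only.
print(can_generalize([30, 22, 15, 18, 40, 9]))  # five classes meet the minimum
print(can_generalize([30, 22, 12, 18, 40]))     # only four classes meet it
```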

The book contains a table (p. 47) showing the relation between student ratings and other measures of effective instruction; it reports a high positive correlation between student and alumni ratings of overall instructor competence.

On using alumni as sources of evaluative information: Evaluations about the sequence and depth of course material and support and advice faculty gave to the students during their college career are valuable kinds of information to a department in its examination of its curriculum offerings and the role of its faculty in instruction.

The book contains a detailed table showing the phases in using evaluative information, from "having available the collected evaluative information" to "justification of previous decisions."

The book also contains a list of 24 suggestions for enhancing the use of evaluations, from forming a consultative relationship with the person being evaluated to respecting that person’s privacy.

Carson, B. H. (1996). Thirty years of stories: The professor’s place in student memories. Change, Nov./Dec., 11-17.

In 1995 the author queried students who had graduated from Rollins College from 1964 through 1990. She asked if they "could think back to a professor in their major (and then a professor outside their major) whom they regarded most highly as an effective teacher" and to describe as fully as possible "specific incidents or other details (from inside the classroom or outside) contributing to your high regard" (p. 12). A total of 222 alumni responded. Carson explains that she was not trying to discover the characteristics that distinguish excellent professors, but does list the themes that emerged:

Outstanding teachers love the subjects they teach

They respect and like their students

They are committed to and skilled at connecting the two things they care deeply about—their subject matter and their students.

Carson writes (p. 12):

What I was hoping to find in the narratives … were the specifics behind these familiar generalizations. Instead of reading that X percent of outstanding teachers are "enthusiastic" about their subjects or "caring" toward their students, I wanted to hear real voices talking about the behaviors that students translate, and remember, as signs of professors’ love of their disciplines or concern for their students. Beyond that, I wanted to hear what the former students had to say about how those professorial behaviors affected them.

Carson received many "critical incident" stories. The alums connected their transformative experiences not with the subject matter (or not primarily with that), but with a "complex and personal encounter linking professor, student, and subject matter in an exchange as much affective as cognitive."

Carson suggests that the academic content was not mentioned as much because the cognitive learning had, by the time of the survey, become second nature, but "critical moments where we learn to look at life in a different way may be remembered with the clarity of a conversion experience." (p. 12)

A few observations about what good teachers do:

Some teachers are excellent because intellectual passion for their subject is shown by their energy and excitement, often accompanied by humor, which energizes their students. The students then want to maintain that feeling of excitement by imitating the professor (e.g., pursuing the academic field with the same level of passion). (p. 13)

Some teachers conveyed their intellectual passion in quieter ways—just by showing by their actions (being emotionally moved by the subject matter) that they loved their subject. (p. 13)

Effective professors linked students and subject matter in a variety of ways; their classes were marked by clarity and organization and by lively exchanges among professor and student…. The better teachers seemed to be "great" moderators of conversational/discussion type classes. In the best classes, the professor posed questions that led the students far beyond recitation-level responses, and the class discussions included student exchanges with each other as well as with the professor. (p. 16)

The other strategy for connecting students with subject matter most often associated with effective teachers was their capacity to tell stories, to introduce real-life examples, to exemplify tough concepts with anecdotes and illustrations (distinctions were drawn between stories that were and weren’t relevant to class—the professors telling non-relevant stories were criticized.)

The single quality the Rollins alumni most frequently associated with effective teachers—more often than brilliance and love of subject and even more often than enthusiasm in the classroom—was a special attitude toward and relationship with students…. While expressions of that caring took many forms, the constant behind the variety was the students’ sense of specific, personal attention.

The single most frequently cited evidence of a professor’s caring was accessibility.
Many students interpreted the interest the professors took in them as affirmations of their own self-worth. (p. 14)

The professor’s ability to see personal worth and academic ability—unrecognized by the students themselves—is referred to by Carson as "tapping." She concludes that "tapping" may be the "single most influential act a professor can perform."

Many of the Rollins graduates have come to realize that one of the most telling signals of their professors’ respectful caring lay in challenging them to higher levels of achievement than they had thought possible.

"Perhaps, really, what effective professors provide in their very toughness are situations of disequilibrium, challenges to old ideas and old behaviors that, in the presence of encouragement and direction, nudge students into significant developmental changes." (p. 17)

Carson concludes that the broad range of responses about what makes a teacher effective indicates that teachers don’t have to be all things to all students—they can make a positive impact in a way consistent with their own personalities. (p. 17)

Explaining why interpersonal relationships are not just "nice," but actually have a solid link to learning, Carson writes (p. 16):

In an article published in American Psychologist in 1980, Robert Zajonc explains that it is likely that the very first stages of both learning and remembering what was learned are affective. ‘When we try to recall, recognize, or retrieve an episode, a person, a piece of music, a story, a name, in fact anything at all,’ Zajonc writes, ‘the affective quality of the original input is the first element to emerge.’ More recent research has revealed the physiological basis for the connection: when we respond to something with emotional intensity, stress hormones excite the part of the brain that transforms impressions or short-term memories into long-term memories. The greater the affective intensity, the easier both the original imprinting and the recall.

"The graduates were emphatic in identifying the instructional techniques distinguishing effective from ineffective teachers. The two charges most often leveled against ineffective teachers were that their presentation and purpose were muddled and that they ‘taught straight from the book and didn’t make you think.’"

Carson concludes with some concerns (p. 17):

Given their emphasis on the personal relationship between professor and student, I’m troubled by the movement in education today toward teacher-free "learning environments."

I’m troubled by … the focus of academic assessment today… I worry about the tendency to easy reductionism, to a quantification that cannot possibly capture the complex and ambiguous—but lasting—education reflected in these students’ memories of their college experience…. I worry that many… confuse simplistic assessment outcomes with quality of teaching.

Donald, Janet G. and Denison, D. B. (1996). Evaluating undergraduate education: The use of broad indicators. Assessment & Evaluation in Higher Education, 21, 23-39.

"The aim of this study was to examine the extent to which broad indicators of performance, such as student satisfaction with program, teaching, student life and experiences after graduation, could be used for program improvement…. Perceived quality of teaching was found to contribute significantly to graduates’ rating of the overall quality of their academic program." (p. 23)

"A broad indicator is a performance indicator which can be used at several levels, across domains or throughout the institutional system. Broad indicators give coarse-grained or general rather than fine-grained or detailed information which is a function of the fact that they are expressed as a single item…. They are broad as opposed to specific observable indicators and thus require elaboration if they are to be used for program improvement." (p. 24) Student evaluations and alumni surveys are a subgroup of broad indicators.

"…the first two years following completion of education constitute a critical formative period for graduates when they establish their career direction and, through their work experience, reinforce the skills and knowledge acquired through formal learning. Because of this, graduates are particularly aware of the shortcomings of their formal education. The argument is that alumni can provide valuable insights, since they have the benefit of hindsight and can evaluate college and work experiences and their relative importance (Graham & Cockreil, 1990). In contrast with undergraduates, who can only speculate about the utility or significance of various aspects of their educational experience, graduates can report the actual significance in relation to their current employment or life status (Moden & Williford, 1988)…. Retrospective evaluations which relate undergraduate experience to (p. 25) subsequent employment or further study thus may provide more concrete and operational advice for improving undergraduate education than have specific measures of instruction. Suggested uses of graduate surveys include a broad range of decisions about the curriculum, course content and major requirements, faculty roles and teaching methods, student services, and information for resource allocation and institutional planning (Moden & Williford, 1988)."

Two questions dealt with graduates’ level of satisfaction at the institutional level:
their retrospective choice of institution and their assessment of the quality of student life.

At the program level, students were asked if, in retrospect, they would choose the same program. Satisfaction with the program provided a baseline for viewing other aspects of educational experience.

At the program level, graduates were also asked to rate:

Quality of teaching

Relevance of knowledge gained in the program, quality of job preparation, and success in finding employment

Quality of preparation for graduate studies provided by the undergraduate program

"…the features cited by the graduates were at a different level of specificity and concerned different topics from those they had been asked to rate. The most frequently cited features related to students’ academic development, both general, for example, to acquire knowledge and the opportunity to improve themselves generally, and specific, for example, to acquire an in-depth knowledge of an academic discipline. These items are the kind found in the student outcome, assessment and experience literature, but not in the teacher and course ratings or program review literature." (p. 31)

Aspects of the educational experience mentioned most frequently by graduates as particularly meaningful (specific examples of each appear on pp. 32-33):

Students’ academic development—general

ability to analyze, synthesize, think critically

sense of responsibility toward own education, e.g., self-discipline

communication skills, verbal & written

Students’ academic development—specific

solid grounding in fundamentals of program

opportunities to interact with peers/faculty

good mix of theory and practice

The findings of this study raise three sets of issues:

1. The relative feasibility and utility of surveys and of the rating scales and open-ended questions

"…some caution in their interpretation as part of an academic program review is warranted given the emerging empirical evidence regarding the complexity of the constructs involved. For example,

the presumed causal relationship between academic performance and satisfaction has been brought into question…. (Bean & Bradley, 1986; Pike, 1991)."
"…students expressing the greatest degree of satisfaction with their academic program have… preferences regarding the purpose, nature and process of higher education…" that agree with the preferences of the faculty. The degree of satisfaction in these cases is not necessarily indicative of program excellence.

2. The focus of the remarks of graduates, their limitations and their particular value

Whereas the designers of the survey (in concert with university decision makers) focused on traditional administrative distinctions between teaching, program and student life, alumni focused more on learning and development outcomes, criteria of quality for student development rather than for programs or institutions.

Earlier studies have found that "intellectual and cultural experiences are extremely important in determining (graduates’) attitudes toward the colleges they attended" (Pace, 1974; Spaeth & Greely, 1970). (p. 35)

"Graduates’ comments about meaningful features of undergraduate education tended to blur the distinction between in-class and out-of-class learning experiences, a phenomenon noted in the research literature (for example, Kuh et al., 1991)."

3. The different aims and perspectives of evaluations done in post-secondary institutions and how they might be made more coherent and useful

"Post-secondary reviews and evaluations and institutional research in general rarely ask questions about student development and learning." (p. 35)

Lack of communication between different parts of the institution is a weakness.

Evaluations by students, institutional research and planning offices, and program reviews need to be brought together.

Perry, R. P. and Smart, J. C. (Eds.). (1997). Effective teaching in higher education: Research and practice. New York: Agathon Press.

5. "Teaching Effectively: Which Students? What Methods?", by Raymond P. Perry, 154-168.

Covington’s Self-Worth Typology: In classrooms, students are motivated in specific ways to optimize their self-worth. Students can be divided into four distinct groups:

Overstrivers

Success-Oriented

Failure-Avoiding

Failure-Accepting

Each group has different motivations, and if an instructor knows which group(s) their students are in, they can adjust their teaching methods accordingly. "In reality, however, classrooms will typically consist of all four types, thereby possibly forcing an instructor to target a specific type, while recognizing that not all the students may benefit from the teaching practice chosen." (p. 158)

Perry’s Perceived Control Typology. "…students differ in their perceived control over their academic performance and … these differences engender divergent thoughts, feelings, and actions." (p. 158) Level of perceived control occurs along a continuum, with the "no control" end corresponding to Covington’s "failure accepting," or "helpless" category and "control" corresponding to Covington’s "success-oriented" category. "High control students are most likely to believe that they have personal control over their academic performance… low control students believe that they can do little to influence the course of events around them. …moderate control students seem to combine attributes of both mastery and helpless students, believing that they have control over some aspects of their performance but not others." (p. 160) …both typologies (Covington and Perry’s) are reasonably valid and practical, particularly in comparison to more common experientially based, idiosyncratic typologies. The challenge… is for college instructors to make greater use of typologies such as these in making decisions about their teaching practices.

Perry’s comments about specific teaching practices, all of which are grounded in logic, theory, and empirical evidence, address how major dimensions of teaching are directly related to student learning:

Foster Self-Worth

Since most students cannot excel in all situations, it is better to emphasize that each student do their best (compete against themselves) rather than to compete with their classmates for the best grade.

Covington recommends three teaching practices that would contribute to overall effective instruction:

Use engaging and challenging tasks

Ensure that sufficient reinforcers are available to all students

Use systematic feedback, given routinely and tied specifically to performance.

"In each instance (of applying the above recommendations to one’s own teaching practices) the practice could be adjusted so that it involves a task-focused rather than an ability-focused approach to student learning, the goal being to optimize achievement motivation in every student." (p. 162)

Perry points out that, while it would be ideal to have different teaching practices for each type of student in Covington’s typology, this is not practical. He notes that "Covington’s typology makes the problem somewhat more manageable by identifying a key factor underlying student differences and then describing the types of students explicitly. Considerable advantage can be gained, therefore, in developing a teaching practice to match his four types of students and being able to anticipate its eventual impact."

Organizing Content (Instructor Organization)

Kiewra’s thesis is that "information needs to be organized for optimal learning through the use of a knowledge representation system known as the matrix…. According to Kiewra, knowledge is both factual and structural, the former concerned with things, events, ideas; the latter, with the interrelationships between them…. Abundant empirical evidence is now available attesting to the significance of instructor organization for student learning. For example, in a comprehensive re-analysis of several meta-analyses involving effective college teaching, Feldman (1989) reported a correlation of +.57 between instructor organization and student achievement. This means that professors who are organized also have students who do better than the students of professors who are not organized. While not proving definitively a causal connection, this correlational evidence suggests one possible interpretation in which better organization by the professor enables students to achieve more. Such an interpretation would also suggest that matrix representations, as one aspect of instructor organization, should also contribute to improved academic performance. Accordingly, matrix representations can be placed in the larger context of college instruction as a specific teaching practice."
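As a rough aid to reading the +.57 figure above: under the usual interpretation of a Pearson correlation, squaring it gives the proportion of variance the two measures share. This is a standard statistical convention applied here for illustration, not a calculation taken from Perry’s chapter, and it says nothing about causal direction.

```latex
r = 0.57 \quad\Longrightarrow\quad r^{2} = (0.57)^{2} = 0.3249 \approx 0.32
```

On that reading, roughly a third of the variation in student achievement is shared with instructor organization, and about two thirds is associated with other factors.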

Enhancing Perceived Control

"Presumably, repeated exposure to this type of teaching could increase helpless students’ perceived control to the point at which students become more mastery-oriented and are thereby able to benefit from effective instruction…. Thus, being organized, clear, interactive, expressive, etc. could serve to increase helpless students’ internal locus, which in turn may eventually improve their performance." (pp. 165-166)

"Instead of adopting new teaching practices, the professor may wish to modify existing ones with the sole purpose of enhancing perceived control in students." (p. 166)

6. "Effective Teaching Behaviors in the College Classroom," by Harry G. Murray, 171-204.

Murray distinguishes between "low inference" and "high inference" behaviors, the former referring to "a concrete, denotable action of the instructor that can be recorded with little or no inference on the part of an observer," and the latter referring to "one (action) that can be assessed only through observer inference or judgement."

Murray mentions benefits of studying classroom teaching behaviors (p. 173):

helps us understand what effective teaching is, why it is effective, and how it impacts on student development.

knowledge of factors underlying effective teaching can provide guidelines on how to train or select college teachers, how to evaluate teaching, and how to improve the performance of current teachers. For example, research on low-inference teaching behaviors can be applied to the development of student instructional rating forms that focus on specific, denotable characteristics of instructors, and thus are more useful in providing diagnostic feedback than the typical global rating forms in current use (Murray, 1987).

research can be applied to the design of in-service faculty training programs that focus on a limited set of classroom behaviors known to contribute significantly to overall teaching effectiveness (Murray and Lawrence, 1980).

"Although data are limited, it would appear that teachers do make a difference in the amount learned by students, and in some cases this difference is quite large. When affective or attitudinal outcome measures are used, teacher effects tend to be larger and more reliable than those reported for final exam scores. Murray (1983b) found that teachers accounted for more variance in non-cognitive measures such as course and instructor ratings and subsequent course enrollment than in final examination performance. It may be, then, that teachers not only influence student learning, but more importantly, influence student motivation for further learning."

Following are general conclusions drawn from observational studies concerning low-inference teaching behaviors (p. 188):

Low-inference behavioral data are objective and accurate.

Classroom teaching behaviors make a significant (up to 80 percent) difference in student attitudes, learning of course content, and motivation for further learning.

Three dimensions of teaching behavior have consistently emerged as strong predictors of instructional outcomes:

enthusiasm/expressiveness

clarity of explanation

rapport/interaction

The impact of classroom teaching behaviors on student development can be interpreted in terms of cognitive theories of information-processing and learning. Murray (1983a) proposed that teacher enthusiasm plays an attention-getting role in information-processing, whereas teacher clarity facilitates the encoding of information in long-term memory, and teacher interaction encourages active responding and memory retrieval.

Teaching behaviors have typically shown an uneven profile of correlations with different instructional outcomes. For example, behaviors that correlate with affective outcome measures often fail to correlate similarly with cognitive outcomes, while behaviors that predict cognitive gain may fail to predict affective development.

It remains to be seen whether classroom behaviors found to be effective in the lecture method of teaching are similarly effective in non-lecture contexts.

Within the traditional lecture method, available evidence suggests that specific teaching behaviors contribute similarly to overall teaching effectiveness in different academic disciplines.

Following are general conclusions drawn from experimental studies regarding low-inference classroom teaching behaviors:

Classroom teaching behaviors, at least in the enthusiasm and clarity domains, appear to be causal antecedents (rather than mere correlates) of various instructional outcome measures.

Low-inference teaching behaviors have been shown to influence not only student instructional ratings, but objective measures of student learning as well.

Teaching behaviors accounted for a sizable proportion of outcome measure variance. As a general rule, teaching behaviors accounted for more variance in student instructional ratings than in objective measures of student learning.

Recent evidence suggests that enthusiastic or expressive classroom teaching behaviors may affect student motivational and attributional processes that extend far beyond the classroom.

"Given that the teaching behaviors found to be effective in prior research are specific, concrete, denotable, and presumably acquirable, the most obvious implication of this research is that college and university instructors can improve their classroom performance simply by exhibiting these behaviors with greater frequency."

Caveats associated with this "behavioral" prescription for teaching improvement (p. 196) include:

Some low-inference teaching behaviors are easy to acquire, while others are extremely difficult to acquire.

Rather than trying to mechanically emulate a wide array of teaching behaviors, instructors would be better advised to focus on a small subset of behaviors that are compatible with the instructor’s basic traits, abilities, and educational values, and are relevant to areas of needed improvement.

There is more to effective college teaching than effective classroom behaviors. "This fact, parenthetically, supports the argument that student instructional ratings, when used for summative purposes, should always be supplemented by colleague assessment of ‘content’ or ‘substance’ aspects of instruction."

There is resistance in the minds of many faculty members to the idea of implementing certain teaching behaviors, particularly expressive or enthusiastic behaviors, in the college classroom.

The above caveats notwithstanding, there is evidence that research on low-inference teaching behaviors can be successfully applied to the improvement of college and university teaching.

Applications of the findings to improvement of instruction include:

One way in which research on low-inference behaviors can be applied to improvement of teaching is through the development of better procedures for providing diagnostic feedback to instructors. A fruitful approach to improvement of diagnostic feedback would be to construct student rating forms that "focus directly on low-inference classroom behaviors, and thus provide clearer prescriptions for remedial action." "Contrary to this hypothesis, early attempts to demonstrate beneficial effects of behavioral feedback met with limited success." (p. 197) However, despite the pessimistic outcomes of earlier research, a Murray and Smith study "suggests that, under the right conditions, behavioral feedback can contribute significantly to improvement of classroom teaching." More research is needed to confirm or disconfirm the effectiveness of behavioral feedback, both in comparison to absence of feedback and in comparison to obvious alternatives, such as global feedback.

A second way in which research on low-inference teaching behaviors can be applied to improvement of instruction is through intensive training of faculty on a limited subset of classroom behaviors known to contribute significantly to instructional outcome measures. Murray and Lawrence (1980) assessed the impact of speech and drama training for lecturers. Experimental teachers showed significant gains in student ratings.

"Probably for reasons of expedience and practicality, both observational and experimental studies of low-inference teaching have relied almost exclusively on final exams and recall as cognitive outcome measures. Only two of the studies reviewed (Tom and Cushman, 1975; Smith, 1977) examined teaching behaviors in relation to gains in student thinking or problem solving skills. While recall of facts and comprehension of concepts are important educational outcomes, it could be argued that the most important goal of higher education is to teach students to think for themselves. It would be interesting to know the extent to which progress toward this goal is influenced by specific behaviors of the instructor." (p. 202).

9. "The Dimensionality of Student Ratings of Instruction: What We Know and What We Do Not," by Philip C. Abrami, Sylvia d’Apollonia, and Steven Rosenfeld, 321-367.

I. Instructional Dimensions
The authors discuss in depth the methods for empirically determining effective teaching. In doing so, they emphasize the use of student ratings for each of the three definitions of effective teaching. Their discussion addresses (p. 322):

the difficulties of directly assessing the products of instruction—they suggest the use of a table of specifications as one way to develop a rating form to indirectly measure what and how students have learned.

student ratings as process measures must contain items which assess the relevant aspects of teaching accurately in each instructional context—the dimensionality of student ratings varies with course characteristics, and "some items which evaluate specific aspects of teaching vary in relevance across contexts."

Multidimensional student rating forms do not contain items which evaluate the same, specific teaching qualities; the rating forms lack both comprehensiveness and uniformity. "We conclude that since the qualities of teaching evaluated by different student rating forms appear to differ both in their nature and structure, it is of value to explore the forms further and determine if there are dimensions of teaching common to a collection of student rating forms."

The authors examine the quantitative review of 43 multisection validity studies, and describe "what we have learned from these studies and what remains to be learned of the relationship between what instructors do when they teach and how this affects student learning." They note that "reviews to date suggest that the specific dimensions of teaching appear to differentially and, in some cases, poorly predict instructor impacts on learning compared to global ratings." (p. 323)

II. How Dimensions Relate to Student Learning
"Now that we have identified the common structure of student ratings, the next phase of research will be to use the techniques of quantitative research integration to explore the relationship between this structure and teacher-produced student achievement as well as the substantive and methodological variables which explain inconsistencies in the relationships."

"The relationship between the process and product views of effective teaching seeks to find the links between what teachers do and whether and how students change as a result." (p. 324)

"We hypothesize that the varied products of effective teaching are affected by different teaching processes. But we cannot describe with any great confidence the specific nature of these causal relationships. We further hypothesize that the causal relationship between any one teaching process and any one teaching product will vary as a function of external influences including student, course, and setting influences." (p. 326)

IV. Factors that do and do not Influence Ratings
The authors evaluate three validation designs (p. 323):

the laboratory design—"uses the experimental manipulation of instructional conditions to study the causal effects of instruction on students. It is often considered low in external validity." (p. 323)

the multisection validation design—"uses multiple sections of the same course taught by different instructors employing common measures of student ratings and student learning. The correlations between course section means for student ratings and means for student achievement explore the relationship between instructional processes and an important instructional product. We consider the multisection design particularly strong because it reduces the probability of rival explanations to instructor impacts and is high in generalizability to classrooms…. We conclude that studies employing the multisection design are worthy of special attention."

the multitrait-multimethod design—student ratings and several criterion measures (e.g., instructor self-ratings) are collected across a wide range of courses, without controlling for biasing or extraneous influences. We consider this design weaker both in internal validity, since controls are lacking, and in external validity, since important product measures of instruction (e.g. student learning) are not included.
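The multisection design described above can be sketched in a few lines: average ratings and achievement within each section, then correlate the section means. This is only an illustrative sketch, not the authors’ procedure; the function names (`section_means`, `pearson`, `multisection_validity`), the 1-5 rating scale, and the data are all hypothetical.

```python
from math import sqrt

def section_means(records):
    """Average ratings and exam scores within each course section."""
    by_section = {}
    for section, rating, score in records:
        by_section.setdefault(section, []).append((rating, score))
    return {
        section: (
            sum(r for r, _ in pairs) / len(pairs),
            sum(s for _, s in pairs) / len(pairs),
        )
        for section, pairs in by_section.items()
    }

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ssx * ssy)

def multisection_validity(records):
    """Correlate section-mean ratings with section-mean achievement."""
    means = section_means(records)
    ratings = [m[0] for m in means.values()]
    scores = [m[1] for m in means.values()]
    return pearson(ratings, scores)

# Hypothetical records: (section, student rating on a 1-5 scale,
# score on a final exam common to all sections).
records = [
    ("A", 4.5, 82), ("A", 4.0, 78),
    ("B", 3.0, 70), ("B", 3.5, 74),
    ("C", 4.0, 75), ("C", 4.2, 79),
    ("D", 2.5, 68), ("D", 3.0, 66),
]
r = multisection_validity(records)
```

The unit of analysis is the section, not the individual student, which is why the correlation is computed over four section means here rather than eight student records.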

V. Effective uses of Student Evaluations
The authors point out that there has been much study, and disagreement, about the dimensions of effective teaching, "regarding, in particular, whether and how data from multidimensional student rating forms should be used in summative decisions about teaching (e.g. promotions, merit, tenure, etc.). This paper critically examines many of these issues and reaches important conclusions about the dimensionality of teaching as reflected in student ratings, makes practical suggestions, as well as suggests directions for future research." (p. 322)

VI. Issues of Concern or Qualification about the use of Student Evaluations
The authors list concerns (p. 322):

Are rating results consistent over time?

Are students uniform in their assessments of instructors?

Are ratings free from the influence of biasing characteristics?

What is the dimensionality of student ratings?

Are these dimensions consistent across students, courses, settings, and rating forms?

Which dimensions reflect the impact of instruction on student learning and other outcomes?

The authors present and critically analyze three alternative definitions of effective teaching: the product definition, the process definition, and the process-product definition. They argue that "the relationships between teaching processes and teaching products is of major interest to researchers and practitioners."

"We note that reviews to date suggest that the specific dimensions of teaching appear to differentially and, in some cases, poorly predict instructor impacts on learning compared to global ratings. We suggest that there are several limitations of prior reviews…. There is a lack of a comprehensive, empirically validated system for organizing the findings from different rating forms into a common framework…. Consequently, a more comprehensive research integration is called for using an empirically determined scheme for coding the findings from different rating forms."

"Student ratings measure directly one product of instruction; namely, student satisfaction with teaching…. Otherwise, student ratings do not measure directly how much or how well a class of students has learned or any other aspect of achievement in the cognitive domain including how well the content is retained. Student ratings also do not often measure directly: most affective products of instruction such as student expectations, beliefs, and concepts about themselves as learners; student attitudes, values, and interests toward the subject matter including enrolling in other courses in the area or adopting the area as a field of major study; student interpersonal and social skills generally and such skills within the context of executing a complex academic task, etc.…. Ratings are used to infer that highly rated instructors positively affect instructional products…. To what extent do student ratings reflect the impact of instructors on students’ learning of course content, their motivation to learn, development of interpersonal skills…? …on average, there is a modest, positive relationship between global ratings of instruction and instructor-produced student learning of lower-level academic skills… Much less is known about the validity of ratings as predictors of other outcomes of instruction…. Rating forms occasionally include items that ask students to assess the success of instructors at encouraging them to learn but seldom include items that assess the specific behaviors associated with that motivation. Similarly, rating forms do not often contain items that ask students to assess an instructor’s impact on specific cognitive and meta-cognitive achievements." (p. 330)

The authors present an evaluation form that asks for evaluations of how much the student has learned in specific topics taught in the course. This is something that is empirically defensible. However, asking about the enthusiasm (for example) of the instructor does not empirically prove anything, unless the student makes a direct link between the enthusiasm and what they have learned from the course. (p. 330)

"The accuracy of student ratings of teaching process is a concern about criterion-related validity. Are students able to accurately judge whether (quantity) and how well (quality) instructors teach according to the dimensions specified on the rating form? In general, criterion-related validation studies require alternative measures of the teaching process in addition to student ratings." (p.331)

"Their (Cashin and Downey, 1992) results were that global items accounted for a substantial amount of the variance (more than 50%). They concluded: `the results of this study have supported that single, global items—as suggested by Abrami (1985)—can account for a great deal of the variance resulting from a weighted composite of many multidimensional student rating items’ (Cashin and Downey, 1992, p. 569). They recommended that short student rating forms should be used for summative evaluations and longer forms should be reserved for teaching improvement. (p. 335)

"Collectively, the results of the reviews suggest that some specific rating dimensions, as well as student global ratings, are moderately correlated with student learning in multisection college courses. On average, there exists a reasonable, but far from perfect, relationship between some student ratings and learning. To a moderate extent, student ratings are able to identify those instructors whose students learn best. Furthermore, regardless of the coding scheme used, the average of global ratings of instructional effectiveness explains a greater percentage of variance in student learning than the average of specific ratings. It also appears that not all specific ratings are related to achievement; for example, ratings of course difficulty generally do not predict student achievement at all. Consequently, we recommend using the results of specific rating dimensions to judge which teachers best promote student learning with caution especially when making promotion and tenure decisions. The same caution is not necessary when using global ratings of instruction." (p.344)

10. "Identifying Exemplary Teachers and Teaching: Evidence from Student Ratings," by Kenneth A. Feldman, 368-395.

I. Instructional Dimensions
While Feldman ranked the dimensions of teaching by ranking the correlations between specific evaluations and student achievement, he also discusses another method of determining what dimensions of instruction are most important from the student’s point of view, i.e. by "comparing the magnitudes of the correlations between the actual overall evaluations by students of their teachers and their ratings of each of the specific attitudinal and behavioral characteristics of these teachers." "Those specific instructional dimensions that are the most highly associated with student achievement tend to be the same ones that best discriminate among teachers with respect to the overall evaluation they receive from students. The correlation is not a perfect one, however." (p. 382).

II. How Dimensions Relate to Student Learning
Feldman investigated the question of how various teaching dimensions relate to student learning. He found the following dimensions to be the most strongly correlated with student learning (only the top five are listed, from greatest correlation to least) (p. 376):

Teacher’s preparation; organization of the course

Clarity and understandableness

Teacher pursued and/or met course objectives

Perceived outcome or impact of instruction

Teacher’s stimulation of interest in the course and its subject matter

Feldman cautions, however, that "It is important to recognize that the associations between specific evaluations of teachers and student achievement by themselves do not establish the causal connections between the instructional characteristics under investigation and student achievement…. Some third variable such as student motivation, ability or aptitude of the class might independently affect both teacher performance and student learning, which would account for the correlations between instructional characteristics and student achievement even if there were no direct causal connection." (pp. 377-378). Yet in studies reviewed by Cohen (1980a) and Feldman (1989b) that investigated the ratings of students who had been randomly assigned to a multisection class, thereby preventing self-selection into classes, "studies in which students were randomly assigned to sections gave about the same results as did studies where students picked their own sections." (p. 378) Feldman concludes that "Results such as these increase the likelihood that the instructional characteristics and student achievement are causally connected, although the possibility of spurious elements has not been altogether ruled out." (p. 378)
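One standard way to probe the "third variable" concern statistically, when random assignment is not available, is a first-order partial correlation, which removes the linear influence of the third variable from both measures. This is not a method attributed to Feldman in the text above; the sketch is offered only as an illustration, and the three correlation values and the function name are hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations (illustration only, not figures from Feldman):
# x = rating on an instructional dimension, y = student achievement,
# z = a third variable such as prior student motivation.
r_rating_achievement = 0.50
r_rating_motivation = 0.40
r_achievement_motivation = 0.40

adjusted = partial_correlation(
    r_rating_achievement, r_rating_motivation, r_achievement_motivation
)
print(f"raw r = {r_rating_achievement:.2f}, controlling for motivation r = {adjusted:.2f}")
```

If the adjusted value stays close to the raw correlation, the third variable explains little of the association. If it falls toward zero, the association is largely spurious in the sense Feldman describes.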

Feldman points out that there is much to be learned about the psychological and social psychological dynamics that influence student learning: "…although a case can be made that many of the different instructional characteristics could be expected to facilitate student learning…, what is needed are specific articulations about which particular dimensions of instruction theoretically and empirically are more likely and which less likely to produce achievement. A crucial aspect of this interest is specifying exactly how those dimensions that affect achievement do so—even when, at first glance, the mechanisms seem obvious." (pp. 379-380)

III. How SETs Compare to Others’ Evaluations of Teaching
Feldman asked students and teachers about the importance of various components of instruction. "Students and faculty were generally similar, though not identical in their views…. However, the ordering of the instructional dimensions by either of these groups shows differences (as well as some similarities) with that based on the two indicators of importance using student ratings of actual teachers."

IV. Factors that do and do not Influence Ratings
Feldman presents a careful definition of bias, explaining that it is not enough to define bias as a situation in which the instructor is unfairly evaluated: "bias here refers to one or more factors directly and somehow inappropriately influencing students’ judgments about and evaluation of teachers or courses." (p. 370) Bias, according to Feldman, is a factor, unrelated to the teaching itself, which students consider when evaluating the instructor or the course. For example, if a teacher did not teach as well in a large class as in a small one, and was therefore rated lower in the large class than in the small one, the evaluation has not been biased, because it is a fair and appropriate assessment of the instruction that was given in that particular situation.

Feldman reviews numerous research reviews (p. 370) and draws the following conclusions about the question of bias:

The following statements are untrue:

students cannot make consistent judgments about the instructor and instruction because of their immaturity, lack of experience, and capriciousness

only colleagues with excellent publication records and expertise are qualified to teach and to evaluate their peers’ instruction—good instruction and good research being so closely allied that it is unnecessary to evaluate them separately

most student rating schemes are nothing more than a popularity contest, with the warm, friendly, humorous instructor emerging as the winner every time

students are not able to make accurate judgments until they have been away from the course, and possibly away from the university for several years

student ratings are both unreliable and invalid

the time and day the course is offered affect student ratings

student ratings cannot meaningfully be used to improve instruction

Feldman also agrees, with some reservations, that the belief that the gender of the student and of the instructor affects student ratings is untrue. He points out, however, that "there is some indication of an interaction effect between the gender of the student and the gender of the teacher."

Feldman questions some of the beliefs about SETs that Aleamoni classifies as "myths," and points out that "although the results of pertinent studies are somewhat mixed, some weak trends can be discerned" (p. 371). He cites studies that show that slightly higher ratings are given to teachers of smaller rather than larger courses; to teachers of upper-level rather than lower-level courses; to teachers of higher rather than lower academic ranks; by students taking a course as an elective; and by students taking a course that is in their major rather than one that is not. But he cautions that these factors do not necessarily explain why the ratings are higher in these situations.

Feldman disagrees with Aleamoni about calling the statement, "the grades or marks students receive in the course are highly correlated with their ratings of the course and instructor," a myth. He flatly denies that there is a "high" correlation, but does agree that there is a "small or even modest association." (p. 372) In addition, Feldman points out that students who receive high grades have learned a lot from the course, and are justified in giving a high evaluation to the instructor. However, he does cite Marsh and Dunkin (p. 373), who concluded that: "Evidence… supports the validity hypothesis and the student characteristics hypothesis, but does not rule out the possibility that a grading leniency effect operates simultaneously."

Feldman points out a possible bias that Aleamoni missed: academic discipline of the course—he found that teachers in different academic fields tend to be rated "somewhat" differently. (p. 373)

VI. Issues of Concern or Qualification about the use of Student Evaluations
Having explained that SETs may sometimes be valid while at the same time being unfair to the instructor, since there are conditions, such as the size of the course or the pre-existing level of motivation of the students, that are beyond the instructor’s control, Feldman cautions that, "Although rating bias may not necessarily be involved, those interested in using teaching evaluations to help in decisions about promotions and teaching awards may well want to take into account the fact that it may be somewhat harder to be effective in some courses than in others." (p. 372)

11. "Good Teaching Makes a Difference—And We Know What It Is," by W. J. McKeachie, 396-408.

II. How Dimensions Relate to Student Learning
McKeachie points out that "While there are overlaps between motivational and cognitive aspects of the Marsh dimensions, most can be fairly easily classified as affecting either student motivation or cognition." (p. 399)

McKeachie points out that specific dimensions of teaching can have different effects upon learning, depending on the context. "Criticism, for example, may be taken by a student as evidence that he or she lacks the ability to succeed, or it may be interpreted as evidence that the teacher thinks that one has the ability to improve. Thus the kind of feedback and the previous relationship between the teacher and the student may determine whether the feedback produces a reduction in motivation or increased motivation. Similarly, organization has a rather tricky relationship to student prior knowledge, the difficulty of the material, and the heterogeneity of the students in a class." (p. 406)

McKeachie points out that much is known about the cognitive processes that are affected by teaching: "enthusiasm enhances student attention; teacher clarity aids encoding; interaction of students and teachers promotes the surfacing of misunderstanding, and permits clarification and elaboration." (p. 406)

Understanding of motivation has led to insights about how teaching affects motivation for learning. "The teacher’s enthusiasm about the interest and value of the subject acts as a model that influences the value students place upon learning the material; moreover, as Feldman notes, teacher enthusiasm includes spontaneity and variability, which not only affects attention but is also relevant to curiosity and interest…. Similarly, interaction of students and teachers increases opportunity for students to feel a greater sense of personal control." (p. 406)

III. How SETs Compare to Others’ Evaluations of Teaching
McKeachie, in the context of explaining why peer evaluations are more readily accepted than SETs, despite the fact that SETs are more valid, declares, based on personal experience as a department chair and member of his college executive committee and reviews of "probably well over a thousand" letters, that "these evaluations are almost always positive." (p. 402) But student ratings are also mostly positive: "At the University of Michigan 90% of our faculty are rated as excellent by over half of their students." (p. 402)

V. Effective uses of Student Evaluations
McKeachie distinguishes how evaluations should be used according to their purpose. For research and personnel purposes, "only a general factor, such as Abrami, d’Apollonia and Rosenfield’s general factor or Marsh’s higher order factors may be sufficient; for analyzing a particular course, in helping a particular group of teachers improve, or for research on the effect of interventions in teaching, a finer cut, such as Feldman’s, may be more useful."

McKeachie points out that although research on SETs shows that they lead to some improvement in teaching, the amount of improvement is small unless the feedback involves consultation. He points out that "a major reason for this…is that many faculty members resist using them," but that there is no great resistance to midterm evaluations. He offers the explanation that, while in the case of evaluations elicited at the end of the course, the instructor does not have the opportunity to "make it right," or have control over the interpretation and use of the results (when they are used by administration to make personnel decisions), midterm evaluations enable the instructor to use the information to improve the end-of-course evaluation. The reward for the instructor to apply feedback to their teaching practices is more immediate, and it is something they have personal control over. "Perry shows the importance of perceived personal control in student motivation, and faculty members are, if anything, even more motivated for personal control than the average person." (p. 402).

VI. Issues of Concern or Qualification about the use of Student Evaluations
In the context of explaining why there is widespread resistance to SETs by faculty, McKeachie argues that, even though good teaching is quantifiable, numbers can be misused: "Once numbers are assigned, faculty promotion committees begin to make comparisons between teachers and assume that if one number is larger than another, there is a real difference between the teachers to whom the numbers have been assigned."

McKeachie speculates that a second reason for resistance to SETs among faculty is that the evaluations are "seldom used as a positive factor in determining the promotion of faculty members…. Thus, a teacher being evaluated runs the risk of negative results with little chance of positive rewards."

"Faculty members should be asked to include data from several classes in their portfolio, but they should be free to opt out when they are trying new methods or developing a risky innovation, just as they are free to avoid publishing research that didn’t pan out.

12. "Exploring the Implications: From Research to Practice," by Maryellen Weimer, 411-435.

I. Instructional Dimensions
"Feldman sorts and prioritizes the dimensions in two ways. First, he attempts to establish the relative importance of the dimensions by looking at them in terms of student achievement…. Second, he looks at the relationship between specific items and global ratings in terms of correlations assuming that the ‘overall assessment of teachers would be more highly associated with instructional characteristics that students generally consider to be important to good teaching…. (p.380)" (p. 413)"

"While Abrami, d’Apollonia, and Rosenfield agree that teaching is multidimensional, they do not accept the dimensions proposed in the other two chapters, particularly in terms of their universality. In their work they identified four factors, three of which are highly correlated, and all of which suggest a large global component that can alone be the basis for instructional judgment."

VI. Issues of Concern or Qualification about the use of Student Evaluations
The Marsh and Dunkin review considers (p. 413):

Evaluation instruments are reliable—meaning the instrument itself does not get in the way of what it intends to measure.

Research confirms the generalizability of the ratings—they are primarily a function of the instructor who teaches the course and not of the course that is taught.

The validity of student ratings is more complicated. Basically, they measure student satisfaction, but do they also measure whether or not the instruction facilitates learning or levels of instructional effectiveness? (p. 414) "Establishing validity by means of this more meaningful criteria presents a number of challenges" (p. 414):

The learning must be shown to be a consequence of the teaching. Marsh and Dunkin are critical of multisection validity studies because, even though there are high correlations between class averages on standardized exams and student ratings of instructors, the final exams often measure "low-level learning outcomes, like how much a student has memorized…. Marsh and Dunkin look at larger conceptions of validity—like the relationships between self and peer ratings—and in most areas find other evidence of validity." (p. 414)

Weimer warns against:

instruments for evaluating instruction whose dimensions (both those included and those left out) were not chosen through an empirically rigorous development process. (p. 417)

institutional climates in which there is far more competition than cooperation. Students can be polled too often, and published lists can cause unnecessary morale problems among faculty.

"Overall rating results may motivate improvement but they do not inform it…. And so, when instructional change is the objective, multidimensional forms aid and direct the process. The debate over global vs. multidimensional ratings can be resolved practically by using global ratings for summative purposes and multidimensional forms to accomplish formative ends." (p. 418)

Krahn, H. and Silzer, B. J. (1995). A study of exit surveys: The Graduand survey at the University of Alberta. College & University, 71(1), 12-23.

I. Instructional Dimensions
"The Graduand survey invites evaluations of university services and facilities, teaching and learning experiences, and acquisition of a range of skills and competencies, and overall satisfaction with the University of Alberta." (p. 14)

V. Effective uses of Student Evaluations
Krahn and Silzer discuss the reasons for the need for student evaluations, pointing out that society is increasingly demanding accountability from the university. They mention the connection with Total Quality Management moving from the private to the public sector, and the different levels of acceptance of the emphasis upon continuous improvement and the accountability for the use of public funds. Some are favorable about performance assessment, believing that self-assessment will lead to higher-quality institutions, while others believe that TQM-type assessments are inappropriate for higher education because the outcomes of a university education are difficult to quantify. In the opinion of Krahn and Silzer, "postsecondary institutions would benefit more, and perhaps suffer less, if they took the initiative to devise and implement a valid set of performance indicators rather than wait for someone else to impose a less appropriate set of measures."

In general, faculty and departments responded positively to the results of the survey. In one year, three more detailed in-class surveys were developed in response to the Graduand survey to find out more about dissatisfactions that had been revealed in the earlier survey.

The authors strongly recommend that results of the survey not be used to punish faculty or departments about whom dissatisfaction has been discovered: "In our opinion, such use of performance indicators would increase but performance might not improve."

VI. Issues of Concern or Qualification about the use of Student Evaluations
Krahn and Silzer emphasized the importance of understanding why a student enrolled in the university in the first place, because their original goals and expectations would influence how they would evaluate their experience in retrospect. For example, whether or not the student’s main goal was to launch a career in the field would be a major influence on how they evaluated the university’s effectiveness in preparing them for a career.

The authors recommend that, while administrators will interpret and use the responses to broad questions, individual departments should be the ones to interpret and use the more specific data about instructors and courses.
