Aleamoni, L. M. (Ed.). (1987). Techniques for evaluating and improving instruction. San Francisco: Jossey-Bass.
1. "Can Evaluating Instruction Improve Teaching?", by W. J. McKeachie, 3-7.
This article addresses the uses of instructional evaluation, making a distinction between instructional evaluation used for improvement and instructional evaluation used for personnel decisions. When used for improvement, instructional evaluation serves as a diagnostic tool. The more precise the definitions of particular areas of difficulty, the more likely it is that an appropriate prescription for change can be developed. (p.4) By interviewing class members and asking them for examples to support their comments or criticisms, the specialist is able to obtain more concrete data than is usually found in student ratings. (p.5)
Comment
McKeachie expresses the opinion that the responses obtained from student instructional rating questionnaires do little (or nothing) to provide data useful for pinpointing areas of teaching that need improvement. He advocates interviewing students in the classroom to learn why they chose the responses they gave on the questionnaires. I believe that our initial phone interviews of students should accomplish the same goal of eliciting detailed descriptions of specific qualities of excellent teaching.
McKeachie’s assumption is that the elements students identify as good teaching are elements that can be taught to teachers. But can some things, such as charisma and intrinsic enthusiasm for teaching, be taught?
2. "Toward Excellence in Teaching," by Robert C. Wilson, 9-24.
Wilson describes an approach similar to McKeachie’s: he starts with the responses from a standardized evaluation form, but instead of asking students to expand on their responses, he talks to the faculty member about whom the questionnaire was administered (each faculty member was chosen because they had won at least one teaching excellence award). The faculty member was asked what it was about their teaching that caused students to give them such high ratings. Again, the result was more specific data about what makes a teacher effective.
3. "Typical Faculty Concerns about Student Evaluation of Teaching," by Lawrence Aleamoni, 25-31.
In this article, Aleamoni outlines faculty concerns with student evaluations of faculty, and gives a rebuttal to each concern:
"Students cannot make consistent judgments concerning the instructor and instruction because of their immaturity, lack of experience, and capriciousness." Aleamoni’s rebuttal: If we examine the research and concentrate only on the studies that used reliable and valid instruments, we find evidence that students’ judgments tend to be fairly stable.
"Only colleagues with excellent publication records and experience are qualified to evaluate their peers’ instruction." Aleamoni’s rebuttal: Aleamoni and Yimer (1973) asked faculty members, whom they had rated according to number of scholarly publications or appropriate creative work, to rate the instructional effectiveness of their colleagues in their departments; they also gathered information from students on the various courses, instructors, and so on. There were no significant correlations between colleague ratings of instructional effectiveness and research productivity, nor was there a significant relationship between student ratings and faculty productivity. Furthermore, the students and faculty gave very similar ratings to each of the faculty members.
"Most student rating schemes are nothing more than a popularity contest with the warm, friendly, humorous, easy-grading instructor emerging as the winner every time." Aleamoni’s rebuttal: He cites several studies, including one of his own, that show that there is a very low correlation between a student’s subjective opinion of the instructor’s personality and their objective opinion of their instructional excellence.
"Students are not able to make accurate judgments concerning either instruction or instructor until they have been away from the course, and possibly away from the institution, for several years." Aleamoni’s rebuttal: He cites several studies, including one of his own, that show that there is a very high positive relationship between the judgments made by students who had been away and those made by students who were currently taking the course.
"Student rating forms are unreliable and invalid." Aleamoni’s rebuttal: While it is true that student evaluation forms will be unreliable if they have not been professionally constructed and tested, there are instruments with reliability coefficients of 0.90 and above. As for validity, Aleamoni poses the question, "How highly related is student learning to the way students rate?" Generally, studies that have obtained objective measures of student learning have reported a fairly high positive relationship between those measures and the way students rate.
"Extraneous variables, or conditions, could affect student ratings. Some of these conditions include: the size of the class; the gender of the student; the time of day that the course is offered; whether the students are taking the course as a requirement or as an elective; whether the student is a major or nonmajor in the field; the term or semester that the course is offered; the level of the course; and the rank of the teacher, ranging from instructor to full professor." Aleamoni’s rebuttal: The majority of research Aleamoni has looked at indicates little or no relationship between the way in which students rate a course or instructor and such variables as class size, the gender of the student or of the instructor, the time of day that the class is offered, the major or nonmajor status of the student, or the term or semester that the course is offered. Although some studies come close to statistical significance, there is no real pattern showing that full professors are rated more positively than lecturers or assistant professors. However, whether a course is required or elective, and the level of the course (freshman, sophomore, and so on), do seem to generate significant differences in student ratings.
4. "Using Student Ratings to Improve Instruction," by Joseph Stevens, 33-38.
The use of student evaluations for instructional improvement is the focus of this article. The research literature reports inconsistent results when student ratings are used to provide feedback for instructional improvement. Thus, we need to determine how to maximize the effects of feedback for instructional improvement and how to identify factors that either constrain or facilitate improvement. (p.34) Instructor improvement after student-ratings feedback is inconsistent. This is due in part to the cognitive state of the instructor, which varies greatly from one individual to another; as a result, the manner in which the instructor receives feedback information also varies greatly. However, the inconsistent improvement found in feedback studies does not necessarily argue against the utility of student ratings for instructional improvement; rather, the reported results more likely demonstrate the complexity of the instructional milieu and the inadequacy of the "treatment" design of instructional intervention studies.
Stevens proposes that feedback in combination with consultation is more effective than feedback alone, but notes that most studies have neither reported nor experimentally controlled the specific components of consultation.
6. "Formative and Summative Evaluation: Parody or Paradox?" by John Centra, 47-55.
This article discusses six topics:
student ratings:
When ratings were first used in the thirties and forties, they were used on a voluntary basis, but their use has shifted from the formative, which encourages instructional development, to the summative, where they play an important role in personnel decisions.
Many studies in the last ten years have looked at the validity of student ratings by comparing the ratings to how much students learn from a particular professor (studies of different sections of the same lower-level course taught by ten or twenty instructors). Such studies have found that student ratings are reasonably correlated with student learning. But "we really cannot be sure to what extent these findings will generalize to upper-level courses where only one or two teachers are teaching a section."
Centra asks, "Is there a point of diminishing returns if the same form is used term after term? My guess is that their formative impact (on the instructor) diminishes considerably and that the ratings are then used only for personnel decisions, after all."
colleague evaluation:
Centra lists attempts that have been made at colleague evaluation programs, all of which had disappointing results. He notes one program at the University of Cincinnati that "developed the notion of peer triads, where three faculty members get together and share materials and objectives, visit each other’s classes, and then make suggestions to each other. It’s a nice idea, but, other than at Cincinnati, it has yet to catch on elsewhere."
what is good teaching:
Surveys of faculty members, students, and administrators, in which the question "What are the characteristics of good teachers?" was asked, have resulted in the following list of nine characteristics:
7. "Instructional Evaluation as a Feedback Process," by Doron Gill, 57-64.
Gill lists a "variety of purposes" for teacher evaluation:
Gill cites James Bess when he discusses the integration of faculty and student satisfaction:
Students at most institutions under present circumstances are also not able to fulfill their most important needs, particularly those which involve their developing personalities. Colleges and universities usually give greater attention to establish structures designed to help students acquire cognitive knowledge, in service of broad liberal education and/or, presumably, career preparation. Satisfaction of student needs for emotional and interpersonal growth and for self-knowledge are, at best, by-products of the college experience. They are rarely explicit goals of the institution.
Comment
Bess’s quote explains how the evaluation and feedback process could improve instruction by encouraging faculty to place a higher value on activities that deal with the holistic development of their students. If it can be shown that alumni consider personality development during their undergraduate education very important, faculty and administration might be more likely to respond to, trust, and use holistic measurements. Personality development is not measured at all in the RU Student Instructional Rating Form (nor is anything else about the student’s own intellectual or personal development, for that matter). As Gill concludes, "Feedback can help teachers not only improve their teaching practices, but also change their attitude toward the act (or art) of teaching, so that they can perceive of it as a challenging activity."
8. "A Faculty Evaluation Model for Community and Junior Colleges," by Raoul Arreola, 65-74.
Arreola states that, "In contrast to four-year colleges and universities, the community and junior colleges seem much better able to focus on the evaluation of teaching and to incorporate it into their overall decision making—in particular, into their promotion and tenure structures. Apparently this is because teaching, at community and junior colleges, is considered an important mission in and of itself. The linkage of faculty evaluation with faculty development is more readily accepted in a unionized situation, perhaps because clearly defined agreements about many aspects of faculty employment have already reached the bargaining table."
Ten out of twelve community and junior colleges that implemented faculty evaluation programs considered the following:
He suggests constructing a data-gathering specification matrix that lists the teaching roles that are being evaluated, and the different sources from which evaluations of each role will be collected (the sources will be different for different roles).
9. "Some Practical Approaches for Faculty and Administrators," by Lawrence Aleamoni, 75-78.
In order to measure and evaluate instructional effectiveness accurately, one needs to set up criteria and guidelines for that evaluation. This is accomplished most effectively at the departmental level. Departments should be able to come up with 25 or more criteria that the faculty can agree on; then the departmental faculty should develop guidelines to use in evaluating those criteria. It is imperative that departmental faculty be aware that, if they do not develop their own standards, someone else will impose his or her own.
If one wants faculty to take seriously any comprehensive instructional evaluation system, then faculty will have to be convinced of the administration’s commitment to the system.
10. "Concluding Comments," by Lawrence Aleamoni, 79-81.
Common themes run through all articles in this book:
A comprehensive system of instructional evaluation needs to be established with various components differentially weighted at the departmental level (effective uses of student evaluations).
Student ratings should be one of the components of a comprehensive system but should not be expected to carry 100 percent (or even 80 percent) of the weight (developing a list of the dimensions of teaching evaluation that are important to alumni).
Experienced instructional development consultants should be used to provide evaluative feedback to the teaching faculty and to guide them in their use of that feedback in their instructional improvement efforts (effective uses of student evaluations).
University and college administrations must make a stated commitment to instruction and formally place it in the promotion-tenure reward system if it is to be taken seriously by faculty (effective uses of student evaluations).
Student governments have a role to play in the instructional development and evaluation process.
Allen, Mary J., Armstrong, C. A., and Gutierrez, D. M. (1990). Alumni vs. faculty opinion on undergraduate psychology programs. Paper presented at the Annual Convention of the American Psychological Association, Boston, August.
Many measures of institutional quality (e.g., campus size, library contents, prestige of funding sources) have little relationship to what students find of critical value: how much they learned.
The authors mention different assessment techniques:
Responses were analyzed and yielded the following factors:
Belcher, Marcia J. (1996). BSU’s impact on skills valued by graduates. Boise: Boise State University, Idaho. (tables only)
This report presents the findings of a 1995 survey of 1992-93 and 1993-94 graduates of Boise State University (BSU), Idaho, which sought to identify what students valued in a college education and the extent to which they felt BSU had helped them grow in these areas. The survey listed 17 skills or abilities that individuals might hope to attain from going to college, then asked graduates to rate the extent of BSU’s impact on their attaining these skills. The 17 skills, listed from highest to lowest impact on alumni, were:
after people, the learning process was most cited
the hardest thing they had to do most commonly involved academic issues, administrative issues such as class scheduling, personal issues such as balancing school work, or coping with financial difficulties.
advice for improvement fell into categories including more support, understanding, respect, and communication; administration; and advisement.
Braskamp, L. A., Brandenburg, D. C., and Ory, J. C. (1984). Evaluating teaching effectiveness: A practical guide. Newbury Park, CA: Sage.
This book is a practical guide for faculty and administrators in the critique, design, and implementation of the evaluation of teaching. Teaching is defined as encompassing classroom activities, organizing a course, developing a curriculum, and advising students (this would exclude some of the out-of-classroom experiences described in the Carson article, "Thirty Years of Stories," which is summarized below). Like several authors in Aleamoni’s Techniques for Evaluating and Improving Instruction, the authors stress that teaching should be evaluated from a variety of perspectives; no single piece of evidence (e.g., ratings) collected from one source (e.g., students) is sufficient to judge the competence of a teacher. The authors refer to this approach as the "multiple purpose, criteria, source, method approach." A second major principle expressed in this book is that, in evaluating teaching effectiveness, the purpose of the evaluation, such as personnel decisions or improvement, needs to be taken into account.
The authors list three major emphases for defining good teaching:
1. Input (what students and teachers bring to the classroom)
factors not related to quality of instruction, including student ability, motivation, and prior knowledge, can affect student learning.
Criteria also vary in the extent to which they are specified, described, and measurable. Both explicit (quantitative) and implicit (qualitative) criteria are often needed.
Multiple sources. Information about an instructor can be collected from a number of different sources, since not everyone judges an instructor in the same way. Sources include self, alumni, students, records, and colleagues.
Major dimensions of questionnaire items (omnibus form) that have been identified through research can be classified as follows:
There are two types of reliability that are relevant to examining the trustworthiness of student ratings:
Stability—The extent to which the same students using the same student rating form would rate the instructor and course similarly at two different times.
Students are consistent in their global ratings of the same instructor at different times in the course.
An instructor’s overall teaching performance in a course can be generalized from the ratings of five or more classes taught by the instructor, each with at least 15 students enrolled.
The same instructor teaching different sections of the same course receives similar global ratings from each section.
On using alumni as sources of evaluative information: Evaluations about the sequence and depth of course material and support and advice faculty gave to the students during their college career are valuable kinds of information to a department in its examination of its curriculum offerings and the role of its faculty in instruction.
The book contains a detailed table showing the phases in using evaluative information, from "having available the collected evaluative information" to "justification of previous decisions."
The book also contains a list of 24 suggestions for enhancing the use of evaluations, ranging from forming a consultative relationship with, to respecting the privacy of, the person being evaluated.
Carson, B. H. (1996). Thirty years of stories: The professor’s place in student memories. Change, Nov./Dec., 11-17.
In 1995 the author queried students who had graduated from Rollins College from 1964 through 1990. She asked whether they "could think back to a professor in their major (and then a professor outside their major) whom they regarded most highly as an effective teacher and to describe as fully as possible 'specific incidents or other details (from inside the classroom or outside) contributing to your high regard.'" (p. 12) In all, 222 alumni responded. Carson explains that she was not trying to discover the characteristics that distinguish excellent professors, but she does list the themes that emerged:
Carson suggests that the academic content was not mentioned as much because the cognitive learning had, by the time of the survey, become second-nature, but "critical moments where we learn to look at life in a different way may be remembered with the clarity of a conversion experience." (p.12)
A few observations about what good teachers do:
Some teachers conveyed their intellectual passion in quieter ways—just by showing by their actions (being emotionally moved by the subject matter) that they loved their subject. (p. 13)
Effective professors linked students and subject matter in a variety of ways; their classes were marked by clarity and organization and by lively exchanges among professor and students…. The better teachers seemed to be "great" moderators of conversational/discussion type classes. In the best classes, the professor posed questions that led the students far beyond recitation-level responses, and the class discussions included student exchanges with each other as well as with the professor. (p. 16)
The other strategy for connecting students with subject matter most often associated with effective teachers was their capacity to tell stories, to introduce real-life examples, to exemplify tough concepts with anecdotes and illustrations (distinctions were drawn between stories that were and weren’t relevant to class—the professors telling non-relevant stories were criticized.)
The single most frequently cited evidence of a professor’s caring was accessibility.
Many students interpreted the interest the professors took in them as affirmations of their own self-worth. (p. 14)
Carson refers to the professor’s ability to see personal worth and academic ability that the students themselves had not recognized as "tapping." She concludes that "tapping" may be the "single most influential act a professor can perform."
Many of the Rollins graduates have come to realize that one of the most telling signals of their professors’ respectful caring lay in challenging them to higher levels of achievement than they had thought possible.
Explaining why interpersonal relationships are not just "nice," but actually have a solid link to learning, Carson writes (p. 16):
Carson concludes with some concerns (p. 17):
Donald, Janet G. and Denison, D. B. (1996). Evaluating undergraduate education: The use of broad indicators. Assessment & Evaluation in Higher Education, 21, 23-39.
"The aim of this study was to examine the extent to which broad indicators of performance, such as student satisfaction with program, teaching, student life and experiences after graduation, could be used for program improvement…. Perceived quality of teaching was found to contribute significantly to graduates’ rating of the overall quality of their academic program." (p. 23)
"A broad indicator is a performance indicator which can be used at several levels, across domains or throughout the institutional system. Broad indicators give coarse-grained or general rather than fine-grained or detailed information which is a function of the fact that they are expressed as a single item…. They are broad as opposed to specific observable indicators and thus require elaboration if they are to be used for program improvement." (p. 24) Student evaluations and alumni surveys are a subgroup of broad indicators.
"…the first two years following completion of education constitute a critical formative period for graduates when they establish their career direction and, through their work experience, reinforce the skills and knowledge acquired through formal learning. Because of this, graduates are particularly aware of the shortcomings of their formal education. The argument is that alumni can provide valuable insights, since they have the benefit of hindsight and can evaluate college and work experiences and their relative importance (Graham & Cockreil, 1990). In contrast with undergraduates, who can only speculate about the utility or significance of various aspects of their educational experience, graduates can report the actual significance in relation to their current employment or life status (Moden & Williford, 1988)…. Retrospective evaluations which relate undergraduate experience to (p. 25) subsequent employment or further study thus may provide more concrete and operational advice for improving undergraduate education than have specific measures of instruction. Suggested uses of graduate surveys include a broad range of decisions about the curriculum, course content and major requirements, faculty roles and teaching methods, student services, and information for resource allocation and institutional planning (Moden & Williford, 1988)."
Two questions dealt with graduates’ level of satisfaction at the institutional level: their retrospective choice of institution and their assessment of the quality of student life.
At the program level, students were asked if, in retrospect, they would choose the same program. Satisfaction with the program provided a baseline for viewing other aspects of educational experience.
At the program level, graduates were also asked to rate:
Aspects of the educational experience mentioned most frequently by graduates as particularly meaningful (specific examples of each appear on pp. 32-33):
1. The relative feasibility and utility of surveys and of the rating scales and open-ended questions
the presumed causal relationship between academic performance and satisfaction has been brought into question…. (Bean & Bradley, 1986; Pike, 1991).
"…students expressing the greatest degree of satisfaction with their academic program have… preferences regarding the purpose, nature and process of higher education…" that agree with the preferences of the faculty. The degree of satisfaction in these cases is not necessarily indicative of program excellence.
Earlier studies have found that "intellectual and cultural experiences are extremely important in determining (graduates’) attitudes toward the colleges they attended" (Pace, 1974; Spaeth & Greely, 1970). (p. 35)
"Graduates’ comments about meaningful features of undergraduate education tended to blur the distinction between in-class and out-of-class learning experiences, a phenomenon noted in the research literature (for example, Kuh et al., 1991)."
Lack of communication between different parts of the institution is a weakness.
Evaluations by students, institutional research and planning offices, and program reviews need to be brought together.
5. "Teaching Effectively: Which Students? What Methods?", by Raymond P. Perry, 154-168.
Covington’s Self-Worth Typology: In classrooms, students are motivated in specific ways to optimize their self-worth. Students can be divided into four distinct groups:
Perry’s Perceived Control Typology. "…students differ in their perceived control over their academic performance and … these differences engender divergent thoughts, feelings, and actions." (p. 158) Level of perceived control occurs along a continuum, with the "no control" end corresponding to Covington’s "failure accepting," or "helpless" category and "control" corresponding to Covington’s "success-oriented" category. "High control students are most likely to believe that they have personal control over their academic performance… low control students believe that they can do little to influence the course of events around them. …moderate control students seem to combine attributes of both mastery and helpless students, believing that they have control over some aspects of their performance but not others." (p. 160) …both typologies (Covington and Perry’s) are reasonably valid and practical, particularly in comparison to more common experientially based, idiosyncratic typologies. The challenge… is for college instructors to make greater use of typologies such as these in making decisions about their teaching practices.
Perry’s comments about specific teaching practices, all of which are grounded in logic, theory, and empirical evidence, address how major dimensions of teaching are directly related to student learning:
Foster Self-Worth
Since most students cannot excel in all situations, it is better to emphasize that each student do their best (compete against themselves) rather than to compete with their classmates for the best grade.
Covington recommends three teaching practices that would contribute to overall effective instruction:
Perry points out that, while it would be ideal to have different teaching practices for each type of student in Covington’s typology, this is not practical. He notes that "Covington’s typology makes the problem somewhat more manageable by identifying a key factor underlying student differences and then describing the types of students explicitly. Considerable advantage can be gained, therefore, in developing a teaching practice to match his four types of students and being able to anticipate its eventual impact."
Organizing Content (Instructor Organization)
Kiewra’s thesis is that "information needs to be organized for optimal learning through the use of a knowledge representation system known as the matrix…. According to Kiewra, knowledge is both factual and structural, the former concerned with things, events, ideas; the latter, with the interrelationships between them…. Abundant empirical evidence is now available attesting to the significance of instructor organization for student learning. For example, in a comprehensive re-analysis of several meta-analyses involving effective college teaching, Feldman (1989) reported a correlation of +.57 between instructor organization and student achievement. This means that professors who are organized also have students who do better than professors who are not organized. While not proving definitively a causal connection, this correlational evidence suggests one possible interpretation in which better organization by the professor enables students to achieve more. Such an interpretation would also suggest that matrix representations, as one aspect of instructor organization, should also contribute to improved academic performance. Accordingly, matrix representations can be placed in the larger context of college instruction as a specific teaching practice."
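As a rough worked illustration (added here, not a figure reported by Perry or Feldman), one common way to gauge the size of a correlation is to square it, which gives the proportion of variance the two measures share. Read that way, the correlation of +.57 between instructor organization and student achievement corresponds to roughly a third of the variance in achievement:

```latex
r = +0.57 \quad\Longrightarrow\quad r^{2} = (0.57)^{2} = 0.3249 \approx 32\%
```

The remaining variance (about 68 percent) is associated with other factors, and shared variance by itself does not indicate which variable, if either, is the cause.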
Enhancing Perceived Control
"Presumably, repeated exposure to this type of teaching could increase helpless students’ perceived control to the point at which students become more mastery-oriented and are thereby able to benefit from effective instruction…. Thus, being organized, clear, interactive, expressive, etc. could serve to increase helpless students’ internal locus, which, in turn, may eventually improve their performance." (pp. 165-166)
"Instead of adopting new teaching practices, the professor may wish to modify existing ones with the sole purpose of enhancing perceived control in students." (p. 166)
6. "Effective Teaching Behaviors in the College Classroom," by Harry G. Murray, 171-204.
Murray distinguishes between "low-inference" and "high-inference" behaviors, the former referring to "a concrete, denotable action of the instructor that can be recorded with little or no inference on the part of an observer," and the latter referring to "one (action) that can be assessed only through observer inference or judgement."
Murray mentions benefits of studying classroom teaching behaviors (p. 173):
knowledge of factors underlying effective teaching can provide guidelines on how to train or select college teachers, how to evaluate teaching, and how to improve the performance of current teachers. For example, research on low-inference teaching behaviors can be applied to the development of student instructional rating forms that focus on specific, denotable characteristics of instructors, and thus are more useful in providing diagnostic feedback than the typical global rating forms in current use (Murray, 1987).
research can be applied to the design of in-service faculty training programs that focus on a limited set of classroom behaviors known to contribute significantly to overall teaching effectiveness (Murray and Lawrence, 1980).
Following are general conclusions drawn from observational studies concerning low-inference teaching behaviors (p. 188):
Teaching behaviors have typically shown an uneven profile of correlations with different instructional outcomes. For example, behaviors that correlate with affective outcome measures often fail to correlate similarly with cognitive outcomes, while behaviors that predict cognitive gain may fail to predict affective development.
It remains to be seen whether classroom behaviors found to be effective in the lecture method of teaching are similarly effective in non-lecture contexts.
Within the traditional lecture method, available evidence suggests that specific teaching behaviors contribute similarly to overall teaching effectiveness in different academic disciplines.
Following are general conclusions drawn from experimental studies regarding low-inference classroom teaching behaviors:
Classroom teaching behaviors, at least in the enthusiasm and clarity domains, appear to be causal antecedents (rather than mere correlates) of various instructional outcome measures.
Low-inference teaching behaviors have been shown to influence not only student instructional ratings, but objective measures of student learning as well.
Teaching behaviors accounted for a sizable proportion of outcome measure variance. As a general rule, teaching behaviors accounted for more variance in student instructional ratings than in objective measures of student learning.
Recent evidence suggests that enthusiastic or expressive classroom teaching behaviors may affect student motivational and attributional processes that extend far beyond the classroom.
"Given that the teaching behaviors found to be effective in prior research are specific, concrete, denotable, and presumably acquirable, the most obvious implication of this research is that college and university instructors can improve their classroom performance simply by exhibiting these behaviors with greater frequency."
Caveats associated with this "behavioral" prescription for teaching improvement (p. 196) include:
Rather than trying to mechanically emulate a wide array of teaching behaviors, instructors would be better advised to focus on a small subset of behaviors that are compatible with the instructor’s basic traits, abilities, and educational values, and are relevant to areas of needed improvement.
There is more to effective college teaching than effective classroom behaviors. "This fact, parenthetically, supports the argument that student instructional ratings, when used for summative purposes, should always be supplemented by colleague assessment of ‘content’ or ‘substance’ aspects of instruction."
There is resistance in the minds of many faculty members to the idea of implementing certain teaching behaviors, particularly expressive or enthusiastic behaviors, in the college classroom.
Applications of findings to the improvement of instruction include:
A second way in which research on low-inference teaching behaviors can be applied to improvement of instruction is through intensive training of faculty on a limited subset of classroom behaviors known to contribute significantly to instructional outcome measures. Murray and Lawrence (1980) assessed the impact of speech and drama training for lecturers. Experimental teachers showed significant gains in student ratings.
9. "The Dimensionality of Student Ratings of Instruction: What We Know and What We Do Not," by Philip C. Abrami, Sylvia d’Apollonia, and Steven Rosenfeld, 321-367.
I. Instructional Dimensions
The authors discuss in depth the methods for empirically determining effective teaching. In doing so, they emphasize the use of student ratings for each of the three definitions of effective teaching. Their discussion addresses (p. 322):
Multidimensional student rating forms do not contain items which evaluate the same, specific teaching qualities; the rating forms lack both comprehensiveness and uniformity. "We conclude that since the qualities of teaching evaluated by different student rating forms appear to differ both in their nature and structure, it is of value to explore the forms further and determine if there are dimensions of teaching common to a collection of student rating forms."
II. How Dimensions Relate to Student Learning
"Now that we have identified the common structure of student ratings, the next phase of research will be to use the techniques of quantitative research integration to explore the relationship between this structure and teacher-produced student achievement as well as the substantive and methodological variables which explain inconsistencies in the relationships."
"The relationship between the process and product views of effective teaching seeks to find the links between what teachers do and whether and how students change as a result." (p. 324)
"We hypothesize that the varied products of effective teaching are affected by different teaching processes. But we cannot describe with any great confidence the specific nature of these causal relationships. We further hypothesize that the causal relationship between any one teaching process and any one teaching product will vary as a function of external influences including student, course, and setting influences." (p. 326)
IV. Factors that do and do not Influence Ratings
The authors evaluate three validation designs (p. 323):
the multisection validation design: "uses multiple sections of the same course taught by different instructors employing common measures of student ratings and student learning. The correlations between course section means for student ratings and means for student achievement explore the relationship between instructional processes and an important instructional product. We consider the multisection design particularly strong because it reduces the probability of rival explanations to instructor impacts and is high in generalizability to classrooms…. We conclude that studies employing the multisection design are worthy of special attention."
the multitrait-multimethod design—student ratings and several criterion measures (e.g., instructor self-ratings) are collected across a wide range of courses, without controlling for biasing or extraneous influences. We consider this design weaker both in internal validity, since controls are lacking, and in external validity, since important product measures of instruction (e.g. student learning) are not included.
VI. Issues of Concern or Qualification about the use of Student Evaluations
The authors list concerns (p. 322):
"We note that reviews to date suggest that the specific dimensions of teaching appear to differentially and, in some cases, poorly predict instructor impacts on learning compared to global ratings. We suggest that there are several limitations of prior reviews…. There is a lack of a comprehensive, empirically validated system for organizing the findings from different rating forms into a common framework…. Consequently, a more comprehensive research integration is called for using an empirically determined scheme for coding and findings from different rating forms."
"Student ratings measure directly one product of instruction; namely, student satisfaction with teaching…. Otherwise, student ratings do not measure directly how much or how well a class of students has learned or any other aspect of achievement in the cognitive domain including how well the content is retained. Student ratings also do not often measure directly: most affective products of instruction such as student expectations, beliefs, and concepts about themselves as learners; student attitudes, values, and interests toward the subject matter including enrolling in other courses in the area or adopting the area as a field of major study; student interpersonal and social skills generally and such skills within the context of executing a complex academic task, etc.…. Ratings are used to infer that highly rated instructors positively affect instructional products…. To what extent do student ratings reflect the impact of instructors on students’ learning of course content, their motivation to learn, development of interpersonal skills…? …on average, there is a modest, positive relationship between global ratings of instruction and instructor-produced student learning of lower-level academic skills… Much less is known about the validity of ratings as predictors of other outcomes of instruction…. Rating forms occasionally include items that ask students to assess the success of instructors at encouraging them to learn but seldom include items that assess the specific behaviors associated with that motivation. Similarly, rating forms do not often contain items that ask students to assess an instructor’s impact on specific cognitive and meta-cognitive achievements." (p. 330)
The authors present an evaluation form that asks for evaluations of how much the student has learned in specific topics taught in the course. This is something that is empirically defensible. However, asking about the enthusiasm (for example) of the instructor does not empirically prove anything, unless the student makes a direct link between the enthusiasm and what they have learned from the course. (p. 330)
"The accuracy of student ratings of teaching process is a concern about criterion-related validity. Are students able to accurately judge whether (quantity) and how well (quality) instructors teach according to the dimensions specified on the rating form? In general, criterion-related validation studies require alternative measures of the teaching process in addition to student ratings." (p.331)
"Their (Cashin and Downey, 1992) results were that global items accounted for a substantial amount of the variance (more than 50%). They concluded: ‘the results of this study have supported that single, global items—as suggested by Abrami (1985)—can account for a great deal of the variance resulting from a weighted composite of many multidimensional student rating items’ (Cashin and Downey, 1992, p. 569). They recommended that short student rating forms should be used for summative evaluations and longer forms should be reserved for teaching improvement." (p. 335)
"Collectively, the results of the reviews suggest that some specific rating dimensions, as well as student global ratings, are moderately correlated with student learning in multisection college courses. On average, there exists a reasonable, but far from perfect, relationship between some student ratings and learning. To a moderate extent, student ratings are able to identify those instructors whose students learn best. Furthermore, regardless of the coding scheme used, the average of global ratings of instructional effectiveness explains a greater percentage of variance in student learning than the average of specific ratings. It also appears that not all specific ratings are related to achievement; for example, ratings of course difficulty generally do not predict student achievement at all. Consequently, we recommend using the results of specific rating dimensions to judge which teachers best promote student learning with caution especially when making promotion and tenure decisions. The same caution is not necessary when using global ratings of instruction." (p.344)
10. "Identifying Exemplary Teachers and Teaching: Evidence from Student Ratings," by Kenneth A. Feldman, 368-395.
I. Instructional Dimensions
While Feldman ranked the dimensions of teaching by the size of the correlations between specific evaluations and student achievement, he also discusses another method of determining which dimensions of instruction are most important from the student’s point of view, i.e., by "comparing the magnitudes of the correlations between the actual overall evaluations by students of their teachers and their ratings of each of the specific attitudinal and behavioral characteristics of these teachers." He notes that "those specific instructional dimensions that are the most highly associated with student achievement tend to be the same ones that best discriminate among teachers with respect to the overall evaluation they receive from students. The correlation is not a perfect one, however." (p. 382)
II. How Dimensions Relate to Student Learning
Feldman investigated how various teaching dimensions relate to student learning. He found the following (only the top five are listed), in order from greatest to least correlation with student learning (p. 376):
Feldman points out that there is much to be learned about the psychological and social psychological dynamics that influence student learning: "…although a case can be made that many of the different instructional characteristics could be expected to facilitate student learning…, what is needed are specific articulations about which particular dimensions of instruction theoretically and empirically are more likely and which less likely to produce achievement. A crucial aspect of this interest is specifying exactly how those dimensions that affect achievement do so—even when, at first glance, the mechanisms seem obvious." (p. 379-380)
III. How SETs Compare to Others’ Evaluations of Teaching
Feldman asked students and teachers about the importance of various components of instruction. "Students and faculty were generally similar, though not identical in their views…. However, the ordering of the instructional dimensions by either of these groups shows differences (as well as some similarities) with that based on the two indicators of importance using student ratings of actual teachers."
IV. Factors that do and do not Influence Ratings
Feldman presents a careful definition of bias, explaining that defining bias simply as a situation in which the instructor is unfairly evaluated does not go far enough: "bias here refers to one or more factors directly and somehow inappropriately influencing students’ judgments about and evaluation of teachers or courses." (p. 370) Bias, according to Feldman, is a factor unrelated to the teaching itself that students consider when evaluating the instructor or the course. For example, if a teacher did not teach as well in a large classroom as in a small one, and was therefore evaluated lower in the large class than in the small one, the evaluation is not biased, because it is a fair and appropriate assessment of the instruction given in that particular situation.
Feldman examines numerous reviews of the research (p. 370) and draws the following conclusions about the question of bias:
The following statements are untrue:
only colleagues with excellent publication records and expertise are qualified to teach and to evaluate their peers’ instruction—good instruction and good research being so closely allied that it is unnecessary to evaluate them separately
most student rating schemes are nothing more than a popularity contest, with the warm, friendly, humorous instructor emerging as the winner every time
students are not able to make accurate judgments until they have been away from the course, and possibly away from the university for several years
student ratings are both unreliable and invalid
the time and day the course is offered affect student ratings
student ratings cannot meaningfully be used to improve instruction
Feldman questions some of the beliefs about SETs that Aleamoni classifies as "myths," and points out that "although the results of pertinent studies are somewhat mixed, some weak trends can be discerned" (p. 371). He cites studies showing that slightly higher ratings are given to teachers of smaller rather than larger courses; to teachers of upper-level rather than lower-level courses; to teachers of higher rather than lower academic rank; by students taking a course as an elective; and by students taking a course in their major rather than outside it. But he cautions that these factors do not necessarily explain why the ratings are higher in these situations.
Feldman takes issue with Aleamoni’s calling the statement "the grades or marks students receive in the course are highly correlated with their ratings of the course and instructor" a myth. He flatly disagrees that there is a "high" correlation, but does agree that there is a "small or even modest association." (p. 372) In addition, Feldman points out that students who receive high grades have learned a lot from the course, and are justified in giving a high evaluation to the instructor. However, he does cite Marsh and Dunkin (p. 373), who concluded: "Evidence… supports the validity hypothesis and the student characteristics hypothesis, but does not rule out the possibility that a grading leniency effect operates simultaneously."
Feldman points out a possible bias that Aleamoni missed: academic discipline of the course—he found that teachers in different academic fields tend to be rated "somewhat" differently. (p. 373)
VI. Issues of Concern or Qualification about the use of Student Evaluations
Having explained that SETs may sometimes be valid while at the same time being unfair to the instructor, since conditions such as the size of the course or the pre-existing level of motivation of the students are beyond the instructor’s control, Feldman cautions: "Although rating bias may not necessarily be involved, those interested in using teaching evaluations to help in decisions about promotions and teaching awards may well want to take into account the fact that it may be somewhat harder to be effective in some courses than in others." (p. 372)
11. "Good Teaching Makes a Difference—And We Know What It Is," by W. J. McKeachie, 396-408.
II. How Dimensions Relate to Student Learning
McKeachie points out that "While there are overlaps between motivational and cognitive aspects of the Marsh dimensions, most can be fairly easily classified as affecting either student motivation or cognition." (p. 399)
McKeachie points out that specific dimensions of teaching can have different effects upon learning, depending on the context. "Criticism, for example, may be taken by a student as evidence that he or she lacks the ability to succeed, or it may be interpreted as evidence that the teacher thinks that one has the ability to improve. Thus the kind of feedback and the previous relationship between the teacher and the student may determine whether the feedback produces a reduction in motivation or increased motivation. Similarly, organization has a rather tricky relationship to student prior knowledge, the difficulty of the material, and the heterogeneity of the students in a class." (p. 406)
McKeachie points out that much is known about the cognitive processes that are affected by teaching: "enthusiasm enhances student attention; teacher clarity aids encoding; interaction of students and teachers promotes the surfacing of misunderstanding, and permits clarification and elaboration." (p. 406)
Understanding of motivation has led to insights about how teaching affects motivation for learning. "The teacher’s enthusiasm about the interest and value of the subject acts as a model that influences the value students place upon learning the material; moreover, as Feldman notes, teacher enthusiasm includes spontaneity and variability, which not only affects attention but is also relevant to curiosity and interest…. Similarly, interaction of students and teachers increases opportunity for students to feel a greater sense of personal control." (p. 406)
III. How SETs Compare to Others’ Evaluations of Teaching
McKeachie, in the context of explaining why peer evaluations are more readily accepted than SETs even though SETs are more valid, declares, based on his personal experience as a department chair and member of his college executive committee and his review of "probably well over a thousand" letters, that "these evaluations are almost always positive." (p. 402) But student ratings are also mostly positive: "At the University of Michigan 90% of our faculty are rated as excellent by over half of their students." (p. 402)
V. Effective uses of Student Evaluations
McKeachie distinguishes between different uses of evaluations. For research and personnel purposes, "only a general factor, such as Abrami, d’Apollonia and Rosenfeld’s general factor or Marsh’s higher order factors may be sufficient; for analyzing a particular course, in helping a particular group of teachers improve, or for research on the effect of interventions in teaching, a finer cut, such as Feldman’s, may be more useful."
McKeachie points out that although research on SETs shows that they lead to some improvement in teaching, the amount of improvement is small unless the feedback involves consultation. He notes that "a major reason for this…is that many faculty members resist using them," but that there is no great resistance to midterm evaluations. He offers the explanation that, with evaluations elicited at the end of the course, the instructor has no opportunity to "make it right," nor any control over the interpretation and use of the results (when they are used by administration to make personnel decisions), whereas midterm evaluations enable the instructor to use the information to improve the end-of-course evaluation. The reward for applying feedback to one’s teaching practices is more immediate, and it is something the instructor has personal control over. "Perry shows the importance of perceived personal control in student motivation, and faculty members are, if anything, even more motivated for personal control than the average person." (p. 402)
VI. Issues of Concern or Qualification about the use of Student Evaluations
In the context of explaining why there is widespread faculty resistance to SETs, McKeachie argues that, even though good teaching is quantifiable, numbers can be misused: "Once numbers are assigned, faculty promotion committees begin to make comparisons between teachers and assume that if one number is larger than another, there is a real difference between the teachers to whom the numbers have been assigned."
McKeachie speculates that a second reason for resistance to SETs among faculty is that the evaluations are "seldom used as a positive factor in determining the promotion of faculty members…. Thus, a teacher being evaluated runs the risk of negative results with little chance of positive rewards."
"Faculty members should be asked to include data from several classes in their portfolio, but they should be free to opt out when they are trying new methods or developing a risky innovation, just as they are free to avoid publishing research that didn’t pan out."
12. "Exploring the Implications: From Research to Practice," by Maryellen Weimer, 411-435.
I. Instructional Dimensions
"Feldman sorts and prioritizes the dimensions in two ways. First, he attempts to establish the relative importance of the dimensions by looking at them in terms of student achievement…. Second, he looks at the relationship between specific items and global ratings in terms of correlations assuming that the ‘overall assessment of teachers would be more highly associated with instructional characteristics that students generally consider to be important to good teaching…’ (p. 380)." (p. 413)
"While Abrami, d’Apollonia, and Rosenfeld agree that teaching is multidimensional, they do not accept the dimensions proposed in the other two chapters, particularly in terms of their universality. In their work they identified four factors, three of which are highly correlated, and all of which suggest a large global component that can alone be the basis for instructional judgment."
VI. Issues of Concern or Qualification about the use of Student Evaluations
The Marsh and Dunkin review considers (p. 413):
Evaluation instruments are reliable—meaning the instrument itself does not get in the way of what it intends to measure.
Research confirms the generalizability of the ratings—they are primarily a function of the instructor who teaches the course and not of the course that is taught.
The validity of student ratings is more complicated. Basically, they measure student satisfaction, but do they also measure whether or not the instruction facilitates learning or levels of instructional effectiveness? (p. 414) "Establishing validity by means of this more meaningful criteria presents a number of challenges" (p. 414):
institutional climates in which there is far more competition than cooperation. Students can be polled too often, and published lists can cause unnecessary morale problems among faculty.
Krahn, H. and Silzer, B. J. (1995). A study of exit surveys: The Graduand survey at the University of Alberta. College & University, 71(1), 12-23.
I. Instructional Dimensions
"The Graduand survey invites evaluations of university services and facilities, teaching and learning experiences, and acquisition of a range of skills and competencies, and overall satisfaction with the University of Alberta." (p. 14)
V. Effective uses of Student Evaluations
Krahn and Silzer discuss why student evaluations are needed, pointing out that society is increasingly demanding accountability from the university. They connect this demand to the movement of Total Quality Management (TQM) from the private to the public sector, and note the differing levels of acceptance of its emphasis on continuous improvement and on accountability for the use of public funds. Some view performance assessment favorably, believing that self-assessment will lead to higher-quality institutions, while others believe that TQM-type assessments are inappropriate for higher education because the outcomes of a university education are difficult to quantify. In the opinion of Krahn and Silzer, "postsecondary institutions would benefit more, and perhaps suffer less, if they took the initiative to devise and implement a valid set of performance indicators rather than wait for someone else to impose a less appropriate set of measures."
In general, faculty and departments responded positively to the results of the survey. In one year, three more detailed in-class surveys were developed in response to the Graduand survey to find out more about dissatisfactions that had been revealed in the earlier survey.
The authors strongly recommend that results of the survey not be used to punish faculty or departments about whom dissatisfaction has been discovered: "In our opinion, such use of performance indicators would increase but performance might not improve."
VI. Issues of Concern or Qualification about the use of Student Evaluations
Krahn and Silzer emphasized the importance of understanding why a student enrolled in the university in the first place, because the student’s original goals and expectations would influence how they evaluate their experience in retrospect. For example, whether or not the student’s main goal was to launch a career in the field would strongly influence how they evaluated the university’s effectiveness in preparing them for a career.
The authors recommend that, while administrators will interpret and use the responses to broad questions, individual departments should be the ones to interpret and use the more specific data about instructors and courses.