This study makes an initial attempt to extend the domain of evaluating teacher effectiveness. Issues of workplace relevance, personal usefulness, longer-term impact, perceived value over time, etc., cannot be well assessed at the completion of a course. Rather, these important dimensions of instruction and its impact require over-time assessment. Changing notions of educational accountability have extended our notion of who are potential stakeholders in both obtaining and evaluating services from academic institutions. Former students and alumni are ideal sources to help us extend the domain of teacher effectiveness assessment. We have not had a method for systematically gathering, utilizing, comparing and tracking alumni assessments of teaching effectiveness over time in a manner that would permit their meaningful use in course and instructor development and evaluation. This study conducts a set of alumni focus groups and surveys to enhance our understanding of the dimensions of teacher effectiveness.
As useful as this information is, it does not systematically capture the longer-term assessment of instructor and course effectiveness that could be provided, for example, by gathering retrospective perceptions from alumni. As a recent editorial in Change: The Magazine of Higher Learning (Marchese, 1997, p. 4) notes, "we see the simple 'objectivity' of the numbers drive down -- or out -- richer forms of student and peer feedback for the evaluation and improvement of teaching." Plater (1997, p. 17) responds to a study in that magazine showing that simply improving one's level of enthusiasm in presentation (voice pitch variability and use of gestures), a content-free change in course delivery, can substantially raise students' evaluations of all items concerning both instructor and course without producing any change in learned content. He concludes that
...student evaluations can only be one part of a holistic assessment of teaching and learning. Clearly, students have a necessary role within this larger process, but surely not the primary one they have assumed...students, peers, administrators, and -- most critically -- the teachers themselves (in the form of reflective self-assessment) can complete the full picture of teaching effectiveness and student learning.
Thus, we propose to use the Rutgers rating form's criteria as a base for understanding and extending the domain of evaluating teacher effectiveness by interviewing and surveying alumni.
Recently, a Harvard University report identified "six clear standards" on which professors' teaching effectiveness can be assessed (Huber et al., 1997): clear goals, adequate preparation, appropriate methods, significant results, effective presentation, and reflective self-evaluation. Some of these dimensions are reflected in the Student Instructional Rating Form, while some are not.
Changing notions of educational accountability, along with concepts from Quality Management, have extended our notion of who are potential stakeholders in both obtaining and evaluating services from academic institutions (Fram & Camp, 1995; Ruben, 1994, 1997, in press).
The effectiveness of instruction can be assessed from a number of points of view. Clearly, the perspective of the student, based on his or her classroom experience, is vital. From that perspective, the student is able to provide meaningful assessments of preparation, delivery, interest, accessibility, exam fairness, etc. Yet it is equally clear that there are limitations to the criteria employed by students at the end of a course, and consequently there are limitations to the value of the information provided. For current students, issues of workplace relevance, personal usefulness, longer-term impact, perceived value over time, etc., cannot be well assessed at the completion of a course. Rather, these important dimensions of instruction and its impact require over-time assessment. Former students and alumni are ideally placed to provide this kind of assessment.
Thus, while students are clearly the most important stakeholders, they are not the sole stakeholders. Others might include alumni, faculty within one's own unit and elsewhere, review committees and university administrators, parents, state legislators, and professionals who hire and manage university graduates. Each of these may well have overlapping criteria for teaching effectiveness, tempered by experience, time perspective, their own stakeholders, and underlying values.
While our department, and no doubt others, have utilized alumni information on an anecdotal and periodic basis, we have not had a method for systematically gathering, utilizing, comparing and tracking these assessments over time in a manner that would permit their meaningful use in course and instructor development and evaluation.
Developing systematic assessment methods for alumni would be valuable in its own right; it also provides the basis for the addition of the perspectives of other important stakeholders in instructional excellence. The familiar student evaluation form and ratings are just one cell of a wider teaching effectiveness domain, and the teaching rating "grid" reported in review packets is just one component of a multi-dimensional teaching effectiveness assessment. A more comprehensive matrix would include multiple evaluation criteria along one dimension and multiple significant stakeholders along the other dimension. It would be valuable to assess the intersections between these domains and stakeholders. For example, we may find that alumni in the workplace now value or remember only some of the criteria that current students rank, and that they value other, different criteria as the most influential, valuable, or memorable. Faculty colleagues, and peers in the discipline, are also able to provide critical assessments on comprehensiveness, currency of content, etc., and such assessments might also be systematically factored into the evaluation of courses and instructors.
Faculty, evaluators, and stakeholders should be more aware of the different overlapping sets of criteria that are applied to teaching effectiveness. Some teaching approaches might well gain high current ratings but low later ones, or even be assessed negatively by other stakeholders. Stakeholders may include parents, employers, university administrators, faculty, alumni, and students. Domains may include standard student evaluation forms, workplace criteria and skills, motivations, teacher goals, etc.
We propose to incorporate feedback and evaluation processes in every activity, so that insights, experiences and learning from all participants become available to subsequent participants. Such an approach can serve the purposes of both faculty evaluation and faculty development.
We propose to begin to extend the domain of teaching excellence by reviewing the workplace report, and bringing those findings to SCILS alumni for discussion in focus groups or telephone interviews. Questions might include: what was important to them as undergrads; what made for a good learning experience; what constituted a good teacher; perhaps give them the student evaluation form and ask them what the items meant to them as undergrads, what the items would mean now, and what measures and items they would use now instead. We will also conduct a literature review of the research on teaching evaluations. We will then use input from these discussions and interviews to develop an alumni survey, from which we would rank and quantify responses.
Fram, E. & Camp, R. (1995). Finding and implementing best practices in higher education. Quality Progress, 28(2), 69-73.
Huber, M.T. et al. (1997). Scholarship reassessed: Evaluation of the professoriate. Carnegie Foundation for the Advancement of Teaching.
Marchese, T. (1997). Student evaluations of teaching. Change: The Magazine of Higher Learning, Sep/Oct, 4.
Plater, W. (Dean of Faculties, Indiana University) (1997). In response.... Change: The Magazine of Higher Learning, Sep/Oct, 16-17.
Rice, R.E., Chapin, J., Pressman, R., Park, S., & Funkhouser, E. (1996). What's in a name? Bibliometric analysis of 40 years of the Journal of Broadcasting (and Electronic Media). Journal of Broadcasting and Electronic Media, 40, 511-539.
Ruben, B. (1994). Process improvement in higher education. Kendall-Hunt.
Ruben, B. (1997). Excellence in higher education: A guide to self-assessment, strategic planning and improvement. NJ: Kendall-Hunt.
Ruben, B. (in press). Quality and service excellence in higher education: Faculty issues.
Smart, J. (Ed.) (1997). Higher education: Handbook of theory and research, vol. 11. NY: Agathon Press.
Trout, P. (1997). What the numbers mean: Providing a context for numerical student evaluations of courses. Change: The Magazine of Higher Learning, Sep/Oct, 25-30.
Initially, we had planned to conduct four focus groups with recent and older, and male and female, Department graduates. The focus group interview generates qualitative data for insights into those interviewed and/or as the basis for formulation of more quantitative procedures. Its conceptualization is based on the therapeutic assumption that people who share a common problem will be more willing to talk about it amid the security of others sharing that problem. Thus the emphasis in the group interviews is on their ability to generate data about the "why" behind the behavior. The data generated in focus groups are often richer and deeper than data elicited in one-on-one interview situations. Candor is permitted both because the members of the group understand and feel comfortable with one another, and also because they draw social strength from each other. Lederman identifies five other fundamental assumptions upon which the technique rests: (1) that people themselves are valuable sources of information, including information about themselves; (2) that people can report on and about themselves, and that they are articulate enough to put into words their thoughts, feelings, and behavior; (3) that people need help in "mining" that information, a role served by the interviewer, or researcher who "focuses" the interview; (4) that the dynamics of the group can be used to surface genuine information rather than creating a "group think" phenomenon; and (5) that the interview of the group is superior to the interview of an individual.
However, it became clear that the sample candidates were too busy and/or distant to be able to meet on campus at the same time. As we decided the maintenance of an appropriate sample was more important than the initial, preliminary data-gathering technique, we conducted and recorded interviews by telephone, and then transcribed the comments. We thus lost some of the synergy of the interactions among focus group participants, but we were able to obtain comments and insights from a useful sample of recent and older, male and female, Department of Communication graduates.
Using the roster of graduates from 1989/90 and 1995/96, we created a quota sampling strategy to obtain 5 males and 5 females from each of the two time periods. The sample is not random, because we did not impose a call-back procedure to avoid potential selection biases involving those who were, or were not, available for the phone call. However, as the goal was the same as with focus groups, that is, to generate likely candidate themes to complement the criteria identified by the literature review, the sample was sufficient.
The following pages list the open-ended questions and selected comments, by class.
1. What are some of the courses that you can remember taking in the Communication Department when you were an undergraduate?
2. Thinking back over your experience as an undergraduate, what memories or experiences with a faculty member comes to mind? Would you describe it to me?
3. What experiences did you have with faculty outside the classroom (e.g., independent studies, advising, etc.)?
Practice and Theory:
  Presents practical information; has practical experience                                          3  3
  Balances theory and research; able to put theory into action                                      2  2
  Understands the 'working world'                                                                   2  2
  Offers solution-based instruction                                                                 1  1
  Balances teaching and research effort                                                             1  1
Interaction:
  Makes time for students; spends time with individual students; available for office hours        4  1
  Very approachable                                                                                 1  1
  Listens to students' interests                                                                    1  1
  Has a good personality                                                                            1  1
  Gets along well with students                                                                     1  1
Communication:
  Effective interpersonal comm. skills                                                              2  1
  Excellent communication skills                                                                    1  1
  Effectively communicates information                                                              1  1
  Able to explain material clearly                                                                  1  1
Course:
  Makes class/lectures interesting & fun                                                            1  2
  Willing to adapt the class to suit the types of people in it                                      1  1
  Explains expectations of course                                                                   1  1
  Tests assess ability accurately                                                                   1  1
--------------------------------------------------------------------------
5. Student evaluations are traditionally done in conjunction with a particular course. Do you think there are other ways teachers should be evaluated? Such as?
* Evaluations are a waste. The University should think about the type of people they are hiring in the first place. If they're hiring them just because they can do research, is their focus going to be on teaching? Will they be able to effectively get the information across at a level students can understand? Some of them focus on their teaching ability for the first year. But, after that, if they don't focus on their research, they'll be looking around for another job real soon!
* The evaluations are a good idea. I don't know if they do anything when a professor with tenure has negative evaluations though. I don't think they do anything in that case. Another thing, is that I don't think many of the students take them seriously. Maybe they should do them mid-semester, and then do them again at the end. I don't know, Rutgers is such a huge place.
* You know, I would have much rather evaluated the class, and how it fits into the program itself. They really should separate the class from the teacher. A lot of times what is being taught is out of the Professor's hands. They are only teaching what the department tells them to. It really isn't fair.
* I think evaluations are the most objective way to rate a teacher's performance. You have a chance to compare objectives on a scale. It's easy to use. No, I don't really think another way is necessary.
* It's important that the teacher is being rated truthfully. It is unfortunate that some students and teachers are unable to communicate with one another and personality conflicts evaluate the professor harshly for this reason. Although one good thing about student evaluations is that the school can get a very large consensus of the teacher's overall ability to work with students. I guess in the end it all balances out.
* What they are doing is fine. Although they could possibly get more feedback from students who do Internships, or maybe the faculty can evaluate each other anonymously.
* Maybe by having someone sit in the class and observe. But who's to say that for that one lecture, the Professor wouldn't put on a good show. Especially, when they know their job is at stake. I think it's better when the students evaluate the Professor.
* Ask the students to reflect on what they learned in the course. Maybe in an essay.
* No, I don't. I know when I was there, no one took the evaluations seriously. If a student was having a good day, they gave the Professor a nice compliment. But if they were having a bad day, forget it. Most of the time students did not really want to take the time to fill out the evaluation appropriately.
* What! You mean someone actually reads those things besides the professor! I can't believe it. Do the Professors have to submit their curriculum to their department for review? Maybe if they had to, they would stay on top of the times.
* Not everyone takes the time to fill them out. Some are afraid that the professor will find out who he or she is, and fail them if they answer negatively.
Based on the teaching excellence criteria identified from the literature survey, and from the telephone interviews with 1995/96 and 1989/90 graduates from the Department of Communication, we devised an initial survey asking people to rate a range of teaching criteria, and rank those they felt should be added to the current Rutgers University Student Evaluation Form.
We then conducted a pilot assessment of our initial survey in a Master of Communication and Information Studies summer course, Methods of Inquiry. The class of 16 Master's students completed the survey in class. We coded and entered the data into SPSS and produced descriptive statistics, factor and reliability analyses, and frequency listings of added and ranked teaching criteria. Dr. Rice then went back to the class the following week to summarize the results, discuss the implications for, and ask for comments on, the next stage of the study.
The full printed report includes means, standard deviations, and ranges for the responses to the pilot survey, a copy of the pilot survey, and factor analyses of the items.
The first six teaching excellence criteria are the first six on the Rutgers Student Teaching Evaluation Form. Factor analysis indicates they represent two underlying dimensions of evaluation:
Factor One is primarily about the instructor's behavior, and includes 'instructor responded effectively', 'instructor generated positive interest in the course material', and 'instructor had a positive attitude toward assisting all students in understanding course material'.
Factor Two is primarily about the course methods and grading, and includes 'instructor assigned grades fairly' and 'the instructional methods encouraged student learning.'
The following paragraphs summarize the factor loadings of the remaining teaching excellence criteria, which were based on the literature survey and the telephone interviews.
The first factor explains nearly half of the variation, is about general, overall positive instructor and course traits and includes 'very approachable', 'assesses students' ability accurately', 'presents practical information', 'has practical experience', 'is able to put theory into action', 'is prepared; organizes course well', and 'offers challenging/difficult course workload'.
The second factor is generally about interaction and a teaching/research balance, and includes 'makes time for, spends time with, students', 'is available for office hours', 'balances teaching and research effort', 'provides feedback', 'provides instructor/student in-class interaction and discussion' and 'has good research productivity and reputation'.
The third factor is generally about clear and positive class interactions, and includes 'makes class and lectures interesting and fun', 'gets along well with students', 'listens to students' interests', 'explains expectations of course', 'is able to explain material clearly', 'uses/chooses appropriate methods/materials', 'has classroom climate conducive to learning', 'motivates students to greater effort and overall learning', and 'shows concern, enthusiasm, respect and tolerance for students'.
The remaining factors include small subsets of the remaining questions.
The final factor analysis combined all the teaching excellence criteria. Again, one major factor emerges, explaining nearly half of the overall variance, and includes 'offers solution-based instruction', 'makes time for, spends time with, students', 'is very approachable', 'assesses students' ability accurately', 'balances teaching and research effort', 'has practical experience', 'makes class and lectures interesting and fun', 'gets along well with students', 'listens to students' interests', 'explains expectations of course', 'is able to explain material clearly', 'is able to put theory into action', 'demonstrates intellectual range', 'is sensitive to class progress', 'uses/chooses appropriate methods/materials', 'has classroom climate conducive to learning', 'provides feedback', 'provides instructor/student in-class interaction and discussion', 'motivates students to greater effort and overall learning', 'shows concern, enthusiasm, respect and tolerance for students', and 'offers challenging/difficult course workload'. Note that when the Rutgers Evaluation items are added, several of the formerly separate dimensions are now grouped together, and this single major factor does not include any of the standard Rutgers Evaluation items.
The six current Rutgers Evaluation items are somewhat dispersed across the factors. Only 'responded effectively to student comments and questions', 'generated interest in the course material' and 'had a positive attitude toward assisting all students in understanding course material' join 'has effective communication skills' from the list of other criteria, representing a general positive interaction dimension. Item 5, 'assigned grades fairly', joins 'understands the working world', 'has enthusiasm for subject or for teaching', and 'evaluates student progress fairly', representing a somewhat pragmatic focus. Item 6, 'the instructional methods encouraged student learning', joins 'effectively communicates information to students', 'is prepared; organizes course well', 'provides clear instruction' and 'has good speaking and presentation skills', representing a general clear pedagogy dimension.
The 16 pilot respondents were asked, "Thinking BACK to your years in college, what were the 4 most important criteria you used in evaluating instructors when you were a student?" They were instructed to list just the number of the criterion that appeared on the survey if any of those applied. The following table lists the distribution of responses for only the most frequent items, those mentioned more than once.
Frequency, Criterion
4 prepared, organized
3 responded effectively to comments and questions
3 generated interest in material
6 assigned grades fairly
4 instructional methods encouraged learning
2 understands the 'working world'
5 effectively communicates info to students
2 able to explain material
2 knowledge of subject matter
4 prepared, organized
2 good speaking and presentation skills
Then the respondents were asked, "Given what you have experienced in the years since you graduated, what are the 4 most important criteria you would now use for evaluating instructors?" The following table lists the distribution of responses mentioned more than once.
Frequency, Criterion
2 responded effectively to comments & questions
5 assigned grades fairly
2 instructional methods encouraged learning
2 offers solution-based instruction
5 understands the 'working world'
3 effective communication skills
4 effectively communicates info to students
2 has practical experience
2 able to explain material
3 knowledge of subject matter
2 prepared, organized
2 offers challenging/difficult course workload
Respondents were asked, "What criteria should be added to the standard Rutgers teacher evaluation form?" The following table lists the distribution of responses mentioned more than once.
Frequency, Criterion
2 available for office hours
3 very approachable
2 understands the 'working world'
3 effective communication skills
5 effectively communicates info to students
2 has practical experience
3 explains expectations of course
3 able to explain material
4 prepared, organized
4 provides clear instruction
3 has good speaking and presentation skills
2 sensitive to class progress
2 uses/chooses appropriate methods/materials
4 provides feedback (in-class and grades)
3 provides student/inst. in-class interaction/discussion
3 motivates students - greater effort & overall learning
4 shows concern, enthusiasm, respect and tolerance
2 offers challenging/difficult course workload
Based on the above pilot responses, we revised the survey in the following ways:
- dropped one item that had no variance.
- extended the range of item values from 1-5 to 1-7 to increase the variance.
- dropped the section asking respondents, "Thinking BACK to your years in college, what were the 4 most important criteria you used in evaluating instructors when you were a student"?, as there was not much difference from the "NOW" assessments, and the pilot students were recent graduates, so we felt that asking people who had graduated nearly a decade ago to think back for their criteria would be unrealistic.
- reformatted and reworded some questions to avoid ambiguities (such as concerning graduation and graduate programs) and converted some open-ended questions to close-ended questions (such as listing communication specialization) to simplify coding.
- using the results from the factor analyses, we dropped some redundant items, and rearranged the order of some questions to avoid a sequence of similar responses.
- added clearer instructions.
- added final information about the incentive for responding.
- developed and included a cover letter.
We first planned to survey graduates from 1986/1987 and 1996/1997, assuming a decade of work experience should be sufficient to stimulate different teaching excellence criteria. It turned out to be quite difficult to obtain mailing labels of Departmental graduates from a recent year and ten years prior. The database program was changed in 1988, so that 1989 was the first year that could be counted on to somewhat accurately reflect current addresses and actual graduates from the department. This was considered to be still a long enough time period.
We decided to mail out an 11x14 self-stick envelope package that included a 9x12 self-stick return envelope with a business reply mark and mailing label, the survey, and a business reply postcard so that people could independently send their name and address back for a drawing for an incentive. This guaranteed anonymity of the survey. The incentive was a pair of movie tickets to a chain of regional movie theaters, available to 22 respondents drawn at random from the pool of returned postcards. We ordered a booklet of 45 discount movie tickets from the movie theater chain promotions office.
Typical one-time, no follow-up mail survey response rates are in the 20% range, so it was less expensive to pay a premium for each envelope and postcard actually returned than to pay for stamps on every return envelope and postcard when we sent them out. The Rutgers Mailing and Document Services does not have a license to process 9x12 business reply envelopes at the institutional discount rate, so each envelope returned cost 78 cents (3 ounces) plus a processing fee of 10 cents. Each postcard returned cost 20 cents plus a 2-cent processing fee.
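The cost reasoning above can be checked with a little arithmetic. The following is a rough sketch only: it uses the per-piece return charges and the 750-packet mailing reported in this study, but the 20% response rate is the "typical" figure, not our actual rate, and the prepaid alternative assumes (hypothetically) that stamps would have cost the same base postage of 78 and 20 cents, and that every respondent returns both the envelope and the postcard. All names are illustrative.

```python
# Costs in cents, taken from the text; integer cents avoid rounding noise.
ENVELOPE_RETURN = 78 + 10   # business reply envelope: postage + processing fee
POSTCARD_RETURN = 20 + 2    # business reply postcard: postage + processing fee
PACKETS_MAILED = 750

def business_reply_cost(response_rate_pct):
    """Cost in cents when only returned pieces are charged.
    ASSUMPTION: each respondent returns both the envelope and the postcard."""
    returned = PACKETS_MAILED * response_rate_pct // 100
    return returned * (ENVELOPE_RETURN + POSTCARD_RETURN)

def prepaid_stamp_cost(envelope_stamp=78, postcard_stamp=20):
    """Cost in cents if every outgoing packet carried prepaid return stamps.
    ASSUMPTION: stamp prices equal the base postage quoted in the text."""
    return PACKETS_MAILED * (envelope_stamp + postcard_stamp)

reply = business_reply_cost(20)
prepaid = prepaid_stamp_cost()
print(f"business reply at 20% response: ${reply / 100:.2f}")
print(f"prepaid stamps on all packets:  ${prepaid / 100:.2f}")
```

Under these assumptions the business reply option remains the cheaper one until close to nine in ten packets come back, which is far above any response rate one would expect from a single mailing.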
Because these envelopes were sold in boxes of 250, we bought three, for a total of 750. There were slightly more than that number of students in the combined classes of 1989 and 1996, so we did proportional sampling of the two classes so that the same overall proportion of graduates from each class was represented in the mailings. The total number of graduate mailing labels for each year, and the resulting slight sampling from each year, are as follows:
Year               1989   1990   1996   1997   Total
Total Population    290    250    230    120     890
Sample              245    210    193    102     750
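As a quick check on the proportional sampling, the sampling fraction for each year can be compared with the overall fraction of 750 out of 890. This is only a sketch: the counts are copied from the table above, and the variable names are illustrative.

```python
# Population and sample counts copied from the table above.
population = {"1989": 290, "1990": 250, "1996": 230, "1997": 120}
sample = {"1989": 245, "1990": 210, "1996": 193, "1997": 102}

# Overall sampling fraction across all four years.
overall = sum(sample.values()) / sum(population.values())

# Per-year sampling fractions.
fractions = {year: sample[year] / population[year] for year in population}

for year, frac in fractions.items():
    print(f"{year}: {frac:.3f} (overall {overall:.3f})")

# Largest departure of any single year from the overall fraction.
max_gap = max(abs(frac - overall) for frac in fractions.values())
print(f"largest gap from overall fraction: {max_gap:.4f}")
```

Each year's fraction falls within one percentage point of the overall fraction, which is consistent with the proportional sampling described above.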
The survey packets were mailed out near the end of June, with a request to return them by the end of July. By the end of September, we received, from the initial mailing of 750 packages:
73% of the respondents were female, only slightly higher than the average proportion of female undergraduate communication majors at Rutgers University. Nearly all respondents had graduated in the two time periods supposedly represented by the mailing labels, though 10 respondents were from outside those years. In terms of the emphasis in their communication major, 29% of the respondents had focused on public relations, 23% on interpersonal communication, 22% on mass communication, 10% on organizational communication, and a few percent on international or health communication. Two-thirds of the respondents either planned to attend graduate school, were attending, or had already graduated.
The current job positions of the respondents were widely distributed across professions.
The 15 most important individual criteria based on the 1-7 importance scale were, in order, knowledge of subject matter; effectively communicates info to students; explains material clearly; responds effectively (a RU item); has effective communication skills; is prepared and organized (RU item); assists students in understanding material (RU item); provides clear instruction; is prepared; assigns grades fairly (RU item); generates interest in the material (RU item); uses instructional methods that encourage learning (RU item); has enthusiasm for subject teaching; and has good speaking and presentation skills.
The 12 most important of the criteria listed on the survey named by the respondents (including the standard Rutgers University items, and all others generated from the literature review, pilot survey, and telephone interviews) (each representing at least 3% of all responses), were:
A wide range of new criteria suggested by the respondents included:
The 20 criteria most mentioned (each representing at least 3% of all responses) as possible additions to the 10 standard RU evaluation items included:
Factor analyses identified 8 underlying dimensions. We can assess the importance of these factors (sets of related criteria) in two ways: the amount of variance among all the 50 rated items that the specific dimension represents, and the average importance of the constituent criteria. The validity of these dimensions is indicated by the strength of high-loading criteria, and the overall reliability of mean scales constructed by just those high-loading criteria (typically, items that load .60 or higher on one dimension, and .40 or lower on any other dimension).
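The item-selection rule and the scale reliability described above can be sketched as follows. This is an illustration, not the SPSS procedure actually used: the loadings and ratings are made-up numbers, the function names are hypothetical, the rule is applied to absolute loadings (an assumption), and reliability is computed with the standard Cronbach's alpha formula.

```python
from statistics import variance

def high_loading_items(loadings, primary=0.60, cross=0.40):
    """Group item indices by factor, keeping an item only if it loads at
    least `primary` on one factor and at most `cross` on every other.
    ASSUMPTION: the thresholds apply to absolute loadings."""
    selected = {}
    for item, row in enumerate(loadings):
        magnitudes = [abs(x) for x in row]
        best = max(range(len(magnitudes)), key=magnitudes.__getitem__)
        others = magnitudes[:best] + magnitudes[best + 1:]
        if magnitudes[best] >= primary and all(x <= cross for x in others):
            selected.setdefault(best, []).append(item)
    return selected

def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up loadings for five items on two factors (illustration only).
loadings = [
    [0.82, 0.10],  # kept on factor 0
    [0.71, 0.35],  # kept on factor 0
    [0.65, 0.55],  # dropped: cross-loading above .40
    [0.20, 0.77],  # kept on factor 1
    [0.45, 0.50],  # dropped: no loading of .60 or higher
]
groups = high_loading_items(loadings)
print(groups)

# Made-up 1-7 importance ratings for the two items kept on factor 0.
ratings = [[7, 6], [6, 6], [5, 4], [7, 7], [4, 5]]
print(round(cronbach_alpha(ratings), 2))
```

A mean scale for each respondent would then be the average of that respondent's ratings on the items retained for a given factor.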
The first factor is easily the most dominant on both bases. Being an effective communicator and a clear, prepared, and organized instructor represented nearly 45% of the variance, and had the highest average importance rating (6.5).
The remaining factors each explained from 2% to 6% of the variance.
Scales based on the high-loading criteria of the respective dimensions were then created. The scale alpha reliabilities ranged from .78 to .93. In decreasing order of importance, they were:
The scales were then tested for three possible influences or biases: gender, class, and reply date of the respondent. Essentially, there were no significant differences for any of these 8 mean scales across these three variables. The only significant difference was a slightly greater importance placed on the first dimension (effective communication, preparation, organization) by veteran alumni (6.6 vs. 6.4). There was no significant correlation between how long a respondent took to return the survey (over the interval between the first and the last survey returned) and any of the 8 mean scales (the average correlation was r = -.05). This lack of difference between early and late responders may imply that there is no consequential bias between respondents and non-respondents.
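The early versus late responder check amounts to correlating each respondent's return delay with his or her score on a mean scale. A minimal sketch of that computation follows, using a hand-written Pearson correlation; the delays and scale scores are invented for illustration and are not the study's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: days from the first returned survey, and each
# respondent's mean score on one of the 8 scales (1-7 importance range).
days_to_return = [3, 10, 17, 24, 31, 45]
scale_score = [6.5, 6.4, 6.6, 6.3, 6.6, 6.5]

r = pearson_r(days_to_return, scale_score)
print(f"r = {r:.2f}")
```

A correlation near zero on every scale, as reported above, is what one would expect if late responders rate the criteria much as early responders do.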