Can We Measure the Value of College Teaching?

By Robert Martin and Andrew Gillen
A popular notion within the academy is that teaching quality cannot be measured, but this is an article of faith, not a demonstrated fact. Very few institutions have made a systematic effort to measure teaching quality, largely because the faculty is opposed to it and administrators have little incentive to discover true teaching value added. Faculty view their conduct in the classroom as beyond judgment, while for deans, knowing how serious some teaching problems are is a kind of trap: this obligates them to fix those problems in an environment where very little can be done. Further, if some professors are identified as truly exceptional teachers, their peers may resent it and the exceptional teachers may expect higher compensation in return. So, most administrators choose to leave that sleeping dog alone.
One consequence is that colleges and universities scrupulously avoid competing on the basis of teaching metrics, choosing instead to compete on the basis of things that signal or imply quality, such as scholarly research, elaborate facilities, stately campuses, athletic teams, and extravagant entertainment. This competition accounts for most of the excess cost of college and for the decline in teaching quality.
The central issue here is the quality of undergraduate teaching. Students, parents, and taxpayers are most concerned about that question. In undergraduate education, quality is the amount of new knowledge acquired by a student as a result of taking an individual course or attending a particular college. The new knowledge in each case is the human capital value added by the professor and the institution. The value added includes both discrete new knowledge and the ability to integrate and apply that knowledge. If students, parents, and taxpayers know what to expect in terms of value added, they can make their own subjective valuations of the other services offered by the institution.

Those who claim that teaching quality simply can’t be measured rely on serious inconsistencies and misleading assertions. It is more likely that, as faculty members, we simply do not want to be graded, even though we are hired to grade others.
The first inconsistency in the argument against quality measurement is that if quality cannot be measured, then grading is a fraud. When we measure classroom performance, we evaluate a variety of subjective aspects of student performance in order to assign a final grade. Clearly, the more precise disciplines (where there is a correct or incorrect answer to a problem) have an easier time assigning grades. Nevertheless, we have been assigning grades in the most subjective disciplines for generations.
Another inconsistency is that we accept the measurement of scholarship quality but reject the measurement of teaching quality. Scholarship is not an objective concept. What constitutes scholarship varies by discipline and is very hard to evaluate across disciplines; things that are regarded as valuable contributions by one discipline seem not to be in others. Further, what counts as scholarship is not obvious even within a single discipline. Consider the endless discussions that go on within research departments about what journals are in the “top twenty-five,” how to evaluate books versus journal articles, and how to adjust for multiple authors. Further, how do we evaluate citations? Do all citations count regardless of the source? What if an article attracted a number of citations because it contained a serious error?
Notice that even though there is great subjectivity in evaluating scholarship, the information is sufficient to establish a brisk market for scholars. There is no comparable market for teachers because we refuse to put the same effort into measuring teaching quality as we put into measuring research quality.
Another misleading argument is the use of perfection as a club to fend off attempts to measure value added. For people who argue this way, any proposed measure of value added has to be perfect or it is rejected; they make perfection the enemy of the good. In other words, they reject what we already know about scholarship (it does not have to be perfect to serve us quite well) in order to forestall the inevitable accountability that will come with measurement of teaching value added.
A fourth misleading position occurs when someone argues that we cannot measure value added because it is impossible to predict how an individual student will perform; it is said there are too many random variables that influence an individual student’s performance. For example, two students with the same background may differ radically with respect to their motivation, their maturity, or their health. Technically, it is true we cannot predict individual student behavior. Fortunately, we do not have to predict individual behavior in order to measure either an individual professor’s value added or an institution’s value added.
Since many of the unobservable differences in students are randomly distributed, they cancel each other out when we use a sufficiently large sample to estimate the performance of the average student who took professor X’s class or attended college Y. There are well-established statistical techniques that allow people to infer such things with reasonable accuracy. For example, it is impossible to predict whether an individual motorist will or will not have an accident sometime during the next year. But by using a pool of drivers’ observable characteristics (age, sex, driving record, etc.), an insurance company can predict how many drivers in that pool will have an accident, and from that it can reliably estimate the casualty cost of insuring the entire pool.
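The averaging argument can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not a proposed evaluation method: the professor's "true" value added (2.0 points), the spread of student-level noise (10 points), and the function names are all hypothetical, and the noise is assumed to be independent of the professor, which real data may not guarantee. It shows that any single student's gain says little, while the class average settles near the true value as the sample grows.

```python
import random
import statistics


def simulate_class_gains(true_value_added, noise_sd, n_students, rng):
    """Return simulated learning gains for one professor's students.

    Each gain is the professor's (hypothetical) true value added plus
    random student-level noise standing in for motivation, maturity,
    health, and other unobservable differences.
    """
    return [true_value_added + rng.gauss(0.0, noise_sd) for _ in range(n_students)]


def estimate_value_added(gains):
    """Estimate value added as the mean gain, with its standard error."""
    mean_gain = statistics.fmean(gains)
    std_error = statistics.stdev(gains) / len(gains) ** 0.5
    return mean_gain, std_error


# Assumed values, chosen only for illustration.
TRUE_VALUE_ADDED = 2.0   # hypothetical average gain caused by the professor
NOISE_SD = 10.0          # hypothetical spread of individual student noise

rng = random.Random(0)   # fixed seed so the run is repeatable
for n in (25, 400, 10_000):
    gains = simulate_class_gains(TRUE_VALUE_ADDED, NOISE_SD, n, rng)
    estimate, std_error = estimate_value_added(gains)
    print(f"n={n:>6}: estimate={estimate:6.2f}, standard error={std_error:5.2f}")
```

The standard error shrinks roughly with the square root of the number of students, which is why pooling several semesters of a professor's classes, like pooling drivers in the insurance example, can make the estimate usable even though individual outcomes stay unpredictable.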
It is hard to overstate how much our refusal to measure value added costs society. For example, research by Eric A. Hanushek of the Hoover Institution, a leader in the development of economic analysis of educational issues, indicates that:

A teacher one standard deviation above the mean effectiveness annually generates marginal gains of over $400,000 in present value of student future earnings with a class size of 20 and proportionately higher with larger class sizes. Alternatively, replacing the bottom 5-8 percent of teachers with average teachers could move the U.S. near the top of international math and science rankings with a present value of $100 trillion. (“The Economic Value of Higher Teacher Quality,” Eric A. Hanushek, The National Bureau of Economic Research, Working Paper No. 16606, December 2010).

Further, the refusal to measure value added is very costly to members of the academy as well. Most of us know the incentive system is badly skewed towards research and that this comes at the expense of teaching. The continuous widening in the income distribution over the last three decades and the stagnation in median incomes are, in part, the result of the high cost and declining quality of undergraduate education. Our ability to compete in an international economy is adversely affected by the likely decline in teaching value added. The aggressive “mission creep” among institutions occurs because all the rewards go to research rather than teaching. The nation’s most gifted would-be teachers choose not to become professors because the pure teaching track in higher education offers very few rewards.
We can design a system of metrics to measure value added. Like the information we use to measure scholarship, it must come from third parties who verify the signal is accurate and provide the same signal for other institutions. A few of the already existing measurements include:
– The National Survey of Student Engagement (NSSE) is a survey designed to document campus teaching practices;
– The Collegiate Learning Assessment (CLA) is a test of reasoning and communication skills (there is also a community college version of this test);
– The Measure of Academic Proficiency and Progress (MAPP) is designed to assess general education skills such as critical thinking;
– The Collegiate Assessment of Academic Proficiency (CAAP) is another general education test, though this one is given during college.

More information on all of these is available at the National Institute for Learning Outcomes Assessment website. Unfortunately, participation and reporting for each of these measures are generally voluntary, and it is interesting to note that those institutions at the top and bottom of the hierarchy typically refuse to participate.
Other promising candidates include:

– Value-added contributions to passage rates on discipline-specific licensing tests such as the bar exam for lawyers and the Uniform Certified Public Accountant Examination for accountants.
– Employment and salary data.
– Subject-specific competence exams, similar to the GRE subject exams.

These and other metrics will never be perfect, but they will be efficient; they will be enough to establish a market for world-class senior teachers and that will be a game changer for higher education.
Robert Martin is emeritus Boles Professor of Economics at Centre College and author of The College Cost Disease: Higher Cost and Lower Quality, forthcoming from Edward Elgar, Ltd.
Andrew Gillen is the Research Director at the Center for College Affordability and Productivity.

