Teaching Quality
Higher Ed's Dirtiest Secret
The dirtiest secret in higher education is that there is no good data on the quality of teaching and teachers on college campuses. There is no way for anyone to know how good or bad teaching is, beyond anecdotes. Ask around. As someone who has been in this business for 30 years, I will tell you: things are not great and getting worse. I estimate that except for very elite private institutions (Princeton, Wesleyan, and the like), well over half of university instruction across the U.S. is fair to poor. Perhaps 25% is good and 5% is excellent.1
How can I make such a claim? I’ll get to that and I’d love to be wrong. But I challenge anyone to push back with real data. There isn’t any beyond what I outline below.
I also challenge anyone to show me any college or university marketing materials that guarantee the quality of classes students will take. You won’t find much. It’s impossible to guarantee quality because there is no operational definition of quality teaching. You may find guarantees of a “quality education” defined by small classes and award-winning professors with Ivy League PhDs, but you will not find an institution that guarantees that some percentage of the courses a student will take will be high quality.
The dirty secret isn’t that there are bad teachers. Of course there are, just as there are bad doctors, bad lawyers, bad plumbers, bad customer service reps, bad cops. The problem is that there’s an entire ecosystem and infrastructure that decided that quality at the micro level, at the level of the individual instructor and individual course, isn’t as important as claims of “social impact” at the macro level, for the institution and the sector as a whole. And so, the inscrutability of teaching quality persists.2
Who is to blame for the lack of attention to quality teaching, the lack of data, and the absence of any good definition of quality teaching? Three groups. First, colleges and universities who have no incentive to define or measure teaching quality and have not funded serious controlled studies. Supporting quality is expensive. Second, faculty, who want even less attention paid to teaching quality than their institutions do. In fact, if there is one topic on which faculty and institutions are in complete agreement it is avoiding the topic of teaching quality altogether. (Both prefer to focus on “belonging” and “satisfaction,” the drivers of grade inflation.)
The third group to blame for a lack of data on teacher quality includes every single critic of university culture from William F. Buckley to Allan Bloom to Greg Lukianoff and Jonathan Haidt to Christopher Rufo to everyone involved in launching campus Civics Centers and developing new conservative courses: everyone who has mistaken politics for quality. Their message seems to be if faculty were more conservative, pro-America, more religious, or if they taught more pro-America content, teaching quality would magically improve.3 Focusing on ideology has been a distraction that has prevented a focus on quality.
A perfect example is a recent piece in the New York Times about ideology and UT Austin. No mention of teaching quality at all. Who says conservative professors would be better teachers?
I challenge anyone to show evidence that teaching quality varies by political persuasion. Of course, if faculty are overwhelmingly liberal it means the number of sub-par liberal professors is greater than the number of sub-par conservative professors, but there is no data that the percentages are different.
Teaching quality is rarely talked about publicly. I have been in higher education for over 30 years as a graduate student, faculty member, department chair, and administrator; I have taught at nearly every kind of institution, from elite privates to public flagships to a private liberal arts college to a regional public. I have been at hundreds of meetings and conferences where the fact of poor teaching and the consequences of poor teaching are discussed privately. I have met with dozens of professional staff at centers for teaching and learning at multiple universities endeavoring internally to improve teaching practices, especially during and after Covid.4 I have watched the growth and expansion of the efficient business model for higher ed, which can’t afford to see faculty as individual experts, uniquely skilled.
My hands are not clean, I admit. Over the years I have been party to quiet agreements to move poor teachers nearing retirement age out of rotation for required courses, to allow them to teach “electives” for the balance of their career, so they do minimum harm. I’ve had to hire barely qualified lecturers because an unexpected surge of enrollment meant warm bodies were needed on short notice, and any available person with a graduate degree would do. I have been asked more than a dozen times over the years to hire and give teaching assignments to unqualified lecturers solely because they are married to star professors the university wants to retain. I have been told to hold my nose and ask no questions, because hiring spouses is a special thing that universities do.
Administrative models that care more about teaching “load” than teaching “quality” are a huge part of the problem. There are lecturers, graduate student teachers, and contingent faculty who teach up to seven classes per semester at community colleges to make ends meet, who admit they cannot do as good a job as they would like. I do not blame these teachers under the circumstances. Current transfer policies in California and across the country smoothing the transfer of course credit from one institution to another explicitly ignore teaching quality. If the “course” is what matters, not the teacher of the course, why should a teacher strive to be excellent?
One can understand why universities, which have incentives not to look too closely at teacher quality, don’t look too closely. One can understand why overworked, underpaid faculty members don’t want public scrutiny of what goes on in most classrooms. The University of Oklahoma story is a case in point.
But critics of higher education who focus solely on politics and “wokeness” should take a closer look at their assumptions about teacher quality and the lack of data generally. The Oklahoma story should not be seen as a story about religion but a story of no quality control at all. (What student should pay money to be treated like this?)
Here’s what I see from my own experience: except at a small minority of expensive schools and some of the best flagships, only 25% of teaching at universities might be considered “good” or better. These are the talented, heroic faculty members devoted to their craft under grim conditions. But even talented teachers can do only a fair job of delivering a quality education under conditions where the “student experience” comes first, where administrators enroll unqualified students and demand they be given a diploma in four years whether they learned anything or not. At many institutions, including community colleges, more than 50% of teaching is poor or, in the case of online asynchronous courses, nonexistent. Sub-par teaching was a crisis before the AI era; it is even more of a crisis now.
Why raise this topic at this moment? For centuries, universities have had a near monopoly on knowledge. Now, universities are competing with companies that can deliver knowledge directly, outside the university structure. Universities, which still own “the diploma,” for now, need to stop thinking in terms of competition and start thinking in terms of collaboration. This will require the involvement of the very best teachers, the 25% or so ready to deliver high quality “last mile” teaching.
I have been arguing that the primary value of a college education in the AI era is studying with expert professors who know more than AI. I have said that administrators need to ask their faculty for memos detailing what they know that AI does not know. I have argued that AI should deliver baseline content mastery. If AI handles routine instruction, the value of individual human instruction is to guide students’ thinking with questioning, feedback, and example beyond what AI can offer. But this means defining quality teaching and identifying the highest quality teachers, which institutions do not do, systematically.5
It is finally time for a public call for real data on quality teaching at universities across the country alongside campus investment in AI. If AI instruction requires quality human guides, as I believe it does, teaching quality needs to be understood. There has been enough focus on politics. Lawmakers need to focus on quality.
What does the data say about college teaching quality?
There is almost no data on teacher quality in U.S. universities, defined as value added to learning and long-run outcomes. Yes, universities collect data on student feelings about learning, but they do not collect or publicize data that would offer a clear picture of instructor value-added across their institutions.
The first problem is that there is no consensus definition for quality teaching at the university level (in classrooms, laboratories, or lecture halls). There are many definitions for excellent teaching in K-12 education,6 but none for post-secondary education. To make sense of the research that follows, I propose the following bare-bones definition:
A high-quality university classroom experience involves the successful transmission of complex knowledge and the verifiable acquisition of skills.
This definition focuses on instructor-level value added to learning (the successful transmission of complex knowledge) and longer-run outcomes (the verifiable acquisition of skills). “Skills” here means subject-specific abilities that enable success in advanced courses or after graduation. The definition folds factors such as inspiration and mentorship into the category of “value” the instructor brings to the classroom.
While most universities do not collect data on themselves, independent economists and researchers have managed to peek behind the curtain. There are some good published studies since 2000 that look at the distribution of teaching quality among university instructors: how much instructors differ in the learning they generate, what fraction of teaching appears to be genuinely high-quality by that standard, and how well institutional metrics capture instructor quality. There are studies of quality teaching practices but few studies of how widely evidence-based practices have been adopted.
Here are the best studies showing variation in teaching quality:
A 2010 study by Carrell and West of students at the US Air Force Academy, who were randomly assigned to sections and professors in a large core curriculum with common syllabi and common exams, found large, statistically clear differences across professors.
A 2014 study by Braga, Paccagnella, and Pellizzari found that instructors differ meaningfully in their contribution to learning, even within the same course with common syllabi and exams.
A 2015 study by Figlio, Schapiro, and Soter found that teaching quality tracks the individual instructor, not faculty rank.
A 2017 study by De Vlieger, Jacob, and Stange also found substantial and persistent variation in instructor effectiveness, of which rank and experience explained only a small part.
A 2019 study by Feld, Salamanca, and Zölitz found that talented student instructors can be almost as effective as professors.
These studies confirm my experience: there are wide differences in instructor quality between the best and the worst. Teaching is about individual talent, though training can improve moderately talented instructors. The best teachers are not always the ones who have been teaching the longest or the ones with the fancy degrees. Everyone knows who the talented teachers are and who the terrible ones are. Students complain and other professors who have to re-teach the students coming from their classes complain. But doing something is expensive and time consuming and would be an institutional commitment.
The studies above (particularly Carrell and West) also show that it would not be terribly difficult to measure differences in instructor quality. The best data would be held by the institution itself. But while leaders of academic units (colleges, programs, departments) have a rough, anecdotal sense of the distribution of “good,” “fair,” and “poor” quality instructors (as I always did), this data is not held or disclosed at the institutional level. No university wants to acknowledge to the public and to prospective students the substantial percentage of students who have or will end up with a lower quality instructor.
Instead, institutions rely on student evaluations of teaching (SETs), the standard metric in higher education: a survey students fill out at the end of each semester asking for ratings (usually on a scale of 1-5) of perceived effectiveness and “satisfaction.” SETs are collected by the university and sent to deans, department chairs, and individual faculty to be used for promotion, tenure, and contract renewal. Nobody likes them, but in the absence of any other internal measure, SETs have become the standard proxy for learning. (The main external source of data beyond the SETs is Rate My Professors (RMP).)
SETs do not paint as bleak a picture of teaching as I do. The default quantitative picture is lots of 4s and 5s and very few 1s; distributions are skewed toward the positive end. Why? It is hard to say. New studies using AI-driven sentiment analysis, mining thousands of open-ended comments from teaching-and-learning evaluations, find that roughly 60% of student comments are tagged as positive, around 25% as neutral, and about 15% as negative. A 2023 methodological paper on SETs argues that students simply default to favorable ratings. A 2015 data-mining study of several years of course evaluations reports that, for highly rated courses and instructors, “most students give high ratings.”
Rate My Professors entered the market in 1999 in part because most universities do not disclose teaching evaluations, even if they are skewed toward the positive side. But RMP also skews positive. One 2016 study reviewed nearly a million evaluations covering 71,404 professors at 33 institutions. The authors define professors as “good” if their overall rating is 3.5 or above on a 1–5 scale and “poor” if it is 2.5 or below. In that dataset, 81% of professors fall in the “good” band and 19% in the “poor” band. Why? Because RMP mostly measures “likeability,” is strongly affected by course easiness and professor “hotness,” and is biased in multiple ways.
So why am I ignoring SET data to claim that as much as 70% of teaching at most universities is only fair to poor? Because SETs are designed to measure satisfaction, not quality. An influential meta-analysis by Uttl, White, and Wong Gonzalez (2017) found little or no correlation between SET scores and objective learning outcomes. Another study, by Stroebe (2020), argues that SETs encourage poor teaching and contribute to grade inflation.
The most vocal arguments against teaching evaluations are that they are biased: influenced by subject, grading leniency, and the gender and ethnicity of the instructor, all factors unrelated to effectiveness. So even if 80% of professors are labeled “good” on RMP or SETs, that is not evidence that 80% of teaching is good in the sense of “produces strong learning, clear understanding, durable skills.”
Is anyone looking at actual learning? The last good study was Arum and Roksa’s Academically Adrift (2011), which reported that 45% of students showed no statistically significant improvement in critical thinking, complex reasoning, and writing skills after their first two years of college, and 36% showed no significant gains over four years.
Carl Wieman’s 2014 Teaching Practices Inventory (TPI) work was an attempt to measure practices rather than enthusiasm and hotness. But even in a department that had worked hard on teaching, the average class still spent the majority of time in straight lecturing. Hake’s 1998 meta-analysis of introductory physics courses (over 6,000 students) found that traditional lecture courses produced an average normalized gain of about 0.23 on a standard mechanics concept inventory, whereas interactive-engagement courses produced gains around 0.48.
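For readers unfamiliar with Hake’s metric, the normalized gain cited above measures the fraction of the possible improvement a class actually achieves between pretest and posttest:

```latex
\langle g \rangle = \frac{\langle \%\,\text{post} \rangle - \langle \%\,\text{pre} \rangle}{100\% - \langle \%\,\text{pre} \rangle}
```

So a lecture course whose students average 40% on the concept inventory before instruction and 54% after has a normalized gain of 14/60, about 0.23, while an interactive-engagement course moving those same students to roughly 69% would score about 0.48.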
If you look for data, there is no good data. If you are not even looking for data, there is no data. Parents and students outside of a university have no insight into teaching quality. None of the internal data is public. What little data reaches the public looks “good” because the metrics are built around student satisfaction and universities are now primarily in the satisfaction business. And yet large percentages of students show weak gains on independent tests.
I have found nothing to refute my claim that most university teaching is poor and weak.
There has been a great deal of research on effective teaching practices (active learning, structured practice with feedback, and incentives aligned with learning) but while there is evidence that many instructors use some of these methods, there is no data that most instructors use most of them at any one university or across the country.
Bottom line: the most generous view of the evidence suggests that there are pockets of quality teaching at all institutions, but that good teaching is unevenly distributed and most likely not happening in the majority of courses at most institutions. As the grade inflation stories suggest, the incentives push instructors to maximize short-term student satisfaction with top grades, generating weaker long-run learning.
The percentages I offer are from my own experience talking to students, reading course reviews, looking at non-public data, observing classrooms, comparing transcripts, and spending time outside of the very best colleges and universities. I would love to be proven wrong but as I say, there is no data.
What do university websites say about their teaching?
I cannot find a single US college or university that in 2025 guarantees high-quality teaching in the sense of a formal, student-facing promise that instruction itself will meet a defined quality standard, with clear remedies if it does not. What I see are guarantees about time to degree, cost, access, mentoring, class size, internships, and a “high-quality education” in a generic sense.7
Overwhelmingly, “guarantees” are promises that if the student meets certain conditions, they will be able to complete the degree in four years or take any remaining courses tuition-free. The websites mention small classes, caring professors, and strong advising, but there are no guarantees about instructional quality.
Mentions of “guaranteeing quality” are almost all about process. The University of New England’s online division, for example, states that “assessment is a guarantee of quality,” describing how course-level assessment of learning outcomes shows they are “delivering on [their] promises of educational excellence.” In other words, internal quality assurance methods show that “learning” occurs, not that every class will be taught well.
Accrediting agencies explicitly disavow any guarantee of course-level or graduate-level quality. The Northwest Commission on Colleges and Universities (NWCCU) states in its FAQ that institutional or specialized accreditation “cannot guarantee the quality of individual graduates or of individual courses within an institution or program, but can give reasonable assurance of the context and quality of the education offered.” You’ll see the same language across accreditors.
But a university could implement every one of an accreditor’s recommendations and still have no reliable way of knowing whether its teaching develops student capabilities. The belonging movement and the coddling critique share an odd feature: both focus on how students feel rather than on what students know or can do.
This avoidance of quality was sustainable when universities held a monopoly on credentialing. It is suicide in the age of AI.
Artificial Intelligence is already a “good enough” teacher. It is patient, it is knowledgeable, and it is available 24/7. It is certainly better than the “poor to abysmal” human instruction I estimate makes up half of a college student’s experience. If universities continue to charge premium tuition for sub-par human instruction that is inferior to a $20/month AI subscription, the market will collapse.
The only survival strategy is to offer what AI cannot: the 25% of “heroic” teachers who provide mentorship, complex critique, and the human accountability that drives actual learning. But you cannot sell that premium product if you decline to identify who provides it.
A Challenge to Higher Ed Leaders
Transparency about teaching quality is desperately needed. Lawmakers should focus less on politics and efficiency and start asking for value-added learning data. Parents should start asking, “How do you guarantee the quality of the professors my college student will face?”
It is time to measure, reward, and guarantee teaching quality, because it is the only thing left that is worth paying for.
Coming soon: my thoughts on how high-quality teaching—the successful transmission of complex knowledge and the verifiable acquisition of skills—could be operationalized and measured.
References
Arum, R., & Roksa, J. (2011). Academically adrift: Limited learning on college campuses. University of Chicago Press.
Baum, S., & McPherson, M. (2019). Improving teaching: Strengthening the college learning experience. Dædalus, 148(4), 5-13.
Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71-88.
Brownback, A., & Sadoff, S. (2020). Improving college instruction through incentives. Journal of Political Economy, 128(8), 2925-2972.
Carrell, S. E., & West, J. E. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118(3), 409-432.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates. American Economic Review, 104(9), 2593-2632.
Deming, D. J., Goldin, C., & Katz, L. F. (2012). The for-profit postsecondary school sector: Nimble critters or agile predators? Journal of Economic Perspectives, 26(1), 139-164.
Deming, D. J., et al. (2015). Can online learning bend the higher education cost curve?. American Economic Review, 105(5), 496-501.
Deslauriers, L., et al. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251-19257.
De Vlieger, P., Jacob, B., & Stange, K. (2017). Measuring instructor effectiveness in higher education. NBER Working Paper No. 23267.
Feld, J., Salamanca, N., & Zölitz, U. (2019). Students are almost as effective as professors in university teaching. Economics of Education Review, 73, 101921.
Figlio, D. N., Schapiro, M. O., & Soter, K. B. (2015). Are tenure track professors better teachers?. Review of Economics and Statistics, 97(4), 715-724.
Freeman, S., et al. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111(23), 8410-8415.
Goldin, C. D., & Katz, L. F. (2008). The race between education and technology. Harvard University Press.
Hoxby, C. M., & Stange, K. (Eds.). (2019). Productivity in higher education. University of Chicago Press.
Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research.
Stroebe, W. (2020). Student evaluations of teaching encourage poor teaching and contribute to grade inflation: A theoretical and empirical analysis. Basic and Applied Social Psychology, 42(4), 276-294.
Sullivan, T. A., et al. (Eds.). (2012). Improving measurement of productivity in higher education. National Academies Press.
Uttl, B., White, C. A., & Wong Gonzalez, D. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.
To be clear: I am speaking about the U.S. higher ed sector generally from my personal perspective. I am not speaking either about or for my current employer, the University of Utah.
In 2018 and 2019 there was growing attention to the absence of data on teaching quality, such as a Chronicle of Higher Education newsletter (https://www.chronicle.com/newsletter/the-edge/2018-10-30) and Baum and McPherson’s Dædalus essay (https://direct.mit.edu/daed/article/148/4/5/27275/Improving-Teaching-Strengthening-the-College), but Covid changed the conversation.
The assumption among many Great Books and Civic Education advocates seems to be that conservative and religious texts somehow “teach themselves.” Then why hire faculty for these new Civics Centers at all? I note too that the entire “constructive dialogue” movement rests on the assumption that teachers lack the basic skills to platform debate in their own classrooms.
Every institution has an office for the improvement of teaching quality as a resource for struggling faculty. Here are two from institutions recently in the news, UC San Diego and the University of Oklahoma; here are Sonoma State and CU Boulder, where I was a graduate student teacher. These are typical: they don’t define teaching excellence, they stress confidential consultations, and they do not provide data on the quality of teaching across their university.
Yes, there are one or two teaching awards given each year. I won a big one in 2014. But one or two awards a year is not a systematic census of excellence.
“What is clear is that very little consensus exists as to what characterizes or defines teaching excellence. Metcalf (1963) noted that ‘not much is known about the relationship between how a teacher teaches and the learning that results’ (p. 938). Fifty years later the field may know more, but no agreement exists on exactly what is known. Given the wide variability in definitions and understandings of excellent teaching, it is no wonder that a number of scholars have determined that excellent teaching is essentially unknowable, at least from a research standpoint, and have, therefore, advocated that the educational research community focus on more easily identified and studied input variables like teacher personality or educational background (Watson, 1963).” Warner, Connor K. “Contested definitions of excellent teaching: An analysis of the discourse of quality.” Journal of Thought 50, no. 1-2 (2016): 20-36.
I asked both Google Gemini and ChatGPT to double-check. ChatGPT was able to find only one institution that uses “guarantee” and “high-quality education” in the same sentence: Eastern Iowa Community Colleges, which says in a March 2025 statement: “We guarantee all people access to an affordable, high-quality education, whether you are seeking quick entry into the welding workforce or a two-year radiology degree.” That is, they guarantee access, not the quality of instruction.
Comments
Since you use the analogy, is there good data in the sense that you mention in this essay on:
The quality of doctors.
The quality of mental health therapists.
The quality of lawyers.
The quality of judges.
The quality of accountants.
The quality of plumbers.
The quality of electricians.
The quality of investment managers.
The quality of academic administrators.
The quality of assessment professionals.
The quality of consultants.
The quality of chefs.
You get the point: service economies frustrate attempts to provide fixed quantitative measures of their quality for multiple reasons, but first and foremost because the question of what quality even is remains unsettled, and because outcomes are by their nature ontologically, for real, difficult to measure and always will be.
You can build metrics for some of these professions that try to make the satisfaction of clients and customers into data, but even then there are very big issues. Ask a patient about doctor satisfaction while the patient is still dependent on that doctor and you will get one kind of data. Ask a patient after they've moved to another professional and you will get another. Ask a patient to assess a doctor who is treating an intrinsically difficult condition with low rates of reported success no matter what and you will hear one thing (in part depending on whether the patient understands the material reality of their condition) and ask a patient to assess a doctor who has treated an easily resolved problem and it will sound as if the doctor has removed a thorn from the lion's paw.
It's just a hard problem and that's all there is to it. It's not a dirty secret, it's life. Whether someone providing a service satisfies someone needing the service is not something we'll ever be able to measure in a way that banishes doubt, ambiguity and judgment. With education, rather like medicine, we have the extra problem that the experience a student is having might be the only time they ever have that experience. I might in a long life have used many plumbers and begin to have a basis for evaluating the difference between good plumbers and bad ones. But with professors, doctors, and a number of other professionals, it might be that the only people who have a deep basis for comparison are the professionals themselves. And there, yes, you do have a problem, if not quite a "dirty secret", which is that professionals are generally inclined to give each other the benefit of the doubt, and to preserve the integrity of professional relationships with one another more than to provide critical insight into less-than-sufficient practice by another. That's the real hard problem to crack, not the creation of better metrics to break people on the managerial wheel more effectively.
In the end, I really feel that anything that starts from the perspective that most professionals are bad at their jobs is a non-starter. I think for the most part people who aren't doing that well in the estimation of some of their clients are laboring under horrible systemic constraints. In the case of professors, that's teaching 500-1000 people in introductory surveys, with a 4-4 load, unresponsive administrations, no sense of deep values or mission in the institution, and poor compensation. In those circumstances, I don't really look to the professional as the problem, any more than I think to myself that a battlefield doctor in a war zone has maybe had to make compromises in the service they provide.
Like your post about silos, you've written things here that have been simmering away in my brain for years. Thank you for putting these words out into the world.
The failure to address poor teaching is endemic and, as you say, exacerbated by using satisfaction surveys to assess teaching. Essentially, they measure how likeable an instructor is and at best function as quality assurance...assuming there is a department chair or dean willing to do the thankless work of removing the teachers who actively alienate their students from learning.
Measuring quality in human performance is always a fraught and complicated task. Better measures only get us so far and I'm skeptical that legislators are in a position to make positive change.
My hope is that all the pressures on institutions of higher learning, AI among them, force a significant rebalancing of resource allocation in favor of teaching. We know what quality improvement looks like in research...it's called peer review.
If a college or university wants to improve teaching it will invite that 25% of teachers "who provide mentorship, complex critique, and human accountability" and have them run a program aimed at elevating teaching quality and assessing it using peer review methods.