Since you use the analogy, is there good data in the sense that you mention in this essay on:
The quality of doctors.
The quality of mental health therapists.
The quality of lawyers.
The quality of judges.
The quality of accountants.
The quality of plumbers.
The quality of electricians.
The quality of investment managers.
The quality of academic administrators.
The quality of assessment professionals.
The quality of consultants.
The quality of chefs.
You get the point: service economies frustrate attempts to provide fixed quantitative measures of their quality for multiple reasons, but first and foremost because the question of what quality consists of is inherently unsettled, and because outcomes are ontologically, for real, difficult to measure and always will be.
You can build metrics for some of these professions that try to turn the satisfaction of clients and customers into data, but even then there are very big issues. Ask a patient about doctor satisfaction while the patient is still dependent on that doctor and you will get one kind of data. Ask a patient after they've moved to another professional and you will get another. Ask a patient to assess a doctor who is treating an intrinsically difficult condition with low rates of reported success no matter what, and you will hear one thing (in part depending on whether the patient understands the material reality of their condition). Ask a patient to assess a doctor who has treated an easily resolved problem, and it will sound as if the doctor has removed a thorn from the lion's paw.
It's just a hard problem and that's all there is to it. It's not a dirty secret, it's life. Whether someone providing a service satisfies someone needing the service is not something we'll ever be able to measure in a way that banishes doubt, ambiguity and judgment. With education, rather like medicine, we have the extra problem that the experience a student is having might be the only time they ever have that experience. I might in a long life have used many plumbers and begin to have a basis for evaluating the difference between good plumbers and bad ones. But with professors, doctors, and a number of other professionals, it might be that the only people who have a deep basis for comparison are the professionals themselves. And there, yes, you do have a problem, if not quite a "dirty secret", which is that professionals are generally inclined to give each other the benefit of the doubt, and to preserve the integrity of professional relationships with one another more than to provide critical insight into less-than-sufficient practice by another. That's the real hard problem to crack, not the creation of better metrics to break people on the managerial wheel more effectively.
In the end, I really feel that anything that starts from the perspective that most professionals are bad at their jobs is a non-starter. I think for the most part people who aren't doing that well in the estimation of some of their clients are laboring under horrible systemic constraints. In the case of professors, that's teaching 500-1000 people in introductory surveys, with a 4-4 load, unresponsive administrations, no sense of deep values or mission in the institution, and poor compensation. In those circumstances, I don't really look to the professional as the problem, any more than I think to myself that a battlefield doctor in a war zone has maybe had to make compromises in the service they provide.
Yes to all of this and this is the conversation that needs to happen. Silence about it all just leads to critiques where there is no solution. Thank you!
The last paragraph of Timothy's comment is very important. I've been there, one hundred percent. That's why I got out of higher ed and into K-12. K-12 educators have been thinking about this subject for a long time, and they can and should contribute significantly to this discussion. It is a huge, huge topic, but worthwhile, and I'm glad Hollis is spending time and effort on it.
I agree with the rest of your comments, but I don't think it's true that most people whose colleagues consider them bad teachers are laboring under systemic constraints. That is certainly not my impression personally, although on the general point of the article I have very limited evidence.
I think you're right that there is a kind of "insider critique" that separates out people that we know are dealing with difficult conditions (who often excel despite, or even because, of that) and people that we know are either very unmotivated by teaching or just not very skilled at it. But I personally would struggle to put into words both how I know (or think I know) when someone's not bringing it in the classroom and how to describe precisely what it is that's not working. I think the vocabulary you need to describe that is often closer to trying to explain why an artistic performance is off than to explaining a technical failure like "operated on the wrong foot". Though there are examples of pedagogical malpractice that are that kind of bad too--say, for example, telling students that X, Y and Z are the main focus of a high-stakes exam when the actual exam focused on C, G and N instead.
Love this comment, Timothy, but I don't agree that it's a non-starter to think that "most professionals are bad at their jobs." It depends on what you mean. I think there are two possibilities here:
Possibility 1: Given the constraints most academics face, it's not plausible that most of them are bad at their jobs. The average professor is average at teaching, given those constraints.
Possibility 2: Given a certain outcome we want--the measurable development of human capital to a fairly significant extent--it's quite plausible that most professors are bad at their jobs.
If you take possibility 1, you're right: it's a non-starter. If you take possibility 2, it's entirely plausible that most professionals are bad at their jobs.
Now, I'm guessing you don't think there's a good measure for human capital supplementation. That's a plausible position, but I don't take it, for two reasons.
First, I think giving up on a measure creates bad incentives. Most of us are far too complacent about how much thought and effort we put into teaching. When you say that it's too difficult to come up with a good measure, it's easy to move from there into assurances that you're doing a good enough job because you can point to a couple of experiences you've had.
Second, the incentives are strongly aligned with doing a bad job. We know some things about what it takes to teach well--retrieval practice, spaced learning, interleaved learning, the testing effect, etc.--and I think a lot of us do them in only a haphazard way. One reason we (IMO) do a bad job is that these things are hard to do. A lot of them are quite boring for us and would force us to change our teaching away from the way we like doing it. Another reason is that students don't want us to do this. If they had to spend more time doing work for classes in ways that improve learning, classes would become less easy and less fun for them. But most students already (a) don't care about learning for learning's sake, (b) only want a degree to get a job, and (c) see most of the general electives as obstacles rather than instruments to getting that job. And as far as (c) goes, they're right!
You'll notice my two reasons for not taking your position about the difficulty of measuring are not evidentiary but practical. That is, while I agree that there's a significant likelihood that we won't reach a great measure, I think coming to that conclusion has bad practical outcomes that lead us to teach worse. (That's my first reason against taking your view.) Second, I think that if we don't try to build a good measure, some other measure (e.g., student evaluations) that is worse will fill the vacuum, and we have all sorts of reasons to think that what we're doing now is not only not optimal, but closer to pessimal than it is to optimal.
Sorry if I'm not expressing myself well, but I've become somewhat consumed by this issue lately.
Like your post about silos, you've written things here that have been simmering away in my brain for years. Thank you for putting these words out into the world.
The failure to address poor teaching is endemic and, as you say, exacerbated by using satisfaction surveys to assess teaching. Essentially, they measure how likeable an instructor is, and at best function as quality assurance...assuming there is a department chair or dean willing to do the thankless work of removing the teachers who actively alienate their students from learning.
Measuring quality in human performance is always a fraught and complicated task. Better measures only get us so far and I'm skeptical that legislators are in a position to make positive change.
My hope is that all the pressures on institutions of higher learning, AI among them, force a significant rebalancing of resource allocation in favor of teaching. We know what quality improvement looks like in research...it's called peer review.
If a college or university wants to improve teaching it will invite that 25% of teachers "who provide mentorship, complex critique, and human accountability" and have them run a program aimed at elevating teaching quality and assessing it using peer review methods.
Thank you Rob -- I am grateful for your perspective particularly. The people who are most loud about "doing something" are generally unbalanced themselves, which is why I particularly like your phrase "significant rebalancing" of things in favor of teaching. I do not want to blame teachers but this awful system. Yes to your ideas for improvement!
What criteria would be used to assess good versus bad university teaching?
Other than anecdotes, what is there?
One way to assess it would be to have students take national, standardized tests before and after completing certain courses, and compare their scores horizontally against other teachers and other schools, and before and after they took the course. For example, a basic chemistry course could be assessed this way. But I have never heard of anything remotely like this existing.
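A minimal sketch of what that before-and-after comparison might look like, assuming a hypothetical table of pre- and post-course test scores (the column names and numbers are invented for illustration):

```python
import pandas as pd

# Invented example records: one row per student in a course section.
# A real registrar or testing feed would look different.
scores = pd.DataFrame({
    "instructor": ["A", "A", "B", "B", "C", "C"],
    "pre_test":   [41, 55, 48, 60, 52, 45],
    "post_test":  [70, 82, 58, 66, 80, 77],
})

# Average learning gain per instructor, comparable "horizontally"
# across instructors and schools using the same standardized test.
scores["gain"] = scores["post_test"] - scores["pre_test"]
print(scores.groupby("instructor")["gain"].agg(["mean", "count"]))
```

Even this toy version exposes the design's weak point: raw gains say nothing about who the students were to begin with, which is why the value-added literature controls for prior preparation.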
What would probably work would be the reintroduction of aptitude tests, which were largely eliminated circa 1970 because they generated "disparate impact" results for minority students.
I would be interested to hear your suggestions for measuring university level teaching quality.
Also, my recollection of conservative critiques of university education is that they charge universities with excluding conservatives and promoting a left ideology. That is a separate question from "teaching quality", along some other axis.
This is the topic of my next post! I'm working on "how to operationalize" my definition. You are correct that there is nothing remotely like a real assessment of teaching quality. And yes you are correct that the ideological critiques are not about quality and never have been. Here I am angering everyone in one post. Wish me luck!
You are pointing out the Emperor's nudity. No anger from me! What angers me is that a huge sector of our economy, which is mandatory (no real career prospects without a BA unless you are in the trades), crushingly expensive (six figures of debt when you are just starting out in life?), and has an unmatched influence on all aspects of cultural and political and economic life (trust the experts!), has literally gone for generations with ZERO objective basis to assess the effectiveness of its core function -- teaching! (Yes, professors think research and publishing and talking to each other at conferences is their core function, but everyone else thinks they are supposed to be teaching.) If you were the proverbial Martian asking how the people in this ridiculously favored sector got such a sweet deal, the answer would start out, "Well, it's a long story ..." But however we got here, we need to get out of here. This ain't working.
The work of Eric Mazur, physics prof at Harvard, is relevant. The origin story of his 'flipped classroom' approach to physics teaching begins with his discovery, when he tested his students' problem-solving skills, that they demonstrated almost no useful learning of what was taught in the course. His subsequent changes flipped the classroom: rote material and fundamentals were assigned as out-of-class work, while class time emphasized live problem solving. I am not doing justice to the story in my summary, and better descriptions can be found at https://bokcenter.harvard.edu/flipped-classrooms or in numerous Eric Mazur youtube videos. This approach greatly improved results for his physics students. It helps that teaching from first principles is relatively easier in physics. My own experience in trying to flip the classroom in a pathophysiology class proved much more difficult, in part due to the large amount of knowledge that does not come from first principles but has to be memorized (e.g. the many components of the inflammatory response). My main point is to agree with the potential of before-and-after testing regimens to assess teaching effectiveness.

Thank you! I am learning a great deal since posting the piece, for which I am deeply grateful.
Aptitude tests, as the name implies, are not designed to measure teaching and learning but the in-the-moment capacities of the test taker. They're not a particularly good tool for measuring the specific effect of a class or teacher on a student's knowledge or capacities.

What would be a good test?

Currently there is nothing.
That is untenable going forward. The entire industry has lost trust and credibility, deservedly so. If the teachers cannot demonstrate their value, many of them are going to be unemployed. Every other profession has to produce, and prove they are producing, or they are unemployed. A correction is long overdue on this.
While I think instruction could be much improved, and I spend most of my time working to improve that instruction, I think the quality (or lack thereof) of college teaching is well down the list of challenges to higher education institutions. The "lost trust" has much less to do with the on-the-ground quality of the teaching than with other structural factors, many of which are out of the hands of institutions themselves (though I am also a frequent critic of how institutions play the hands they're dealt).
I never had a hard time determining how much (or how little) my students learned. Importantly, neither did my students. If institutions genuinely cared about teaching quality by creating the conditions for good teaching to happen, it would not be impossible to get a strong sense of student learning.

The much harder task would be to truly measure the teacher's specific impact on that learning. As I've written elsewhere, the confounding conditions and variables are too numerous to even wrap one's head around: https://www.insidehighered.com/blogs/just-visiting/why-measuring-teaching-success-so-complicated
The greater danger, IMO, is doing to higher ed what's happened to K-12 and creating a system where instructors are incentivized to teach to the test, creating an inevitable shit show once Campbell's Law kicks in, something that is well-documented in the K-12 space.
I agree with all of this. My father taught calculus and statistics at a community college for three decades. His students learned the material, and he knew they knew it.
The problem now is that students are paying too much for what they are getting and incurring debt, and the political anger is building. The "fix" will likely be worse than the disease if we have people being "taught to the test" up to age 22 rather than 18 -- compounding a mistake and corrupting the measuring methodology.
But time is running out for the college-level educational industry to clean its own house. I have children who are recently college age, and the entire situation is an outrage and a disgrace. You are expected to mortgage your future for a piece of paper. We found work-arounds. But many people will not. It is a hostage situation, and the hostages are going to revolt.
The system isn't sustainable and hasn't been for some time. I published a book during the pandemic arguing that the faults those events exposed should have made it clear that institutions organized around collecting revenue, rather than around the educational mission, were ultimately doomed.
There is a way forward, but it involves recognizing post-secondary education as a public rather than a private good and organizing operations around that value. It means a system where schools try to optimize something other than prestige as part of the competition for students.

That all sounds good.
The prestige competition and the status anxiety that drives it should be eliminated from the equation. People can get status from achievement in adult life, whether in scholarship, business, or whatever. Not based on college admission as a teenager.
Your work has been foundational to my thinking, John, and I appreciate your perspective. I think we both agree the lack of data is an institutional failure, certainly not a failure of faculty. My weariness at never being able to do anything, whether as a dean or as a chair, about terrible teaching is what I am mostly speaking to.
I appreciate that frustration. My frustration is from the bottom of that relationship: namely, why can't institutions figure out how to reward the quality of my work and keep me in a sustainable position that allows me to do that work? Measuring my teaching had no effect on that challenge. As I say, I think there's lower-hanging fruit, but the first step would be to convince institutions that teaching matters.
The “definition” of teaching swaps “teaching” with “classroom experience,” and defines “high-quality” as “successful” and “verifiable.” These sound like the _outcomes_ of good teaching (ie, learning), not good teaching itself. This is like defining “good parenting” by the outcome of “well-adjusted” young adult.
Here’s an improvised attempt to get closer to the thing itself: Good teaching is a mode of _explanation_ and _demonstration_ which, when combined with feedback, compels the student to increase their own mental involvement with the material in the direction of mastery. This mastery is then _detected_ in demonstrable knowledge and skills.
My guess is that your intuition about who's a good teacher hinges on their ability to explain and demonstrate things (ie, how smart they are) combined with their adeptness at feedback (ie, how mature and sincere they are). These criteria seem awfully close to aspects of someone's character or personality, which puts them squarely in the last mile.
I went to a 'good' school. Not an Ivy, generally considered just below. A lot of my classmates were unable to get into Harvard/MIT/Stanford; it was a lot of people's second school. Anyway, the science and engineering departments had good reputations, as did the medical school and associated health fields. I would call probably 2 of my professors good quality over that time, and maybe another 4 average or slightly above. The rest were objectively awful, and many of them, I think, were completely aware of it and didn't care. The main difference is that the poor professors didn't want to teach, at all. Many were quite open about it outside of the immediate classroom context. They wanted to do research; it's why they toughed it out to get the PhD in the first place. If they wanted to teach they'd have gone into Education, not biomedical engineering or physics. The school had a reputation for good research projects going back decades. That is what draws these profs. The teaching is a burden to be borne to allow access to the research.
I figured out pretty quickly that I was not going to receive any actual instruction from these people at the undergrad level. They were simply the holders of the hoops through which I must jump for the degree. All actual learning took place between the student, their peers, the course material, and a tutor if needed. The tutor program was actually pretty good, and the tutors were the ones who taught me how to identify the profs who are only there for the research and status. They also pointed out the groups of students who are fully aware of how the system works, have no expectation of learning anything inside the classroom itself, and only appeared in class to get the next assignment, turn in the last one, and take tests. They also told me which professors to absolutely avoid because they both do not want to teach and have an attendance policy that will fail you regardless of scores on assignments and tests. This was back in the 90s, too; I imagine this approach is even more popular now, with assignments being both received from and turned in to an online source that the university bought the course from, and the profs serving primarily as test proctors to prevent cheating. I'd have been fine with this too.
Thank you for writing this. I fear this is far too common. I am hoping that my piece will spark some recognition of how variable (because this is really the key) teaching is, so that we can better support the excellent teachers and perhaps ease the poor ones into something else.
Hi there, this is a great post. I wanted to point you towards a yet-unpublished economics working paper that tries to do exactly what you're calling for (and it seems you haven't yet come across it).

https://merrill-warnick.github.io/merrill-warnick/PostSecVA_Latest.pdf
"Instructor Value-Added in Post-Secondary Education" by Warnick, Light, and Yim (July 2025)
The gist is that they come up with a way to estimate college-professor value-added using only transcript data. That means the method can be used at any higher ed institution.
They also show that their method is as valid as quasi-experimental methods in the literature. From the abstract: "Using a unique policy at a large public university in Indiana, we show that our method accounts for selection just as well as methods that exploit conditional random assignment of students to courses. We next show that our method reduces forecast bias in a wider variety of institutions using data from nearly all public universities in Texas. We find that individual instructors matter for students' future grades and post-college earnings in many subjects and courses."

Some cool findings:

1 SD better instructor leads to +0.13 GPA, +17% future earnings

Student evaluations are uncorrelated with value-added

Observable characteristics explain <2% of variation in value-added

A value-added-based retention program could increase students' future earnings by 2.7%
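To make the general idea concrete, here is a toy sketch of transcript-based value-added on simulated data. This is emphatically not the paper's estimator (which handles selection into courses far more carefully); it just shows the bare logic of regressing later outcomes on instructor indicators while conditioning on prior achievement:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 3000
instructors = list("ABCDEFGH")

# Simulated transcript records; every value here is synthetic.
df = pd.DataFrame({
    "instructor": rng.choice(instructors, size=n),
    "prior_gpa": np.clip(rng.normal(3.0, 0.5, size=n), 0.0, 4.0),
})

# Each instructor gets a hidden "true" effect on later GPA.
true_effect = dict(zip(instructors, rng.normal(0.0, 0.10, size=len(instructors))))
df["later_gpa"] = np.clip(
    0.9 + 0.7 * df["prior_gpa"]
    + df["instructor"].map(true_effect)
    + rng.normal(0.0, 0.3, size=n),
    0.0, 4.0,
)

# Instructor fixed effects on subsequent GPA, conditioning on prior GPA.
# The C(instructor) coefficients are crude value-added estimates,
# relative to instructor A (the omitted baseline).
fit = smf.ols("later_gpa ~ prior_gpa + C(instructor)", data=df).fit()
print(fit.params.filter(like="instructor"))
```

The hard part, and the paper's contribution as quoted above, is showing that estimates like these can be purged of selection bias using only data universities already hold.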
Thank you! Are you the person who also linked this in the comments at MR? I had missed this paper, which mostly helps my larger claim, since the data are non-public and research-only, yes? Universities are not running these studies for internal accountability or guarantees of quality. There is still no student or parent who can log in and see “these sections are 1 SD above the mean on future GPA or earnings.”
As I understand it, what the paper shows (and it is a pretty elegant design) is that instructor-level value-added to both learning and income is technically measurable at scale with data universities already have. In other words, studying value added is feasible, as I suggested it could be, but no university yet wants to be this transparent.
The paper also supports my claims that instructor quality varies widely even within the same course (as everyone knows) and that rank, credentials, etc. are weak signals of real instructional value, that student evaluations correlate only weakly with value-added and more strongly with grading leniency and satisfaction, and that serious work on teaching quality proceeds entirely in terms of value-added rather than ideology.
In short, I can't say "no data," but I can still say "almost no data": there is still almost no institutionally produced or student-facing value-added data. The main obstacle is institutional will and incentives!
Also the big question -- How will AI change this??
Yes, I cross-posted on MR 🙂. My response there was snippy so I apologize for that.
I agree with everything you said. I, too, believe that institutional will is the "limiting reagent" to producing this at scale. In my opinion the pinch point is a desire to not single out poor performers. You see this with salary schedules and annual evaluations at many public institutions. Up until recently at OU, for example, there were always "across-the-board" raises given to all faculty regardless of performance. OU also recently made a move to reduce the variance in annual evaluations (moving from a 500-point rating scale to a 3-point scale). This benefits the worst performers at the expense of the stars. Just like grade inflation for students in the classroom.
Hopefully it's only a matter of time before these more sophisticated methods become commonplace. Academic Analytics, for example, does similar statistical measurement of faculty research productivity. It's not a far cry to imagine they could do something similar with student transcript data.
To the extent that AI forces GPA compression, that will undermine our ability to compute value-added (at least for successive GPA outcomes).
Another commenter here had a long comment suggesting that the problem is a lack of comparison of materials. I don't think that matters. In fact, I think forcing materials on faculty reduces quality and innovation. I'll be writing more soon!
Not only is there not any direct data on teaching quality, we avoid even indirect data. Ask students how many hours a week they spent doing homework (you won't learn much if you don't do much). Ask how often something was assessed and how quickly it was returned (2-3 papers a semester that are returned at the end of the semester, or not at all, doesn't suggest learning). How often was class canceled or let out early?

These types of questions don't measure learning directly, but they get at necessary traits a course must have for learning to have a chance of happening. As a faculty member whose children attended the same college, you learn a lot about what goes on, and it isn't inspiring.
Yes. As a dean I could read all the evals in my college, and they told a story of sorts (who was good at feedback, who cancelled classes, who had favorites, etc.) but not much else. But outside those evals, everyone knew -- mostly from alumni who came back to praise!
Latent in your own 25% hunch is a reminder that the absence of a single metric doesn’t necessarily mean the absence of real comparative knowledge. I’m curious how much the TBA operationalized approach will resemble rhetorical analysis or close reading—the fitness of a text to its topic and to its reader.
One question I often come back to is whether anyone actually wants quality teaching. We know from the literature that SETs don't track learning. What administrations want isn't learning but graduation rates, donations, exciting courses, happy students, high enrollments, etc. What parents want, if we can trust the K-12 school choice literature, rarely tracks value added.
I am very sympathetic to the call for increased quality in higher ed teaching and learning. I am skeptical that it is easy to measure and implement. The cited studies rely on random assignment of students into introductory courses and then follow student performance in subsequent (first-, second-year) coursework. I think it follows that universities could reasonably engage in random assignment for first-year mass-enrollment courses and assess those instructors' (usually grad students and lecturers) work. It doesn't follow that you can pursue this strategy in any other context. You don't have enough students taking giant cohorted classes to make randomization work, and you don't have a clear "Calc II follows and USES Calc I" logic. And as you have argued in other writing, we should be handing all the first-year instruction to AI tutors in any case and focusing our time and effort on upper-division courses, where this assessment strategy doesn't work at all.
The other solutions might be workable, but it would be more helpful to grapple with the actual cost of implementing them. How much faculty time does it take, e.g., to move to active learning strategies relative to chalk and talk? (I did this last year and this year. Answer: A LOT.) What would a full-blown mentor-teacher program look like in terms of time and dollar expenses? I'm not sure what the data here look like, but it's worth discovering.
Any system that fails to reward excellence or get rid of mediocrity will suffer the fate you so aptly describe. Of course, that's what you would expect any good economist to say.
Thank you for this important article. I think you have one key line here that says it all: "Everyone knows who the talented teachers are and who the terrible ones are." That's especially the case for the smartest students. They are the ones we should be surveying. Good students can tell when a professor knows his/her stuff, communicates in an interesting and helpful way, and challenges students appropriately through assignments and lessons. General, end-of-course surveys that all students complete are pointless. Each university should be seeking out the best of the best students and conducting interviews with them on how they view the school's instructors.
Also, good teachers like to see other teachers in action. In my many years as a professor, I would constantly sit in on other classes (sometimes in my leadership role as chair, but often just for fun). I enjoyed seeing how other teachers ran their classes. And I could usually tell within five minutes who the good and bad teachers were.
As you said, everyone knows. But we just need to be asking the right people to do the evaluating.
Thank you for writing -- yes, exactly. The challenge of explaining this to those who haven't seen what you've seen (and what I see) is harder than I expected. All advice on expanding this conversation welcome!
Hollis, I agree with your core claims. My question is this: what would data that speaks to the quality of teaching look like? Clearly neither SETs nor RMPs are even in the ballpark. As an ancient learner and teacher, I know good teaching when I see it, but what data would be persuasive for measuring it? I have some ideas, but I'm pretty sure they are not scalable.
I agree with most of the points made in your post, with the exception of your equating quality teaching with elite institutions in your first paragraph and the claim that higher education is bereft of clear frameworks/rubrics on teaching effectiveness. I have worked in teaching centers at elite liberal arts colleges, state universities and regional private colleges. I have observed tens of classes at each institution, and there are most definitely examples of amazing teaching at each. In no case was there a correlation with an institutional type or even a distinct discipline.
In terms of frameworks, there is a definite trend of more colleges putting in place specific frameworks for what constitutes excellent teaching. This applies to online, blended and face-to-face courses. Much of the work started online as we worked to show that online teaching is 'as good as' face-to-face instruction. There is not time here to go into the long-dead 'no significant difference' debate.
The frameworks are not perfect but there is a great deal we know about what good teaching looks like. Unfortunately, it is a huge battle to get these frameworks adopted as part of promotion and tenure and to have research on teaching accepted as scholarly research, so again, implementation is spotty. But they do provide a great starting point for any institution interested in aligning their statements of teaching quality with clear indicators of teaching effectiveness.
Thank you for all of this. My reason for claiming that there is better teaching at elite universities is about the lower tails alone, and the dependence on untrained and sometimes unknown adjunct labor. Many lecturers are excellent. But I have seen appalling decisions made when there is a spike in enrollment in a unit and suddenly teachers are needed a week before the semester starts. Appalling decisions. That doesn't happen at elite institutions. Nobody really understands how common it is at community colleges and mid-tier publics.
Can't disagree. I have seen the same thing. On the other side are the research professors at elite institutions who simply should not be in the classroom at all. I witnessed one professor at an Ivy League university state on the first day of class, exact quote: "I hate teaching freshmen and I don't give A's." Large, research-centric universities are also notorious for putting TAs in lower-level classes. We could argue about which universities do the best job teaching overall, but essentially we are in agreement. There is no rigorous, systemic support of excellent teaching, and AI is calling out our inability to define and demonstrate the teaching of flexible, critical thinking.
This is a needed article. I taught at a mid-level state university for 30 years.
I was a good teacher. I know because I had 30 years of student course evaluations: written comments and numerical ratings on standardized questions (both types of evaluations were standard). My two highest numbers (like twin peaks sticking up) were in (1) standards/expectations and (2) overall quality of instruction. Thousands of ratings. Expect more, and students work harder, learn more, and feel their money was well spent. Show that you are working hard, and they will work harder.
I was a HARD grader. I took attendance. I never had a single student complain about high standards. They liked achieving something that gave them confidence. No grade inflation for me or in our department. That wasn't the acceptable norm.
Most of my students were first-generation college students. I loved them and respected them. They didn't expect anything except for me to make demands on them to learn. They worked in high school, during summers, and during the school year to go to college. (Did I say how much I loved them?)
It's THEIR opinions that mattered. All of my colleagues were evaluated as I was...every course, every semester.
When I was hired, in 1977, the Department Chair told me, explicitly, "you are here to teach."
I agree with many of your points about teaching quality. It's a hard problem.
I am confused by your last claim that current commercially available AIs can provide better instruction than 50% of college teachers. Do you know of many students who have used them in this way and found them to be effective at teaching?
I agree that LLMs often give good answers to many questions. IMO, the confident wrong answers make them not yet useful unless you are already an expert.
Since you use the analogy, is there good data in the sense that you mention in this essay on:
The quality of doctors.
The quality of mental health therapists.
The quality of lawyers.
The quality of judges.
The quality of accountants.
The quality of plumbers.
The quality of electricians.
The quality of investment managers.
The quality of academic administrators.
The quality of assessment professionals.
The quality of consultants.
The quality of chefs.
You get the point: service economies frustrate attempts to provide fixed quantitative measures of their quality for multiple reasons, but first and foremost because the question of the nature of quality is by its nature unsettled and because outcomes are by their nature ontologically, for real, difficult to measure and always will be.
You can build metrics for some of these professions that try to make the satisfaction of clients and customers into data, but even then there are very big issues. Ask a patient about doctor satisfaction while the patient is still dependent on that doctor and you will get one kind of data. Ask a patient after they've moved to another professional and you will get another. Ask a patient to assess a doctor who is treating an intrinsically difficult condition with low rates of reported success no matter what and you will hear one thing (in part depending on whether the patient understands the material reality of their condition) and ask a patient to assess a doctor who has treated an easily resolved problem and it will sound as if the doctor has removed a thorn from the lion's paw.
It's just a hard problem and that's all there is to it. It's not a dirty secret, it's life. Whether someone providing a service satisfies someone needing the service is not something we'll ever be able to measure in a way that banishes doubt, ambiguity and judgment. With education, rather like medicine, we have the extra problem that the experience a student is having might be the only time they ever have that experience. I might in a long life have used many plumbers and begin to have a basis for evaluating the difference between good plumbers and bad ones. But with professors, doctors, and a number of other professionals, it might be that the only people who have a deep basis for comparison are the professionals themselves. And there, yes, you do have a problem, if not quite a "dirty secret", which is that professionals are generally inclined to give each other the benefit of the doubt, and to preserve the integrity of professional relationships with one another more than to provide critical insight into less-than-sufficient practice by another. That's the real hard problem to crack, not the creation of better metrics to break people on the managerial wheel more effectively.
In the end, I really feel that anything that starts from the perspective that most professionals are bad at their jobs is a non-starter. I think for the most part people who aren't doing that well in the estimation of some of their clients are laboring under horrible systemic constraints. In the case of professors, that's teaching 500-1000 people in introductory surveys, with a 4-4 load, unresponsive administrations, no sense of deep values or mission in the institution, and poor compensation. In those circumstances, I don't really look to the professional as the problem, any more than I think to myself that a battlefield doctor in a war zone has maybe had to make compromises in the service they provide.
Yes to all of this and this is the conversation that needs to happen. Silence about it all just leads to critiques where there is no solution. Thank you!
The last paragraph of Timothy's comment is very important. I've been there, one hundred percent. That's why I got out of higher ed and into K-12. K-12 educators have been thinking about this subject for a long time, and they can and should contribute significantly to this discussion. It is a huge, huge topic, but worthwhile, and I'm glad Hollis is spending time and effort on it.
I agree with the rest of your comments, but I don't think it's true that most people who their colleagues think are bad teachers are suffering from systemic constraints. That is certainly not my impression personally, although to the general point of the article I have very limited evidence.
I think you're right that there is a kind of "insider critique" that separates out people that we know are dealing with difficult conditions (who often excel despite, or even because, of that) and people that we know are either very unmotivated by teaching or just not very skilled at it. But I personally would struggle to put into words both how I know (or think I know) when someone's not bringing it in the classroom and how to describe precisely what it is that's not working. I think the vocabulary you need to describe that is often closer to trying to explain why an artistic performance is off than it is explaining a technical failure like "operated on the wrong foot". Though there are examples of pedagogical malpractice that are that kind of bad too--say, for example, telling students that X, Y and Z are the main focus of a high-stakes exam but the actual exam focused on C, G and N instead.
Love this comment, Timothy, but I don't agree that it's a non-starter to think that "most professionals are bad at their jobs." It depends on what you mean. I think there are two possibilities here:
Possibility 1: Given the constraints most academics face, it's not plausible that most of them are bad at their jobs. The average professor is average at teaching, given those constraints.
Possibility 2: Given a certain outcome we want--the measurable development of human capital to a fairly significant extent--it's not plausible that most professors are bad at their jobs.
If you take possibility 1, you're right. It's a non-starter. If you take possibility 2, it's easy as pie that most professionals are bad at their jobs.
Now, I'm guessing you don't think there's a good measure for human capital supplementation. That's a plausible position, but I don't take it, for two reasons.
First, I think it gives us bad incentives to give up on a measure. I think most of us are far too complacent about how much thought and effort we put into teaching. When you say that it's too difficult to come up with a good measure, it's easy to move from there into assurances that you're doing a good enough job because you can point to a couple of experiences you've had.
Second, the incentives are really strongly aligned to doing a bad job. We know some things about what it takes to teach well--retrieval practice, spaced learning, interleaved learning, the testing effect, etc.--and I think a lot of do them in only a haphazard way. One reason we (IMO) do a bad job is that it's hard to do these things. A lot of them are quite boring for us, and would force us to change our teaching from doing it in the way we like. Second, students don't want us to do this. If they had to spend more time doing work for classes in ways that improve learning, classes would become less easy and less fun for them. But most students already (a) don't care about learning for learning's sake, (b) only want a degree to get a job, and (c) see most of the general electives as obstacles rather than instruments to getting that job. And as far as (c) goes, they're right!
You'll notice my two reasons for not taking your position about the difficulty of measuring are not evidentiary but practical. That is, while I agree that's there's a significant likelihood that we won't reach a great measure, I think coming to that conclusion has bad practical outcomes that lead to us teaching worse. (That's my first reason against taking your view.) Second, I think that right now, not trying to get a measure will be replaced by some other measure (e.g., student evaluations) that is worse, and we also have all sorts of have reasons to think that what we're doing now is not only not optimal, but closer to pessimal than it is to optimal.
Sorry if I'm not expressing myself well, but I've become somewhat consumed by this issue lately.
Like your post about silos, you've written things here that have been simmering away in my brain for years. Thank you for putting these words out into the world.
The failure to address poor teaching is endemic and, as you say, exacerbated by using satisfaction survey to assess teaching. Essentially, they measure how likeable an instructor and at best function as quality assurance...assuming there is a department chair or dean willing to do the thankless work of removing the teachers who actively alienate their students from learning.
Measuring quality in human performance is always a fraught and complicated task. Better measures only get us so far and I'm skeptical that legislators are in a position to make positive change.
My hope is that all the pressures on institutions of higher learning, AI among them, force a significant rebalancing of resource allocation in favor of teaching. We know what quality improvement looks like in research...it's called peer review.
If a college or university wants to improve teaching it will invite that 25% of teachers "who provide mentorship, complex critique, and human accountability" and have them run a program aimed at elevating teaching quality and assessing it using peer review methods.
Thank you Rob -- I am grateful for your perspective particularly. The people who are most loud about "doing something" are generally unbalanced themselves, which is why I particularly like your phrase "significant rebalancing" of things in favor of teaching. I do not want to blame teachers but this awful system. Yes to your ideas for improvement!
What criteria would be used to assess good versus bad university teaching?
Other than anecdotes, what is there?
One way to assess it would be to have students take national, standardized tests before and after completing certain courses, and compare their scores horizontally against other teachers and other schools, and before and after they took the course. For example, a basic chemistry course could be assessed this way. But I have never heard of anything remotely like this existing.
What would probably work would be the reintroduction of aptitude tests, which were largely eliminated circa 1970 because they generated "disparate impact" results for minority students.
I would be interested to hear your suggestions for measuring university level teaching quality.
Also, my recollection of conservative critiques of university education is that they exclude conservatives and promote a left ideology. That is a separate question from "teaching quality" along some other axis.
This is the topic of my next post! I'm working on "how to operationalize" my definition. You are correct that there is nothing remotely like a real assessment of teaching quality. And yes you are correct that the ideological critiques are not about quality and never have been. Here I am angering everyone in one post. Wish me luck!
You are pointing out the Emperor's nudity. No anger from me! What angers me is that a huge sector of our economy, which is mandatory (no real career prospects without a BA unless you are in the trades), crushingly expensive (six figures of debt when you are just starting out in life?), and has an unmatched influence on all aspects of cultural and political and economic life (trust the experts!), has literally gone for generations with ZERO objective basis to assess the effectiveness of its core function -- teaching! (Yes, professors think research and publishing and talking to each other at conferences is their core function, but everyone else thinks they are supposed to be teaching.) If you were the proverbial Martian asking how the people in this ridiculously favored sector got such a sweet deal, the answer would start out, "Well, it's a long story ..." But however we got here, we need to get out of here. This ain't working.
The work of Eric Mazur, physics prof at Harvard, is relevant. The origin story of his 'flipped classroom' approach to physics teaching begins with his own discovery that testing his students for problem-solving skills. The results shocked him in that demonstrated almost no useful learning of what was taught in the course. His subsequent changes to the course (flipping the classroom to have rote and fundamentals assigned for off-class work and in classroom to emphasize live problem solving). I am not doing justice to the story in my summary, and better descriptions can be found at https://bokcenter.harvard.edu/flipped-classrooms or in numerous Eric Mazur youtube videos. This approach worked to greatly improve results for his physics students. It helps that teaching from first principles is relatively easier in physics. My own experience in trying to flip the classroom in a pathophysiology class proved much more difficult, in part due to the large amount of knowledge that does not come from first principles but has to be memorized (e.g. the many components of the inflammatory response, etc.). My main point is to agree with the potential of before and after testing regimens to assess teaching effectiveness.
Thank you! I am learning a great deal since posting the piece, for which I am deeply grateful.
Aptitude tests, as the name implies, are not designed to measure teaching and learning, but the in the moment capacities of the test taker. They're not a particularly good tool for measuring the specific effect of a class or teacher on a student's knowledge or capacities.
What would be a good test?
Currently there is nothing.
That is untenable going forward. The entire industry has lost trust and credibility, deservedly so. If the teachers cannot demonstrate their value, many of them are going to be unemployed. Every other profession has to produce, and prove they are producing, or they are unemployed. A correction is long overdue on this.
While I think instruction could be much improved, and spend most of my time working to improve that instruction, I think the quality (or lack there of) of college teaching is well down the list of challenges to higher education institutions. The "lost trust" has much less to do with the on-the-ground quality of the teaching vs. other structural factors, many of which are out of the hands of institutions themselves (though I am also a frequent critic of how institutions play the hands they're dealt).
I never had a hard time determining how much (or how little) my students learned. Importantly, neither did my students. If institutions genuinely cared about teaching quality by creating the conditions for good teaching to happen, it would not be impossible to get a strong sense of student learning.
The much harder task would be to truly measure the teacher's specific impact on that learning. As I've written elsewhere, the confounding conditions and variables are too numerous to even wrap one's head around: https://www.insidehighered.com/blogs/just-visiting/why-measuring-teaching-success-so-complicated
The greater danger, IMO, is doing to higher ed what's happened to K-12 and creating a system where instructors are incentivized to teach to the test, creating an inevitable shit show once Campbell's Law kicks in, something that is well-documented in the K-12 space.
I agree with all of this. My father taught calculus and statistics at a community college for three decades. His students learned the material, and he knew they knew it.
The problem now is that students are paying too much for what they are getting, incurring debt, and the political anger is building. The "fix" will likely be worse than the disease if we have people being "taught to the test" up to age 22 rather than 18 -- compounding a mistake and corrupting the measuring methodology.
But time is running out for the college-level educational industry to clean its own house. I have children who recently college age, and the entire situation is an outrage and a disgrace. You are expected to mortgage your future for a piece of paper. We found work-arounds. But many people will not. It is a hostage situation, and the hostages are going to revolt.
The system isn't sustainable and hasn't been for some time. I published a book during the pandemic trying to argue that the faults that had been exposed by the events should have made it clear that institutions organized around collecting revenue, rather than on the educational mission were ultimately doomed.
There is a way forward, but it involves recognizing post-secondary as a public rather than private good and organizing operations around that value. It means a system where schools try to optimize something other than prestige as part of the competition for students.
That all sounds good.
The prestige competition and the status anxiety that drives it should be eliminated from the equation. People can get status from achievement in adult life, whether in scholarship, business, or whatever. Not based on college admission as a teenager.
Your work has been foundational to my thinking John and I appreciate your perspective. I think we both agree the lack of data is an institutional failure, not a failure of faculty certainly. My weariness about never being able to do anything whether as a dean or a chair about terrible teaching is what I am mostly speaking to.
I appreciate that frustration. My frustration is from the bottom of that relationship namely, why can institutions not figure out how to reward the quality of my work and keep me in a sustainable position that allows me to do that work. Measuring my teaching had no effect on that challenge. As I say, I think there's lower hanging fruit, but the first step would be to convince institutions that teaching matters.
The “definition” of teaching swaps “teaching” with “classroom experience,” and defines “high-quality” as “successful” and “verifiable.” These sound like the _outcomes_ of good teaching (ie, learning), not good teaching itself. This is like defining “good parenting” by the outcome of “well-adjusted” young adult.
Here’s an improvised attempt to get closer to the thing itself: Good teaching is a mode of _explanation_ and _demonstration_ which, when combined with feedback, compels the student to increase their own mental involvement with the material in the direction of mastery. This mastery is then _detected_ in demonstrable knowledge and skills.
My guess is that your intuition about who’s a good teacher hinges on their ability explain and demonstrate things, (ie, how smart they are) combined with their adeptness at feedback (ie, how mature and sincere they are). These criteria seem awfully close to aspects of someone’s character or personality, which puts them squarely in the last mile.
I went to a 'good' school. Not an Ivy, generally considered just below. A lot of my classmates were unable to get into Harvard/MIT/Standford; it was a lot of people's 2nd school. Anyway, the science and engineering dept's had good reputations, as did the medical school and associated health fields. I would call probably 2 of my professors good quality over that time, and maybe another 4 average or slightly above. The rest were objectlively awful, and many of them I think were completely aware of it and didn't care. The main difference is that the poor professors didn't want to teach, at all. Many were quite open about it outside of the immediate classroom context. They want to do reasearch; its why they toughed it out to get the phd in the first place. If they wanted to teach they'd have gone into Education, not biomedical engineering or physics. The school had a reputation for good research projects going back decades. This is what draws these profs. The teaching is a burden to be borne to allow access to the research.
I figured out pretty quickly that I was not going to receive any actual instruction from these people at the undergrad level. They were simply the holders of the hoops through which I must jump for the degree. All actual learning took place between the student, their peers, the course material, and a tutor if needed. The tutor program was actually pretty good, and they were the ones who taught me how to identify the profs that are only there for the research and status. They also pointed out the groups of students who are fully aware of how the system works, have no expectation of learning anything inside the classroom itself, and only appeared in class to get the next assignment, turn in the last one, and take tests. They also told me which professors to absolutely avoid as they both do not want to teach and they have an attendance policy that will fail you regardless of scores on assignments and tests. This was back in the 90s too, I imagine this approach is even more popualar now than ever with the assignments being both received from and turned into an online source that the university bought the course from, with the profs serving primarily as test proctors to prevent cheating. I'd have been fine with this too.
Thank you for writing this. I fear this is far too common. I am hoping that my piece will spark some recognition of how variable (because this is really the key) teaching is so that we can better support the excellent teachers and perhaps ease the poor into something else.
Hi there, this is a great post. I wanted to point you towards a yet-unpublished economics working paper that tries to do exactly what you're calling for (and it seems you haven't yet come across it).
https://merrill-warnick.github.io/merrill-warnick/PostSecVA_Latest.pdf
"Instructor Value-Added in Post-Secondary Education" by Warnick, Light, and Yim (July 2025)
The gist is that they come up with a way to estimate college-professor-value-added using only transcript data. That means this method can be used at any higher ed institution.
They also show that their method is equally valid as quasi-experimental methods in the literature. From the abstract: "Using a unique policy at a large public university in Indiana, we show that our method accounts for selection just as well as methods that exploit conditional random assignment of students to courses. We next show that our method reduces forecast bias in a wider variety of institutions using data from nearly all public universities in Texas. We find that individual instructors matter for students’ future grades and post-college earnings in many subjects and courses."
Some cool findings:
1 SD better instructor leads to +0.13 GPA, +17% future earnings
Student evaluations are uncorrelated with value-added
Observable characteristics explain <2% of variation in value-added
A value-added-based retention program could increase students' future earnings by 2.7%
Thank you! Are you the person who also linked this in the comments at MR? I had missed this paper, which mostly helps my larger claim, since the data are non-public and research-only, yes? Universities are not running these studies for internal accountability or guarantees of quality. There is still no student or parent who can log in and see “these sections are 1 SD above the mean on future GPA or earnings.”
As I understand it, what the paper shows (and it is a pretty elegant design) is that instructor-level value-added to both learning and income is technically measurable at scale with data universities already have. In other words, studying value added is feasible, as I suggested it could be, but no university yet wants to be this transparent.
The paper also supports my claims: that instructor quality varies widely even within the same course (as everyone knows); that rank, credentials, and the like are weak signals of real instructional value; that student evaluations correlate only weakly with value-added and more strongly with grading leniency and satisfaction; and that serious work on teaching quality proceeds entirely in terms of value-added rather than ideology.
In short, I can't say “no data,” but I can still say “almost no data”: there is still almost no institutionally produced or student-facing value-added data. The main obstacle is institutional will and incentives!
Also, the big question -- how will AI change this?
Yes, I cross-posted on MR 🙂. My response there was snippy so I apologize for that.
I agree with everything you said. I, too, believe that institutional will is the "limiting reagent" to producing this at scale. In my opinion the pinch point is a desire to not single out poor performers. You see this with salary schedules and annual evaluations at many public institutions. Up until recently at OU, for example, there were always "across-the-board" raises given to all faculty regardless of performance. OU also recently made a move to reduce the variance in annual evaluations (moving from a 500-point rating scale to a 3-point scale). This benefits the worst performers at the expense of the stars. Just like grade inflation for students in the classroom.
Hopefully it's only a matter of time before these more sophisticated methods become commonplace. Academic Analytics, for example, does similar statistical measurement of faculty research productivity. It's not a far cry to imagine they could do something similar with student transcript data.
To the extent that AI forces GPA compression, that will undermine our ability to compute value-added (at least for successive GPA outcomes).
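To make that concrete, here is a toy simulation (all numbers invented, nothing from the paper) showing how compressing grades toward the ceiling attenuates the measurable gap between two instructors:

```python
# Hypothetical illustration: the same true instructor gap becomes nearly
# invisible once grades are compressed toward the 4.0 ceiling.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
instructor = rng.integers(0, 2, n)           # two instructors
effect = np.where(instructor == 1, 0.2, 0.0) # instructor 1 adds 0.2 grade points

# Normal grading: grades spread around 3.0; compressed grading: everything
# squeezed near the top of the scale.
normal = np.clip(3.0 + effect + rng.normal(0, 0.6, n), 0, 4)
compressed = np.clip(3.7 + 0.3 * (effect + rng.normal(0, 0.6, n)), 0, 4)

for label, g in [("normal grading", normal), ("compressed grading", compressed)]:
    gap = g[instructor == 1].mean() - g[instructor == 0].mean()
    print(f"{label}: estimated instructor gap = {gap:.3f}")
```

The true gap is identical in both scenarios; compression just shrinks the signal until it disappears into the ceiling.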
Another commenter here had a long comment suggesting that the problem is a lack of comparison of materials. I don't think that matters. In fact, I think forcing materials on faculty reduces quality and innovation. I'll be writing more soon!
Not only is there no direct data on teaching quality, we avoid even indirect data. Ask students how many hours a week they spent doing homework (you won't learn much if you don't do much). Ask how often work was assessed and how quickly it was returned (2-3 papers a semester, handed back at the end of the semester or not at all, doesn't suggest learning). Ask how often class was canceled or let out early.
These types of questions don't measure learning directly, but they get at traits a course needs for learning to have a chance of happening. As a faculty member whose children attended the same college, you learn a lot about what goes on, and it isn't inspiring.
Yes. As a dean I could read all the evals in my college, and they told a story of sorts (who was good at feedback, who cancelled classes, who had favorites, etc.), but not much else. Outside those evals, though, everyone knew -- mostly from alumni who came back to praise!
Latent in your own 25% hunch is a reminder that the absence of a single metric doesn’t necessarily mean the absence of real comparative knowledge. I’m curious how much the TBA operationalized approach will resemble rhetorical analysis or close reading—the fitness of a text to its topic and to its reader.
One question I often come back to is whether anyone actually wants quality teaching. We know from the literature that SETs don't track learning. What administrations want isn't learning but graduation rates, donations, exciting courses, happy students, high enrollments, etc. What parents want, if we can trust the K-12 school choice literature, rarely tracks value added.
I want quality teaching!
(said in the voice of the old guy from the Simpsons episode: "I was saying Boo-urns!")
I am very sympathetic to the call for increased quality in higher ed teaching and learning. I am skeptical that it is easy to measure and implement. The cited studies rely on random assignment of students into introductory courses and then follow student performance on subsequent (first- and second-year) coursework. It follows that universities could reasonably use random assignment for first-year mass-enrollment courses and assess those instructors' (usually grad students' and lecturers') work. It doesn't follow that you can pursue this strategy in any other context: you don't have enough students taking giant cohorted classes to make randomization work, and you don't have a clear "Calc II follows and USES Calc I" logic. And as you have argued in other writing, we should be handing all the first-year instruction to AI tutors in any case and focusing our time and effort on upper-division courses, where this assessment strategy doesn't work at all.
The other solutions might be workable, but it would be more helpful to grapple with the actual cost of implementing them. How much faculty time does it take, e.g., to move to active learning strategies relative to chalk and talk? (I did this last year and this year. Answer: A LOT.) What would a full-blown mentor-teacher program look like in terms of time and dollar expenses? I'm not sure what the data here look like, but it's worth finding out.
Any system that fails to reward excellence or get rid of mediocrity will suffer the fate you so aptly describe. Of course, that's what you would expect any good economist to say.
Thank you for this important article. I think you have one key line here that says it all: "Everyone knows who the talented teachers are and who the terrible ones are." That's especially the case for the smartest students. They are the ones we should be surveying. Good students can tell when a professor knows his/her stuff, communicates in an interesting and helpful way, and challenges students appropriately through assignments and lessons. General, end-of-course surveys that all students complete are pointless. Each university should be seeking out the best of the best students and conducting interviews with them on how they view the school's instructors.
Also, good teachers like to see other teachers in action. In my many years as a professor, I would constantly sit in on other classes (sometimes in my leadership role as chair, but often just for fun). I enjoyed seeing how other teachers ran their classes. And I could usually tell within five minutes who the good and bad teachers were.
As you said, everyone knows. But we just need to be asking the right people to do the evaluating.
Thank you for writing -- yes, exactly. The challenge of explaining this to those who haven't seen what you've seen (and what I see) is harder than I expected. All advice on expanding this conversation welcome!
Hollis, I agree with your core claims. My question is this: what would data that speaks to the quality of teaching look like? Clearly neither SETs nor RMPs are even in the ballpark. As an ancient learner and teacher, I know good teaching when I see it, but what data would be persuasive for measuring it? I have some ideas, but I'm pretty sure they are not scalable.
I agree with most of the points made in your post, with two exceptions: the equation of quality teaching with elite institutions in your first paragraph, and the claim that higher education is bereft of clear frameworks/rubrics on teaching effectiveness. I have worked in teaching centers at elite liberal arts colleges, state universities, and regional private colleges. I have observed tens of classes at each institution, and there are most definitely examples of amazing teaching at each. In no case was there a correlation with institutional type or even a particular discipline.
In terms of frameworks there is a definite trend of more colleges putting in place specific frameworks on what constitutes excellent teaching. This applies to online, blended and face-to-face courses. Much of the work started online as we worked to show that online teaching is ‘as good as’ face-to-face instruction. There is not time here to go into the long dead ‘no significant difference’ debate.
The frameworks are not perfect but there is a great deal we know about what good teaching looks like. Unfortunately, it is a huge battle to get these frameworks adopted as part of promotion and tenure and to have research on teaching accepted as scholarly research, so again, implementation is spotty. But they do provide a great starting point for any institution interested in aligning their statements of teaching quality with clear indicators of teaching effectiveness.
Here are just a few:
A Compilation of Resources on Teaching Effectiveness (Google Doc): https://docs.google.com/document/d/1j7Em9bhHz8J9pf1KDlDsk1C3O5B-BHunJohJHBviOwc/edit?usp=sharing
Association of College & University Educators (ACUE) Effective Practices Framework https://acue.org/?acue_courses=acues-effective-practice-framework
Quality Matters (Online & Hybrid Teaching) https://www.qualitymatters.org/qa-resources/rubric-standards/higher-ed-rubric
College and University Classroom Environment Inventory https://case.edu/ucite/sites/case.edu.ucite/files/2018-02/College-and-University-Classroom-Environment-Inventory.pdf
The Career Framework for University Teaching (Royal Academy of Engineering)
(background) https://www.raeng.org.uk/publications/reports/career-framework-for-university-teaching-background (Framework) https://www.teachingframework.com/
TEVAL Project: http://teval.net/index.html
(Frameworks) http://teval.net/resources.html
Thank you for all of this. My reason for claiming that there is better teaching at elite universities concerns the lower tail alone, and the dependence on untrained and sometimes unknown adjunct labor. Many lecturers are excellent. But I have seen appalling decisions made when there are spikes in enrollment in a unit and suddenly teachers are needed a week before the semester starts. Appalling decisions. That doesn't happen at elite institutions. Nobody really understands how common it is at community colleges and mid-tier publics.
Can't disagree. I have seen the same thing. On the other side are the research professors at elite institutions who simply should not be in the classroom at all. I witnessed one professor at an Ivy League university state on the first day of class, exact quote, "I hate teaching freshmen and I don't give A's." Large, research-centric universities are also notorious for putting TAs in lower-level classes. We could argue about which universities do the best job teaching overall, but essentially we are in agreement. There is no rigorous, systemic support of excellent teaching, and AI is calling out our inability to define and demonstrate the teaching of flexible, critical thinking.
This is a needed article. I taught at a mid-level state university for 30 years.
I was a good teacher. I know because I had 30 years of student course evaluations: written comments and numerical ratings on standardized questions (both types were standard). My two highest numbers (like twin peaks sticking up) were in (1) standards/expectations and (2) overall quality of instruction. Thousands of ratings. Expect more, and students work harder, learn more, and feel their money was well spent. Show them you are working hard, and they will work harder.
I was a HARD grader. I took attendance. I never had a single student complain about high standards. They liked achieving something that gave them confidence. No grade inflation for me or in our department; that wasn't the acceptable norm.
Most of my students were first-generation college students. I loved them and respected them. They didn't expect anything but for me to make demands on them to learn. They worked in high school, during summers, and during the school year to go to college. (Did I say how much I loved them?)
It was THEIR opinions that mattered. All of my colleagues were evaluated as I was: every course, every semester.
When I was hired, in 1977, the Department Chair told me, explicitly, "you are here to teach."
I agree with many of your points about teaching quality. It's a hard problem.
I am confused about your last claim, that current commercially available AIs can provide better instruction than 50% of college teachers. Do you know of many students who have used them in this way and found them to be effective at teaching?
I agree that LLMs often give good answers to many questions. IMO, the confidently wrong answers make them not yet useful unless you are already an expert.