The toxic myth of good and bad teachers


There are a number of claims made by various people about the effect on a student of having a good teacher versus having a bad teacher. Most of these claims are nonsensical, and rather than increasing the likelihood of improvement in schools, they do a great deal of damage to teachers, students and schools, and make school improvement much less likely.

Why?

Because there aren’t many good teachers and there aren’t many bad teachers; most teachers are just average. We know this because we know that teacher quality, measured across all teachers, results in a normal distribution, a bell curve. Sure, we’d find that there are a few high performing teachers at the top end and a few low performing teachers at the bottom, but the vast majority would be in the middle with not that much separating them. If you’re a teacher who is reading this post, I’ve got bad news for you: you’re almost certainly an average teacher. Just as I am almost certainly an average teacher. While we’d like to think that we’re high performing compared to our colleagues, the actual evidence points to the contrary.

It would be the same if we measured the quality of carpenters, golfers, doctors, lawyers, public servants, scientists, whoever… but for some reason we don’t complain about the quality of other professions. Teacher bashing has become a convenient excuse for far too many critics.

A week or so ago, The Age newspaper identified some of the A+ teachers helping students to 40+ in VCE here in Victoria, Australia. In this article five teachers, whose VCE results stand out clearly above those of other teachers, are identified as great teachers. One teacher had ten of the top 14 students in VCE Sociology, another taught 17 of the top 33 students in Business, and another taught 2 of the top 8 students in Australian History. The results these teachers have achieved are exceptional, and clearly it is impossible to believe that every teacher could produce these kinds of results, but it is also clearly wrong to suggest that teachers are bad just because they aren’t producing these kinds of results.

Obviously, every parent whose child is undertaking VCE would love to have teachers who produced these results. Yet to believe that it is possible for the majority of teachers to produce results similar to those of the teachers in this article is plainly wrong. It is impossible for most teachers to have a number of students in the top bracket. Of course it shouldn’t be surprising that there are teachers who produce results like these; rather it is to be totally expected, as obviously when we consider the distribution of teacher quality, its distribution is shaped like a bell curve.

[Figure: the Empirical (68-95-99.7) Rule for the normal distribution]

This is why statements from people like John Hattie are so misleading. According to Hattie “teachers account for a variance of 30% in student achievement.” I’m not convinced this is true, but even if it is, is Hattie describing the maximum variance between the two limits of the bell curve? If he is, then the 30% variance only applies to a tiny fraction of exceptionally good teachers compared with the tiny number of exceptionally poor teachers. For the vast majority of teachers, whose quality lies in the middle of the curve, the variance will be close to non-existent. Sure, there may be a theoretical maximum 30% difference between the absolute best and the absolute worst, but for the vast majority the variance will be close to zero.

[Figure: Hattie’s 30% variance fitted to a normal distribution]

Looking at the graph above, Hattie’s 30% maximum variance between good and bad teachers, even if it is true, is vastly overstated. Three standard deviations either side of the mean of 15% fall at 6% and 24%, which covers the middle 99.7% of teachers. As such there is only an 18% difference across the middle 99.7% of teachers. Looking at two standard deviations (9% to 21%), which account for 95% of all teachers, slims the variance to 12%! Meanwhile the middle 68% of teachers (one standard deviation, 12% to 18%) show only a 6% variance in student achievement, a far cry from the stated 30%!
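To make the arithmetic explicit, here is a minimal Python sketch. It assumes, as this post does, that the effect is normally distributed with a mean of 15% and a standard deviation of 3% (the value implied by the 6% and 24% limits). These numbers are my reading of the 30% figure, not something Hattie states, and the names MEAN, SD and band are purely illustrative.

```python
# Sketch only: spread of a hypothetical "teacher effect" under a normal curve.
# Assumed (not from Hattie): effect runs 0% to 30%, mean 15%, standard deviation 3%.
MEAN = 15.0
SD = 3.0
COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical (68-95-99.7) rule


def band(k):
    """Return (low, high, width) of the band k standard deviations around the mean."""
    low = MEAN - k * SD
    high = MEAN + k * SD
    return low, high, high - low


for k, share in COVERAGE.items():
    low, high, width = band(k)
    print(f"middle {share}% of teachers: {low:.0f}% to {high:.0f}%, a spread of {width:.0f} points")
```

Changing SD shows how sensitive the conclusion is to the assumed shape of the curve, which is the part of the argument that cannot be checked from Hattie’s figure alone.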

Of course Hattie might not believe that teacher quality fits a normal distribution, but if so he needs to justify why he believes this and how many good and bad teachers there are in our schools. He may also suggest that the absolute maximum difference is more than 30% and that the 30% figure corresponds to the third or even the second standard deviation from the mean, but if that were the case, then surely that would be the figure promoted, or he would explain how many teachers are subject to this variance. But he doesn’t, so I believe it is safe and proper to fit a normal distribution to his variance claims.

Dylan Wiliam similarly promotes the myth of good and bad teachers. Speaking at the ALT-C conference in 2007, he said, “If you get one of the best teachers, you will learn in six months what an average teacher will take a year to teach you. If you get one of the worst teachers, that same learning will take you two years.” Again, I’m not agreeing with these figures; in fact I highly doubt them, unless of course Wiliam is speaking about pure memorisation and direct instruction, which he well may be.

Wiliam gives us a little more information than Hattie though. He suggests that his data doesn’t fit a normal distribution, where the average would be in the middle of the lower and upper bounds. If Wiliam’s data fitted a normal distribution then the average would be 15 months instead of 12 months, as 15 months is 9 months more than the lower bound of 6 months, and 9 months less than the upper bound of 24 months.  As such, Wiliam’s assertion fits what is called a positively skewed distribution, as shown below.

[Figure: a positively skewed distribution]

By graphically representing Wiliam’s figures, it is obvious that his claims are overblown. It is clear that the vast majority of teachers produce about the same results, and the teachers at the lower and upper ends of the impact are in the tiny minority. Also according to Wiliam, most teachers, that is more than half, are not producing a year’s worth of learning in a year! While the mean is 12 months, in a positively skewed distribution the mode will be less than the mean. In Wiliam’s world, more than 50% don’t produce a year’s worth of learning in one year… and somehow it is their fault??

 

Even if the maximum difference between the best and the worst teaching is 18 months, the actual variance among average teachers, and the variance across the vast majority (1, 2 or 3 standard deviations from the mean), would be much, much smaller, and again, just like Hattie’s 30% variance, extremely overstated for the majority of teachers and students. Even if Hattie’s and Wiliam’s figures are correct, the point of their message must be that this difference does not occur regularly in our classrooms but rather is an extremely rare exception.

 

The cumulative effect of Hattie, Wiliam and others suggesting that these rare and extreme examples of teacher quality variance are in fact common occurrences is that teacher quality is viewed as a much bigger problem than it is. Yes, if their figures are accurate, it should be a big concern that 0.15% of teachers (those more than three standard deviations below the mean) impact student learning outcomes much less than others (probably measured solely through test scores), but it is a rare problem rather than the systemic problem that many people believe it to be, and it should be seen and treated as such. Also, it should be recognised that the problem of variance in professional quality is not unique to teaching but rather occurs with the same distribution and to the same degree in every profession.

Atul Gawande posed the question “What happens when patients find out how good their doctors really are?” in his 2004 article Under The Bell Curve. Gawande describes the efforts across 117 cystic fibrosis clinics in the US over the last 60 years. We’d like to think that when we go to hospital we would get the same quality of care and would have the same expected outcomes regardless of which hospital we attend and which doctor attends to us. Yet Gawande tells us that isn’t the case at all: there are good doctors and bad doctors, with the good hospitals in 1997 reporting life spans 16 years above the average for cystic fibrosis patients! Gawande points us to the bell curve and reports that the vast majority of doctors and hospitals, however, are average and their patients have much shorter life expectancies.

Gawande then explores how hospitals reacted to the news that the care they were providing to their patients was average, and the efforts they made to increase the quality of the average majority. While there have been substantial improvements, Gawande insists that the bell curve remains, and will always remain, and the difference in life expectancy for cystic fibrosis patients (and patients with other life-threatening illnesses) will always be dependent on the quality of the care they receive.

Gawande finishes his article by examining himself as a surgeon. What if he found out that he was just average, or worse? For Gawande, however, the problem of being average isn’t as big as that of settling for being average, something that, I assume, Gawande would rather quit being a surgeon than do.

So do the doctors and hospitals who provide average quality care for cystic fibrosis patients at their clinics want to improve? One would hope so, and simply identifying them as average doesn’t mean they are happy being average; there is no evidence to suggest this. The bell curve in itself does not and cannot distinguish those who want to improve from those who don’t. In fact, Gawande points to patients who chose to stay with their average doctors because of the care they feel has been built up over a number of years.

It is distressing for teachers to acknowledge the bell curve. After all, we all want to view ourselves as being good teachers as opposed to being average, but a realistic understanding of how skills and knowledge are distributed across a cohort forces us to face this unwelcome truth.

Of course it would be easier if we actually could measure teacher quality which would allow us to measure, identify and quantify good, bad and average teachers. The problem is that we don’t have a universal way of understanding teacher quality; while various groups have tried, they haven’t done a good job of this. The previous government here in Victoria unsuccessfully tried to implement a system where school principals would rate their teachers from 1 to 5 in order to identify 20 to 40% of them as underperforming and not eligible for pay promotion. Clearly those suggesting this system believed that teacher quality in Victorian schools didn’t fit a normal distribution but rather a negatively skewed distribution. Of course, the only reason they had for this was budgetary.

This is where the toxic nature of talking about good and bad teachers is revealed. After all does it matter more about the actual distribution of teacher quality, or does it matter more about what people believe the distribution looks like? What happens when a myth is propagated that teacher quality doesn’t fit a bell curve but rather fits a negatively skewed distribution?

Furthermore, in the absence of appropriate data we do what most people do: we assume that we are a good teacher and therefore we are the definition of a good teacher. And if we’re not teachers ourselves, we base our view of good teachers on the teachers we had when we were at school. It’s almost as if we say, “I might not know what teacher quality is, but I know a great teacher when I see one.” Which might sound reasonable… but in reality these ideas have an incredibly narrow view of what a teacher is, and quickly descend into discrimination and teacher bashing.

Discrimination and teacher bashing? How?

Well, some people believe that to be a mythical great teacher you need to be a highly passionate, caring teacher. In this narrative great teachers are in the mould of Miss Honey from Roald Dahl’s “Matilda”, with a rare gift to inspire and connect with their students. These people point to inspirational teachers who taught them when they were in school, or the inspirational teacher they believe themselves to be.

This narrow understanding of teacher quality creates unrealistic expectations; it really is impossible for every teacher, in most schools, to have an amazing rapport with each and every student. As a result, a quality teacher becomes one who displays their passion for teaching by working long hours and having teaching as their only real priority, who is always positive and never has a bad day!

While we’d all like every teacher to be passionate about teaching, discrimination happens when we expect every teacher to be thinking only about teaching and willing to put in every hour they can. Single parents and others who, for a range of reasons, are unable or unwilling to devote every waking hour to teaching are quickly labelled as bad teachers, who should be moved on, overlooked for promotion or discriminated against in other ways. People with problems in their personal lives, or who suffer from medical conditions, might not always project this image of the inspirational teacher, and when we’re on the hunt for bad teachers these people can soon be in our sights…

Others believe that to be a mythical great teacher you need high level knowledge and skills. A good teacher is so much smarter and more knowledgeable than a bad teacher. Pretty soon though we’re lining up those teachers we don’t think are knowledgeable enough and moving them on. Tests have recently been proposed here in Australia to check that new teachers are literate enough to teach, despite them having passed their teaching degree and all their school teaching placements.

Older teachers who are not up with technology might be the first to go. Next might be women who have taken maternity leave and have a big gap in their experience or who are not able to (in our eyes) balance family/work. Next might be those who aren’t on Twitter day and night, attending professional conferences whenever they can in order to keep their skills up to date.

We’re all too quick to blame and label those teachers who aren’t just like us. Rather than celebrating diversity and considering what it might offer our students and our education system, we see diversity as being undesirable. We see diversity as being different from good, and we blame others for not being exactly like our picture of an ideal teacher, and we make erroneous assumptions about them.

Again, this is something that Dylan Wiliam gets really wrong when he says, “if we create a culture where every teacher believes they need to improve, not because they are not good enough but because they can be even better, there is no limit to what we can achieve.” While this might sound reasonable, sort of, where is Wiliam’s evidence that every teacher doesn’t currently want to improve? My confident guess is that teachers’ desire to improve is also distributed as a bell curve, and Wiliam’s assertion that there are many teachers who don’t want to improve is misguided and overstated.

Wiliam’s attempt to link a teacher’s desire to improve to the variance of teacher quality is also false. You cannot overcome the bell curve by wishing it away, any more than every golfer can play as well as professional golfers if only they wanted to improve! It is silly. And where does Wiliam’s faith in limitless potential derive from? Surely finding better approaches for learning and teaching is where limitless potential might be found, such as via new pedagogical approaches afforded by modern technologies?

But those who talk about good and bad teachers don’t want to find new pedagogical approaches, they’re happy with the system we’ve got. And shame on anyone who can’t be a good teacher in their system and can’t reap good results using their approaches. According to these experts, it’s not the bell curve that’s the problem, it is the teachers themselves.

Not only does this lead to discrimination, with anyone who doesn’t fit their mould being labelled a bad teacher, it also leads to not focussing on what could actually improve student learning outcomes. While we try to narrow the quality gap, whether it be Hattie’s 30% or Wiliam’s year and a half, we’re not concerned with why all teachers can’t successfully teach in Hattie’s and Wiliam’s systems. We’re not looking for pedagogical approaches (constructivism anyone? inquiry anyone?) that might not be so susceptible to such variances due to teacher quality.

Consider the Measures of Effective Teaching (MET) project, whose goal is to identify effective teaching. I’m still at a loss why you wouldn’t just use test scores as a predictor of future test scores, unless of course you’re trying to pretend that student learning isn’t just about test scores. Of course, if you want to try to pretend that you can measure effective teaching beyond test scores you can then appear to agree that learning isn’t just about test scores, which I guess is why MET suggests approaches that weight test scores somewhere between 33% and 50%…

In order for every student to achieve success we need learning and teaching approaches that are suitable for average teachers. We need to recognise that education of our students is far more than test scores. That is the first stage and until we’ve done that we need to lay off teacher quality. If Hattie, Wiliam, and others do believe that education is all about test scores then they need to be honest and upfront about that before we start labelling teachers as good and bad.

How many good and bad teachers there actually are matters a lot. Take for example the report “Great Teaching, Inspired Learning: What does the evidence tell us about effective teaching?”, where the authors say: “Modelling by the US economist Erik Hanushek estimates that if a student had a good teacher as opposed to an average teacher for five years in a row, the effect would be sufficient to close the average performance gap associated with low-socioeconomic status.”

But how likely is it that a student has a good teacher as opposed to an average teacher five years in a row? If we want the results that Hattie and Wiliam suggest the best of the best teachers can achieve, then we’re looking at the teachers above the third standard deviation, or 0.15% of teachers (half of the 0.3% that lies outside the middle 99.7%). How likely is it that a student would have these teachers for five years in a row? Assuming each year’s teacher is an independent draw, we can work this out by multiplying 0.0015 by itself five times:

0.0015 x 0.0015 x 0.0015 x 0.0015 x 0.0015 ≈ 0.0000000000000076, or about 0.00000000000076%

This is so unlikely you wonder why Hanushek would even bother suggesting this.

If we believe that teacher quality fits a normal distribution, how many standard deviations are we going to choose to identify good teachers? That is, where do we set the bar? Say we set the middle 68% as the average (one standard deviation), which means the top 16% will be good teachers. How likely is a student to have a good teacher five years in a row? Only about 0.01%! Alternatively, even if we believe that 80% of teachers are good, then only about 33% of students will have a good teacher for five years in a row. And where do Hattie, Wiliam and their peers set the bar? What is the tolerance of teacher quality within which their pedagogical approaches work?
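The five-in-a-row figures can be checked with a short Python sketch. It assumes that each year’s teacher is an independent random draw, which is a simplification, and that the share of teachers above three standard deviations is half of what the 68-95-99.7 rule leaves outside the middle 99.7%. The function name five_in_a_row is illustrative only.

```python
def five_in_a_row(share_good, years=5):
    """Chance of drawing a 'good' teacher every year for `years` years.

    Assumes each year's teacher is an independent random draw.
    """
    return share_good ** years


# Share above three standard deviations, from the 68-95-99.7 rule:
# (100% - 99.7%) / 2 = 0.15% of teachers.
top_tail = (1 - 0.997) / 2

for label, share in [
    ("top 0.15% are good", top_tail),
    ("top 16% are good", 0.16),
    ("80% are good", 0.80),
]:
    chance = five_in_a_row(share)
    print(f"{label}: {chance:.3e} (as a percentage: {chance * 100:.3g}%)")
```

Wherever the bar is set, the chance shrinks quickly as the years multiply together, which is the point being made about Hanushek’s five-year scenario.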

We have two choices: either we follow the path of Hattie, Wiliam and their peers, who think that our pedagogical approaches are set in stone and appropriate and that teacher variance is the problem, or we decide that teacher variance shouldn’t impact student learning and that our pedagogical approaches should instead ensure all students equally experience learning success. Make no mistake, a focus on teacher quality is incompatible with a focus on pedagogical innovation and improvement, and conversely a focus on pedagogical innovation and improvement is incompatible with a focus on teacher quality. We need to choose which focus we believe offers the biggest gains for increasing student learning and equity.

I believe that we need to find, and that we can find, learning and teaching approaches that work for almost all (99.85%) teachers. If we can find pedagogical approaches that work for 99.85% of teachers, then 99.25% of students will have access to exemplary learning experiences for five years in a row. This will not only result in better learning outcomes but also a system that is more inclusive, equitable and more diverse.
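The same calculation can be run in reverse to ask how many teachers an approach needs to work for in order to reach a given share of students. This is a rough sketch under the same independence assumption as before; teacher_share_needed is a hypothetical helper, and the 99% target is just an example value.

```python
def teacher_share_needed(student_share, years=5):
    """Share of teachers an approach must work for so that `student_share`
    of students are covered every year for `years` years.

    Assumes each year's teacher is an independent random draw.
    """
    return student_share ** (1 / years)


# Forward direction: an approach that works for 99.85% of teachers.
print(f"students covered for five years: {0.9985 ** 5:.2%}")

# Reverse direction: teachers needed to cover an example target of 99% of students.
print(f"teachers needed for 99% of students: {teacher_share_needed(0.99):.2%}")
```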

Improving our pedagogical approaches so that they work effectively for all students and teachers is a complex task, and one that we won’t be able to solve while we continue to apportion the blame on bad teachers.

For me, the choice is clear. We need to stop speaking about good and bad teachers, we need to stop worrying about teacher variance, and instead focus on what might actually make a difference in the lives of our students: developing higher quality learning and teaching approaches that are not limited by the variance of teacher quality.

 

Footnote: By the way, the bell curve as it relates to good and bad school leaders is also true. Sure, there may be a tiny few great school leaders, and a tiny few terrible ones, but most of them are just part of the average majority… as for doctors, public servants, politicians, car drivers, golfers, singers…

 

Update: Feedback from Andrew Worsnop suggests that I’m misusing Hattie’s 30% figure. I’ve expanded the section on Wiliam’s figures as it makes the same point. I haven’t changed my writing on Hattie’s 30% though, as I’m not sure that I agree with Andrew that I am misconstruing what Hattie is saying about the 30% variance in teacher quality/impact.

 

Image credit:  A visual representation of the Empirical (68-95-99.7) Rule based on the normal distribution. http://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG Creative Commons Attribution-Share Alike 4.0 International license.

8 thoughts on “The toxic myth of good and bad teachers”

  1. “What happens when a myth is propagated that teacher quality doesn’t fit a bell curve but rather fits a negatively skewed distribution?”

    What happens when a myth is propagated that teacher quality fits a bell curve?

    “…obviously when we consider the distribution of teacher quality, its distribution is shaped like a bell curve.”

    You offer no justification for this “obvious” assumption.
    You have several times mentioned that others need to justify the use of their apparent distributions, without at any stage justifying your own.

    “Of course it would be easier if we actually could measure teacher quality which would allow us to measure, identify and quantify good, bad and average teachers.”

    The fact that you now claim that this can’t be measured undermines your previous statement about comparison with other professions: “It would be same if we measured the quality of carpenters, golfers, doctors, lawyers, public servants, scientists, whoever…”

    ” “If you get one of the best teachers, you will learn in six months what an average teacher will take a year to teach you. If you get one of the worst teachers, that same learning will take you two years.” Again, I’m not agreeing with these figures, in fact I highly doubt them unless of course Wiliam is speaking about pure memorisation and direct instruction, which he well may be.”

    The end comment is rather odd. Do you mean to say that there is no difference between teachers in helping students understand ideas, but there is a difference in helping students remember things?

    “By graphically representing Wiliam’s figures, it is obvious that his claims are overblown. It is clear that the far majority of teachers produce about the same results, and the teachers at the lower and upper ends of the impact are in the tiny minority. Also according to Wiliam, most teachers, that is more than half, are not producing a year’s worth of learning a year! While the mean is 12 months, in a positively skewed distribution the mode will be less than the mean.”

    The mode is less than the mean, meaning that the modal teacher teaches the content in less than 12 months. So in fact, according to your representation of Dylan Wiliam’s statement, more than half of teachers are producing MORE than a year’s worth of learning in a year.

    “How likely is it that a student would have these teachers for five years in a row? We can working this out by multiplying 0.15 with itself five times

    0.015 x 0.015 x 0.015 x 0.015 x 0.015 = 0.00000000007%”

    This is assuming which teacher you get each year is independent of which teacher you get the previous year / the next year. This is a pretty bad assumption since it seems likely that teachers are not randomly distributed – plus, it is very common for a child to have the same teacher for a subject for several years (this happened to me for most of my subjects, even in a large school). Even if teachers are randomly distributed, they tend to stay in the same school for many years, so if you managed to get a good teacher the first year, your chances of getting one the next year are closer to 1 in 3 rather than 15 in 1000.

    “Make no mistake, a focus on teacher quality is incompatible with a focus on pedagogical innovation and improvement, and conversely a focus on pedagogical innovation and improvement is incompatible with a focus on pedagogical quality.”

    I wholeheartedly disagree, to the point that
    I would say that the ESSENCE of pedagogical improvement is improvement in teacher quality. Teachers can get better. If we didn’t believe that people can get better at things, then what’s the point of education in the first place?

    ***

    I believe that teachers actually account for considerably MORE than 30% of the variance of results. The reasons for this are very simple: Marva Collins. (Jaime Escalante is another example.) These were two amazing teachers who achieved remarkable things with their highly disadvantaged students. For example, Jaime Escalante had the largest number of students of any school in the state pass the Advanced Placement Calculus examination EVERY YEAR for about six years, after which he left due to arguments with the administration. And these were kids from a disadvantaged neighbourhood (mainly Hispanic, and poor). Marva Collins took the drop-outs from the elementary schools in the poorest, most disadvantaged area in Chicago, who were all illiterate and had very negative attitudes towards academics, and had them ravenously reading Shakespeare, Dante, Dostoyevsky, and a host of other classic authors within two years.

    Comparing the performance of the students of these teachers with students in general, we can see that the variance between them and other kids of similar backgrounds is HUGE – much larger than 30% for sure! Since the only apparently important variable that is different between the groups is who the teacher was, I put this huge variance down to the teachers.

    p.s. What’s more, don’t forget: rather than thinking in relative terms about how a teacher (doctor, golfer) etc. gets better compared to the rest, what about thinking about the whole distribution moving up in quality together in absolute terms? This should certainly be fairly transparent with golfers, who you mention in passing – in any sport, the quality of performance of all professional players tends to go up with time. Comparing golfers (or any sportspeople) today with those from 30 or 50 years ago, those of today are undoubtedly superior.

    This should stretch to teachers as well. We could (at least in theory) make ALL teachers better – it might not change the shape of the distribution, but it would nonetheless improve education for everyone by improving absolute performance.

  2. Thanks for the comment Stanislaw,

    I get your concerns, they’re common concerns. You base what you know on what you see. You see the good teachers you mention, you see the difference that they make, but from there you make assumptions that are simply not true. What’s worse, these assumptions have really bad consequences for students, teachers and schools.

    The example you give, is one of outliers, this is exactly my point, outliers tell us nothing of the 99.7% of the distribution.

    You might also be interested in this keynote from a recent programming conference where the keynoter makes the same claims about good and bad programmers https://www.youtube.com/watch?v=hIJdFxYlEKE

    1. The point I am making at the end: The outliers are at the edge of the distribution. The difference between the two ends of the distribution, you have written, is 30% of variance of student performance. I have added that the difference is that between a very low performing student (more precisely, a typical student under a very low-performing teacher) and a Marva Collins-type miracle student (more precisely, a typical student under a miraculously good teacher). It seems to me that to equate these two (i.e. to equate the last sentence with “30% of variance”) is absurd. I am claiming that the effect of the teacher is in fact potentially very large, and so improving all teachers would be a sensible idea. Much like the average amateur chess player now would have a fair chance of beating a world champion from the 19th century, I believe that the whole distribution can move up (in other words, the distribution stays the same shape but the mean improves dramatically).

  3. The point of this post, if that wasn’t clear to you, is to state that Hattie’s and Wiliam’s variance (whether it is 30% or 18 months) is absurd. Furthermore, it is even more absurd to suggest such variance can be overcome by teachers “trying harder” or identifying “bad” teachers.

    I believe we can improve things as well, the big question is how… and this post aims to show a focus on teacher quality is the wrong way.

  4. It is certainly true that it would be impossible to eliminate variance in teacher quality. I agree that eliminating teachers at the lower end of the distribution is problematic and probably futile, since there must always be someone at the lower end of the distribution. So I agree that labelling teachers as “good” or “bad” can be unhelpful.

    However, I disagree that such a variance itself is absurd. And I don’t see how this means that we shouldn’t try to improve teacher quality – just because the variance can’t be eliminated, doesn’t mean the mean can’t be improved. To compare with surgeons: if the best surgeons perform 98% of their surgeries well and the worst perform 60% well, if we just eliminated the worst then we’d subsequently feel that we had to eliminate the 65% surgeons since they are now “the worst”. This would be a bit silly, since we could then eliminate the new “worst” (70%), then the new “worst” after that (75%)… However, I still think that raising the mean surgeon performance (without firing any surgeons) is good – now we could have e.g. a 70-99% distribution instead of 60-98%, so there would still be a variance and still be a “worst” surgeon, but healthcare has improved for all.

  5. What you suggest sounds fine in theory but in reality it doesn’t work, because we’re unable to measure the right things. Report cards for surgeons have unintended consequences (see http://www.nytimes.com/2015/07/22/opinion/giving-doctors-grades.html) and similar attempts at measuring teacher effectiveness have produced similar detrimental outcomes (but without the deaths!)

    Here in Australia we’ve had high stakes testing for a number of years with our NAPLAN tests. We’re linking teacher quality to these results, and guess what, teachers now spend lots of time teaching to the test! And guess what, because of all this teaching to the test, results are going down!

  6. That is very sad. The problem of trying to measure something so soft is that you end up missing the point, as you mention the Australian authorities appear to be doing. With all that high-stakes testing around, when is there time for deep learning? This problem is really bad in the US as well. In the UK it’s not quite as bad, but still not great. 🙁
