What can Visible Learning effect sizes tell us about inquiry-based learning? Nothing.

Time to read: 8 minutes

I haven’t read Visible Learning; I’ve only skimmed through it a couple of times, largely because the book isn’t aimed at me. Visible Learning and its effect sizes are only useful for informing learning and teaching in highly instructional settings, and I’m not interested in highly instructional settings. I think they’re a poor substitute for authentic learning, and I think students would be much better served by inquiry-based learning and teaching approaches. If Visible Learning were about that, I’d read it. It isn’t, so I haven’t and I won’t.

Which is why I was very disappointed to read Dan Haesler’s reporting on his interview with John Hattie, here, here, and here, where John reportedly critiques student-centred learning, inquiry, 21st century skills, and constructivism. Except for constructivism, where this poor paper is cited as evidence (I’ll explain later why the paper is extremely poor), nothing in the evidence quoted suggests to me that the claims John makes come from anywhere other than Visible Learning’s meta-analysis. These statements about inquiry-based learning and 21st century skills (though that isn’t my favourite term) have compelled me to write this post to challenge their validity.

Statements like this from John deeply worry me…

“We have a whole rhetoric about discovery learning, constructivism, and learning styles that has got zero evidence for them anywhere.” (Note: I’m not defending learning styles!)

…and this next statement is worrying as well…

“I’m just finishing a synthesis on learning strategies, it’s not as big [as others he’s done] there’s only about 15 – 20 million kids in the sample, and one of the things that I’ve learnt from the learning strategies, and a lot of them include the 21st Century skill strategies is that there’s a dirty secret.”

If the synthesis that John is speaking about uses the same approach as Visible Learning’s meta-analysis, then the dirty little secret is that the research is invalid and can’t be trusted.

There has been a bit of talk about the maths in Visible Learning; to me this is a distraction from the real problem with the book. You only have to get to the second page of the preface to find the first huge problem, as we read…

“This is not a book about qualitative studies. It only includes studies that include basic statistics (means, variances, sample sizes.) Again this is not to suggest that qualitative studies are not important or powerful just that I have had to draw lines around what can be accomplished in a 15 year writing span.” Visible Learning (preface xi)
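To be concrete about what those “basic statistics” get turned into, here is a minimal sketch (my own illustration, not taken from the book) of how a standardised effect size such as Cohen’s d is typically computed from the means, standard deviations and sample sizes of a treatment group and a comparison group. Visible Learning’s exact aggregation procedure may differ, and the numbers below are invented, so treat this as purely illustrative.

```python
from math import sqrt

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference between a treatment group and a comparison group."""
    # Pooled standard deviation, weighting each group by its degrees of freedom.
    pooled_sd = sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical numbers: a class taught with approach X averages 72 on a test
# (SD 10, n = 30); a comparison class averages 68 (SD 11, n = 28).
print(round(cohens_d(72, 10, 30, 68, 11, 28), 2))  # ≈ 0.38
```

Notice that everything in that calculation is derived from a test score; whatever the approach was actually trying to achieve never enters into it.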

The next section outlines an even bigger, insurmountable problem with Visible Learning’s design and findings…

“It is not a book about criticism of research, I have deliberately not included much about moderators of research findings based on research attributes (quality of study, nature of design) again not because they are unimportant (my expertise is measurement and research design), but because they have been dealt with elsewhere by others.” Visible Learning (preface xi)

If you’re not interested in instructional teaching approaches and you reach a passage similar to either of the two above, my advice would be to put the book down, walk away, and advise others to do the same.

These two decisions, to omit qualitative studies and to not require the study design to match the object of study, render its findings virtually useless. Firstly, this restricts the definition of impact to numbered results (which almost always means test scores), and secondly it allows those numbers (test scores) to measure the impact of approaches that may not even be seeking to have an impact on test scores. In short, test scores are being used to measure things that may not be meaningfully measurable with test scores. Furthermore, other research that measures these same approaches against what they actually claim to do is disregarded. In Visible Learning all that matters is (presumably badly designed; more on that later) test scores; in classrooms we know that is simply not true. Visible Learning’s design makes it incompatible with a large number of learning theories, approaches and strategies, yet unfortunately it doesn’t admit this, and it still calculates an effect size for them.
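To make that last point concrete, here is a toy sketch (emphatically not Hattie’s actual aggregation procedure, and with invented numbers) of what happens when effect sizes from studies of very different relevance are pooled into a single headline figure, with no moderators for study quality or for whether the outcome measure suited the approach being studied.

```python
# Invented effect sizes from hypothetical studies of an inquiry-based approach.
study_effects = {
    "outcome aligned with the approach's own goals": 0.55,
    "narrow recursion test": -0.10,
    "irrelevant maths test": 0.05,
}

# A simple unweighted average: every study counts equally, however poorly its
# outcome measure matches what the approach was actually trying to achieve.
headline = sum(study_effects.values()) / len(study_effects)
print(f"Headline effect size: {headline:.2f}")  # 0.17
```

The headline number looks precise, but it says nothing about whether the underlying tests measured anything the approach set out to do.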

In Visible Learning’s world of research, the impact of inquiry-based approaches can be measured by how well a student does on a badly designed and irrelevant test. Does that mean the impact of 21st century skills can be measured by an irrelevant test? Does that mean the impact of constructivism can be measured against an irrelevant test? It is as if all that matters is the test.

The Visible Learning study design chooses not to require the object of study (e.g. inquiry-based learning, 21st century skills) to be evaluated against its benefits. Instead it allows inquiry-based learning, 21st century skills and whatever else to be tested against whatever the researcher deems important, say, for example, the ability to pass a maths test. Furthermore, the requirement that the studies produce a number surely results in favouring studies that align non-instructional approaches with instructional outcomes. Visible Learning then omits what are surely better designed studies of non-instructional approaches by excluding qualitative research. The end result is that, for non-instructional approaches, Visible Learning ends up omitting well designed research while including poorly designed research.

To see that these questions aren’t just hypotheticals, let’s look at an example where these two failings, 1) the reliance on numbers and an irrelevant test, and 2) a misalignment between study design and study object, produce meaningless results that are then touted as evidence. Why don’t we briefly look at the paper that John himself purportedly suggests is a major investigation into constructivism: Should There Be a Three-Strikes Rule Against Pure Discovery Learning? by Richard Mayer.

Let’s look specifically at his third strike, which supposedly takes down Seymour Papert’s vision of discovery learning; of course, we’ll conveniently sidestep Mayer’s wrong assertion that Papert promoted constructivism when in fact he promoted constructionism. Does Mayer examine all of Papert’s and the MIT kindergarten group’s studies and seek to replicate them? Of course not; instead he makes the same design mistake that Visible Learning makes, trying to apply an instructional theoretical research approach to constructionism/constructivism/discovery learning. Mayer refers to the findings of two similarly deeply flawed studies, studies that seek to test what the researchers think the students should know against what they have actually learned. Actually, that’s not true: the two studies do not seek to find out what the students learned at all, which might have made for a better study…

The first study, Kurland & Pea (1985), provides no rationale that its fundamental programming concepts are indeed fundamental. Where is the authors’ explanation that the impact of constructivism/constructionism/discovery learning can be accurately measured by testing these fundamental concepts? This study is flawed because it wrongly judges the worth of the approach against a test that has nothing to do with the worth of the approach.

Where is it asserted, anyway, that LOGO is designed to teach a predefined set of fundamental programming concepts? Absolutely nowhere; the authors just made it up. They’ve used a badly designed test of students’ knowledge of recursion! Why recursion? Simply because the authors think it is important, not because the purpose of LOGO is to teach recursion (spoiler: it is not) or because the purpose of discovery learning is to learn recursion (again, spoiler: it is not). If using a test to evaluate LOGO and discovery learning wasn’t bad enough, they’ve made it even worse by limiting the test to recursion. Sure, if you want students to learn recursion quickly, do some research into that, but don’t try to extrapolate or misconstrue the results to make unfounded claims against LOGO and discovery learning.

A proper study would assess impact based on the overt goals of constructivism/constructionism/discovery learning. This study didn’t; instead it wrongly measured impact against a measure suitable for instructional approaches, using a very bad test that produced very bad numbers.

The design of the second study, Fay and Mayer (1994), is even worse, and its findings should be believed even less. The researchers taught programming using two approaches but only tested using one approach (bonus points if you can guess which approach was used to measure impact).

Now there’s a radical idea: someone, quick, do research into the effectiveness of direct instruction at delivering on the promise of constructivism!

The third study, Lee & Thompson (1997), is a thesis and too long for me to be bothered reading, yet a flip through the first twenty pages doesn’t fill me with optimism that the research uses constructivism/constructionism/discovery learning to understand and assess the impact of constructivism/constructionism/discovery learning. In fact it again seems like a study using a guided-instruction design to compare guided instruction with constructionism. One could guess that the study compares the approach of LOGO with an approach of LOGO plus a worksheet… by asking the students to complete a worksheet!


The Mayer paper that John cites, and its three examples, show why the impact of constructionism and LOGO can only be adequately understood through the theoretical lens of constructionism. Anything else does a huge disservice to Seymour Papert, to constructionism and to educational research. To reduce LOGO to “discovery learning” and believe that its value can be assessed by testing students on a specific, externally defined knowledge of recursion is both ridiculous and poor research. I don’t know whether these particular studies contributed to the Visible Learning effect sizes, but I’m sure many studies like them did: studies which supposedly prove that LOGO, constructivism, constructionism, discovery learning, and everything else outside of instruction don’t work as well as instruction.

If Visible Learning’s effect sizes did take study design into account, they would not be open to these errors and would be more believable. They don’t, and therefore I cannot see how anyone can have any confidence in them.

Visible Learning and its effect sizes probably adequately report on the impact of instructional approaches, but they cannot possibly adequately, or in good faith, report on the effect on student learning of using LOGO, constructivism, constructionism, discovery learning, or anything else based on a learning theory that is not grounded in instruction. If you want to know whether research finds evidence for any of these things, go and find research that uses the underlying theory as its lens of understanding; there is lots out there. Unfortunately this well designed and trustworthy research isn’t included in Visible Learning because it is qualitative (it has to be, by the nature of the object of the research), and therefore it is excluded from Visible Learning’s meta-analysis and does not contribute to Visible Learning’s effect sizes.

The same is true for John’s claim that his forthcoming meta-analysis illuminates the “dirty little secret” that 21st century skills don’t work. If in this forthcoming study 21st century skills are not evaluated against the purpose of 21st century skills, then the study is flawed and should not be trusted. If the forthcoming study uses the same study design as Visible Learning, then it too will be deeply flawed, and John’s claims about 21st century skills should be ignored.


Finally, just in case you’re not convinced by my critique of these research papers and of the importance of using a theory of learning to measure the impact of that theory of learning, let’s examine a more everyday critique of inquiry-based learning. I’m not sure whether it was the author, the sub-editor or John Fleming the interviewee who came up with the opening paragraphs of the article Schools cool to direct instruction as teachers ‘choose their own adventure’, but they are gold…

“WHEN teaching your four-year-old to tie their shoelaces, do you give them four pairs of shoes and tell them to try different techniques until they work it out? Or do you sit down and show them how to do it: make the bunny ears and tie the bow, watch while they try it, lending a helping finger if required, and then let them practise until they can do it on their own?

The two approaches illustrate different teaching styles used in classrooms. The first describes a constructivist method in which a child “constructs” their own understanding through discovery or activities, also referred to as student-centred learning.”

Of course, it is absurd to suggest that any rational person, let alone a teacher, wouldn’t use direct instruction to teach a child to tie their shoelaces. Unless, of course, you’re this guy, and you might point out that most people tie their shoelaces incorrectly.


Now, if your assessment of a child’s ability to tie their shoelaces is whether they can replicate the adult’s (wrong) method, say by testing them, then our use of direct instruction is working wonderfully well. If your assessment of the success of direct instruction is based on how many kids are running around the school grounds with their laces undone, or on how often they have to stop and retie them, then maybe direct instruction isn’t doing so well.

If direct instruction cannot even teach children to tie their shoelaces correctly, what can it possibly be trusted to get right? If direct instruction fails to make it obvious that the teacher is teaching shoelace-tying incorrectly, what else are teachers who use direct instruction getting wrong? In maths? In English? In everything?


Would a better study design show a more accurate effect size for direct instruction? A study design that looked beyond a simple numeric value produced by a test? For us to have any faith in academic research, I’d like to believe yes.


What if kids did use an inquiry approach to learn to tie their shoelaces? What if we did as the above article suggests, gave kids four pairs of shoes and asked them to work out the best method?

My bet is that we’d see benefits beyond actually having everyone able to tie their shoelaces correctly. Maybe we’d see students who were less accepting that there is a single right way. Maybe we’d see kids less inclined to believe that when something goes wrong it’s because they hadn’t followed the proper process accurately. Maybe we’d see kids being more critical consumers of the purportedly correct information they’re presented with.

Maybe we’d see a whole range of things… but while we continue to use instructional measures (predefined, narrow tests) to measure impact, we’ll never know.


Note: I have updated this post for clarity since publishing, and I will probably make further updates over the next couple of weeks, as I receive feedback.
