
The Uni-Files

A candid look at EFL life and lessons from a university teacher's perspective.

August 19, 2009

Why I Don't Get Statistics

I don’t get statistics.

OK, let me qualify that a little. I don’t get statistics the way they are used in a lot of EFL/ESL research. It’s true that I am not a stat boffin, although I was one of those kids who would memorize things like the projected 162-game pinch hitting averages for entire major league baseball teams. This, of course, might be considered to be more ‘numerical data memorization’ than statistics per se. Nonetheless, I have a basic non-academic understanding of stats and can appreciate when they are used to make interesting predictions about the natural sciences or explain social trends. Trust me, it's not jealousy at work here.

Now I know there are people out there who are very passionate indeed about statistics. In fact, many such people like to view anything and everything statistically, even their spouses' mood swings. And the fact that you would definitely not want to invite these people over for a pool party is totally beside the point.

What is thoroughly connected to the point though is the trend of many EFL/ESL journals to try and make themselves look more academically ‘rigorous’ by packing in charts and tables of what I can only regard as stat porn. Now this isn’t my own blog site so I will not name the more egregious offenders, and I will guess that you, dear reader, can easily find examples of what I’m talking about.

And what I’m talking about is:

1. Cases where the stat tables and related discussion actually serve to obscure a worthy point:
A statistical table should serve to illustrate the point you are making more viscerally, not obscure it. If you have reliably discerned that “discrete item grammar point test preparation has little or no influence on university entrance exam performance” then for crying out loud, say it! Don’t hide it in some statistical quagmire that would take Mossad codebreakers to unravel.

2. Cases where an elaborate statistical analysis ends up stating only a completely mundane point:
So, you’ve ‘proven’ that “students who have extended study abroad experience tend to be more familiar with English colloquialisms”. And you believe that this is something that can be seen as a result of your intricate statistical analysis. Hmmm.

3. Cases where that which is actually not mathematically complex is dolled up in sexy, saucy stats to make it seem intricate and deep:
OK. You asked 15 of your students if they felt that pre-activity vocabulary lists helped them more than deep-ending or post-activity highlighting. 10 said yes, 5 said no. There it is. t-values, standard deviations, chi-squares and the like mean virtually nothing here.

4. Cases where the numbers involved are so narrowly differentiated that they cease to mean anything:
So, the mean of sample A is .004572. The mean of sample B is .004578 and so on down through sample F, which has a mean of .004569. What the hell does all that mean? Is this a significant difference? I don’t know because when I look at numbers like these I might as well be looking at Thai or Arabic script.
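To put a rough number on the 10-versus-5 classroom poll in point 3, here is a minimal sketch of an exact binomial test. It assumes (my assumption, not anything from a particular paper) that the null hypothesis is a 50/50 split between yes and no; the function name is made up for illustration.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided p-value against a 50/50 null hypothesis.

    The null distribution is symmetric, so the two-sided value is
    twice the probability of the more extreme tail, capped at 1.
    """
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# The poll from point 3: 15 students, 10 said yes, 5 said no.
p_value = binomial_two_sided_p(10, 15)
print(f"10 of 15 said yes; two-sided p = {p_value:.3f}")  # prints p = 0.302
```

Under that assumption, a split at least this lopsided turns up by chance roughly three times in ten, so the formal test says little that "10 said yes, 5 said no" doesn't already say.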

And this is to say nothing (Ok- obviously I am saying something) of the fact that many stat-based research papers in EFL/ESL are based upon surveys and/or highly subjective calculations of nebulous phenomena.

With surveys, your output is only as good as your input (duh!). Are the questions valid and reliable or vague and loaded? Are sufficient options given? Are the questions comprehensive- in relation to the hypothesis being tested? Are the subjects taking the survey seriously? Is the sample sufficient and representative? Are they being guided into certain responses? Are there other affective factors which would invalidate their responses? I have seen many studies involving elaborate and detailed statistical analysis in which the surveys that provided the input were poorly designed and quite obviously contained faulty and biased assumptions. Some were clearly designed to prove the hypothesis the researchers already believed in NO MATTER WHAT!

However, the writers or editors seem to have been dazzled by the glossy statistical display and have overlooked more tactile shortcomings. This can be doubly vexing when the writer (or a subsequent citation) takes the view that the sophisticated stats ‘proved’ the point, as though the stats actually created the phenomena. I should also add that some of the most dubious research papers that I’ve come across have displayed low Margins of Error. Well, yes, that's because statistical Margins of Error ironically often fail to notice blatant, ummm, errors.
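The Margin of Error remark can be made concrete with a small sketch. It uses the standard normal-approximation formula for a survey proportion; every number in it is hypothetical and invented purely for illustration, including the idea that a loaded question pushes the observed share 20 points above the true one.

```python
from math import sqrt


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    This reflects random sampling variability only. It knows nothing
    about loaded questions or unrepresentative respondents.
    """
    return z * sqrt(p_hat * (1 - p_hat) / n)


true_share = 0.50      # hypothetical: what respondents actually think
observed_share = 0.70  # hypothetical: what a loaded question reports
n = 1000               # hypothetical sample size

moe = margin_of_error(observed_share, n)
bias = observed_share - true_share

print(f"reported margin of error: +/-{moe:.3f}")
print(f"distance from the truth:  {bias:.3f}")
```

In this made-up case the reported margin is under three points while the result is off by twenty. A bigger sample would shrink the first number and leave the second untouched.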

By ‘highly subjective calculations of nebulous phenomena’ I am referring to those types of studies where emotional reactions to input are classified and then artificially gauged or otherwise numerated. For example, you are comparing Japanese responses to FTAs (face-threatening acts) with British ones. And you are doing this by observing the responses of each subject in a clinically designed FTA scenario. You have classified no reaction as a 0, a slight visible reaction as 1, and so on all the way up to major confrontation as 5. First, you should recognize that you are only measuring your own subjective skills of observation and interpretation, and that representing this wholly subjective enterprise via a number does not lend it instant objectivity. And we might also ask if the 0-5 scale is an accurate measure of the alleged categorical differences. Or is this akin to some pinball games giving you scores in increments of millions while another pinball game tends to calculate scores only in increments of thousands (thus leading the average 15 year old in 1975 to deem the former game to be the ‘better’ one)?
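The pinball worry can be sketched in a few lines. The ratings below are invented for illustration and come from no study; the only assumption is that the 0-5 codes are ordered labels, so any relabeling that preserves the order (here, scoring a major confrontation as 10 instead of 5) is just as defensible as the original.

```python
from statistics import mean, median

# Hypothetical observer ratings on the 0-5 reaction scale.
group_a = [1, 1, 5]
group_b = [3, 3, 3]

# An equally arbitrary, order-preserving relabeling of the same categories.
recode = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
recoded_a = [recode[score] for score in group_a]
recoded_b = [recode[score] for score in group_b]

print("original means:", mean(group_a), mean(group_b))
print("recoded means: ", mean(recoded_a), mean(recoded_b))
print("medians:       ", median(group_a), median(group_b),
      "then", median(recoded_a), median(recoded_b))
```

With the original codes group B has the higher mean; with the relabeled codes group A does. The medians keep the same order both times, which hints that a comparison of means here owes as much to the choice of numbers as to the reactions observed.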

The most obvious example that I can think of is Hofstede’s Cultural Dimensions (the link sends you to a site which offers a quick scan of the theory. The man’s own website is This research is supposedly to aid us in understanding cultural proclivities (although oddly Hofstede seems to associate one country with one culture). I urge readers to take a look at the categorizations and subsequent valuations made and ask if they cannot immediately intuit that there is something very wrong going on here- not the least of which was asking people in each of the target countries to characterize themselves (the old ‘emic perspective must be true’ motif).

Yet of course because these ‘conclusions’ are ultimately manifested in statistical form it has a greater ‘truth’ impact upon many readers. After all, the MAN HAS CRUNCHED NUMBERS! AND NUMBERS DON’T LIE!

Personally, I tend to be very old school, meat and potatoes, in my approach to surveys and the like. I certainly try to avoid the hubris that my numbers ‘prove’ anything and at best might say that they ‘suggest’ or ‘bring into question’ X, which is often about as far as you can go in a lot of EFL/ESL research. Anyway (shameless self-plug warning), I recently published the results of a survey suggesting some reasons why those Japanese who are excellent at English managed to outdistance their peers, in the online ETJ Journal. I think it’s easy to read and understand, if not definitive. Take a look.

Most of all it does NOT contain any stat porn.



Hi Mike

I really enjoy your column here, and was very interested in the latest topic. I have almost no training in statistics (gave up maths happily at 16) and so feel intimidated by them sometimes.

I too often wonder what the point of all those numbers is in certain articles ;)

However, that aside, do you think it is necessary for EFL practitioners to become proficient in statistics in order to publish?

Thanks for the comment, Ben!

Obviously, I think stats are overrated in a lot of humanities research and are often used in lieu of sound argumentation. However, it would never hurt to have an academic knowledge of the field and in some corners (not EFL/ESL) of humanities research, it is essential. At the very least, it certainly seems to impress editorial boards, and if used concisely and carefully, can augment a paper. But augmentation is the key word here: it should never be used as a sexy substitute for sloppy, or pithy, hypothesizing.


Depending on the area of your research you may be forced to brush up on stats. Over the past few years I've had 3 papers rejected by refereed journals for not having made use of sufficiently rigorous statistical analysis.

Perhaps it was a valid critique of my research or perhaps it was just an easy way for the reviewer to reject a paper sent to a journal overwhelmed by submissions.

I think Mike could make another blog post criticizing the use of stats just on the basis of n-sizes. Despite the incredibly complex nature of language acquisition I'm always amazed at the statistical gymnastics conducted on tiny sample sizes. For example, research on a single class of 30 students is supposed to mean something to the other one billion people learning the language?

I read your column this morning and then returned to my current book, The Black Swan, by Nassim Nicholas Taleb. Born in Lebanon, he went to Wharton Business school and became a "quant" on Wall Street, using statistics to foresee trends. The black swan is an exceedingly rare event that has high impact on any situation.

(Page 154 of Paperback edition)
"The most interesting test of how academic methods fare in the real world was run by Spyros Makridakis, who spent part of his career managing competitions between forecasters who practice a 'scientific method' called econometrics--an approach that combines economic theory with statistical measurements. Simply put, he made people forecast in real life and then he judged their accuracy. This led to the series of 'M-competitions' he ran. [...] Makridakis and Hibon reached the sad conclusion that 'statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.'"

I share the view that much of the statistics we see in the ELT literature is "stat porn."

Most statistical studies that we see in ELT fall into the broad category of "statistical inference." In other words, a researcher uses data obtained from a "sample" to "infer" the characteristics of an entire "population" of interest. Such tests are based on assumptions, and if the assumptions are not warranted the tests may be worthless.

The one grand assumption underlying all of these methods is that the researcher has defined a population of interest, asked a question about that population, drawn a "representative sample" (often involving a very complicated procedure) for the purposes of the study, and collected the data from that sample. In such a case, the calculated statistical results can be fairly generalized from the sample to the population, subject to the significance levels, confidence intervals or the like that are associated with the chosen method.

The typical ELT researcher is a classroom teacher, and the typical "sample" is the students in one of that teacher's classes. Of what population is this class a "representative sample"? All students in the department? All students in the university? All universities in Japan? All universities in the world? All humans in the world? Do we really have any reason to believe that this convenient sample is representative of anything other than itself? If we have no such reason, then the only population that we can make inferences on is a figment consisting of all students who are just like the students in this particular class. For any other type of student, the result is just qualitative if it is applicable at all, and the carefully calculated statistics mean nothing.

Since nobody really cares about such a mythical population, why bother with statistical inference at all? Just eyeball the data, calculate a few summary measurements, display a few tables and graphs, THINK about what it all means, and tell us what you think, clearly and simply. Unless you've done a rigorous sampling at the beginning, that's the best you can do, and for many practical (i.e., non-academic, non-journalistic) purposes, that's good enough.

Please note that I am not criticizing statistical methods themselves. These are mathematically sound, and when rigorously applied, they work. However, in the real world, we need to distinguish when we are getting rigorous results that mean something, and when we are making porn.
