The Uni-Files

A candid look at EFL life and lessons from a university teacher's perspective.

November 28, 2010

Corpses of corpora

Collecting corpora. How do they do it?

I visited a hospital in Singapore recently to conduct some on-site research into what nurses actually say and do on the job, or an analysis of specialist discourse (to make it sound more pretentious). This was my first attempt at doing anything even remotely related to corpus development or confirmation, and I’m hoping a few things that I learned might ultimately be of benefit to ESPers, those who teach English to nurses or other health care workers, or anyone carrying out English corpus or specialist discourse research.

First things first: recording these types of language domains is pretty much impossible. Even if you could closely mic one nurse, it would come out as either nonsense or impenetrable jargon. The resulting script would look something like this:

Nurse A: No. That one. (Unintelligible). Can. You need it now? 36- Dialysis. Yes.
Nurse B: (unintelligible)

It's a mess. It looks like the aftermath of a language accident.

There are also huge privacy and liability issues, doubly so at a hospital, and administrators are understandably hesitant to allow this type of intrusion. Triply so when someone is trying to do an important (and busy) job. You can feel like a real prat following people around, holding up mics and jotting down notes, ear cocked into the conversation from behind like a cub reporter trying to get his precious scoop. But I had no choice, so my trusty notebook and I trekked around several wards, attached to ‘my’ nurse.

But interestingly, even this incomplete record of the workplace reveals a lot. One can immediately see the truncated speech forms, how herky-jerkily dynamic the interactions are, how ellipsis becomes an integral feature of speech, how specialist jargon is regularly and widely used as stand-alone transactional content (lists of data figure heavily in medical discourse), and even how local varieties (the stand-alone ‘can’ being very Singaporean) enter the fray.

This is, as you might realize, quite different from the types of dialogues one tends to find in English textbooks. Although textbooks are certainly utilizing corpus-based discourse more than they used to, there still exists the overriding tendency to represent speech in full sentences, invariably complete, orderly and ‘correct’. While this may have pedagogical benefits for the learner, whether it is accurately descriptive is of course another matter:

Idealized version: “Good morning, Mr. Chen. I’ve brought you your breakfast.”
Actual version: “Breakfast!”

It is also very interesting to note discourse framing features such as: the percentage of the time nurses communicate with patients vs. other nurses vs. other health workers (two-cent answer: nurse-patient communication ranks well down the list percentage-wise). Why and when code-switching (both in terms of register and English variety) occurs also makes for interesting analysis. How are structured speech events, such as roll calls and handovers, organized to maximize communication? Catching all this takes some serious effort.
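For anyone wanting to put rough numbers on the first of those framing features, here is a minimal sketch, assuming the field notes have been hand-coded as (speaker, addressee) pairs. The function name, the category labels and the sample entries are all invented for illustration; they are not observations from the Singapore visit. Note too that it counts turns rather than clock time, which is only a crude stand-in for "percentage of the time".

```python
from collections import Counter


def addressee_shares(turns):
    """Return each addressee category's share of the logged turns, as percentages."""
    # Count how many logged turns were directed at each addressee category.
    counts = Counter(addressee for _speaker, addressee in turns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {addressee: 100 * n / total for addressee, n in counts.items()}


# Hypothetical hand-coded notes as (speaker, addressee) pairs. Not real data.
sample_turns = [
    ("nurse", "nurse"),
    ("nurse", "nurse"),
    ("nurse", "doctor"),
    ("nurse", "patient"),
]

if __name__ == "__main__":
    for addressee, share in sorted(addressee_shares(sample_turns).items()):
        print(f"{addressee}: {share:.0f}%")
```

A tally like this says nothing about why a turn was addressed to whom, so it would sit alongside the qualitative notes rather than replace them.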

Another factor also comes into play: ‘incorrect’ English. One has to try to determine whether a language ‘violation’ was a result of the vagaries of spoken vs. written English, with the two following different sets of governing rules. Or was it just heat-of-the-moment sloppiness? After all, we all speak non-prescriptively at times; we get tongue-tied, are less than eloquent, and change logical courses halfway through our speech. Or was it the local English variety? Or perhaps the person was not really a native speaker of English.

A Japanese colleague is visiting the U.S. to do research for a similar purpose and I myself will be visiting some other locales to get a better sense of other varieties of nursing English. Ultimately, we hope to piece together a rounded and accurate picture of what should be prioritized, or even included at all, in ESP teaching materials such as Nursing. But, man, this is much harder than I thought.

Comments on any readers’ experiences of researching, developing or using corpora-based ESP materials are welcome.

Most learner corpora I've seen tend to be collected via slightly more artificial means in far more controlled situations. For example, a teacher interviews a learner and records it. Or a learner is asked to perform some kind of task while being recorded.

Have you looked at needs analysis research? I can't recall any details, but I remember Michael Long describing during a talk a pretty in-depth needs analysis of flight attendant English using strategies similar to the ones you are using.

Hi Treb.

You nailed it with the controlled setting-artificial means comment. The problem in such situations is that they invoke a type of observer's paradox. People tailor their commentary or behaviour to the expectations or wishes of the observer, or alter it in some way that makes it unnatural and therefore not truly descriptive.

While we haven't conducted a formal needs analysis, we have, through interviews and research on existing teaching materials, tried to discover what holes exist in J-based medical English and what should be prioritized for acquisition (for in-service medical professionals) based upon that. We've also looked heavily at existing corpora close to our topic, notably that of Yvonne Ford at the U of Michigan.

Widdowson makes an interesting comment about certain facets of dynamic, interactive speech not being conducive to strictly quantitative corpus research. In fact, he had a lot to say that was critical as to how corpora are applied.
