Keyword analysis - Approaches to identifying specialised vocabulary for ESP

Vocabulary and English for Specific Purposes Research - Averil Coxhead 2018

As with corpus comparison, keywords are identified by looking at their frequency in different corpora and using a statistical formula to compare them with a norm. One way to do this is to compare the frequency of words in a corpus from one specialised field against their frequency in a general corpus. The concept of keyness is linked to the probability of a word occurring in a text: if a word has a high level of keyness, its frequency in that text is unlikely to be due to chance. The Lancaster University Corpus Linguistics website has a good example of the concept of keyness, using an analysis of Baptist Church newsletters and a general corpus.
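The comparison of a specialised corpus against a general one can be illustrated with a minimal sketch. It assumes log likelihood as the statistical formula, which is one commonly used keyness statistic among several. The function names, the toy token lists and the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) are illustrative assumptions, not taken from any of the studies discussed here.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood score for one word across two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(study_tokens, ref_tokens, threshold=3.84):
    """Return (word, score) pairs overused in the study corpus, highest first."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    results = []
    for word, freq in study.items():
        # Keep only words that are relatively more frequent in the study corpus.
        if freq / n_study <= ref[word] / n_ref:
            continue
        score = log_likelihood(freq, n_study, ref[word], n_ref)
        if score >= threshold:
            results.append((word, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Invented toy data standing in for a specialised and a general corpus.
specialised = "the dose of the drug and the dose interval".split() * 50
general = "the cat sat on the mat and the dog sat too".split() * 50
print(keywords(specialised, general)[:3])
```

In this toy example, a word such as "the" is slightly more frequent in the specialised list but does not reach the threshold, whereas "dose" does. This is the sense in which a high keyness score suggests that an occurrence is probably not by chance.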

Paquot’s (2010) Academic Keyword List (AKL) was developed using keyness, range and distribution of vocabulary in two academic written corpora (professional writing and student writing by native speakers of English). The study included single and multi-word items, and incorporated high frequency lexis. Examples from the AKL include same, second, which, scope, requirement, leading, late, according, according to and relation to. The full AKL is available online. Gilquin, Granger and Paquot (2007) point out the potential of learner corpora for comparative studies between writers in English with different first languages, for example, or at different levels of proficiency, and for comparisons with first language corpora (see Flowerdew, 2014, for more examples of studies using keyword analysis in ESP).

A second way to use keyword analysis in specialised texts is exemplified in a study by Grabowski (2015), who wanted to find keywords in a corpus of pharmaceutical English containing four kinds of texts from the field: patient information leaflets, summaries of product characteristics, clinical trial procedures and chapters from academic textbooks. He did not use a general corpus, being interested instead in the keyness of lexical items across the four kinds of texts in pharmacology, that is, in how each text type differs from or resembles the others. To help with the comparison, Grabowski (2015) set a minimum number of occurrences for a word and used log likelihood to determine the probability that an occurrence of a word is due to chance. Log likelihood is a statistical measure for comparing word frequencies in corpora and for testing whether lexical items occur more often in one section of a corpus than in another (see McEnery, Xiao & Tono, 2006). Through this analysis, Grabowski was able to rank the four kinds of texts according to the number of keywords in them. The academic textbooks contained the highest number of keywords and the information leaflets for patients contained the fewest. Grabowski (2015) then provides examples of keywords linked to the communicative purpose of the texts in the corpus. Kwary (2011) points out that keyword analyses do not take multi-word units into account, stating that this is a drawback of such studies.
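A comparison of text types within one corpus, without a general reference corpus, can be sketched along the following lines. This is not Grabowski's actual procedure, which is not detailed here. It assumes one plausible setup in which each text type is compared with the other three pooled together. The function names, the minimum frequency of 5, the threshold of 15.13 (the chi-square critical value for p < 0.0001 with one degree of freedom) and the token lists are all invented for illustration.

```python
import math
from collections import Counter


def g2(a, c, b, d):
    """Log likelihood for frequency a in c tokens vs. frequency b in d tokens."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    # Zero observed frequencies are skipped (0 * ln 0 is taken as 0).
    terms = [obs * math.log(obs / exp) for obs, exp in ((a, e1), (b, e2)) if obs > 0]
    return 2 * sum(terms)


def rank_text_types(subcorpora, min_freq=5, threshold=15.13):
    """Count keywords per text type, each compared with all the others pooled."""
    counts = {name: Counter(tokens) for name, tokens in subcorpora.items()}
    sizes = {name: len(tokens) for name, tokens in subcorpora.items()}
    keyword_totals = {}
    for name in subcorpora:
        # Pool the frequency counts of every other text type.
        rest = Counter()
        for other, counter in counts.items():
            if other != name:
                rest.update(counter)
        rest_size = sum(sizes.values()) - sizes[name]
        keyword_totals[name] = sum(
            1
            for word, freq in counts[name].items()
            if freq >= min_freq
            and freq / sizes[name] > rest[word] / rest_size
            and g2(freq, sizes[name], rest[word], rest_size) >= threshold
        )
    return sorted(keyword_totals.items(), key=lambda item: item[1], reverse=True)


# Invented toy token lists; they do not reproduce any real pharmaceutical corpus.
toy_corpus = {
    "leaflets": "take the tablet with water".split() * 20,
    "summaries": "the tablet contains the active substance".split() * 20,
    "trials": "the trial enrolled patients with the condition".split() * 20,
    "textbooks": ("pharmacokinetics describes absorption distribution "
                  "metabolism and excretion").split() * 20,
}
ranking = rank_text_types(toy_corpus)
print(ranking)
```

The ranking produced by the toy data is an artefact of how the lists were constructed and says nothing about real texts. The sketch also works on single words only, which reflects the limitation regarding multi-word units that Kwary (2011) raises.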