Computer Science - Pre-university, undergraduate and postgraduate vocabulary

Vocabulary and English for Specific Purposes Research - Averil Coxhead 2018

Computer Science
Pre-university, undergraduate and postgraduate vocabulary

Lam (2001) researched the impact of new Computer Science vocabulary on first-year students at a Hong Kong university. With a group of 425 first-year Computer Science students, Lam (2001) used qualitative methods, including interviews, computer usage exercises and think-aloud techniques along with quantitative methods such as multiple-choice questionnaires to find out more about vocabulary in this subject area. Table 6.4 shows some examples of general, semi-technical and technical vocabulary in Computer Science according to Lam (2001).

Table 6.4 Examples of general, semi-technical and technical vocabulary in Computer Science from Lam (2001, p. 28)

Note that some examples in the second column in Table 6.4 might appear to be everyday vocabulary, such as parent and tree, but they also have a specific meaning in Computer Science. The learners in Lam’s (2001) study reported that while dictionaries were useful for developing an understanding of semi-technical vocabulary, the students used subject-specific glosses to help with their learning.

Table 6.5 Most frequent technical items and meanings in Computer Science from West’s second 1,000 words of the GSL (Radford, 2013, p. 32)

Radford (2013) used his many years of study and teaching in Computer Science to identify lexical items with a technical meaning in his teaching materials corpus. Table 6.5 shows examples of high frequency specialised vocabulary in Computer Science (occurring over 200 times in the corpus) and their meanings. These lexical items also occur in the second 1,000 words of West’s GSL (1953).

Both Lam and Radford contend that it is the semi-technical words in Computer Science which second language learners struggle most with during their university studies. The ’fully’ technical words, on the other hand, require effort from all learners, native speakers and non-native speakers of English, and these technical words include a range of content words including proper nouns, acronyms and abbreviations.

Specialised vocabulary in Medicine

A key feature of English for Medical Purposes is a large technical vocabulary, including a substantial proportion of lexical items from Latin and Greek (Ferguson, 2013). A range of studies have used corpora to find out more about the frequently used lexical items in Medicine, mostly with the aim of identifying vocabulary worth focusing on for second language learners studying Medicine. This research is important because Chung and Nation (2003) found that one word in three in an Anatomy textbook was technical. Such a high proportion of technical vocabulary suggests that learners in this field have a large learning task in terms of vocabulary.

Chen and Ge (2007) used a corpus of medical research articles to investigate the frequency and distribution of the families of the AWL (Coxhead, 2000). Chen and Ge (2007) found that the AWL covered 10% of their medical research article corpus, but only 51.1% of the AWL occurred with the frequency needed to be selected for a Medical word list. This coverage by the AWL is not surprising because it was not developed from a corpus of Medical texts; instead, the focus of that list was general academic purposes. Chen and Ge (2007) compared the AWL across sections of their research articles: abstract, introduction, materials and methods, results and discussion and find that the AWL covers just under 10% of the materials and methods and results sections, but higher than 10% over the other sections. This study shed light on how a general academic word list such as the AWL might be useful to some degree for students studying Medicine, but there is a much larger group of words outside the AWL than in the AWL, which are important for specialisation in this field.

A follow-up study on Medical vocabulary was carried out by Wang, Liang, and Ge (2008) who developed a Medical Academic Word List (MAWL). This word list was created using a 1.09-million-word corpus, again with research articles with the introduction, method, results and discussion structure. The corpus contained 32 subject areas, including Anesthesiology and Pain Medicine, Medicine and Dentistry, Cardiology and Cardiovascular Medicine, Nephrology and Clinical Neurology. Principles involved in making this list included specialised occurrence of lexis (outside the first 2,000 words, represented by West’s, 1953, GSL), range and frequency. Two subject experts were consulted to check the items in the word list. The resulting MAWL contains 623 word families and covers 12.24% of the source corpus. The MAWL and AWL have a 55% overlap, and the Wang et al. (2008) Medical list includes general academic lexical items such as previous and data, as well as specialised Medical lexis, such as vein and lesion.

A more recent study by Lei and Liu (2016) used a Medical textbook corpus as a reference corpus (3.5 million running words) and a source corpus of medical journal articles (2.7 million running words) as a comparison corpus in order to develop a lemma-based Medical word list: the Medical AVL. This project combined selection and checking procedures from Gardner and Davies (2014; Coxhead (2000). The selection process began with frequency of occurrence in the Medical corpus, and was followed by a comparison of frequency in the BNC corpus and Medical corpus. Other selection principles included the range of occurrence, dispersion across texts, a measure to determine whether items were discipline specific and a final dictionary check for high frequency words and specialised meanings. The resulting word list contains 819 lemmas. Lei and Liu (2016) checked the coverage of their specialised word list over three corpora, general, academic and medical, to check whether the Medical list was more specialised than general in nature. The list in Lei and Liu (2016) contains part of speech information for each item (for example, adult_a; adult_n) and lemmas are also notated by part of speech: alter_v, alteration_n, altered_a, alternatively_r (adverb).

A study which reflected the kind of reading that medical students in Taiwan are required to do, Hsu’s (2013) study involved a 15-million-word corpus of 155 Medical textbooks in 31 subject areas and Nation’s (2006) BNC lists. A key focus for the Hsu (2013 study is a gap between semi-technical and highly technical vocabulary in Medicine, particularly for medical students in the Taiwanese context. Because a large number of lexical items in the Medical corpus occurred outside the first 14,000 words of the BNC (the lists were available only up to 14,000 at the time of this study), Hsu (2013 p. 467) selected lexical items that occurred outside the first 3,000 words of the BNC lists (Nation, 2006), across more than half the 31 subject areas of her corpus and a frequency of over 800 occurrences. Some of the high frequency items in this word list are diagnosis, renal and syndrome.

An example of a medical study which focuses on one subject area is Fraser’s (2007, 2009) research into the vocabulary of Pharmacology, drawing on a 185,000 corpus of 51 research articles from six main areas of the field. The resulting Pharmacology Word List contained 601 word families which covered 12.91% of the corpus (examples include abnormality, mutation, plasma, saline, and toxicity). Fraser (2007) also developed lists of abbreviations and acronyms for Pharmacology. Grabowski (2015) also looked into pharmaceutical lexis using a corpus, concentrating on multiword units (see Chapter 4).

This section has focused on studies in specialised vocabulary in Medicine, using various approaches to extract lexis from corpora. This research can be particularly challenging for researchers who do not have medical training, especially when there are high frequency words which have a medical meaning that only in the medical context (Hsu, 2013). Examples of such specialised vocabulary include susceptible to colds, tear duct, anesthesia induction agents, dilate pupils and cerebrospinal fluid (Hsu, 2013, p. 467).