Vocabulary and English for Specific Purposes Research - Averil Coxhead 2018
Middle School Vocabulary Lists (Greene, 2008)
Specialised vocabulary in secondary school/Middle School
Greene’s (2008; see also Greene & Coxhead, 2015) research into academic vocabulary for Middle School students was a response to Coxhead’s (2000) earlier work on the AWL, but targeted the needs of learners in Middle School in the USA. Greene (2008) gathered a corpus of 109 textbooks used in Grades 6 to 8 in Middle Schools, in the following subjects: English grammar and writing, Health, Mathematics, Science, and Social Sciences and History. The corpus is roughly evenly divided between the three grades, but less evenly across the subjects, with Mathematics, Social Sciences and History, and Science containing more textbooks and Health containing fewer. The total corpus size is over 18 million running words, with a fairly even spread of running words across the grades. Grade 8 contains the most running words (nearly 6.7 million) and Grade 6 contains the fewest (5.9 million). This large-scale corpus allowed Greene to find out the coverage of existing word lists such as West’s GSL (1953) and Coxhead’s (2000) AWL over the textbook corpus. This first step established whether there would be candidates for selection outside the GSL and AWL for a new word list. Word lists such as these provide coverage figures over a text, that is, they show what percentage of the running words in a text is ‘covered’ by a word list. The GSL covers nearly 80% and the AWL covers nearly 5.4% of the Middle School texts. These figures show that these texts are less difficult than university-level texts: coverage by the GSL (general English) is higher and coverage by the AWL (academic English) is lower than in Coxhead’s (2000) study of university-level texts. The lists do not include proper nouns, abbreviations or compound nouns.
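The idea of coverage can be illustrated with a minimal sketch in Python. It is not Greene's procedure: the function name `coverage`, the simple tokeniser and the toy word list are all assumptions made for illustration, and the sketch matches exact word forms rather than word family members.

```python
import re

def coverage(text, word_list):
    """Return the percentage of running words in `text` that appear in `word_list`.

    Hypothetical helper: tokenisation is a simple lowercase letter match,
    and only exact word forms are counted (no word families).
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    listed = {w.lower() for w in word_list}
    covered = sum(1 for t in tokens if t in listed)
    return 100.0 * covered / len(tokens)

# Toy stand-in for a high-frequency list (not the real GSL).
toy_list = {"the", "a", "of", "is", "in"}
sample = "The heart is a muscle in the chest"

# 5 of the 8 running words are on the toy list.
print(coverage(sample, toy_list))  # 62.5
```

Run over a whole textbook corpus with a full word list, the same calculation yields figures of the kind reported above, such as nearly 80% for the GSL.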
The next step in Greene’s (2008) research was to identify candidates for inclusion in a Middle School Vocabulary List for each of the subject areas. Greene’s study is important for several reasons. Firstly, it focused on the actual texts which students are required to read. Secondly, it considered the lexical needs of students in different subject areas. Thirdly, Greene made principled decisions about the size and balance of the corpus, and the selection of items for the word lists. Greene first set aside the words in the corpus that are in the first 2,000 words of West’s GSL. From the remaining words, she selected AWL items which met the frequency and range cut-offs across the subjects in her corpus. She then considered items which were not in the AWL but which met the frequency and range cut-offs. Finally, she selected items which met a discipline-specific frequency cut-off point, using each of the subject corpora. These selection principles mean that the Middle School lists can have some overlap, since some lexical items occur in all subject areas. An example of such a word is chapter (Greene & Coxhead, 2015), which is unsurprising because the corpus is made up of textbooks with chapters. The selection principles also mean that the subject or discipline is taken into account, which means the Health list, for example, contains items such as drug, muscle and infect.
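The selection steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Greene's implementation: the function `select_candidates`, its parameters, the cut-off values and the word counts are all hypothetical, and the toy sets stand in for the GSL and AWL without reflecting their actual membership.

```python
from collections import Counter

def select_candidates(subject_counts, gsl, awl,
                      min_freq, min_range, min_subject_freq):
    """Apply the four selection steps to per-subject word counts.

    subject_counts maps a subject name to a mapping of word -> frequency.
    All cut-off values are hypothetical parameters, not Greene's figures.
    """
    total = Counter()
    for counts in subject_counts.values():
        total.update(counts)

    def word_range(word):
        # Range = number of subject corpora in which the word occurs.
        return sum(1 for c in subject_counts.values() if c.get(word, 0) > 0)

    # Step 1: set aside words in the general high-frequency list.
    outside_gsl = {w for w in total if w not in gsl}
    # Steps 2 and 3: frequency and range cut-offs across subjects,
    # applied first to AWL items, then to non-AWL items.
    general = {w for w in outside_gsl
               if total[w] >= min_freq and word_range(w) >= min_range}
    from_awl = general & awl
    non_awl = general - awl
    # Step 4: discipline-specific frequency cut-off within each subject.
    per_subject = {
        subject: {w for w in outside_gsl
                  if counts.get(w, 0) >= min_subject_freq}
        for subject, counts in subject_counts.items()
    }
    return from_awl, non_awl, per_subject

# Invented counts for illustration only.
subject_counts = {
    "Health": Counter({"the": 50, "drug": 12, "chapter": 6,
                       "analyse": 4, "diagram": 4}),
    "Science": Counter({"the": 60, "cell": 11, "chapter": 5,
                        "analyse": 5, "diagram": 5}),
    "Mathematics": Counter({"the": 40, "fraction": 15, "chapter": 7,
                            "analyse": 3, "diagram": 4}),
}
toy_gsl = {"the"}
toy_awl = {"analyse", "chapter"}

from_awl, non_awl, per_subject = select_candidates(
    subject_counts, toy_gsl, toy_awl,
    min_freq=10, min_range=3, min_subject_freq=10)
print(sorted(from_awl), sorted(non_awl), per_subject)
```

In this toy run, chapter and analyse pass the cross-subject cut-offs as AWL items, diagram passes as a non-AWL item, and drug, cell and fraction are picked up only by the discipline-specific step for their own subjects. Words that pass the cross-subject cut-offs would appear in every subject list, which mirrors the overlap noted above.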
The Middle School lists are discussed next in each of the areas of English Literature, Mathematics, Science and Social Sciences, but as an overview here, it is important to note that each list contains roughly 600 to 800 types. Greene (2008; Greene & Coxhead, 2015) included word family members only when they actually occurred in the textbook corpus, unlike Coxhead (2000), who used Bauer and Nation’s (1993) word families as a guide for the AWL (see Chapter 3 on word lists). The coverage of these lists over the textbook corpus is quite impressive, ranging from 10.17% in Science down to 5.83% in Social Sciences and History. Greene then set up a parallel corpus (nearly nine million running words) to validate her first study and found similar coverage results for the Middle School lists over the parallel corpus. A third corpus of Middle School fiction texts was used to establish whether the Middle School lists contained academic vocabulary, rather than general-purpose vocabulary (following Coxhead’s methodology for validity). Coverage over the fiction corpus ranged from 1.73% for the Mathematics list to 2.89% for the English grammar and writing list. Let’s now turn to case studies of secondary school subjects and specialised vocabulary, beginning with English Literature.