The role and value of word list research for ESP
Nation (2016) notes that an important part of any corpus-based study is making sure that the corpus is as clean as possible. A common problem in this regard is the inconsistent treatment of hyphenated items in a corpus and its impact on compound forms. Examples of hyphenated and compound forms can be seen in Table 3.4, such as onsite (and on-site) and weatherboards. If these items were split into their constituent parts, on and site and weather and boards, they would be counted in very different ways by the RANGE programme. On, site and boards are in the first 1,000 word families of Nation’s BNC lists (see Nation, 2006) and weather is in the second 1,000 word families of the same lists. Keeping the compound nouns together, however, keeps the meaning of the items clear. Nation (2016) calls these forms ‘transparent compounds’. Nation’s (2006) BNC lists include a list of transparent compound nouns. The compound nouns in the Carpentry texts could be added to that existing list, and the RANGE programme would then identify all the words in the compound noun list which appear in the Carpentry corpus. Alternatively, a separate word list of compound nouns for Carpentry could be developed and kept for research purposes. A common frustration in working with texts is the slow and laborious process of finding hyphenated forms and compounds and deciding what to do with them. Nation (2016) devotes a chapter to hyphenated lexical words and transparent compounds, with suggestions on what to do with them based on his work with the BNC and word list development.
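The effect of splitting or keeping hyphenated forms can be illustrated with a minimal Python sketch. It is not the RANGE programme and does not use the actual BNC lists: the small sets BAND_1K, BAND_2K and COMPOUNDS, the sample sentence, and the function names are all hypothetical stand-ins, chosen only to mirror the examples discussed above.

```python
import re
from collections import Counter

# Illustrative stand-ins only; these are NOT the actual BNC lists or RANGE.
BAND_1K = {"on", "site", "boards", "the", "are", "fixed"}
BAND_2K = {"weather"}
COMPOUNDS = {"onsite", "on-site", "weatherboards"}  # hypothetical compound list


def tokenize(text, keep_hyphens=True):
    """Lowercase the text and return word tokens.

    With keep_hyphens=True, 'on-site' stays one token; otherwise it is
    split at the hyphen into 'on' and 'site'.
    """
    pattern = r"[a-z]+(?:-[a-z]+)*" if keep_hyphens else r"[a-z]+"
    return re.findall(pattern, text.lower())


def classify(token):
    """Assign a token to a list label, checking the compound list first."""
    if token in COMPOUNDS:
        return "compound"
    if token in BAND_1K:
        return "1k"
    if token in BAND_2K:
        return "2k"
    return "off-list"


def profile(text, keep_hyphens=True):
    """Count tokens per list label under the chosen hyphen treatment."""
    return Counter(classify(t) for t in tokenize(text, keep_hyphens))


sample = "The weatherboards are fixed on-site."
kept = profile(sample, keep_hyphens=True)    # 'on-site' counted as a compound
split = profile(sample, keep_hyphens=False)  # 'on' and 'site' counted in 1k
print("hyphens kept: ", dict(kept))
print("hyphens split:", dict(split))
```

In this sketch, splitting at the hyphen moves on-site out of the compound list and adds two tokens to the first 1,000 band, which is the kind of shift in coverage figures described above. Solid compounds such as weatherboards are not affected by the hyphen setting, so they still need to be handled through a compound list.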