Validating word lists for ESP
The role and value of word list research for ESP
Validation of word lists is important because any word list is influenced by the corpus it was made from. Validation can be done by gathering a second corpus, preferably one that mirrors the first, and running the word list over it to check coverage and look for any differences. For example, Coxhead (2000) gathered two corpora for her AWL study. The first was a corpus of 3.5 million running words of written academic texts, which she used to develop the word list. The coverage of the AWL over that corpus was 10% overall. The second was a smaller corpus of written academic texts, which was used to see how the list performed over another corpus of similar texts. The coverage of the AWL over that corpus was 8.5%. The difference in coverage was attributed to the different sizes of the two corpora and the predominance of Science texts in the second corpus. The coverage of the AWL over Science texts tends to be around 9%, depending on the level of technicality of the texts. Coxhead also developed a fiction corpus to check whether the AWL was academic rather than general in nature. The coverage of the AWL over that corpus was 1.4%, far lower than its coverage of either academic corpus, which suggests that the AWL is more academic than general in nature (Miller & Biber, 2015).
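Coverage in this sense is the percentage of running words (tokens) in a corpus that belong to the word list. The following minimal Python sketch illustrates the calculation on two toy texts. It is not the procedure used in the AWL study: the three-item word list, the sample sentences, and the function names `tokenize` and `coverage` are invented for illustration, and the sketch matches exact word forms only, whereas the AWL is organised into word families, so inflected and derived forms would need to be mapped to their headwords first.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(word_list, tokens):
    """Return the percentage of running words in `tokens` found in `word_list`."""
    if not tokens:
        return 0.0
    listed = set(word_list)
    hits = sum(1 for token in tokens if token in listed)
    return 100.0 * hits / len(tokens)


# Hypothetical word list and texts, for illustration only.
word_list = ["analyse", "data", "research"]

development_text = (
    "The research team will analyse the data before the next research meeting"
)
validation_text = "She walked to the river and sat down to read the data"

dev_tokens = tokenize(development_text)
val_tokens = tokenize(validation_text)

print(f"Development corpus coverage: {coverage(word_list, dev_tokens):.1f}%")
print(f"Validation corpus coverage: {coverage(word_list, val_tokens):.1f}%")
```

Running the same list over a second, independent text and comparing the two percentages is the validation step described above. With real corpora, the token counts would be in the millions and the comparison would usually be broken down by subject area as well.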