Frequency and range requirements
The role and value of word list research for ESP
In corpus studies with several components, such as spoken and written texts, it is useful to keep the number of running words similar across the sub-corpora. Texts of different lengths yield different frequency and range results, and therefore affect studies of the vocabulary of ESP. This is because low frequency items have more opportunity to occur in longer texts than in shorter ones. Range is established by comparing occurrences of a word across several specialised fields, or across types of texts within one specialised field.
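As a rough illustration of how frequency and range might be counted together, the following minimal sketch takes a small set of tokenised sub-corpora and reports, for each word, its total frequency and the number of sub-corpora in which it occurs. The function name `frequency_and_range`, the toy sub-corpus names, and the token lists are all hypothetical and are not drawn from any study cited here.

```python
from collections import Counter


def frequency_and_range(subcorpora):
    """Return {word: (total_frequency, range)} for a dict of sub-corpora.

    `subcorpora` maps a sub-corpus name to a list of tokens.
    Range is taken here to be the number of sub-corpora in which
    the word occurs at least once (an assumption of this sketch).
    """
    totals = Counter()        # occurrences summed over all sub-corpora
    range_counts = Counter()  # number of sub-corpora containing the word
    for tokens in subcorpora.values():
        counts = Counter(tokens)
        totals.update(counts)
        range_counts.update(counts.keys())
    return {word: (totals[word], range_counts[word]) for word in totals}


# Hypothetical toy data: three very small sub-corpora.
toy = {
    "biology": "cell cell membrane the".split(),
    "physics": "the force cell".split(),
    "chemistry": "the bond bond bond".split(),
}
results = frequency_and_range(toy)
print(results["cell"])  # frequency 3, range 2
print(results["bond"])  # frequency 3, range 1
```

In this toy data, "cell" and "bond" have the same frequency but different ranges, which is the kind of contrast a range requirement is meant to expose. Because the sub-corpora here differ in length, the counts are not strictly comparable, which is the concern raised above about unequal numbers of running words.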
Dispersion is a measure of how evenly a word is distributed across equal-sized sections of a corpus (see Leech, Rayson & Wilson, 2001). For example, in the Coxhead and Hirsh (2007) Science Word List study, dispersion was calculated across all 14 subject areas. Dispersion can also be calculated across the texts in a corpus. It is an important measure because it helps avoid a possible bias towards selecting items for a word list that occur in only one text in a sub-corpus, but with sufficient frequency to make them candidates for selection. Biber, Reppen, Schnur and Ghanem (2016) point out problems that arise when dispersion measures are used with large corpora.
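The text above does not specify a dispersion formula. As one possible illustration, the sketch below assumes Juilland's D, a widely used coefficient computed as 1 minus the coefficient of variation of a word's frequencies across n equal-sized parts, divided by the square root of n - 1. The function name `juilland_d` and the frequency lists are hypothetical; the studies cited may have used different settings.

```python
import math


def juilland_d(part_freqs):
    """Juilland's D for one word, given its frequency in each corpus part.

    Assumes the parts are equal in size. Values near 1 indicate an even
    spread across parts; values near 0 indicate that the occurrences are
    concentrated in one part.
    """
    n = len(part_freqs)
    if n < 2:
        raise ValueError("at least two corpus parts are required")
    mean = sum(part_freqs) / n
    if mean == 0:
        return 0.0  # the word does not occur; treated as undispersed here
    # Population standard deviation of the per-part frequencies.
    variance = sum((f - mean) ** 2 for f in part_freqs) / n
    coeff_of_variation = math.sqrt(variance) / mean
    return 1 - coeff_of_variation / math.sqrt(n - 1)


# Hypothetical word occurring 20 times in a corpus of four equal parts.
evenly_spread = [5, 5, 5, 5]
one_part_only = [20, 0, 0, 0]
print(juilland_d(evenly_spread))  # 1.0
print(juilland_d(one_part_only))  # approximately 0
```

Both hypothetical words have the same overall frequency, but the second would be a weak candidate for a word list because all of its occurrences fall in a single part, which is the bias that a dispersion requirement is intended to guard against.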