Limitations of multi-word units in research
Multi-word units and metaphor in ESP
A key limitation in multi-word unit research is defining exactly what is being investigated, given the many terms used to describe these units (Nation, 2016). As we have seen in this chapter, the range and combinations of possible lexical patterns are quite varied, from two-word combinations through to larger formulaic sequences. Some of these patterns can be incomplete and not very meaningful, as in to do with the or I think it was. Simpson-Vlach and Ellis (2010, p. 493) pick up on this issue, pointing out that such sequences are ‘neither terribly functional nor pedagogically compelling’. Selection principles, such as whether incomplete patterns are included in analyses, can also vary as researchers weigh important aspects of study design: the frequency and range of sequences, the text types and disciplines to be analysed, and the overall purpose of the research.
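To make the role of selection principles concrete, the following Python sketch extracts contiguous n-grams from a small set of texts and keeps only those that pass a normalised frequency cut-off and a range cut-off (the number of different texts a sequence appears in). It is a minimal illustration, not the procedure used in any of the studies cited; the function names, the threshold values and the crude regular-expression tokeniser are all assumptions chosen for the example, and the toy texts are invented.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    # Crude tokeniser (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def extract_bundles(texts, n=4, min_per_million=20.0, min_range=3):
    """Return {bundle: (raw_count, range)} for n-grams passing both cut-offs.

    min_per_million and min_range are illustrative defaults, not values
    taken from the literature discussed in this chapter.
    """
    freq = Counter()               # raw frequency of each n-gram
    text_range = defaultdict(set)  # ids of the texts each n-gram occurs in
    total_tokens = 0
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)

    bundles = {}
    for gram, count in freq.items():
        per_million = count * 1_000_000 / total_tokens  # normalised frequency
        if per_million >= min_per_million and len(text_range[gram]) >= min_range:
            bundles[gram] = (count, len(text_range[gram]))
    return bundles


# Toy texts, invented for illustration.
texts = [
    "I think it was on the basis of the data",
    "On the basis of these results",
    "On the basis of prior work and I think it was",
]
print(extract_bundles(texts, n=4, min_per_million=0, min_range=2))
print(extract_bundles(texts, n=4, min_per_million=0, min_range=3))
```

Because the filter is purely distributional, an incomplete sequence such as i think it was survives whenever it meets the thresholds: with min_range=2 it is kept alongside on the basis of, and only raising min_range to 3 removes it. Excluding incomplete patterns therefore needs a separate selection decision on top of the frequency and range cut-offs.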
Long strings of words might not be continuous in texts (Paquot & Granger, 2012). For example, a lexical bundle such as the consequences of is part of a highly frequent frame, ‘the something of something’. This means that, although the frame a/the something of something occurs very often overall, a particular instance of it might contain a high-frequency word and occur often, or a low-frequency word and occur rarely. Frequency matters for language learners and teachers, and while some lexical bundles or collocations might show a strong statistical relationship between their parts, they might not be very frequent in texts. Byrd and Coxhead (2010, pp. 46–47) make this point by writing:
The scale used to report lexical bundles is typically in terms of the number of bundles per million words. For example, on the basis of… occurs 308 times in the 3.6 million words that make up the AWL corpus. That’s 106 times per million words, or 53 times per 500,000 words, or twice per 15,625 words. Studies of vocabulary acquisition report that learners need many encounters with a word or phrase before it becomes part of their lexicon (Nation, 2008). Few learners will read a million words in an EAP class. Most will read fewer than the 15,000 words needed to encounter on the basis of even twice.
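The arithmetic in this quotation rests on two steps: normalising a raw count to a common base (here, per million words) and then inverting that rate to estimate how much reading a learner would need to meet an item a given number of times. The sketch below shows both steps in Python; the function names are hypothetical, and the figures in the example are invented round numbers, not counts from the AWL corpus.

```python
def normalise(count, corpus_size, base=1_000_000):
    """Express a raw count as occurrences per `base` running words."""
    return count * base / corpus_size


def words_needed(rate, encounters, base=1_000_000):
    """Running words needed, on average, to meet an item `encounters` times,
    given its rate of occurrence per `base` words."""
    return encounters * base / rate


# Invented figures: 50 occurrences in a 2,000,000-word corpus.
rate = normalise(50, 2_000_000)                 # 25.0 per million words
print(rate)
print(normalise(50, 2_000_000, base=500_000))   # 12.5 per 500,000 words
print(words_needed(rate, 2))                    # 80,000 words to meet the item twice
```

The estimate assumes the item is spread evenly through a learner's reading, which real texts do not guarantee; this is one reason corpus studies report range alongside frequency.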
While the frame ‘the XXX of XXX’ (for example, the basis of research) might be frequent in academic texts, actual strings such as on the basis of may not occur very often at all. Deciding on the unit of counting can also be problematic. For example, in an analysis of a set of academic readings on midwifery, the target word labour co-occurs with both stage and stages, as in stage of labour and stages of labour. Which words, then, should be included in the multi-word unit? Should the singular and plural forms be counted together or separately?
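Both problems in this paragraph can be seen in a small Python sketch. It counts the words that fill the open slot of a contiguous frame such as ‘the ___ of’ and, optionally, maps inflected forms to a base form before counting. This is a simplification made for illustration: the frame has only one slot, the lemma table is a hypothetical one-entry stand-in for a proper lemmatiser or word-family list, and the toy text is invented.

```python
from collections import Counter
import re

# Hypothetical mini lemma table for illustration only; a real study would
# use a lemmatiser or a published word-family list.
LEMMAS = {"stages": "stage"}


def tokenize(text, lemmas=None):
    """Lowercase word tokens; optionally map inflected forms to a base form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if lemmas:
        tokens = [lemmas.get(t, t) for t in tokens]
    return tokens


def frame_fillers(text, frame, lemmas=None):
    """Count the words filling the single None slot of a contiguous frame,
    e.g. ("the", None, "of")."""
    tokens = tokenize(text, lemmas)
    slot = frame.index(None)
    n = len(frame)
    fillers = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Fixed positions must match exactly; the None position matches any word.
        if all(f is None or f == w for f, w in zip(frame, window)):
            fillers[window[slot]] += 1
    return fillers


text = ("The first stage of labour is long. The stages of labour vary. "
        "The basis of research matters. The stage of labour ends.")
frame = ("the", None, "of")
print(frame_fillers(text, frame))          # stage and stages counted separately
print(frame_fillers(text, frame, LEMMAS))  # stage and stages counted together
```

In the example the frame matches three times, yet no single filler accounts for more than one match until stages is folded into stage, at which point that filler accounts for two. Whether to fold the forms is exactly the counting decision described above, and the choice changes which units look frequent.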
A clear limitation of the research so far is its focus on EAP, with few studies of specialised disciplines or professional corpora. Few studies, moreover, go beyond quantitative corpus analysis to explore how multi-word units and metaphor are used in writing and speaking. A further limitation is that much research has focused on written texts rather than on both written and spoken texts. An exception is the work by Biber (2006) and colleagues on the T2KSWAL corpus, which includes a spoken corpus of over 1.6 million running words (see Chapter 6 for more on this research).
Another limitation is the lack of information about the contexts in which bundles occur in corpora (Byrd & Coxhead, 2010). An example of the kind of contextual information on lexical bundles that learners and teachers might find useful comes from an analysis of concordance lines for on the basis of in Coxhead’s AWL written academic corpus. Three patterns emerged from the data, as Figure 4.2 illustrates.
Figure 4.2 Three patterns of use for on the basis of (adapted from Byrd & Coxhead, 2010, pp. 53–54)
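Concordance analysis of this kind starts from key-word-in-context (KWIC) lines. The sketch below is a minimal Python KWIC function, assuming a crude regular-expression tokeniser and a fixed window of words on each side; the function name kwic is hypothetical, it is not the tool used by Byrd and Coxhead (2010), and the example sentence is invented.

```python
import re


def kwic(text, node, width=4):
    """Return (left, right) context pairs for each occurrence of `node`,
    with up to `width` words on each side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node_tokens = node.lower().split()
    n = len(node_tokens)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node_tokens:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, right))
    return lines


text = ("Decisions were made on the basis of the evidence. "
        "On the basis of this, we proceed.")
node = "on the basis of"
# Sorting by the right-hand context groups similar continuations together.
for left, right in sorted(kwic(text, node, width=3), key=lambda pair: pair[1]):
    print(f"{left:>25} | {node} | {right}")
```

Because punctuation is discarded, contexts can run across sentence boundaries: the second line's left context, of the evidence, comes from the previous sentence. A fuller concordancer would respect sentence breaks and keep the original casing.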
This kind of analysis raises questions about how to bring such data into classrooms and programmes of learning, and about how effective a teaching approach that includes it might be. A final limitation of this area of research is the lack of replication studies to confirm findings, explore differences and similarities, and build certainty in the field about methodological approaches and generalisability.