Collocations in general academic written texts
Multi-word units and metaphor in ESP
Identifying multi-word units in context is an important step in finding out more about their frequency and specialised nature. Several studies of collocations in academic and discipline-specific contexts have appeared in recent years (for instance, Carter & McCarthy, 2006; Ackermann & Chen, 2013; Durrant, 2009; Liu, 2012), focusing primarily on EAP. This research typically draws on large-scale corpora, as Ackermann and Chen's Academic Collocation List (ACL) illustrates. The ACL was based on the Pearson International Corpus of Academic English, which contains more than 25 million running words of journal articles and textbooks from 28 disciplines. These disciplines were grouped into four broad areas: Applied Sciences and Professions, Humanities, Social Sciences, and Natural/Formal Sciences (Ackermann & Chen, 2013, p. 237). Collocations were first identified computationally; the candidate list was then refined through quantitative and qualitative analysis and expert review, and the remaining collocations were organised into the final ACL of 2,468 items.
Importantly, Ackermann and Chen categorised this large list of multi-word units by grammatical pattern (2013, p. 241). The largest group, at 74.3% of the list, consists of noun combinations (adj + noun and noun + noun), such as anecdotal evidence and target audience. The next largest group, at 13.8%, is verb + noun/adj combinations, such as undertake research and seem plausible. Verb + adv (e.g. explicitly state) and adv + adj (e.g. highly controversial) combinations make up the remaining 6.9% and 5.0% of the list, respectively. Because the research took a common core approach rather than a discipline-specific one, the most frequent adj + noun and noun + noun combinations are general academic items such as academic writing, brief overview and causal link.
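The percentages above are each pattern group's share of the whole list (the four reported figures sum to 100%). The Python sketch below shows that arithmetic on a hypothetical, hand-tagged toy list built from the examples in this section. The pattern labels, the `tagged` list and the `GROUP` mapping that merges adj + noun and noun + noun into one noun group are illustrative assumptions, not Ackermann and Chen's procedure.

```python
from collections import Counter

# Hypothetical toy list: examples from this section, hand-tagged with
# assumed pattern labels (not taken from the ACL itself).
tagged = [
    ("anecdotal evidence", "adj + noun"),
    ("target audience", "noun + noun"),
    ("academic writing", "adj + noun"),
    ("causal link", "adj + noun"),
    ("undertake research", "verb + noun/adj"),
    ("seem plausible", "verb + noun/adj"),
    ("explicitly state", "verb + adv"),
    ("highly controversial", "adv + adj"),
]

# Assumed grouping: adj + noun and noun + noun count as one noun group.
GROUP = {"adj + noun": "noun combinations", "noun + noun": "noun combinations"}

def pattern_shares(items):
    """Return each pattern group's share of the list as a percentage."""
    counts = Counter(GROUP.get(pattern, pattern) for _, pattern in items)
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}

shares = pattern_shares(tagged)
for group, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group:18} {share:5.1f}%")
```

With only eight items the shares are, of course, nothing like the ACL's; the point is the calculation, which would be applied in the same way to all 2,468 tagged items.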
Another large-scale study of academic collocations is Durrant's (2009) corpus-based analysis of five academic areas: Arts and Humanities; Engineering; Medicine and Health Sciences; Science; and Social Sciences, Law and Education. From this 25-million-word corpus, Durrant initially identified 1,000 two-word collocations. He then compared the frequency of these 1,000 items across the five sub-corpora and found that, taken together, they occurred between 30,000 and 35,000 times per million words in four of them, but only around 17,000 times per million words in the Arts and Humanities sub-corpus. The principles Durrant (2009) used in selecting collocations included focusing on word forms rather than lemmas or word families, limiting the analysis to collocations occurring within a four-word span, a keyword analysis comparing the collocations in the academic corpus with those in a non-academic corpus (in this case, 85 million words of the British National Corpus, BNC) and a frequency criterion. It is also useful to note which items Durrant (2009) excluded: two-word collocations containing proper nouns, abbreviations, acronyms, Latin terms or numbers, and items with higher frequencies in more marginal parts of academic texts, such as reference lists. This kind of information is vital for understanding how such studies were carried out and how they might be replicated.
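Two of Durrant's principles, counting word forms rather than lemmas and counting co-occurrences within a limited span, can be sketched in a few lines of Python. This is a minimal illustration, not Durrant's implementation: it assumes a simple regular-expression tokenizer, treats the span as the four tokens to the right of each node word, ignores sentence boundaries, and approximates only two of the exclusion criteria (numbers and all-capital acronyms). The helper names `tokenize`, `excluded`, `window_pairs` and `per_million` are hypothetical, and the keyword comparison against a reference corpus is not modelled.

```python
import re
from collections import Counter

# Word forms only: no lemmatisation, so "result" and "results" stay distinct.
TOKEN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?")

def tokenize(text):
    return TOKEN.findall(text)

def excluded(token):
    """Rough stand-in for two exclusion criteria: numbers and all-capital
    acronyms. Proper nouns and Latin terms would need extra resources."""
    return any(ch.isdigit() for ch in token) or (len(token) > 1 and token.isupper())

def window_pairs(tokens, span=4):
    """Count (node, collocate) pairs where the collocate falls within
    `span` tokens to the right of the node (an assumed reading of the span)."""
    counts = Counter()
    for i, node in enumerate(tokens):
        if excluded(node):
            continue
        for collocate in tokens[i + 1 : i + 1 + span]:
            if not excluded(collocate):
                counts[(node.lower(), collocate.lower())] += 1
    return counts

def per_million(count, corpus_size):
    """Normalise a raw count to a rate per million running words."""
    return count * 1_000_000 / corpus_size

text = ("These results suggest that the model fits. "
        "As shown in Table 2, these results hold for the GDP data.")
tokens = tokenize(text)
pairs = window_pairs(tokens)
print(pairs[("these", "results")], per_million(pairs[("these", "results")], len(tokens)))
```

Because `per_million` divides by corpus size, it lets counts from sub-corpora of different sizes, such as Durrant's five areas, be compared directly.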
Durrant (2009, p. 163) notes that most of the 1,000 collocations are grammatical (see examples in Table 4.1), and he highlights patterns such as verb + that (for example, confirm that, hypothesise that) as useful for EAP learners. Table 4.1 lists the top 20 items from Durrant's study, ranked by mean frequency per million words.
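Ranking items by mean frequency per million words implies normalising each item's count within each sub-corpus before averaging. One plausible reading, assumed here, is an unweighted mean of the five per-million rates, so that each area counts equally regardless of its size. All sub-corpus sizes and counts in `SIZES` and `RAW` are invented for illustration and are not Durrant's figures; `mean_pmw` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical sub-corpus sizes (running words) for the five areas.
SIZES = {
    "Arts and Humanities": 2_000_000,
    "Engineering": 5_000_000,
    "Medicine and Health Sciences": 5_000_000,
    "Science": 8_000_000,
    "Social Sciences, Law and Education": 5_000_000,
}

# Hypothetical raw counts, listed in the same order as SIZES.
RAW = {
    "of the": [8_000, 30_000, 25_000, 48_000, 35_000],
    "these results": [20, 500, 600, 960, 400],
    "as shown": [40, 400, 250, 400, 150],
}

def mean_pmw(counts, sizes):
    """Unweighted mean of per-million-word rates, one rate per sub-corpus."""
    rates = [c * 1_000_000 / n for c, n in zip(counts, sizes)]
    return mean(rates)

sizes = list(SIZES.values())
ranked = sorted(RAW, key=lambda item: mean_pmw(RAW[item], sizes), reverse=True)
for item in ranked:
    print(f"{item:15} {mean_pmw(RAW[item], sizes):8.1f}")
```

Averaging rates rather than pooling raw counts stops the largest sub-corpus (here, the invented Science figures) from dominating the ranking; a pooled rate would be the alternative if each running word, rather than each area, were meant to carry equal weight.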
Table 4.1 clearly illustrates the grammatical patterns of these common academic collocations. The items also reflect the nature of the corpus: items such as as shown point to written texts, while items such as these results, present study and our study point to academic ones. Table 4.1 also shows that high-frequency words play a major role in academic written texts. Durrant (2009, p. 165) notes:
The identification of such patterns remains methodologically problematic, though programs such as Concgram (Cheng et al., 2009) seem to offer an interesting way forward here. It should be borne in mind, however, that as collocations become longer their frequency will in general decrease, and their range of applications is likely to narrow (that is, they become more situationally specific). The existence of a useful cross-disciplinary set of two-word items is therefore a necessary, but not a sufficient, condition for the existence of a similarly useful cross-disciplinary set of longer collocations.
Methodological difficulties in analysing collocations in academic texts also arose in a study by Coxhead and Byrd (2012). A primary problem was theoretical: unpacking how collocations are to be defined in the field. Coxhead and Byrd used the statistical measure of log likelihood in their methodology, following McEnery, Xiao and Tono (2006), but Byrd and Coxhead (2010) reported raw data where possible so that further research might be carried out using their data. Coxhead and Byrd (2012) analysed common collocations of the words in Coxhead's (2000) Academic Word List (AWL) in the 3.5-million-word corpus used to develop the AWL. To illustrate a narrow analysis, they examined the noun collocates of the AWL word create, which fall into concrete and abstract categories. Concrete nouns collocating with create include document, environment, database, record and field, and mostly come from Computer Science texts, while abstract collocates include impression, difficulties, reasons, problems and rights. An analysis of analysis, as it were, showed that the collocates before and after analysis can differ (see Table 4.2), although some occur on both sides of the target word. For example, method occurs both before and after analysis, as in methods of analysis and analysis of methods. Coxhead and Byrd (2012) also noted that some AWL words tend to co-occur, such as analysis with assessment, data, evaluation and interpretation.
Table 4.1 Top 20 key academic collocations and their mean frequencies from Durrant (2009, p. 166)
Table 4.2 Collocations to the left and right of analysis (Coxhead & Byrd, 2012, p. 1 1)
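A left/right comparison like the one for analysis can be approximated by scoring the immediate neighbours of a node word with Dunning's log-likelihood statistic, G² = 2 Σ O ln(O/E), a common formulation of the log-likelihood measure mentioned above. This sketch is hypothetical: Coxhead and Byrd's exact span and scoring procedure are not described here, so it assumes immediate neighbours only, a lower-cased regular-expression tokenizer, a single toy text, and bigrams that may cross sentence boundaries. The function names `g2` and `side_collocates` are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lower-cased alphabetic word forms; punctuation and digits are dropped.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def g2(o11, r1, c1, n):
    """Dunning's log-likelihood, G2 = 2 * sum(O * ln(O / E)), for a 2x2 table
    given the joint count o11, row total r1, column total c1 and grand total n."""
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    rows = [r1, r1, n - r1, n - r1]
    cols = [c1, n - c1, c1, n - c1]
    total = 0.0
    for o, r, c in zip(observed, rows, cols):
        if o > 0:  # empty cells contribute nothing (0 * ln 0 is taken as 0)
            total += o * math.log(o * n / (r * c))  # E = r * c / n
    return 2 * total

def side_collocates(tokens, node, side):
    """Score the immediate left or right neighbours of `node` with G2."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    first = Counter(a for a, _ in bigrams)   # how often a word starts a bigram
    second = Counter(b for _, b in bigrams)  # how often a word ends a bigram
    scores = {}
    for (a, b), o11 in pair_counts.items():
        if side == "left" and b == node:
            scores[a] = g2(o11, first[a], second[node], n)
        elif side == "right" and a == node:
            scores[b] = g2(o11, first[node], second[b], n)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text = ("The data analysis used methods of analysis. A statistical analysis "
        "of methods followed. Further analysis of data confirmed the data analysis.")
tokens = tokenize(text)
for side in ("left", "right"):
    print(side, [word for word, _ in side_collocates(tokens, "analysis", side)])
```

In this toy text, of appears on both sides of analysis (of analysis, analysis of), echoing the methods of analysis and analysis of methods example; with one short text the scores themselves carry no real weight, and a wider span would need a different contingency-table construction.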