The Indexer: The International Journal of Indexing

Corpus linguistics for indexing

The Indexer: The International Journal of Indexing (2019), 37, (2), 105–124.


This methodological paper demonstrates how methods from corpus linguistics – a collection of computer-assisted approaches to the analysis of large volumes of text – can be used in the creation of indexes. We begin by introducing corpus linguistics, including its main principles and advantages. We then demonstrate how indexers can use corpus methods, through a case study in which we create an index for an academic journal article using the established corpus techniques of frequency, keywords, collocation and concordance. The case study shows that, when combined with human input and intuition, corpus linguistics methods can give indexers new perspectives on the texts they are working on, while increasing the systematicity, replicability and objectivity of the indexing process.
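As a rough illustration of three of the techniques named in the abstract (frequency, concordance and collocation), the following Python sketch shows what each computes on a tokenised text. It is not the authors' procedure or software: the function names, the simple regular-expression tokeniser and the window size `span` are all assumptions made for this example. Keyword (keyness) analysis is omitted because it also requires a reference corpus for comparison.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple, assumed scheme)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_list(tokens, top_n=10):
    """Return the top_n most frequent tokens with their raw counts."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, node, span=3):
    """Return keyword-in-context triples: up to `span` tokens on either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, span=3):
    """Count the tokens that occur within `span` tokens of each occurrence of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts


if __name__ == "__main__":
    # Invented sample text, used only to exercise the functions.
    sample = "The index lists terms. An index helps readers find terms in the text."
    toks = tokenize(sample)
    print(frequency_list(toks, top_n=3))
    for left, node, right in concordance(toks, "index", span=2):
        print(f"{left:>15} [{node}] {right}")
    print(collocates(toks, "index", span=2).most_common(3))
```

In a sketch like this, the frequency list suggests candidate index terms, the concordance lines let the indexer inspect each candidate in context, and the collocate counts hint at possible subheadings. The final selection remains a matter of human judgement, as the abstract stresses.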






Author details

Brookes, Gavin

McEnery, Tony