Getting serious about reading research & quantitative linguistics

The fields of reading research and quantitative linguistics have a lot of interesting results, ideas and framings which might be useful for my own research. However, I haven't previously done much thorough reading. This is partly because previous attempts at reading the SLA literature turned up a lot of things that were somewhat interesting but not particularly useful for advancing my own agenda.

I will also expand my reading of general language learning, language acquisition, SLA & technology-assisted language learning research, but I will be more selective in this area.

When doing background reading, it makes sense to define some high-level aims to direct things. Here are my preferences based upon some themes I am already aware of:

  • -1: Grand theories & schools of language acquisition e.g. Krashen and his ilk
  • 0: Motivation; Eye-tracking
  • +1: Quantitative studies regarding form/learning; Studies involving Finnish
  • +2: Studies involving proposed metrics of word importance, text difficulty & lexical diversity; Human perception of/versus aforementioned metrics

I'll lean towards more recent publications, only going further back for specifically recommended or foundational work.

Starting points

The major reading research journals are:

  • Journal of Research in Reading
  • Reading Research Quarterly

Some big, more general language learning journals (which may also include articles on reading research) are:

  • Language Learning
  • Language Learning and Technology
  • Computer Assisted Language Learning

Some specific reading I've received recommendations for or otherwise want to look into are:

  • Jarvis, Scott. 2013a. Capturing the Diversity in Lexical Diversity. Language Learning 63(s1). 87–106.
  • Jarvis, Scott. 2013b. Defining and measuring lexical diversity. In Scott Jarvis & Michael Daller (eds.), Vocabulary Knowledge: Human ratings and automated measures (Studies in Bilingualism), 13–44. Amsterdam: John Benjamins.
  • Jarvis, Scott. 2017. Grounding lexical diversity in human judgments. Language Testing 34(4). 537–553.
  • Milton, James. 2009. Measuring Second Language Vocabulary Acquisition. Bristol: Multilingual Matters.
  • Recent publications by Ilmari Ivaska

Summaries of Lexical Diversity stuff by Scott Jarvis

Capturing the Diversity in Lexical Diversity. Jarvis, S. 2013a. This article provides a discursive, high-level overview of measures of lexical diversity before drawing comparisons with diversity measures in ecology and presenting an experiment comparing human judgements to existing measures.

Finally, he goes on to suggest considering multiple dimensions: size (number of tokens); richness (number of types); effective number of types (= 2^H, where H is the information entropy, in bits, of the distribution formed by tokens grouped by type); evenness; disparity (how semantically related words in a text are); importance (e.g. IDF-like measures against a background corpus); and dispersion (gap between tokens of the same type).
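To make the countable dimensions concrete, here is a minimal Python sketch of size, richness and effective number of types as described above. The function name diversity_profile is my own, and the evenness formula (effective types divided by richness) is only one of several possible definitions, chosen here as an assumption rather than taken from the article.

```python
import math
from collections import Counter


def diversity_profile(tokens):
    """Compute a few simple lexical diversity dimensions for a token list."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)          # tokens grouped by type
    size = len(tokens)                # number of tokens
    richness = len(counts)            # number of types
    # Shannon entropy (in bits) of the type distribution.
    entropy_bits = -sum(
        (c / size) * math.log2(c / size) for c in counts.values()
    )
    effective_types = 2 ** entropy_bits
    # ASSUMPTION: evenness as effective types / richness; 1.0 means
    # every type is used equally often.
    evenness = effective_types / richness
    return {
        "size": size,
        "richness": richness,
        "entropy_bits": entropy_bits,
        "effective_types": effective_types,
        "evenness": evenness,
    }


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the cat"
    for name, value in diversity_profile(text.split()).items():
        print(f"{name}: {value:.3f}")
```

When every type occurs equally often the effective number of types equals the richness, and it shrinks towards 1 as a few types come to dominate the text.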

Defining and measuring lexical diversity. Jarvis, S. 2013b. This is quite similar to the previous article, although it breaks diversity down in a slightly different way. It's in the book Vocabulary Knowledge, where another article also caught my eye.

Modelling L2 vocabulary learning. Roderick Edwards and Laura Collins. This article starts by modelling the probability that a word of rank i is learnt after x total word exposures, based on the assumption that a word is learnt after a fixed number of exposures to it (here 3). It then proposes the hypothesis that word production is also Zipfian, but relative to the rank within a learner's vocabulary/word inventory rather than within the frequency ranks from a background corpus (BNC). Their data is not sufficient to invalidate the alternative hypothesis. The book might be worth returning to, though.
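As a rough illustration of the first model, here is a sketch of my own reading of it, not necessarily the authors' exact formulation. It assumes each token the learner meets is drawn independently from a Zipfian distribution with exponent 1 over a hypothetical vocabulary of 10,000 words, so the number of encounters with a given word is binomial, and the word counts as learnt once it has been met at least 3 times. The function names and the vocabulary size are illustrative assumptions.

```python
import math


def zipf_probability(rank, vocab_size, exponent=1.0):
    """Probability of the word at a given frequency rank under Zipf's law."""
    normaliser = sum(1.0 / r ** exponent for r in range(1, vocab_size + 1))
    return (1.0 / rank ** exponent) / normaliser


def prob_learnt(rank, exposures, vocab_size=10_000, threshold=3):
    """P(word of this rank is met >= threshold times in `exposures` tokens).

    ASSUMPTION: tokens are independent draws, so encounters with the word
    follow a Binomial(exposures, p) distribution.
    """
    p = zipf_probability(rank, vocab_size)
    # Probability of meeting the word fewer than `threshold` times.
    prob_too_few = sum(
        math.comb(exposures, j) * p ** j * (1.0 - p) ** (exposures - j)
        for j in range(threshold)
    )
    # Clamp to guard against tiny negative values from float rounding.
    return max(0.0, 1.0 - prob_too_few)


if __name__ == "__main__":
    for rank in (1, 10, 100, 1000):
        print(rank, round(prob_learnt(rank, exposures=10_000), 4))
```

Under these assumptions the probability falls with rank and rises with total exposure, which is the qualitative behaviour the summary above describes.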

Grounding lexical diversity in human judgments. Jarvis, S. 2017. This article takes human ratings of lexical diversity and examines the degree to which they correlate with various indices of it.

At this point, it's worth mentioning that chasing down the actual definitions of some of these indices is a bit challenging. Two nice tools are:

  • quanteda, which is a quite polished R package
  • The lexical-diversity Python package, which is probably the easiest way to find out how all the indices are defined.
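To give a flavour of what such definitions look like, here is a from-scratch Python sketch of two common indices: type-token ratio (TTR) and moving-average TTR (MATTR). This is not code from either package above. The window size of 50 and the fallback to plain TTR for texts shorter than the window are my own assumptions, and the packages may differ in defaults and tokenisation.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct types divided by number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every window of `window` tokens."""
    if len(tokens) < window:
        # ASSUMPTION: fall back to plain TTR when the text is too short.
        return ttr(tokens)
    scores = [
        ttr(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat on the cat".split()
    print("TTR:", ttr(tokens))
    print("MATTR (window=5):", mattr(tokens, window=5))
```

Plain TTR tends to fall as texts get longer, which is one motivation for windowed variants like MATTR.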

Next time

More summaries!