Optimizing Human Language Learning

This is the companion website for Masato Hagiwara's keynote talk "Optimizing Human Language Learning"

At the Optimizing Human Learning Workshop, co-located with the ITS2018 conference

June 12, 2018, Montréal, Canada

(This companion website was inspired by a blog post by Prof. Charles Sutton)

TL;CE (too long; checked email)

Language learning and humankind
- I tried to learn Esperanto from textbooks 15 years ago
- The Esperanto community is now growing and thriving thanks to online technologies
- We as humankind are getting better at learning languages, mainly thanks to technology
- There are 1.2 billion people worldwide who are learning a second language. Optimizing human language learning has a truly huge impact on those people's lives
- The most popular language learned on Duolingo in Sweden is Swedish. Who are the learners? Refugees. They are learning Swedish from Arabic
Optimizing human
- Your total learning = time x learning efficacy. This "time" factor is often overlooked
- We wanted to find out what separates those who stay on our app and those who quit
- There are a few statistically significant factors for predicting user retention. Frequency, number, and consistency of lessons stood out
- We clustered users by the time of day they study and found clear behavioral clusters (e.g., 9-to-5, prime time, weekend, etc.)
- The "before bed" cluster was the best in terms of learning performance. This result is consistent with many research studies (e.g., Mazza et al. 2016) on the effect of sleep
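The time-of-day clustering described above can be illustrated with a small sketch. This is not the pipeline we ran at Duolingo: the choice of k-means, the 24-bin hourly profile, the farthest-point initialization, and every function name below are assumptions made purely for illustration.

```python
def hourly_profile(lesson_hours):
    """Turn a list of lesson start hours (0-23) into a normalized 24-bin histogram."""
    counts = [0.0] * 24
    for hour in lesson_hours:
        counts[hour % 24] += 1.0
    total = sum(counts) or 1.0  # avoid dividing by zero for users with no lessons
    return [c / total for c in counts]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    # Initialization (an assumption): start from the first point, then repeatedly
    # add the point that is farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user goes to the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical users: two who study in the morning, two who study before bed.
lesson_hours_by_user = [[7, 8, 8, 9], [8, 9, 7], [22, 23, 22], [23, 22, 21, 23]]
profiles = [hourly_profile(hours) for hours in lesson_hours_by_user]
centroids, labels = kmeans(profiles, k=2)
```

With real data, each centroid is an average hourly profile, so its peak hours suggest a name for the cluster (for example "9-to-5" or "before bed"). Learning performance can then be compared across clusters.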
Optimizing language
- There are very few educational datasets that are 1) about language learning, 2) multilingual, and 3) longitudinal
- We released the SLAM (Second Language Acquisition Modeling) dataset, which provides Duolingo users' long-term language learning traces and token-level error attribution
- We ran the 2018 Duolingo SLAM shared task. 15 teams participated
- For this particular dataset, the choice of algorithm (e.g., tree ensembles, RNNs, multitask approaches) appears to be more important than feature engineering
Optimizing learning
- The two sigma problem - tutored students perform two standard deviations (sigmas) better than those in a traditional classroom setting
- "Progress bar" keeps track of your mastery of the concepts taught in a lesson
- The easiest way to approximate this is to use the population learning curve
- The population curve is misleading - Streeter (2015) proposed a framework for modeling a mixture of arbitrary learning curves
- "Strength bar" keeps track of how much you still remember the concept taught in a lesson
- The older version of Duolingo used the Leitner system, a simple system that estimates the half-life from the numbers of correct and incorrect answers
- Half-life regression (Settles and Meeder 2015) estimates the half-life from an arbitrary set of features, leading to a 1.7% increase in next-day retention
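The two half-life estimators above can be contrasted in a few lines. This is a minimal sketch, assuming the exponential forgetting curve p = 2^(-Δ/h) and the exponentiated linear form h = 2^(θ·x) as I recall them from the spaced repetition paper listed under Other Resources. The function names, feature names, and weights are hypothetical, and no model fitting is shown.

```python
def recall_probability(days_since_practice, half_life_days):
    """Exponential forgetting curve: recall probability halves every half-life."""
    return 2.0 ** (-days_since_practice / half_life_days)


def leitner_half_life(num_correct, num_incorrect):
    """Leitner-style estimate: the half-life doubles with each correct answer
    and halves with each incorrect one."""
    return 2.0 ** (num_correct - num_incorrect)


def regression_half_life(weights, features):
    """Half-life regression estimate: h = 2 ** (weights . features).

    `weights` and `features` are dicts keyed by feature name. In a real system
    the weights would be learned from practice logs rather than set by hand.
    """
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 2.0 ** score


# Hypothetical learner history for one word: 3 correct answers, 1 incorrect.
history = {"correct": 3.0, "incorrect": 1.0}

# With weights fixed at +1 and -1, the regression form reduces to Leitner.
leitner_like_weights = {"correct": 1.0, "incorrect": -1.0}
h_leitner = leitner_half_life(3, 1)
h_regression = regression_half_life(leitner_like_weights, history)

# Predicted recall four days after the last practice.
p_recall = recall_probability(4.0, h_leitner)
```

Seen this way, the Leitner system is a special case with two hand-set weights. The regression form lets the weights, and any additional features, be fit to data instead.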

Slides

Slides as PDF

Other Resources

Duolingo user stories
- "Online technologies brought new life to Esperanto, an invented language" The invented language that found a second life online
- The Colombian security guard who taught himself English and became an English teacher: El vigilante que aprendió a hablar cinco idiomas a través de Duolingo (in Spanish; "The security guard who learned to speak five languages through Duolingo")
- The Irish-American woman who was diagnosed with viral meningitis and learned Irish to help with her memory loss: 'I needed something to centre myself around': US woman learned Irish to improve memory loss
- Swedish is the most popular language learned on Duolingo in Sweden. Here's why: Which countries study which languages, and what can we learn from it?
Optimizing human
Optimizing language
- Data for the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM)
- 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM)
Optimizing learning
- The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring
- (Streeter 2015) Mixture Modeling of Individual Learning Curves
- (Settles and Meeder 2015) A Trainable Spaced Repetition Model for Language Learning

Collaborators

Burr Settles, Duolingo

And we're hiring!

Duolingo is hiring for research scientist, language assessment expert, and machine learning engineer positions, among others.

Visit our jobs page if you are interested!