Masato Hagiwara

Personal Update (January 2024): I was diagnosed with late-stage lung cancer in March 2023. Since then, I have been undergoing treatment, and my overall health is currently stable. However, it's important to note that stage IV lung cancer typically carries a poor prognosis. I am currently raising funds for my family, which includes my wonderful wife and two lovely daughters, as well as for my medical expenses.

I would greatly appreciate it if you could consider making a donation through this GoFundMe link. I am also providing regular updates on my CaringBridge page.

Thank you for your kindness and support during this challenging time.

At City of Hope, Cancer Treatment and Research Center

I'm a Senior AI Researcher at Earth Species Project working on decoding non-human communication with AI/ML technologies. Author of the Manning book Real-World Natural Language Processing.

Formerly, I was a Machine Learning Engineer / Researcher at Duolingo. I love language and machine learning, and help people connect the two. I speak Chinese, Japanese, and English fluently, and am learning Korean and Lojban. I helped launch the Japanese, Korean, and Chinese courses on Duolingo. My research projects appeared on TechCrunch and Quartz.

You can find my resume here.


  • Feb. 2023: Two papers (AVES and BEANS) that my colleagues at Earth Species Project and I co-authored were accepted at ICASSP 2023!
  • Nov. 2021: I'm joining Earth Species Project as a Senior AI Researcher. I'm thrilled to work on decoding non-human communication with AI/ML technologies!
  • Aug. 2021: I'll be giving an invited talk on "Machine Learning for Language Learning" hosted by Waseda University. See the official announcement and the talk slides for more info.
  • Jul. 2021: I'm working on a book about Japanese NLP with Paul O'Leary McCann. We'll be covering everything from tokenization/morphological analysis up to recent neural methods and BERT. See the official website for more info.
  • Apr. 2021: I'm happy to announce GrammarTagger, a neural multilingual grammar profiler, and EXPATS, a toolkit for explainable automated text scoring!
  • Apr. 2020: I'm now working with Mirai Translate, a Japan-based startup offering human-level machine translation services, and ACTNext, ACT's research and development unit for educational research.
  • Dec. 2019: We launched GitHub Typo Corpus, a large-scale multilingual dataset of misspellings and grammatical errors. The paper was accepted to appear at LREC 2020.
  • Nov. 2019: I'm presenting our ultra fine-grained NER system at TAC KBP 2019, which ranked #2 among 9 strong competitors in the EDL track (joint work with Studio Ousia)!
  • Aug. 2019: Our paper, "TEASPN: Framework and Protocol for Integrated Writing Assistance Environments," has been accepted to appear at EMNLP 2019 (system demonstration)!
  • Jul. 2019: My book "Real-World Natural Language Processing" is available via MEAP, Manning Early Access Program. Feedback is welcome!


  • NLP — I'm the main researcher and developer of many ML/NLP open source projects and datasets, including:
    • AVES, a self-supervised, transformer-based audio representation model for encoding animal vocalizations ("BERT for animals").
    • BEANS, a collection of bioacoustics tasks and public datasets, specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics.
    • TEASPN, a protocol and a framework for integrated writing environments
    • Rakuten MA, a morphological analyzer for Chinese and Japanese written entirely in JavaScript
    • NanigoNet, a language detector for code-mixed input supporting 150 human languages and 19 programming languages
    • GitHub Typo Corpus, a large-scale multilingual dataset of misspellings and grammatical errors
    • Open Language Profiles, a platform for sharing open linguistic resources for language education

  • Duolingo - I built and worked on research for Duolingo, the most popular language learning app in the world, and Duolingo English Test, an affordable and accessible English certification test developed by Duolingo.

  • Music - In my free time, I create music and play jazz.


  • Feb. 2019 - Present: Owner & Independent NLP/ML Engineer and Researcher - Octanove Labs LLC (Seattle, WA)
    • Worked as a consultant for early-to-mid stage startups in the US/Japan on their ML strategies
    • Worked on QA and NER with Studio Ousia (ranked #2 at TAC KBP 2019 fine-grained NER track)
    • Built educational research and open-source projects with RIKEN (TEASPN, NanigoNet, and GitHub Typo Corpus)
    • Built a free, Web-based AllenNLP course in collaboration with Matt Gardner at Allen Institute for AI

  • Feb. 2015 - Feb. 2019: Senior Machine Learning Engineer / Researcher - Duolingo, Inc. (Pittsburgh, PA)
    • Built automatic grading technologies for Duolingo English Test using neural networks
    • Led data creation and analysis for various research projects, including user behavior analysis and second language acquisition modeling (SLAM) shared task
    • Led content creation for the Chinese, Japanese, and Korean courses for English speakers

  • Oct. 2010 - Feb. 2015: Lead Scientist - Rakuten Institute of Technology (New York, NY)
    • Developed machine transliteration (NLP2011 paper award) and machine translation algorithms for the largest Japanese e-commerce website (Rakuten)
    • Built a Chinese/Japanese word segmentation / morphological analyzer (RakutenMA)
    • Developed a writing support system for English as a Second Language (ESL) learners

  • Apr. 2008 - Jul. 2008: Research Intern - Microsoft Research (Redmond, WA; Mentor: Hisami Suzuki)
    • Built a state-of-the-art method for Japanese query alteration for spelling correction and spelling/transliteration normalization
    • Implemented the system using Visual C#, SQL Server, and Ruby on tens of gigabytes of query logs; the system was integrated into Microsoft Live Search
    • Published a research paper on the query alteration algorithm at NAACL 2009 and at the 3rd NLP Symposium for Young Researchers (Outstanding Presentation Award)

  • Aug. 2005 - Sep. 2005: Intern (Software Engineer), Google Inc. (Mountain View, CA; Mentors: Dekang Lin and Jun Wu)
    • Improved Japanese query suggestion, which is currently used as the basis for the query suggestions shown at the top and bottom of the Google search results page
    • Ran knowledge extraction algorithms on the distributed computation infrastructure (MapReduce and Google's large network clusters)


  • Apr. 2006 - Mar. 2009: Ph.D., Information Engineering,
    • Graduate School of Information Science, Nagoya University, Japan.
    • Doctoral Thesis: "Modeling and Selection of Context for Better Synonym Acquisition"

  • Apr. 2004 - Mar. 2006: Master's Degree, Information Engineering,
    • Graduate School of Information Science, Nagoya University, Japan
    • Skipped a year of undergraduate study owing to excellent academic performance. Overall GPA: 3.8
    • Master's Thesis: "Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction"

  • Apr. 2001 - Mar. 2004: Information Engineering Course, School of Engineering,
    • Nagoya University, Japan. Computer Science GPA: 3.9

Awards & Professional Activities

  • Invited talk on “Education and AllenNLP” at AllenNLP Summit, 2019.
  • Co-organizer of the Workshop for Natural Language Processing Open Source Software (NLP-OSS), co-located with ACL 2018.
  • Invited keynote at the Optimizing Human Learning workshop co-located with ITS 2018 (Montréal, Canada, June 2018).
  • Invited talk at CUNY NLP Seminar (hosted by Prof. Heng Ji), “Word Segmentation and Transliteration in Chinese and Japanese,” April 2013. [slides]
  • 2011 Field Innovation Award from the Japanese Society for Artificial Intelligence: ANPI_NLP: Safety Information Confirmation Support using Natural Language Processing for The 2011 Tohoku Earthquake.
  • Paper Award at NLP2011 “Latent Class Transliteration based on Source Language Origins” (the largest Japanese NLP academic conference)
  • Best Paper Award at NLP2009 “Semantic Category Extraction from Unsegmented Text using Graph Kernels” (the largest Japanese NLP academic conference, chosen among 235 papers)
  • Paper Award at the 3rd NLP Symposium for Young Researchers. Presentation: “A Unified Approach to Japanese Query Alteration based on Semantic Similarity”



Journal Papers

  • Burr Settles, Geoffrey T. LaFlair, Masato Hagiwara: Machine Learning–Driven Language Assessment. Transactions of the Association for Computational Linguistics, Vol. 8, pp. 247–263, 2020.
  • Masato Hagiwara, Koji Murakami, Graham Neubig, Yuichiroh Matsubayashi: Robust NLP for Real-world Data : 7. ANPI_NLP - Mining Safety Information after Disasters Using Natural Language Processing-. Information Processing Society of Japan Magazine. Vol. 53, No. 3, pp. 241-248, 2012.
  • 萩原正人,小川泰弘,外山勝彦: グラフカーネルを用いた非分かち書き文からの漸次的語彙知識獲得, 人工知能学会誌, Vol.26, No.3, pp.440-450, 2011.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Supervised Synonym Acquisition Using Distributional Features and Syntactic Patterns. Journal of Natural Language Processing, Vol. 16, Num. 2, pp. 59-83, 2009.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. A Comparative Study on Effective Context Selection for Distributional Similarity. Journal of Natural Language Processing, Vol. 5, Num. 5, pp. 119-150, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Use of Indirect Dependency for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 4, pp. 19-42, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Revised Selected papers, Lecture Notes in Computer Science, Vol. 5447, pp. 213-227, 2009.

Conference Papers (Selected)

  • Masato Hagiwara. AVES: Animal Vocalization Encoder based on Self-Supervision. ICASSP 2023 [paper].
  • Masato Hagiwara, Benjamin Hoffman, Jen-Yu Liu, Maddie Cusimano, Felix Effenberger, Katie Zacarian. BEANS: The Benchmark of Animal Sounds. ICASSP 2023 [paper].
  • Yoshinari Fujinuma, Masato Hagiwara. Semi-Supervised Joint Estimation of Word and Document Readability. TextGraphs-15, 2021 [paper].
  • Takumi Ito, Tatsuki Kuribayashi, Hayato Kobayashi, Ana Brassard, Masato Hagiwara, Jun Suzuki and Kentaro Inui. Diamonds in the Rough: Generating Fluent Sentences from Early-stage Drafts. INLG 2019 [paper].
  • Masato Hagiwara, Takumi Ito, Tatsuki Kuribayashi, Jun Suzuki and Kentaro Inui. TEASPN: Framework and Protocol for Integrated Writing Assistance Environments. EMNLP (system demonstrations), 2019. [paper]
  • Burr Settles, Chris Brust, Erin Gustafson, Masato Hagiwara, Nitin Madnani. Second Language Acquisition Modeling. BEA 2018, 2018. [paper]
  • Ayah Zirikly, Masato Hagiwara. Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora. ACL 2015, pp. 390-396, 2015. [paper]
  • Masato Hagiwara, Satoshi Sekine. Lightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning. COLING 2014 system demonstration, pp. 39-43, 2014. [paper]
  • Haibo Li, Masato Hagiwara, Qi Li, Heng Ji. Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese, LREC 2014, pp. 2532-2536, 2014. [paper]
  • Masato Hagiwara, Satoshi Sekine. Accurate Word Segmentation using Transliteration and Language Model Projection, ACL 2013, pp. 183-189. [paper]
  • Masato Hagiwara, Soh Masuko. KooSHO: Japanese Text Input Environment based on Aerial Hand Writing. The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT 2013), demo session, pp. 24-27. 2013. [paper]
  • Yuta Hayashibe, Masato Hagiwara, Satoshi Sekine. phloat: Integrated Writing Environment for ESL learners, Second Workshop on Advances in Text Input Methods (WTIM 2012), pp. 57-72, 2012. [paper] [slides]
  • Masato Hagiwara, Satoshi Sekine. Latent Semantic Transliteration using Dirichlet Mixture. NEWS 2012 (the 4th Named Entities Workshop), pp. 30-37, 2012. [paper]
  • Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, Koji Murakami. Safety Information Mining — What can NLP do in a disaster —, Proc. of IJCNLP 2011. [paper]
  • Masato Hagiwara and Satoshi Sekine. Latent Class Transliteration based on Source Language Origins. Proc. of ACL-HLT 2011, pp. 53-57, 2011. [paper]
  • Masato Hagiwara and Hisami Suzuki. Japanese Query Alteration Based on Lexical Semantic Similarity. Proc. of NAACL HLT 2009, pp. 191-199, 2009. [paper]
  • Nobuyuki Shimizu, Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama and Hiroshi Nakagawa. Metric learning for synonym acquisition. Proc. of COLING 2008, pp. 793-800, 2008. [paper]
  • Masato Hagiwara. A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features. Proc. of ACL 2008 Student Research Workshop, pp. 1-6, 2008. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Context Feature Selection for Distributional Similarity. Proc. of IJCNLP 2008, pp. 553-560, 2008. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Selection of Effective Contextual Information for Automatic Synonym Acquisition. Proc. of COLING/ACL 2006, pp. 353-360, 2006. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. PLSI Utilization for Automatic Thesaurus Construction. Proc. of IJCNLP 2005, pp. 334-345, 2005. [paper]



In English

In Japanese