Personal Update (January 2024): I was diagnosed with late-stage lung cancer in March 2023. Since then, I have been undergoing treatment, and my overall condition is currently stable. However, stage IV lung cancer typically carries a poor prognosis. I am currently raising funds for my medical expenses and for my family, which includes my wonderful wife and two lovely daughters.
I would greatly appreciate it if you could consider making a donation through this GoFundMe link. I am also providing regular updates on my CaringBridge page.
Thank you for your kindness and support during this challenging time.
Formerly, I was a Machine Learning Engineer / Researcher at Duolingo. I love language and machine learning, and I help people connect the two. I speak Chinese, Japanese, and English fluently, and am learning Korean and Lojban. I helped launch the Japanese, Korean, and Chinese courses on Duolingo. My research projects have been featured on TechCrunch and Quartz.
Feb. 2023: Two papers (AVES and BEANS) that my colleagues at Earth Species Project and I co-authored were accepted at ICASSP 2023!
Nov. 2021: I'm joining Earth Species Project as a Senior AI Researcher. I'm thrilled to work on decoding non-human communication with AI/ML technologies!
Aug. 2021: I'll be giving an invited talk on "Machine Learning for Language Learning" hosted by Waseda University. See the official announcement and the talk slides for more info.
Jul. 2021: I'm working on a book about Japanese NLP with Paul O'Leary McCann. We'll be covering everything from tokenization/morphological analysis up to recent neural methods and BERT. See the official website for more info.
Apr. 2021: I'm happy to announce GrammarTagger, a neural multilingual grammar profiler, and EXPATS, a toolkit for explainable automated text scoring!
Apr. 2020: I'm now working with Mirai Translate, a Japan-based startup offering human-level machine translation services, and ACTNext, ACT's research and development unit for educational research.
Dec. 2019: We launched GitHub Typo Corpus, a large-scale multilingual dataset of misspellings and grammatical errors. The paper was accepted to appear at LREC 2020.
Aug. 2019: Our paper, "TEASPN: Framework and Protocol for Integrated Writing Assistance Environments," was accepted to appear at EMNLP 2019 (system demonstration)!
NLP — I'm the main researcher and developer of many ML/NLP open source projects and datasets, including:
AVES, a self-supervised, transformer-based audio representation model for encoding animal vocalizations ("BERT for animals").
BEANS, a collection of bioacoustics tasks and public datasets, specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics.
TEASPN, a protocol and a framework for integrated writing environments
Rakuten MA, a morphological analyzer for Chinese and Japanese written entirely in JavaScript
NanigoNet, a language detector for code-mixed input supporting 150 human and 19 programming languages
GitHub Typo Corpus, a large-scale multilingual dataset of misspellings and grammatical errors
Open Language Profiles, a platform for sharing open linguistic resources for language education
Education — I love teaching NLP to the world. Books/courses I wrote include:
Duolingo - I built and worked on research for Duolingo, the most popular language learning app in the world, and Duolingo English Test, an affordable and accessible English certification test developed by Duolingo.
Music - In my free time, I create music and play jazz.
Built automatic grading technologies for Duolingo English Test using neural networks
Led data creation and analysis for various research projects, including user behavior analysis and the Second Language Acquisition Modeling (SLAM) shared task
Led content creation for the Chinese, Japanese, and Korean courses for English speakers
Oct. 2010 - Feb. 2015: Lead Scientist - Rakuten Institute of Technology (New York, NY)
Developed machine transliteration (NLP2011 paper award) and machine translation algorithms for the largest Japanese e-commerce website (Rakuten)
Built a Chinese/Japanese word segmentation / morphological analyzer (RakutenMA)
Developed a writing support system for English as a Second Language (ESL) learners
Apr. 2009 - Sep. 2010: Research and Development Engineer - Baidu Japan, Inc. (Shanghai / Beijing / Tokyo)
Improved the ranking and page analysis algorithms including spam detection and emoticon search for Baidu mobile search
Worked as a consultant on various NLP projects including Japanese Input Method BaiduType
Apr. 2008 - Jul. 2008: Research Intern - Microsoft Research (Redmond, WA; Mentor: Hisami Suzuki)
Built a state-of-the-art method for Japanese query alteration for spelling correction and spelling/transliteration normalization
Implemented the system using Visual C#, SQL Server, and Ruby, processing tens of gigabytes of query logs; the system was integrated into Microsoft Live Search
Published a research paper on the query alteration algorithm at NAACL 2009 and at the 3rd NLP Symposium for Young Researchers (Outstanding Presentation Award)
Aug. 2005 - Sep. 2005: Intern (Software Engineer), Google Inc. (Mountain View, CA; Mentors: Dekang Lin and Jun Wu)
Improved Japanese query suggestion, which is currently used as the basis for the query suggestions shown at the top and bottom of Google search results
Ran knowledge extraction algorithms on the distributed computation infrastructure (MapReduce and Google's large network clusters)
Education
Apr. 2006 - Mar. 2009: Ph.D., Information Engineering,
Graduate School of Information Science, Nagoya University, Japan.
Doctoral Thesis: "Modeling and Selection of Context for Better Synonym Acquisition"
Apr. 2004 - Mar. 2006: Master's Degree, Information Engineering,
Graduate School of Information Science, Nagoya University, Japan
Skipped a year of undergraduate study due to excellent academic performance. Overall GPA: 3.8
Master's Thesis: "Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction"
Apr. 2001 - Mar. 2004: Information Engineering Course, School of Engineering,
Nagoya University, Japan. Computer Science GPA: 3.9
Awards & Professional Activities
Invited talk on “Education and AllenNLP” at AllenNLP Summit, 2019.
Invited talk at CUNY NLP Seminar (hosted by Prof. Heng Ji), “Word Segmentation and Transliteration in Chinese and Japanese”, April 2013. [slides]
2011 Field Innovation Award from the Japanese Society for Artificial Intelligence: ANPI_NLP: Safety Information Confirmation Support using Natural Language Processing for The 2011 Tohoku Earthquake.
Paper Award at NLP2011 (the largest Japanese NLP academic conference): “Latent Class Transliteration based on Source Language Origins”
Best Paper Award at NLP2009 (the largest Japanese NLP academic conference; chosen from among 235 papers): “Semantic Category Extraction from Unsegmented Text using Graph Kernels”
Paper Award at the 3rd NLP Symposium for Young Researchers. Presentation: “A Unified Approach to Japanese Query Alteration based on Semantic Similarity”
Steven Bird, Ewan Klein, Edward Loper. 萩原正人 (Masato Hagiwara), 中山敬広 (Takahiro Nakayama), 水野貴明 (Takaaki Mizuno) (translation). 入門 自然言語処理 (Natural Language Processing with Python). O'Reilly Japan, 2010.
Journal Papers
Burr Settles, Geoffrey T. LaFlair, Masato Hagiwara: Machine Learning–Driven Language Assessment. Transactions of the Association for Computational Linguistics, Vol. 8, pp. 247–263, 2020.
Masato Hagiwara, Koji Murakami, Graham Neubig, Yuichiroh Matsubayashi: Robust NLP for Real-world Data : 7. ANPI_NLP - Mining Safety Information after Disasters Using Natural Language Processing-. Information Processing Society of Japan Magazine. Vol. 53, No. 3, pp. 241-248, 2012.
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Supervised Synonym Acquisition Using Distributional Features and Syntactic Patterns. Journal of Natural Language Processing, Vol. 16, Num. 2, pp. 59-83, 2009.
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. A Comparative Study on Effective Context Selection for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 5, pp. 119-150, 2008.
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Use of Indirect Dependency for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 4, pp. 19-42, 2008.
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Revised Selected papers, Lecture Notes in Computer Science, Vol. 5447, pp. 213-227, 2009.
Conference Papers (Selected)
Masato Hagiwara. AVES: Animal Vocalization Encoder based on Self-Supervision. ICASSP 2023 [paper].
Masato Hagiwara, Benjamin Hoffman, Jen-Yu Liu, Maddie Cusimano, Felix Effenberger, Katie Zacarian. BEANS: The Benchmark of Animal Sounds. ICASSP 2023 [paper].
Yoshinari Fujinuma, Masato Hagiwara. Semi-Supervised Joint Estimation of Word and Document Readability. TextGraphs-15, 2021 [paper].
Takumi Ito, Tatsuki Kuribayashi, Hayato Kobayashi, Ana Brassard, Masato Hagiwara, Jun Suzuki and Kentaro Inui. Diamonds in the Rough: Generating Fluent Sentences from Early-stage Drafts. INLG 2019 [paper].
Masato Hagiwara, Takumi Ito, Tatsuki Kuribayashi, Jun Suzuki and Kentaro Inui. TEASPN: Framework and Protocol for Integrated Writing Assistance Environments. EMNLP (system demonstrations), 2019. [paper]
Burr Settles, Chris Brust, Erin Gustafson, Masato Hagiwara, Nitin Madnani. Second Language Acquisition Modeling. BEA 2018. [paper]
Ayah Zirikly, Masato Hagiwara. Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora. ACL 2015, pp. 390-396, 2015. [paper]
Masato Hagiwara, Satoshi Sekine. Lightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning. COLING 2014 system demonstration, pp. 39-43, 2014. [paper]
Haibo Li, Masato Hagiwara, Qi Li, Heng Ji. Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese. LREC 2014, pp. 2532-2536, 2014. [paper]
Masato Hagiwara, Satoshi Sekine. Accurate Word Segmentation using Transliteration and Language Model Projection. ACL 2013, pp. 183-189. [paper]
Masato Hagiwara, Soh Masuko. KooSHO: Japanese Text Input Environment based on Aerial Hand Writing. The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT 2013), demo session, pp. 24-27. 2013. [paper]
Yuta Hayashibe, Masato Hagiwara, Satoshi Sekine. phloat: Integrated Writing Environment for ESL learners. Second Workshop on Advances in Text Input Methods (WTIM 2012), pp. 57-72, 2012. [paper] [slides]
Masato Hagiwara, Satoshi Sekine. Latent Semantic Transliteration using Dirichlet Mixture. NEWS 2012 (the 4th Named Entities Workshop), pp. 30-37, 2012. [paper]
Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, Koji Murakami. Safety Information Mining — What can NLP do in a disaster —, Proc. of IJCNLP 2011. [paper]
Masato Hagiwara and Satoshi Sekine. Latent Class Transliteration based on Source Language Origins. Proc. of ACL-HLT 2011, pp. 53-57, 2011. [paper]
Masato Hagiwara and Hisami Suzuki. Japanese Query Alteration Based on Lexical Semantic Similarity. Proc. of NAACL HLT 2009, pp. 191-199, 2009. [paper]
Nobuyuki Shimizu, Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama and Hiroshi Nakagawa. Metric learning for synonym acquisition. Proc. of COLING 2008, pp. 793-800, 2008. [paper]
Masato Hagiwara. A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features. Proc. of ACL 2008 Student Research Workshop, pp. 1-6, 2008. [paper] [link]
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Context Feature Selection for Distributional Similarity. Proc. of IJCNLP 2008, pp. 553-560, 2008. [paper] [link]
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Selection of Effective Contextual Information for Automatic Synonym Acquisition. Proc. of COLING/ACL 2006, pp. 353-360, 2006. [paper] [link]
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. PLSI Utilization for Automatic Thesaurus Construction. Proc. of IJCNLP 2005, pp. 334-345, 2005. [paper]