I am a Researcher / Machine Learning Engineer currently working at Duolingo. I love languages, machine learning, and everything in between.
I speak Chinese, Japanese, and English fluently, and am learning Korean and Lojban. I helped launch the Japanese, Korean, and Chinese courses on Duolingo. My research projects recently appeared on TechCrunch and Quartz.
- Feb. 2015 - Present: Machine Learning Engineer / Researcher - Duolingo, Inc. (in Pittsburgh, PA)
- Built automatic grading technologies for Duolingo English Test writing and speaking questions using neural networks
- Led data creation and analysis for various research projects, including user behavior analysis and the Second Language Acquisition Modeling (SLAM) shared task
- Led the content creation of Chinese, Japanese, and Korean from English courses
- Oct. 2010 - Feb. 2015: Lead Scientist - Rakuten Institute of Technology (in New York)
- Implemented machine transliteration (NLP2011 paper award) and machine translation for e-commerce
- Built Chinese/Japanese word segmentation, morphological analysis, named entity extraction systems
- Worked on lexical knowledge acquisition and information extraction from the Web
- Developed a writing support system for English as a Second Language (ESL) learners
- Apr. 2009 - Sep. 2010: Research and Development Engineer - Baidu Japan, Inc. (worked in Shanghai / Beijing / Tokyo)
- Planned and served as lead developer on various projects, including the Unnatural Language Processing Contest, the Baidu Mobile Corpus, and the Timed Corpus.
- Worked on ranking and page analysis algorithms, including spam detection, for Baidu mobile search. Also worked on mobile emoticon search using various NLP semantic analysis techniques.
- Worked on various NLP topics, including word/sentence analysis technologies, synonym mining and dictionary construction, proper noun detection, and the Japanese input method BaiduType.
- Apr. 2008 - Jul. 2008: Research Intern - Microsoft Research, Redmond, USA. (Mentor: Hisami Suzuki)
- Proposed a state-of-the-art method for Japanese query alteration, which corrects misspellings and normalizes the spelling/transliteration variants, with higher accuracy than previous systems.
- Implemented the system using Visual C#, SQL Server, and Ruby, processing tens of gigabytes of query logs. This system is being integrated into Microsoft Live Search (http://www.live.com/).
- Developed a method to automatically and efficiently generate query rewriting pairs from session logs.
- Presented the project at the 3rd NLP Symposium for Young Researchers and was awarded the outstanding presentation award. Presented at NAACL 2009.
- Nov. 2006 - Aug. 2007: Developer - IPA, JAPAN: Exploratory Software Project. (Project Manager: Prof. David J. Farber)
- Accepted as the Exploratory Software Project "Serendi: A Location-Aware Social Networking Platform," a meta social networking service targeted at mobile devices with GPS. (acceptance ratio 23.4%)
- Conducted an extensive user test with more than 50 users and confirmed the reliability of the system.
- Aug. 2005 - Sep. 2005: Intern (Software Engineer), Google Inc., CA, USA. (Mentors: Dekang Lin and Jun Wu)
- Participated in the two-month internship program as one of the few interns chosen from Japan, in only the second year of the program.
- Improved Japanese query suggestion, which is currently used as the basis for the query suggestions shown at the top and bottom of Google search result pages.
- Made full use of parallel distributed computation frameworks such as MapReduce and of Google's large-scale cluster infrastructure.
- Apr. 2006 - Mar. 2007: Research Assistant, Nagoya University
- Proposed and implemented context extension and selection methods for lexical similarity computation, to improve the construction of linguistic resources such as thesauri.
- Apr. 2006 - Mar. 2009: Ph.D. Candidate, Department of Information Engineering, Graduate School of Information Science, Nagoya University, Japan.
- Doctoral Thesis: "Modeling and Selection of Context for Better Synonym Acquisition"
- Apr. 2004 - Mar. 2006: Master's Program in Department of Information Engineering, Graduate School of Information Science, Nagoya University, Japan
- Skipped the final undergraduate year and was admitted to the graduate school under the grade-skipping system on the basis of excellent academic performance. Overall GPA: 3.8
- Master's Thesis: "Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction"
- Apr. 2001 - Mar. 2004: Information Engineering Course, School of Engineering, Nagoya University, Japan. Computer Science GPA: 3.9
Awards & Professional Activities
- Invited talk at CUNY NLP Seminar (hosted by Prof. Heng Ji), April 2013. Title: "Word Segmentation and Transliteration in Chinese and Japanese." [slides]
- 2011 Field Innovation Award from the Japanese Society for Artificial Intelligence: ANPI_NLP: Safety Information Confirmation Support using Natural Language Processing for The 2011 Tohoku Earthquake.
- Paper Award at NLP2011 “Latent Class Transliteration based on Source Language Origins” (the largest Japanese NLP academic conference)
- Invited presentation at IPSJ 2012 “Real-world Natural Language Processing”
- Leading editorial member of a special issue on “UnNatural Language Processing,” Journal of Natural Language Processing, 2011.
- Panelist at the joint workshop “Relationship between industry, universities, and students in the NLP field” at the 17th Annual Meeting of the Association for Natural Language Processing
- Best Paper Award at NLP2009 “Semantic Category Extraction from Unsegmented Text using Graph Kernels” (the largest Japanese NLP academic conference, chosen among 235 papers)
- Paper Award at the 3rd NLP Symposium for Young Researchers. Presentation: “A Unified Approach to Japanese Query Alteration based on Semantic Similarity”
- Paper Award at the 22nd IMI Seminar of the 21st Century COE Program. Presentation: “Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction”
- Program Committee at SANCL 2012, the Student Research Workshop (SRW) at ACL-IJCNLP 2009 and ACL 2012.
- Program Committee at ACL 2014 (morphology) and COLING 2014 (machine translation).
Books and Articles
- Drew Conway, John Myles White, 萩原正人 (Masato Hagiwara), 奥野 陽 (Yoh Okuno), 水野 貴明 (Takaaki Mizuno), 木下 哲也 (Tetsuya Kinoshita) (translation). 入門 機械学習 (Machine Learning for Hackers). O'Reilly Japan, 2012. O'Reilly Japan - 入門 機械学習
- Steven Bird, Ewan Klein, Edward Loper. 萩原正人 (Masato Hagiwara), 中山敬広 (Takahiro Nakayama), 水野貴明(Takaaki Mizuno) (translation). 入門 自然言語処理 (Natural Language Processing with Python). O'Reilly Japan, 2010. O'Reilly Japan - 入門 自然言語処理
- Masato Hagiwara, Koji Murakami, Graham Neubig, Yuichiroh Matsubayashi: Robust NLP for Real-world Data : 7. ANPI_NLP - Mining Safety Information after Disasters Using Natural Language Processing-. Information Processing Society of Japan Magazine. Vol. 53, No. 3, pp. 241-248, 2012.
- Masato Hagiwara: Recommendation for Overseas Internship. Japanese Society for Artificial Intelligence Journal, Vol. 29, No. 2, pp. 209-211, 2014.
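- 萩原正人 (Masato Hagiwara), 小川泰弘 (Yasuhiro Ogawa), 外山勝彦 (Katsuhiko Toyama): グラフカーネルを用いた非分かち書き文からの漸次的語彙知識獲得 (Incremental Lexical Knowledge Acquisition from Unsegmented Text Using Graph Kernels). 人工知能学会誌 (Journal of the Japanese Society for Artificial Intelligence), Vol. 26, No. 3, pp. 440-450, 2011.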
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Supervised Synonym Acquisition Using Distributional Features and Syntactic Patterns. Journal of Natural Language Processing, Vol. 16, Num. 2, pp. 59-83, 2009.
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. A Comparative Study on Effective Context Selection for Distributional Similarity. Journal of Natural Language Processing, Vol. 5, Num. 5, pp. 119-150, 2008.
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Use of Indirect Dependency for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 4, pp. 19-42, 2008.
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Revised Selected papers, Lecture Notes in Computer Science, Vol. 5447, pp. 213-227, 2009.
Conference Papers (Selected)
- Ayah Zirikly, Masato Hagiwara. Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora. ACL 2015, pp. 390-396, 2015. [paper]
- Masato Hagiwara, Satoshi Sekine. Lightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning. COLING 2014 system demonstration, pp. 39-43, 2014. [paper]
- Haibo Li, Masato Hagiwara, Qi Li, Heng Ji. Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese, LREC 2014, pp.2532-2536, 2014. [paper]
- Masato Hagiwara, Satoshi Sekine. Accurate Word Segmentation using Transliteration and Language Model Projection, ACL 2013, pp. 183-189. [paper]
- Masato Hagiwara, Soh Masuko. KooSHO: Japanese Text Input Environment based on Aerial Hand Writing. The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT 2013), demo session, pp. 24-27. 2013. [paper]
- Yuta Hayashibe, Masato Hagiwara, Satoshi Sekine. phloat : Integrated Writing Environment for ESL learners, Second Workshop on Advances in Text Input Methods (WTIM 2012), pp.57-72, 2012. [paper] [slides]
- Masato Hagiwara, Satoshi Sekine. Latent Semantic Transliteration using Dirichlet Mixture. NEWS 2012 (the 4th Named Entities Workshop), pp. 30-37, 2012. [paper]
- Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, Koji Murakami. Safety Information Mining — What can NLP do in a disaster —, Proc. of IJCNLP 2011. [paper]
- Masato Hagiwara and Satoshi Sekine. Latent Class Transliteration based on Source Language Origins. Proc. of ACL-HLT 2011, pp. 53-57, 2011. [paper]
- Masato Hagiwara and Hisami Suzuki. Japanese Query Alteration Based on Lexical Semantic Similarity. Proc. of NAACL HLT 2009, pp. 191-199, 2009. [paper]
- Nobuyuki Shimizu, Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama and Hiroshi Nakagawa. Metric learning for synonym acquisition. Proc. of COLING 2008, pp. 793-800, 2008. [paper]
- Masato Hagiwara. A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features. Proc. of ACL 2008 Student Research Workshop, pp. 1-6, 2008. [paper] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. Proc. of JURISIN 2008, pp. 63-72, 2008. [paper]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Context Feature Selection for Distributional Similarity. Proc. of IJCNLP 2008, pp. 553-560, 2008. [paper] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Proximity Distance for Word-Based Context. Proc. of SNLP 2007, pp. 105-110, 2007. [paper] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effectiveness of Indirect Dependency for Automatic Synonym Acquisition. Proc. of CoSMo 2007, pp. 1 - 8, 2007. [paper] [ppt]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Selection of Effective Contextual Information for Automatic Synonym Acquisition. Proc. of COLING/ACL 2006, pp. 353 - 360, 2006. [paper] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. PLSI Utilization for Automatic Thesaurus Construction. Proc. of IJCNLP 2005, pp. 334 - 345, 2005. [paper]
- In English
- In Japanese
- How I work - Masato Hagiwara at Duolingo (Jan. 2016, Lifehacker.jp)
- Why you shouldn't study at weekends - Data reveal three common traits of successful language learners (Dec. 2016, TechCrunch Japan)
- Difference between successful and unsuccessful language learners, according to a researcher at Duolingo (Dec. 2016, Lifehacker.jp)
- Aptitude doesn't matter for language learning - Interview with Masato Hagiwara, a Japanese software engineer at Duolingo (Aug. 2015, Lifehacker.jp)
- Humans still learning languages in 30 years? (Aug. 2015, Lifehacker.jp)
- Free language learning app Duolingo raises $45 million from Google Capital (June 2015, Nikkei Computer)
- Process Emojis as 'words' - Emojis not used as defined (July 2010, INTERNET Watch)
- Process Emojis as 'words' - algorithm to distinguish 'beers' from 'parties' (July 2010, INTERNET Watch)
- Character encoding experts turn Baidu Emoji search episodes into an academic paper (Mar. 2010, INTERNET Watch)