Masato Hagiwara

I am a Researcher / Machine Learning Engineer currently working at Duolingo. I love languages, machine learning, and everything in between.

I speak Chinese, Japanese, and English fluently, and am learning Korean and Lojban. I helped launch the Japanese, Korean, and Chinese courses on Duolingo. My research projects recently appeared on TechCrunch and Quartz.


Experience

  • Feb. 2015 - Present: Machine Learning Engineer / Researcher - Duolingo, Inc. (in Pittsburgh, PA)
    • Built automatic grading technologies for Duolingo English Test writing and speaking questions using neural networks
    • Led data creation and analysis for various research projects, including user behavior analysis and the Second Language Acquisition Modeling (SLAM) shared task
    • Led content creation for the Chinese, Japanese, and Korean courses for English speakers

  • Oct. 2010 - Feb. 2015: Lead Scientist - Rakuten Institute of Technology (in New York)
    • Implemented machine transliteration (NLP2011 paper award) and machine translation for e-commerce
    • Built Chinese/Japanese word segmentation, morphological analysis, and named entity extraction systems
    • Worked on lexical knowledge acquisition and information extraction from the Web
    • Developed a writing support system for English as a Second Language (ESL) learners

  • Apr. 2009 - Sep. 2010: Research and Development Engineer - Baidu Japan, Inc. (worked in Shanghai / Beijing / Tokyo)
    • Planned and served as lead developer on various projects, including the Unnatural Language Processing Contest, the Baidu Mobile Corpus, and the Timed Corpus.
    • Worked on ranking and page analysis algorithms, including spam detection, for Baidu mobile search. Also worked on mobile emoticon search using various NLP semantic analysis techniques.
    • Worked on various NLP topics, including word and sentence analysis technologies, synonym mining and dictionary construction, proper noun detection, and the Japanese input method BaiduType.

  • Apr. 2008 - Jul. 2008: Research Intern - Microsoft Research, Redmond, USA. (Mentor: Hisami Suzuki)
    • Proposed a state-of-the-art method for Japanese query alteration, which corrects misspellings and normalizes the spelling/transliteration variants, with higher accuracy than previous systems.
    • Implemented the system using Visual C#, SQL Server, and Ruby, with tens of gigabytes of query logs. This system is being integrated into Microsoft Live Search.
    • Developed a method to automatically and efficiently generate query rewriting pairs from session logs.
    • Presented the project at the 3rd NLP Symposium for Young Researchers and was awarded the outstanding presentation award. Presented at NAACL 2009.

  • Nov. 2006 - Aug. 2007: Developer - IPA, JAPAN: Exploratory Software Project. (Project Manager: Prof. David J. Farber)
    • Proposal accepted into the Exploratory Software Project: "Serendi: A Location-Aware Social Networking Platform," a meta social networking service targeted at mobile devices with GPS (acceptance ratio 23.4%).
    • Developed the "compatibility" analysis module, which recommends users in real time based on natural language processing and network analysis. Used PHP, JavaScript, Ruby, MySQL, and ActiveRecord.
    • Conducted an extensive user test with more than 50 users and confirmed the reliability of the system.

  • Aug. 2005 - Sep. 2005: Intern (Software Engineer), Google Inc., CA, USA. (Mentors: Dekang Lin and Jun Wu)
    • Participated in the two-month internship program as one of the few interns chosen from Japan, in only the second year of the program.
    • Improved Japanese query suggestion, which is currently used as the basis for the query suggestions shown at the top and bottom of the Google search results.
    • Made full use of parallel distributed computation frameworks such as MapReduce and of Google's large-scale cluster infrastructure.

  • Apr. 2006 - Mar. 2007: Research Assistant, Nagoya University
    • Proposed and implemented context extension and selection methods for lexical similarity computation, to improve the construction of linguistic resources such as thesauri.

Education

  • Apr. 2006 - Mar. 2009: Ph.D. Candidate, Department of Information Engineering, Graduate School of Information Science, Nagoya University, Japan.
    • Doctoral Thesis: "Modeling and Selection of Context for Better Synonym Acquisition"

  • Apr. 2004 - Mar. 2006: Master's Program, Department of Information Engineering, Graduate School of Information Science, Nagoya University, Japan.
    • Skipped a year of undergraduate study and was admitted to the graduate school under the grade-skipping system for excellent academic performance. Overall GPA: 3.8
    • Master's Thesis: "Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction"

  • Apr. 2001 - Mar. 2004: Information Engineering Course, School of Engineering, Nagoya University, Japan.
    • Computer Science GPA: 3.9

Awards & Professional Activities

  • Invited talk at CUNY NLP Seminar (hosted by Prof. Heng Ji): “Word Segmentation and Transliteration in Chinese and Japanese,” April 2013. [slides]
  • 2011 Field Innovation Award from the Japanese Society for Artificial Intelligence: ANPI_NLP: Safety Information Confirmation Support using Natural Language Processing for The 2011 Tohoku Earthquake.
  • Paper Award at NLP2011 “Latent Class Transliteration based on Source Language Origins” (the largest Japanese NLP academic conference)
  • Invited presentation at IPSJ 2012 “Real-world Natural Language Processing”
  • Leading editorial member of a special issue on “UnNatural Language Processing,” Journal of Natural Language Processing, 2011.
  • Panelist at the joint workshop “Relationship between industry, universities, and students in the NLP field” at the 17th Annual Meeting of the Association for Natural Language Processing
  • Best Paper Award at NLP2009 “Semantic Category Extraction from Unsegmented Text using Graph Kernels” (the largest Japanese NLP academic conference, chosen among 235 papers)
  • Paper Award at the 3rd NLP Symposium for Young Researchers. Presentation: “A Unified Approach to Japanese Query Alteration based on Semantic Similarity”
  • Paper Award at the 22nd IMI Seminar of the 21st Century COE Program. Presentation: “Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction”
  • Program Committee at SANCL 2012, the Student Research Workshop (SRW) at ACL-IJCNLP 2009 and ACL 2012.
  • Program Committee at ACL 2014 (morphology) and COLING 2014 (machine translation).


Books and Articles

  • Drew Conway, John Myles White. 萩原正人 (Masato Hagiwara), 奥野 陽 (Yoh Okuno), 水野 貴明 (Takaaki Mizuno), 木下 哲也 (Tetsuya Kinoshita) (translation). 入門 機械学習 (Machine Learning for Hackers). O'Reilly Japan, 2012. O'Reilly Japan - 入門 機械学習
  • Steven Bird, Ewan Klein, Edward Loper. 萩原正人 (Masato Hagiwara), 中山敬広 (Takahiro Nakayama), 水野貴明 (Takaaki Mizuno) (translation). 入門 自然言語処理 (Natural Language Processing with Python). O'Reilly Japan, 2010. O'Reilly Japan - 入門 自然言語処理
  • Masato Hagiwara, Koji Murakami, Graham Neubig, Yuichiroh Matsubayashi: Robust NLP for Real-world Data : 7. ANPI_NLP - Mining Safety Information after Disasters Using Natural Language Processing-. Information Processing Society of Japan Magazine. Vol. 53, No. 3, pp. 241-248, 2012.
  • Masato Hagiwara: Recommendation for Overseas Internship. Japanese Society for Artificial Intelligence Journal, Vol. 29, No. 2, pp. 209-211, 2014.

Journal Papers

  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. グラフカーネルを用いた非分かち書き文からの漸次的語彙知識獲得 (Incremental Lexical Knowledge Acquisition from Unsegmented Text Using Graph Kernels). Journal of the Japanese Society for Artificial Intelligence, Vol. 26, No. 3, pp. 440-450, 2011.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Supervised Synonym Acquisition Using Distributional Features and Syntactic Patterns. Journal of Natural Language Processing, Vol. 16, Num. 2, pp. 59-83, 2009.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. A Comparative Study on Effective Context Selection for Distributional Similarity. Journal of Natural Language Processing, Vol. 5, Num. 5, pp. 119-150, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Use of Indirect Dependency for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 4, pp. 19-42, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Revised Selected papers, Lecture Notes in Computer Science, Vol. 5447, pp. 213-227, 2009.

Conference Papers (Selected)

  • Ayah Zirikly, Masato Hagiwara. Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora. ACL 2015, pp. 390-396, 2015. [paper]
  • Masato Hagiwara, Satoshi Sekine. Lightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning. COLING 2014 system demonstration, pp. 39-43, 2014. [paper]
  • Haibo Li, Masato Hagiwara, Qi Li, Heng Ji. Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese. LREC 2014, pp. 2532-2536, 2014. [paper]
  • Masato Hagiwara, Satoshi Sekine. Accurate Word Segmentation using Transliteration and Language Model Projection. ACL 2013, pp. 183-189, 2013. [paper]
  • Masato Hagiwara, Soh Masuko. KooSHO: Japanese Text Input Environment based on Aerial Hand Writing. The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT 2013), demo session, pp. 24-27. 2013. [paper]
  • Yuta Hayashibe, Masato Hagiwara, Satoshi Sekine. phloat: Integrated Writing Environment for ESL learners. Second Workshop on Advances in Text Input Methods (WTIM 2012), pp. 57-72, 2012. [paper] [slides]
  • Masato Hagiwara, Satoshi Sekine. Latent Semantic Transliteration using Dirichlet Mixture. NEWS 2012 (the 4th Named Entities Workshop), pp. 30-37, 2012. [paper]
  • Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, Koji Murakami. Safety Information Mining — What can NLP do in a disaster —, Proc. of IJCNLP 2011. [paper]
  • Masato Hagiwara and Satoshi Sekine. Latent Class Transliteration based on Source Language Origins. Proc. of ACL-HLT 2011, pp. 53-57, 2011. [paper]
  • Masato Hagiwara and Hisami Suzuki. Japanese Query Alteration Based on Lexical Semantic Similarity. Proc. of NAACL HLT 2009, pp. 191-199, 2009. [paper]
  • Nobuyuki Shimizu, Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama and Hiroshi Nakagawa. Metric learning for synonym acquisition. Proc. of COLING 2008, pp. 793-800, 2008. [paper]
  • Masato Hagiwara. A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features. Proc. of ACL 2008 Student Research Workshop, pp. 1-6, 2008. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. Proc. of JURISIN 2008, pp. 63-72, 2008. [paper]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Context Feature Selection for Distributional Similarity. Proc. of IJCNLP 2008, pp. 553-560, 2008. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Proximity Distance for Word-Based Context. Proc. of SNLP 2007, pp. 105-110, 2007. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effectiveness of Indirect Dependency for Automatic Synonym Acquisition. Proc. of CoSMo 2007, pp. 1-8, 2007. [paper] [ppt]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Selection of Effective Contextual Information for Automatic Synonym Acquisition. Proc. of COLING/ACL 2006, pp. 353-360, 2006. [paper] [link]
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. PLSI Utilization for Automatic Thesaurus Construction. Proc. of IJCNLP 2005, pp. 334-345, 2005. [paper]




