100 Must-Read NLProc Papers
This is a list of 100 important natural language processing (NLP) papers that serious students and researchers working in the field should probably know about and read.
This list is originally based on the answers to a Quora question I posted years ago: "What are the most important research papers which all NLP students should definitely read?" I thank all the people who contributed to the original post.
This list is far from complete or objective, and is evolving, as important papers are being published year after year. Please let me know via pull requests and issues if anything is missing.
Also, I didn’t try to include links to the original papers, since keeping links up to date as they go dead is a lot of work. I’m sure you can find most (if not all) of the papers listed here via a single Google search by their titles.
A paper doesn’t have to be a peer-reviewed conference/journal paper to appear here. We also include tutorial/survey-style papers that are often easier to understand than the original papers.
Language Modeling

Joshua Goodman: A bit of progress in language modeling, MSR Technical Report, 2001.

Yee Whye Teh: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes, COLING/ACL 2006.

Yee Whye Teh: A Bayesian interpretation of Interpolated Kneser-Ney.
Segmentation, Tagging, Parsing

Donald Hindle and Mats Rooth: Structural Ambiguity and Lexical Relations, Computational Linguistics, 1993.

Adwait Ratnaparkhi: A Maximum Entropy Model for Part-Of-Speech Tagging, EMNLP 1996.

Michael Collins: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, EMNLP 2002.

Dan Klein and Christopher Manning: Accurate Unlexicalized Parsing, ACL 2003.

Dan Klein and Christopher Manning: Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, ACL 2004.

Joakim Nivre and Mario Scholz: Deterministic Dependency Parsing of English Text, COLING 2004.

Ryan McDonald et al.: Non-Projective Dependency Parsing using Spanning-Tree Algorithms, EMNLP 2005.

Daniel Andor et al.: Globally Normalized Transition-Based Neural Networks, 2016.
Information Extraction

Marti A. Hearst: Automatic Acquisition of Hyponyms from Large Text Corpora, COLING 1992.

Michael Collins and Yoram Singer: Unsupervised Models for Named Entity Classification, EMNLP 1999.

Patrick Pantel and Dekang Lin: Discovering Word Senses from Text, SIGKDD 2002.

Mike Mintz et al.: Distant supervision for relation extraction without labeled data, ACL 2009.
Machine Learning

John Lafferty, Andrew McCallum, Fernando C.N. Pereira: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001.

Charles Sutton and Andrew McCallum: An Introduction to Conditional Random Fields for Relational Learning.

Kamal Nigam, et al.: Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 1999.

Kevin Knight: Bayesian Inference with Tears, 2009.
Topic Models

Thomas Hofmann: Probabilistic Latent Semantic Indexing, SIGIR 1999.

David Blei, Andrew Y. Ng, and Michael I. Jordan: Latent Dirichlet Allocation, J. Machine Learning Research, 2003.
Machine Translation & Transliteration

Peter F. Brown et al.: A Statistical Approach to Machine Translation, Computational Linguistics, 1990.

Kevin Knight and Jonathan Graehl: Machine Transliteration, Computational Linguistics, 1998.

Dekai Wu: Inversion Transduction Grammars and the Bilingual Parsing of Parallel Corpora, Computational Linguistics, 1997.

Kevin Knight: A Statistical MT Tutorial Workbook, 1999.

Philipp Koehn, Franz J Och, and Daniel Marcu: Statistical Phrase-Based Translation, NAACL 2003.

Philip Resnik and Noah A. Smith: The Web as a Parallel Corpus, Computational Linguistics, 2003.

Franz J Och and Hermann Ney: The Alignment-Template Approach to Statistical Machine Translation, Computational Linguistics, 2004.

David Chiang: A Hierarchical Phrase-Based Model for Statistical Machine Translation, ACL 2005.
Automatic Text Summarization

Kevin Knight and Daniel Marcu: Summarization beyond sentence extraction. Artificial Intelligence 139, 2002.

James Clarke and Mirella Lapata: Modeling Compression with Discourse Constraints. EMNLP-CoNLL 2007.

Ryan McDonald: A Study of Global Inference Algorithms in Multi-Document Summarization, ECIR 2007.

Wen-tau Yih et al.: Multi-Document Summarization by Maximizing Informative Content-Words. IJCAI 2007.
Neural Models

Yoshua Bengio, et al.: A Neural Probabilistic Language Model, J. of Machine Learning Research, 2003.

Richard Socher, et al.: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection, NIPS 2011.

Ronan Collobert et al.: Natural Language Processing (almost) from Scratch, J. of Machine Learning Research, 2011.

Tomas Mikolov, et al.: Efficient Estimation of Word Representations in Vector Space, 2013.

Tomas Mikolov, et al.: Distributed Representations of Words and Phrases and their Compositionality, NIPS 2013.

Quoc V. Le and Tomas Mikolov: Distributed Representations of Sentences and Documents, 2014.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le: Sequence to Sequence Learning with Neural Networks, NIPS 2014.

Oriol Vinyals, Quoc Le: A Neural Conversation Model, 2015.

Xiang Zhang, Junbo Zhao, and Yann LeCun: Character-level Convolutional Networks for Text Classification, NIPS 2015.