Stanford University · Computer Science

Large Movie Review Dataset

A binary sentiment classification benchmark introduced in Maas et al., ACL 2011. Also known as aclImdb or simply IMDB.

This dataset contains 50,000 polarized movie reviews scraped from IMDB — 25,000 for training and 25,000 for testing — plus 50,000 additional unlabeled reviews for unsupervised or semi-supervised learning. Reviews are labeled only when strongly positive (score ≥ 7/10) or strongly negative (score ≤ 4/10), so the resulting classification task is challenging but unambiguous. Both raw text and pre-processed bag-of-words formats are provided.
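The labeling rule above (positive at ≥ 7/10, negative at ≤ 4/10, neutral reviews excluded) can be sketched as a small helper. The function name is illustrative; the distributed dataset already ships with labels assigned:

```python
def rating_to_label(score: int):
    """Map a 1-10 IMDB star rating to a binary sentiment label.

    Returns 1 (positive) for score >= 7, 0 (negative) for score <= 4,
    and None for neutral reviews (5-6), which the dataset excludes
    from the labeled splits.
    """
    if score >= 7:
        return 1
    if score <= 4:
        return 0
    return None
```

Excluding the neutral middle band is what makes the task "polarized": every labeled review carries a clearly positive or clearly negative rating.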

In the years since release the dataset has become one of the standard text-classification benchmarks in NLP, used to evaluate landmark models including ULMFiT, ELMo, BERT, RoBERTa, XLNet, ALBERT, and DistilBERT. It ships in Hugging Face Datasets, TensorFlow Datasets, Keras, and PyTorch-NLP.

50,000
Labeled reviews
7,920
Citations of the source paper
179K
Hugging Face downloads / month
1,547+
Models on Hugging Face

Download

aclImdb_v1.tar.gz ~84 MB compressed · ~133 MB extracted

Large Movie Review Dataset v1.0 — raw text reviews plus pre-tokenized bag-of-words features.

See the included README for full file layout, class balance, and a description of the bag-of-words encoding.
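As a sketch of how the raw-text layout is typically consumed, assuming the standard directory structure described in the README (`aclImdb/<split>/pos/*.txt` and `aclImdb/<split>/neg/*.txt`; the function name is illustrative):

```python
import os

def load_split(root, split="train"):
    """Yield (text, label) pairs from an extracted aclImdb directory.

    Assumes the layout described in the dataset README:
    <root>/<split>/pos/*.txt and <root>/<split>/neg/*.txt,
    with label 1 for "pos" and 0 for "neg".
    """
    for label_name, label in (("pos", 1), ("neg", 0)):
        folder = os.path.join(root, split, label_name)
        for fname in sorted(os.listdir(folder)):
            if not fname.endswith(".txt"):
                continue
            with open(os.path.join(folder, fname), encoding="utf-8") as f:
                yield f.read(), label
```

The unsupervised reviews live in a separate `train/unsup` directory and would need their own pass, since they carry no label.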

Splits

Split          Reviews   Labels
Train          25,000    balanced pos / neg
Test           25,000    balanced pos / neg
Unsupervised   50,000    unlabeled

Fields

text
raw movie review (string)
label
0 = negative, 1 = positive

Mirrors & library integrations

Available via Hugging Face Datasets, TensorFlow Datasets, Keras, and PyTorch-NLP.

Used as a benchmark by

A non-exhaustive list of landmark papers and models that use this dataset to evaluate text-classification or representation-learning quality. Modern transformer models reach 95–97% test accuracy on the binary task; the original 2011 paper reported 88.89%.

BERT · Devlin et al., NAACL 2019 · pretraining benchmark
ULMFiT · Howard & Ruder, ACL 2018 · transfer-learning showcase
ELMo · Peters et al., NAACL 2018 · contextualized embeddings
RoBERTa · Liu et al., 2019 · replication study of BERT
XLNet · Yang et al., NeurIPS 2019 · permutation-based pretraining
ALBERT · Lan et al., ICLR 2020 · parameter-efficient BERT
DistilBERT · Sanh et al., 2019 · distillation benchmark
Sentence-BERT · Reimers & Gurevych, EMNLP 2019 · sentence embeddings
Paragraph Vectors · Le & Mikolov, ICML 2014 · document representations
NLP-progress · Sebastian Ruder · sentiment-analysis SOTA tracking

Citation

If you use this dataset, please cite the ACL 2011 paper:

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.
               and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for
               Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

Download .bib file · Read the paper (PDF)

Contact

Questions or comments about the dataset? Email amaas [at] cs.stanford.edu.