Terms and Application Form of MSMO Dataset

1 Introduction

Multimodal summarization has become a hot research topic due to the rapid growth of multimedia data. However, existing multimodal summarization systems usually present their output in a single modality, either textual or visual. We argue that multimodal output is necessary. We therefore propose a new task, Multimodal Summarization with Multimodal Output (MSMO), which aims to automatically generate a pictorial summary given a document and a collection of images.

1.1 Dataset

There is no large-scale benchmark dataset for MSMO. Following Hermann et al. [1], we construct a corpus from the Daily Mail website. We use the manually written highlights offered by Daily Mail as the reference text summary. To obtain the pictorial reference, we employ 10 graduate students to select, for each reference text summary, the relevant images from the article. Annotators may select up to three images, which reduces disagreement between annotators. If the annotators find no relevant image, they select none. Each article is annotated by at least two students; when the first two annotators disagree, a third annotator decides the final annotation. We conduct the annotation only on the test set.
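The annotation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used to build the dataset: the function names (`validate_selection`, `final_annotation`), the `adjudicate` callback standing in for the third annotator, and the reading of "divergence" as any difference between the two selected image sets are all hypothetical choices made for the example.

```python
from typing import Callable, FrozenSet, Iterable

# From the text: an annotator may select at most three images per article.
MAX_IMAGES = 3

Selection = FrozenSet[str]


def validate_selection(images: Iterable[str]) -> Selection:
    """Return the selection as a set, rejecting more than MAX_IMAGES images.

    An empty selection is valid: it means no image was judged relevant.
    """
    chosen = frozenset(images)
    if len(chosen) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images may be selected")
    return chosen


def final_annotation(
    first: Iterable[str],
    second: Iterable[str],
    adjudicate: Callable[[Selection, Selection], Iterable[str]],
) -> Selection:
    """Combine two annotations into the final pictorial reference.

    Assumption: the annotators "diverge" whenever their selected sets differ.
    In that case the hypothetical `adjudicate` callback (the third annotator)
    is shown both selections and returns the final one.
    """
    a = validate_selection(first)
    b = validate_selection(second)
    if a == b:
        return a  # agreement: no third annotator needed
    return validate_selection(adjudicate(a, b))
```

For example, `final_annotation(["1.jpg"], ["1.jpg", "2.jpg"], decide)` would call `decide` with both selections, whereas two identical selections are accepted without adjudication.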

More details can be found in our EMNLP 2018 paper.

2 Copyright

The copyright of this dataset belongs to the authors, and the dataset may be used only for research purposes. Display, reproduction, transmission, distribution, or publication of this dataset is prohibited. If you are interested in our dataset, please fill out the application form below and email it to {junnan.zhu, haoran.li}@nlpr.ia.ac.cn. We will send the download link for the dataset to the applicant. If you have any questions, don't hesitate to contact us.

3 Application Form






□ I have read the above terms, and accept them.


4 Reference

[1] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Proceedings of Neural Information Processing Systems (NIPS), pages 1693–1701.

If you find this dataset useful, please cite our paper:

  author    = {Zhu, Junnan  and  Li, Haoran  and  Liu, Tianshang  and Zhou, Yu and Zhang, Jiajun  and  Zong, Chengqing},
  title     = {MSMO: Multimodal Summarization with Multimodal Output},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year      = {2018},
  pages     = {4154--4164},

Author: Junnan Zhu

Created: 2018-11-09 Fri 10:19