Content Type Profiling of Data-to-Text Generation Datasets

Ashish Upadhyay, Stewart Massie


Abstract
Data-to-Text Generation (D2T) problems can be viewed as a stream of time-stamped events, with a text summary produced for each event. The problem becomes more challenging when event summaries contain complex insights derived from multiple records, either within a single event or across several events in the stream. Understanding the different types of content present in a summary helps to define system requirements more precisely and, in turn, to build better systems. In this paper, we propose a novel typology of content types, which we use to classify the contents of event summaries. Using the typology, a dataset profile is generated as the distribution of the aggregated content types; the profile captures the specific characteristics of the dataset and gives a measure of the complexity present in the problem. Extensive experiments on different D2T datasets demonstrate that neural systems struggle to generate content of complex types.
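The idea of a dataset profile as a distribution over aggregated content types can be illustrated with a minimal sketch. This is not the authors' implementation or the code in the linked repository: the function name `content_type_profile` and the three labels (`basic`, `within_event`, `across_events`) are hypothetical placeholders chosen only to mirror the abstract's distinction between content derived within an event and across events. The sketch assumes each summary has already been annotated with one content-type label per sentence.

```python
from collections import Counter


def content_type_profile(labels_per_summary):
    """Pool content-type labels from all summaries and normalise to a distribution.

    labels_per_summary: list of lists, one inner list of labels per summary.
    Returns a dict mapping each label to its share of all labelled items.
    """
    # Count every label across every summary in the dataset.
    counts = Counter(label for labels in labels_per_summary for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # No annotations, so there is no profile to report.
    return {label: count / total for label, count in counts.items()}


# Hypothetical annotations for two summaries (label names are assumptions).
summaries = [
    ["basic", "basic", "within_event"],
    ["basic", "across_events"],
]
profile = content_type_profile(summaries)
for label, share in sorted(profile.items()):
    print(f"{label}: {share:.2f}")
```

Under these assumptions, a profile with a larger share of the complex labels would indicate a dataset whose summaries rely more on multi-record reasoning.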
Anthology ID:
2022.coling-1.507
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5770–5782
URL:
https://aclanthology.org/2022.coling-1.507
Cite (ACL):
Ashish Upadhyay and Stewart Massie. 2022. Content Type Profiling of Data-to-Text Generation Datasets. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5770–5782, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Content Type Profiling of Data-to-Text Generation Datasets (Upadhyay & Massie, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.507.pdf
Code:
ashishu007/content-type-profiling