2016
The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News
Emma Barker | Monica Lestari Paramita | Ahmet Aker | Emina Kurtic | Mark Hepple | Robert Gaizauskas
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Automatic label generation for news comment clusters
Ahmet Aker | Monica Paramita | Emina Kurtic | Adam Funk | Emma Barker | Mark Hepple | Rob Gaizauskas
Proceedings of the 9th International Natural Language Generation conference
What’s the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
Emma Barker | Monica Paramita | Adam Funk | Emina Kurtic | Ahmet Aker | Jonathan Foster | Mark Hepple | Robert Gaizauskas
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Automatic summarization of reader comments in on-line news is an extremely challenging task and a capability for which there is a clear need. Work to date has focussed on producing extractive summaries using well-known techniques imported from other areas of language processing. But are extractive summaries of comments what users really want? Do they support users in performing the sorts of tasks they are likely to want to perform with reader comments? In this paper we address these questions by doing three things. First, we offer a specification of one possible summary type for reader comment, based on an analysis of reader comment in terms of issues and viewpoints. Second, we define a task-based evaluation framework for reader comment summarization that allows summarization systems to be assessed in terms of how well they support users in a time-limited task of identifying issues and characterising opinion on issues in comments. Third, we describe a pilot evaluation in which we used the task-based evaluation framework to evaluate a prototype reader comment clustering and summarization system, demonstrating the viability of the evaluation framework and illustrating the sorts of insight such an evaluation affords.
2015
Comment-to-Article Linking in the Online News Domain
Ahmet Aker | Emina Kurtic | Mark Hepple | Rob Gaizauskas | Giuseppe Di Fabbrizio
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue
2012
A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English
Emina Kurtić | Bill Wells | Guy J. Brown | Timothy Kempton | Ahmet Aker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In this paper we present a corpus of audio and video recordings of spontaneous, face-to-face multi-party conversation in two languages. Freely available high quality recordings of mundane, non-institutional, multi-party talk are still sparse, and this corpus aims to contribute valuable data suitable for study of multiple aspects of spoken interaction. In particular, it constitutes a unique resource for spoken Bosnian Serbo-Croatian (BSC), an under-resourced language with no spoken resources available at present. The corpus consists of just over 3 hours of free conversation in each of the target languages, BSC and British English (BE). The audio recordings have been made on separate channels using head-set microphones, as well as using a microphone array, containing 8 omni-directional microphones. The data has been segmented and transcribed using segmentation notions and transcription conventions developed from those of the conversation analysis research tradition. Furthermore, the transcriptions have been automatically aligned with the audio at the word and phone level, using the method of forced alignment. In this paper we describe the procedures behind the corpus creation and present the main features of the corpus for the study of conversation.