2024
“So, are you a different person today?” Analyzing Bias in Questions during Parole Hearings
Wassiliki Siskou, Ingrid Espinoza
Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024)
During Parole Suitability Hearings, commissioners need to evaluate whether an inmate’s risk of reoffending has decreased sufficiently to justify their release from prison before they complete their full sentence. The conversation between the commissioners and the inmate is the key element of such hearings and is largely driven by question-and-answer patterns, which can be influenced by the commissioners’ questioning behavior. To our knowledge, no previous study has investigated the relationship between the types of questions asked during parole hearings and potentially biased outcomes. We address this gap by analysing commissioners’ questioning behavior during Californian parole hearings. We test ChatGPT-4o’s capability of annotating questions automatically and achieve a high F1-score of 0.91 without prior training. By analysing all questions posed directly by commissioners to inmates, we test for potential biases in question types across multiple demographic variables. The results show minimal bias in questioning behavior toward inmates asking for parole.
Automated Anonymization of Parole Hearing Transcripts
Abed Itani, Wassiliki Siskou, Annette Hautli-Janisz
Proceedings of the Natural Legal Language Processing Workshop 2024
Responsible natural language processing is increasingly concerned with preventing the violations of personal rights that language technology can entail (CITATION). In this paper we illustrate the case of parole hearings in California, whose verbatim transcripts are made available to the general public upon request to the California Board of Parole Hearings. The parole hearing setting is highly sensitive: inmates face a board of legal representatives who discuss highly personal matters, not only about the inmates themselves but also about victims and their relatives, such as spouses and children. Participants have no choice in contributing to the data collection process, since the disclosure of the transcripts is mandated by law. As researchers who are interested in understanding and modeling the communication in these hierarchy-driven settings, we face an ethical dilemma: publishing the raw data as is for the community would compromise the privacy of all individuals affected, but cleaning the data manually requires substantial effort. In this paper we present an automated anonymization process which reliably removes and pseudonymizes sensitive data in verbatim transcripts while preserving the structure and content of the data. Our results show that the process exhibits little to no leakage of sensitive information when applied to more than 300 hearing transcripts.
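To illustrate what consistent pseudonymization means in practice, here is a minimal Python sketch. It is not the authors' pipeline: it assumes the sensitive names have already been detected (for example by a named-entity recognizer) and are passed in as a list, and the `pseudonymize` function and the `PERSON_n` placeholder scheme are hypothetical. Each distinct name is mapped to the same placeholder everywhere it occurs, so the structure of who is talking about whom survives anonymization.

```python
import re


def pseudonymize(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known name with a stable placeholder (hypothetical scheme).

    Assumes `names` was produced by an upstream detector; placeholders are
    numbered in order of first appearance in the text.
    """
    mapping: dict[str, str] = {}
    if not names:
        return text, mapping
    # Longest names first so a full name is not split by a shorter one.
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in mapping:
            mapping[name] = f"PERSON_{len(mapping) + 1}"
        return mapping[name]

    return pattern.sub(replace, text), mapping


anonymized, table = pseudonymize(
    "Officer Diaz spoke with Ms. Lopez. Lopez thanked Diaz.", ["Diaz", "Lopez"]
)
print(anonymized)  # Officer PERSON_1 spoke with Ms. PERSON_2. PERSON_2 thanked PERSON_1.
```

A production system would additionally need to cover name variants, dates, locations and other identifiers, and to verify that nothing leaks, which is what the leakage evaluation above addresses.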
PSE v1.0: The First Open Access Corpus of Public Service Encounters
Ingrid Espinoza, Steffen Frenzel, Laurin Friedrich, Wassiliki Siskou, Steffen Eckhard, Annette Hautli-Janisz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Face-to-face interactions between representatives of the state and citizens are a key intercept in public service delivery, for instance when providing social benefits to vulnerable groups. Despite the relevance of these encounters for individuals and for society at large, there is a significant research gap in the systematic empirical study of the communication taking place. This is mainly due to the high institutional and data protection barriers to collecting data in a very sensitive and private setting in which citizens request support from the state. In this paper, we describe the procedure of compiling the first open access dataset of transcribed recordings of so-called Public Service Encounters in Germany, i.e., meetings between state officials and citizens in which there is direct communication in order to allocate state services. This dataset sets a new research agenda in the social sciences, because it allows the community to open up the black box of direct state-citizen interaction. With data of this kind it becomes possible to investigate bias, bureaucratic discrimination and other power-driven dynamics directly and systematically in the actual communication, and ideally to propose guidelines to alleviate these issues.
2022
The Keystone Role Played by Questions in Debate
Zlata Kikteva, Kamila Gorska, Wassiliki Siskou, Annette Hautli-Janisz, Chris Reed
Proceedings of the 3rd Workshop on Computational Approaches to Discourse
Building on the recent results of a study into the roles played by questions in argumentative dialogue (Hautli-Janisz et al., 2022a), we extend the analysis to a newly released corpus that constitutes the largest extant corpus of closely annotated debate. Questions play a critical role in driving dialogical discourse forward; in combative or critical discursive environments, they not only provide a range of discourse management techniques but also scaffold the semantic structure of the positions that interlocutors develop. The boundaries between providing substantive answers to questions, merely responding to questions, and evading questions entirely are, however, fuzzy, and the way in which answers, responses and evasions affect the subsequent development of dialogue and argumentation structure is poorly understood. In this paper, we explore the ramifications of questions for the large-scale structure of a debate, using as our substrate the BBC television programme Question Time, the foremost topical debate show in the UK. Analysis of the data demonstrates not only that questioning plays a particularly prominent role in such debate, but also that its repercussions can reverberate through a discourse.
QT30: A Corpus of Argument and Conflict in Broadcast Debate
Annette Hautli-Janisz, Zlata Kikteva, Wassiliki Siskou, Kamila Gorska, Ray Becker, Chris Reed
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Broadcast political debate is a core pillar of democracy: it is the public’s easiest access to the opinions that shape policies, and it enables the general public to make informed choices. With QT30, we present the largest corpus of analysed dialogical argumentation ever created (19,842 utterances, 280,000 words) and also the largest corpus of analysed broadcast political debate to date, using 30 episodes of BBC’s ‘Question Time’ from 2020 and 2021. Question Time is the prime institution in UK broadcast political debate and features questions from the public on current political issues, which are responded to by a weekly panel of five figures of UK politics and society. QT30 is highly argumentative and combines the language of well-versed political rhetoric with direct, often combative, justification-seeking by the general public. QT30 is annotated with Inference Anchoring Theory, a framework well known in argument mining, which encodes the way arguments and conflicts are created and reacted to in dialogical settings. The resource is freely available at http://corpora.aifdb.org/qt30.
Automatized Detection and Annotation for Calls to Action in Latin-American Social Media Postings
Wassiliki Siskou, Clara Giralt Mirón, Sarah Molina-Raith, Miriam Butt
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Voter mobilization via social media has been shown to be an effective tool. While previous research has primarily looked at how calls to action (CTAs) were used in Twitter messages from non-profit organizations and in protest mobilization, we are interested in identifying the linguistic cues used in CTAs on Facebook and Twitter in order to identify CTAs automatically. The work is part of an ongoing collaboration with political scientists who are investigating CTAs in the period leading up to recent elections in three Latin American countries. We developed a new NLP pipeline for Spanish to facilitate their work. Our pipeline annotates social media posts with a range of linguistic information and then conducts targeted searches for linguistic cues that allow for the automatic annotation and identification of relevant CTAs. Using carefully crafted, linguistically informed heuristics, our system currently achieves an F1-score of 0.72.
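The targeted cue search can be pictured with a minimal Python sketch. The cue lexicon below is a hypothetical stand-in (a few Spanish imperatives and one fixed phrase), not the cues used in the paper, and `find_cta_cues` and `is_cta` are illustrative names. The actual pipeline first adds linguistic annotation to each post and applies richer heuristics than plain string matching.

```python
import re

# Hypothetical cue lexicon for illustration only; the paper's cues are not listed here.
CTA_CUES = ["vota", "votá", "participa", "comparte", "únete", "sal a votar"]

# Word boundaries keep "vota" from matching inside words such as "votación".
CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(cue) for cue in CTA_CUES) + r")\b",
    re.IGNORECASE,
)


def find_cta_cues(post: str) -> list[str]:
    """Return every cue found in the post, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in CUE_PATTERN.finditer(post)]


def is_cta(post: str) -> bool:
    """Flag a post as a candidate CTA if it contains at least one cue."""
    return bool(find_cta_cues(post))


print(find_cta_cues("¡Vota este domingo y comparte!"))  # ['vota', 'comparte']
print(is_cta("La votación fue ayer."))  # False
```

Plain lexical matching like this tends to over- and under-generate, which is why linguistically informed heuristics over annotated text are needed to reach usable precision and recall.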