We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which given a claim and a set of retrieved evidence jointly learns to identify: (i) the relevant evidences to the given claim (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way — the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities. In this way, the individual contributions of evidences towards the final predicted probability can be identified. In per-evidence relevance probability, our model can further distinguish whether each relevant evidence is supporting (S) or refuting (R) the claim. This allows to quantify how much the S/R probability contributes to final verdict or to detect disagreeing evidence. Despite its interpretable nature, our system achieves results competetive with state-of-the-art on the FEVER dataset, as compared to typical two-stage system pipelines, while using significantly fewer parameters. Furthermore, our analysis shows that our model can learn fine-grained relevance cues while using coarse-grained supervision and we demonstrate it in 2 ways. (i) We show that our model can achieve competitive sentence recall while using only paragraph-level relevance supervision. (ii) Traversing towards the finest granularity of relevance, we show that our model is capable of identifying relevance at the token level. To do this, we present a new benchmark TLR-FEVER focusing on token-level interpretability — humans annotate tokens in relevant evidences they considered essential when making their judgment. Then we measure how similar are these annotations to the tokens our model is focusing on.
This paper presents our proposed method for SemEval-2023 Task 12, which focuses on sentiment analysis for low-resource African languages. Our method utilizes a language-centric domain adaptation approach which is based on adversarial training, where a small version of Afro-XLM-Roberta serves as a generator model and a feed-forward network as a discriminator. We participated in all three subtasks: monolingual (12 tracks), multilingual (1 track), and zero-shot (2 tracks). Our results show an improvement in weighted F1 for 13 out of 15 tracks with a maximum increase of 4.3 points for Moroccan Arabic compared to the baseline. We observed that using language family-based labels along with sequence-level input representations for the discriminator model improves the quality of the cross-lingual sentiment analysis for the languages unseen during the training. Additionally, our experimental results suggest that training the system on languages that are close in a language families tree enhances the quality of sentiment analysis for low-resource languages. Lastly, the computational complexity of the prediction step was kept at the same level which makes the approach to be interesting from a practical perspective. The code of the approach can be found in our repository.
In this paper, we describe our participation in the subtask 1 of CASE-2022, Event Causality Identification with Casual News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a few annotated examples (i.e., a few-shot configuration).We follow a prompt-based prediction approach for fine-tuning LMs in which the CRI task is treated as a masked language modeling problem (MLM). This approach allows LMs natively pre-trained on MLM tasks to directly generate textual responses to CRI-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was fine-tuned with only 256 instances per class, 15.7% of the all available data, and yet obtained the second-best precision (0.82), third-best accuracy (0.82), and an F1-score (0.85) very close to what was reported by the winner team (0.86).
In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Casual News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in the sentence from news-media. We detect cause-effect-signal spans in a sentence using T5 — a pre-trained autoregressive language model. We iteratively identify all cause-effect-signal span triplets, always conditioning the prediction of the next triplet on the previously predicted ones. To predict the triplet itself, we consider different causal relationships such as cause→effect→signal. Each triplet component is generated via a language model conditioned on the sentence, the previous parts of the current triplet, and previously predicted triplets. Despite training on an extremely small dataset of 160 samples, our approach achieved competitive performance, being placed second in the competition. Furthermore, we show that assuming either cause→effect or effect→cause order achieves similar results.
This work presents a novel four-stage open-domain QA pipeline R2-D2 (Rank twice, reaD twice). The pipeline is composed of a retriever, passage reranker, extractive reader, generative reader and a mechanism that aggregates the final prediction from all system’s components. We demonstrate its strength across three open-domain QA datasets: NaturalQuestions, TriviaQA and EfficientQA, surpassing state-of-the-art on the first two. Our analysis demonstrates that: (i) combining extractive and generative reader yields absolute improvements up to 5 exact match and it is at least twice as effective as the posterior averaging ensemble of the same models with different parameters, (ii) the extractive reader with fewer parameters can match the performance of the generative reader on extractive QA datasets.
This work demonstrates that using the objective with independence assumption for modelling the span probability P (a_s , a_e ) = P (a_s )P (a_e) of span starting at position a_s and ending at position a_e has adverse effects. Therefore we propose multiple approaches to modelling joint probability P (a_s , a_e) directly. Among those, we propose a compound objective, composed from the joint probability while still keeping the objective with independence assumption as an auxiliary objective. We find that the compound objective is consistently superior or equal to other assumptions in exact match. Additionally, we identified common errors caused by the assumption of independence and manually checked the counterpart predictions, demonstrating the impact of the compound objective on the real examples. Our findings are supported via experiments with three extractive QA models (BIDAF, BERT, ALBERT) over six datasets and our code, individual results and manual analysis are available online.
The OCCAM project (Optical Character recognition, ClassificAtion & Machine Translation) aims at integrating the CEF (Connecting Europe Facility) Automated Translation service with image classification, Translation Memories (TMs), Optical Character Recognition (OCR), and Machine Translation (MT). It will support the automated translation of scanned business documents (a document format that, currently, cannot be processed by the CEF eTranslation service) and will also lead to a tool useful for the Digital Humanities domain.
We participated in all three subtasks. In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation. We experimented with solving the task for another language, Czech, by means of multilingual models and machine translated dataset, or translated model inputs. We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss. In subtask C, our submission, which is based on pretrained sequence-to-sequence model (BART), ranked 1st in BLEU score ranking, however, we show that the correlation between BLEU and human evaluation, in which our submission ended up 4th, is low. We analyse the metrics used in the evaluation and we propose an additional score based on model from subtask B, which correlates well with our manual ranking, as well as reranking method based on the same principle. We performed an error and dataset analysis for all subtasks and we present our findings.
This paper describes BUT-FIT’s submission at SemEval-2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals. The challenge focused on detecting whether a given statement contains a counterfactual (Subtask 1) and extracting both antecedent and consequent parts of the counterfactual from the text (Subtask 2). We experimented with various state-of-the-art language representation models (LRMs). We found RoBERTa LRM to perform the best in both subtasks. We achieved the first place in both exact match and F1 for Subtask 2 and ranked second for Subtask 1.
This paper describes our system that was designed for Humor evaluation within the SemEval-2020 Task 7. The system is based on convolutional neural network architecture. We investigate the system on the official dataset, and we provide more insight to model itself to see how the learned inner features look.
This paper describes our system submitted to SemEval 2019 Task 7: RumourEval 2019: Determining Rumour Veracity and Support for Rumours, Subtask A (Gorrell et al., 2019). The challenge focused on classifying whether posts from Twitter and Reddit support, deny, query, or comment a hidden rumour, truthfulness of which is the topic of an underlying discussion thread. We formulate the problem as a stance classification, determining the rumour stance of a post with respect to the previous thread post and the source thread post. The recent BERT architecture was employed to build an end-to-end system which has reached the F1 score of 61.67 % on the provided test data. Without any hand-crafted feature, the system finished at the 2nd place in the competition, only 0.2 % behind the winner.
This paper introduces the Web TextFull linkage to Linked Open Data (WTF-LOD) dataset intended for large-scale evaluation of named entity recognition (NER) systems. First, we present the process of collecting data from the largest publically-available textual corpora, including Wikipedia dumps, monthly runs of the CommonCrawl, and ClueWeb09/12. We discuss similarities and differences of related initiatives such as WikiLinks and WikiReverse. Our work primarily focuses on links from “textfull” documents (links surrounded by a text that provides a useful context for entity linking), de-duplication of the data and advanced cleaning procedures. Presented statistics demonstrate that the collected data forms one of the largest available resource of its kind. They also prove suitability of the result for complex NER evaluation campaigns, including an analysis of the most ambiguous name mentions appearing in the data.
This paper deals with information retrieval on semantically enriched web-scale document collections. It particularly focuses on web-crawled content in which mentions of entities appearing in Freebase, DBpedia and other Linked Open Data resources have been identified. A special attention is paid to indexing structures and advanced query mechanisms that have been employed into a new semantic retrieval system. Scalability features are discussed together with performance statistics and results of experimental evaluation of presented approaches. Examples given to demonstrate key features of the developed solution correspond to the cultural heritage domain in which the results of our work have been primarily applied.
Sense-tagged corpora are used to evaluate word sense disambiguation (WSD) systems. Manual creation of such resources is often prohibitively expensive. That is why the concept of pseudowords - conflations of two or more unambiguous words - has been integrated into WSD evaluation experiments. This paper presents a new method of pseudoword generation which takes into account semantic-relatedness of the candidate words forming parts of the pseudowords to the particular senses of the word to be disambiguated. We compare the new approach to its alternatives and show that the results on pseudowords, that are more similar to real ambiguous words, better correspond to the actual results. Two techniques assessing the similarity are studied - the first one takes advantage of manually created dictionaries (wordnets), the second one builds on the automatically computed statistical data obtained from large corpora. Pros and cons of the two techniques are discussed and the results on a standard task are demonstrated.
This paper introduces a new architecture that aims at combining molecular biology data with information automatically extracted from relevant scientific literature (using text mining techniques on PubMed abstracts and fulltext papers) to help biomedical experts to interpret experimental results in hand. The infrastructural level bears on semantic-web technologies and standards that facilitate the actual fusion of the multi-source knowledge.
The present paper describes psycholinguistic experiments aimed at exploring the way people behave while accessing electronic dictionaries. In our work we focused on the access by meaning that, in comparison with the access by form, is currently less studied and very seldom implemented in modern dictionary interfaces. Thus, the goal of our experiments was to explore dictionary users requirements and to study what services an intelligent dictionary interface should be able to supply to help solving access by meaning problems. We tested several access-supporting enhancements of electronic dictionaries based on various language resources (corpora, wordnets, word association norms and explanatory dictionaries). Experiments were carried out with native speakers of three European languages English, Czech and Russian. Results for monolingual and bilingual cases are presented.
Current Semantic Web implementation efforts pose a number of challenges. One of the big ones among them is development and evolution of specific resources --- the ontologies --- as a base for representation of the meaning of the web. This paper deals with the automatic acquisition of semantic relations from the text of scientific publications (journal articles, conference papers, project descriptions, etc.). We also describe the process of building of corresponding ontological resources and their application for semi--automatic generation of scientific portals. Extracted relations and ontologies are crucial for the structuring of the information at the portal pages, automatic classification of the presented documents as well as for personalisation at the presentation level. Besides a general description of the portal generating system, we give also a detailed overview of extraction of semantic relations in the form of a domain--specific ontology. The overview consists of presentation of an architecture of the ontology extraction system, description of methods used for mining of semantic relations and analysis of selected results and examples.
This paper examines the use of parallel and comparable corpora for automatic acquisition of semantics-extraction patterns. It presents a new method of the pattern extraction which takes advantage of parallel texts to "port" text mining solutions from a source language to a target language. It is shown thatthe technique can help in situations when the extraction procedure is to beapplied in a language (languages) with a limited set of available resources,e.g. domain-specific thesauri. The primary motivation of our work lies in a particular multilingual e-learning system. For testing purposes, other applications of the given approach were implemented. They include pattern extraction from general texts (tested on wordnet relations), acquisition of domain-specific patterns from large parallel corpus of legal EU documents, and mining of subjectivity expressions for multilingual opinion extraction system.
The paper introduces PACE — a parser comparison and evaluation system for the syntactic processing of natural languages. The analysis is based on context free grammar with contextual extensions (constraints). The system is able to manage very large and extremely ambiguous CF grammars. It is independent of the parsing algorithm used. The tool can solve the contextual constraints on the resulting CF structure, select the best parsing trees according to their probabilities, or combine them. We discuss the advantages and disadvantages of our modular design as well as how efficiently it processes the standard evaluation grammars.