2024
Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)
Antonio Valerio Miceli-Barone | Fazl Barez | Shay Cohen | Elena Voita | Ulrich Germann | Michal Lukasik
Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)
2023
Large Language Models with Controllable Working Memory
Daliang Li | Ankit Singh Rawat | Manzil Zaheer | Xin Wang | Michal Lukasik | Andreas Veit | Felix Yu | Sanjiv Kumar
Findings of the Association for Computational Linguistics: ACL 2023
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), partly owing to the massive amounts of world knowledge they memorize during pretraining. While many downstream applications provide the model with an informational context to aid its underlying task, how the model’s world knowledge interacts with the factual information presented in the context remains underexplored. As a desirable behavior, an LLM should give precedence to the context whenever it contains task-relevant information that conflicts with the model’s memorized knowledge. This enables model predictions to be grounded in the context, which in turn makes it possible to update specific model predictions without frequently retraining the model. By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge. In this paper, we undertake a first joint study of these two properties, namely controllability and robustness, in the context of LLMs. We demonstrate that state-of-the-art T5 and PaLM models (both pretrained and finetuned) could exhibit low controllability and robustness, which do not improve with increasing model size. As a solution, we propose a simple yet effective method – knowledge-aware finetuning (KAFT) – to strengthen both controllability and robustness by injecting counterfactual and irrelevant contexts into standard supervised datasets. Our comprehensive evaluation showcases the utility of KAFT across model architectures and sizes.
2020
Text Segmentation by Cross Segment Attention
Michal Lukasik | Boris Dadachev | Kishore Papineni | Gonçalo Simões
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization. In this work, we propose three transformer-based architectures and provide comprehensive comparisons with previously proposed approaches on three standard datasets. We establish a new state of the art, in particular reducing the error rates by a large margin in all cases. We further analyze model sizes and find that we can build models with far fewer parameters while keeping good performance, thus facilitating real-world applications.
Semantic Label Smoothing for Sequence to Sequence Problems
Michal Lukasik | Himanshu Jain | Aditya Menon | Seungyeon Kim | Srinadh Bhojanapalli | Felix Yu | Sanjiv Kumar
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label de-noising. However, extending such methods directly to seq2seq settings, such as Machine Translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approaches for seq2seq settings either do token-level smoothing or smooth over sequences generated by randomly substituting tokens in the target sequence. Unlike these works, in this paper we propose a technique that smooths over well-formed relevant sequences that not only have sufficient n-gram overlap with the target sequence but are also semantically similar. Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.
2018
Content Explorer: Recommending Novel Entities for a Document Writer
Michal Lukasik | Richard Zens
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Background research is an essential part of document writing. Search engines are great for retrieving information once we know what to look for. However, the bigger challenge is often identifying topics for further research. Automated tools could help significantly in this discovery process and increase the productivity of the writer. In this paper, we formulate the problem of recommending topics to a writer. We consider this as a supervised learning problem and run a user study to validate this approach. We propose an evaluation metric and perform an empirical comparison of state-of-the-art models for extreme multi-label classification on a large data set. We demonstrate how a simple modification of the cross-entropy loss function leads to improved results of the deep learning models.
2016
Stance Classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations
Arkaitz Zubiaga | Elena Kochkina | Maria Liakata | Rob Procter | Michal Lukasik
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by harvesting users’ replies to one another, which results in a nested tree-like structure. Previous work addressing the stance classification task has treated each tweet as a separate unit. Here we analyse tweets by virtue of their position in a sequence and test two sequential classifiers, Linear-Chain CRF and Tree CRF, each of which makes different assumptions about the conversational structure. We experiment with eight Twitter datasets, collected during breaking news, and show that exploiting the sequential structure of Twitter conversations achieves significant improvements over the non-sequential methods. Our work is the first to model Twitter conversations as a tree structure in this manner, introducing a novel way of tackling NLP tasks on Twitter conversations.
Hawkes Processes for Continuous Time Sequence Classification: an Application to Rumour Stance Classification in Twitter
Michal Lukasik | P. K. Srijith | Duy Vu | Kalina Bontcheva | Arkaitz Zubiaga | Trevor Cohn
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Metrics for Evaluation of Word-level Machine Translation Quality Estimation
Varvara Logacheva | Michal Lukasik | Lucia Specia
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2015
Modeling Tweet Arrival Times using Log-Gaussian Cox Processes
Michal Lukasik | P. K. Srijith | Trevor Cohn | Kalina Bontcheva
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Classifying Tweet Level Judgements of Rumours in Social Media
Michal Lukasik | Trevor Cohn | Kalina Bontcheva
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Point Process Modelling of Rumour Dynamics in Social Media
Michal Lukasik | Trevor Cohn | Kalina Bontcheva
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)