Max Ryabinin


2024

Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Anton Voronov | Lena Wolf | Max Ryabinin
Findings of the Association for Computational Linguistics ACL 2024

Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning. In this work, we conduct a comprehensive study of the template format’s influence on the in-context learning performance. We evaluate the impact of the prompt template across 21 models (from 770M to 70B parameters) and 4 standard classification datasets. We show that a poor choice of the template can reduce the performance of the strongest models and inference methods to a random guess level. More importantly, the best templates do not transfer between different setups and even between models of the same family. Our findings show that the currently prevalent approach to evaluation, which ignores template selection, may give misleading results due to different templates in different works. As a first step towards mitigating this issue, we propose Template Ensembles that aggregate model predictions across several templates. This simple test-time augmentation boosts average performance while being robust to the choice of random set of templates.
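
The idea of aggregating predictions across several templates can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule, so averaging per-class probabilities is an assumption here, and `ensemble_predict`, `toy_score_fn`, and the template strings are hypothetical names and values chosen for illustration rather than the authors' implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical prompt templates; "{input}" marks where the example text goes.
TEMPLATES: List[str] = [
    "Review: {input}\nSentiment:",
    "Input: {input}\nLabel:",
    "Text: {input}\nClass:",
]


def ensemble_predict(
    text: str,
    templates: Sequence[str],
    score_fn: Callable[[str], Sequence[float]],
) -> int:
    """Average class probabilities over templates and return the argmax class.

    Averaging is an assumed aggregation rule, not one taken from the paper.
    """
    if not templates:
        raise ValueError("at least one template is required")
    totals: List[float] = []
    for template in templates:
        probs = score_fn(template.format(input=text))
        if not totals:
            totals = [0.0] * len(probs)
        for i, p in enumerate(probs):
            totals[i] += p
    averaged = [t / len(templates) for t in totals]
    return max(range(len(averaged)), key=averaged.__getitem__)


def toy_score_fn(prompt: str) -> Sequence[float]:
    """Stand-in for a model call; returns made-up probabilities for two classes."""
    if prompt.startswith("Review:"):
        return [0.9, 0.1]
    return [0.4, 0.6]


if __name__ == "__main__":
    print(ensemble_predict("great movie", TEMPLATES, toy_score_fn))
```

In practice `score_fn` would wrap a language model that returns one probability per class label for the formatted prompt; the toy function only shows how a single confident template can shift the averaged decision.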

2022

RuCoLA: Russian Corpus of Linguistic Acceptability
Vladislav Mikhailov | Tatiana Shamardina | Max Ryabinin | Alena Pestova | Ivan Smurov | Ekaterina Artemova
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language models and filtering implausible texts with acceptability classifiers. However, the application scope of LA in languages other than English is limited due to the lack of high-quality resources. To this end, we introduce the Russian Corpus of Linguistic Acceptability (RuCoLA), built from the ground up under the well-established binary LA approach. RuCoLA consists of 9.8k in-domain sentences from linguistic publications and 3.6k out-of-domain sentences produced by generative models. The out-of-domain set is created to facilitate the practical use of acceptability for improving language generation. Our paper describes the data collection protocol and presents a fine-grained analysis of acceptability classification experiments with a range of baseline approaches. In particular, we demonstrate that the most widely used language models still fall behind humans by a large margin, especially when detecting morphological and semantic errors. We release RuCoLA, the code of experiments, and a public leaderboard to assess the linguistic competence of language models for Russian.

2021

It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Alexey Tikhonov | Max Ryabinin
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Embedding Words in Non-Vector Space with Unsupervised Graph Learning
Max Ryabinin | Sergei Popov | Liudmila Prokhorenkova | Elena Voita
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our setting, each word is a node in a weighted graph and the distance between words is the shortest path distance between the corresponding nodes. We adopt a recent method learning a representation of data in the form of a differentiable weighted graph and use it to modify the GloVe training algorithm. We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks. Our analysis reveals that the structure of the learned graphs is hierarchical and similar to that of WordNet, the geometry is highly non-trivial and contains subgraphs with different local topology.
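
The distance described above, where each word is a node in a weighted graph and the distance between two words is the shortest path between their nodes, can be sketched as follows. This covers only the distance computation on a fixed graph using Dijkstra's algorithm; it does not reproduce how GraphGlove learns the graph, and the toy graph, its weights, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_undirected_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (word, word, weight) triples."""
    graph: Graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def shortest_path_distance(graph: Graph, source: str, target: str) -> float:
    """Dijkstra's algorithm; returns infinity if target is unreachable."""
    best = {source: 0.0}
    queue: List[Tuple[float, str]] = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")


# Hypothetical toy graph with made-up edge weights.
TOY_GRAPH = build_undirected_graph([
    ("cat", "animal", 1.0),
    ("dog", "animal", 1.0),
    ("cat", "dog", 2.5),
    ("animal", "car", 4.0),
])

if __name__ == "__main__":
    print(shortest_path_distance(TOY_GRAPH, "cat", "dog"))
```

In this toy graph the path from "cat" to "dog" through "animal" is shorter than the direct edge, which shows how a shared hub node can pull related words together.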