Hiroaki Yamagiwa


2025

pdf bib
Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings
Hiroaki Yamagiwa | Momose Oyama | Hidetoshi Shimodaira
Proceedings of the 31st International Conference on Computational Linguistics

Cosine similarity is widely used to measure the similarity between two embeddings, while interpretations based on angle and correlation coefficient are common. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA), and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-transformed embeddings exhibit sparsity, enhancing the interpretability of each axis, and the semantic similarity defined by the product of the components represents the shared meaning between the two embeddings along each axis. The effectiveness of this approach is demonstrated through intuitive numerical examples and thorough numerical experiments. By deriving the probability distributions that govern each component and the product of components, we propose a method for selecting statistically significant axes.

pdf bib
Norm of Mean Contextualized Embeddings Determines their Variance
Hiroaki Yamagiwa | Hidetoshi Shimodaira
Proceedings of the 31st International Conference on Computational Linguistics

Contextualized embeddings vary by context, even for the same token, and form a distribution in the embedding space. To analyze this distribution, we focus on the norm of the mean embedding and the variance of the embeddings. In this study, we first demonstrate that these values follow the well-known formula for variance in statistics and provide an efficient sequential computation method. Then, by observing embeddings from intermediate layers of several Transformer models, we found a strong trade-off relationship between the norm and the variance: as the mean embedding becomes closer to the origin, the variance increases. Furthermore, when the sets of token embeddings are treated as clusters, we show that the variance of the entire embedding set can theoretically be decomposed into the within-cluster variance and the between-cluster variance. We found experimentally that as the layers of Transformer models deepen, the embeddings move farther from the origin, the between-cluster variance relatively decreases, and the within-cluster variance relatively increases. These results are consistent with existing studies on the anisotropy of the embedding spaces across layers.

2024

pdf bib
Shimo Lab at “Discharge Me!”: Discharge Summarization by Prompt-Driven Concatenation of Electronic Health Record Sections
Yunzhen He | Hiroaki Yamagiwa | Hidetoshi Shimodaira
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

In this paper, we present our approach to the shared task “Discharge Me!” at the BioNLP Workshop 2024. The primary goal of this task is to reduce the time and effort clinicians spend on writing detailed notes in the electronic health record (EHR). Participants develop a pipeline to generate the “Brief Hospital Course” and “Discharge Instructions” sections from the EHR. Our approach involves a first step of extracting the relevant sections from the EHR. We then add explanatory prompts to these sections and concatenate them with separate tokens to create the input text. To train a text generation model, we perform LoRA fine-tuning on the ClinicalT5-large model. On the final test data, our approach achieved a ROUGE-1 of 0.394, which is comparable to the top solutions.

pdf bib
Understanding Higher-Order Correlations Among Semantic Components in Embeddings
Momose Oyama | Hiroaki Yamagiwa | Hidetoshi Shimodaira
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Independent Component Analysis (ICA) offers interpretable semantic components of embeddings.While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.

pdf bib
Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings
Hiroaki Yamagiwa | Yusuke Takase | Hidetoshi Shimodaira
Findings of the Association for Computational Linguistics: EMNLP 2024

Word embedding is one of the most important components in natural language processing, but interpreting high-dimensional embeddings remains a challenging problem. To address this problem, Independent Component Analysis (ICA) is identified as an effective solution. ICA-transformed word embeddings reveal interpretable semantic axes; however, the order of these axes are arbitrary. In this study, we focus on this property and propose a novel method, Axis Tour, which optimizes the order of the axes. Inspired by Word Tour, a one-dimensional word embedding method, we aim to improve the clarity of the word embedding space by maximizing the semantic continuity of the axes. Furthermore, we show through experiments on downstream tasks that Axis Tour yields better or comparable low-dimensional embeddings compared to both PCA and ICA.

2023

pdf bib
Discovering Universal Geometry in Embeddings with ICA
Hiroaki Yamagiwa | Momose Oyama | Hidetoshi Shimodaira
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.

pdf bib
Improving word mover’s distance by leveraging self-attention matrix
Hiroaki Yamagiwa | Sho Yokoi | Hidetoshi Shimodaira
Findings of the Association for Computational Linguistics: EMNLP 2023

Measuring the semantic similarity between two sentences is still an important task. The word mover’s distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT’s self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences. Experiments demonstrate the proposed method enhances WMD and its variants in paraphrase identification with near-equivalent performance in semantic textual similarity.