Lun-Wei Ku


2024

pdf bib
Enhancing Perception: Refining Explanations of News Claims with LLM Conversations
Yi-Li Hsu | Jui-Ning Chen | Yang Fan Chiang | Shang-Chien Liu | Aiping Xiong | Lun-Wei Ku
Findings of the Association for Computational Linguistics: NAACL 2024

We introduce Enhancing Perception, a framework for Large Language Models (LLMs) designed to streamline the time-intensive task typically undertaken by professional fact-checkers of crafting explanations for fake news. This study investigates the effectiveness of enhancing LLM explanations through conversational refinement. We compare various questioner agents, including state-of-the-art LLMs like GPT-4, Claude 2, PaLM 2, and 193 American participants acting as human questioners. Based on the histories of these refinement conversations, we further generate comprehensive summary explanations. We evaluated the effectiveness of these initial, refined, and summary explanations across 40 news claims by involving 2,797 American participants, measuring their self-reported belief change regarding both real and fake claims after receiving the explanations. Our findings reveal that, in the context of fake news, explanations that have undergone conversational refinement—whether by GPT-4 or human questioners, who ask more diverse and detail-oriented questions—were significantly more effective than both the initial unrefined explanations and the summary explanations. Moreover, these refined explanations achieved a level of effectiveness comparable to that of expert-written explanations. The results highlight the potential of automatic explanation refinement by LLMs in debunking fake news claims.

pdf bib
Findings of the Association for Computational Linguistics ACL 2024
Lun-Wei Ku | Andre Martins | Vivek Srikumar
Findings of the Association for Computational Linguistics ACL 2024

pdf bib
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Lun-Wei Ku | Andre Martins | Vivek Srikumar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Lun-Wei Ku | Andre Martins | Vivek Srikumar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2023

pdf bib
Label-Aware Hyperbolic Embeddings for Fine-grained Emotion Classification
Chih Yao Chen | Tun Min Hung | Yi-Li Hsu | Lun-Wei Ku
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Fine-grained emotion classification (FEC) is a challenging task. Specifically, FEC needs to handle subtle nuances between labels, which can be complex and confusing. Most existing models address the text classification problem only in Euclidean space, which we believe may not be the optimal solution, as labels with close semantics (e.g., afraid and terrified) may not be well differentiated in such a space, which harms performance. In this paper, we propose HypEmo, a novel framework that integrates hyperbolic embeddings to improve the FEC task. First, we learn label embeddings in hyperbolic space to better capture their hierarchical structure, and then our model projects contextualized representations to the hyperbolic space to compute the distance between samples and labels. Experimental results show that incorporating such distance to weight the cross-entropy loss substantially improves performance on two benchmark datasets, with around 3% improvement over the previous state of the art, and up to 8.6% when the labels are hard to distinguish. Code is available at https://github.com/dinobby/HypEmo.
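As a minimal sketch of the distance-weighted objective described above (our own illustration with toy dimensions, not the released HypEmo code): contextual features are projected onto the Poincaré ball, the hyperbolic distance to the gold-label embedding is computed, and that distance weights the per-example cross-entropy.

```python
# Toy sketch: hyperbolic-distance-weighted cross entropy (assumed scheme, not the authors' code).
import torch
import torch.nn.functional as F

def poincare_distance(u, v, eps=1e-5):
    """Standard distance on the Poincare ball (points assumed to have norm < 1)."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    nu = torch.clamp(1 - torch.sum(u * u, dim=-1), min=eps)
    nv = torch.clamp(1 - torch.sum(v * v, dim=-1), min=eps)
    return torch.acosh(1 + 2 * sq / (nu * nv))

def exp_map_zero(x):
    """Project Euclidean vectors onto the Poincare ball via the exponential map at the origin."""
    norm = torch.clamp(x.norm(dim=-1, keepdim=True), min=1e-5)
    return torch.tanh(norm) * x / norm

def hyperbolic_weighted_ce(logits, features, label_embs, targets):
    """Cross entropy weighted by the sample-to-gold-label hyperbolic distance."""
    z = exp_map_zero(features)                  # (batch, d) on the ball
    gold = exp_map_zero(label_embs[targets])    # gold-label embeddings on the ball
    w = poincare_distance(z, gold)              # larger distance -> larger penalty
    ce = F.cross_entropy(logits, targets, reduction="none")
    return (w * ce).mean()

# toy usage
logits = torch.randn(4, 8)
features = torch.randn(4, 16)
label_embs = torch.randn(8, 16) * 0.1
targets = torch.tensor([0, 3, 5, 7])
loss = hyperbolic_weighted_ce(logits, features, label_embs, targets)
```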

pdf bib
HonestBait: Forward References for Attractive but Faithful Headline Generation
Chih Yao Chen | Dennis Wu | Lun-Wei Ku
Findings of the Association for Computational Linguistics: ACL 2023

Current methods for generating attractive headlines often learn directly from data, which bases attractiveness on the number of user clicks and views. Although clicks or views do reflect user interest, they can fail to reveal how much interest is raised by the writing style and how much is due to the event or topic itself. Also, such approaches can lead to harmful inventions by over-exaggerating the content, aggravating the spread of false information. In this work, we propose HonestBait, a novel framework for solving these issues from another aspect: generating headlines using forward references (FRs), a writing technique often used for clickbait. A self-verification process is included during training to avoid spurious inventions. We begin with a preliminary user study to understand how FRs affect user interest, after which we present PANCO, an innovative dataset containing pairs of fake news with verified news for attractive but faithful news headline generation. Automatic metrics and human evaluations show that our framework yields more attractive results (+11.25% compared to human-written verified news headlines) while maintaining high veracity, which helps promote real information to fight against fake news.

pdf bib
Is Explanation the Cure? Misinformation Mitigation in the Short Term and Long Term
Yi-Li Hsu | Shih-Chieh Dai | Aiping Xiong | Lun-Wei Ku
Findings of the Association for Computational Linguistics: EMNLP 2023

With advancements in natural language processing (NLP) models, automatic explanation generation has been proposed to mitigate misinformation on social media platforms in addition to adding warning labels to identified fake news. While many researchers have focused on generating good explanations, how these explanations can really help humans combat fake news is under-explored. In this study, we compare the effectiveness of a warning label and state-of-the-art counterfactual explanations generated by GPT-4 in debunking misinformation. In a two-wave, online human-subject study, participants (N = 215) were randomly assigned to a control group in which false content was shown without any intervention, a warning tag group in which the false claims were labeled, or an explanation group in which the false content was accompanied by GPT-4-generated explanations. Our results show that both interventions significantly decrease participants’ self-reported belief in fake claims in an equivalent manner in both the short term and the long term. We discuss the implications of our findings and directions for future NLP-based misinformation debunking strategies.

pdf bib
LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis
Shih-Chieh Dai | Aiping Xiong | Lun-Wei Ku
Findings of the Association for Computational Linguistics: EMNLP 2023

Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields. To ensure reliable analysis, the same piece of data is typically assigned to at least two human coders. Moreover, to produce meaningful and useful analysis, human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming. Recently, the emerging field of large language model (LLM) research has shown that LLMs have the potential to replicate human-like behavior in various tasks; in particular, LLMs outperform crowd workers on text-annotation tasks, suggesting an opportunity to leverage LLMs for TA. We propose a human–LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL). This framework provides the prompt to frame discussions with an LLM (e.g., GPT-3.5) to generate the final codebook for TA. We demonstrate the utility of this framework using survey datasets on the music listening experience and the usage of a password manager. The results of the two case studies show that the proposed framework yields coding quality similar to that of human coders but reduces TA’s labor and time demands.

pdf bib
Location-Aware Visual Question Generation with Lightweight Models
Nicholas Suwono | Justin Chen | Tun Hung | Ting-Hao Huang | I-Bin Liao | Yung-Hui Li | Lun-Wei Ku | Shao-Hua Sun
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. Then, we aim to learn a lightweight model that can address the LocaVQG task and fit on an edge device, such as a mobile phone. To this end, we propose a method which can reliably generate engaging questions from location-aware information. Our proposed method outperforms baselines regarding human evaluation (e.g., engagement, grounding, coherence) and automatic evaluation metrics (e.g., BERTScore, ROUGE-2). Moreover, we conduct extensive ablation studies to justify our proposed techniques for both generating the dataset and solving the task.

pdf bib
Proceedings of the 11th International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Cheng-Te Li
Proceedings of the 11th International Workshop on Natural Language Processing for Social Media

pdf bib
Proceedings of the Second Workshop on Natural Language Interfaces
Kehai Chen | Lun-Wei Ku
Proceedings of the Second Workshop on Natural Language Interfaces

2022

pdf bib
Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Cheng-Te Li | Yu-Che Tsai | Wei-Yao Wang
Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media

pdf bib
Learning to Rank Visual Stories From Human Ranking Data
Chi-Yang Hsu | Yun-Wei Chu | Vincent Chen | Kuan-Chieh Lo | Chacha Chen | Ting-Hao Huang | Lun-Wei Ku
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Visual storytelling (VIST) is a typical vision-and-language task that has seen extensive development in the natural language generation research domain. However, it remains unclear whether conventional automatic evaluation metrics for text generation are applicable to VIST. In this paper, we present the VHED (VIST Human Evaluation Data) dataset, which re-purposes human evaluation results for automatic evaluation; on this basis we develop Vrank (VIST Ranker), a novel reference-free VIST metric for story evaluation. We first show that the results from commonly adopted automatic metrics for text generation have little correlation with those obtained from human evaluation, which motivates us to directly utilize human evaluation results to learn the automatic evaluation model. In the experiments, we evaluate the generated texts to predict story ranks using our model as well as other reference-based and reference-free metrics. Results show that Vrank’s predictions are significantly more aligned with human evaluation than those of other metrics, with almost 30% higher accuracy when ranking story pairs. Moreover, we demonstrate that only Vrank shows human-like behavior in its strong ability to find better stories when the quality gap between two stories is high. Finally, we show the superiority of Vrank through its generalizability to purely textual stories, and conclude that this reuse of human evaluation results puts Vrank in a strong position for continued future advances.
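A minimal sketch of the learning-to-rank idea described above (toy scorer and random embeddings of our own, not the authors’ Vrank implementation): a scorer is trained on human-judged story pairs with a pairwise margin ranking loss so that preferred stories receive higher scores.

```python
# Toy sketch: reference-free story scorer trained from human preference pairs (assumed setup).
import torch
import torch.nn as nn

class StoryScorer(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, story_emb):           # story_emb: (batch, dim), e.g. pooled encoder output
        return self.head(story_emb).squeeze(-1)

scorer = StoryScorer()
loss_fn = nn.MarginRankingLoss(margin=0.5)

# better_emb / worse_emb stand in for embeddings of story pairs with a human-judged preference
better_emb, worse_emb = torch.randn(8, 768), torch.randn(8, 768)
target = torch.ones(8)                      # +1 means the first argument should score higher
loss = loss_fn(scorer(better_emb), scorer(worse_emb), target)
loss.backward()
```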

pdf bib
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh | Vincent Chen | Ting-Hao Huang | Lun-Wei Ku
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals’ willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, resulting in a limited ability to comprehend time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, which confirms our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for visual-and-language models to implicitly construct a story behind a series of photos to allow for creativity and experience sharing and hence draw attention to downstream applications.

2021

pdf bib
Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter
Boaz Shmueli | Soumya Ray | Lun-Wei Ku
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

pdf bib
Stretch-VST: Getting Flexible With Visual Stories
Chi-yang Hsu | Yun-Wei Chu | Tsai-Lun Yang | Ting-Hao Huang | Lun-Wei Ku
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

In visual storytelling, a short story is generated based on a given image sequence. Despite years of work, most visual storytelling models remain limited by the fixed length of the generated stories: most models produce stories with exactly five sentences because five-sentence stories dominate the training data. These fixed-length stories carry limited detail and provide ambiguous textual information to readers. Therefore, we propose to “stretch” the stories, which creates the potential to present in-depth visual details. This paper presents Stretch-VST, a visual storytelling framework that enables the generation of prolonged stories by adding appropriate knowledge, selected by the proposed scoring function. We propose a length-controlled Transformer to generate long stories; this model introduces novel positional encoding methods to maintain story quality with lengthy inputs. Experiments confirm that long stories are generated without deteriorating quality, and a human evaluation further shows that Stretch-VST provides better focus and detail than the state of the art when stories are prolonged. We create a webpage to demonstrate our prolonged-story generation.

pdf bib
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing
Boaz Shmueli | Jan Fell | Soumya Ray | Lun-Wei Ku
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The use of crowdworkers in NLP research is growing rapidly, in tandem with the exponential increase in research production in machine learning and AI. Ethical discussion regarding the use of crowdworkers within the NLP research community is typically confined in scope to issues related to labor conditions such as fair pay. We draw attention to the lack of ethical considerations related to the various tasks performed by workers, including labeling, evaluation, and production. We find that the Final Rule, the common ethical framework used by researchers, did not anticipate the use of online crowdsourcing platforms for data collection, resulting in gaps between the spirit and practice of human-subjects ethics in NLP research. We enumerate common scenarios where crowdworkers performing NLP tasks are at risk of harm. We thus recommend that researchers evaluate these risks by considering the three ethical principles set up by the Belmont Report. We also clarify some common misconceptions regarding the Institutional Review Board (IRB) application. We hope this paper will serve to reopen the discussion within our community regarding the ethical use of crowdworkers.

pdf bib
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
Lun-Wei Ku | Vivi Nastase | Ivan Vulić
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

pdf bib
Plot and Rework: Modeling Storylines for Visual Storytelling
Chi-yang Hsu | Yun-Wei Chu | Ting-Hao Huang | Lun-Wei Ku
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Lying Through One’s Teeth: A Study on Verbal Leakage Cues
Min-Hsuan Yeh | Lun-Wei Ku
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Although many studies use the LIWC lexicon to show the existence of verbal leakage cues in lie detection datasets, none mention how verbal leakage cues are influenced by means of data collection, or the impact thereof on the performance of models. In this paper, we study verbal leakage cues to understand the effect of the data construction method on their significance, and examine the relationship between such cues and models’ validity. The LIWC word-category dominance scores of seven lie detection datasets are used to show that audio statements and lie-based annotations indicate a greater number of strong verbal leakage cue categories. Moreover, we evaluate the validity of state-of-the-art lie detection models with cross- and in-dataset testing. Results show that in both types of testing, models trained on a dataset with more strong verbal leakage cue categories—as opposed to only a greater number of strong cues—yield superior results, suggesting that verbal leakage cues are a key factor for selecting lie detection datasets.

pdf bib
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Cheng-Te Li
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

2020

pdf bib
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Cheng-Te Li
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media

pdf bib
Reactive Supervision: A New Method for Collecting Sarcasm Data
Boaz Shmueli | Lun-Wei Ku | Soumya Ray
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

pdf bib
Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent
Yun-Hsuan Jen | Chieh-Yang Huang | MeiHua Chen | Ting-Hao Huang | Lun-Wei Ku
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Many English-as-a-second-language learners have trouble using near-synonyms (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty adapting such scores to all near-synonyms, as near-synonyms differ in various ways. We observe that the helpfulness of learning material is reflected in learners’ performance. Thus, we propose an inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent’s performance. To enable the agent to behave like a learner, we leverage entailment modeling’s capability of inferring answers from the provided materials. Experimental results show that the proposed agent exhibits learner-like behavior and achieves the best performance on both the fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the scores of more than 17% of students after learning.
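A minimal sketch of the entailment-based agent described above (the checkpoint name, label index, and helper names are our assumptions, not the paper’s code): an off-the-shelf NLI model treats the example sentence as the premise and each candidate-filled blank as a hypothesis, and the candidate with the highest entailment probability is selected.

```python
# Toy sketch: use an NLI model as a learner-like agent for fill-in-the-blank (assumed setup).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
ENTAILMENT = 2   # index of the "entailment" label in this checkpoint (assumed)

def choose_word(example_sentence, blanked_sentence, candidates):
    """Score each candidate-filled sentence by how well the example sentence entails it."""
    scores = {}
    for word in candidates:
        hypothesis = blanked_sentence.replace("___", word)
        inputs = tokenizer(example_sentence, hypothesis, return_tensors="pt")
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)
        scores[word] = probs[0, ENTAILMENT].item()
    return max(scores, key=scores.get)

print(choose_word("She paused briefly before answering the question.",
                  "He stopped ___ to catch his breath.", ["briefly", "shortly"]))
```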

2019

pdf bib
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Rada Mihalcea | Ekaterina Shutova | Lun-Wei Ku | Kilian Evang | Soujanya Poria
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

pdf bib
UHop: An Unrestricted-Hop Relation Extraction Framework for Knowledge-Based Question Answering
Zi-Yuan Chen | Chih-Hung Chang | Yi-Pei Chen | Jijnasa Nayak | Lun-Wei Ku
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In relation extraction for knowledge-based question answering, searching from one entity to another entity via a single relation is called “one hop”. In related work, an exhaustive search from all one-hop relations, two-hop relations, and so on to the max-hop relations in the knowledge graph is necessary but expensive. Therefore, the number of hops is generally restricted to two or three. In this paper, we propose UHop, an unrestricted-hop framework which relaxes this restriction by use of a transition-based search framework to replace the relation-chain-based search one. We conduct experiments on conventional 1- and 2-hop questions as well as lengthy questions, including datasets such as WebQSP, PathQuestion, and Grid World. Results show that the proposed framework enables the ability to halt, works well with state-of-the-art models, achieves competitive performance without exhaustive searches, and opens the performance gap for long relation paths.
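A minimal sketch of the unrestricted-hop idea described above (toy scorer and toy knowledge graph of our own, not the authors’ model): at each hop the candidate outgoing relations, plus a special STOP action, are scored against the question; the best relation extends the path, and the search halts when STOP wins.

```python
# Toy sketch: transition-based relation search with a halting action (assumed interfaces).
HALT = "<STOP>"

def score(question, relation):
    """Toy lexical-overlap scorer standing in for the learned question-relation matching model."""
    if relation == HALT:
        return 0.5                                    # fixed halting threshold for the toy example
    q, r = set(question.lower().split()), set(relation.lower().split("_"))
    return len(q & r) / max(len(r), 1)

def uhop_search(question, start_entity, kg, max_hops=10):
    entity, path = start_entity, []
    for _ in range(max_hops):
        candidates = list(kg.get(entity, {})) + [HALT]
        best = max(candidates, key=lambda r: score(question, r))
        if best == HALT:
            break
        path.append(best)
        entity = kg[entity][best]                     # follow the chosen relation to the next entity
    return path, entity

# toy knowledge graph: entity -> {relation: next_entity}
kg = {"Obama": {"born_in": "Honolulu"}, "Honolulu": {"located_in": "Hawaii"}}
print(uhop_search("where was Obama born in and located", "Obama", kg))
```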

pdf bib
From Receptive to Productive: Learning to Use Confusing Words through Automatically Selected Example Sentences
Chieh-Yang Huang | Yi-Ting Huang | MeiHua Chen | Lun-Wei Ku
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Knowing how to use words appropriately has been a key to improving language proficiency. Previous studies typically discuss how students learn receptively to select the correct candidate from a set of confusing words in the fill-in-the-blank task where specific context is given. In this paper, we go one step further, assisting students to learn to use confusing words appropriately in a productive task: sentence translation. We leverage the GiveMe-Example system, which suggests example sentences for each confusing word, to achieve this goal. In this study, students learn to differentiate the confusing words by reading the example sentences, and then choose the appropriate word(s) to complete the sentence translation task. Results show students made substantial progress in terms of sentence structure. In addition, highly proficient students better managed to learn confusing words. In view of the influence of the first language on learners, we further propose an effective approach to improve the quality of the suggested sentences.

2018

pdf bib
EmotionLines: An Emotion Corpus of Multi-Party Conversations
Chao-Chun Hsu | Sheng-Yeh Chen | Chuan-Chun Kuo | Ting-Hao Huang | Lun-Wei Ku
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Cheng-Te Li
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media

pdf bib
SocialNLP 2018 EmotionX Challenge Overview: Recognizing Emotions in Dialogues
Chao-Chun Hsu | Lun-Wei Ku
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media

This paper gives an overview of the Dialogue Emotion Recognition Challenge, EmotionX, at the Sixth SocialNLP Workshop, which recognizes the emotion of each utterance in dialogues. This challenge offers the EmotionLines dataset as the experimental material. The EmotionLines dataset contains conversations from Friends TV show transcripts (Friends) and real chat logs (EmotionPush), where every dialogue utterance is labeled with an emotion. Organizers provide baseline results. 18 teams registered for this challenge and 5 of them submitted their results successfully. The best team achieves unweighted accuracies of 62.48 and 62.5 on EmotionPush and Friends, respectively. In this paper we present the task definition, the test collection, the evaluation results of the groups that participated in this challenge, and their approaches.

2017

pdf bib
Enabling Transitivity for Lexical Inference on Chinese Verbs Using Probabilistic Soft Logic
Wei-Chung Wang | Lun-Wei Ku
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

To learn more knowledge, enabling transitivity is a vital step for lexical inference. However, most lexical inference models with good performance are for nouns or noun phrases, and cannot be directly applied to inference on events or states. In this paper, we construct the largest Chinese verb lexical inference dataset, containing 18,029 verb pairs, where for each pair one of four inference relations is annotated. We further build a probabilistic soft logic (PSL) model to infer verb lexicons using the logic language. With PSL, we easily enable transitivity in two layers, the observed layer and the feature layer, which are included in the knowledge base, and we further discuss the effect of transitivity within and between these layers. Results show that the performance of the proposed PSL model improves by at least 3.5% (relative) when transitivity is enabled. Furthermore, experiments show that enabling transitivity in the observed layer benefits the most.
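A minimal sketch of how a soft transitivity rule behaves in probabilistic soft logic (toy truth values of our own, not the paper’s PSL program): the rule Entails(A,B) & Entails(B,C) -> Entails(A,C) is relaxed with the Łukasiewicz t-norm, and its distance to satisfaction is what inference minimizes.

```python
# Toy sketch: Lukasiewicz relaxation of a transitivity rule, as used in PSL (illustrative numbers).
def lukasiewicz_and(a, b):
    return max(0.0, a + b - 1.0)

def rule_distance_to_satisfaction(e_ab, e_bc, e_ac):
    """How strongly Entails(A,B) & Entails(B,C) -> Entails(A,C) is violated (0 = satisfied)."""
    body = lukasiewicz_and(e_ab, e_bc)
    return max(0.0, body - e_ac)

# truth values in [0, 1] for three verb-pair entailments
print(rule_distance_to_satisfaction(0.9, 0.8, 0.2))   # 0.5 -> the rule pushes Entails(A,C) upward
print(rule_distance_to_satisfaction(0.9, 0.8, 0.9))   # 0.0 -> the rule is satisfied
```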

pdf bib
NLPSA at IJCNLP-2017 Task 2: Imagine Scenario: Leveraging Supportive Images for Dimensional Sentiment Analysis
Szu-Min Chen | Zi-Yuan Chen | Lun-Wei Ku
Proceedings of the IJCNLP 2017, Shared Tasks

Categorical sentiment classification has drawn much attention in the field of NLP, while less work has been conducted for dimensional sentiment analysis (DSA). Recent works for DSA utilize either word embedding, knowledge base features, or bilingual language resources. In this paper, we propose our model for IJCNLP 2017 Dimensional Sentiment Analysis for Chinese Phrases shared task. Our model incorporates word embedding as well as image features, attempting to simulate human’s imaging behavior toward sentiment analysis. Though the performance is not comparable to others in the end, we conduct several experiments with possible reasons discussed, and analyze the drawbacks of our model.

pdf bib
MoodSwipe: A Soft Keyboard that Suggests Messages Based on User-Specified Emotions
Chieh-Yang Huang | Tristan Labetoulle | Ting-Hao Huang | Yi-Pei Chen | Hung-Chen Chen | Vallari Srivastava | Lun-Wei Ku
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present MoodSwipe, a soft keyboard that suggests text messages for user-specified emotions, utilizing real dialog data. The aim of MoodSwipe is to create a convenient user interface for enjoying the technology of emotion classification and text suggestion, and at the same time to collect labeled data automatically for developing more advanced technologies. While using the MoodSwipe keyboard, users can type as usual but also sense the emotion conveyed by their text and receive suggestions for their message as a benefit. In MoodSwipe, the detected emotions serve as the medium for suggested texts, where viewing the latter is the incentive for correcting the former. We conduct several experiments to show the superiority of emotion classification models trained on the dialog data, and further verify that good emotion cues are important context for text suggestion.

pdf bib
Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Cheng-Te Li
Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media

pdf bib
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)
Lun-Wei Ku | Yu Tsao
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

2016

pdf bib
Whose Nickname is This? Recognizing Politicians from Their Aliases
Wei-Chung Wang | Hung-Chen Chen | Zhi-Kai Ji | Hui-I Hsiao | Yu-Shian Chiu | Lun-Wei Ku
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

Using aliases to refer to public figures is one way to make fun of people, to express sarcasm, or even to sidestep legal issues when expressing opinions on social media. However, linking an alias back to the real name is difficult, as it entails phonemic, graphemic, and semantic challenges. In this paper, we propose a phonemic-based approach and inject semantic information to align aliases with politicians’ Chinese formal names. The proposed approach creates an HMM model for each name to model its phonemes and takes document-level pairwise mutual information into account to capture the semantic relations to the alias. In this work we also introduce two new datasets consisting of 167 phonemic pairs and 279 mixed pairs of aliases and formal names. Experimental results show that the proposed approach models both phonemic and semantic information and outperforms previous work on both the phonemic and mixed datasets, with best top-1 accuracies of 0.78 and 0.59, respectively.

pdf bib
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media
Lun-Wei Ku | Jane Yung-jen Hsu | Cheng-Te Li
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media

pdf bib
ANTUSD: A Large Chinese Sentiment Dictionary
Shih-Ming Wang | Lun-Wei Ku
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces the augmented NTU sentiment dictionary, abbreviated as ANTUSD, which is constructed by collecting the sentiment statistics of words from several sentiment annotation projects. A total of 26,021 words were collected in ANTUSD. For each word, the CopeOpi numerical sentiment score and the numbers of positive, neutral, negative, non-opinionated, and not-a-word annotations are provided. Words and their sentiment information in ANTUSD have been linked to the Chinese ontology E-HowNet to provide rich semantic information. We demonstrate the usage of ANTUSD in the polarity classification of words, where a superior f-score of 98.21 is achieved, which supports the usefulness of ANTUSD. ANTUSD can be freely obtained through application from the NLPSA lab, Academia Sinica: http://academiasinicanlplab.github.io/

pdf bib
UTCNN: a Deep Learning Model of Stance Classification on Social Media Text
Wei-Fan Chen | Lun-Wei Ku
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Most neural network models for document classification on social media focus on text information to the neglect of other information on these platforms. In this paper, we classify post stance on social media channels and develop UTCNN, a neural network model that incorporates user tastes, topic tastes, and user comments on posts. UTCNN not only works on social media texts, but also analyzes texts in forums and message boards. Experiments performed on Chinese Facebook data and English online debate forum data show that UTCNN achieves a 0.755 macro average f-score for supportive, neutral, and unsupportive stance classes on Facebook data, which is significantly better than models in which either user, topic, or comment information is withheld. This model design greatly mitigates the lack of data for the minor class. In addition, UTCNN yields a 0.842 accuracy on English online debate forum data, which also significantly outperforms results from previous work, showing that UTCNN performs well regardless of language or platform.

pdf bib
Sensing Emotions in Text Messages: An Application and Deployment Study of EmotionPush
Shih-Ming Wang | Chun-Hui Scott Lee | Yu-Chun Lo | Ting-Hao Huang | Lun-Wei Ku
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Instant messaging and push notifications play important roles in modern digital life. To enable robust sense-making and rich context awareness in computer mediated communications, we introduce EmotionPush, a system that automatically conveys the emotion of received text with a colored push notification on mobile devices. EmotionPush is powered by state-of-the-art emotion classifiers and is deployed for Facebook Messenger clients on Android. The study showed that the system is able to help users prioritize interactions.

pdf bib
WordForce: Visualizing Controversial Words in Debates
Wei-Fan Chen | Fang-Yu Lin | Lun-Wei Ku
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents WordForce, a system powered by a state-of-the-art neural network model that visualizes the learned user-dependent word embeddings from each post according to the post content and its engaged users. It generates scatter plots to show the force of a word, i.e., whether the semantics of the word embeddings from posts of different stances are clearly separated from the aspect of this controversial word. In addition, WordForce provides the dispersion and the distance of word embeddings from posts of different stance groups, and accordingly proposes the most controversial words to give clues to what people argue about in a debate.

pdf bib
Automatically Suggesting Example Sentences of Near-Synonyms for Language Learners
Chieh-Yang Huang | Nicole Peinelt | Lun-Wei Ku
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this paper, we propose GiveMeExample, which ranks example sentences according to their capacity for demonstrating the differences among English and Chinese near-synonyms for language learners. The difficulty of the example sentences is automatically detected. Furthermore, the usage models of the near-synonyms are built with GMM and Bi-LSTM models to suggest the best elaborative sentences. Experiments show good performance both on the fill-in-the-blank test and on the manually labeled gold data; that is, the built models can select the appropriate words for the given context and vice versa.

pdf bib
Chinese Textual Sentiment Analysis: Datasets, Resources and Tools
Lun-Wei Ku | Wei-Fan Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

The rapid accumulation of data in social media (at the million and billion scale) has imposed great challenges in information extraction, knowledge discovery, and data mining, and texts bearing sentiment and opinions are one of the major categories of user-generated data in social media. Sentiment analysis is the main technology for quickly capturing what people think from these text data, and is a research direction with immediate practical value in the “big data” era. Learning such techniques allows data miners to perform advanced mining tasks that consider the real sentiment and opinions expressed by users, in addition to the statistics calculated from the physical actions users perform (such as viewing or purchasing records), which facilitates the development of real-world applications. However, the fact that most tools are limited to the English language may stop academic or industrial people from doing research or building products that cover a wider scope of data, retrieve information from people who speak different languages, or serve worldwide users.

More specifically, sentiment analysis determines the polarities and strength of sentiment-bearing expressions, and it has been an important and attractive research area. In the past decade, resources and tools have been developed for sentiment analysis in order to support vital applications such as product reviews, reputation management, call center robots, and automatic public surveys. However, most of these resources are for the English language. Being the key to understanding business and government issues, sentiment analysis resources and tools are also required for other major languages, e.g., Chinese. In this tutorial, the audience can learn the skills for retrieving sentiment from texts in another major language, Chinese, to overcome this obstacle.

The goal of this tutorial is to introduce the sentiment analysis technologies and datasets proposed in the literature, and to give the audience the opportunity to use resources and tools to process Chinese texts from the very basic preprocessing, i.e., word segmentation and part-of-speech tagging, to sentiment analysis, i.e., applying sentiment dictionaries and obtaining sentiment scores, through step-by-step instructions and hands-on practice. The basic processing tools are from CKIP. Participants can download these resources, use them, and solve the problems they encounter in this tutorial.

The tutorial begins with background knowledge of sentiment analysis, such as how sentiments are categorized, where to find available corpora, and which models are commonly applied, especially for the Chinese language. Then a set of basic Chinese text processing tools for word segmentation, tagging, and parsing is introduced in preparation for mining sentiment and opinions. After bringing the audience the idea of how to pre-process the Chinese language, I describe our work on compositional Chinese sentiment analysis from words to sentences, and an application on social media text (Facebook) as an example. All of our recently developed related resources, including the Chinese Morphological Dataset, the Augmented NTU Sentiment Dictionary (aug-NTUSD), E-HowNet with sentiment information, the Chinese Opinion Treebank, and the CopeOpi sentiment scorer, are also introduced and distributed in this tutorial. The tutorial ends with a hands-on session on how to use these materials and tools to process Chinese sentiment.
For content details, materials, and the program, please refer to the tutorial URL: http://www.lunweiku.com/
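As a minimal illustration of the dictionary-based scoring step covered in the tutorial (a toy mini-dictionary and hand-segmented input of our own, not the CopeOpi scorer or NTUSD): each segmented word’s polarity score is looked up and aggregated over the sentence, with simple handling of negation and intensification.

```python
# Toy sketch: dictionary-based sentiment scoring over segmented Chinese text (invented scores).
sentiment_dict = {"喜歡": 0.8, "討厭": -0.7, "不": -1.0, "非常": 1.5}   # toy polarity scores

def score_sentence(words):
    """Sum dictionary scores, letting a negation/intensifier modify the next word's score."""
    total, modifier = 0.0, 1.0
    for w in words:
        s = sentiment_dict.get(w, 0.0)
        if w in ("不", "非常"):              # negation / intensifier applies to what follows
            modifier *= abs(s) if w == "非常" else -1.0
            continue
        total += modifier * s
        modifier = 1.0
    return total

print(score_sentence(["我", "非常", "喜歡", "這", "部", "電影"]))   # positive score
print(score_sentence(["我", "不", "喜歡", "這", "部", "電影"]))     # negative score
```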

2015

pdf bib
Designing a Tag-Based Statistical Math Word Problem Solver with Reasoning and Explanation
Yi-Chung Lin | Chao-Chun Liang | Kuang-Yi Hsu | Chien-Tsung Huang | Shen-Yun Miao | Wei-Yun Ma | Lun-Wei Ku | Churn-Jung Liau | Keh-Yih Su
International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 2, December 2015 - Special Issue on Selected Papers from ROCLING XXVII

pdf bib
Embarrassed or Awkward? Ranking Emotion Synonyms for ESL Learners’ Appropriate Wording
Wei-Fan Chen | Mei-Hua Chen | Lun-Wei Ku
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Proceedings of the third International Workshop on Natural Language Processing for Social Media
Shou-de Lin | Lun-Wei Ku | Cheng-Te Li | Erik Cambria
Proceedings of the third International Workshop on Natural Language Processing for Social Media

pdf bib
A Dual-Layer Semantic Role Labeling System
Lun-Wei Ku | Shafqat Mumtaz Virk | Yann-Huei Lee
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

pdf bib
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)
Shou-de Lin | Lun-Wei Ku | Erik Cambria | Tsung-Ting Kuo
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

pdf bib
A Rule-Based Approach to Aspect Extraction from Product Reviews
Soujanya Poria | Erik Cambria | Lun-Wei Ku | Chen Gui | Alexander Gelbukh
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

pdf bib
Cross-Lingual Information to the Rescue in Keyword Extraction
Chung-Chi Huang | Maxine Eskenazi | Jaime Carbonell | Lun-Wei Ku | Ping-Che Yang
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2013

pdf bib
Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP)
Shou-de Lin | Lun-Wei Ku | Tsung-Ting Kuo
Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP)

pdf bib
Interest Analysis using PageRank and Social Interaction Content
Chung-chi Huang | Lun-Wei Ku
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

pdf bib
Demonstration of IlluMe: Creating Ambient According to Instant Message Logs
Lun-Wei Ku | Cheng-Wei Sun | Ya-Hsin Hsueh
Proceedings of the ACL 2012 System Demonstrations

2011

pdf bib
Predicting Opinion Dependency Relations for Opinion Analysis
Lun-Wei Ku | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
Construction of a Chinese Opinion Treebank
Lun-Wei Ku | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, building on the syntactically structured Chinese Treebank corpus, we construct the Chinese Opinion Treebank for research on opinion analysis. We introduce the tagging scheme, develop a tagging tool for constructing this corpus, and describe annotated samples. Information including opinions (yes or no), their polarities (positive, neutral, or negative), and types (expression, status, or action) is defined and annotated. In addition, five structure trios are introduced according to the linguistic relations between two Chinese words. The four of them that are possibly related to opinions are also annotated in the constructed corpus to provide linguistic cues. The number of opinion sentences, together with the numbers of their polarities, opinion types, and trio types, is calculated, and these statistics are compared and discussed. To assess the quality of the annotations in this corpus, kappa values are calculated. The substantial agreement among annotations ensures the applicability and reliability of the constructed corpus.

pdf bib
Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches
Ting-Hao Huang | Lun-Wei Ku | Hsin-Hsi Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents an overview of the morphological types of Chinese bi-character words and proposes a set of features for machine learning approaches to predict these types based on information about the composite characters. First, eight morphological types were defined, and 6,500 Chinese bi-character words were annotated with these types. After pre-processing, 6,178 words were selected to construct a corpus named the Reduced Set. We analyzed the Reduced Set and conducted an inter-annotator agreement test; the average kappa value of 0.67 indicates substantial agreement. Second, as bi-character words’ morphological types are considered strongly related to the composite characters’ parts of speech, we propose a set of features that can be extracted simply from dictionaries to indicate the characters’ “tendency” toward parts of speech. Finally, we used these features with three machine learning algorithms, SVM, CRF, and Naïve Bayes, to predict the morphological types. On average, the best algorithm, CRF, achieved 75% of the annotators’ performance.
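A minimal sketch of the dictionary-derived “POS tendency” features described above (toy dictionary, toy type labels, and a generic classifier of our own choosing, not the paper’s feature set or models): each character is represented by the fraction of dictionary entries per POS in which it appears, and the two characters’ vectors are concatenated as the word’s features.

```python
# Toy sketch: POS-tendency features for bi-character words with a simple classifier (assumed setup).
import numpy as np
from sklearn.naive_bayes import GaussianNB

POS_TAGS = ["N", "V", "A", "D"]                       # toy POS inventory

def pos_tendency(char, dictionary):
    """Fraction of dictionary entries containing `char` for each POS tag."""
    counts = np.zeros(len(POS_TAGS))
    for word, pos in dictionary:
        if char in word and pos in POS_TAGS:
            counts[POS_TAGS.index(pos)] += 1
    total = counts.sum()
    return counts / total if total else counts

def featurize(word, dictionary):
    return np.concatenate([pos_tendency(word[0], dictionary),
                           pos_tendency(word[1], dictionary)])

# toy (word, POS) dictionary and toy annotated morphological types
dictionary = [("學生", "N"), ("學習", "V"), ("生活", "N"), ("活動", "N"), ("動人", "A")]
train_words = ["學生", "生活", "動人"]
train_types = ["modifier-head", "modifier-head", "verb-object"]   # hypothetical type labels

X = np.array([featurize(w, dictionary) for w in train_words])
clf = GaussianNB().fit(X, train_types)
print(clf.predict([featurize("學習", dictionary)]))
```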

2009

pdf bib
Using Morphological and Syntactic Structures for Chinese Opinion Analysis
Lun-Wei Ku | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
意見持有者辨識之研究 (A Study on Identification of Opinion Holders) [In Chinese]
Chia-Ying Lee | Lun-Wei Ku | Hsin-Hsi Chen
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

pdf bib
Identification of Opinion Holders
Lun-Wei Ku | Chia-Ying Lee | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 14, Number 4, December 2009

2008

pdf bib
Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems
Lun-Wei Ku | Yu-Ting Liang | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 3, September 2008: Special Issue on Selected Papers from ROCLING XIX

2007

pdf bib
Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus
Lun-Wei Ku | Yong-Sheng Lo | Hsin-Hsi Chen
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems
Lun-Wei Ku | Yu-Ting Liang | Hsin-Hsi Chen
Proceedings of the 19th Conference on Computational Linguistics and Speech Processing

2006

pdf bib
Tagging Heterogeneous Evaluation Corpora for Opinionated Tasks
Lun-Wei Ku | Yu-Ting Liang | Hsin-Hsi Chen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Opinion retrieval aims to tell whether a document is positive, neutral, or negative on a given topic. Opinion extraction further identifies the supportive and non-supportive evidence in a document. To evaluate the performance of technologies for opinionated tasks, a suitable corpus is necessary. This paper defines the annotations for opinionated materials. Heterogeneous experimental materials are annotated, and the agreement among annotators is analyzed. How humans can monitor the opinion of the whole is also examined. The corpus can be employed for opinion extraction, opinion summarization, opinion tracking, and opinionated question answering.