2024
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar

Andrew Lan
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem by asking incremental questions. Although this method has been shown to significantly improve student learning outcomes, it remains a complex labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Also, we propose a method to optimize open-source LLMs such as LLama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized LLama 2-7B model can effectively avoid generating invalid questions, and as a result, outperforms existing state-of-the-art prompting methods.
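For readers unfamiliar with DPO, the standard objective is shown below as a reference point; it is the generic published form of the loss, not necessarily the exact variant used in this paper. The symbols are assumptions made for illustration: $x$ is the prompt (problem and dialogue context), $y_w$ the preferred output (here, the ground-truth question), $y_l$ the dispreferred output (here, a generated invalid question), $\pi_\theta$ the model being optimized, $\pi_{\mathrm{ref}}$ a frozen reference model, $\sigma$ the logistic function, and $\beta$ a scaling hyperparameter.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        \;-\;
        \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Minimizing this loss raises the likelihood of $y_w$ relative to $y_l$, measured against the reference model.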
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank
Alexander Scarlatos

Wanyong Feng

Andrew Lan

Simon Woodhead

Digory Smith
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
Multiple-choice questions (MCQs) are commonly used across all levels of math education since they can be deployed and graded at a large scale. A critical component of MCQs is the distractors, i.e., incorrect answers crafted to reflect student errors or misconceptions. Automatically generating them in math MCQs, e.g., with large language models, has been challenging. In this work, we propose a novel method to enhance the quality of generated distractors through overgenerate-and-rank, training a ranking model to predict how likely distractors are to be selected by real students. Experimental results on a real-world dataset and human evaluation with math teachers show that our ranking model increases alignment with human-authored distractors, although human-authored ones are still preferred over generated ones.
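The overgenerate-and-rank pattern named in this abstract can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `overgenerate_and_rank`, `toy_generate`, and `toy_score` are hypothetical names, the generator stands in for an LLM sampler, and the scorer stands in for a trained ranking model that estimates how likely students are to select a distractor.

```python
from itertools import cycle
from typing import Callable, List


def overgenerate_and_rank(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    question: str,
    n_candidates: int = 10,
    top_k: int = 3,
) -> List[str]:
    """Sample more candidates than needed, then keep the top_k by score."""
    # Overgenerate: draw n_candidates samples from the (hypothetical) generator.
    candidates = [generate(question) for _ in range(n_candidates)]
    # Drop exact duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(candidates))
    # Rank: highest score first (sorted is stable, so ties keep sample order).
    ranked = sorted(unique, key=lambda c: score(question, c), reverse=True)
    return ranked[:top_k]


# Toy stand-ins (assumptions for illustration only).
_pool = cycle(["7", "34", "1", "7", "34"])
_toy_scores = {"7": 0.6, "34": 0.3, "1": 0.1}


def toy_generate(question: str) -> str:
    # Stands in for sampling a distractor from an LLM.
    return next(_pool)


def toy_score(question: str, candidate: str) -> float:
    # Stands in for a ranking model's predicted selection likelihood.
    return _toy_scores[candidate]


top = overgenerate_and_rank(
    toy_generate, toy_score, "What is 3 x 4?", n_candidates=5, top_k=2
)
print(top)
```

Keeping generation and scoring as separate callables means either component can be swapped independently of the other.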
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models
Wanyong Feng

Jaewook Lee

Hunter McNichols

Alexander Scarlatos

Digory Smith

Simon Woodhead

Nancy Ornelas

Andrew Lan
Findings of the Association for Computational Linguistics: NAACL 2024
Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer and grade, and are a reliable format in assessments and practices. One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students. To date, the task of crafting high-quality distractors largely remains a labor- and time-intensive process for teachers and learning content designers, which has limited scalability. In this work, we study the task of automated distractor generation in the domain of math MCQs and explore a wide variety of large language model (LLM)-based approaches, from in-context learning to fine-tuning. We conduct extensive experiments using a real-world math MCQ dataset and find that although LLMs can generate some mathematically valid distractors, they are less adept at anticipating common errors or misconceptions among real students.
SyllabusQA: A Course Logistics Question Answering Dataset
Nigel Fernandez

Alexander Scarlatos

Andrew Lan
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automated teaching assistants and chatbots have significant potential to reduce the workload of human instructors, especially for logistics-related question answering, which is important to students yet repetitive for instructors. However, due to privacy concerns, there is a lack of publicly available datasets. We introduce SyllabusQA, an open-source dataset with 63 real course syllabi covering 36 majors, containing 5,078 open-ended course logistics-related question-answer pairs that are diverse in both question types and answer formats. Since many logistics-related questions contain critical information like the date of an exam, it is important to evaluate the factuality of answers. We benchmark several strong baselines on this task, from large language model prompting to retrieval-augmented generation. We introduce Fact-QA, an LLM-based (GPT-4) evaluation metric to evaluate the factuality of predicted answers. We find that despite performing close to humans on traditional metrics of textual similarity, there remains a significant gap between automated approaches and humans in terms of fact precision.
2023
Tree-Based Representation and Generation of Natural and Mathematical Language
Alexander Scarlatos

Andrew Lan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Mathematical language in scientific communications and educational scenarios is important yet relatively understudied compared to natural languages. Recent works on mathematical language focus either on representing standalone mathematical expressions, especially in their natural tree format, or mathematical reasoning in pretrained natural language models. Existing works on jointly modeling and generating natural and mathematical languages simply treat mathematical expressions as text, without accounting for the rigid structural properties of mathematical expressions. In this paper, we propose a series of modifications to existing language models to jointly represent and generate text and math: representing mathematical expressions as sequences of node tokens in their operator tree format, using math symbol and tree position embeddings to preserve the semantic and structural properties of mathematical expressions, and using a constrained decoding method to generate mathematically valid expressions. We ground our modifications in GPT-2, resulting in a model MathGPT, and demonstrate that it outperforms baselines on mathematical expression generation tasks.
Interpretable Math Word Problem Solution Generation via Step-by-step Planning
Mengxue Zhang

Zichao Wang

Zhichao Yang

Weiqi Feng

Andrew Lan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Solutions to math word problems (MWPs) with step-by-step explanations are valuable, especially in education, to help students better comprehend problem-solving strategies. Most existing approaches only focus on obtaining the final correct answer. A few recent approaches leverage intermediate solution steps to improve final answer correctness but often cannot generate coherent steps with a clear solution strategy. Contrary to existing work, we focus on improving the correctness and coherence of the intermediate solution steps. We propose a step-by-step planning approach for intermediate solution generation, which strategically plans the generation of the next solution step based on the MWP and the previous solution steps. Our approach first plans the next step by predicting the necessary math operation needed to proceed, given history steps, then generates the next step, token-by-token, by prompting a language model with the predicted math operation. Experiments on the GSM8K dataset demonstrate that our approach improves the accuracy and interpretability of the solution on both automatic metrics and human evaluation.
Improving Reading Comprehension Question Generation with Data Augmentation and Overgenerate-and-rank
Nischal Ashok Kumar

Nigel Fernandez

Zichao Wang

Andrew Lan
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Reading comprehension is a crucial skill in many aspects of education, including language learning, cognitive development, and fostering early literacy skills in children. Automated answer-aware reading comprehension question generation has significant potential to scale up learner support in educational activities. One key technical challenge in this setting is that there can be multiple questions, sometimes very different from each other, with the same answer; a trained question generation method may not necessarily know which question human educators would prefer. To address this challenge, we propose 1) a data augmentation method that enriches the training dataset with diverse questions given the same context and answer and 2) an overgenerate-and-rank method to select the best question from a pool of candidates. We evaluate our method on the FairytaleQA dataset, showing a 5% absolute improvement in ROUGE-L over the best existing method. We also demonstrate the effectiveness of our method in generating harder, “implicit” questions, where the answers are not contained in the context as text spans.
2022
Open-ended Knowledge Tracing for Computer Science Education
Naiming Liu

Zichao Wang

Richard Baraniuk

Andrew Lan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
In educational applications, knowledge tracing refers to the problem of estimating students’ time-varying concept/skill mastery level from their past responses to questions and predicting their future performance. One key limitation of most existing knowledge tracing methods is that they treat student responses to questions as binary-valued, i.e., whether they are correct or incorrect. Response correctness analysis/prediction is straightforward, but it ignores important information regarding mastery, especially for open-ended questions. In contrast, exact student responses can provide much more information. In this paper, we conduct the first exploration into open-ended knowledge tracing (OKT) by studying the new task of predicting students’ exact open-ended responses to questions. Our work is grounded in the domain of computer science education with programming questions. We develop an initial solution to the OKT problem, a student knowledge-guided code generation approach, that combines program synthesis methods using language models with student knowledge tracing methods. We also conduct a series of quantitative and qualitative experiments on a real-world student code dataset to validate and demonstrate the promise of OKT.
2021
Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints
Zichao Wang

Andrew Lan

Richard Baraniuk
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. Existing approaches are prone to generating MWPs that are either mathematically invalid or have unsatisfactory language quality. They also either ignore the context or require manual specification of a problem template, which compromises the diversity of the generated MWPs. In this paper, we develop a novel MWP generation approach that leverages i) pretrained language models and a context keyword selection model to improve the language quality of generated MWPs and ii) an equation consistency constraint for math equations to improve the mathematical validity of the generated MWPs. Extensive quantitative and qualitative experiments on three real-world MWP datasets demonstrate the superior performance of our approach compared to various baselines.
2020
Robust and Interpretable Grounding of Spatial References with Relation Networks
Tsung-Yen Yang

Andrew Lan

Karthik Narasimhan
Findings of the Association for Computational Linguistics: EMNLP 2020
Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multimodal representations for spatial concepts. However, the lack of explicit reasoning over entities makes such approaches vulnerable to noise in input text or state observations. In this paper, we develop effective models for understanding spatial references in text that are robust and interpretable, without sacrificing performance. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. This design choice provides interpretability of learned intermediate outputs. Experiments across three tasks demonstrate that our model achieves superior performance, with a 17% improvement in predicting goal locations and a 15% improvement in robustness compared to state-of-the-art systems.