@inproceedings{ghosh-etal-2022-finrad,
title = "{F}in{RAD}: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability",
author = "Ghosh, Sohom and
Sengupta, Shovon and
Naskar, Sudip and
Singh, Sunny Kumar",
editor = "El-Haj, Mahmoud and
Rayson, Paul and
Zmandar, Nadhem",
booktitle = "Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.fnp-1.1/",
pages = "1--9",
abstract = "In today`s world, the advancement and spread of the Internet and digitalization have resulted in most information being openly accessible. This holds true for financial services as well. Investors make data driven decisions by analysing publicly available information like annual reports of listed companies, details regarding asset allocation of mutual funds, etc. Many a time these financial documents contain unknown financial terms. In such cases, it becomes important to look at their definitions. However, not all definitions are equally readable. Readability largely depends on the structure, complexity and constituent terms that make up a definition. This brings in the need for automatically evaluating the readability of definitions of financial terms. This paper presents a dataset, FinRAD consisting of financial terms, their definitions and embeddings. In addition to standard readability scores (like {\textquotedblleft}Flesch Reading Index (FRI){\textquotedblright}, {\textquotedblleft}Automated Readability Index (ARI){\textquotedblright}, {\textquotedblleft}SMOG Index Score (SIS){\textquotedblright},{\textquotedblleft}Dale-Chall formula (DCF){\textquotedblright}, etc.), it also contains the readability scores (AR) assigned based on sources from which the terms have been collected. We manually inspect a sample from it to ensure the quality of the assignment. Subsequently, we prove that the rule-based standard readability scores (like {\textquotedblleft}Flesch Reading Index (FRI){\textquotedblright}, {\textquotedblleft}Automated Readability Index (ARI){\textquotedblright}, {\textquotedblleft}SMOG Index Score (SIS){\textquotedblright},{\textquotedblleft}Dale-Chall formula (DCF){\textquotedblright}, etc.) do not correlate well with the manually assigned binary readability scores of definitions of financial terms. Finally, we present a few neural baselines using transformer based architecture to automatically classify these definitions as readable or not. Pre-trained FinBERT model fine-tuned on FinRAD corpus performs the best (AU-ROC = 0.9927, F1 = 0.9610). This corpus can be downloaded from \url{https://github.com/sohomghosh/FinRAD_Financial_Readability_Assessment_Dataset}."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ghosh-etal-2022-finrad">
<titleInfo>
<title>FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sohom</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shovon</namePart>
<namePart type="family">Sengupta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sudip</namePart>
<namePart type="family">Naskar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sunny</namePart>
<namePart type="given">Kumar</namePart>
<namePart type="family">Singh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mahmoud</namePart>
<namePart type="family">El-Haj</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Paul</namePart>
<namePart type="family">Rayson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nadhem</namePart>
<namePart type="family">Zmandar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>European Language Resources Association</publisher>
<place>
<placeTerm type="text">Marseille, France</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In today‘s world, the advancement and spread of the Internet and digitalization have resulted in most information being openly accessible. This holds true for financial services as well. Investors make data driven decisions by analysing publicly available information like annual reports of listed companies, details regarding asset allocation of mutual funds, etc. Many a time these financial documents contain unknown financial terms. In such cases, it becomes important to look at their definitions. However, not all definitions are equally readable. Readability largely depends on the structure, complexity and constituent terms that make up a definition. This brings in the need for automatically evaluating the readability of definitions of financial terms. This paper presents a dataset, FinRAD consisting of financial terms, their definitions and embeddings. In addition to standard readability scores (like “Flesch Reading Index (FRI)”, “Automated Readability Index (ARI)”, “SMOG Index Score (SIS)”,“Dale-Chall formula (DCF)”, etc.), it also contains the readability scores (AR) assigned based on sources from which the terms have been collected. We manually inspect a sample from it to ensure the quality of the assignment. Subsequently, we prove that the rule-based standard readability scores (like “Flesch Reading Index (FRI)”, “Automated Readability Index (ARI)”, “SMOG Index Score (SIS)”,“Dale-Chall formula (DCF)”, etc.) do not correlate well with the manually assigned binary readability scores of definitions of financial terms. Finally, we present a few neural baselines using transformer based architecture to automatically classify these definitions as readable or not. Pre-trained FinBERT model fine-tuned on FinRAD corpus performs the best (AU-ROC = 0.9927, F1 = 0.9610). This corpus can be downloaded from https://github.com/sohomghosh/FinRAD_Financial_Readability_Assessment_Dataset.</abstract>
<identifier type="citekey">ghosh-etal-2022-finrad</identifier>
<location>
<url>https://aclanthology.org/2022.fnp-1.1/</url>
</location>
<part>
<date>2022-06</date>
<extent unit="page">
<start>1</start>
<end>9</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability
%A Ghosh, Sohom
%A Sengupta, Shovon
%A Naskar, Sudip
%A Singh, Sunny Kumar
%Y El-Haj, Mahmoud
%Y Rayson, Paul
%Y Zmandar, Nadhem
%S Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022
%D 2022
%8 June
%I European Language Resources Association
%C Marseille, France
%F ghosh-etal-2022-finrad
%X In today‘s world, the advancement and spread of the Internet and digitalization have resulted in most information being openly accessible. This holds true for financial services as well. Investors make data driven decisions by analysing publicly available information like annual reports of listed companies, details regarding asset allocation of mutual funds, etc. Many a time these financial documents contain unknown financial terms. In such cases, it becomes important to look at their definitions. However, not all definitions are equally readable. Readability largely depends on the structure, complexity and constituent terms that make up a definition. This brings in the need for automatically evaluating the readability of definitions of financial terms. This paper presents a dataset, FinRAD consisting of financial terms, their definitions and embeddings. In addition to standard readability scores (like “Flesch Reading Index (FRI)”, “Automated Readability Index (ARI)”, “SMOG Index Score (SIS)”,“Dale-Chall formula (DCF)”, etc.), it also contains the readability scores (AR) assigned based on sources from which the terms have been collected. We manually inspect a sample from it to ensure the quality of the assignment. Subsequently, we prove that the rule-based standard readability scores (like “Flesch Reading Index (FRI)”, “Automated Readability Index (ARI)”, “SMOG Index Score (SIS)”,“Dale-Chall formula (DCF)”, etc.) do not correlate well with the manually assigned binary readability scores of definitions of financial terms. Finally, we present a few neural baselines using transformer based architecture to automatically classify these definitions as readable or not. Pre-trained FinBERT model fine-tuned on FinRAD corpus performs the best (AU-ROC = 0.9927, F1 = 0.9610). This corpus can be downloaded from https://github.com/sohomghosh/FinRAD_Financial_Readability_Assessment_Dataset.
%U https://aclanthology.org/2022.fnp-1.1/
%P 1-9
Markdown (Informal)
[FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability](https://aclanthology.org/2022.fnp-1.1/) (Ghosh et al., FNP 2022)
ACL