TMID: A Comprehensive Real-world Dataset for Trademark Infringement Detection in E-Commerce

Tongxin Hu, Zhuang Li, Xin Jin, Lizhen Qu, Xin Zhang


Abstract
Annually, e-commerce platforms incur substantial financial losses due to trademark infringements, making it crucial to identify and mitigate potential legal risks tied to merchant information registered on these platforms. However, the absence of high-quality datasets hampers research in this area. To address this gap, our study introduces TMID, a novel dataset for detecting trademark infringement in merchant registrations. This is a real-world dataset sourced directly from Alipay, one of the world’s largest e-commerce and digital payment platforms. As infringement detection is a legal reasoning task requiring an understanding of context and legal rules, we offer a thorough collection of legal rules and merchant- and trademark-related contextual information with annotations from legal experts. We ensure the data quality by performing an extensive statistical analysis. Furthermore, we conduct an empirical study on this dataset to highlight its value and the key challenges. Through this study, we aim to contribute valuable resources to advance research into legal compliance related to trademark infringement within the e-commerce sphere.
Anthology ID:
2023.emnlp-industry.18
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
176–184
URL:
https://aclanthology.org/2023.emnlp-industry.18
DOI:
10.18653/v1/2023.emnlp-industry.18
Cite (ACL):
Tongxin Hu, Zhuang Li, Xin Jin, Lizhen Qu, and Xin Zhang. 2023. TMID: A Comprehensive Real-world Dataset for Trademark Infringement Detection in E-Commerce. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 176–184, Singapore. Association for Computational Linguistics.
Cite (Informal):
TMID: A Comprehensive Real-world Dataset for Trademark Infringement Detection in E-Commerce (Hu et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-industry.18.pdf