TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills

Qiushi Sun, Nuo Chen, Jianing Wang, Ming Gao, Xiang Li


Abstract
Code pre-trained models (CodePTMs) have recently demonstrated a solid capacity to handle various code intelligence tasks, e.g., code clone detection, code translation, and code summarization. The current mainstream method for deploying these models on downstream tasks is to fine-tune them on individual tasks, which is generally costly and requires sufficient data for large models. To tackle this issue, in this paper, we present TransCoder, a unified Transferable fine-tuning strategy for Code representation learning. Inspired by humans' inherent skill of knowledge generalization, TransCoder drives the model to learn code-related knowledge the way human programmers do. Specifically, we employ a tunable prefix encoder to first capture cross-task and cross-language transferable knowledge, subsequently applying the acquired knowledge for optimized downstream adaptation. Moreover, our approach benefits tasks with limited training samples and languages with smaller corpora, underscoring its versatility and efficacy. Extensive experiments conducted on representative datasets clearly demonstrate that our method leads to superior performance on various code-related tasks and encourages mutual reinforcement, especially in low-resource scenarios. Our codes are available at https://github.com/QiushiSun/TransCoder.
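The core idea described in the abstract, i.e., keeping the backbone CodePTM frozen while a small tunable prefix encoder carries the transferable knowledge, can be illustrated with a minimal sketch. Note that this is an assumption-based illustration of generic prefix tuning, not the authors' actual implementation; the shapes, the stand-in frozen transform, and all names here are hypothetical.

```python
import numpy as np

# Sketch of prefix-based transfer learning (hypothetical, not the paper's code):
# a small tunable prefix encoder produces extra "virtual token" vectors that
# are prepended to the input embeddings of a frozen backbone. During the
# knowledge-acquisition stage, only the prefix parameters would be updated.

rng = np.random.default_rng(0)
d_model, prefix_len, seq_len = 16, 4, 10

# Frozen backbone parameters (stand-in for a CodePTM).
W_frozen = rng.standard_normal((d_model, d_model))

# Tunable prefix parameters: the only weights shared and updated
# across tasks and languages in this sketch.
prefix = rng.standard_normal((prefix_len, d_model)) * 0.01

def forward(x, prefix):
    """Prepend the learned prefix tokens, then apply the frozen transform."""
    h = np.concatenate([prefix, x], axis=0)  # (prefix_len + seq_len, d_model)
    return h @ W_frozen                      # frozen computation, no grad here

x = rng.standard_normal((seq_len, d_model))  # token embeddings of one snippet
out = forward(x, prefix)
print(out.shape)  # (14, 16): prefix tokens + input tokens, hidden size d_model
```

Because only `prefix` would receive gradient updates, the same frozen backbone can be adapted to a new task or language by swapping in a different (cheaply trained) prefix, which is the efficiency argument the abstract makes.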
Anthology ID:
2024.lrec-main.1453
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Note:
Pages:
16713–16726
URL:
https://aclanthology.org/2024.lrec-main.1453
Cite (ACL):
Qiushi Sun, Nuo Chen, Jianing Wang, Ming Gao, and Xiang Li. 2024. TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16713–16726, Torino, Italia. ELRA and ICCL.
Cite (Informal):
TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills (Sun et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1453.pdf