Expertise Retrieval Using Adjusted TF-IDF and Keyword Mapping to ACM Classification Terms

Lyla Ruslana Aini; Evi Yulianti

doi:10.29207/resti.v9i3.6397

Lyla Ruslana Aini, Badan Riset dan Inovasi Nasional
Evi Yulianti, Universitas Indonesia


Keywords: adjusted TF-IDF, ACM classification, BERT, expertise, FastText, BERT multilingual, SBERT, XLM-RoBERTa

Abstract

In an era of collaboration, knowing someone's expertise is increasingly necessary. Identifying an individual's expertise is challenging because doing it manually takes considerable time. This study explores the expertise of lecturers at the Faculty of Computer Science, Universitas Indonesia (Fasilkom UI), based on their scientific publications. The data were scraped from the Sinta journal website, which aggregates records from Scopus, Garuda, and Google Scholar. Keywords were extracted using an adjusted TF-IDF, and the resulting keywords were then mapped to ACM classification classes by computing cosine similarity with several embedding models: BERT, multilingual BERT, FastText, XLM-RoBERTa, and SBERT. The experimental results indicate that combining the adjusted TF-IDF with SBERT-based mapping to the ACM classes is a promising approach for identifying expertise. Abstract data yielded better results than full-text data: the title-abstract-EN data achieved a score of 0.49 on both P@1 and NDCG@1, whereas the title-abstract-ENID data achieved 0.75 on both metrics.
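To make the pipeline in the abstract more concrete, here is a minimal, self-contained sketch. It rests on several assumptions that are not taken from the paper. Plain TF-IDF stands in for the adjusted TF-IDF, whose adjustment the abstract does not describe. A hypothetical `embed` function that counts character trigrams stands in for the BERT, multilingual BERT, FastText, XLM-RoBERTa, or SBERT encoders. Each ACM class is scored by its highest similarity to any extracted keyword, which is an assumed aggregation rule. The toy documents and class labels are illustrative only. The evaluation helpers compute P@k and NDCG@k with binary relevance.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Top-k terms per document by plain TF-IDF: (count / doc length) * ln(N / df).

    Assumption: this is standard TF-IDF, not the paper's adjusted variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        keywords.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return keywords


def embed(text):
    """Hypothetical stand-in for an encoder such as SBERT: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_classes(keywords, classes):
    """Rank classes by their highest cosine similarity to any keyword (assumed rule)."""
    kw_vecs = [embed(k) for k in keywords]
    scores = {c: max((cosine(kv, embed(c)) for kv in kw_vecs), default=0.0) for c in classes}
    return sorted(classes, key=lambda c: (-scores[c], c))


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked classes that are relevant."""
    return sum(1 for c in ranked[:k] if c in relevant) / k


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary relevance (1 if the class is relevant, else 0)."""
    dcg = sum(1.0 / math.log2(i + 2) for i, c in enumerate(ranked[:k]) if c in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0


if __name__ == "__main__":
    docs = [  # one toy document per lecturer (illustrative data)
        "neural network deep learning network",
        "database query optimization query",
        "network security intrusion detection",
    ]
    classes = ["Machine learning", "Information systems", "Security and privacy"]
    for kws in tfidf_keywords(docs):
        print(kws, "->", map_to_classes(kws, classes)[0])
```

With binary relevance, NDCG@1 reduces to P@1 (both are 1 when the top-ranked class is relevant and 0 otherwise, provided each lecturer has at least one relevant class). If the study used binary judgments, this would be consistent with the identical P@1 and NDCG@1 values reported in the abstract. To experiment with real embeddings, `embed` could be replaced by a dense encoder, such as the `encode` method of a sentence-transformers model, with `cosine` adapted to dense vectors.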

