A Systematic and Comparative Analysis of Semantic Search Algorithms
Abstract
Users often struggle to find the information they need online because of the massive volume of data that is already available and that continues to be generated every day. Traditional keyword-based search engines may not handle complex queries well, which can lead to irrelevant or insufficient search results. Semantic search addresses this problem by using machine learning and natural language processing to interpret the meaning and context of a user's query. In this paper we analyze the BM25 algorithm, the Mean of Word Vectors approach, the Universal Sentence Encoder model, and the Sentence-BERT (SBERT) model on the CISI dataset for the semantic search task. The results indicate that the fine-tuned SBERT model performs best.