International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15 Issue 4, 2024.
Abstract: Discovering similarity between sentences benefits a variety of systems, including customer support chatbots, educational platforms, e-commerce customer inquiries, and community forums or question-answering systems. One of the primary problems facing online question-answering platforms and customer service chatbots is the large number of duplicate inquiries posted on the platform. In addition to cluttering the platform, these repetitive queries degrade content quality and make it harder for visitors to locate pertinent information. Automatic detection of sentence similarity is therefore needed to improve the user experience and meet user expectations quickly. The present study uses the Quora dataset to construct a framework for similarity discovery in sentence pairs. As part of our research, we built additional attributes from the textual data to improve the accuracy of similarity prediction. The study investigates several vectorization methods and their influence on accuracy. To convert preprocessed text input into numerical vectors, we implemented Word2Vec, FastText, Term Frequency-Inverse Document Frequency (TF-IDF), CountVectorizer (CV), and OpenAI embeddings. To judge sentence similarity, the embeddings produced by each approach were used with various models, including cosine similarity, Random Forest (RF), AdaBoost, XGBoost, LSTM, and CNN. The results demonstrate that all algorithms trained on OpenAI embeddings yield excellent outcomes. The OpenAI embeddings provide rich information to the models trained on them and have significant potential for capturing sentence similarity.
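As a rough illustration of the simplest pairing named in the abstract (count-based vectorization scored with cosine similarity), the following minimal sketch uses only the Python standard library. It is not the authors' pipeline: the tokenizer, the function names, and the 0.8 duplicate threshold are all hypothetical choices made for this example, and a real system would likely use a library vectorizer or learned embeddings instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_vector(text):
    """Build a sparse bag-of-words vector: token -> frequency."""
    return Counter(tokenize(text))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty sentence has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(question_a, question_b, threshold=0.8):
    """Flag a pair as duplicate when similarity reaches the threshold.

    The default threshold is an assumption for illustration only.
    """
    score = cosine_similarity(count_vector(question_a), count_vector(question_b))
    return score >= threshold


if __name__ == "__main__":
    q1 = "How do I learn Python?"
    q2 = "What is the best way to learn Python?"
    print(cosine_similarity(count_vector(q1), count_vector(q2)))
    print(is_duplicate(q1, q2))
```

Word-count vectors only reward shared vocabulary, so two paraphrases with little word overlap score low here. That limitation is one motivation for comparing such baselines against dense embeddings, as the study does.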
Nilesh B. Korade, Mahendra B. Salunke, Amol A. Bhosle, Prashant B. Kumbharkar, Gayatri G. Asalkar and Rutuja G. Khedkar, “Strengthening Sentence Similarity Identification Through OpenAI Embeddings and Deep Learning,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(4), 2024. http://dx.doi.org/10.14569/IJACSA.2024.0150485
@article{Korade2024,
title = {Strengthening Sentence Similarity Identification Through OpenAI Embeddings and Deep Learning},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2024.0150485},
url = {http://dx.doi.org/10.14569/IJACSA.2024.0150485},
year = {2024},
publisher = {The Science and Information Organization},
volume = {15},
number = {4},
author = {Nilesh B. Korade and Mahendra B. Salunke and Amol A. Bhosle and Prashant B. Kumbharkar and Gayatri G. Asalkar and Rutuja G. Khedkar}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.