Document Type: Review Paper

Authors

1 Ph.D. Graduate of Knowledge and Information Sciences, University of Tehran, Tehran, Iran

2 Professor, Department of Knowledge & Information Science, University of Qom, Qom, Iran

3 PhD Candidate in Knowledge & Information Science, University of Qom, Qom, Iran

Abstract

Introduction

Recently, the development of artificial intelligence and human-computer interaction has highlighted the increasing importance of language challenges in information retrieval. The crucial role of language in disseminating, accessing, and retrieving information cannot be studied independently of syntax and semantics. Explaining and describing research in this field from both quantitative and qualitative perspectives, and understanding researchers' trends, is an important step in comprehending the significance of syntax and semantics in communication structures within modern information search and retrieval environments. Consequently, in this descriptive and analytical study, we conducted qualitative and quantitative analyses of studies in the field of syntax and semantics in information retrieval.

Literature Review

In recent years, a substantial body of interdisciplinary research has investigated the impact of language on the interaction between users and the web environment. These studies have examined language from various perspectives and have explored information retrieval across different types of information media, including web databases, search engines, commercial websites, and libraries. Tapsai (2019), Norouzi and Hamavandi (2018), Hammo (2009), Lazarinis (2008), and Ofoghi, Yearwood & Ghosh (2006) focused on different languages, such as Persian, English, Arabic, and Greek. Their findings show that the syntax and morphology, as well as the semantics, of search terms and phrases have a significant impact on the results retrieved. In addition, when trying to improve the search process, search tools tend to rely on the general form of words rather than on the real needs of users.
Due to the huge amount of information on the World Wide Web and the challenges related to information retrieval, researchers and software developers have turned to the Semantic Web to keep up with the changes. The Semantic Web has provided a large amount of structured and machine-understandable information on a wide range of topics (Guha, McCool & Miller, 2003). Semantic models perform well in identifying and recognizing synonyms, similar words, and semantic frameworks. Therefore, one of the most important challenges in the field of information storage and retrieval is to bridge the gap between the language used by information seekers and information providers (Rezaee Sharifabadi et al., 2010).
The current study aims to systematically review previous research findings on syntax and semantics in information storage and retrieval across different contexts. Each context represents a different dimension of knowledge representation systems, from traditional to semantic. A review of the existing research found that no systematic review has been conducted with a focus on syntax and semantics in the field of information retrieval.

Methodology

In this qualitative research using Aveyard’s systematic review method, we aim to address the following questions:

What is the statistical status of studies in the field of syntax and semantics in storing and retrieving information?
What are the main subject areas that researchers have focused on in studies related to syntax and semantics in storing and retrieving information?
What research methods and approaches have researchers employed in this field?
What are the research gaps and areas that require further study in this field?

To gather relevant sources from information databases, we selected search keywords based on the research questions. Then, we used search strategies and various operators to combine the keywords and phrases, ensuring a comprehensive and effective search in Persian databases such as Magiran, Irandoc, SID, NoorMagz, ISC, and Civilica, as well as databases including Scopus, Emerald, ProQuest, and Google Scholar. There was no time limitation for the search. We recorded accepted sources such as articles and theses that were relevant and valid. By removing irrelevant and duplicate sources, we selected 12 Persian sources and 42 English sources. After categorizing, the studies were analyzed according to the type of source, research method, and tool. The results of the analysis of the studies were presented in the form of tables and graphs.
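The search and screening steps described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the keyword groups, the sample records, and the helper names (`build_query`, `normalize_title`, `deduplicate`) are hypothetical and are not the search terms or tools used in this study. It shows how synonyms might be combined with OR, concepts with AND, and how duplicate sources retrieved from several databases might be removed by comparing normalized titles.

```python
import re


def build_query(concept_groups):
    """Join synonyms with OR inside a group and join groups with AND."""
    clauses = []
    for group in concept_groups:
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{t}"' if " " in t else t for t in group]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^\w]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Placeholder keyword groups (assumed; not the study's actual search terms).
groups = [["syntax", "semantics"], ["information retrieval", "information storage"]]
query = build_query(groups)

# Hypothetical records exported from different databases.
records = [
    {"title": "Semantic Search Engines: A Comparison", "source": "Scopus"},
    {"title": "semantic search engines - a comparison", "source": "ProQuest"},
    {"title": "Syntax in Query Formulation", "source": "Emerald"},
]
unique_records = deduplicate(records)
```

Title-based matching is a deliberately simple choice; in practice, duplicates may also need to be checked by DOI or by manual inspection, particularly for sources indexed under both Persian and English titles.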

Results

Based on an analysis of the keywords and subjects raised in the sources, the selected studies were categorized into three groups, each of which was described in detail: information retrieval, information organization, and information search. Among the 54 reviewed studies, Iranian researchers had conducted the largest number in the field of syntax and semantics in information retrieval, with 12 studies. The United States followed with 5 studies, and China and Vietnam tied for third place with 4 studies each. The majority of the studies focused on syntax and semantics in information retrieval.

Discussion

Analysis of the 54 selected studies showed that they were conducted over a period of 26 years. The oldest study included in the review dates back to 1997, while the most recent one is from 2022. This reflects the dynamic nature of the field under investigation, which is constantly changing under the influence of advancing web technologies. Furthermore, a thematic analysis of the research, based on the studies' keywords, reveals that "Ontology," as a tool of the Semantic Web, is closely linked to the semantic and syntactic aspects of language in information retrieval.
Moreover, of the 54 studies, the largest group was experimental (19), followed by applied (15) and analytical (9). A further 6 studies combined applied and analytical methods. Content analysis and comparative analysis each accounted for 2 studies, while case studies were the least frequent, with only 1. These studies used tools such as ontologies and search engines, and techniques including natural language processing, annotation, tagging, and indexing.
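As a quick arithmetic check on the distribution of research methods reported above, the counts can be tallied and converted to percentage shares. The short Python sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Number of studies per research method, as reported in the text.
method_counts = {
    "experimental": 19,
    "applied": 15,
    "analytical": 9,
    "applied and analytical": 6,
    "content analysis": 2,
    "comparative analysis": 2,
    "case study": 1,
}

# The categories should account for all reviewed studies.
total = sum(method_counts.values())

# Percentage share of each method, rounded to one decimal place.
shares = {method: round(100 * n / total, 1) for method, n in method_counts.items()}

for method, share in shares.items():
    print(f"{method}: {method_counts[method]} studies ({share}%)")
```

The counts sum to 54, which matches the number of reviewed studies. Experimental studies form the largest single group at roughly 35% of the total, which is less than half of all studies.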
Exploring syntax and semantics in relation to information retrieval across different languages is expected to make a significant contribution to the development of future research literature in this field. This is because users’ native language plays a central role in forming search terms for information retrieval, based on subjective meanings, context, and content. Taking this point into account can effectively enhance information retrieval systems.

Conclusion

Although many studies have addressed various aspects of syntax and semantics in information retrieval, more research is needed to investigate syntax and semantics in information organization. It is also important to delve into and analyze their theoretical aspects in information retrieval, especially through interdisciplinary studies.
Moreover, the interconnectedness of various areas of study demonstrates the close relationship between syntax and semantics and linguistic issues in nearly every field that involves organizing, storing, and retrieving information. These areas include the study of syntax and semantics in relation to environmental sensors, indexing, identification and summarization of texts, plagiarism detection, natural language processing, repositories, improvement of user queries and data retrieval in repositories and search engines, metadata enrichment, and image retrieval. The results of this research, including its main themes, identified methods and approaches, and research gaps, can offer valuable insights for future studies.

Akhshik, S. S., Negahdari, K., & Emami, A. (2022). Persian Writing in GANJ: Investigating the Impact of Morphology, Semantics, and Writing Style on Iran's Treasure of Scientific and Technical Information. Sciences and Techniques of Information Management, 8(1), 193-220. DOI: 10.22091/stim.2021.6418.1505 [In Persian]
Aveyard, H. (2007). Doing a Literature Review in Health and Social Care. Translated by Pouria Sarami Forooshani & Fardin Alipour Gravand (2011). Tehran: Jameshenasan. [In Persian]
Abutorabi Gudarzi, H. (2012). Persian-English Cross-Language Information Retrieval: survey and improvement feasibility [Masters Thesis, Shahed University, Tehran] [In Persian]
Dorri, R. (2015). Comparison and Evaluation of Semantic Search Engines. Iranian Journal of Information Processing and Management, 30(2), 467-490. DOI: 10.35050/JIPM010.2015.044 [In Persian]
Rezai-Sharifabadi, S., Khosravi, A., Haji Zainalabedini, M. (2007). The Feasibility of Subject Authority Control of Persian Medical Databases available on the Web. Educational and Psychological Studies, 8(3), 183-201. [In Persian]
Rafiee, A. R. (2015). Synonym and Antonym Extraction [Masters Thesis, Languages and Linguistics Center, Sharif University of Technology, Tehran] [In Persian]
Sadoughi, F., Valinejadi, A., Hassanzadeh, H. M., Bouraghi, H., & Pasyar, P. (2012). Image Semantic Retrieval Challenges and Thesauri Modern Applications. Iranian Journal of Information Processing and Management, 27(3), 641-666. [In Persian]
Ghayoomi, M. (2019). Identifying Persian Words Senses Automatically by Utilizing the Word Embedding Method. Iranian Journal of Information Processing and Management, 35(1), 25-50. DOI: 10.35050/JIPM010.2019.001 [In Persian]
Kamyar, H. (2011). Novel Semantic Term Weighting Approach in Text Processing Applications [Masters Thesis, Ferdowsi University of Mashhad, Mashhad] [In Persian]
Karimi, E., Babaee, M., Hosseini Beheshti, M. (2018). Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc). Human Information Interaction, 5 (3), 1-14. [In Persian]
Mohammadian Jadval Ghadam, F. (2012). Provide a method for automatic text Summarization Farsi [Masters Thesis, University of Isfahan, Isfahan] [In Persian]
Zolfaghar Kondori, Z., & Mosavi Miangah, T. (2015). Lexical Disambiguation of Polysemeous Adjectives in MT: A Corpus Based Study. Iranian Journal of Information Processing and Management, 30(3), 719-735. DOI: 10.35050/JIPM010.2015.030 [In Persian]
Nassiri, T., & Khandan, M. (2016). Bertrand Russell’s Mathematical Logic and Its Implications in Information Organization and Subject Headings. Librarianship and Information Organization Studies, 27(1), 7-24. [In Persian]
Nasiri, T., Riahinia, N., Neshat, N., Shaghaghi, M., Rasoli Poor, R. (2021). Carnapean Modal Semantics and its Implications for Ontology-Based Information Retrieval. Human Information Interaction, 8(3), 1-8. URL: http://hii.khu.ac.ir/article-1-3014-fa.html [In Persian]
Nojavan Aghdaragh, B. (2013). Similar sentences detection system for Multi - document summarization [Masters Thesis, University College of Nabi Akram] [In Persian]
Norouzi, Y. (2019). Context and Meaning in Information Retrieval: Emphasis on Mother Tongue. Iranian Journal of Information Processing and Management, 35(1), 1-24. DOI: 10.35050/JIPM010.2019.063 [In Persian]
Vickery, A., & Vickery, B. C. (1989). Information Science in Theory and Practice. Translated by Abdulhossein Farajpahlou (1380). Mashhad: Ferdowsi University of Mashhad. [In Persian]
Hooshyar, M. (2016). The comparison of powers of different kinds of text contexts in sense disambiguation of English homographs [Masters Thesis, Shiraz University, Shiraz] [In Persian]