American Journal of Computer Science and Engineering Survey (Open Access)

Abstract

Automatic Thesaurus Construction for Afaan Oromo

Teshome Debushe

Information retrieval is the task of finding resources on a network that satisfy a user's information needs. Afaan Oromo belongs to the Cushitic sub-family of the Afro-Asiatic language family. It is spoken mainly in the Oromia region of Ethiopia and in neighbouring countries, and it has the largest share of speakers of any Cushitic language. A thesaurus is a knowledge resource used in many natural language processing systems. It is especially useful for handling semantics, in Afaan Oromo as in other natural languages. One of the major problems for Afaan Oromo is the vocabulary problem: the terms used to describe documents differ from the terms searchers use to describe their information needs. One way to handle this problem is to build a thesaurus for Afaan Oromo. In this paper, we apply the thesaurus-from-document-collections approach to build an automatic Afaan Oromo thesaurus. To do so, the researcher reviewed related work and the thesaurus-construction tools adopted for other languages. The researcher then developed a semantic vector model to collect semantically similar terms from the prepared corpus. Words that the model identifies as semantically related are registered in an ontology repository. The study used a small corpus of Afaan Oromo texts available on the web, drawn from the following sources:

- news and media outlets such as Voice of America and Bariisaa;
- office reports, notices and other office documents;
- online educational resources.

Finally, we evaluate the system against human judgment using continuous bag-of-words and skip-gram models with window sizes of three and ten, respectively. No gold standard exists for evaluating the performance of such a system.
For this reason, the researcher measured the correlation between human judgment and the continuous bag-of-words (CBOW) and skip-gram models. The correlation with human judgment was 75.82% for CBOW and 67.31% for skip-gram. According to our experiment, CBOW agrees more closely with human judgment and is therefore the more appropriate model for Afaan Oromo thesaurus construction.
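The vector-based pipeline summarized above can be sketched in a few lines. This is a minimal, illustrative sketch and not the author's implementation. Thesaurus entries are extracted as cosine-similarity nearest neighbours, and agreement with human judgment is measured by rank correlation.

The following are all assumptions:

- The word vectors in `VECTORS` are made-up toy values. They stand in for embeddings that a CBOW model (window 3) or a skip-gram model (window 10) would learn from the corpus, for example with gensim's `Word2Vec` using `sg=0, window=3` or `sg=1, window=10`.
- The helper names `build_thesaurus`, `ranks`, `pearson` and `spearman` are hypothetical.
- The `top_k` and `threshold` parameters are illustrative.
- The human ratings are invented.
- Spearman rank correlation is used as the agreement measure.

```python
import math

# Toy word vectors (hypothetical values, NOT from the study). In practice these
# would be learned by a CBOW or skip-gram model trained on the Afaan Oromo corpus.
VECTORS = {
    "barnoota": [0.90, 0.10, 0.05],   # education
    "barataa": [0.85, 0.20, 0.10],    # student
    "barsiisaa": [0.80, 0.15, 0.20],  # teacher
    "bishaan": [0.05, 0.90, 0.30],    # water
    "laga": [0.10, 0.85, 0.35],       # river
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_thesaurus(vectors, top_k=2, threshold=0.8):
    """Map each word to up to top_k nearest neighbours with similarity >= threshold."""
    thesaurus = {}
    for word, vec in vectors.items():
        scored = [(other, cosine(vec, other_vec))
                  for other, other_vec in vectors.items() if other != word]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        thesaurus[word] = [other for other, sim in scored[:top_k] if sim >= threshold]
    return thesaurus


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    thesaurus = build_thesaurus(VECTORS)
    print(thesaurus["bishaan"])  # → ['laga']

    # Hypothetical human similarity ratings (0-10 scale) for a few word pairs.
    pairs = [("barnoota", "barataa"), ("barataa", "barsiisaa"),
             ("bishaan", "laga"), ("barnoota", "bishaan")]
    human = [8.5, 7.0, 8.0, 1.0]
    model = [cosine(VECTORS[a], VECTORS[b]) for a, b in pairs]
    print(round(spearman(model, human), 2))  # → 0.8
```

With these toy vectors, `barnoota` is grouped with `barataa` and `barsiisaa`, while `bishaan` is grouped only with `laga`. The similarity threshold keeps weakly related words out of an entry even when `top_k` leaves room for them. Comparing model scores with human ratings by rank rather than by raw value avoids assuming that cosine similarity and a human rating scale are linearly related.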
