Record Details

Effect of information gain on document classification using k-nearest neighbor

Register: Jurnal Ilmiah Teknologi Sistem Informasi

 
 
Field Value
 
Title Effect of information gain on document classification using k-nearest neighbor
 
Creator Perwira, Rifki Indra
Yuwono, Bambang
Siswoyo, Risya Ines Putri
Liantoni, Febri
Himawan, Hidayatulah
 
Subject classification; feature selection; information gain; k-Nearest Neighbor; TF-IDF document
 
Description State universities maintain libraries to support students' education and research; these libraries hold books, journals, and final assignments, and most of the documents are research results. An intelligent document classification system is needed to make it easier for library visitors in higher education to find documents. The main reasons for building such a system are complaints about imbalanced text data and categories, irrelevant document titles, and ambiguous words encountered when searching for documents. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure how close test data are to training data. In tests conducted with 276 training documents, the best result with information gain feature selection was an accuracy of 87.5%, obtained with 80% training data, 20% test data, and k=5. The highest accuracy overall, 92.9%, was achieved without information gain feature selection, using 90% training data and 10% test data with k=5, 7, and 9. This paper concludes that the system is more accurate without information gain feature selection than with it, because every word in a document title is considered to play an essential role in forming the classification.
 
Publisher Information Systems - Universitas Pesantren Tinggi Darul Ulum
 
Contributor
 
Date 2022-01-05
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
 
Format application/pdf
 
Identifier https://journal.unipdu.ac.id/index.php/register/article/view/2397
10.26594/register.v8i1.2397
 
Source Register: Jurnal Ilmiah Teknologi Sistem Informasi; Vol 8, No 1 (2022): January; 50-57
2502-3357
2503-0477
10.26594/register.v8i1
 
Language eng
 
Relation https://journal.unipdu.ac.id/index.php/register/article/view/2397/pdf
 
Rights Copyright (c) 2022 The Authors
http://creativecommons.org/licenses/by-nc-sa/4.0
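
The Description above names TF-IDF weighting, information gain feature selection, cosine similarity, and k-NN voting. The following is a minimal sketch of how those pieces might fit together, not the authors' implementation. The toy titles, the labels "AI" and "Networking", the smoothed IDF formula, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency per term (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query, train_vecs, train_labels, k=5):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values()) if n else 0.0

def information_gain(term, docs, labels):
    """Reduction in class entropy from knowing whether term is present."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(with_t) / n * entropy(with_t)
            - len(without_t) / n * entropy(without_t))

# Hypothetical document titles and study-interest labels (not from the paper).
train = [
    ("image segmentation using convolutional neural network", "AI"),
    ("neural network for handwriting recognition", "AI"),
    ("fuzzy logic decision support system", "AI"),
    ("network routing protocol performance analysis", "Networking"),
    ("wireless sensor network energy routing", "Networking"),
    ("firewall configuration for campus network security", "Networking"),
]
docs = [title.lower().split() for title, _ in train]
labels = [label for _, label in train]

idf = fit_idf(docs)
train_vecs = [tfidf(d, idf) for d in docs]

query = tfidf("routing protocol for wireless network".split(), idf)
predicted = knn_predict(query, train_vecs, labels, k=3)

# Rank terms by information gain; a feature selector would keep the top ones.
ranked_terms = sorted(idf, key=lambda t: information_gain(t, docs, labels),
                      reverse=True)
```

In this sketch, feature selection would amount to keeping only a top slice of `ranked_terms` before building the TF-IDF vectors. The Description reports that skipping that step gave higher accuracy on the authors' data, which is plausible for short titles where each word carries signal.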