CHAPTER ONE
1.0 INTRODUCTION
This chapter introduces the project work, A System for Health Document Classification Using Machine Learning. It covers the background of the study, the statement of the problem, the aim and objectives, the methodology used to design the system, the scope of the study, its significance, and the definition of terms, and it concludes with the organization of the project work.
1.1 BACKGROUND OF THE STUDY
Today, most hospitals, medical laboratories and other health facilities make use of some kind of information system, such as a hospital management system or a pharmacy management system. Among other functions, these systems are mainly used to collect patient records, which they store in digital format. Large volumes of patient data are recorded on a daily basis, forming a large data set popularly referred to as “Big Data”.
Every day, physicians and other health workers are required to work with this “Big Data” in order to provide solutions. Their everyday tasks include information retrieval and data mining. Retrieving information from big data can be very laborious and time-consuming. This has given rise to the study of text or document classification as an aid to retrieving information from big data. Today, text classification is a necessity because of the very large number of text documents that have to be dealt with daily.
Document classification is the task of grouping documents into categories based upon their content. It is a significant learning problem at the core of many information management and retrieval tasks, and it plays an essential role in applications that deal with organizing, classifying, searching and concisely representing large amounts of information. Document classification is a longstanding problem in information retrieval which has been well studied (Russell, 2018).
Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. Machine learning approaches to classification suggest the automatic construction of classifiers using induction over pre-classified sample documents. In this project work we will employ machine learning in classifying health documents.
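To illustrate what constructing a classifier by induction over pre-classified sample documents can look like, the following is a minimal sketch in Java. It assumes a multinomial Naive Bayes model with add-one smoothing, chosen here only because it is short. It is not necessarily the algorithm adopted in this project, and the class name NaiveBayesSketch, its methods and the category labels in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a multinomial Naive Bayes text classifier with
// add-one (Laplace) smoothing. All names here are hypothetical.
class NaiveBayesSketch {
    // Number of training documents seen per category.
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    // Per-category word frequencies.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // Total number of word occurrences per category.
    private final Map<String, Integer> totalWords = new HashMap<>();
    // Every distinct word seen during training.
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    // Learn from one pre-classified sample document.
    void train(String category, String text) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the category with the highest log-probability,
    // or null if no training documents have been seen.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            // Log prior: fraction of training documents in this category.
            double score = Math.log(
                    (double) docsPerCategory.get(category) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                // Smoothed log likelihood of the word given the category.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

In use, train would be called once for each labelled sample document, for example with a made-up label such as "cardiology" and the text of a record, after which classify returns the most probable label for an unseen document. Log-probabilities are summed rather than multiplying raw probabilities so that long documents do not cause numeric underflow.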
1.2 STATEMENT OF THE PROBLEM
With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the incoming data, or even to classify it into categories. In the health sector as well, numerous patient records are collected every day and used for analysis. How, then, can these health documents be classified or categorized efficiently to support easy retrieval?
1.3 AIM AND OBJECTIVES OF THE STUDY
The aim of this project is to develop A System for Health Document Classification Using Machine Learning.
Other objectives include:
1. Study the various machine learning classification algorithms.
2. Implement a classification algorithm in Java.