• A System For Health Document Classification Using Machine Learning

  • CHAPTER THREE -- [Total Page(s) 3]

    • CHAPTER THREE SYSTEM ANALYSIS AND DESIGN
      3.0    INTRODUCTION
      This chapter describes the modules and components used to design the system and how they work together. It also shows how users interact with the system.
      3.1    ANALYSIS OF THE EXISTING SYSTEM
      The existing system is manual: health workers classify health documents by stacking physical files in file cabinets. This makes it difficult to retrieve files when documents of a particular category are required.
      3.2    ANALYSIS OF THE PROPOSED SYSTEM
      System analysis and design deals with planning the development of information systems by understanding and specifying in detail what a system should do and how its components should be implemented and work together. System analysts solve business problems by analyzing the requirements of information systems and designing such systems using analysis and design techniques. The proposed system replaces manual filing with a machine-learning classifier that sorts health documents into disease categories.
      3.2.1    REQUIREMENTS OF THE SYSTEM
      To serve its intended purpose, the system must meet the following requirements:
      1.    It should be able to accept text documents with the extensions .txt, .doc and .pdf as input.
      2.    It should be able to search documents for specified text.
      3.    It should be able to summarize documents.
      4.    It should be able to categorize and summarize text.
      5.    It should be able to tokenize text and carry out stemming and lemmatization.
      6.    It should be able to identify sentences.
      7.    It should be able to perform coreference resolution, word sense disambiguation and sentence boundary disambiguation.
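      OpenNLP provides several of these text-processing steps through its own components, such as SentenceDetectorME and TokenizerME. The sketch below illustrates three of the requirements using the JDK's java.text.BreakIterator instead of OpenNLP: requirement 1 (accepted file extensions), the tokenization part of requirement 5, and requirement 6 (sentence identification). The class and method names are hypothetical. BreakIterator is a simpler rule-based stand-in, not the project's implementation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class: a rule-based stand-in for OpenNLP's
// sentence detector and tokenizer, for illustration only.
class TextPreprocessingSketch {

    // Requirement 1: extensions accepted as input.
    private static final Set<String> ACCEPTED = Set.of(".txt", ".doc", ".pdf");

    // True if the file name ends with an accepted extension (case-insensitive).
    static boolean isAcceptedFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : ACCEPTED) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Requirement 6: split text into sentences with the JDK's sentence iterator.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Requirement 5 (tokenization only): keep word segments that contain a letter or digit,
    // which drops whitespace and punctuation segments.
    static List<String> tokens(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String w = text.substring(start, end);
            if (w.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Malaria is typically spread by mosquitoes. Symptoms include fever.";
        System.out.println(isAcceptedFile("report.PDF")); // true
        System.out.println(isAcceptedFile("scan.jpg"));   // false
        System.out.println(sentences(text));
        System.out.println(tokens(text));
    }
}
```

      Running main prints true and false for the two file names, then the two sentences, then the word tokens without punctuation. Stemming, lemmatization and the disambiguation tasks in the list are not covered by this sketch.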
      3.3    TRAINING A MODEL
      In machine learning, a learning algorithm is trained on example data to produce a model. The model captures patterns in the training data, so that when it is presented with new data similar to that training data, it produces similar results. In this project, we use the OpenNLP API for document classification. OpenNLP is a set of Java tools from the Apache Software Foundation for natural language processing (NLP), a field that applies machine learning to human language and is the domain of this project.
      To carry out the classification, we first train a model. Our model is built to identify three diseases: malaria, hypertension and diarrhea. We chose these three because a brief Google search indicated that they are among the most prevalent diseases in Nigeria. To construct a model in OpenNLP, you need a file of training data. The training file consists of a series of lines, one sample per line. The first word of each line is the category, followed by the sample text, separated by whitespace. We created a training file called "en-diseases.train" from numerous lines of text about malaria, hypertension and diarrhea, sourced online, mainly from Wikipedia. The training data in en-diseases.train is passed to the train method of the DocumentCategorizerME class. This method trains on the data and produces a model, which is saved to a file with a .bin extension.
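      The sketch below makes the training-file format concrete. It is a minimal JDK-only Java class that splits each line into its category (the first word) and its text. In OpenNLP itself, DocumentSampleStream does this parsing when the data is passed to DocumentCategorizerME.train, so this class only illustrates the format. The class name, the Sample record and the choice to skip blank lines are assumptions. The two Malaria lines are adapted from the project's training file (Appendix C); the Hypertension line is a hypothetical example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "category text..." training format.
class TrainingFileSketch {

    // One training sample: the category (first word) and the remaining text.
    record Sample(String category, String text) {}

    // Splits each line on the first run of whitespace.
    // Blank lines and lines with no text after the category are skipped (assumption).
    static List<Sample> parse(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.strip().split("\\s+", 2);
            if (parts.length < 2) {
                continue;
            }
            samples.add(new Sample(parts[0], parts[1]));
        }
        return samples;
    }

    // Counts samples per category, e.g. to check how balanced the three diseases are.
    static Map<String, Integer> countByCategory(List<Sample> samples) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Sample s : samples) {
            counts.merge(s.category(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Malaria is typically spread by mosquitoes",
            "Hypertension is also known as high blood pressure",
            "",
            "Malaria symptoms can be classified into two categories");
        List<Sample> samples = parse(lines);
        System.out.println(samples.size());           // 3
        System.out.println(countByCategory(samples)); // {Malaria=2, Hypertension=1}
    }
}
```

      In this format the first word is consumed as the label. A line such as "Malaria is typically spread by mosquitoes" therefore trains the category Malaria on the text "is typically spread by mosquitoes".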
      3.4    CLASSIFYING THE DOCUMENT
      After training, the resulting model file is used to classify the health documents. The categorize method of the DocumentCategorizerME class scores each document against the trained categories. The document is then assigned to the most probable one: Malaria, Diarrhea or Hypertension.
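      Classification amounts to scoring a document against every category and keeping the best one. In OpenNLP, categorize returns the per-category probabilities and getBestCategory picks the highest. The sketch below shows the same idea with a swapped-in technique: a multinomial naive Bayes classifier with add-one smoothing, written in plain Java. It is not the maximum-entropy model that OpenNLP's document categorizer trains by default. The class name and the three training lines are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical naive Bayes stand-in for the categorizer, for illustration only.
class NaiveBayesCategorizerSketch {

    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lower-cases and splits on non-letters; a deliberately crude tokenizer.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    // Trains on lines in the "category text..." format (first word = category).
    void train(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.strip().split("\\s+", 2);
            if (parts.length < 2) {
                continue;
            }
            String category = parts[0];
            totalDocs++;
            docsPerCategory.merge(category, 1, Integer::sum);
            Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
            for (String w : tokenize(parts[1])) {
                if (w.isEmpty()) {
                    continue;
                }
                counts.merge(w, 1, Integer::sum);
                totalWords.merge(category, 1, Integer::sum);
                vocabulary.add(w);
            }
        }
    }

    // Returns the category with the highest log-probability (prior plus smoothed word likelihoods).
    String bestCategory(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            double score = Math.log(docsPerCategory.get(category) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String w : tokenize(text)) {
                if (w.isEmpty()) {
                    continue;
                }
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesCategorizerSketch nb = new NaiveBayesCategorizerSketch();
        nb.train(List.of(
            "Malaria caused by plasmodium parasites spread by the bite of an infected mosquito",
            "Hypertension persistently high blood pressure in the arteries",
            "Diarrhea the passing of loose or watery stools"));
        System.out.println(nb.bestCategory("The patient was bitten by a mosquito")); // Malaria
        System.out.println(nb.bestCategory("Blood pressure readings were high"));    // Hypertension
    }
}
```

      With only these three toy training lines, the choice rests on a few shared words such as "mosquito" and "blood pressure". A real model needs the full training file to generalize.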
      3.5    USE CASE DIAGRAMS
      A use case diagram shows, at a high level, how the users of a system interact with its use cases. It displays each actor and that actor's use cases; the actors are the users of the system.
      The users or actors of our document classification system include: Health Worker

