• Data Classification Using Various Learning Algorithms

  • CHAPTER ONE
    • 1.2 Problem Statement

      In data classification, the label (class) of any valid input is predicted on the basis of a number of training examples referred to as the "training set". This is achieved using a classifier model, which a learning algorithm constructs from a training set made up of past examples having the same set of attributes as the unseen example [8], [12]. Before training begins, the label of each example in the training set is known [19].
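The idea of predicting the label of an unseen input from a stored, labelled training set can be sketched with a simple nearest-neighbour rule. This is a minimal illustration in Python on assumed toy data, not the implementation used in this thesis:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of one unseen example from a labelled training set."""
    # Euclidean distance from the unseen example to every training example.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Labels of the k nearest training examples.
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote among the nearest labels decides the predicted class.
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical training set: each example has two attributes and a known label.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # -> 1
```

Note that the unseen example must have the same set of attributes (here, two) as the training examples, exactly as described above.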

      To build a classifier model, an eager learner attempts to construct a general rule during the training phase, which it subsequently uses to classify unseen instances, while a lazy learner delays this process until it is presented with an unseen instance [13]. The main disadvantage of eager learning is the long time the learner takes to construct the classification model; once the model is built, however, an eager learner classifies unseen instances very quickly. For a lazy learner, the disadvantages are the amount of memory it consumes and the time it takes during classification [17]. This makes dimensionality reduction a crucial preprocessing step: it facilitates the classification and compression of high-dimensional data, thereby conserving memory and providing a compact representation of the original high-dimensional data [5].
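The dimensionality reduction idea can be sketched with principal component analysis, one of many possible reduction techniques; the data and the number of retained components below are assumed purely for illustration:

```python
import numpy as np

# Toy "high-dimensional" data: 6 examples with 4 attributes each.
X = np.array([
    [2.5, 2.4, 0.5, 0.7],
    [0.5, 0.7, 2.2, 2.9],
    [2.2, 2.9, 1.9, 2.2],
    [1.9, 2.2, 3.1, 3.0],
    [3.1, 3.0, 2.3, 2.7],
    [2.3, 2.7, 2.0, 1.6],
])

# Centre the data, then project it onto the top principal components.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)            # 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # two largest components
X_reduced = Xc @ top2

print(X_reduced.shape)  # (6, 2): same examples, fewer attributes
```

The reduced data keeps the directions of greatest variance while halving the number of attributes, which is the memory saving and compact representation described above.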

      Research has been conducted on how dimensionality reduction techniques affect the performance of classifiers [20]–[22]. However, very little attention has been given to the extent to which these reduction techniques preserve classification. This thesis therefore attempts to advance the research by investigating the extent to which dimensionality reduction preserves the classification of the weather, student, and ionosphere datasets obtained from the UCI Machine Learning Repository, in order to fill this gap in the literature and provide a basis for further research in machine learning.


      1.3 Aim and Objectives

      The aim of this research is to investigate the extent to which dimensionality reduction techniques preserve classification.

       The objectives of the research are as follows:

      1. Implementation of fifteen dimensionality reduction techniques and their use in reducing the weather and student datasets, as well as the ionosphere dataset obtained from the UCI Machine Learning Repository [23].

      2. Implementation of the perceptron classification algorithm and its use in classifying the data points of a two-class dataset. The perceptron is also applied to the datasets obtained by reducing this two-class dataset with the techniques above, and comparisons are made to determine the extent to which the reduction methods preserve the perceptron's classification of the original dataset.

      3. Implementation of the k-Nearest Neighbors classification algorithm and comparison of how well the dimensionality reduction techniques preserve the classification of a dataset by the k-Nearest Neighbors and perceptron classification algorithms.

      4. Use of confusion matrices to show the extent to which each dimensionality reduction method preserves the classification of the original datasets, and comparison of the methods with one another.
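The perceptron of objective 2 can be sketched as follows. This is a minimal illustration of the classic perceptron learning rule on a hypothetical, linearly separable two-class dataset, not the actual implementation developed in this thesis:

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Classic perceptron learning rule; labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # example is misclassified
                w += lr * yi * xi        # move the boundary toward it
                b += lr * yi
    return w, b

def perceptron_predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical linearly separable two-class data.
X = np.array([[2.0, 1.0], [3.0, 2.0], [1.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = perceptron_train(X, y)
print(perceptron_predict(X, w, b))  # matches y on this separable data
```

Being an eager learner, the perceptron spends its time in `perceptron_train`; once `w` and `b` are fixed, classifying an unseen instance is a single dot product.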


      1.4 Scope and Limitations

      This project is limited to showing the extent to which each of the dimensionality reduction methods implemented in this thesis preserves the classification of the original datasets by the perceptron and k-Nearest Neighbors classification algorithms. Accuracy, obtained from confusion matrices, is used as the performance measure for the extent of classification preservation.
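As a worked example of obtaining accuracy from a confusion matrix, assuming hypothetical counts for a two-class problem:

```python
import numpy as np

# Hypothetical two-class confusion matrix:
# rows = actual class, columns = predicted class.
#                pred: yes  no
cm = np.array([[50,  5],    # actual: yes  (TP=50, FN=5)
               [10, 35]])   # actual: no   (FP=10, TN=35)

# Accuracy = correct predictions / all predictions = trace / total.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # (50 + 35) / 100 = 0.85
```

Comparing this accuracy on the original dataset against the accuracy on each reduced dataset quantifies how well a reduction method preserves classification.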


      1.5 Thesis Structure

      This thesis consists of five chapters. Chapter 1 introduces dimensionality reduction and discusses its importance and applications to machine learning tasks. After presenting the problem to be addressed, the aim of the research is stated and the objectives are outlined.

      Chapter 2 presents a review of literature related to dimensionality reduction and machine learning in general. Existing literature on single-layer neural networks is also reviewed.

      Chapter 3 describes the methodology used in this thesis. It discusses the methods in detail and explains how they are applied in achieving the objectives of the thesis. The results obtained from the methodology are presented and discussed fully in Chapter 4.

      Chapter 5, which is the final chapter, provides a summary of the work and the results obtained in this thesis, and concludes the research.
