• Data Classification Using Various Learning Algorithms

  • CHAPTER ONE
    • INTRODUCTION

      1.1 Background of the Study

      Data volumes and variety are increasing at an alarming rate, making any attempt to glean useful information from these large data sets very tedious. Extracting or mining useful information and hidden patterns from the data is becoming more and more important, but it can be very challenging at the same time [1]. A lot of research done in domains like Biology, Astronomy, Engineering, Consumer Transactions and Agriculture deals with extensive sets of observations daily. Traditional statistical techniques encounter challenges in analyzing these datasets because of their large sizes. The biggest challenge is the number of variables (dimensions) associated with each observation. However, not all dimensions are required to understand the phenomenon under investigation in high-dimensional datasets; this means that reducing the dimension of the dataset can improve the accuracy and efficiency of the analysis [2]. In other words, it is of great help if we can map a set of n points in d-dimensional space into a p-dimensional space, where p << d, so that the inherent properties of that set of points, such as their inter-point distances and their labels, do not suffer great distortion. This process is known as dimensionality reduction [3].
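      To make the idea concrete, the short NumPy sketch below (an illustration added here, not taken from the cited works) maps n points from a d-dimensional space into a p-dimensional space with a Gaussian random projection and compares a few inter-point distances before and after the mapping; the names n, d and p follow the notation above, and the specific values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 100, 1000, 50          # n points, original dimension d, reduced dimension p (p << d)

X = rng.normal(size=(n, d))      # n points in d-dimensional space

# Gaussian random projection: scaling by 1/sqrt(p) keeps expected distances unchanged
R = rng.normal(size=(d, p)) / np.sqrt(p)
Y = X @ R                        # the same n points mapped into p-dimensional space

# Compare a few inter-point distances before and after the mapping
for i, j in [(0, 1), (2, 3), (4, 5)]:
    before = np.linalg.norm(X[i] - X[j])
    after = np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): d-dim distance {before:.2f}, p-dim distance {after:.2f}")
```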

      A lot of methods exist for reducing the dimensionality of data, and they fall into two categories. In the first category, each attribute in the reduced dataset is a linear combination of the attributes of the original dataset; in the second category, the set of attributes in the reduced dataset is a subset of the set of attributes in the original dataset [4]. Techniques belonging to the first category include Random Projection (RP), Singular Value Decomposition (SVD), and Principal Component Analysis (PCA), while techniques in the second category include, but are not limited to, the Combined Approach (CA), Direct Approach (DA), Variance Approach (Var), New Top-Down Approach (NTDn), New Bottom-Up Approach (NBUp), and the modified versions of the New Top-Down and New Bottom-Up Approaches [5].
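      The contrast between the two categories can be illustrated with scikit-learn (assuming it is available; the snippet stands in for, rather than reproduces, the specific approaches cited above): PCA produces reduced attributes that are linear combinations of the originals, while a simple variance threshold keeps a subset of the original attributes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))           # 200 observations with 20 attributes

# Category 1: each reduced attribute is a linear combination of the originals (e.g. PCA)
X_pca = PCA(n_components=5).fit_transform(X)

# Category 2: the reduced attributes are a subset of the originals
# (a plain variance threshold stands in here for the subset-selection approaches)
X_subset = VarianceThreshold(threshold=0.9).fit_transform(X)

print(X_pca.shape, X_subset.shape)
```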

      Dimensionality reduction provides a compact representation of the original high-dimensional data, meaning the reduced data requires no further processing and retains only the vital information, so it can be used with the many machine learning algorithms that perform poorly on high-dimensional data [6]. The calculation of inter-point distances is essential for many machine learning tasks, and it has been proved that as the dimensionality increases, "the distance of a sample point to the nearest point becomes very similar to the distance of the sample point to the most distant point", thereby deteriorating the performance of machine learning algorithms [7]. Therefore, dimensionality reduction is an invaluable preprocessing step before the application of many machine learning algorithms.
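      The quoted distance-concentration effect can be observed with a small experiment; the sketch below (illustrative only, with arbitrary sample sizes) computes the ratio of the nearest to the farthest distance from a single sample point as the dimensionality grows, and the ratio approaches 1.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000, 10000):
    X = rng.uniform(size=(500, d))                  # 500 points in the d-dimensional unit cube
    dists = np.linalg.norm(X[1:] - X[0], axis=1)    # distances from the first point to all others
    ratio = dists.min() / dists.max()
    print(f"d={d:>5}: nearest/farthest distance ratio = {ratio:.3f}")
```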

      Machine learning is a scientific field in which computer systems automatically and intelligently learn from experience and improve their performance with it [8], [9]. Machine learning algorithms are of two main types: supervised learning algorithms and unsupervised learning algorithms. These algorithms have been used to solve a lot of complex real-world problems [10], [11]. In unsupervised learning, the set of observations is categorized into groups (clusters) based on the similarity between them; this categorization is otherwise known as clustering [8]. Many clustering algorithms exist, among which k-means clustering is the most famous for a large number of observations [12].
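      A minimal k-means sketch, assuming scikit-learn is available and using a synthetic unlabeled dataset invented purely for illustration, shows how observations are grouped by their similarity alone:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups of observations with no labels attached
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])

# k-means groups the observations purely by distance to the cluster centres
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)    # approximate centres of the two clusters
print(kmeans.labels_[:5])         # cluster assignment of the first few observations
```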

      Unlike clustering, classification is a supervised learning method in which the corresponding label for any valid input is predicted based on a number of training examples referred to as the "training set" [8], [12]. Classification algorithms can further be categorized into eager and lazy learners, and this investigation considers one from each category. Eager learning algorithms attempt to construct a general rule, or generalization, during the training phase, which can then be used to classify unseen instances [13]. Examples of eager learners include decision trees, the support vector machine, and the perceptron.
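      The eager-learning behaviour can be illustrated with a decision tree (a hedged sketch using scikit-learn and synthetic data, not the experiments of this thesis): the general rule is constructed once during training and then reused for every unseen instance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A small synthetic two-class training set (purely illustrative)
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=0)

# Eager learning: the general rule (here a decision tree) is built during the training phase
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Unseen instances are then classified with the already-built rule, with no further training
X_new = np.random.default_rng(1).normal(size=(5, 4))
print(tree.predict(X_new))
```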

      The perceptron, an eager learner, is one of the earliest and simplest of all classification algorithms. Invented by Rosenblatt [14], it is basically used to classify each point of a data set into either a positive or a negative label (1 or -1, good or bad, hot or cold, man or woman, etc.) [15]. It is interesting to note that, in its basic form, it is still as valid as when it was first published [16].
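      A minimal sketch of the basic perceptron update rule, assuming labels in {-1, +1} and a tiny linearly separable data set invented purely for illustration, is shown below; points are re-examined for a fixed number of epochs and the weights are adjusted only when a point is misclassified.

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Basic perceptron: y must contain labels -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only when the current weights misclassify the point
            if yi * (np.dot(w, xi) + b) <= 0:
                w += yi * xi
                b += yi
    return w, b

# Tiny linearly separable example: positive points lie above the line x1 + x2 = 0
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))    # should reproduce the labels 1, 1, -1, -1
```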

      On the other hand, a lazy learner delays any generalization or model construction until it is presented with an unseen instance to be classified [17]. Because no processing is done until a new instance arrives, a lazy learner requires a lot of memory to store the whole set of training instances, and it must process them each time it is presented with a new unseen instance. An example of a lazy learner is the k-nearest neighbor classifier [18]. In this algorithm, the label of any given instance is predicted as the label most common among its k nearest neighbors, where k is a user-defined positive integer, normally with a small value [15].
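      A minimal k-nearest neighbor sketch (with illustrative values only) makes the lazy behaviour visible: the whole training set is kept in memory, and all of the work happens only when a new instance must be labeled.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new as the label most common among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # distances to every stored training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# The whole training set is simply stored; no model is built ahead of time
X_train = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0], [5.5, 4.5]])
y_train = np.array([-1, -1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))   # expected: -1
```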
