CHAPTER ONE: INTRODUCTION
1.1 Background of the Study
An Information Retrieval System is a system that is capable of the storage, retrieval and maintenance of information. The general objective of an Information Retrieval System is to minimize the overhead of a user locating needed information. Overhead can be expressed as the time a user spends in all of the steps leading to reading an item containing the needed information. The two major measures commonly associated with information systems are precision, the fraction of retrieved items that are relevant, and recall, the fraction of relevant items that are retrieved. Information Retrieval (IR) is a large and growing field within Natural Language Processing (Magnus, 2006).
In file systems, a cluster, or allocation unit as it was formerly called, is the smallest logical amount of disk space that can be allocated to hold a file or directory. In data analysis, the term has a different sense: cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense or another, to each other than to those in other groups. It is a main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval and bioinformatics (Linus, 2014). Cluster analysis itself is not one specific algorithm but the general task to be solved.
Cluster analysis is a technique that assigns items to automatically created groups based on a calculation of the degree of association between items and groups. In the information retrieval (IR) field, cluster analysis has been used to create groups of documents with the goal of improving the efficiency and effectiveness of retrieval, or to determine the structure of the literature of a field. The terms in a document collection can also be clustered to show their relationships. The two main types of cluster analysis methods are the nonhierarchical methods, which divide a data set of N items into M clusters, and the hierarchical methods, which produce a nested data set in which pairs of items or clusters are successively linked. The nonhierarchical methods, such as the single-pass and reallocation methods, are heuristic in nature and require less computation than the hierarchical methods. Clustered files are often suggested as a way to cut down search time in similarity-based systems (Caroline and Stephen). In such an organization, similar documents are grouped together in clusters, and only the most promising clusters are examined during a search.
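To make the single-pass method concrete, the following is a minimal sketch rather than a definitive implementation. It assumes that documents are represented as raw term counts, that the degree of association is measured by cosine similarity, and that a threshold of 0.3 decides whether a document joins an existing cluster. The function names, the threshold and the sample documents are illustrative assumptions and are not taken from any cited source.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.3):
    """Read each document once, in order. Assign it to the most similar
    existing cluster, or start a new cluster if none is similar enough."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Summed term counts stand in for the centroid; cosine
            # similarity ignores the overall scale of the vector.
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [c["members"] for c in clusters]


# Hypothetical sample documents, for illustration only.
docs = [
    "vehicle registration number plate",
    "vehicle registration owner record",
    "driving license renewal form",
    "driving license test form",
]
groups = single_pass(docs)
print(groups)
```

Because each document is compared only with the current cluster centroids, the method needs a single pass over the collection. The price is that the result depends on the order in which the documents are read and on the chosen threshold.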
The cluster hypothesis states the fundamental assumption we make when using clustering in information retrieval.
Cluster hypothesis. "Documents in the same cluster behave similarly with respect to relevance to information needs." The hypothesis states that if there is a document from a cluster that is relevant to a search request, then it is likely that other documents from the same cluster are also relevant (Linus, 2014). This is because clustering puts together documents that share many terms. In short, we posit that similar documents behave similarly with respect to relevance.
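The way a retrieval system can exploit this hypothesis may be sketched as follows. The sketch assumes that the clusters have already been formed, that each cluster is summarized by the sum of its members' term counts, and that only the single most promising cluster is examined. The names cluster_search and top_clusters, and the sample data, are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_search(query, docs, clusters, top_clusters=1):
    """Rank clusters by centroid similarity to the query, then rank only
    the documents inside the best-scoring clusters."""
    q = vectorize(query)
    scored = []
    for members in clusters:
        centroid = Counter()
        for i in members:
            centroid.update(vectorize(docs[i]))
        scored.append((cosine(q, centroid), members))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [i for _, members in scored[:top_clusters] for i in members]
    return sorted(candidates,
                  key=lambda i: cosine(q, vectorize(docs[i])),
                  reverse=True)


# Hypothetical documents and a precomputed clustering, for illustration only.
docs = [
    "vehicle registration number plate",
    "vehicle registration owner record",
    "driving license renewal form",
    "driving license test form",
]
clusters = [[0, 1], [2, 3]]
result = cluster_search("license renewal", docs, clusters)
print(result)
```

In this example the query matches document 2 most closely, but document 3 is also returned because it belongs to the same cluster. Documents 0 and 1 are never compared with the query, which is where the saving in search time comes from.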
Tree clustering is a form of clustering algorithm that joins objects together successively into larger clusters, using some measure of similarity or distance. A typical result of this kind of clustering is the hierarchical tree. Hierarchical clustering is based on the core idea that objects are more related to nearby objects than to objects farther away. As such, these algorithms connect objects to form clusters based on their distances.
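The successive joining described above can be illustrated with a small agglomerative sketch. It assumes one-dimensional points, absolute difference as the distance, and the single-link rule, in which the distance between two clusters is the distance between their closest members. Other linkage rules are equally possible, and the function name and sample points are hypothetical.

```python
def single_link_merges(points):
    """Repeatedly merge the two closest clusters until one remains.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]
    merges = []

    def dist(a, b):
        # Single link: distance between the closest pair of members.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Hypothetical sample points, for illustration only.
points = [1.0, 2.0, 6.0, 7.5, 20.0]
merges = single_link_merges(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

Read from first to last, the list of merges is the hierarchical tree: nearby points are joined at small distances, and the outlying point joins the rest only at the final, largest distance.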
1.2 Statement of the Problem
Although Benue State is still a developing state, the number of vehicle owners in the state continues to increase, which means more work for the Federal Road Safety Corps (FRSC) in Benue State. Clear statistics on vehicle owners, both in a particular local government and in Benue State in general, are needed to combat the menace of fake vehicle registration, false driving licenses, vehicle theft and related problems, and to ensure that road users keep road rules and regulations through proper registration and monitoring. Achieving this requires an information system that provides advanced storage, processing and easy retrieval of information. This study is therefore necessary, as it will improve the vehicle registration process and ensure that the FRSC has quick and easy access to information on registered vehicles and their owners anywhere and at any time in the state. The system will, to a large extent, reduce the challenges and restrictions associated with the manual process of registering vehicles.