Development Of An Information Retrieval System Using Tree-structured Clustering
[FRSC Benue State]
2.3 Hierarchical Agglomerative Clustering
Hierarchical Agglomerative Clustering (HAC) is a similarity-based, bottom-up clustering technique in which, at the beginning, every term forms a cluster of its own. The algorithm then repeatedly merges the two most similar clusters still available, until it arrives at a single universal cluster that contains all the terms.
In our experiments, we use three different strategies to calculate the similarity between clusters: complete-, average- and single-linkage. All three strategies are based on the same similarity measure between terms (the cosine measure in our experiments), but they measure the similarity between two non-trivial clusters in different ways.
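The underlying cosine measure between two term vectors can be sketched as follows (a minimal illustration; the function name and the use of plain Python lists as term vectors are assumptions for exposition, not part of the system described here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two term vectors u and v,
    i.e. the dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Identical vectors score 1.0, orthogonal vectors score 0.0, so higher values mean more similar terms.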
2.3.1 Single Link: The single link method joins, at each step, the most similar pair of objects that are not yet in the same cluster. It has some attractive theoretical properties and can be implemented relatively efficiently, so it has been widely used. However, it has a tendency toward formation of long straggly clusters, or chaining, which makes it suitable for delineating ellipsoidal clusters but unsuitable for isolating spherical or poorly separated clusters. Single-link clustering defines the distance between two clusters A and B as the minimum distance between their members.
2.3.2 Group Average Link: As the name implies, the group average link method uses the average of the pairwise links within a cluster to determine similarity. All objects contribute to intercluster similarity, resulting in a structure intermediate between the loosely bound single link clusters and the tightly bound complete link clusters. The group average method has ranked well in evaluative studies of clustering methods.
2.3.3 Complete Link: The complete link method uses the least similar pair between each of two clusters to determine the intercluster similarity; it is called complete link because all entities in a cluster are linked to one another within some minimum similarity. Small, tightly bound clusters are characteristic of this method. Here the distance between clusters is the maximum distance between their members.
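The three linkage strategies above can be sketched as functions over two clusters A and B and a pairwise similarity measure `sim` (such as the cosine measure). This is an illustrative sketch; the function names are assumptions, and note that with similarities (rather than distances) single link takes the maximum pairwise value and complete link the minimum:

```python
def single_link(A, B, sim):
    """Similarity of the MOST similar pair across the two clusters."""
    return max(sim(a, b) for a in A for b in B)

def complete_link(A, B, sim):
    """Similarity of the LEAST similar pair across the two clusters."""
    return min(sim(a, b) for a in A for b in B)

def average_link(A, B, sim):
    """Average similarity over all cross-cluster pairs."""
    return sum(sim(a, b) for a in A for b in B) / (len(A) * len(B))
```

By construction, for any two clusters the complete-link score is never higher than the average-link score, which is never higher than the single-link score; this is why complete link yields small, tightly bound clusters while single link tends to chain.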
Two other HACMs are sometimes used: the centroid and median methods. In the centroid method, each cluster as it is formed is represented by the coordinates of a group centroid, and at each stage in the clustering the pair of clusters with the most similar centroids is merged. The median method is similar, except that the centroids of the two merging clusters are not weighted proportionally to the sizes of the clusters. A disadvantage of these two methods is that a newly formed cluster may be more similar to some existing point than either of its constituent clusters was, resulting in reversals or inversions in the cluster hierarchy.
2.3.4 General Algorithm for the HACM
All of the hierarchical agglomerative clustering methods can be described by a general algorithm:
1. Identify the two closest points and combine them in a cluster.
2. Identify and combine the next two closest points (treating existing clusters as points).
3. If more than one cluster remains, return to step 2.
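The general algorithm above can be sketched as a naive agglomerative loop. This is a minimal illustration, not the system's actual implementation: the function names are assumptions, distances stand in for (inverse) similarities, and a single-link criterion is supplied as one example of the pluggable linkage step:

```python
def single_linkage(A, B, dist):
    """Single link as a distance criterion: minimum pairwise distance."""
    return min(dist(a, b) for a in A for b in B)

def hac(points, dist, linkage):
    """Generic HACM: start with singleton clusters, repeatedly merge the
    closest pair under the given linkage, and record the merge history."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters (clusters act as points).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Step 3: replace the two clusters by their merger and repeat.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

The recorded merge history is exactly the information a dendrogram displays: which clusters were joined, and at what distance. Swapping in a different `linkage` function yields the complete-link or group-average variants.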
Individual HACMs differ in the way in which the most similar pair is defined, and in the means used to represent a cluster. Lance and Williams (1966) proposed a general combinatorial formula, the Lance-Williams dissimilarity update formula, for calculating dissimilarities between new clusters and existing points, based on the dissimilarities prior to formation of the new cluster.
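The Lance-Williams update formula can be stated as follows (standard form; the text above does not give the formula itself, so this reconstruction is offered for reference). When clusters $i$ and $j$ are merged, the dissimilarity between the new cluster $(i \cup j)$ and any other cluster $k$ is:

```latex
d(k,\, i \cup j) \;=\; \alpha_i\, d(k, i) \;+\; \alpha_j\, d(k, j)
\;+\; \beta\, d(i, j) \;+\; \gamma\, \lvert d(k, i) - d(k, j) \rvert
```

Each HACM corresponds to a choice of coefficients; for example, single link uses $\alpha_i = \alpha_j = \tfrac{1}{2}$, $\beta = 0$, $\gamma = -\tfrac{1}{2}$ (recovering the minimum), and complete link uses the same $\alpha$ values with $\gamma = +\tfrac{1}{2}$ (recovering the maximum).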
The dendrogram is a useful representation when considering retrieval from a clustered set of documents, since it indicates the paths that the retrieval process may follow.