Carleton University
Technical Report TR-97-09
May 1997

TR-97-09: Designing Syntactic Pattern Classifiers Using Vector Quantization and Parametric String Editing

B. John Oommen & R.K.S. Loke

Abstract

We consider a fundamental problem in Syntactic Pattern Recognition (PR) in which we are required to recognize a string from its noisy version. We assume that the system has a dictionary which is a collection of all the ideal representations of the objects in question. When a noisy sample has to be processed, the system compares it with every element in the dictionary based on a nearest-neighbor philosophy. This is typically achieved using three standard edit operations: substitution, insertion, and deletion. To accomplish this, one usually assigns a distance for the elementary symbol operations, d(., .), and the inter-pattern distance, D(., .), is computed as a function of these symbol edit distances. In this paper we consider the assignment of the inter-symbol distances in terms of the novel parametric distances recently introduced by Bunke et al. [4]. We show how the classifier can be trained to obtain the optimal parametric distance using vector quantization in the meta-space, and report classification results after such a training process. In all our experiments, the training was typically achieved in very few iterations. The subsequent classification accuracy we obtained using this single-parameter scheme was 96.13%. The power of the scheme is evident when this figure is compared with 96.67%, the accuracy of the scheme that uses the complete array of inter-symbol distances derived from a knowledge of all the confusion probabilities.
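The nearest-neighbor scheme described above can be illustrated with a minimal Python sketch. It is not the report's implementation. It assumes that the single parameter, here called r, is the cost of substituting one symbol for a different one, and that insertions and deletions have unit cost; the abstract does not define the parametric distance, so this reading is an assumption. The function names edit_distance and classify, and the sample dictionary, are hypothetical. The vector-quantization training step is not shown.

```python
def edit_distance(x, y, r=1.0, indel=1.0):
    """Inter-pattern distance D(x, y) via dynamic programming.

    Assumption (not stated in the abstract): substituting one symbol
    for a different one costs r, and each insertion or deletion
    costs `indel`.
    """
    m, n = len(x), len(y)
    # prev[j] holds D(x[:i-1], y[:j]); start with the empty prefix of x.
    prev = [j * indel for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * indel] + [0.0] * n
        for j in range(1, n + 1):
            sub = 0.0 if x[i - 1] == y[j - 1] else r
            curr[j] = min(
                prev[j - 1] + sub,    # substitute (or match)
                prev[j] + indel,      # delete x[i-1]
                curr[j - 1] + indel,  # insert y[j-1]
            )
        prev = curr
    return prev[n]


def classify(noisy, dictionary, r=1.0):
    """Return the dictionary entry nearest to the noisy string."""
    return min(dictionary, key=lambda word: edit_distance(noisy, word, r))


if __name__ == "__main__":
    words = ["recognition", "recursion", "cognition"]  # hypothetical dictionary
    print(classify("recogntion", words, r=1.0))
```

Under these assumptions, changing r changes how substitutions trade off against insertion/deletion pairs: once r reaches twice the insertion/deletion cost, a substitution is never cheaper than a deletion followed by an insertion, so larger values have no further effect.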
