Carleton University
Technical Report TR-203
February 1992

Numeric Similarity and Dissimilarity Measures Between Two Trees

B. J. Oommen, K. Zhang, W. Lee

Abstract

Quantifying the measure of dissimilarity between two trees is a problem of intrinsic importance in the study of algorithms and data structures. But it is also of paramount importance in the fields of structural and syntactic pattern recognition. In this paper we define and formulate an abstract measure of comparison, n(T1, T2), between two trees T1 and T2. This measure is presented in terms of a set of elementary inter-symbol measures c.o(.,.) and two abstract operators <:B and®. By appropriately choosing the concrete values for these two operators and for c.o(.,.), this measure can be used to define the edit distance between two trees and the Size of their Largest Common Sub-Tree (SLCsT). Additionally, a special case of this measure can also be used to quantify Prob(T2IT1), the probability of receiving T2 given that T1 was transmitted across a channel causing independent substitution and deletion errors. Finally, the same abstract measure can be used to quantify the a posteriori probability of T1 being the transmitted tree given that T2 is the received tree containing independent substitution, insertion and deletion errors. Apart from the definition and formulation of the abstract measure n(T1, T2), we have derived the recursive properties of the measure and presented an iterative dynamic programming scheme to compute it. The analysis of the time and space complexities of the algorithm are also included. If for a tree T, Span(T) is defined as the Min { Depth(T), Leaves(T)}, and if the operations EB and ® are assumed to take unit time, the time and space complexities of this algorithm are :

Time: O(IT1l * IT2I * Span(T1) * Span(T2))
Space : O(IT1I * IT2l).

TR-203.pdf