Carleton University
Technical Report TR-15
January 1983

Similarity Measures for Sets of Strings

R.L. Kashyap & B.J. Oommen

Abstract

In the companion paper [3], we have presented a common basis for many of the similarity and dissimilarity measures involving a pair of strings. In this paper, we extend the results to capture various numerical and nonnumerical measures involving more than two strings. A measure D(X,Y,...,Z) has been defined involving the set of strings {X,Y,...,Z} in terms of two abstract operators ⊕ and ⊗ and a function δ(·,·) which has as many arguments as there are strings in the set {X,Y,...,Z}. The quantity D(X,Y,...,Z) represents various numerical and nonnumerical quantities involving {X,Y,...,Z}, such as the Length of their Longest Common Subsequence (LLCS), the Length of their Shortest Common Supersequence (LSCS), the set of their common subsequences, the set of their common supersequences, and the set of their shuffles. The computational properties of D(X,Y,...,Z) have also been discussed.
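
As an illustration of one quantity named in the abstract, the LLCS of a set of strings can be computed with a textbook dynamic program over tuples of prefix lengths. The sketch below is not the report's operator-based formulation of D(X,Y,...,Z); the function name `llcs` and the memoized recursion are assumptions made here for illustration only.

```python
from functools import lru_cache
from typing import Sequence


def llcs(strings: Sequence[str]) -> int:
    """Length of a longest common subsequence of all the given strings.

    Hypothetical helper: a standard memoized recursion, not the method
    of the report. Intended for short strings only, since the number of
    states is the product of the string lengths.
    """
    strings = tuple(strings)
    if not strings:
        return 0

    @lru_cache(maxsize=None)
    def best(idx: tuple) -> int:
        # idx[k] is the length of the prefix of strings[k] being considered.
        if any(i == 0 for i in idx):
            return 0
        last = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last) == 1:
            # Every prefix ends in the same symbol: match it in all strings.
            return 1 + best(tuple(i - 1 for i in idx))
        # Otherwise drop the last symbol of one prefix at a time.
        return max(
            best(idx[:k] + (idx[k] - 1,) + idx[k + 1:])
            for k in range(len(idx))
        )

    return best(tuple(len(s) for s in strings))


if __name__ == "__main__":
    print(llcs(["ABCBDAB", "BDCABA", "BCABA"]))
```

The function takes any number of strings, so the two-string case is simply the instance with a pair of arguments.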

Download

TR-15.pdf