[OCLUG-devel] Matching Strings

James Colannino james at colannino.org
Thu Aug 4 13:57:16 PDT 2005


Hey everyone.  I'm writing a C program that will report either duplicate 
strings in a file or strings whose first n characters are the same.  For 
example, let's say that I have the following file named text.txt:

albert
albert
albany
bert
biscuit
true
trouble
lone

If I were to do the following:

match text.txt

It would search for complete matches and should output the following:

albert
albert

If I were to do the following:

match -n 3 text.txt

It should output the following:

albert
albert
albany

But instead, it outputs this:

albert
albert
albany
albert
albany

I know why it does this.  I look for matches in the following way:

read string
search rest of file below it for match
read next string
search rest of file below it for match

Since albert is there twice, it reads the first instance of albert and 
outputs all the matches below it (albert, albany), then reads the second 
albert and outputs all the matches below it again (albany), so part of 
the same group gets reported a second time.
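
To make that concrete, here is a minimal sketch of the loop described above. It is not the code from match.c. It assumes the lines are already held in an in-memory array and collects the would-be output into a caller-supplied array instead of printing it. The names lines_match and naive_report are made up for this illustration, and n == 0 is an assumed convention for "compare the whole string".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if a and b agree on their first n characters.
 * Assumption for this sketch: n == 0 means "compare whole strings". */
int lines_match(const char *a, const char *b, size_t n)
{
    return n == 0 ? strcmp(a, b) == 0 : strncmp(a, b, n) == 0;
}

/* Naive search: for every line, scan the lines below it and record the
 * line itself (once) followed by each match found.  Returns the number
 * of entries written to out. */
size_t naive_report(const char *const *lines, size_t count, size_t n,
                    const char **out, size_t out_cap)
{
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        int recorded_self = 0;

        for (size_t j = i + 1; j < count; j++) {
            if (!lines_match(lines[i], lines[j], n))
                continue;
            if (!recorded_self) {
                assert(used < out_cap);   /* caller must size out */
                out[used++] = lines[i];
                recorded_self = 1;
            }
            assert(used < out_cap);
            out[used++] = lines[j];
        }
    }
    return used;
}
```

Called on the eight lines of text.txt with n = 3, this records albert, albert, albany for the first albert and then albert, albany again for the second one, which is the repeated output shown above.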

My question is, how can I write my program in such a way that these only 
show up once?  In essence, I'd like it to report:

albert
albert
albany

Instead of:

albert
albert
albany
albert
albany

Here's a link to the source (it's probably not very good, but I tried to 
structure and comment it in such a way that it's at least easy to read 
-- I hope...)

http://james.colannino.org/match.c

Any help would be greatly appreciated :)

James

