Using VBA To Identify Repeated Words In A Passage Of Text

When you're focused on writing an article it's easy to lose track of words that have been repeated. Typically, when you're writing about key words or phrases there's every chance those words will be repeated several times.

But, with Excel and VBA we can report on a passage of text to find any repetitions.

Turning The Text Into An Array

For an example we'll use the opening two paragraphs of this article.

First, we'll turn the text - which we've saved into a string variable called txt - into an array by using the split function with a space as the delimiter. We'll need to add a leading and trailing space to give the first and last words equality.

myTxt=" " & txt & " "

allWords = Split(myText, " ")

' count

allwordsCount = UBound(allWords) + 1

Remember that in VBA arrays have a starting base of 0, unless you use the option base statement to declare it as 1.

Now we need to loop through each word in the array, and see if it is used more than once throughout the text. We can do that by creating another array but using the search word as the delimiter:

For x = 0 To UBound(allWords)

word = allWords(x)

We can create the new array here and use ubound to calculate how many times the word appears in the text; because of the extra spaces we added it's just a straight count. We've added spaces around the search word to take into account words within words, for example "often" and "of".

ct = UBound(Split(myText, " " & word & " "))

Finally, if the count is greater than 1, we'll write the result to the immediate window. We've used the string repeated to record any repetitions so we only report on a repetition once using the instr function.

If ct > 1 And InStr(repeated, word) = 0 Then
debug.Print word & " " & ct
repeated = repeated & word
End If
Next

Here's the final report:

you're 2
on 3
writing 2
an 5
to 2
of 2
words 3
be 2
repeated 2

Enhancements And Problems With The Code

Some issues to think about with this code might include the following:

Dealing with commas and full stopsOnly wanting to compare words of a certain lengthComparing phrases and combinations of words

Variations of the code could cover phrases or multiple word searches, and length could be similarly restricted. The problem with punctuation such as commas and full stops is that any code would view "hello" and "hello," as two different words, so any commas and full stops might have to be removed with the replace function before running the code.

Summary

This code snippet is an example of using VBA to solve a problem seemingly unrelated to rows and columns. It's another reason to find out more about VBA and the numerous ways it can improve your productivity.

Andy L Gibson is a former Web Site programmer rediscovering his interest in Excel applications for small business. His blog at http://solutions4business.wordpress.com/ holds the VBA code for all of his articles on Excel and is available for FREE download. He is developing an Excel application which will contain working examples of VBA methods and solutions.


View the original article here

  • Digg
  • Del.icio.us
  • StumbleUpon
  • Reddit
  • RSS

0 comments:

Post a Comment