Windows DevCenter    
 Published on Windows DevCenter (http://www.windowsdevcenter.com/)
 See this if you're having trouble printing code examples


Building Simple Lists Using Strings in VBA

by Andrew Savikas, author of Word Hacks
11/16/2004

Visual Basic for Applications (VBA), the language used for scripting Microsoft Word, isn't really known for its string-processing capabilities. You won't even find regular expression support in VBA (though you can bolt it on by referencing VBScript's RegExp object, something I discuss in my book, Word Hacks). But sometimes some string hacking is a quick and convenient way to solve a problem, and the string functions VBA does provide are often up to the task.

Using Strings for Simple Lists

I often write macros that look for paragraphs styled with one of several different styles, such as any paragraph that's styled as Heading 1, List Bullet, or Body Text Indent 3. A traditional approach to this would be to build a simple array of those three styles, and then test each paragraph's style for membership in the array, as the following code shows. Note that an underscore ( _ ) is the line-continuation character in VBA, indicating an optional line break.


Sub LookForSomeParas()
Dim vParasToFind() As Variant
Dim k As Integer
Dim bIsInList As Boolean
Dim para As Paragraph
  
vParasToFind = Array("Heading 1", _
			"List Bullet", _
			"Body Text Indent 3")
  
For Each para In ActiveDocument.Paragraphs
	bIsInList = False
	For k = 0 To UBound(vParasToFind)
		If para.Style = vParasToFind(k) Then
			bIsInList = True
			Exit For
		End If
	Next k
	If bIsInList Then
		' Do stuff to paragraph
	End If
Next para
End Sub	

A different method, and one that'll have you typing less than half the lines of code, uses a string to store the list of styles, as this snippet shows:


Dim sParasToFind as String
sParasToFind = _
	"/Heading 1/List Bullet/Body Text Indent 3/"

To use this kind of list, a macro would visit each paragraph, then check the list to see whether the paragraph's style is in it, using the built-in VBA InStr function. If the string doesn't appear in this string list, InStr returns 0.

To be sure you don't get a false match when one style's name is actually part of another's (accidentally flagging Body Text when you're looking for Body Text Indent 3, for instance), a delimiter is included to mark the beginning and end of each entry in the string list.

Here's the complete macro:


Sub LookForSomeParasUsingAStringList()
Dim sParasToFind As String
Dim para As Paragraph
 
sParasToFind = _
	"/Heading 1/List Bullet/Body Text Indent 3/" 
For Each para In ActiveDocument.Paragraphs
	If InStr(sParasToFind, _
			"/" & para.Style & "/") <> 0 Then
		' Do stuff to para here
	End If
Next para 
End Sub

While this may not be the best choice in every situation, it's a handy hack for quick-and-dirty list-membership tests, and as a bonus, it's more than twice as fast as the previous method.

I've used a slash ( / ) as the delimiter in this example, because it seems like a logical choice that most people can quickly recognize as a separator. But in the case of testing names of Word styles, it's actually not the best choice. Why? Well, / is a perfectly valid character for use in a style name, which could cause incorrect results for the membership test. When using this technique, it's best to choose a character that won't appear anywhere in any of the entries in the list, or any of the items you'll be checking for membership in that list. A better choice for this particular example would be a semicolon, which Word doesn't allow as a style name. Making that change results in the following:


Sub LookForSomeParasUsingString()
Dim sParasToFind As String
Dim para As Paragraph
 
sParasToFind = _
	";Heading 1;List Bullet;Body Text Indent 3;"
 
For Each para In ActiveDocument.Paragraphs
	If InStr(sParasToFind, _
			";" & para.Style & ";") <> 0 Then
		' Do stuff to para here
	End If
Next para
 
End Sub

Related Reading

Word Hacks
Tips & Tools for Taming Your Text
By Andrew Savikas

These string lists are useful in other situations, and I'll come back to them in the last section of this article. The next section discusses how to make up for the string-processing shortcomings of Word on the Macintosh and in Word 97.

String Processing (or the Lack Thereof) with Word for Macintosh and Word 97

Anyone who's tried to develop VBA macros for use with Word for Macintosh knows that it's a very different landscape than Word for Windows. It can be maddening to work around all the minor (and major) bugs and glitches in VBA on Word for the Macintosh. But at a company like O'Reilly, unlike in a more rigid corporate environment, we need to be able to accommodate authors and editors working on a variety of platforms and a variety of versions: Word 97, 2000, 2002, and 2003 for Windows; Word X for Mac OS X; and for a few users, even Word on Linux, using Crossover Office.

One of the biggest gotchas when working with Word for the Mac (and Word 97 for Windows, for that matter) is working around some conspicuously absent string functions.

With Word 2000, Microsoft included several important string-processing functions with VBA, including two very useful ones common in other scripting languages, Split and Join. Split turns a string into an array, separating the string at a given delimiter, by default a space. Join is Split’s complement, concatenating an array of strings into a single string and separating the smaller strings with a given delimiter, also a space by default.

But now it’s 2004, and Word VBA on a Macintosh still doesn’t include those functions. And of course, anyone still plugging away on Word 97 (I know more than a few) can’t run macros that use those functions either.

In addition to being invaluable tools for general use in writing Word macros, these functions are an important part of one solution to another Mac VBA problem, discussed in the next section, along with those string lists I discussed above.

As an example of how handy that Split function can be, let’s take a look at style aliases in a document. A style alias is an additional name you can give to a style. For example, given the style Heading 1, you could add an alias, h1, so that you can just type h1 into the Styles pull-down menu on the Formatting toolbar to apply the Heading 1 style. In that case, the Heading 1 style would be listed as Heading 1,h1.

Style aliases are added to a style name by putting a comma at the end of the style name, followed by an alias. You can even add multiple aliases, such as Heading 1,HeadA,h1.

It’s often helpful to remove the aliases from all the styles in a document, particularly before exporting the document to another program, such as QuarkXPress or FrameMaker, which may not interpret them correctly. The Split function makes it very easy to do, as the following macro shows:


Sub RemoveAllStyleAliases
Dim sty As Style
For Each sty In ActiveDocument.Styles
	sty.NameLocal = Split(sty.NameLocal, ",")(0)
Next sty
End Sub

The above code uses the Split function to get only the part of the style name before the first comma (which is the first element in the array returned by Split, accessed by its offset, 0), and it’s a handy utility function to include in a distributed template. But if some of those people you’re distributing it to are running Word on a Mac or are still on Word 97, they’ll be greeted with compilation errors when they try to use the macro.

So what’s the solution? Create your own versions of those important string functions. Here’s one version of a Split function written using VBA that will work on a Mac or with Word 97:


Function Split(ByVal strIn As String, _
	Optional ByVal strDelim As String = " ", _
	Optional ByVal lCount As Long = -1) _
		As Variant
Dim vOut() As Variant
Dim strSubString As String
Dim k As Integer
Dim lDelimPos As Long

k = 0
lDelimPos = InStr(strIn, strDelim)

Do While (lDelimPos)
	' Get everything to the left of the delimiter
	strSubString = Left(strIn, lDelimPos - 1)
	' Make the return array one element larger
	ReDim Preserve vOut(k)
	' Add the new element
	vOut(k) = strSubString
	k = k + 1
	If lCount <> -1 And k = lCount Then
	Split = vOut
	Exit Function
	End If
	' Only interested in what's right of delimiter
	strIn = Right(strIn, (Len(strIn) - _
	(lDelimPos + Len(strDelim) - 1)))
	' See if delimiter occurs again
	lDelimPos = InStr(strIn, strDelim)
Loop

' No more delimiters in string. 
' Add what's left as last element
ReDim Preserve vOut(k)
vOut(k) = strIn

Split = vOut
End Function

And here’s a Join function, again written using code that will work with Word on a Mac or with Word 97:


Function Join(ByVal vIn As Variant, _
	Optional ByVal strDelim As String = " ") _
		As String

Dim strOut As String
Dim k As Long
Dim lUpperBound As Long

lUpperBound = UBound(vIn)
For k = LBound(vIn) To (lUpperBound - 1)
	strOut = strOut & vIn(k) & strDelim
Next k

' Don't want to add delimiter after last element
strOut = strOut & vIn(lUpperBound)
Join = strOut
End Function

Fortunately, you don’t really have to write all of your own from scratch. Several web sites offer versions of these and other useful VB6 string functions (such as Replace and InStrRev) that you can use in your own macros. One such site is http://www.freevbcode.com/ShowCode.asp?ID=17. Note that if you want to use the string functions from that particular site with Word on a Mac, you’ll need to remove the Optional bCompare argument, and all the related code, from each one that uses it.

The Split function also comes in handy when working with string lists, as discussed earlier.

These string lists act like pseudo-arrays, without the extra code needed to build up a proper array. The lists still behave a lot like arrays, and can even be accessed by offset. For example, given the list:


sBunnies = "/flopsy/mopsy/cottontail/"

you can get to mopsy by using the Split function, as the following snippet shows:


MsgBox Split(Mid(sBunnies, _
			2, Len(sBunnies) - 2), "/")(1)

The Mid function is needed here to trim off the leading and trailing "/" from sBunnies. Once that's done, you can just use Split to get at the elements of the pseudo-array by offset. And once you've used the Split function, your list actually becomes a "real" array, because that's what Split returns—a variant of type Array. Just remember that the array returned by Split always has an offset of 0.

Simple String Lists, the Split Function, and Lengthy Collection Assignments

Like many distributed custom Word templates, the one we use at O'Reilly includes a significant amount of validation code. These procedures are used to help ensure that authors stick to a certain group of styles, to control which styles are available based on which book series the author is writing for, and to control certain template features, like context-sensitive formatting, which I discuss in my book.

All of this validation means that the template needs to "know" quite a bit about which styles meet certain criteria. That information isn't part of the built-in Word Styles collection, of course, so it's stored in a separate, custom-built class. The class is created dynamically when it's needed, using information stored in a Collection, kind of like the following simplified example:


Sub CollectionDemo()
Dim colStyles As Collection
Dim col As Collection
Set colStyles = New Collection
Set col = New Collection

col.Add Key:="AllowedInTemplate", Item:=True
col.Add Key:="UsesSmartQuotes", Item:=True
colStyles.Add Key:="Heading 1", Item:=col

MsgBox colStyles("Heading 1")("UsesSmartQuotes") 
' Above MsgBox Displays "True"
End Sub

The styles collection is then used to initialize the custom class, which allows other macros in the template to query that class for information on a particular style, such as whether paragraphs using that style should use "smart" (curly) or "straight" quotation marks.

When the custom class needs to contain information on the 100 or so styles used in the template, and each style has a dozen or so of these custom properties, it requires a very lengthy collection assignment statement, on the order of 1,000 lines or more. Not too much fun to type out, but Word VBA for Windows handles it just fine. But try a collection assignment that long on a Mac, and you get a cryptic error message. As best I could deduce, there was a limit to the number of lines of code I could include in a single procedure with VBA on a Mac. Breaking up the single, massive collection assignment into several smaller routines eliminated the error, but it felt like a kludge.

The fix I finally settled on—perhaps still a kludge, but at least a more interesting one—was to use a string list, like the ones described in the first section, "Using Strings for Simple Lists." All of the data about the styles is kept in a single string list, like this:


Dim sList As String
sList = sList & ";;Heading 1;;True;;True;;"
sList = sList & ";;Heading 2;;True;;False;;"
sList = sList & ";;Body Text;;False;;True;;"

Note that when presented like this, our string list begins to strongly resemble a table, which is the goal.

With this method, the data about each style takes up just one line of code, instead of the several needed to add it to the collection directly, like in the CollectionDemo example above. So how do you get the data from a "table" like this into a collection? By parsing it with the Split function, as shown in the following macro:


Sub StringListToLoadCollectionDemo()
Dim sDataTable As String
Dim colStyles As Collection
Dim col As Collection
Dim vDataRows As Variant
Dim v As Variant
Dim vDataFields As Variant
 
Set colStyles = New Collection
 
' Only need one line of code needed per style
sDataTable = sDataTable & _
	";;Heading 1;;True;;True;;"
sDataTable = sDataTable & _
	";;Heading 2;;True;;False;;"
sDataTable = sDataTable & _
	";;Body Text;;False;;True;;"
 
' All the code below stays the same, 
' regardless of the number
' of syles added to the "data table" above.

' Remove the leading and trailing ";;" 
' from the big string, otherwise first 
' and last row of the "data table" will 
' have one extraneous pair of semicolons, 
' either at beginning or end
sDataTable = Mid(sDataTable, 3, _
				Len(sDataTable) - 4)
 
' Parse the "data table"
vDataRows = Split(sDataTable, ";;;;")
For Each v In vDataRows
	 vDataFields = Split(v, ";;")
	 Set col = New Collection
	 col.Add Key:="AllowedInTemplate", _
	 Item:=vDataFields(1)
	 col.Add Key:="UsesSmartQuotes", _
	 Item:=vDataFields(2)
	 colStyles.Add Key:=vDataFields(0), _
	 Item:=col
	 Set col = Nothing
Next v
 
MsgBox colStyles("Heading 1")("UsesSmartQuotes") 
' Above MsgBox Displays "True"
 
End Sub

With this method, adding a new "field" of data about a particular style requires adding only one line of code, within the collection-assignment For Each loop. So as was the case with the string lists discussed in the first section, this method will save you some typing. Note also that, as with the bunny rabbit example in the previous section, you need to trim off the leading and trailing separator.

This certainly isn't anyone's idea of an ideal solution, but it is a quick and dirty way to compress a lengthy collection assignment to the point that Word on a Mac will at least process it. The big irony here is that the key to the fix here, the Split function, isn't included by default in Word VBA on a Mac.

While string processing with VBA probably won't ever be as good as some other (let's face it—most other) scripting languages, the string functions VBA does offer can sometimes provide interesting ways to reduce coding time, improve code performance, and even present solutions to cross-platform development obstacles.

Andrew Savikas is the VP of Digital Initiatives at O'Reilly Media, and is the Program Chair for the Tools of Change for Publishing Conference.


Return to the Windows DevCenter

Copyright © 2009 O'Reilly Media, Inc.