A few days ago I posted a blog entry on simple regular expression replacements in VBScript. Let me show you a more complex example. It helps to have a purpose, even for a demonstration, so my goal is to convert an HTML table to CSV output using regular expressions. We’re going to need the functions I’ve written about before, but I’ll post them again so you don’t have to go looking for them.
Let’s dig in. Here’s the table I want to parse.
I’ll begin by reading in the contents of the HTML file and saving it to a variable.
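For readers without the original listing handy, here is a minimal sketch of that step using the FileSystemObject. The function name ReadTextFile and the file name report.html are placeholders of mine, not names from the original script.

```vbscript
Option Explicit
Const ForReading = 1

' Hypothetical helper: returns the whole file as one string.
Function ReadTextFile(path)
    Dim fso, file
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set file = fso.OpenTextFile(path, ForReading)
    ' ReadAll raises an error on an empty file, so check first
    If file.AtEndOfStream Then
        ReadTextFile = ""
    Else
        ReadTextFile = file.ReadAll
    End If
    file.Close
End Function

Dim checkFso, htmlText
Set checkFso = CreateObject("Scripting.FileSystemObject")
' "report.html" is an assumed file name; substitute your own
If checkFso.FileExists("report.html") Then
    htmlText = ReadTextFile("report.html")
    WScript.Echo "Read " & Len(htmlText) & " characters"
End If
```

Reading the whole file into a single string matters here because the table usually spans many lines, and the later patterns need to see all of it at once.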
The next step is to strip out just the table. Using a regular expression, I find the text that matches everything between and including the table tags.
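A rough sketch of that extraction follows. The pattern and the helper name GetFirstMatch are my own assumptions, not necessarily what the original script used, and the sample HTML is made up for illustration.

```vbscript
Option Explicit

' Hypothetical helper: returns the first match of pattern in text,
' or an empty string when nothing matches.
Function GetFirstMatch(text, pattern)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = True   ' match <TABLE> as well as <table>
    re.Global = False      ' only the first table is wanted
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        GetFirstMatch = matches(0).Value
    Else
        GetFirstMatch = ""
    End If
End Function

Dim htmlText, tableText
' Made-up sample input
htmlText = "<html><body><p>intro</p><table><tr><th>Name</th></tr></table></body></html>"

' [\s\S] is used instead of . because . does not match line breaks
' in the VBScript regex engine; *? keeps the match non-greedy.
tableText = GetFirstMatch(htmlText, "<table\b[\s\S]*?</table>")
WScript.Echo tableText
```

The non-greedy quantifier stops at the first closing table tag, so nested tables would be cut short. For a simple report-style page that is usually acceptable.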
All that’s left at this point is to get rid of unnecessary tags and convert the TH and/or TD tags into comma delimiters. If the pattern is matched in the string, then for every match I call my RegexReplace function.
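One way this conversion could look is sketched below. The RegexReplace signature shown here (text, pattern, replacement) is an assumption, since the original function is not reproduced in this entry, and the two patterns are mine. The sketch turns the boundary between adjacent cells into a comma and each closing TR tag into a line break.

```vbscript
Option Explicit

' Assumed signature: replaces every match of pattern in text.
Function RegexReplace(text, pattern, replacement)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = True
    re.Global = True       ' replace all matches, not just the first
    RegexReplace = re.Replace(text, replacement)
End Function

Dim tableText
' Made-up sample input
tableText = "<table><tr><th>Name</th><th>Size</th></tr>" & _
            "<tr><td>a.txt</td><td>10</td></tr></table>"

' A closing cell tag followed by an opening cell tag becomes a comma
tableText = RegexReplace(tableText, "</t[dh]>\s*<t[dh]\b[^>]*>", ",")
' The end of each row becomes a line break
tableText = RegexReplace(tableText, "</tr>", vbCrLf)
WScript.Echo tableText
```

Replacing only the boundary between cells avoids a trailing comma at the end of each row. Note that this sketch does not quote values that themselves contain commas.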
It’s possible there might be some tags still in my tableText variable, so I’ll process it one more time, looking for any HTML tag and replacing it with an empty string (“”).
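A minimal sketch of that cleanup pass, assuming a simple "anything between angle brackets" pattern. The helper name StripTags is hypothetical, and the pattern is a common idiom rather than necessarily the one from the original script.

```vbscript
Option Explicit

' Hypothetical helper: removes anything that looks like an HTML tag.
Function StripTags(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "<[^>]+>"   ' a "<", one or more non-">" characters, then ">"
    re.Global = True
    StripTags = re.Replace(text, "")
End Function

' Made-up sample input
WScript.Echo StripTags("<tr><td>a.txt,10</td>")
```

This pattern is fine for tidy, generated HTML. It can misfire if an attribute value contains a literal ">" character.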
Now the tricky part. If I look at tableText there will be blank lines left behind by the tags I replaced. Plus, if I want to save the output to a text file, I need some way to parse this variable. My solution is to turn it into an array and enumerate it, only displaying lines with a length greater than 0.
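Here is a sketch of that idea. The helper name RemoveBlankLines is mine, and it assumes the lines are separated by vbCrLf or a bare line feed. It returns the kept lines as one string so the result can be echoed or written to a file.

```vbscript
Option Explicit

' Hypothetical helper: splits text into lines and keeps only
' those that still have content after trimming spaces.
Function RemoveBlankLines(text)
    Dim lines, item, result
    ' Normalize line endings, then split into an array
    lines = Split(Replace(text, vbCrLf, vbLf), vbLf)
    result = ""
    For Each item In lines
        ' Trim removes leading and trailing spaces only
        If Len(Trim(item)) > 0 Then
            If Len(result) > 0 Then result = result & vbCrLf
            result = result & Trim(item)
        End If
    Next
    RemoveBlankLines = result
End Function

Dim tableText
' Made-up sample input with blank lines left over from removed tags
tableText = vbCrLf & "Name,Size" & vbCrLf & vbCrLf & "a.txt,10" & vbCrLf
WScript.Echo RemoveBlankLines(tableText)
```

To save the result instead of displaying it, the returned string could be passed to the Write method of a file created with the FileSystemObject's CreateTextFile.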
When I run my script I get output like this:
Now before you think I’m some Regex guru (I’m not, by any means), I didn’t come up with any of the more complex regular expression patterns myself. Instead I went to my favorite site for this sort of thing, RegexLib.com. Fortunately many people have already done the hard work of developing regular expression patterns for all sorts of things. A little searching and copy/paste and I’m in business. Because regular expression syntax is largely the same from one engine to the next, you can use these expressions, sometimes with minor tweaks, in VBScript, PowerShell, PHP, Perl or probably anything you happen to be working in.
Download a text file with code from this entry here.
As always, if you need help with regular expression scripts or any other scripting problem please join me in the forums at ScriptingAnswers.com. Oh…don’t forget there is an entire chapter on using the REGEX object in VBScript in WSH and VBScript Core: TFM.