Searching Word

One of the last tasks I needed to accomplish to get Managing Active Directory with Windows PowerShell: TFM ready for publication was to assemble all the script samples. They’ll be available for download from SAPIENPress when the book is published.  I knew that most of the scripts were in my primary scripts directory, but probably not all.  I usually copied scripts from my test environment to my scripts directory but I knew some were bound to be missing.  I didn’t want to manually open 16 chapters, search for script names, check if I had the script and then move it to a new folder. I don’t have time for that. But I do have time to figure out how to do it in PowerShell.

What I had going for me was that every script sample referenced in a chapter was formatted with a specific style. All I had to do was search for the style and retrieve the text. That text would be the name of my script, which I could then look for and move if found. I could also keep a log of scripts not found so I could go back to my test machine and retrieve them. Here’s the PowerShell script I developed.

   1: $word=New-Object -COM "Word.Application"
   2:  
   3: $errorlog="c:\missing.csv"
   4: Set-Content $errorlog "Chapter,Script"
   5: Get-ChildItem c:\test\*.doc | foreach {
   6:     $file=$_.fullname
   7:     Write-Host $file
   8:     $doc=$word.Documents.Open($file) 
   9:     
  10:     $style=$word.ActiveDocument.Styles | 
  11:     where {$_.namelocal -eq "code Title"}
  12:     
  13:     $word.Selection.Find.Style = $style
  14:     
  15:     while ($word.Selection.Find.Execute()) {
  16:         $text=($word.selection.sentences | select Text).text
  17:         $script=(Join-Path "C:\scripts\posh" $text).Trim()
  18:  
  19:         if ((Get-Item $script -ea "silentlycontinue").exists) {
  20:             Write-Host "verified $script"
  21:             Move-Item $script -destination "c:\scripts\posh\ad"
  22:         }
  23:         else {
  24:             $msg="{0},{1}" -f $file,$script
  25:             Add-Content $errorlog $msg
  26:         }
  27:  
  28:     }
  29:  
  30:     $doc.close()
  31:  
  32: }
  33:  
  34: $word.quit()

I use the New-Object cmdlet with the -COM parameter (short for -ComObject) to create an instance of Microsoft Word. On lines 3 and 4 I create a log file for recording missing scripts. I decided to make it a CSV file so I could use Import-CSV to parse it if necessary. Although probably unnecessary, I copied all the Word docs to my Test directory and used Get-ChildItem to enumerate them (line 5).
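Because the log starts with a header row, Import-Csv turns each line into an object with Chapter and Script properties. Here is a minimal sketch of reading such a log back. It uses a temporary file, and the chapter and script names are made up for illustration.

```powershell
# Build a small sample log in the temp folder; all entries are hypothetical.
$log = Join-Path ([System.IO.Path]::GetTempPath()) "missing-sample.csv"
Set-Content $log "Chapter,Script"
Add-Content $log "c:\test\chapter01.doc,C:\scripts\posh\Get-Sample.ps1"
Add-Content $log "c:\test\chapter01.doc,C:\scripts\posh\Set-Sample.ps1"
Add-Content $log "c:\test\chapter02.doc,C:\scripts\posh\New-Sample.ps1"

# Each row becomes an object with Chapter and Script properties.
$missing = Import-Csv $log

# Count the missing scripts per chapter.
$summary = $missing | Group-Object Chapter | Select-Object Name, Count
$summary

# Clean up the sample file.
Remove-Item $log
```

Grouping by chapter is just one option; piping $missing to Where-Object or Sort-Object works the same way.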

For every document, I set the variable $file to its full name and path (line 6). I needed the full path so that I could open the file in line 8.

Lines 10 and 11 define a variable for the style I want to search for, "code title". In line 13 I define my search criteria. The Execute() method carries out the search and returns True if a match is made, so I used a While loop in line 15 to process all the found items. When a search is successful, the selection jumps to the found item just like it does when you manually search for something in Word. Because the script file name is the only text on the line, I can use an expression like line 16 to retrieve the text. To build a file name I use Join-Path on line 17 to combine my script directory and the script name pulled from the Word doc. The Trim() call strips the paragraph mark that comes along with the selected text.
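The path-building step on line 17 can be tried without Word. This is a small sketch that assumes the text Word hands back is the script name followed by a carriage return (the paragraph mark); the file name is hypothetical.

```powershell
# Simulated selection text: a hypothetical script name plus the
# carriage return Word uses as a paragraph mark (an assumption here).
$text = "Get-Sample.ps1`r"

# Same approach as line 17: join first, then trim the result.
$raw = Join-Path "C:\scripts\posh" $text
$script = $raw.Trim()

# The difference in length is the trimmed paragraph mark.
$raw.Length - $script.Length
$script
```

Trimming $text before calling Join-Path would give the same result; the order used here simply mirrors the script.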

At line 19 I need to see if the script exists using Get-Item. Notice the -ea parameter. By default, if the file doesn’t exist, Get-Item writes an error. It is a non-terminating error, so the script would keep running, but the output would be cluttered with red text. By setting the -erroraction parameter to silentlycontinue, I suppress the error message. If the script doesn’t exist then the If expression will not be true and I can log information to my error file in lines 24 and 25. If the file exists, then it is moved to a new directory (line 21). After all matches have been checked, the Word doc is closed (line 30) and the next file is processed. At the end of the script, I quit Word.
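The existence check can be sketched on its own. The example below uses a made-up file name that should not exist, and also shows Test-Path, a built-in cmdlet that is a possible alternative to the Get-Item approach (it is not what the script above uses).

```powershell
# A hypothetical path that should not exist on any machine.
$script = Join-Path ([System.IO.Path]::GetTempPath()) "no-such-script-12345.ps1"

# Get-Item returns nothing for a missing file; -ErrorAction hides the error.
$item = Get-Item $script -ErrorAction SilentlyContinue
$found = $null -ne $item

# Test-Path answers the same question with a Boolean and never writes
# an error for a missing file.
$exists = Test-Path $script

"Get-Item found it: $found"
"Test-Path found it: $exists"
```

Either form works inside the If statement; Test-Path just removes the need for -ErrorAction.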

I’m sure it would have taken me 30 minutes or more to do all of this manually. With the script it took a few seconds. Sure, there was a little time spent developing it, but since I use the same template for all my SAPIEN Press books I can reuse this script for future projects. It was time well spent, and I learned a few new things about the Microsoft Word COM object.

If you’re interested, you can download the script here