More fun with Microsoft Word and PowerShell

I needed a demonstration of using COM objects in PowerShell for one of my recent online PowerShell classes. I took an old VBScript that used Microsoft Word to get document statistics, such as word and page counts, and converted it to PowerShell. I quickly realized I could flesh out the demo into a larger function that I can actually use. Since I generate a lot of Word docs, being able to get some document statistics is helpful. So I created a function called Get-DocStatistic. Here's the function:

    Function Get-DocStatistic {

        BEGIN {
            # Create the Word COM object once, before any pipelined files arrive
            $word = New-Object -ComObject "Word.Application"
        }

        PROCESS {
            # $_ is the current file (a FileInfo object from dir/Get-ChildItem)
            $file = $_

            $doc = $word.Documents.Open($file.FullName)

            Write-Progress -Activity "Gathering document statistics" `
                -Status "Analyzing $file" `
                -CurrentOperation "Counting words"
            $words = $doc.ComputeStatistics("wdStatisticWords")

            Write-Progress -Activity "Gathering document statistics" `
                -Status "Analyzing $file" `
                -CurrentOperation "Counting lines"
            $lines = $doc.ComputeStatistics("wdStatisticLines")

            Write-Progress -Activity "Gathering document statistics" `
                -Status "Analyzing $file" `
                -CurrentOperation "Counting paragraphs"
            $para = $doc.ComputeStatistics("wdStatisticParagraphs")

            Write-Progress -Activity "Gathering document statistics" `
                -Status "Analyzing $file" `
                -CurrentOperation "Counting pages"
            $pages = $doc.ComputeStatistics("wdStatisticPages")

            # Close the document, discarding any changes
            $doc.Close([ref]$false)

            # Build one output object per document
            $obj = New-Object PSObject
            $obj | Add-Member -MemberType NoteProperty -Name "Document"   -Value $file
            $obj | Add-Member -MemberType NoteProperty -Name "Directory"  -Value $file.DirectoryName
            $obj | Add-Member -MemberType NoteProperty -Name "Filename"   -Value $file.Name
            $obj | Add-Member -MemberType NoteProperty -Name "Size"       -Value $file.Length
            $obj | Add-Member -MemberType NoteProperty -Name "Words"      -Value $words
            $obj | Add-Member -MemberType NoteProperty -Name "Lines"      -Value $lines
            $obj | Add-Member -MemberType NoteProperty -Name "Paragraphs" -Value $para
            $obj | Add-Member -MemberType NoteProperty -Name "Pages"      -Value $pages

            Write-Output $obj
        }

        END {
            # Quit Word so no Winword process is left running in the background
            $word.Quit()
        }
    }

The function uses Begin, Process and End script blocks so you can pipe objects to it. In the Begin script block I create the Word.Application COM object. I only need to do this once, so the Begin script block is perfect: any code there executes once, before any pipelined objects are processed.
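To see that execution order without involving Word, here is a minimal sketch built around a hypothetical function, Show-PipelineOrder, which simply logs each time one of its script blocks runs. Variables created in Begin are still available in Process and End because all three blocks run in the same function scope, which is exactly why $word can be created once and reused for every document.

```powershell
# Hypothetical demo function: records when each script block runs.
function Show-PipelineOrder {
    begin {
        # Runs once, before the first pipelined object arrives
        $items = 0
        $log = [System.Collections.Generic.List[string]]::new()
        $log.Add('begin')
    }
    process {
        # Runs once for each pipelined object; $_ is the current object
        $items++
        $log.Add("process $_")
    }
    end {
        # Runs once, after the last pipelined object has been processed
        $log.Add('end')
        [pscustomobject]@{
            Items = $items
            Log   = $log.ToArray()
        }
    }
}

$result = 'a', 'b', 'c' | Show-PipelineOrder
$result.Log -join ', '
```

Piping three strings produces the log "begin, process a, process b, process c, end": Begin and End run once each, and Process runs once per object.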

In the Process script block I open each file with the Documents.Open() method, which returns an object for the Word document. This is also a COM object, and you can pipe it to Get-Member to discover all of its properties and methods. One of its methods is ComputeStatistics(), which takes a parameter indicating which statistic to compute. In VBScript I would have had to define a constant value for something like wdStatisticWords. In PowerShell all I have to do is use the constant's name, and PowerShell gets the correct constant value.
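If passing the constant name as a string fails on your machine, you can pass the numeric value instead. Whether the name-to-value conversion works may depend on how Word's type information is exposed to PowerShell (for example, whether the Office interop assemblies are installed); treat that as an assumption. The sketch below keeps the WdStatistic values from Word's object model in a hashtable; $wdStatistic is a name made up for this example, and the lines that need Word are commented out so the rest runs anywhere.

```powershell
# Numeric values of Word's WdStatistic enumeration, keyed by constant name.
# $wdStatistic is a hypothetical variable name used only for this example.
$wdStatistic = @{
    wdStatisticWords                = 0
    wdStatisticLines                = 1
    wdStatisticPages                = 2
    wdStatisticCharacters           = 3
    wdStatisticParagraphs           = 4
    wdStatisticCharactersWithSpaces = 5
}

# With a document opened as in Get-DocStatistic (requires Word, so not run here):
# $doc | Get-Member -MemberType Method                    # discover the document's methods
# $doc.ComputeStatistics($wdStatistic.wdStatisticPages)   # same as passing 2

# List the constants in value order
$wdStatistic.GetEnumerator() |
    Sort-Object Value |
    ForEach-Object { '{0} = {1}' -f $_.Key, $_.Value }
```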

Throughout the Process block I use Write-Progress to show what is happening. I have a lot of Word docs, and piping them all to Get-DocStatistic can take some time.
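The four nearly identical Write-Progress calls could also be collapsed into a loop. This is a sketch, not the function above: Get-DocCount is a hypothetical name, it passes the numeric WdStatistic values (0 words, 1 lines, 4 paragraphs, 2 pages) rather than constant names, and it adds -PercentComplete. A stand-in object with a ComputeStatistics script method lets the sketch run without Word.

```powershell
# Hypothetical helper: one Write-Progress call per statistic, driven by a table.
function Get-DocCount {
    param(
        [Parameter(Mandatory)] $Document,
        [string] $Name = 'document'
    )
    # Output property name -> numeric WdStatistic value
    $statistics = [ordered]@{ Words = 0; Lines = 1; Paragraphs = 4; Pages = 2 }
    $counts = [ordered]@{}
    $i = 0
    foreach ($stat in $statistics.Keys) {
        $progress = @{
            Activity         = 'Gathering document statistics'
            Status           = "Analyzing $Name"
            CurrentOperation = "Counting $($stat.ToLower())"
            PercentComplete  = [int](100 * $i / $statistics.Count)
        }
        Write-Progress @progress
        $counts[$stat] = $Document.ComputeStatistics($statistics[$stat])
        $i++
    }
    Write-Progress -Activity 'Gathering document statistics' -Completed
    New-Object PSObject -Property $counts
}

# Stand-in "document" so the sketch runs without Word:
# its ComputeStatistics just returns 10 times the statistic number.
$fakeDoc = New-Object PSObject
$fakeDoc | Add-Member -MemberType ScriptMethod -Name ComputeStatistics -Value { param($statistic) 10 * $statistic }

$result = Get-DocCount -Document $fakeDoc -Name 'fake.docx'
$result
```

With a real document, the call inside the Process block would look like Get-DocCount -Document $doc -Name $file.Name. Keeping the statistics in one table means adding, say, a character count is a one-line change.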

After all the objects have been processed, the End script block runs, and that's where I quit Microsoft Word. Without this command I would have a Winword process running in the background.

Here are some examples of how you might use it.

PS C:\test> dir c:\test\dns.docx | get-docstatistic

Document   : C:\test\dns.docx
Directory  : C:\test
Filename   : dns.docx
Size       : 23606
Words      : 1952
Lines      : 510
Paragraphs : 422
Pages      : 10

PS C:\> $stats = dir "$env:userprofile\documents\*" -include *.doc,*.docx -recurse | Get-DocStatistic

PS C:\scripts\PoSH> $stats | Measure-Object pages -Sum

Count    : 319
Average  :
Sum      : 7615
Maximum  :
Minimum  :
Property : Pages
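Because Get-DocStatistic writes objects, the rest of the pipeline can work on its output. This sketch uses a few stand-in objects with made-up values (not real results) to show Measure-Object summarizing several properties in one call and Sort-Object picking out the longest documents. With real data, skip the stand-in assignment and use the $stats variable from the command above.

```powershell
# Stand-in objects shaped like Get-DocStatistic output (made-up values for illustration)
$stats = @(
    [pscustomobject]@{ Filename = 'a.docx'; Words = 1200; Pages = 4 }
    [pscustomobject]@{ Filename = 'b.docx'; Words = 5400; Pages = 18 }
    [pscustomobject]@{ Filename = 'c.docx'; Words = 300;  Pages = 1 }
)

# Sum, average and maximum for several properties at once (one result per property)
$summary = $stats | Measure-Object -Property Words, Pages -Sum -Average -Maximum
$summary | Format-Table Property, Sum, Average, Maximum

# The three longest documents by page count
$stats | Sort-Object Pages -Descending | Select-Object -First 3 Filename, Pages, Words
```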

I don't know whether you'll need this as much as I do, but hopefully you've seen how easy it is to use COM objects in PowerShell.