Could use a little help on this Duplicate Files Script

This topic is 10 years and 3 months old and has exceeded the time allowed for comments. Please begin a new topic or use the search feature to find a similar but newer topic.

Post by notarat »

I have a PowerShell script that, depending on how it is executed, will either list the duplicated files or delete them.

To be honest, this script is a little over my head. What I would like is a way to have the script record the FullPath, Length, and LastWriteTime of each duplicated file to a file somewhere, but I can't seem to make much progress.

Can someone please help me out?
PowerShell Code
function Get-SHA512([System.IO.FileInfo] $file = $(throw 'Usage: Get-SHA512 [System.IO.FileInfo]'))
{
        $stream = $null;
        $cryptoServiceProvider = [System.Security.Cryptography.SHA512CryptoServiceProvider];
        $hashAlgorithm = new-object $cryptoServiceProvider
        $stream = $file.OpenRead();
        $hashByteArray = $hashAlgorithm.ComputeHash($stream);
        $stream.Close();
 
        ## We have to be sure that we close the file stream if any exceptions are thrown.
 
        trap
        {
                if ($stream -ne $null)
                {
                        $stream.Close();
                }
                break;
        }       
 
        foreach ($byte in $hashByteArray) { $result += "{0:X2}" -f $byte }   # "X2" pads each byte to two hex digits
        return [string]$result;
}
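# Note: on PowerShell 4.0 and later, the built-in Get-FileHash cmdlet could
# replace the helper above, e.g.:
#     (Get-FileHash -LiteralPath $file.FullName -Algorithm SHA512).Hash
# (an alternative worth considering, not part of the original script).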
 
$starttime=[datetime]::now
 
write-host "FindDupe.ps1 - find and optionally delete duplicates. FindDupe.ps1 -help or FindDupe.ps1 -h for usage options."
 
$matches = 0            # initialize number of matches for summary.
$filesdeleted = 0       # number of files deleted.
$bytesrec = 0           # Number of bytes recovered.
 
 
if ($args -eq "-help" -or $args -eq "-h") # check for help request, if found display usage options...
{
        ""
        "Usage:"
        "       PS>.\FindDupe.ps1 <directory/file #1> <directory/file #2> ... <directory/file #N> [-delete] [-noprompt] [-recurse] [-help]"
        "Options:"
        "       -recurse recurses through all subdirectories of any specified directories."
        "       -delete prompts to delete duplicates (but not originals.)"
        "       -delete with -noprompt deletes duplicates without prompts (but again not originals.)"
        "       -hidden checks hidden files, default is to ignore hidden files."
        "       -help displays this usage option data, and ignores all other arguments."
        ""
        "Examples:"
        "          PS>.\finddupe.ps1 c:\data d:\finance -recurse"
        "          PS>.\finddupe.ps1 d: -recurse -delete -noprompt"
        "          PS>.\finddupe.ps1 c:\users\alice\pictures\ -recurse -delete"
        exit
}
 
 
# build list of files, by running dir on $args minus elements that have FindDupe.ps1 switches, recursively if specified.
 
$files=(dir ($args | ?{$_ -ne "-delete" -and $_ -ne "-noprompt" -and $_ -ne "-recurse" -and $_ -ne "-hidden"}) -recurse:$([bool]($args -eq "-recurse")) -force:$([bool]($args -eq "-hidden")) |?{$_.psiscontainer -eq $false})
 
 
if ($files.count -lt 2)  # if the number of files is less than 2, then exit
{
        "Need at least two files to check.`a"
        exit
}
 
for ($i=0;$i -ne $files.count; $i++)  # Cycle thru all files
{
        if ($files[$i] -eq $null) {continue}  # file was already identified as a duplicate if $null, so do next file
 
        $filecheck = $files[$i]               # backup file object
        $files[$i] = $null                    # erase file object from object database, so it is not matched against itself
 
        for ($c=$i+1;$c -lt $files.count; $c++)  # cycle through all files again
        {
                if ($files[$c] -eq $null) {continue}  # $null = file was already checked/matched.
        
                if ($filecheck.fullname -eq $files[$c].fullname) {$files[$c]=$null;continue} # If referencing the same file, skip
        
                if ($filecheck.length -eq $files[$c].length)  # if files match size then check SHA512's
                {
                        if ($filecheck.SHA512 -eq $null)         # if SHA512 is not already computed, compute it
                        { 
                                $SHA512 = (get-SHA512 $filecheck.fullname)
                                $filecheck = $filecheck | %{add-member -inputobject $_ -name SHA512 -membertype noteproperty -value $SHA512 -passthru}                  
                        }
                        if ($files[$c].SHA512 -eq $null)         # resulting in no file being SHA512'ed twice.
                        { 
                                $SHA512 = (get-SHA512 $files[$c].fullname)
                                $files[$c] = $files[$c] | %{add-member -inputobject $_ -name SHA512 -membertype noteproperty -value $SHA512 -passthru}                          
                        }
                        
                        if ($filecheck.SHA512 -eq $files[$c].SHA512) # Size already matched, if SHA512 matches, then it's a duplicate.
                        {
                                
                                write-host "Size and SHA512 match: " -fore red -nonewline
                                write-host "`"$($filecheck.fullname)`" and `"$($files[$c].fullname)`""
 
                                $matches += 1                   # Number of matches ++
                                
                                if ($args -eq "-delete")        # check if user specified to delete the duplicate
                                {
                                        if ($args -eq "-noprompt")  # if -delete select, and -noprompt selected
                                        {
                                                del $files[$c].fullname  # then delete the file without prompting
                                                write-host "Deleted duplicate: " -f red -nonewline
                                                write-host "`"$($files[$c].fullname).`""
                                        }
                                        else
                                        {
                                                del $files[$c].fullname -confirm # otherwise prompt for confirmation to delete
                                        }
                                        if ((get-item -ea 0 $files[$c].fullname) -eq $null) # check if file was deleted.
                                        {
                                                $filesdeleted += 1              # update records
                                                $bytesrec += $files[$c].length
                                        }
 
                                }
        
                                $files[$c] = $null              # erase file object so it is not checked/matched again.
                        }
                }       
        }       # And loop to next inner loop file
}               # And loop to next file in outer/original loop
write-host ""
write-host "Number of Files checked: $($files.count)."  # Display useful info; files checked and matches found.
write-host "Number of duplicates found: $matches."
Write-host "Number of duplicates deleted: $filesdeleted." # Display number of duplicate files deleted and bytes recovered.
write-host "$bytesrec bytes recovered." 
write-host ""
write-host "Time to run: $(([datetime]::now)-$starttime|select hours, minutes, seconds, milliseconds)"
write-host ""
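To record the FullPath, Length, and LastWriteTime of each duplicate, as the post asks, one approach is to emit a small object in the branch where size and SHA512 both match and append it to a CSV. A minimal sketch (the $logPath location is an assumption, and Export-Csv -Append requires PowerShell 3.0 or later):

```powershell
$logPath = 'C:\temp\duplicates.csv'   # assumed log location; adjust as needed

# Place this inside the block where $filecheck.SHA512 -eq $files[$c].SHA512:
[PSCustomObject]@{
    Original      = $filecheck.FullName
    Duplicate     = $files[$c].FullName
    Length        = $files[$c].Length
    LastWriteTime = $files[$c].LastWriteTime
} | Export-Csv -Path $logPath -Append -NoTypeInformation
```

On PowerShell 2.0, New-Object PSObject with Add-Member would be needed in place of [PSCustomObject], since both that accelerator and -Append were introduced in 3.0.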

Re: Could use a little help on this Duplicate Files Script

Post by jvierra »

This returns a collection of the files that are duplicates. For now, deleting has been removed.

Code:

PS C:\scripts> $files=.\FindDupe.ps1

Number of Files checked: 218.
Number of duplicates found: 15.
Number of duplicates deleted: 0.
0 bytes recovered.

Time to run: @{Hours=0; Minutes=0; Seconds=0; Milliseconds=330}

PS C:\scripts> $files

FileA                     FileB
-----                      -----
aclfile.pdf                aclfile.txt
Book1.xlsx                 newbook.xlsx
configfile.xml             EmailEvent.xml
CreateXMlDoc.vbs           CreateXMlDoc.vbs
dump.pdf                   dump.txt
FTPCLIENT.vb               FTPCLIENT.vb.txt
junk.pdf                   junk.txt
output.htm                 scripts.html
spec_out.txt               testbat.txt
starter.pdf                starter.txt
test2.pdf                  test2.txt
testip.pdf                 testip.txt
testmof.pdf                testmof.txt
testv.pdf                  testv.txt
stylesheet.css             stylesheet.css


PS C:\scripts>
See attached:
Attachments: FindDupe.ps1 (4.57 KiB)

Re: Could use a little help on this Duplicate Files Script

Post by notarat »

jvierra wrote:Return a collection of files that are duplicates. For now deleting has been removed. […]

Thanks very much!

Re: Could use a little help on this Duplicate Files Script

Post by jvierra »

This version is actually much cleaner and richer. The original will only pair up two files that match; this one will find any number of files that match.

This version also takes an array of paths, so you can compare two directory structures.

.\Find-Dupes.ps1 c:\temp,c:\scripts -recurse

PowerShell Code
<#
    Find-Dupes.ps1
#>
#requires -version 3.0
[CmdLetBinding()]
Param(
    [string[]]$path=$pwd,
    [switch]$recurse,
    [switch]$delete,
    [switch]$noprompt,
    [switch]$hidden
    
)

Begin{
    $starttime=[DateTime]::Now
    $hashAlgorithm = new-object System.Security.Cryptography.SHA512CryptoServiceProvider
    function Get-SHA512{
        Param(
            [Parameter()]
            [System.IO.FileInfo]$file
        )
        $stream=$file.OpenRead();
        $hashByteArray=$hashAlgorithm.ComputeHash($stream);
        $stream.Close();
        foreach ($byte in $hashByteArray){
            $result += "{0:X2}" -f $byte   # "X2" pads each byte to two hex digits
        }
        return [string]$result;
    }
 
}

Process{
    
    $splat=@{
        Path    = $path
        Recurse = $recurse
        # Note: Get-ChildItem's -Hidden switch lists *only* hidden items;
        # -Force includes hidden files along with everything else, which
        # matches the original script's -hidden behaviour.
        Force   = $hidden
    }
    $files=Get-ChildItem @splat -File |
        ForEach-Object{
            $h = Get-SHA512 $_
            $_ | Add-Member -MemberType 'NoteProperty' -Name SHA512 -Value $h -PassThru
    }
         
}

End{

    $files | Group-Object SHA512 | ?{$_.Count -gt 1}
    $TotalFiles=($files|group sha512|Measure-Object -sum count).Sum
    $PSDefaultParameterValues=@{"Write-Host:ForegroundColor"="green"}
    write-host "Number of Files checked: $TotalFiles"  # Display useful info; files checked and matches found.
    $NumberOfDupes=($files|group sha512 |?{$_.Count -gt 1}|Measure-Object -sum count).Sum
    write-host "Number of duplicates found: $NumberOfDupes"
    write-host "Time to run: $(([datetime]::now)-$starttime)"
}
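Since the script emits Group-Object results, each element of the returned collection carries a Count and a Group of matching FileInfo objects. A non-destructive sketch of how a caller might keep the first file in each group and preview removal of the rest (the paths and the $dupes variable are illustrative; -WhatIf only reports what would be deleted):

```powershell
$dupes = .\Find-Dupes.ps1 c:\temp, c:\scripts -recurse

foreach ($group in $dupes) {
    # Multiple assignment: first element goes to $keep, the remainder to $extras.
    $keep, $extras = $group.Group
    Write-Host "Keeping: $($keep.FullName)"
    foreach ($file in $extras) {
        # -WhatIf previews the deletion without performing it.
        Remove-Item -LiteralPath $file.FullName -WhatIf
    }
}
```

Dropping -WhatIf (or replacing it with -Confirm) would restore the delete behaviour of the original FindDupe.ps1, but on an explicit, per-group basis.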