Beyond Zip – How to store 183 GB of VMs in a 16 GB file using PowerShell

I'm so happy! This has been an awesome, creative week, and every once in a while it needs to be said: I'm a genius! (but a humble and helpful one, ok maybe not so humble, but at least helpful) 🙂

This is the story of how I stored 183 GB of virtual machines in a single 16 GB file, and the PowerShell script that does it.

Note: This technique works very well on other archives as well, like your library of ISO files. See the end of this post.

Earlier this week I was preparing a bunch of virtual machines for an upcoming class in Redmond. The virtual machines, 12 in total, held the entire System Center 2012 R2 suite, and a few extra infrastructure servers and clients. The complicating matter was that I was in Houston, in a hotel, and I had remoted into one of our lab servers in Sweden when building the VMs. The virtual machines were 183 GB in total, and I don't know if you frequently stay at hotels, but I can tell you that transferring 183 GB of data is stretching it 🙂

In short: I needed to reduce the number of bits downloaded over the wire, so I reviewed my options. They were:

  • Option #1 – Use the Hydration Kit once more, locally
  • Option #2 – Find a way to compress the VMs really well

Option #1 – Use the Hydration kit once more

Downloading the Hydration Kit used to build the VMs and regenerating them one more time was certainly a valid option. It's fully automated, and the hydration ISO file is "only" 24 GB (uncompressed), which turned into a 21.6 GB file after compressing it with WinRAR (that took almost 2 hours, using normal compression).

[Image: The zipped (WinRAR) Hydration Kit.]

Option #2 – Find a way to compress the VMs really well

I ended up starting with Option #1, using the Hydration Kit once more. After all, I knew it worked, and downloading a 21.6 GB file was at least acceptable, even though it took the entire day.

But while re-building the virtual machines, I couldn't quite let go of the thought of compressing the VMs, and an idea I had of maybe using data deduplication as an alternative to WinRAR compression.

Time for some testing:

Attempt #1 – Clean and Simple, WinRAR

Well, you've got to start somewhere, and this was more for having a reference with normal archiving/compression than anything else. It wasn't exactly the first time I zipped together some virtual machines, put it that way.

  1. Archiving the VMs with WinRAR, normal compression

Result: The complete process took almost 4 hours and resulted in 71.3 GB of WinRAR files. Down from 183 GB to 71 GB is not bad, but still too much.
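For reference, this is roughly what that archiving step looks like with WinRAR's command-line version (Rar.exe). The paths here are just examples:

```powershell
# Hypothetical example: archive the exported VMs with WinRAR's command-line
# version, using normal (default) compression. Paths are placeholders.
# 'a' adds files to an archive, '-r' recurses into subfolders.
& 'C:\Program Files\WinRAR\Rar.exe' a -r 'D:\Tmp\VMs.rar' 'E:\Exported\*'
```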

[Image: The zipped VMs, 71.3 GB in size.]

Attempt #2 – Using a VHDX file for storage, DeDup the VHDX, and then WinRAR

I'm not sure archiving VMs for network transfer on crappy connections was the intended use of Data DeDuplication, but a little bit of outside-the-box thinking never hurt anyone 🙂

My next attempt was to create a VHDX file, enable DeDuplication, copy the VMs to it, run a DeDup job, and then archive the entire VHDX.

Disclaimer: I have no idea if this is supported or not, but it worked fine for me.

1. Create and mount (attach) a 200 GB VHDX file
2. Enable DeDuplication
3. Copy the VMs to the mounted VHDX – took about 15 minutes. I love that we have SSDs in all our lab servers (and that every employee gets at least one dedicated lab server)
4. Run DeDuplication – ok, that took a bit longer, about 1 hour
5. Unmount (detach) the VHDX
6. Archive the VHDX using WinRAR – loooong coffee break (or in my case, doing laundry in the hotel laundromat); that took almost 5 hours.

Result: The complete process took 6 hours and resulted in a 79.7 GB file, which was not what I had hoped for. Time for another test.

Attempt #3 – Using a VHDX file for storage, DeDup the VHDX, defrag, optimize (compact) and then WinRAR

OK, so just archiving the de-duped VHDX file didn't do the trick, but what if I added a defrag and an Optimize-VHD (compact) pass to the test?

Sure enough, after a defrag and an Optimize-VHD operation, the VHDX file shrunk to 25.2 GB. The question was, would WinRAR or 7-Zip shrink it further?

[Image: The 25 GB VHDX file after defrag and optimize.]

The content in it is still the same.

[Image: The VHDX content, unchanged.]

Yes it would!!!

Finally, 183 GB of VMs backed up into a 16 GB file.

Note: I consider the final archiving part optional, going from 183 GB to 25 GB is still OK, and with a VHDX file I can simply double-click it to access the content.
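That double-click is just mounting the disk image; in PowerShell form it looks like this (path is an example). Note that the machine mounting the file needs the Data Deduplication feature installed, or the optimized files inside will be unreadable:

```powershell
# Mount the archive read-only to browse the VMs without extracting anything.
Mount-DiskImage -ImagePath 'C:\Tmp\VM-Archive.vhdx' -Access ReadOnly

# ... copy out what you need, then detach.
Dismount-DiskImage -ImagePath 'C:\Tmp\VM-Archive.vhdx'
```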

[Image: 7-Zip with a 1 GB dictionary was the winner.]

Attempt #3 resulted in a 16 GB file, celebration time!
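If you are curious what that 7-Zip run looks like from the command line, here is a sketch (paths are examples, and a 1 GB LZMA2 dictionary needs a good amount of free RAM):

```powershell
# Hypothetical example: compress the deduped and compacted VHDX with 7-Zip,
# using LZMA2 with a 1 GB dictionary. Paths are placeholders.
# 'a' adds to an archive, '-t7z' selects the 7z format,
# '-m0=lzma2' sets the method, '-md=1g' sets the dictionary size.
& 'C:\Program Files\7-Zip\7z.exe' a -t7z -m0=lzma2 -md=1g 'D:\Tmp\VM-Archive.7z' 'C:\Tmp\VM-Archive.vhdx'
```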

Here is the complete process, in PowerShell of course 🙂

Note: The script requires that you have enabled the Data Deduplication feature on your Windows Server 2012 R2 host.

# Create the 200 GB VHDX file
$VHDXFile = 'C:\Tmp\VM-Archive.vhdx'
New-VHD -Path $VHDXFile -Dynamic -SizeBytes 200GB

# Mount (attach) the VHDX file
Mount-DiskImage -ImagePath $VHDXFile

# Initialize the VHDX file
$VHDXDisk = Get-DiskImage -ImagePath $VHDXFile | Get-Disk -Verbose
$VHDXDiskNumber = [string]$VHDXDisk.Number
Initialize-Disk -Number $VHDXDiskNumber -PartitionStyle MBR -Verbose

# Format the VHDX file with NTFS, and assign a driveletter (without getting prompted, hence the use of Add-PartitionAccessPath)
$VHDXDrive = New-Partition -DiskNumber $VHDXDiskNumber -UseMaximumSize -Verbose
$VHDXDrive | Format-Volume -FileSystem NTFS -NewFileSystemLabel VM-Archive -Confirm:$false -Verbose
Add-PartitionAccessPath -DiskNumber $VHDXDiskNumber -PartitionNumber $VHDXDrive.PartitionNumber -AssignDriveLetter

# Get the drive letter
$VHDXDrive = Get-Partition -DiskNumber $VHDXDiskNumber -PartitionNumber $VHDXDrive.PartitionNumber
$VHDXVolume = [string]$VHDXDrive.DriveLetter+":"

# Enable DeDuplication (assuming you added the role)
Enable-DedupVolume -Volume $VHDXVolume
Set-DeDupVolume -Volume $VHDXVolume -MinimumFileAgeDays 0

# Copy the VMs to the VHDX file
Copy-Item E:\Exported $VHDXVolume -Recurse

# DeDup the VHDX file
Start-DedupJob -Type Optimization -Memory 75 -Priority High -Volume $VHDXVolume -Wait

# Defrag the VHDX file
defrag $VHDXVolume /U /V /X

# Unmount (detach) the VHDX file
Dismount-DiskImage -ImagePath $VHDXFile -Verbose

# Optimize the VHDX
Mount-DiskImage -ImagePath $VHDXFile -Access ReadOnly
Optimize-VHD -Path $VHDXFile -Mode Full
Dismount-DiskImage -ImagePath $VHDXFile -Verbose

# Optional, archive using WinRAR 5 (command line version)
& 'C:\Program Files\WinRAR\Rar.exe' a 'D:\tmp\VM-Archive.rar' $VHDXFile
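And on the receiving end, getting the VMs back out is roughly this (a sketch, assuming the same paths as above, and that the Data Deduplication feature is installed on that host too – without it, the optimized files are unreadable):

```powershell
# Mount the downloaded VHDX read-only, find its drive letter, and copy out the VMs.
$VHDXFile = 'C:\Tmp\VM-Archive.vhdx'
Mount-DiskImage -ImagePath $VHDXFile -Access ReadOnly

# The VHDX has a single partition, so this pipeline returns one volume.
$Drive = (Get-DiskImage -ImagePath $VHDXFile | Get-Disk | Get-Partition | Get-Volume).DriveLetter

# Copy the VMs back to local storage, then detach the image.
Copy-Item "$($Drive):\Exported" 'E:\' -Recurse
Dismount-DiskImage -ImagePath $VHDXFile
```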

ISO Archives

Bonus: this technique works quite well on ISO archives too.


Happy Deployments, Johan

About the author

Johan Arwidmark
