I'm so happy! This has been an awesome, creative week, and every once in a while it needs to be said: I'm a genius! (but a humble and helpful one, ok maybe not so humble, but at least helpful) 🙂
This is the story of how to store 183 GB of virtual machines in a single 16 GB file (and the PowerShell script that does it).
Note: This technique works very well on other archives as well, like your library of ISO files. See the end of this post.
Earlier this week I was preparing a bunch of virtual machines for an upcoming class in Redmond. The virtual machines, 12 in total, held the entire System Center 2012 R2 suite, and a few extra infrastructure servers and clients. The complicating matter was that I was in Houston, in a hotel, and I had remoted into one of our lab servers in Sweden when building the VMs. The virtual machines were 183 GB in total, and I don't know if you frequently stay at hotels, but I can tell you that transferring 183 GB of data is stretching it 🙂
In short: I needed to reduce the number of bits downloaded over the wire, so I reviewed my options. They were:
- Option #1 – Use the Hydration Kit once more, locally
- Option #2 – Find a way to compress the VMs really well
Option #1 – Use the Hydration Kit once more
Downloading the Hydration Kit used to build the VMs, and regenerating them one more time, was certainly a valid option. It's fully automated, and the hydration ISO file is "only" 24 GB (uncompressed), which turned into a 21.6 GB file after compressing it with WinRAR (that took almost 2 hours, using normal compression).
Option #2 – Find a way to compress the VMs really well
I ended up starting with Option #1, using the Hydration Kit once more. After all, I knew it worked, and downloading a 21.6 GB file was at least acceptable, even though it took the entire day.
But while rebuilding the virtual machines, I couldn't quite let go of the thought of compressing the VMs, and an idea I had of maybe using data deduplication as an alternative to WinRAR compression.
Time for some testing:
Attempt #1 – Clean and Simple, WinRAR
Well, you got to start somewhere, and this was more for having a reference with normal archiving/compression than anything else. It wasn't exactly the first time I zipped together some virtual machines, put it that way.
- Archiving the VMs with WinRAR, normal compression
Result: The complete process took almost 4 hours and resulted in 71.3 GB of WinRAR files. Down from 183 GB to 71 GB is not bad, but still too much.
Attempt #2 – Using a VHDX file for storage, DeDup the VHDX, and then WinRAR
I'm not sure archiving VMs for network transfer on crappy connections was the intended use of Data DeDuplication, but a little bit of outside-the-box thinking never hurt anyone 🙂
My next attempt was to create a VHDX file, enable DeDuplication, copy the VMs to it, run a DeDup job, and then archive the entire VHDX.
Disclaimer: I have no idea if this is supported or not, but it worked fine for me.
1. Create and mount (attach) a 200 GB VHDX file
2. Enable DeDuplication
3. Copy the VMs to the mounted VHDX – took about 15 minutes. I love that we have SSDs in all our lab servers (and that every employee gets at least one dedicated lab server)
4. Run DeDuplication – ok, that took a bit longer, about 1 hour
5. Unmount (detach) the VHDX
6. Archive the VHDX using WinRAR – loooong coffee-break (or in my case, doing laundry in the hotel laundromat), that took almost 5 hours.
Result: The complete process took 6 hours and resulted in a 79.7 GB file, not what I had hoped for. Time for another test.
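Before archiving, it can help to compare what the DeDup job reports as saved with how large the VHDX file actually is on disk. The lines below are a minimal sketch, not part of the original test: they assume the VHDX is still mounted, that the Hyper-V and Deduplication PowerShell modules are available, and that the volume got the (hypothetical) drive letter V:.

```powershell
# Assumptions: same VHDX path as in the script further down; V: is a hypothetical drive letter
$VHDXFile   = 'C:\Tmp\VM-Archive.vhdx'
$VHDXVolume = 'V:'

# What the dedup job saved inside the volume
Get-DedupVolume -Volume $VHDXVolume |
    Select-Object Volume, SavedSpace, SavingsRate

# How large the VHDX file is on the host, versus its maximum size
Get-VHD -Path $VHDXFile |
    Select-Object Path,
        @{ Name = 'FileSizeGB'; Expression = { [math]::Round($_.FileSize / 1GB, 1) } },
        @{ Name = 'MaxSizeGB';  Expression = { [math]::Round($_.Size / 1GB, 1) } }
```

If the volume reports large savings while the file size on the host stays high, the space has been freed inside the volume but not yet given back by the dynamic VHDX file.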
Attempt #3 – Using a VHDX file for storage, DeDup the VHDX, defrag, optimize (compact) and then WinRAR
OK, so just archiving the de-duped 183 GB VHDX file didn't do the trick, but what if I added a defrag and an Optimize-VHD (compact) to the test?
Sure enough, after a defrag and an Optimize-VHD operation, the VHDX file shrank to 25.2 GB. The question was, would WinRAR or 7-Zip shrink it further?
Content in it, still the same..
Yes it would!!!
Finally, 183 GB of VMs backed up into a 16 GB file.
Note: I consider the final archiving part optional, going from 183 GB to 25 GB is still OK, and with a VHDX file I can simply double-click it to access the content.
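Getting the VMs back out on the receiving side can also be scripted. This is a minimal sketch under a few assumptions: both paths are hypothetical, Windows assigns a drive letter automatically when the VHDX is mounted, and the receiving machine has the Data Deduplication feature installed (to my knowledge, files on a deduplicated volume are not fully readable without it).

```powershell
# Hypothetical paths on the receiving machine
$VHDXFile = 'D:\Transfer\VM-Archive.vhdx'
$Target   = 'D:\VMs'

# Mount the VHDX read-only and find the drive letter it was given
Mount-DiskImage -ImagePath $VHDXFile -Access ReadOnly
$Disk = Get-DiskImage -ImagePath $VHDXFile | Get-Disk
$DriveLetter = ($Disk | Get-Partition |
    Where-Object { $_.DriveLetter -match '[A-Z]' } |
    Select-Object -First 1).DriveLetter

# Copy everything out, then detach the VHDX again
New-Item -ItemType Directory -Path $Target -Force | Out-Null
Copy-Item -Path "$($DriveLetter):\*" -Destination $Target -Recurse
Dismount-DiskImage -ImagePath $VHDXFile
```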
7-Zip with 1 GB dictionary was the winner.
Attempt #3 resulted in a 16 GB file, celebration time!
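To put the attempts side by side, the sizes quoted in this post can be turned into reduction percentages. The snippet below is only an illustration of that arithmetic; the function name Get-ReductionPercent is made up for this example.

```powershell
# Sizes in GB, as reported in this post
$OriginalGB = 183
$Results = [ordered]@{
    'Attempt #1: WinRAR only'                   = 71.3
    'Attempt #2: DeDup + WinRAR'                = 79.7
    'Attempt #3: DeDup + defrag + Optimize-VHD' = 25.2
    'Attempt #3: plus 7-Zip'                    = 16
}

# Hypothetical helper: how many percent smaller is the new size?
function Get-ReductionPercent {
    param([double]$OriginalSize, [double]$NewSize)
    [math]::Round((1 - $NewSize / $OriginalSize) * 100, 1)
}

foreach ($Entry in $Results.GetEnumerator()) {
    $Percent = Get-ReductionPercent -OriginalSize $OriginalGB -NewSize $Entry.Value
    '{0}: {1} GB ({2} % smaller)' -f $Entry.Key, $Entry.Value, $Percent
}
```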
Here is the complete process, in PowerShell of course 🙂
Note: The script requires that you have enabled DeDuplication on your Windows Server 2012 R2 host.
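A minimal sketch for checking that prerequisite, assuming an elevated PowerShell session on the server. FS-Data-Deduplication is the feature name for Data Deduplication; the New-VHD and Optimize-VHD cmdlets used in the script additionally need the Hyper-V PowerShell module.

```powershell
# Install the Data Deduplication feature if it is not there yet
$Feature = Get-WindowsFeature -Name FS-Data-Deduplication
if (-not $Feature.Installed) {
    Install-WindowsFeature -Name FS-Data-Deduplication
}
```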
# Create the 200 GB VHDX file
$VHDXFile = 'C:\Tmp\VM-Archive.vhdx'
New-VHD -Path $VHDXFile -Dynamic -SizeBytes 200GB

# Mount (attach) the VHDX file
Mount-DiskImage -ImagePath $VHDXFile

# Initialize the VHDX file
$VHDXDisk = Get-DiskImage -ImagePath $VHDXFile | Get-Disk -Verbose
$VHDXDiskNumber = [string]$VHDXDisk.Number
Initialize-Disk -Number $VHDXDiskNumber -PartitionStyle MBR -Verbose

# Format the VHDX file with NTFS, and assign a drive letter
# (without getting prompted, hence the use of Add-PartitionAccessPath)
$VHDXDrive = New-Partition -DiskNumber $VHDXDiskNumber -UseMaximumSize -Verbose
$VHDXDrive | Format-Volume -FileSystem NTFS -NewFileSystemLabel VM-Archive -Confirm:$false -Verbose
Add-PartitionAccessPath -DiskNumber $VHDXDiskNumber -PartitionNumber $VHDXDrive.PartitionNumber -AssignDriveLetter

# Get the drive letter
$VHDXDrive = Get-Partition -DiskNumber $VHDXDiskNumber -PartitionNumber $VHDXDrive.PartitionNumber
$VHDXVolume = [string]$VHDXDrive.DriveLetter+":"

# Enable DeDuplication (assuming you added the role)
Enable-DedupVolume -Volume $VHDXVolume
Set-DeDupVolume -Volume $VHDXVolume -MinimumFileAgeDays 0

# Copy the VMs to the VHDX file
Copy-Item E:\Exported $VHDXVolume -Recurse

# DeDup the VHDX file
Start-DedupJob -Type Optimization -Memory 75 -Priority High -Volume $VHDXVolume -Wait

# Defrag the VHDX file
defrag $VHDXVolume /U /V /X

# Unmount (detach) the VHDX file
Dismount-DiskImage -ImagePath $VHDXFile -Verbose

# Optimize the VHDX
Mount-DiskImage -ImagePath $VHDXFile -Access ReadOnly
Optimize-VHD -Path $VHDXFile -Mode Full
Dismount-DiskImage -ImagePath $VHDXFile -Verbose

# Optional, archive using WinRAR 5 (command line version)
& 'C:\Program Files\WinRAR\Rar.exe' a 'D:\tmp\VM-Archive.rar' $VHDXFile
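The script ends with WinRAR, while the best result above came from 7-Zip with a 1 GB dictionary. The exact 7-Zip settings are not given in this post, so the following is only a guess at an equivalent command line: the install path, the archive path and the choice of LZMA2 at the highest compression level are all assumptions. A dictionary this large needs a lot of RAM while compressing.

```powershell
# Assumptions: default 7-Zip install path, hypothetical output path
$SevenZip = 'C:\Program Files\7-Zip\7z.exe'
$VHDXFile = 'C:\Tmp\VM-Archive.vhdx'
$Archive  = 'D:\tmp\VM-Archive.7z'

# a = add to archive, -t7z = 7z format, -m0=lzma2 = LZMA2 method,
# -mx=9 = highest compression level, -md=1024m = 1 GB dictionary
$SevenZipArgs = @('a', '-t7z', '-m0=lzma2', '-mx=9', '-md=1024m', $Archive, $VHDXFile)

if (Test-Path $SevenZip) {
    & $SevenZip @SevenZipArgs
}
else {
    Write-Warning "7-Zip not found at $SevenZip"
}
```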
Bonus: this technique works reasonably well on ISO archives as well.
Happy Deployments, Johan
This is exactly what I was looking for! I was preparing the Labs for a ConfigMgr Class in July and a later one in Autumn and the official MOC ones are nearly 100GB!!! By following your procedure and without compressing the Deduped VHDX they occupy ~23GB now!
At least this way I will not sleep over the machines waiting to get the files copied 🙂
You ROCK my friend! You really ROCK!
P.S. I have a hunch that the deduplication has some small room for improvement but I haven't got the time to experiment ATM….
one word : Awesomeness !!!