In April 2014 I wrote a blog post on how to use PowerShell and Data DeDuplication in Windows Server 2012 R2 to create small VHDX archives, which is very useful for transferring large sets of virtual machines. In that example from 2014 I stored 183 GB of VMs in a 25 GB VHDX archive (which I further compressed with 7-Zip into a 16 GB file).
A 25 GB VHDX file, with DeDuplication enabled, storing 183 GB of VMs. Later compressed with 7-Zip into a 16 GB archive.
However… Microsoft Data DeDuplication is not for everyone…
The thing with Data DeDuplication from Microsoft is that it’s only available in Windows Server 2012 and later server operating systems. Even though it can be hacked (very unsupported) to work with Windows 8.1, it does not work with Windows 7 or Windows 10 (at least not yet).
Data DeDuplication not working in Windows 10 was exactly my issue today. I was preparing 24 classroom machines which were to have 107 GB of VMs copied to them, and even on a gigabit network, it does take quite some time to copy 24 x 107 GB over the network. The solution was using ZPAQ (Incremental Journaling Backup Utility and Archiver), which is an archive utility that supports Data DeDuplication on any Windows version (client or server).
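To put a rough number on "quite some time", here is a minimal back-of-the-envelope sketch. It assumes the theoretical gigabit maximum of 125 MB/s, decimal gigabytes, and that the copies go out one after another over a single link from the source machine. None of those assumptions were measured in my setup, and real-world throughput is normally lower, so treat the result as a best case. The function name `transfer_hours` is just something made up for this illustration.

```python
# Back-of-the-envelope estimate; the throughput figure is an assumption, not a measurement.
GIGABIT_BYTES_PER_SEC = 125_000_000  # 1 Gbit/s = 125 MB/s theoretical maximum (assumed)


def transfer_hours(size_gb, machines, bytes_per_sec=GIGABIT_BYTES_PER_SEC):
    """Hours needed to copy size_gb to each machine, one copy at a time over one link."""
    total_bytes = size_gb * 1_000_000_000 * machines  # decimal GB
    return total_bytes / bytes_per_sec / 3600


# 24 classroom machines, 107 GB of VMs each
print(f"{transfer_hours(107, 24):.1f} hours")
```

Under those assumptions the full copy lands somewhere between five and six hours, which is why shrinking the payload first is worth the effort.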
Using ZPAQ to create an archive with Data DeDuplication
I had my VMs, again 107 GB in size, in the E:W10MasterClass folder, and to create a small ZPAQ archive of my VMs I simply downloaded zpaq64.exe, navigated to the folder, and ran the following command:
.\zpaq64.exe add W10class.zpaq VMs
Note: Do not use the ZPAQ tool on disks that are already deduplicated with the Microsoft Data DeDuplication feature, since that will cause corrupt archives. Thanks to Bert Mueller for pointing that out!
Yes, don’t use ZPAQ on DeDuped NTFS volumes.
ZPAQ has five different compression levels. The default, method 1, is the fastest, and is intended for archives/backups where you compress often and extract rarely. I did some testing with the various methods, especially method 2, which is intended for distributing files where you compress once and extract often. However, the results were about the same for my classroom setup, so I just used the default method. Less to type 🙂
After about 25 minutes I got the ZPAQ archive file, only 15.8 GB in size, much better than the original 107 GB.
Note: Because I have SSD drives in my machine, it doesn’t really matter much, from a performance point of view, that I create the archive on the same disk as my source files. But if you’re using a SAN or local spindle (mechanical) disks, the archive creation is obviously a bit quicker if you read from one disk and write to another.
107 GB of virtual machines archived into a 15.8 GB ZPAQ archive.
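For those who like numbers, the figures above can be turned into a compression ratio and a rough archiving speed. This is only a small sketch using the sizes and the 25 minute run time from this post, with decimal units assumed (1 GB = 1000 MB). The helper names are made up for the illustration.

```python
# Figures taken from this post; decimal units (1 GB = 1000 MB) are an assumption.
SOURCE_GB = 107.0        # size of the VMs before archiving
ARCHIVE_GB = 15.8        # size of the resulting ZPAQ archive
ARCHIVE_MINUTES = 25.0   # approximate time to create the archive


def compression_ratio(source_gb, archive_gb):
    """How many times smaller the archive is than the source."""
    return source_gb / archive_gb


def space_saved_percent(source_gb, archive_gb):
    """Percentage of the original size that no longer has to be copied."""
    return (1 - archive_gb / source_gb) * 100


def throughput_mb_per_sec(size_gb, minutes):
    """Average source data processed per second, in MB/s."""
    return size_gb * 1000 / (minutes * 60)


print(f"ratio:      {compression_ratio(SOURCE_GB, ARCHIVE_GB):.1f}:1")
print(f"saved:      {space_saved_percent(SOURCE_GB, ARCHIVE_GB):.0f} %")
print(f"throughput: {throughput_mb_per_sec(SOURCE_GB, ARCHIVE_MINUTES):.0f} MB/s")
```

So the archive is a bit under seven times smaller than the source, which means roughly 85 percent less data to push over the network per machine.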
Extract the ZPAQ archive
To extract the files again, after copying the 15.8 GB W10class.zpaq archive to the student machines, I used the following command:
.\zpaq64.exe extract C:\ClassroomSetup\W10Class.zpaq -to C:\
Note: The extraction is quite fast. It takes about 15 minutes for my 107 GB of data to be extracted (on student machines with SSD drives), and since it’s done locally on each machine, I only had to transfer 15.8 GB over the network instead of 107 GB. I’m a happy camper 🙂
Happy Deployment, Johan