GSX Blog

Windows Server and Deduplication

Posted by Arnaud Buonaccorsi on Fri, Apr 12, 2013

This week at MMS 2013, Microsoft has continuously discussed the topic of cost reduction. They lowered the cost of virtualization hardware, then did the same for networking… Now they are trying to reduce the cost of storage.

Basically, most of the discussion is around SMB storage and Hyper-V. It is not perfect yet, but the recent acquisition of StorSimple makes me think we should see nice improvements before the end of the year.

So for the moment, we have to keep saving storage on our SANs with Windows Server 2012.

A recent study showed that more than 20% of data is duplicated! The latest release, Windows Server 2012, lets you get rid of that duplicated data. For example, if two users have the same file, both will still see it and be able to use it, but the file will be stored only once on disk. This “magic” uses variable-size chunking and compression, and it can be applied to your primary data. I ran a test on a SATA drive and an iSCSI drive: it works perfectly, but Microsoft recommends having around 4GB of memory available.
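
To get a feel for how chunk-level deduplication spots that redundancy, here is a toy PowerShell sketch that hashes fixed-size 64KB chunks of the files under a hypothetical C:\Test folder and counts how often it sees the same chunk twice. Windows Server 2012 actually uses variable-size chunks plus compression, so this only illustrates the idea, not the real engine:

# Toy illustration of chunk-level dedup: fixed-size chunks, hypothetical folder
$chunkSize = 64KB
$sha = [System.Security.Cryptography.SHA256]::Create()
$seen = @{}
$duplicates = 0
Get-ChildItem -Path "C:\Test" -File -Recurse | ForEach-Object {
    $stream = [System.IO.File]::OpenRead($_.FullName)
    $buffer = New-Object byte[] $chunkSize
    while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
        # Hash only the bytes actually read for this chunk
        $hash = [BitConverter]::ToString($sha.ComputeHash($buffer, 0, $read))
        if ($seen.ContainsKey($hash)) { $duplicates++ } else { $seen[$hash] = $true }
    }
    $stream.Close()
}
"Duplicate 64KB chunks found: $duplicates"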

Microsoft analyzed terabytes of real data inside the company to estimate the savings you should expect if you turn on Deduplication for different types of data. Below you can see the results:

(Figure: estimated Windows Server 2012 deduplication savings by data type)

There is a clear return on investment, which can reach millions of dollars at scale, when using Deduplication. The space savings are dramatic, and the money saved is easy to calculate when you pay by the gigabyte: at a hypothetical $0.05 per GB per month, for example, reclaiming 10TB of duplicates saves around $500 every month.

Let's see how to deploy it now. I was very excited to use it on my Cluster Shared Volumes at GSX, but that is not supported… So let's work on some iSCSI disks used for home folders instead.

Evaluating Deduplication: Is it worthwhile for you or not?

C:\> DDPEVAL.EXE \\GSXADServer\home

You will obtain something like this: 

Data Deduplication Savings Evaluation Tool 
Copyright 2011-2012 Microsoft Corporation. All Rights Reserved.

Evaluated folder: \\gsxadserver\home
Processed files: 3456
Processed files size: 568.03GB
Optimized files size: 12.02GB
Space savings: 24.01GB
Space savings percent: 8
Optimized files size (no compression): 11.47GB
Space savings (no compression): 571.53MB
Space savings percent (no compression): 1
Files with duplication: 267
Files excluded by policy: 80
Files excluded by error: 0
 

Once Deduplication is enabled, you can also use the PowerShell cmdlet Measure-DedupFileMetadata to determine how much disk space could be reclaimed on a volume if you deleted a group of folders, a single folder, or a single file, and then ran a Garbage Collection job.
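
For example, to see how much space deleting one old home folder would really give back (the path here is hypothetical):

PS C:\> Measure-DedupFileMetadata -Path "E:\home\OldUser"

Because chunks can be shared between files, the reclaimable space is often much smaller than the file sizes suggest.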

Install Deduplication on your Volume 

Open Server Manager and add the required roles and features:

  1. From the Add Roles and Features Wizard, under Server Roles, select File and Storage Services (if it has not already been installed).

  2. Select the File Services check box, and then select the Data Deduplication check box.

  3. Click Next until the Install button is active, and then click Install.

Open a PowerShell session as administrator and import the cmdlets needed to use Deduplication:

PS C:\> Import-Module ServerManager
PS C:\> Add-WindowsFeature -name FS-Data-Deduplication
PS C:\> Import-Module Deduplication

Enable Data Deduplication:

PS C:\> Enable-DedupVolume E:

Specify the number of days to wait before a new file starts to be processed by the Deduplication scheduler:

PS C:\> Set-DedupVolume E: -MinimumFileAgeDays 20

If you set MinimumFileAgeDays to 0, Deduplication will process all files, regardless of their age. This is suitable for a test environment, where you want to exercise maximum Deduplication. In a production environment, however, it is preferable to wait for a number of days (the default is 5 days) because files tend to change a lot for a brief period of time before the change rate slows.
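
For instance, in a lab you could make every file eligible immediately (again, not something to do in production):

PS C:\> Set-DedupVolume E: -MinimumFileAgeDays 0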

Check that your volume E: is now correctly set:

PS C:\> Get-DedupVolume

Let’s play with it

Alright, that was very easy to set up… We did the job with PowerShell, but you could also do it from Server Manager. A scheduled job has now been created by the system to run the scan… However, I'm very impatient, so I will run a manual scan. To do this:

PS C:\> Start-DedupJob -Volume E: -Type Optimization

For your information, you can launch different kinds of jobs on your drive. Optimization will compress data and remove duplicates, but here are the other choices:

-Type <Type>

Specifies the type of Data Deduplication job. The acceptable values for this parameter are:
  • Optimization (chunk, compress, and deduplicate the in-policy files)
  • Garbage Collection (reclaim the space of chunks that are no longer referenced)
  • Scrubbing (check the integrity of the chunk store and repair corruption where possible)
  • Unoptimization (undo Deduplication and bring the files back to their normal state)
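
For example, after deleting a large batch of files, it is a Garbage Collection job that actually reclaims the space of the now-unreferenced chunks (note that the -Type value is written without a space):

PS C:\> Start-DedupJob -Volume E: -Type GarbageCollection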

Now my dedup job is up and running. :)

Next, it is time to conduct a check-up:

PS C:\> Get-DedupJob

This way I can see my current jobs, whether they are running or queued to run. It is pretty straightforward, but I need to see the progress too. So let's try this:

PS C:\> Get-DedupStatus | fl

With this command I can see the free space, the space saved, the optimized files, the InPolicyFiles count (the number of files that fall within the volume's deduplication policy, based on the defined file age, size, type, and location criteria), and the associated drive identifier… Perfect!
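
If you only care about a few of those fields, you can narrow the output like this (the property names come from the DedupStatus object; adjust them to your needs):

PS C:\> Get-DedupStatus E: | Select-Object Volume, FreeSpace, SavedSpace, OptimizedFilesCount, InPolicyFilesCount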

The last part is to optimize the process (memory used, priority) and to set a schedule that runs the different job types, but that will be for another post. However, if you are too impatient, I invite you to check the TechNet page dedicated to Dedup, where you will find a lot of resources.
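
As a teaser, here is a sketch of what a weekly Garbage Collection schedule could look like with the DedupSchedule cmdlets (the name and timing values are made up; check the TechNet reference for the exact parameter list):

PS C:\> New-DedupSchedule -Name "WeeklyGC" -Type GarbageCollection -Days Saturday -Start 23:00 -DurationHours 5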

 

Enter our "I Wish...Microsoft" campaign today to win a solution to your biggest IT headache! Just stop by booth #211 at the Microsoft Management Summit, Tweet us your wish and use #IwishMSFT, or go on our Facebook page to let us know your wish. Good luck everyone!

 

Tags: application performance management, Microsoft, mms2013, windows server, application monitoring, microsoft server 2012, Monitoring, Powershell, APM