This week at MMS 2013, Microsoft kept coming back to the topic of cost reduction. They cut hardware costs with virtualization, did the same for networking... and now they are trying to cut the cost of storage.
Most of the discussion revolved around SMB storage and Hyper-V. It is not perfect yet, but the recent acquisition of StorSimple makes me think we should see nice improvements before the end of the year.
So for the moment, we have to keep saving storage space on our SANs with Windows Server 2012.
A recent study showed that more than 20% of data is duplicated! Windows Server 2012 lets you get rid of that duplicated data. For example, if two users have the same file, both will still see the file and be able to use it - but it will be stored only once on disk. This “magic” uses variable-size chunking and compression, and it can be applied to your primary data. I ran a test on a SATA drive and an iSCSI drive: it works perfectly, but Microsoft recommends having around 4 GB of memory available.
Microsoft analyzed terabytes of real data from its own internal servers to estimate the savings you should expect if you turned on Deduplication for different types of data. Below you can see the results:
There is a clear return on investment that can be measured in millions of dollars when using Deduplication. The space savings are dramatic and the money saved can be calculated pretty easily when you pay by the gigabyte!
Now let's see how to deploy it. I was very excited to use it on my Cluster Shared Volumes at GSX, but that is not supported… So let's use some iSCSI disks that host home folders instead.
Evaluating Deduplication: Is it interesting for you or not?
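Windows Server 2012 ships an evaluation tool, DDPEval.exe (found in \Windows\System32), that estimates the potential savings on an existing volume or share before you enable anything. As a sketch, point it at the path you want to analyze - the share below is the one evaluated in this post:

```powershell
# Estimate Deduplication savings for a share; the tool only reads the data
DDPEval.exe \\gsxadserver\home
```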
You will obtain something like this:
Data Deduplication Savings Evaluation Tool
Evaluated folder: \\gsxadserver\home
The second option is to use the PowerShell cmdlet Measure-DedupFileMetadata to determine how much disk space could be reclaimed on a volume if you deleted a group of folders, a single folder, or a single file and then ran a Garbage Collection job.
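As a sketch (the folder path below is a hypothetical example; Measure-DedupFileMetadata only works on a volume where Deduplication is enabled):

```powershell
# Estimate how much disk space deleting this folder would actually reclaim
# (on a deduplicated volume, on-disk sizes no longer add up the usual way,
# because many files may share the same chunks)
Measure-DedupFileMetadata -Path "E:\home\archived-users"
```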
Install Deduplication on your Volume
Open Server Manager and add the required roles and features:
From the Add Roles and Features Wizard, under Server Roles, select File and Storage Services (if it has not already been installed).
Select the File Services check box, and then select the Data Deduplication check box.
Click Next until the Install button is active, and then click Install.
Open a PowerShell session as administrator and import the cmdlets needed to manage Deduplication:
PS C:\> Import-Module ServerManager
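If you prefer to skip the wizard entirely, the same ServerManager module can install the feature from PowerShell. A sketch (FS-Data-Deduplication is the feature name in Windows Server 2012):

```powershell
# Install the Data Deduplication feature (equivalent to the wizard steps above)
Add-WindowsFeature -Name FS-Data-Deduplication

# Load the Deduplication cmdlets into the current session
Import-Module Deduplication
```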
Enable Data Deduplication:
PS C:\> Enable-DedupVolume E:
Specify the number of days to wait before a new file becomes eligible for processing by the Deduplication scheduler:
PS C:\> Set-DedupVolume E: -MinimumFileAgeDays 20
If you set MinimumFileAgeDays to 0, Deduplication will process all files, regardless of their age. This is suitable for a test environment, where you want to exercise maximum Deduplication. In a production environment, however, it is preferable to wait for a number of days (the default is 5 days) because files tend to change a lot for a brief period of time before the change rate slows.
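Set-DedupVolume can also tune what gets processed. As a sketch (the folder and file types below are hypothetical examples; the parameter names come from the Set-DedupVolume cmdlet):

```powershell
# Restore the default 5-day age policy
Set-DedupVolume E: -MinimumFileAgeDays 5

# Skip a working folder, plus file types that are already compressed
# and therefore deduplicate poorly
Set-DedupVolume E: -ExcludeFolder "E:\scratch" -ExcludeFileType "zip","mp4"
```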
Check that your volume E: is now correctly configured:
PS C:\> Get-DedupVolume
Let’s play with it

Alright, that was very easy to set up... We did the job with PowerShell, but you could also do it from Server Manager. A job has now been created by the system to run the scan on its schedule... However, I’m very impatient, so I will run a manual scan. To do this:
PS C:\> Start-DedupJob -Volume E: -Type Optimization
For your information, you can launch different kinds of jobs on your drive. Optimization will try to compress files and remove duplicates, but here are your other choices:
Specify the type of Data Deduplication job. The acceptable values for this parameter are:
- Optimization: deduplicates and compresses the files on the volume
- GarbageCollection: reclaims the space held by data chunks that are no longer referenced
- Scrubbing: checks the integrity of the deduplicated data and repairs corruption where possible
- Unoptimization: undoes Deduplication and expands the files back to their original state
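For example, after deleting a large batch of files on a deduplicated volume, the space only comes back once the unreferenced chunks are cleaned up, so you can trigger that job manually (a sketch, using the same volume as above):

```powershell
# Reclaim the space held by chunks that no file references anymore
Start-DedupJob -Volume E: -Type GarbageCollection
```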
Now my dedup job is up and running. :)
Next, it is time to conduct a check-up:
PS C:\> Get-DedupJob
This shows the jobs that are currently running or queued to run. It is pretty straightforward, but I need to see the progress too. So let's try this:
PS C:\> Get-DedupStatus | fl
With this command I can see the free space, space saved, optimized files, InPolicyFiles (the number of files that fall within the volume's deduplication policy, based on the defined file age, size, type, and location criteria), and the associated drive identifier... Perfect!
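If you only want a few of those figures, you can also select specific properties from Get-DedupStatus (property names as reported on Windows Server 2012; worth double-checking on your build):

```powershell
# A compact per-volume savings summary
Get-DedupStatus | Select-Object Volume, FreeSpace, SavedSpace, OptimizedFilesCount, InPolicyFilesCount
```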
The last part is optimizing the process (memory used, priority) and setting up a schedule to run the different jobs. But that will be for another post. If you are too impatient, I invite you to check the TechNet page dedicated to Dedup, where you will find a lot of resources.