Inline deduplication is a technique that removes redundant components of data before writing it to a storage device. Eliminating duplicate pieces reduces the storage space requirements without compromising the safety of the data.
As the amount of data continues to grow, the demand for more storage space, data center capacity, cooling, network bandwidth, and other resources increases. In addition, this growth adds operational complexity, administration time, and risk. Consequently, ensuring data security and compliance becomes costly and challenging.
According to IDC, multiple duplicate copies of content account for about 75% of the data in storage today. As such, removing the redundancies can help organizations reduce their storage needs and costs, and this is where deduplication comes in.
Basically, deduplication (or dedupe) is the technique of eliminating duplicate components of data before backing it up or writing it to primary storage.
Currently, the five main types of deduplication are:
Although the outcome depends on the environment, inline deduplication is often more efficient and economical than the post-process technique for some applications. However, the savings it achieves depend on the type of files, frequency of backups, environment, and other variables. Typical solutions can cut storage needs by a factor of 10 to 30, which translates to lower drive capacity and bandwidth requirements.
Generally, reducing the data footprint has benefits such as smaller data center space and savings on hardware, software, bandwidth, and power.
How does inline deduplication work?
The technique compares new data with what is in the storage device and only writes unique parts of the content. If there are matching pieces, it does not write the data again but adds a pointer to the existing data in the storage media.
The deduplication software breaks data sets into smaller parts and then uses hashing algorithms to compute an identifying fingerprint for each piece, whether a file, block, or byte range. Using smaller data pieces delivers better reduction and storage efficiency.
When there is new data to write, the algorithm first checks whether the hash identifier already exists in storage and writes only the unique parts. If there is a match, it does not write the data again but instead adds a pointer to the existing piece on the backup drive.
For example, if a file is 100% original, the system copies everything to the backup device. However, if an identical file already exists on the backup, the system does not copy it again; instead, it records a pointer, or placeholder, in a hash table.
When restoring, the system follows the pointers in the hash table to retrieve the stored pieces and reassemble the content.
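The write-and-restore flow described above can be illustrated with a minimal sketch. It assumes fixed-size 4 KiB chunks, SHA-256 fingerprints, and an in-memory hash table; real products typically use content-defined chunking and persistent indexes, so treat the class and its names as illustrative only.

```python
import hashlib


class InlineDedupStore:
    """Minimal sketch of inline deduplication with fixed-size chunks.

    The chunk size, hash choice, and in-memory dictionaries are
    assumptions for illustration, not any specific product's design.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # hash -> chunk bytes (the "storage device")
        self.files = {}    # filename -> ordered list of hashes (pointers)

    def write(self, name, data):
        """Split data into chunks; write only chunks not already stored."""
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk   # unique piece: write it
            pointers.append(digest)           # duplicate: pointer only
        self.files[name] = pointers

    def restore(self, name):
        """Follow the pointers to reassemble the original content."""
        return b"".join(self.chunks[h] for h in self.files[name])


store = InlineDedupStore()
payload = b"A" * 8192            # two identical 4 KiB chunks
store.write("a.bin", payload)
store.write("b.bin", payload)    # fully duplicate file: no new chunks
assert store.restore("b.bin") == payload
print(len(store.chunks))         # 1 unique chunk stored for 16 KiB of data
```

Note how the second, identical file adds only pointers: the chunk store still holds a single 4 KiB block, which is exactly the space saving inline deduplication promises.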
The removal of duplicates happens before the system writes the data to the disk, and this may slow down the backup process. However, eliminating redundant content reduces the amount of data to write, and the overall delay may be insignificant.
Benefits of inline deduplication
The benefits specific to inline processing include:
Hardware vs. software inline deduplicating appliance
The choice between hardware and software deduplication depends on the environment as well as the current backup software and configuration. While older storage systems need additional software and configuration, modern hardware such as flash arrays often comes with built-in inline deduplication. If you have a system without the built-in option, you can extend its capabilities by inserting an inline deduplicating appliance in front of the existing legacy storage array.
Plug-and-play hardware appliances with built-in deduplicating capabilities provide faster processing and are easy to add. However, scalability is usually a challenge, and they sometimes require complex integration with existing infrastructure.
On the other hand, powerful Intel processors now enable software-based solutions to deliver strong performance without compromising on speed. The software approach, such as the Altaro backup solutions and others, has lower overhead, costs less, is more flexible, scales easily to the petabyte level, and is ideal for virtual and cloud environments.
When do you use inline deduplication?
Although inline deduplication is one of the major data reduction techniques, it is not suitable for every application. For example, it delivers negligible savings for largely unique or already-compressed content such as engineering test data, music, video, and X-ray data.
The technology may not be the best fit for every environment, so below are some areas where it works well.
As an example, imagine your organization has about 500 virtual machines running the same operating system. In such a case, each instance of the OS consists of identical blocks. With inline deduplication, you only need to write each block once instead of 500 times.
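The arithmetic behind that example is straightforward. The sketch below assumes a hypothetical 20 GiB OS image per virtual machine, purely for illustration:

```python
# Hypothetical figures: 500 VMs sharing one identical OS image.
vms = 500
os_image_gib = 20                       # assumed size of the OS image

without_dedupe = vms * os_image_gib     # every VM stores its own copy
with_dedupe = os_image_gib              # identical blocks written once

savings_ratio = without_dedupe / with_dedupe
print(without_dedupe, with_dedupe)      # 10000 GiB vs 20 GiB
print(f"{savings_ratio:.0f}:1")         # 500:1 for the OS blocks alone
```

Real-world ratios are lower because each VM also holds unique data, but the shared OS blocks illustrate why virtualized environments deduplicate so well.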
Another application where the technology delivers huge savings is email archiving. For example, instead of storing a copy of an attachment for every user, the technology writes only one copy to the backup storage media.
Applications in hyper-converged infrastructure (HCI) appliances and virtual desktop environments
Most HCI vendors prefer inline deduplication to optimize internal storage. Compared to post-process deduplication, inline offers better performance, reduces storage capacity requirements, and decreases drive wear. Usually, HCI appliances can accommodate only a limited number of physical disks, and removing duplicate data helps to optimize the limited storage space.
Inline deduplication is also suitable for VDI storage, which has always been a challenge. Performance is usually the priority when deploying virtual desktop environments, so providers often use expensive, high-performance storage. By reducing the data footprint, you can efficiently use the limited capacity that these expensive but fast drives offer without spending more on extra drives.
Deduplicating inline for primary storage
Although most organizations use inline dedupe for backup or on secondary disks, it is also applicable to primary storage. This is especially useful when you want to take advantage of fast but expensive flash memory.
Unfortunately, flash storage usually costs so much that you may not be able to justify purchasing larger capacities. Eliminating duplicate information, however, lets you save space and enjoy the high speeds with a better return on investment.
In some applications, inline deduplication can level the capacity playing field between low-cost traditional storage arrays and high-performance but costly all-flash arrays. For example, at ratios of 8:1 to 10:1, a 10-terabyte all-flash array has the potential to store as much logical data as an 80 to 100 TB conventional array.
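The effective-capacity calculation works out as follows; the raw capacity and ratio are the illustrative figures from the example above:

```python
# Effective (logical) capacity at a given deduplication ratio.
raw_tb = 10           # physical all-flash capacity in the example
dedupe_ratio = 10     # assumed 10:1 data reduction

effective_tb = raw_tb * dedupe_ratio
print(effective_tb)   # 100 TB of logical data on 10 TB of flash
```

At an 8:1 ratio the same arithmetic yields 80 TB, which is where the 80 to 100 TB range comes from.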
Data volumes continue to grow at a faster rate than the drop in the price of storage. Yet, there is a need to look for ways to reduce the storage costs without sacrificing the security and quality of the data.
One of the most effective techniques is inline deduplication, which removes duplicate pieces before writing data to the drive. Consequently, downstream operations such as backup, archiving, replication, and network transfers benefit from the lower data footprint.