Deduplication best practices

Deduplication is a complex process that depends on many factors.

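The most important factors that influence deduplication speed are the speed of access to the deduplication database, the amount of RAM on the storage node, and the number of deduplicating vaults on the storage node.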

To increase deduplication performance, follow the recommendations below.

Place the deduplication database and deduplicating vault on separate physical devices

To increase the speed of access to a deduplication database, the database and the vault must be located on separate physical devices.

It is best to allocate dedicated devices for the vault and the database. If this is not possible, at least do not place a vault or database on the same disk as the operating system. The operating system performs a large number of hard disk read/write operations, which significantly slows down deduplication.

Selecting a disk for a deduplication database

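The minimum disk space required for the deduplication database can be estimated with the following formula:

S = U / 64 + 10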

where

S – disk size, in GB

U – planned amount of unique data in the deduplication data store, in GB.

For example, if the planned amount of unique data in the deduplication data store is U = 5 TB (5*1024 GB), the deduplication database requires at least the following amount of free disk space:

S = 5*1024 / 64 + 10 = 90 GB
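The same estimate can be scripted when sizing several storage nodes. The following is a minimal Python sketch of the formula above; the function name is illustrative and is not part of the product.

```python
def dedup_db_disk_gb(unique_data_gb: float) -> float:
    """Estimate the minimum disk size (GB) for the deduplication database.

    Implements S = U / 64 + 10, where U is the planned amount of
    unique data in the deduplication data store, in GB.
    """
    if unique_data_gb < 0:
        raise ValueError("unique_data_gb must be non-negative")
    return unique_data_gb / 64 + 10


# 5 TB of unique data, expressed in GB as in the example above
print(dedup_db_disk_gb(5 * 1024))  # 90.0
```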

Selecting a disk for a deduplicating vault

To prevent data loss, we recommend using RAID 10, 5 or 6. RAID 0 is not recommended because it is not fault tolerant. RAID 1 is not recommended because of its relatively low speed. There is no preference between local disks and SAN; both are suitable.

8 GB of RAM per 1 TB of unique data

This is a recommendation for a worst-case scenario. You do not need to follow it if you do not experience deduplication performance problems. However, if deduplication runs too slowly, check the Occupied space parameter of the deduplicating vault. Adding more RAM to the storage node can significantly increase deduplication speed.

In general, the more RAM you have, the larger the deduplication database can grow while deduplication speed stays the same.
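As a rough illustration of the 8 GB per 1 TB guideline, the following Python sketch converts a planned amount of unique data into the worst-case RAM figure. The function and constant names are illustrative only.

```python
GB_RAM_PER_TB_UNIQUE = 8  # worst-case guideline stated in this section


def recommended_ram_gb(unique_data_tb: float) -> float:
    """Worst-case RAM (GB) for a given amount of unique data (TB)."""
    if unique_data_tb < 0:
        raise ValueError("unique_data_tb must be non-negative")
    return GB_RAM_PER_TB_UNIQUE * unique_data_tb


print(recommended_ram_gb(5))  # 40
```

Because this is a worst-case figure, a storage node with less RAM may still perform acceptably.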

Only one deduplicating vault on each storage node

We highly recommend creating only one deduplicating vault on a storage node. Otherwise, the available RAM is divided among the vaults, so each vault gets only a share of it.
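To see why this matters, the following Python sketch shows how the RAM share per vault shrinks as vaults are added. It assumes an even split of RAM between vaults and reuses the worst-case guideline of 8 GB of RAM per 1 TB of unique data; both function names are illustrative, and the even split is an assumption rather than a documented allocation rule.

```python
def ram_per_vault_gb(total_ram_gb: float, vault_count: int) -> float:
    """RAM share per vault, ASSUMING an even split between vaults."""
    if vault_count < 1:
        raise ValueError("vault_count must be at least 1")
    return total_ram_gb / vault_count


def unique_data_covered_tb(ram_gb: float, gb_ram_per_tb: float = 8) -> float:
    """Unique data (TB) covered by a RAM share at the worst-case guideline."""
    return ram_gb / gb_ram_per_tb


# Compare one, two and four vaults on a node with 32 GB of RAM
for vaults in (1, 2, 4):
    share = ram_per_vault_gb(32, vaults)
    print(vaults, share, unique_data_covered_tb(share))
```

Under these assumptions, a node with 32 GB of RAM covers 4 TB of unique data with one vault, but only 2 TB per vault with two vaults.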

64-bit operating system

For optimal deduplication performance, install the storage node on a 64-bit operating system. The machine should not run applications that require significant system resources, such as Database Management Systems (DBMS) or Enterprise Resource Planning (ERP) systems.

Multi-core processor with at least 2.5 GHz clock rate

We recommend using a processor with at least four cores and a clock rate of at least 2.5 GHz.

Sufficient free space in the vault

Indexing a backup requires as much free space in the vault as the backed-up data occupies immediately after it is saved to the vault. Without compression or deduplication at source, this equals the size of the original data backed up during that backup operation.
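The rule above can be turned into a simple pre-backup check. The following Python sketch assumes the vault is reachable as a locally mounted path, which may not match your setup; the function names and the path are hypothetical.

```python
import shutil


def indexing_space_ok(free_bytes: int, stored_backup_bytes: int) -> bool:
    """True if the vault has at least as much free space as the backup
    occupies right after being saved to the vault."""
    return free_bytes >= stored_backup_bytes


def vault_free_bytes(vault_path: str) -> int:
    """Free space on the volume holding vault_path.

    ASSUMPTION: the vault is a locally mounted directory.
    """
    return shutil.disk_usage(vault_path).free


# Hypothetical usage: a 200 GB backup saved without compression
# or deduplication at source
backup_size = 200 * 1024**3
print(indexing_space_ok(vault_free_bytes("."), backup_size))
```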

High-speed LAN

A 1-Gbit LAN is recommended. It allows the software to perform 5-6 backups with deduplication in parallel without a considerable drop in speed.

Back up a typical machine before backing up several machines with similar contents

When backing up several machines with similar contents, back up one machine first and wait until indexing of its backed-up data is complete. Because the first machine's backup has been indexed, most of the data is already in the deduplication data store, so the other machines are backed up faster.

Back up different machines at different times

If you back up a large number of machines, spread the backup operations out over time. To do this, create several backup plans with different schedules.
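One way to plan such a spread is to assign start times to groups of machines before creating the backup plans. The following Python sketch is only a planning aid: the machine names, group size and gap between groups are hypothetical, and the actual schedules are still configured in the backup plans themselves.

```python
from datetime import datetime, timedelta


def stagger_start_times(machines, first_start, gap, group_size=1):
    """Assign a start time to each machine so that at most `group_size`
    machines share a slot and consecutive slots are `gap` apart."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    schedule = {}
    for index, name in enumerate(machines):
        slot = index // group_size  # which group this machine falls into
        schedule[name] = first_start + gap * slot
    return schedule


# Hypothetical machine names; two machines per slot, 30 minutes apart
plan = stagger_start_times(
    ["web01", "web02", "db01", "db02", "file01"],
    first_start=datetime(2024, 1, 1, 22, 0),
    gap=timedelta(minutes=30),
    group_size=2,
)
for name, start in plan.items():
    print(name, start.strftime("%H:%M"))
```

Each distinct start time in the result corresponds to one backup plan with its own schedule.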

Configure alert notifications

We recommend configuring the "Vaults" alert notification in the management server options. This helps you react promptly to abnormal situations. For example, a timely reaction to a "There is a vault with low free space" alert can prevent an error during the next backup to the vault.