Fans of the TV series Mr. Robot may have seen the Season 2 premiere, which centered on a ransomware attack initiated by the hacker group fsociety. The target was Evil Corp, whose logo resembles the old Enron logo (which may have been the creator's intent), and the company found itself caught in a crippling hack.
Before we move any further, here’s a quick overview of Mr. Robot for those who are not familiar with the TV series.
Airing on USA Network, Mr. Robot is an American drama–thriller television series created by Sam Esmail. It stars Rami Malek as Elliot Alderson, a cybersecurity engineer and hacker who suffers from social anxiety disorder and clinical depression. Alderson is recruited by an insurrectionary anarchist known as "Mr. Robot", played by Christian Slater, to join a group of hacktivists. The group aims to cancel all debts by attacking the large corporation Evil Corp.
Ransomware: Reel and Real Life Threat to Data
Now if you have yet to make the connection between Reduxio and Mr. Robot, let me help connect the dots. Clearly, Mr. Robot is a fictitious story. However, as it is often stated, art can sometimes mimic life. In the storyline we are following, ransomware has infected the entire banking network of Evil Corp.
In a ransomware attack, the infectious code holds a company and its valuable data hostage by encrypting the data with strong cryptography. As the name implies, the attackers then demand large sums of money as ransom for decrypting the data and restoring it to its original state. Ransom is often demanded in Bitcoin. In Evil Corp's case, the demand was $5.9M USD.
In real life, not all ransomware threats command such a high price, but the effect on the victims is always devastating. I recall a customer of ours here at Reduxio whose company had such a high transaction workload that it couldn't afford even a 5-minute RTO. Over that window, the business stood to lose $100K per second.
This is analogous to an HPC environment or Wall Street. Just imagine how many transactions are lost if a company can’t recover its data down to the last second.
Data Recovery by BackDating™
Some may try to recover data using snapshot technology, but by definition, restoring from a snapshot or "instant replay" is not instant if it can only deliver an RTO of 1 to 5 minutes. Today's e-commerce industry mandates an RTO of one second or less. If this can't be achieved, companies lose a lot of money. It could even be a Resume Producing Event, or RPE, for the person responsible. If that's you, good luck updating your LinkedIn profile.
Going back to our client’s case mentioned above, Reduxio was able to get the company's data restored to one second before the viral attack, with zero data loss.
Just recently, I had a conversation with another Reduxio customer who also happens to be a fan of Mr. Robot. He quickly pointed out that if Evil Corp had a Reduxio system in place and implemented BackDating™, USA Network wouldn’t have much of an episode left. That particular storyline would have abruptly ended with Reduxio restoring Evil Corp data.
Now we’ll let the show’s producers do what they want with their narrative, but in real life, this is exactly how the story would end. Well, that is, if you have Reduxio as your primary storage device, and BackDating™ technology as your go-to recovery option.
So just what is this BackDating™ technology?
Whenever I'm asked to explain BackDating™ and how it benefits customers or mitigates risk, I find it important to first examine what's been used historically: point-in-time copies, split-mirror functionality, and the various forms of snapshot technology still found in today's systems. Let's take a quick look at these processes first.
There are a few practices that need to be observed when using point-in-time copies. For one, the data volumes being backed up must be stable during the backup operation, meaning the data must not be allowed to change. If this is not enforced, inconsistent or corrupt data can result. For example, in an image-level backup, the resulting (restored) file system might contain corrupt file tables because of incomplete metadata updates. In a file-system-level backup, the data in a file or across several files could be inconsistent.
There are techniques you can use to obtain consistent backups. First, the application servers must be told to flush their buffers and pause their data updates at a consistent state. To support this, database applications provide methods to quiesce the application, and some storage manufacturers offer API call-outs to the hardware to accelerate and ensure consistency within a database. Then a quick point-in-time copy of the data is taken. As soon as this point-in-time copy is captured, the application servers can be released to resume normal I/O operations. The backup can then be performed from the point-in-time copy.
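The quiesce-copy-release sequence above can be sketched in a few lines of Python. This is a minimal toy model with invented names (`quiesce`, `point_in_time_copy`, and so on are illustrative, not any real storage API); the point is that applications are paused only for the instant of the copy, not for the whole backup.

```python
class Volume:
    """Toy stand-in for a storage volume; names are illustrative only."""
    def __init__(self):
        self.blocks = {}

def quiesce(app):
    # Ask the application to flush buffers and pause writes at a
    # consistent state (real systems use e.g. database hot-backup
    # modes or VSS writers for this).
    app["writes_paused"] = True

def resume(app):
    app["writes_paused"] = False

def point_in_time_copy(volume):
    # The copy itself must be near-instant; here we just record
    # a frozen view of the block map.
    return dict(volume.blocks)

# Typical sequence: pause I/O only for the instant of the copy.
app = {"writes_paused": False}
vol = Volume()
vol.blocks = {0: b"hdr", 1: b"data"}

quiesce(app)
snapshot = point_in_time_copy(vol)
resume(app)

vol.blocks[1] = b"new-data"    # later writes do not affect the copy
assert snapshot[1] == b"data"
```

The backup job then reads from `snapshot` at its leisure while the application keeps writing to the live volume.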
There are two main techniques for performing point-in-time copies: split mirror copies and copy-on-write snapshots. Let’s discuss these two further.
Backups Using Split Mirrors
Split-mirror functionality is generally provided by disk array subsystems. Split mirroring is also possible using logical volume manager software in the hosts or somewhere in the storage network. However, subsystem-based split-mirror copies are, by design, the most capacity-hungry approach, since they require a full second copy of the data.
The figure below illustrates the steps involved in a typical split-mirror operation.
- A mirror copy of the source volume is created.
- Once the mirror is started, every update to the original volume is also reflected to the mirror volume.
- Just before the backup is started, the application servers are rendered quiescent (paused), and the mirror between the two copies is broken.
- After the two mirrors are split, the server is free to use and update the original volume. The backup operation can now be performed using the mirror copy.
- After the backup operation is complete, the mirror copy should be resynchronized with the original volume. This means that the write operations must be logged while the mirror is broken and writes must be applied to the mirror copy after the mirror is reestablished. Data structures must be in place to denote which blocks are different between the two copies. In some implementations, it is also possible to perform the synchronization from the mirror copy back to the original volume.
- Finally, after the resynchronization, the mirror operations continue normally until the next split operation.
Besides serving as a backup source, mirrors are a useful tool for high availability and disaster recovery. However, the original volume is not protected after the mirror is split. To solve this problem, three-way mirroring is sometimes used: when one of the mirrors is split, at least two synchronized copies still exist at all times.
Backups Using Snapshots
Another point-in-time copy generation technique makes use of snapshots. Snapshots create almost instant copies of the source volume at the virtual level. When the data is updated in the original volume, algorithms are used to preserve the data in the snapshot copy (Brown).
One technique used in creating a snapshot is called copy-on-write. As the name implies, the original data is copied to the snapshot volume only when portions of it are about to be overwritten. The figure below illustrates this more clearly.
Original Image Source: SnapMirror and SnapRestore: Advances in snapshot technology
Before the snapshot is taken, the servers are quiesced for a short time and a snapshot index is generated. At this point, both the original volume and the snapshot volume have pointers to the original data blocks, making the copies exactly the same as expected. This operation is generally fast, and server operations resume immediately.
When a write operation is pending, that operation waits until the original data is copied to a new block. Then, the snapshot pointers are updated to point to the new block with the original data. Once the copy is done, the write operation is allowed to complete, changing the original block. This is shown in the third step in the figure above. These operations ensure that the data pointed to by the snapshot indexes is always the data that existed at the point of the snapshot.
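The copy-on-write mechanism described above can be modeled in a few lines of Python. This is an illustrative sketch, not any real array's design: the first overwrite of a block copies the old data aside (the copy-on-write penalty), and the snapshot view always resolves to the data as it existed at snapshot time.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot; illustrative names only."""
    def __init__(self, volume):
        self.volume = volume   # block -> data (the live volume)
        self.saved = {}        # original data, preserved on first overwrite

    def write(self, block, data):
        # First overwrite of a block copies the old data aside;
        # this extra copy is the write penalty of copy-on-write.
        if block in self.volume and block not in self.saved:
            self.saved[block] = self.volume[block]
        self.volume[block] = data

    def read_snapshot(self, block):
        # Snapshot view: preserved copy if the block changed, else live data.
        return self.saved.get(block, self.volume.get(block))

vol = {0: "orig0", 1: "orig1"}
snap = CowSnapshot(vol)
snap.write(0, "new0")
assert snap.read_snapshot(0) == "orig0"   # snapshot still sees old data
assert vol[0] == "new0"                   # live volume sees new data
```

Note that the snapshot consumes extra space only for blocks that have actually changed, which is why snapshot space usage tracks the change rate.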
Instead of copy-on-write, you can use a technique called redirect-on-write, where the original block is left in the snapshot copy and a new block is used for the updated data. This removes the copy delay required by the copy-on-write method. However, redirect-on-write changes the layout of the original file blocks. When the snapshot is removed, the blocks should be copied back to their original locations. So copy-on-write is not really eliminated; it is just deferred.
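For contrast, here is the same toy treatment of redirect-on-write (again an illustrative sketch with invented names): writes go straight to new locations with no copy delay, and the deferred cost appears only when the snapshot is deleted and the redirected blocks are folded back into the original layout.

```python
class RowSnapshot:
    """Toy redirect-on-write snapshot; illustrative names only."""
    def __init__(self, blocks):
        self.blocks = blocks       # original layout, frozen at snapshot time
        self.live = dict(blocks)   # live view; redirected writes land here

    def write(self, block, data):
        # No copy of old data: the write is simply redirected to a new
        # location, and the original block stays in place for the snapshot.
        self.live[block] = data

    def delete_snapshot(self):
        # Deferred cost: fold the redirected blocks back so the live
        # data occupies the original layout again.
        self.blocks.clear()
        self.blocks.update(self.live)

s = RowSnapshot({0: "orig"})
s.write(0, "new")
assert s.blocks[0] == "orig" and s.live[0] == "new"   # no copy on write
s.delete_snapshot()
assert s.blocks[0] == "new"                           # copy cost paid here
```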
When a backup is required, it can safely use the snapshot copy, which is guaranteed to be consistent. After the backup operation is complete, the snapshot volume and the blocks in it can be removed without affecting the original volume. Alternatively, the snapshot copy can be reserved as an online copy. Several snapshot implementations allow instant recovery without going to the backup tapes.
A snapshot volume still depends on the original volume for most of its data blocks. The snapshot space only contains the blocks that have been updated since the snapshot time. The blocks that are not updated are still in the original volume. Therefore, it is generally not advisable to rely solely on the snapshots. Snapshots can be used to recover earlier versions of files and deleted files.
Snapshots also have frequency limitations: most systems require a minimum of 5 minutes between snapshots, and in practice no more than a few snapshots a day are configured. Given the prevalence of threats like ransomware, and the millions of transactions that can take place in an application or HPC environment during the window between snapshots, you have to ask: how much data will be lost, and at what cost?
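A quick back-of-envelope calculation makes the snapshot-gap exposure concrete. All figures here are illustrative assumptions, not measurements from any particular customer.

```python
# Worst case for the window between snapshots (illustrative numbers).
snapshot_interval_s = 5 * 60    # snapshots 5 minutes apart
transactions_per_s = 10_000     # assumed transaction rate
value_per_txn = 2.50            # assumed dollars per transaction

lost_transactions = snapshot_interval_s * transactions_per_s
lost_value = lost_transactions * value_per_txn
print(f"Up to {lost_transactions:,} transactions "
      f"(${lost_value:,.0f}) can fall in a single snapshot gap.")
# Up to 3,000,000 transactions ($7,500,000) can fall in a single snapshot gap.
```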
The space required for snapshots is proportional to the volume of data that has been updated since the last snapshot was taken. If this ratio is small, the space used for snapshots is insignificant. But the inverse is also true: the "snapshot reserve space" is consumed quickly as the differentials grow, even to the point of consuming the entire array's capacity. This happens across all volumes on a given array, and expanding volumes will cause the system to request still more space. To prevent any one volume from consuming all the available space, the snapshot feature is often simply turned off to conserve primary storage capacity.
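To see how quickly a reserve can fill, consider this small calculation with assumed (purely illustrative) numbers for volume size, change rate, and reserve:

```python
# Snapshot space grows with the change rate since the last snapshot.
# All numbers below are illustrative assumptions.
volume_size_gb = 1000
daily_change_rate = 0.20    # 20% of the volume rewritten per day
reserve_gb = 100            # pre-allocated snapshot reserve space

changed_gb_per_day = volume_size_gb * daily_change_rate   # 200 GB/day
days_until_full = reserve_gb / changed_gb_per_day
print(f"A {reserve_gb} GB reserve fills in about "
      f"{days_until_full:.1f} days at this change rate.")
# A 100 GB reserve fills in about 0.5 days at this change rate.
```

At a 20% daily change rate, even a generous reserve lasts only hours, which is exactly why administrators end up disabling snapshots.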
Virtual Disk Service (VDS) and Volume Shadow Copy Service (VSS) are Microsoft features that streamline the process of quiescing applications, taking snapshots, and backing up data. These interfaces provide hooks to coordinate the applications, the file system, TimeOS, and array subsystems.
Reduxio BackDating™ Technology
Now that we have a firmer grasp of the underlying processes behind point-in-time copy generation, let’s go back to Reduxio’s BackDating™ technology and see how it edges out dated point-in-time technologies.
BackDating™ can instantly recover data from any second with no pre-allocation of snapshot reserve space. It eliminates the need to configure, schedule, and manage snapshots and consistency groups as required in any conventional storage platform. These traditional platforms may use terms like instant replay and snapshots, all of which are simply descriptions or marketing names for obsolete forms of data protection.
Reduxio reduces the additional capacity required for BackDating™ by implementing global deduplication and compression. BackDating™ uses unique, proprietary metadata structures to represent the physical deduplicated and compressed blocks and the logical references to those blocks, both by offset and, more importantly, by time of I/O.
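To give a flavor of what time-indexed, deduplicated metadata could look like, here is a toy Python model. This is purely an illustration of the general idea of keying logical references by offset and timestamp over a deduplicated chunk store; it is in no way Reduxio's actual proprietary data structure.

```python
import hashlib

class TimeIndexedStore:
    """Toy sketch of time-indexed, deduplicated block metadata.
    Purely illustrative; not Reduxio's actual design."""
    def __init__(self):
        self.chunks = {}   # content hash -> block data (stored once: dedupe)
        self.index = {}    # (volume, offset) -> list of (timestamp, hash)

    def write(self, volume, offset, data, ts):
        h = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(h, data)   # each unique block stored once
        self.index.setdefault((volume, offset), []).append((ts, h))

    def read_as_of(self, volume, offset, ts):
        # Latest version at or before ts. No data is copied at write
        # time, so any second of history is directly addressable.
        versions = self.index.get((volume, offset), [])
        matching = [h for (t, h) in versions if t <= ts]
        return self.chunks[matching[-1]] if matching else None

s = TimeIndexedStore()
s.write("vol1", 0, b"v1", ts=100)
s.write("vol1", 0, b"v2", ts=200)
assert s.read_as_of("vol1", 0, 150) == b"v1"   # recover to any second
assert s.read_as_of("vol1", 0, 250) == b"v2"
```

Because reads simply resolve through the index, "restoring" to a past second is a metadata operation rather than a data copy, which is the property the paragraph above describes.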
BackDating™ also has no performance impact, because no data is copied or moved. And unlike snapshots or instant replays, no metadata has to be copied or moved via post-processing such as data progression on conventional storage subsystems. In other words, the fastest way to move data is not to move it.
BackDating™ operates globally across all volumes while Reduxio delivers the highest QoS. Because that quality of service is assured, Reduxio neither requires nor offers the ability to disable BackDating™.
- Always-on history. History is always tracked; unlike copy-on-write snapshots, there is no schedule. Data is always recoverable and application-consistent.
- Recover to any second. Instantaneous recovery in place to any second in the past without moving data.
- Clone from any second. Instantaneous cloning for the creation of multiple versions of applications and data for development and test.
- Multi-level, independent thin clones. Thin, virtual writable clones that do not depend on the source volume.
- Bookmarks. Enables marking of specific timestamps for future recovery and cloning, or to provide a mount point/volume for any tape backup provider.
- Granular history management. Global and per volume management of history expiration times.
- Automatic consistency. Cross-volume BackDating™ to the same timestamp is inherently consistent without any configuration.
- Application support. Reduxio BackDating™ supports the recovery and cloning of platforms and applications.
- Reduxio StorApp for VMware vSphere. Storage management console for VMware virtualized infrastructure that provides integrated configuration and datastore recovery.
- Reduxio StorKit for Microsoft Windows Server. Native support for Microsoft backup application using a Reduxio VSS provider integrated with BackDating™.
In summary, traditional backups optimized with the features of Reduxio’s BackDating™ technology provide instantaneous restore and independent backup volumes for your backup provider.