Big Data Storage & Backup Solutions: On-premises & Cloud-based

Big data is commonly defined, following Gartner, as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

Regardless of industry, enterprises rely on data and cannot tolerate data loss, and enterprise big data is no exception. Acquiring efficient, cost-effective storage and backup solutions for big data can strengthen enterprise productivity and profitability.

Big data storage requires the capacity and processing capability to support IOPS-intensive workloads. Several kinds of storage solutions address these requirements: on-premises solutions, cloud-based solutions, and hybrid storage solutions that combine the two.

Which storage solution is best for your enterprise? That depends on your data requirements. Each enterprise has unique data storage needs, so no single solution can effectively address every kind of enterprise storage requirement. In other words, one size doesn't fit all.

On-premises Big Data Storage

There are a number of choices for on-premises big data storage: enterprise NAS, object-level storage, hyper-scale storage and hyper-converged storage.

As mentioned earlier, the right choice depends on the enterprise's data requirements.

Enterprise NAS Storage Solution

The first option is the enterprise Network Attached Storage (NAS) solution. These appliances use file-level storage, and their capacity can be increased in two ways. Conventional NAS appliances scale up: disk drives are added to existing nodes. This increases storage capacity, but the nodes' controllers stay the same, so performance does not keep pace and performance per terabyte declines. Some vendors instead offer scale-out NAS, where whole nodes are added. Scaling out not only adds storage capacity but, because each node brings its own processing resources, also improves performance roughly in proportion to the number of nodes.

For instance, suppose the storage currently uses two nodes and a third node is added. The workload that was initially distributed between two nodes is now distributed among three, so aggregate compute capability grows from roughly 2x to 3x that of a single node, assuming the workload spreads evenly.
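To make the scale-up versus scale-out difference concrete, here is a toy Python model. It assumes, purely for illustration, that each node's controller caps throughput at a fixed IOPS figure and that every drive holds 10 TB; the class name `NasCluster` and all numbers are hypothetical, not vendor figures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NasCluster:
    """Toy NAS model; every default below is a hypothetical figure."""
    nodes: int
    disks_per_node: int
    tb_per_disk: float = 10.0      # assumed drive size
    iops_per_node: int = 50_000    # assumed per-node controller ceiling

    @property
    def capacity_tb(self) -> float:
        return self.nodes * self.disks_per_node * self.tb_per_disk

    @property
    def total_iops(self) -> int:
        # Throughput is bounded by the controllers, so it grows with
        # the number of nodes, not the number of disks.
        return self.nodes * self.iops_per_node

    @property
    def iops_per_tb(self) -> float:
        return self.total_iops / self.capacity_tb

    def scale_up(self, extra_disks_per_node: int) -> "NasCluster":
        # Add drives to the existing nodes only.
        return replace(self, disks_per_node=self.disks_per_node + extra_disks_per_node)

    def scale_out(self, extra_nodes: int) -> "NasCluster":
        # Add whole nodes, each with its own controller and drives.
        return replace(self, nodes=self.nodes + extra_nodes)


base = NasCluster(nodes=2, disks_per_node=12)
up = base.scale_up(6)    # 2 nodes x 18 drives
out = base.scale_out(1)  # 3 nodes x 12 drives

for label, c in (("base", base), ("scale-up", up), ("scale-out", out)):
    print(f"{label:9} {c.nodes} nodes  {c.capacity_tb:.0f} TB  "
          f"{c.total_iops:,} IOPS  {c.iops_per_tb:.0f} IOPS/TB")
```

Both expansions end at the same 360 TB, but under these assumptions only scaling out raises aggregate IOPS (1.5x when going from two nodes to three), while scaling up dilutes IOPS per terabyte.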

Object-Level Storage

Another storage solution is object-level storage. It is sometimes grouped with Storage Area Network (SAN) appliances, but a SAN provides block-level storage, whereas object stores are usually accessed through HTTP-based APIs. Object storage replaces the tree-like hierarchy of file-level storage with a flat namespace in which each object is located by a unique ID rather than a directory path. This makes handling high volumes of big data much easier than with a hierarchical structure.
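A minimal sketch of a flat namespace, assuming an in-memory Python dictionary stands in for the object store: every object gets a random UUID as its ID, and retrieval is a single key lookup rather than a walk down a directory tree. `FlatObjectStore` is a hypothetical name; real object stores add durability, access control and a network API on top of this idea.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Hypothetical in-memory store: one flat namespace, no directories."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, data: bytes, **metadata: str) -> str:
        # Each object is addressed by a unique ID instead of a path.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        # One key lookup, no matter how many objects are stored.
        return self._objects[object_id]


store = FlatObjectStore()
oid = store.put(b"sensor reading", source="plant-7", kind="telemetry")
print(oid, store.get(oid).metadata)
```

Descriptive metadata travels with each object instead of being encoded in a folder path, which is part of what makes the flat model easier to grow.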

Object storage products are increasingly deployed in big data analytics environments because they handle very large data volumes efficiently.

Hyper-scale Storage Solution

The defining attribute of hyper-scale storage is its sheer size and its ability to keep growing almost without limit. In a data center, hyper-scale storage typically operates at petabyte scale. Hyper-scale storage differs from conventional enterprise storage in several ways; most obviously, it is orders of magnitude larger (petabytes versus terabytes).

Hyper-scale storage supports IOPS-intensive workloads because it serves millions of users with just a few applications, whereas traditional storage supports fewer users running more applications. Social media platforms, webmail services, financial service providers and large government agencies use hyper-scale storage.

The downside of hyper-scale storage is that it offers a minimal feature set and may lack redundancy, because its purpose is to maximize raw storage space while reducing cost.
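The trade-off between raw space and redundancy is simple arithmetic. The sketch below assumes full replication (every object stored `replicas` times) and decimal units (1 PB = 1,000 TB); the 12 PB raw figure is hypothetical.

```python
def usable_capacity_pb(raw_pb: float, replicas: int) -> float:
    """Usable space when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_pb / replicas


RAW_PB = 12.0  # hypothetical raw capacity

for replicas in (1, 2, 3):
    usable = usable_capacity_pb(RAW_PB, replicas)
    # Decimal units assumed: 1 PB = 1,000 TB.
    print(f"{replicas} cop{'y' if replicas == 1 else 'ies'}: "
          f"{usable:.1f} PB ({usable * 1000:,.0f} TB) usable")
```

Each extra copy buys redundancy at the expense of usable space, which is why a design aimed purely at maximizing raw capacity at low cost may keep redundancy to a minimum.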

An advantage of hyper-scale storage is that it tends to be software-defined, relying on automation rather than direct human involvement. This optimizes data storage and, with fewer manual steps, reduces the probability of errors.

Hyper-converged Storage Solution

A hyper-converged appliance is a physical unit that combines storage, compute, virtualization and sometimes networking. Hyper-converged appliances scale out horizontally by adding more nodes. This lets IT administrators build a distributed storage infrastructure from the Direct Attached Storage (DAS) components in each physical server; these DAS components are combined into a logical pool of disk capacity.

Hyper-converged storage is software-defined: each node in a cluster runs a layer of virtualization software. These layers virtualize the resources of each node and share them with the other nodes in the cluster, creating a single storage and compute pool accessible through a single interface.
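The pooling idea can be sketched in a few lines of Python, assuming each node simply reports its free DAS capacity. `Node`, `StoragePool` and the greedy most-free-first placement are hypothetical simplifications; real hyper-converged platforms also replicate data across nodes and handle failures.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    das_free_tb: float  # free space on this server's direct-attached disks


class StoragePool:
    """Hypothetical logical pool built from every node's DAS."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes

    @property
    def free_tb(self) -> float:
        # The pool presents the sum of all nodes' free DAS as one figure.
        return sum(n.das_free_tb for n in self.nodes)

    def add_node(self, node: Node) -> None:
        # Scaling out: the new node's DAS joins the shared pool.
        self.nodes.append(node)

    def allocate(self, size_tb: float) -> dict[str, float]:
        """Carve a volume from the pool, filling the emptiest nodes first."""
        if size_tb > self.free_tb:
            raise ValueError("not enough free capacity in the pool")
        placement: dict[str, float] = {}
        remaining = size_tb
        for node in sorted(self.nodes, key=lambda n: n.das_free_tb, reverse=True):
            if remaining <= 0:
                break
            take = min(node.das_free_tb, remaining)
            if take > 0:
                node.das_free_tb -= take
                placement[node.name] = take
                remaining -= take
        return placement


pool = StoragePool([Node("node-a", 20.0), Node("node-b", 15.0)])
pool.add_node(Node("node-c", 10.0))  # scale out by one node
print(pool.free_tb)
placement = pool.allocate(30.0)
print(placement, pool.free_tb)
```

The caller asks the pool for capacity and never needs to know which server's disks end up holding the volume, which is the "single interface" the paragraph describes.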

On-premises Big Data Backup

On-premises backup comes in many shapes and sizes. Because big data involves high volumes processed at high velocity, on-premises backup infrastructure can address these requirements effectively. By matching backup placement to the kind of data being backed up, enterprises can make efficient use of their backup storage space.

For instance, tier 1 data requires higher IOPS than tier 2 and tier 3 data. Keeping tier 1 data on-premises is therefore good practice, as it improves productivity and keeps important data highly available. However, tier 2 and tier 3 data make up most enterprise data. Because these tiers are less important and accessed less frequently, keeping all of their backups on on-premises infrastructure incurs avoidable costs.

In this case, it is better to combine the on-premises backup infrastructure with cloud-based services. By distributing backup data across local infrastructure and the cloud, enterprises can use their storage space efficiently and make the entire backup process more cost-efficient.
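A tier-based placement policy like the one described above might look as follows. The `Dataset` fields, the read-frequency threshold and the example datasets are all hypothetical; the point is only that tier 1 (or frequently read) backups stay on-premises while the bulk of tier 2 and tier 3 data goes to the cloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    tier: int             # 1 = most critical, 3 = least critical
    size_tb: float
    reads_per_month: int


def backup_target(ds: Dataset, hot_reads: int = 100) -> str:
    """Hypothetical policy: tier 1 or frequently read data stays local."""
    if ds.tier == 1 or ds.reads_per_month >= hot_reads:
        return "on-premises"
    return "cloud"


def capacity_split(datasets: list[Dataset]) -> dict[str, float]:
    # Total backup capacity needed at each location.
    split = {"on-premises": 0.0, "cloud": 0.0}
    for ds in datasets:
        split[backup_target(ds)] += ds.size_tb
    return split


datasets = [
    Dataset("orders-db", tier=1, size_tb=5.0, reads_per_month=2000),
    Dataset("monthly-reports", tier=2, size_tb=20.0, reads_per_month=30),
    Dataset("old-logs", tier=3, size_tb=80.0, reads_per_month=1),
]
print(capacity_split(datasets))
```

With these made-up inputs, only a small fraction of the backup volume needs local capacity, which is the cost argument the paragraph makes.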

Big Data Storage & Backup in the Cloud

The cloud's greatest strength is its versatility. Cloud-based services are now a must-have for any enterprise, including for big data storage and backup.

As mentioned earlier, big data storage requires both storage capacity and processing. The cloud meets the capacity requirement with storage services that scale easily, and the same platforms can also meet the computational requirements of big data. In fact, cloud-based analytics is frequently recommended for big data analysis because of the compute capacity the cloud provides.

For big data storage, providers such as Amazon, Microsoft and Google offer many options. For instance, enterprises can archive data at very low cost in Amazon S3 Glacier (formerly Amazon Glacier). For hot and cool data there are storage tiers such as the Hot and Cool access tiers of Azure Blob Storage, and Amazon S3 Standard and S3 Standard-IA (Infrequent Access). Services such as Amazon Macie and Azure confidential computing are also promising because they support data security and privacy. These cost-effective enterprise cloud storage solutions let enterprises make efficient use of big data in the cloud.
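As an illustration of matching data temperature to these tiers, the function below maps days since last access to an Amazon S3 storage class and an Azure Blob Storage access tier. The identifiers `STANDARD`, `STANDARD_IA` and `GLACIER` are real S3 storage-class values, and `Hot`, `Cool` and `Archive` are real Azure access tiers, but the 30- and 180-day thresholds are assumptions; check each provider's pricing and minimum storage durations before adopting them.

```python
def choose_tiers(days_since_last_access: int) -> tuple[str, str]:
    """Return (S3 storage class, Azure Blob access tier).

    The thresholds are hypothetical and should be tuned per workload.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access cannot be negative")
    if days_since_last_access < 30:
        return "STANDARD", "Hot"
    if days_since_last_access < 180:
        return "STANDARD_IA", "Cool"
    return "GLACIER", "Archive"


for days in (3, 90, 400):
    print(days, choose_tiers(days))
```

With boto3, the chosen S3 value could be passed as the `StorageClass` argument of `put_object`; with the Azure Storage SDK for Python, a blob's tier can be changed with `BlobClient.set_standard_blob_tier`.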

By using cloud backup for big data, enterprises can take advantage of geo-redundant services that ensure high availability and data recovery. Enterprises cannot tolerate the loss of backed-up data; that is why they back it up in the first place. In the cloud, backups can be replicated across multiple data centers, so they are not kept in a single location, which adds another layer of protection. Service providers also protect data backed up to the cloud with encryption, both in transit and at rest.
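Geo-redundant replication with an integrity check can be sketched with the standard library alone. Here each region is modeled as a plain dictionary, and a SHA-256 digest recorded at backup time is used to find which regional copies are still intact. The region names and the `replicate_backup` and `healthy_regions` helpers are hypothetical, and encryption is left out for brevity.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def replicate_backup(backup: bytes, regions: dict[str, dict[str, bytes]], key: str) -> str:
    """Copy one backup to every region and return its digest."""
    for store in regions.values():
        store[key] = backup
    return sha256_hex(backup)


def healthy_regions(regions: dict[str, dict[str, bytes]], key: str, digest: str) -> list[str]:
    """Regions whose copy exists and still matches the original digest."""
    return [name for name, store in regions.items()
            if key in store and sha256_hex(store[key]) == digest]


regions: dict[str, dict[str, bytes]] = {"region-east": {}, "region-west": {}, "region-eu": {}}
digest = replicate_backup(b"nightly backup contents", regions, "nightly.tar")

regions["region-east"]["nightly.tar"] = b"corrupted"  # simulate a damaged copy
print(healthy_regions(regions, "nightly.tar", digest))
```

Because the same backup lives in several locations, losing or corrupting one copy still leaves intact copies to restore from.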

Summary

Big data storage and backup are questions most enterprises grapple with. Enterprises can store and back up big data both on-premises and in the cloud. On-premises, they can use file-level storage (NAS), object-level storage, hyper-scale storage and/or hyper-converged storage.

On-premises backup solutions consist of backup appliances that support IOPS-intensive workloads. Enterprises with research environments that emphasize data accessibility should use backup appliances for their lower latency.

Cloud-based data storage and backup emphasize cost without compromising on computation. Enterprises can scale out almost without limit and use additional resources at competitive cost to meet their big data storage and backup requirements. Providers such as Amazon, Microsoft and Google offer services designed to address these requirements. For more on this topic, read our related article on the future of IoT and big data.

Category: B2B News
