Understanding data storage technologies

Advantech Australia Pty Ltd
By Jesse Chuang, Senior Software Manager
Thursday, 07 August, 2014


With the growing amounts of data being stored by industrial organisations today, understanding computer storage technologies is critical to ensuring the best-possible performance and reliability of access for the data stored on those systems.

With the popularity of computers and networks, most enterprises and organisations require storage of databases, documents, files and other types of information. Since IBM introduced the first hard disk drive (HDD) in 1956, disk technology has made great progress and plays an essential role in today’s information technology world.

For enterprise applications, HDDs are used with multiple-user computers running a variety of tasks such as transaction processing databases, computing software and storage management software. Due to fast-growing, data-intensive network services, the number and size of files have been growing at an exponential rate, while information exchange has become more time critical. As a result, existing data storage systems face many challenges: HDDs must operate continuously in demanding environments while delivering the highest-possible performance without sacrificing reliability. Accordingly, choosing the appropriate storage system is key to establishing an effective and flexible storage architecture. This article provides an overview of storage systems to help you understand the related concepts and determine which solution will work best for your business.

Storage system technology trends

To define it simply, a storage system provides the data storage services for a host computing system and generally contains three major parts:

  • A storage medium to store digital data
  • A standard interface to connect to the computing system
  • A storage control unit to manage data transfer and disks

Taking an enterprise-grade storage system as an example, the HDDs are the main components that store the digital data, and the most significant evolution in recent years has been the change in transfer interface from the original parallel bus to a serial bus architecture. SATA (serial advanced technology attachment) and SAS (serial attached SCSI) are now the mainstream disk drive interfaces. Currently SAS-based HDDs dominate enterprise storage solutions, but a growing number of enterprises have started using hard drives with SATA interfaces to reduce implementation costs, since they come at a lower price and their performance has continuously improved (up to 6 Gbps).

In addition, the connecting interface serves as the data access channel between the storage device and the host, and different storage interfaces affect transmission speed and stability. Unlike a direct attached storage (DAS) system, which uses SAS to connect to the host computing system over a limited distance (within 10 metres), a network attached storage (NAS) system, for example, is connected to computers over a network.

As mentioned earlier, enterprises have primarily adopted the HDD as a storage medium. However, an HDD uses a mechanical arm with a read/write head that must move to the right location on a spinning platter to read information. Such mechanical movement is susceptible to vibration and also limits how much HDD access performance can be improved. Consequently, flash memory in the form of solid state disks (SSDs) - with no moving parts - has been welcomed by the market for its durability, speed and low power consumption.

Common storage systems

Using diverse control methods, a storage system is able to provide varied functions. There are four types of storage systems commonly in use, namely JBOD, RAID, NAS and SAN. The following describes these in detail.

As the name implies, JBOD (just a bunch of disks) is an array of hard disks that are merely concatenated together, without any kind of redundancy mechanism. Although it simply presents multiple independent disks without advanced features, many companies still use it to expand storage space, since it provides a large amount of capacity to a host through a single interface at a relatively low cost compared with other system options.

The second is RAID (redundant array of independent disks). The basic idea is to combine multiple inexpensive drives into a disk array so that the resulting performance exceeds that of one expensive, high-capacity drive. To the host operating system, the array appears as a single drive. Depending on the selected level of redundancy, RAID can offer benefits over a single hard disk, such as enhanced fault tolerance and increased data throughput. The different schemes or architectures are named with the word RAID followed by a number; the common levels are RAID-0/1/5/6/10/50/60.
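The two simplest levels illustrate the trade-off. The following Python sketch (not a real RAID driver - disk count and stripe size are arbitrary assumptions for illustration) shows how RAID-0 stripes data across disks for throughput while RAID-1 mirrors it for fault tolerance:

```python
# Illustrative sketch of RAID-0 (striping) and RAID-1 (mirroring)
# over in-memory "disks". Block size and disk count are assumptions.

BLOCK = 4  # bytes per stripe unit (tiny, for illustration only)

def raid0_write(data: bytes, disks: list) -> None:
    """RAID-0: stripe units round-robin across all disks (no redundancy)."""
    for i in range(0, len(data), BLOCK):
        disks[(i // BLOCK) % len(disks)].extend(data[i:i + BLOCK])

def raid1_write(data: bytes, disks: list) -> None:
    """RAID-1: mirror the same data onto every disk (full redundancy)."""
    for d in disks:
        d.extend(data)

stripe = [bytearray() for _ in range(2)]
raid0_write(b"ABCDEFGH", stripe)
# Stripe units alternate: disk 0 holds b"ABCD", disk 1 holds b"EFGH".

mirror = [bytearray() for _ in range(2)]
raid1_write(b"ABCDEFGH", mirror)
# Both disks hold identical copies; either disk can fail without data loss.
```

Real levels such as RAID-5/6 add parity blocks so that redundancy costs less capacity than full mirroring, but the striping and mirroring primitives above are the building blocks of the compound levels (10, 50, 60).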

As previously described, NAS is a network-attached storage system and provides a cost-effective way to add hard disk space to a network. NAS is usually connected to a host system via Ethernet, which is much cheaper than other interfaces and is the standard connection in enterprise network environments. Because the connection is networked and can span long distances (compared with directly attached storage, or DAS), the storage system and host system do not need to be in the same location, which enhances enterprise information security. In addition, the basic unit of storage differs between DAS and NAS: the former transfers data at the block level, while the latter shares data at the file level, so NAS is generally known as file-based storage.

The last is SAN (storage area network), which, like NAS, transfers data between servers and storage devices over a network. With a networked environment, a user can store or back up all data to a storage system located in another building, ensuring that disaster recovery can be done within minutes rather than hours or days, and that information remains accessible after a major problem such as a fire or flood. In contrast to NAS, the host system treats a SAN like a DAS system, because the basic unit of storage is the block. In other words, this block-based storage combines the advantages of DAS and NAS, allowing multiple host systems to use the storage space without file-level limitations. Until recently, most SAN systems used Fibre Channel interfaces, but Ethernet network interfaces for SANs have become increasingly popular due to their competitive price and the widespread use of Ethernet in enterprises and organisations. Ethernet-based SANs are often called ‘IP-SANs’ since they use a network protocol based on IP (Internet Protocol).

The advanced features of SANs

From the above description of the various types of storage systems, you will find that a SAN is often the best choice for organisations with rapidly increasing IT needs - such as video surveillance, broadcasting, medical image processing, cloud computing and big data processing - because it maximises the utilisation of storage equipment.

Thin provisioning

One of the benefits of installing a SAN is better disk utilisation through ‘thin provisioning’, a mechanism that presents virtual-capacity volumes to a host and allows space to be allocated to servers on a just-enough, just-in-time basis. With thin provisioning, enterprises can maximise the value of their storage investment, because blocks are allocated on demand rather than all up front, as in the traditional method. For example, a user who expects to need 1 TB of storage but currently uses only 100 GB of physical disk capacity can be given a 1 TB virtualised disk, with physical capacity added incrementally to the underlying storage pool only when it is actually needed. This flexibility lets users utilise storage far more effectively than hundreds or thousands of partially utilised local disks wasting power and generating heat in the data centre.
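The mechanism can be sketched in a few lines of Python. The class and block counts below are illustrative assumptions, not a real volume manager; the point is that the host sees the full virtual size while physical blocks are allocated only on first write:

```python
# Minimal sketch of thin provisioning: the volume advertises a large
# virtual size, but physical blocks are allocated lazily on first write.

class ThinVolume:
    def __init__(self, virtual_blocks: int):
        self.virtual_blocks = virtual_blocks  # capacity the host is promised
        self.blocks = {}                      # physical blocks, allocated lazily

    def write(self, lba: int, data: bytes) -> None:
        if not 0 <= lba < self.virtual_blocks:
            raise IndexError("write beyond virtual capacity")
        self.blocks[lba] = data               # allocate on demand

    def physical_used(self) -> int:
        return len(self.blocks)               # only written blocks consume space

vol = ThinVolume(virtual_blocks=1_000_000)    # e.g. "1 TB" promised to the host
vol.write(0, b"metadata")
vol.write(42, b"payload")
# The host sees 1,000,000 blocks, yet only 2 are physically allocated.
```

A real array would back `blocks` with a shared physical pool and raise alerts as the pool approaches exhaustion, but the allocate-on-write principle is the same.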

Storage snapshots

Unlike traditional full backups, which may take a long time to complete, a ‘storage snapshot’ takes minimal time, since it uses a set of reference markers or pointers to data stored in the SAN. By implementing copy-on-write on entire block devices and copying changed blocks elsewhere, a storage snapshot preserves a self-consistent past image of the block device. It also consumes little disk capacity, because the storage required to create a snapshot is minimal and grows only with the data that changes. Additionally, if your server runs applications with critical information that must be recovered quickly when a problem occurs, storage snapshots can provide speedy recovery.
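Copy-on-write is easiest to see in miniature. In this hedged sketch (class and method names are invented for illustration), taking a snapshot copies nothing; an original block is preserved only at the moment it is first overwritten:

```python
# Sketch of a copy-on-write snapshot: snapshot creation is near-instant
# because it stores nothing; old blocks are copied out only when changed.

class CowVolume:
    def __init__(self, blocks: dict):
        self.blocks = dict(blocks)             # live block device
        self.snapshots = []                    # list of per-snapshot saved blocks

    def snapshot(self) -> int:
        self.snapshots.append({})              # empty: pointers only
        return len(self.snapshots) - 1

    def write(self, lba: int, data: bytes) -> None:
        for snap in self.snapshots:
            if lba not in snap and lba in self.blocks:
                snap[lba] = self.blocks[lba]   # preserve old data on first change
        self.blocks[lba] = data

    def read_snapshot(self, snap_id: int, lba: int) -> bytes:
        # A snapshot read falls through to the live volume for unchanged blocks.
        return self.snapshots[snap_id].get(lba, self.blocks.get(lba, b""))

vol = CowVolume({0: b"old"})
sid = vol.snapshot()
vol.write(0, b"new")
# The live volume now reads b"new", while the snapshot still sees b"old".
```

This is why a snapshot's disk consumption tracks the rate of change rather than the size of the volume.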

Data deduplication

Data deduplication is a specialised data compression technique that works by eliminating duplicate data so that only one unique instance of each piece of data is retained on physical disks. Since redundant data is replaced with a pointer to the unique copy, deduplication improves storage utilisation; it can also be applied to network data transfers to reduce the number of bytes that must be sent. Combined with traditional compression, deduplication can reduce a file to as little as one-tenth of its original size, saving storage space while preserving data integrity.
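A common way to implement this is content-addressed storage: each block is keyed by a cryptographic hash of its contents, so identical blocks collapse to one physical copy. The sketch below (class names are illustrative assumptions) uses SHA-256 from Python's standard library:

```python
import hashlib

# Sketch of block-level deduplication: each unique block is stored once,
# keyed by its SHA-256 digest; duplicate writes only add a pointer.

class DedupStore:
    def __init__(self):
        self.store = {}   # digest -> the single physical copy of a block
        self.files = {}   # filename -> list of digests (the pointers)

    def put(self, name: str, blocks: list) -> None:
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.store.setdefault(digest, block)  # keep one physical copy
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        # Reassemble the file by following its pointers.
        return b"".join(self.store[d] for d in self.files[name])

s = DedupStore()
s.put("a.txt", [b"hello", b"world"])
s.put("b.txt", [b"hello", b"hello"])  # duplicate blocks share one copy
# Four logical blocks were written, but only two unique blocks are stored.
```

Production systems refine this with variable-length chunking and collision handling, but the hash-and-pointer structure is the core of the technique.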

Scale-out storage

In order to establish a flexible, high-performance storage system, a scale-out architecture - which grows by adding more nodes horizontally - is ideally suited to virtualisation, cloud and big data environments. Scale-out storage differs conceptually from the older scale-up approach, which simply added capacity vertically behind the same head end, so performance degraded significantly under heavy load. In a scale-out environment, by contrast, capacity, computing and network connectivity are scaled together, so performance can remain near linear as more units are added. In addition, the host can freely access any storage space, and the nodes can communicate with each other.
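The reason performance stays linear is that data placement spreads load across every node, so each new node contributes its own controller and network path as well as its disks. The hash-modulo placement below is a deliberate simplification of real cluster schemes (which use consistent hashing or placement maps to avoid mass remapping), used only to show the load-spreading idea:

```python
import hashlib

# Sketch of scale-out placement: objects are spread across all nodes,
# so adding a node adds capacity AND another parallel I/O path.
# Hash-modulo placement is a simplification of real cluster schemes.

def node_for(key: str, num_nodes: int) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes

def distribute(keys: list, num_nodes: int) -> dict:
    placement = {n: [] for n in range(num_nodes)}
    for k in keys:
        placement[node_for(k, num_nodes)].append(k)
    return placement

keys = [f"object-{i}" for i in range(1000)]
for n in (2, 4):
    load = distribute(keys, n)
    print(n, "nodes:", [len(v) for v in load.values()])
# Doubling the node count roughly halves each node's share of the objects,
# and therefore its share of the capacity and I/O load.
```

By contrast, a scale-up array funnels every request through the same head end, which is exactly the bottleneck the scale-out design removes.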

Conclusion

As businesses continue to grow and flourish, their storage systems need to be upgraded, and the rise of emerging applications (such as cloud systems, video-on-demand services, high-definition image files, and real-time, interactive data exchange) makes massive data management a serious challenge. An industrial storage solution is preferable for enterprises, especially in non-consumer applications: it makes it easy to increase capacity with better performance and reliability, maximises storage management efficiency with effortless backup and maintenance, and provides long-term investment protection for your company.


  • All content Copyright © 2024 Westwick-Farrow Pty Ltd