Optimal architectures for intelligent storage systems

09 Dec.,2024

 

Brute force has been the critical lever for advancing storage technology for much of its history. Making storage devices bigger, faster and denser has worked well and will no doubt continue to contribute to improved storage systems, but it is no longer the best way to improve storage costs and performance. Instead, machine learning and analytics in the form of intelligent storage systems now drive the most important advances in storage technology.

What is an intelligent storage system?

Pre-intelligent storage systems optimize for low-level operations of a storage device, such as reading from an SSD or sending a packet to a network interface. Intelligent storage systems operate at higher levels of abstraction by using data about device operations to improve performance within a device and observing data utilization patterns to optimize system-level operations.

Intelligence is incorporated into storage systems at three distinct levels: device-level optimizations, tiered storage and data lifecycle management, and data accessibility support. In the case of device-level optimizations, machine learning algorithms identify categories of data with similar access patterns. If a machine learning model predicts a set of blocks will likely be read in the future, those blocks can be copied into a cache prior to the time they are read.

This reduces read latency, which is particularly important for use cases like training machine learning models that require not only large volumes of data, but data delivered fast enough to keep up with the processing speed of GPUs and other accelerators.
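The prefetching idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the `SuccessorPredictor` below is a hypothetical stand-in for a trained model that simply counts which block tended to follow which, and `PrefetchingCache` is an assumed name for an unbounded in-memory cache in front of a dictionary that plays the role of the device.

```python
from collections import Counter, defaultdict


class SuccessorPredictor:
    """Toy stand-in for a learned model: counts which block followed which."""

    def __init__(self):
        self._followers = defaultdict(Counter)
        self._last = None

    def observe(self, block):
        # Record that `block` was read right after the previously seen block.
        if self._last is not None:
            self._followers[self._last][block] += 1
        self._last = block

    def predict(self, block, top_n=1):
        # Return the blocks most often seen immediately after `block`.
        followers = self._followers.get(block)
        if not followers:
            return []
        return [b for b, _count in followers.most_common(top_n)]


class PrefetchingCache:
    """Unbounded cache (no eviction) that prefetches predicted blocks."""

    def __init__(self, backing_store, predictor):
        self.store = backing_store  # dict: block id -> bytes (the "device")
        self.predictor = predictor
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[block] = self.store[block]
        data = self.cache[block]
        # Update the predictor, then copy likely next blocks into the cache.
        self.predictor.observe(block)
        for nxt in self.predictor.predict(block):
            if nxt not in self.cache:
                self.cache[nxt] = self.store[nxt]
        return data
```

If the predictor has first observed a trace in which block 0 is followed by block 1 and block 1 by block 2, reading blocks 0, 1 and 2 through a fresh cache incurs one miss and two hits, because each read pulls the predicted successor into the cache ahead of time.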

Tiered storage and data lifecycle management operate on larger volumes of data than device-level optimizations. Device-level optimizations target the optimal placement of soon-to-be-used data and watch for potential failures, whereas tiered storage is designed to optimize the placement of data for long-term storage. For example, recently generated time-series data is more likely to be queried than older time-series data. Tiered storage intelligence can detect these different access patterns and place likely-to-be-accessed data on low-latency but higher-cost storage, while migrating older data to a less expensive storage platform.

As the volume and variety of data grow, it becomes increasingly difficult to find specific data. Of course, searching for a specific file by name or restoring a set of archive files created at a particular time are straightforward operations. But not all data access requirements are so simple. For example, engineers may find intelligent machine learning models performing poorly in production because they have not been trained on a sufficiently broad training set.

They would likely want to search storage systems looking for data with specific characteristics. Intelligent indexing, tagging and retrieval are essential to implementing this use case.
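As a rough illustration of tag-based retrieval, the sketch below keeps an inverted index from tags to object names and answers queries that require several tags at once. The class name `TagIndex`, the object names and the tag vocabulary are all hypothetical; how tags get assigned (manually or by a model) is outside the scope of the example.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index: tag -> set of object names carrying that tag."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, object_name, *tags):
        for t in tags:
            self._by_tag[t].add(object_name)

    def find(self, *tags):
        # Return the objects that carry every requested tag.
        if not tags:
            return set()
        matches = [self._by_tag.get(t, set()) for t in tags]
        return set.intersection(*matches)


index = TagIndex()
index.tag("run-001.parquet", "sensor:lidar", "weather:rain")
index.tag("run-002.parquet", "sensor:lidar", "weather:clear")
index.tag("run-003.parquet", "sensor:camera", "weather:rain")
rainy_lidar = index.find("sensor:lidar", "weather:rain")
```

An engineer looking for under-represented training data could then ask for, say, all lidar recordings made in rain rather than scanning files by name.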

Features of intelligent storage systems

Intelligence evolved in a variety of ways in living creatures, and a similar diversity of approaches is found in intelligent storage systems. Some of the most common features across intelligent storage systems are the use of predictive analytics, distributed storage and processing, optimal data placement and enhanced security.

Some intelligent systems use predictive analytic techniques. These are essentially statistical and machine learning methods that detect patterns in device operations data and use those patterns to predict how data stored on the device will be accessed.

For device-level optimizations, this kind of analysis depends on I/O trace logs, which are records of each operation on a device. Statistical techniques do not need to understand the meaning of the data to the user -- it makes no difference if a sales transaction or a sensor on a vehicle generated the data. What matters is the characteristics of the data, such as where it is located and when it was written. Clustering algorithms are typically used for this kind of predictive analytics.
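A small sketch of this kind of analysis is shown below, under several assumptions: the trace is a list of `(timestamp, block_id, op)` tuples, the two features per block (access count and fraction of reads) are chosen only for illustration, and the clustering is a bare-bones k-means with a deterministic starting point. A production system would normalize features and likely use a library implementation.

```python
from collections import defaultdict
from math import dist


def block_features(trace):
    """Map each block to (access_count, read_fraction).

    `trace` is assumed to be an iterable of (timestamp, block_id, op)
    tuples, with op being "R" or "W". The feature choice is illustrative.
    """
    counts = defaultdict(int)
    reads = defaultdict(int)
    for _timestamp, block, op in trace:
        counts[block] += 1
        if op == "R":
            reads[block] += 1
    return {b: (float(counts[b]), reads[b] / counts[b]) for b in counts}


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are evenly spaced sorted points."""
    ordered = sorted(points)
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_blocks(trace, k=2):
    """Return {block_id: cluster_index} for the blocks seen in the trace."""
    features = block_features(trace)
    centroids = kmeans(list(features.values()), k)
    return {
        b: min(range(k), key=lambda i: dist(f, centroids[i]))
        for b, f in features.items()
    }
```

Note that nothing in the code looks at block contents. Blocks that are read often end up in one cluster and rarely touched blocks in another, which is the kind of grouping a caching or placement policy can act on.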

Intelligent storage systems can span multiple devices. An IoT sensor can send data to an edge device for storage and further analysis. The edge device can perform a preliminary analysis of the data to determine which data should be stored locally and which should be sent to a centralized analysis system. For instance, an anomalous set of measurements from a sensor may indicate a problem that needs to be addressed immediately. That data is then sent directly to the next stage of the ingestion process, while other data is simply stored and sent in batches at a later time.
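The edge-side decision described here might look like the following sketch. It assumes a simple z-score test against a baseline sample as the anomaly check, which is only one of many possible detectors, and the names `EdgeRouter`, `sent_now` and `sent_batches` are hypothetical placeholders for whatever transport the real system uses.

```python
from statistics import mean, pstdev


class EdgeRouter:
    """Forward anomalous readings immediately; batch everything else."""

    def __init__(self, baseline, z_threshold=3.0, batch_size=100):
        self.mu = mean(baseline)
        self.sigma = pstdev(baseline) or 1e-9  # avoid division by zero
        self.z_threshold = z_threshold
        self.batch_size = batch_size
        self.sent_now = []      # stands in for the immediate upstream send
        self.sent_batches = []  # stands in for deferred batch uploads
        self._pending = []

    def ingest(self, value):
        z = abs(value - self.mu) / self.sigma
        if z > self.z_threshold:
            # Anomalous: send straight to the next ingestion stage.
            self.sent_now.append(value)
            return
        # Normal: store locally and ship once a full batch has accumulated.
        self._pending.append(value)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self.sent_batches.append(self._pending)
            self._pending = []
```

With a baseline of temperature readings around 20, a reading of 35 would be forwarded at once while readings of 20.5 and 19.5 would wait for the next batch.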

Intelligent systems can also manage data across different storage tiers. Recent time-series data is much more likely to be queried than older data, so it should be kept in low-latency storage while older data can be moved to lower-tier storage that may have longer latency but also costs less. This kind of data lifecycle management needs to be automated, as the volume and types of data that require this sort of management are not amenable to manual management. At best, humans may define high-level lifecycle management policies, but we will depend on intelligent storage systems to implement those policies across a range of data sets and use cases.
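One way to picture the split between human-defined policy and automated execution is the sketch below. The policy is plain data (age thresholds mapped to tier names), and a function turns it into a migration plan. The tier names, the 7-day and 90-day thresholds and the object names are assumptions made up for the example, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Rule:
    max_age: timedelta  # data accessed within this window qualifies
    tier: str


# Hypothetical policy, ordered from hottest to coldest tier.
POLICY = [
    Rule(timedelta(days=7), "hot"),
    Rule(timedelta(days=90), "warm"),
]
DEFAULT_TIER = "cold"


def choose_tier(last_accessed, now, policy=POLICY, default=DEFAULT_TIER):
    """Pick the first tier whose age window still covers the data."""
    age = now - last_accessed
    for rule in policy:
        if age <= rule.max_age:
            return rule.tier
    return default


def plan_migrations(objects, now, policy=POLICY):
    """objects: {name: (current_tier, last_accessed)} -> [(name, src, dst)]."""
    moves = []
    for name, (tier, last_accessed) in sorted(objects.items()):
        target = choose_tier(last_accessed, now, policy)
        if target != tier:
            moves.append((name, tier, target))
    return moves
```

Changing the policy then means editing the `POLICY` list; the planning code applies it uniformly to every data set without manual intervention.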

Enhanced security is another feature of intelligent storage. Applications and services will often have predictable patterns of CPU utilization, IOPS and other commonly monitored metrics. Variations from baseline operations can indicate a potential security problem but are not uncommon, so users will need to tune intelligent storage systems to detect variations that are indicative of a potential failure, security breach or other threat to operations. For example, machine learning engineers can collect performance data generated during controlled ransomware attacks to build models that can predict when an attack has started.
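The tuning problem mentioned above can be illustrated with a simple baseline monitor. This is a sketch under stated assumptions, not a ransomware detector: it tracks a rolling window of IOPS samples, flags samples that deviate by more than a configurable number of standard deviations, and only raises an alert after several consecutive deviations so that one-off spikes are ignored. The class name and all default values are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class BaselineMonitor:
    """Alert only after several consecutive out-of-baseline samples."""

    def __init__(self, window=20, min_samples=5, z_threshold=3.0, consecutive=3):
        self.history = deque(maxlen=window)  # recent in-baseline samples
        self.min_samples = min_samples       # warm-up before any checks
        self.z_threshold = z_threshold       # tunable sensitivity
        self.consecutive = consecutive       # tunable persistence
        self._streak = 0

    def observe(self, iops):
        """Return True when an alert should be raised for this sample."""
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history) or 1e-9  # avoid division by zero
            if abs(iops - mu) / sigma > self.z_threshold:
                # Deviating samples are not folded into the baseline.
                self._streak += 1
                return self._streak >= self.consecutive
        self._streak = 0
        self.history.append(iops)
        return False
```

Raising `z_threshold` or `consecutive` trades detection speed for fewer false alarms, which is the kind of adjustment users would make for their own workloads.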

These features -- as described -- are task-specific, such as optimizing data placement. But, in practice, organizations implement them across a range of infrastructure components. For this reason, it is important to understand the overall architecture of intelligent storage systems.

The architecture of intelligent storage systems

Storage systems, including those of the intelligent variety, are part of a larger infrastructure that includes servers, networks and the storage systems themselves. Intelligent storage systems have four main components: a front end, a cache, a back end and a persistent store, such as a disk or SSD.

The primary role of the front end is to communicate with the storage system network. Front ends are made up of ports and controllers. Front-end ports enable host servers to connect to the storage system. Ports are designed to support transport protocols, such as SCSI and Fibre Channel. Front-end controllers are responsible for routing data to and from the cache. They are also responsible for optimizing I/O operations, which is typically done using command queueing algorithms that optimize the order in which I/O commands are executed.
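To make command reordering concrete, here is a sketch of the classic elevator approach for a rotating disk: serve the pending requests ahead of the head in ascending order, then sweep back through the rest. This is a textbook algorithm used for illustration; the source does not say which algorithm any given controller uses, and the logical block addresses below are made up.

```python
def elevator_order(pending_lbas, head):
    """Order requests: sweep upward from the head, then back down."""
    ahead = sorted(lba for lba in pending_lbas if lba >= head)
    behind = sorted((lba for lba in pending_lbas if lba < head), reverse=True)
    return ahead + behind


def total_seek(order, head):
    """Sum of head movements needed to serve requests in the given order."""
    distance = 0
    for lba in order:
        distance += abs(lba - head)
        head = lba
    return distance


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # arrival order
start = 53                                   # current head position
fifo_cost = total_seek(queue, start)
reordered_cost = total_seek(elevator_order(queue, start), start)
```

For this queue, serving requests in arrival order moves the head 640 blocks in total, while the reordered sequence needs 299.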

Cache is low-latency memory used to reduce the time required to perform I/O operations, at least from the perspective of the application or service using the storage system. Caches store both data and information about the location of data in the cache and on disk.

Different strategies are used to manage data in a cache depending on what aspect of I/O operations should be optimized. With a write-back cache, for example, data is written to the cache and an acknowledgment is sent to the application. After the acknowledgment is sent, the data is written to disk. This minimizes the time the application waits for an acknowledgment, but at the risk of losing data if the cache should fail before writing the data to disk. Under the write-through strategy, data is written to the cache and immediately written to disk. This reduces the risk of data loss, but at the expense of longer latencies.
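The difference between the two strategies is mostly a question of when the disk write happens relative to the acknowledgment. The sketch below models the disk as a dictionary; the class names and the `"ack"` return value are assumptions for illustration only.

```python
class WriteThroughCache:
    """Data reaches the disk before the write is acknowledged."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.disk[key] = value  # persisted first: slower, but safe
        return "ack"


class WriteBackCache:
    """Write is acknowledged from cache; the disk is updated later."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()  # keys cached but not yet on disk

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)
        return "ack"  # fast, but the data is lost if the cache fails now

    def flush(self):
        # Destage all dirty entries to disk.
        for key in list(self.dirty):
            self.disk[key] = self.cache[key]
        self.dirty.clear()
```

Between `write` and `flush`, a write-back cache holds the only copy of the data, which is exactly the window of risk described above.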

The back end is the interface between the cache and persistent storage, such as HDDs or SSDs. Like the front end, back ends consist of ports and controllers. Disks are connected to ports, and controllers manage the read and write operations to the disk. Intelligence in the back end includes error detection and correction.
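As a simplified picture of back-end error detection, the sketch below stores a CRC32 checksum next to each block and verifies it on read. It covers detection only; correction would need redundancy such as parity or an error-correcting code, which is not shown. The function names and the `ChecksumError` exception are hypothetical, and the disk is again modeled as a dictionary.

```python
import zlib


class ChecksumError(Exception):
    """Raised when stored data no longer matches its checksum."""


def write_block(disk, lba, data):
    # Store the payload together with its CRC32 checksum.
    disk[lba] = (data, zlib.crc32(data))


def read_block(disk, lba):
    # Recompute the checksum and compare before returning data.
    data, stored_crc = disk[lba]
    if zlib.crc32(data) != stored_crc:
        raise ChecksumError(f"block {lba} failed checksum verification")
    return data
```

A controller that detects a mismatch this way can refuse to return corrupted data and fall back to a redundant copy if one exists.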

While intelligence is important for optimizing operations within a storage system, to globally optimize storage and compute operations one needs to consider storage in the broader context of the full range of infrastructure that is in place. Today, that includes both on-premises infrastructure and cloud infrastructure.

Intelligent storage in hybrid cloud infrastructure

Potentially the best use of intelligence in storage management is to optimize storage for application workloads across hybrid cloud infrastructure.

Applications depend heavily on databases to manage structured data. Data-aware devices can use data about application-level operations to improve the storage performance of those databases. For example, a data-aware device can use data generated by a database management system to predict which data will be needed in the future. But applications and databases are rarely used in isolation. Instead, they are often parts of larger workflows.

The top level of storage intelligence is the ability to monitor workloads and optimize storage across multiple storage devices by using information collected across compute, networking and storage devices. This level of intelligence can help control cloud storage costs, which are often difficult to manage and too often lead to unexpected charges.

Improving storage performance and cost-effectiveness is no longer a matter of improving hardware. The most important advances are now being driven by intelligent software that spans low-level I/O operations on a single device all the way to monitoring workloads and optimizing data placement across hybrid cloud infrastructure. How intelligent is your storage infrastructure?

Explain the components of Intelligent Storage System and ...

An intelligent storage system consists of four key components: front end, cache, back end, and physical disks. Figure 4-1 illustrates these components and their interconnections. An I/O request received from the host at the front-end port is processed through cache and the back end, to enable storage and retrieval of data from the physical disk. A read request can be serviced directly from cache if the requested data is found in cache.

Intelligent storage systems generally fall into one of the following two categories:

  • High-end storage systems.

  • Mid-range storage systems.

Traditionally, high-end storage systems have been implemented with active-active arrays, whereas mid-range storage systems, typically used in small and medium-sized enterprises, have been implemented with active-passive arrays. Active-passive arrays provide storage solutions at lower cost. Enterprises make use of this cost advantage and implement active-passive arrays to meet specific application requirements such as performance, availability and scalability. The distinctions between these two implementations are becoming increasingly insignificant.

High-end storage systems, referred to as active-active arrays, are generally aimed at large enterprises for centralizing corporate data. These arrays are designed with a large number of controllers and cache memory. An active-active array implies that the host can perform I/Os to its LUNs across any of the available paths (see Figure 4-7).

To address enterprise storage needs, these arrays provide the following capabilities:

  • Large storage capacity.

  • Large amounts of cache to service host I/Os optimally.

  • Fault-tolerant architecture to improve data availability.

  • Connectivity to mainframe computers and open systems hosts.

  • Availability of multiple front-end ports and interface protocols to serve a large number of hosts.

  • Availability of multiple back-end Fibre Channel or SCSI RAID controllers to manage disk processing.

  • Scalability to support increased connectivity, performance and storage requirements.

  • Ability to handle large amounts of concurrent I/Os from a number of servers and applications.

  • Support for array-based local and remote replication.

In addition to these features, high-end arrays possess some unique features and functionality required for mission-critical applications in large enterprises.

Mid-range storage systems are also referred to as active-passive arrays and are best suited for small and medium-sized enterprises. In an active-passive array, a host can perform I/Os to a LUN only through the paths to the owning controller of that LUN. These paths are called active paths. The other paths are passive with respect to this LUN. As shown in Figure 4-8, the host can perform reads or writes to the LUN only through the path to controller A, as controller A is the owner of that LUN. The path to controller B remains passive and no I/O activity is performed through this path.
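The path rules for the two array types can be summarized in a short sketch. The `Path` type, the `usable_paths` function and the path and controller names are hypothetical; real multipathing software also handles failover, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    controller: str  # controller this path terminates on


def usable_paths(paths, lun_owner, mode):
    """Return the paths a host may use for I/O to a given LUN."""
    if mode == "active-active":
        # Any available path may carry I/O to the LUN.
        return list(paths)
    if mode == "active-passive":
        # Only paths to the controller that owns the LUN are active.
        return [p for p in paths if p.controller == lun_owner]
    raise ValueError(f"unknown array mode: {mode}")


paths = [Path("path-1", "A"), Path("path-2", "B")]
active_only = usable_paths(paths, lun_owner="A", mode="active-passive")
all_paths = usable_paths(paths, lun_owner="A", mode="active-active")
```

With controller A as the LUN owner, the active-passive array exposes only the path to controller A for I/O, while the active-active array allows both paths.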

Mid-range storage systems are typically designed with two controllers, each of which contains host interfaces, cache, RAID controllers, and disk drive interfaces.

Mid-range arrays are designed to meet the requirements of small and medium enterprises; therefore, they have less storage capacity and global cache than active-active arrays. There are also fewer front-end ports for connection to servers. However, they ensure high redundancy and high performance for applications with predictable workloads. They also support array-based local and remote replication.
