The Channel Guide to Guaranteeing Storage Performance

By Mark Conley

We’re all familiar with the game Tetris. In it, you try to fit each falling block into the best possible slot. Fitting customer applications into the capacity and performance slots of a storage cluster is much the same sort of challenge.

Channel partners know that customers want to deploy more applications, more nimbly. They want infrastructure solutions capable of delivering compute, networking and storage predictably and on demand. They want to dramatically raise operational efficiencies, innovate more quickly and respond to application and business challenges faster than ever.

As a trusted adviser, you can differentiate yourself from the competition by providing this agility.

At the heart of delivering infrastructure on demand, and as a service, is the concept of multi-tenancy. While the opportunity to run a broad array of applications within a single system may sound appealing, the reality can be very different. When a large number of performance-sensitive applications are consolidated onto a single platform (traditional or flash), noisy neighbor applications tend to show up and cause resource contention, unpredictable application performance and unhappy customers.

But with countless vendors and high-performance, all-flash solutions available on the market today, how do you, as a VAR, solution provider or system integrator, find the right one to meet your customer’s need for better performance in a multi-tenant setup?

The answer is simple: Quality of service is king.

QoS is a critical enabling technology for delivering consistent primary storage performance to business-critical applications in an enterprise infrastructure. But raw performance is often not the only objective in these use cases. For a broad range of business-critical applications, consistency and predictability are the more important metrics.

Unfortunately, neither is easily achievable within traditional spinning-disk arrays, or even all-flash arrays. In a multi-tenant environment that is prone to noisy-neighbor issues, it is QoS that provides control over how resources are shared and prevents the most raucous application from disrupting the performance of all the other applications on the same system.

But effective QoS in storage is not very common. Most approaches to storage QoS are “soft,” meaning they are based on simple prioritization of volumes rather than hard guarantees around performance. They are often an afterthought, and as a result, they tend to be effective only as long as the scope of the problem remains small. At scale, they quickly fail. Some of these soft techniques that you, as your clients’ trusted adviser, should be aware of include:

  • Tiered storage can be part of the problem. With multiple tiers of different storage media come different tiers of performance and capacity. But tiering can actually amplify the noisy-neighbor problem: a noisy volume looks hot and is promoted to the higher-performing, scarcer SSDs, displacing other volumes to lower-performing disks.
  • Rate limiting sets a hard limit on an application’s rate of I/O or bandwidth. But this is a one-sided approach that is designed to protect the storage system, rather than deliver guaranteed QoS.
  • Prioritization defines applications as more or less important in relation to one another. While it can give higher relative performance to some apps, it can’t guarantee performance.
  • Hypervisor-based QoS deals more with noisy neighbors than with guaranteeing performance for individual VMs. Because the hypervisor has little visibility into the underlying storage system, it can’t address the core challenges caused by multi-workload environments.
  • Caching reduces contention for a spinning disk. The hottest data is kept in large DRAM or flash-based caches; this method offloads a significant amount of I/O from the disks. But it also causes highly variable latency and makes it impossible to predict the performance of any individual app.
  • Wide striping spreads a single volume across many disks. While this helps balance I/O load across the system, many more applications are now sharing each individual disk. A backlog at any disk causes widespread performance issues, and a single noisy neighbor can ruin the party for everyone.
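To make the limits of rate limiting concrete, here is a minimal Python sketch. It is only a toy model, not any vendor's implementation: the function name `share_with_caps`, the tenant names, the IOPS figures, and the assumption that an oversubscribed system splits its capacity in proportion to capped demand are all hypothetical, chosen for illustration.

```python
def share_with_caps(capacity, demands, caps):
    """Toy model of rate limiting on a shared system (illustrative only).

    Each tenant's demand is first clipped to its cap. If the clipped
    demands still exceed the system capacity, every tenant is scaled
    down by the same factor (an assumed sharing policy).
    """
    capped = {tenant: min(demand, caps[tenant]) for tenant, demand in demands.items()}
    total = sum(capped.values())
    if total <= capacity:
        return capped  # everyone gets what they asked for, up to the cap
    scale = capacity / total
    return {tenant: iops * scale for tenant, iops in capped.items()}


# Hypothetical numbers: a 10,000 IOPS system with an 8,000 IOPS cap per tenant.
demands = {"db": 4000, "noisy": 20000, "web": 3000}
caps = {"db": 8000, "noisy": 8000, "web": 8000}
result = share_with_caps(10000, demands, caps)
# The cap holds "noisy" to 8,000 before sharing, but "db" still ends up
# with roughly 2,667 IOPS, well short of the 4,000 it needs.
```

In this model the cap protects the system from the noisy tenant, yet nothing in it promises the database tenant a floor, which is the one-sidedness described above.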

The limitations of these approaches make it clear: To deliver predictable, guaranteed performance in a multi-tenant environment, QoS has to be a system design goal, considered from the very beginning. Being able to guarantee performance in all situations, including failure scenarios, system overload, variable workloads and elastic demand, requires an architecture built from the ground up specifically to guarantee QoS.

When You Need the Very Best

The right storage architecture can overcome predictability challenges. When a customer needs guaranteed performance, you can deliver by adhering to six core architectural requirements. It may not be an inexpensive setup, but results will be predictably good:

  • An all-SSD architecture. Even if an application doesn’t need the performance of SSD storage, all-flash systems can deliver consistent latency for every I/O. But this is just the beginning, because even an all-flash system can suffer from noisy neighbors and failures.
  • True scale-out architecture. In a traditional scale-up model, a controller is attached to a set of disk shelves. Capacity can be added by adding shelves, but as this happens, contention for controller resources increases, impacting performance. A true scale-out architecture adds controller resources and storage capacity in tandem, thus delivering linear, predictable performance gains as the system scales.
  • RAID-less data protection. RAID causes a significant performance penalty when a disk fails, often 50 percent or more, and rebuilds can take 24 hours or more. RAID-less data protection ensures predictable performance in any failure condition.
  • Balanced load distribution. While today’s storage systems use different techniques, in the end, they often result in uneven load distribution between storage pools, RAID sets and individual disks. If the system can’t even balance the I/O load it has, it can’t guarantee QoS to an individual application as that load changes over time. Only systems with balanced load distributions eliminate hot spots that create unpredictable I/O latency.
  • Fine-grain QoS control. Soft approaches like rate limiting and prioritization may provide a certain level of control, but they fail to ensure application performance in all situations, such as common performance bursts. Fine-grain QoS control guarantees volume performance under any circumstances, eliminating the noisy neighbor problem.
  • Performance virtualization. Modern systems virtualize the underlying raw capacity of their disks. But the performance of the individual volumes carved from that capacity is a second-order effect, determined by a number of variables. That prevents storage systems from delivering any specific level of performance. By enabling performance virtualization, systems provide performance control independent from capacity and on demand.
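One way to picture fine-grain QoS and performance virtualization together is as a per-volume floor and ceiling on IOPS, set independently of capacity. The Python sketch below is a hedged illustration under that assumption. The `Volume` fields, the `allocate` function, the admission check, and the proportional split of spare capacity are hypothetical choices made for this example, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    min_iops: int  # guaranteed floor (assumed QoS setting)
    max_iops: int  # ceiling (assumed QoS setting, at least min_iops)
    demand: int    # IOPS the application is currently requesting


def allocate(capacity, volumes):
    """Serve every volume's guaranteed minimum first, then share the rest."""
    # Admission check: guarantees are only meaningful if they fit the system.
    if sum(v.min_iops for v in volumes) > capacity:
        raise ValueError("guaranteed minimums exceed system capacity")

    # Step 1: each volume gets its demand, up to its guaranteed floor.
    alloc = {v.name: float(min(v.demand, v.min_iops)) for v in volumes}
    spare = capacity - sum(alloc.values())

    # Step 2: spare capacity is split in proportion to unmet demand,
    # never lifting a volume above its ceiling.
    want = {v.name: min(v.demand, v.max_iops) - alloc[v.name] for v in volumes}
    total_want = sum(want.values())
    if total_want > 0:
        scale = min(1.0, spare / total_want)
        for name in alloc:
            alloc[name] += want[name] * scale
    return alloc


# Hypothetical numbers on the same 10,000 IOPS system.
volumes = [
    Volume("db", min_iops=4000, max_iops=6000, demand=4000),
    Volume("noisy", min_iops=1000, max_iops=8000, demand=20000),
    Volume("web", min_iops=2000, max_iops=5000, demand=3000),
]
alloc = allocate(10000, volumes)
# db receives its full 4,000 IOPS; noisy gets 3,625 and web 2,375.
```

With these settings the database volume gets everything it asks for no matter how hard the noisy volume pushes, because guarantees are served before spare capacity is handed out. The admission check is what keeps the guarantee honest: minimums that add up to more than the system can deliver are rejected up front.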

Your customers’ storage teams are tasked with figuring out how to build a flexible, scalable platform that can support multiple workloads while improving operational efficiency. Up until now, storage admins or their MSP partners have spent the bulk of their time tuning, tweaking, planning and troubleshooting storage performance.

It’s time to put an end to this by providing a solution that not only ensures predictable performance for customers with multi-tenant environments, but also helps them gain a competitive edge. That will enable you to become a trusted partner for enterprises going to the cloud and transitioning to the next-generation data center.

Mark Conley is director of channel sales at NetApp SolidFire. Follow him at @ChannelJoe.
