Cisco: Memory disaggregation is perhaps the most exciting trend in computer system design today

Future of Hardware Blog Series – Blog #2

In the first blog of this three-part series, I discussed the rapid growth in key performance metrics for CPU/GPU cores, memory, network throughput, storage capacity, and peripheral interconnects. Those numbers speak to the resources the IT industry is willing to bring to bear on the explosion of workloads and applications driven by the widespread adoption of cloud, mobile, and social technologies. In this second blog in the Future of Hardware series, I look at how memory disaggregation is changing the game in high-performance system design.

Simply pouring more resources into difficult, complex challenges is often an ineffective solution: if those resources are not used intelligently and efficiently, the result is waste and excess. Internal storage in computer systems is a relevant example. From the early days of computer system design, it was obvious that built-in storage was a required component. It made sense: computers and their applications generate a lot of data, and that data has to be stored and retrieved somewhere, right? The natural place to host that storage seemed to be as close as possible to the applications themselves.

But not all computer systems (whether servers or laptops) are built and used the same way. So in an environment with multiple computing systems, such as a rack or a data center, internal storage usage naturally varied from host to host. Some hosts ran hot and needed storage capacity upgrades; on others, storage sat largely idle. One of the best solutions was to disaggregate storage into external pools of disks and arrays, which sparked a boom in external storage innovation, first by hard disk array vendors such as EMC and NetApp, and later by a number of flash/SSD array vendors such as Pure Storage. Pooling and sharing storage also required other innovations, such as intelligent partitioning, snapshotting, and RAID striping. Shared external storage has also simplified data backups and disaster recovery/business continuity, among many other benefits.

The flip side: disaggregating memory

What if we could do the same with computer memory to address the complexities of high-performance computing? As with internal storage, computer system design has always taken for granted that memory should be tightly coupled to the CPU. Look at any motherboard and you'll see just how much board space processors and memory chips take up, along with the components needed to power and cool them. That assumption has shaped how CPUs (and GPUs) are designed and installed, and it has also shaped how applications are developed and operated.

But change is on the way. Over the next three to five years, I expect an acceleration of memory disaggregation from compute systems. And as in the storage example, this memory can be pooled and shared not just by the CPUs/GPUs of a single host, but by many. It will also spur new innovations in how memory is used (e.g., memory tiering and prioritization). This approach addresses some important shortcomings of current system design with respect to internally attached memory:

  • Compute and memory no longer need to be pre-provisioned, a practice that can tie up terabytes of memory just to accommodate future growth.

  • Like internal storage, memory usage varies from host to host, leaving some servers with lots of unused onboard memory that no other system can access.

  • Onboard memory and its supporting components (power delivery and cooling) take up valuable space that could otherwise go to other uses, such as more processing power.

You can easily imagine how removing these design restrictions eliminates many of the current complexities and limitations of compute operations. A logical result could be denser, more operationally efficient systems: fewer physical servers supporting even more workloads and applications than is possible today. Also imagine new ways to scale disaggregated memory, such as hot-swappable, pluggable memory modules that free up blade and motherboard space for more processing, power, and cooling.
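To make the pooling idea concrete, here is a minimal sketch (in Python, with entirely hypothetical names and capacities, not any vendor's actual design) of the contrast between fixed onboard DIMMs and a shared pool: with a pool, a busy host can borrow the headroom a quiet neighbor isn't using.

```python
# Illustrative sketch only: a shared memory pool that several hosts draw from.
# All names and numbers are hypothetical.

class SharedMemoryPool:
    """A single pool of memory capacity (in GB) shared by many hosts."""

    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations: dict[str, int] = {}  # host -> GB currently borrowed

    def free_gb(self) -> int:
        """Capacity not yet claimed by any host."""
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host: str, gb: int) -> bool:
        """Grant a host more memory if the pool has room; no per-host ceiling."""
        if gb > self.free_gb():
            return False
        self.allocations[host] = self.allocations.get(host, 0) + gb
        return True

    def release(self, host: str, gb: int) -> None:
        """Return memory to the pool so other hosts can use it."""
        self.allocations[host] = max(0, self.allocations.get(host, 0) - gb)


# With fixed onboard memory, a "hot" host is capped at its own DIMMs even
# while a neighbor sits idle. With a pool, that idle headroom is usable.
pool = SharedMemoryPool(capacity_gb=1024)
pool.allocate("host-a", 768)   # a busy host borrows most of the pool
pool.allocate("host-b", 128)   # a quieter host takes only what it needs
print(pool.free_gb())          # 128 GB remain available to any host
```

The point of the sketch is the accounting model, not the mechanism: in a real system the "pool" would sit behind a fabric and an OS/hypervisor memory manager rather than a Python dictionary.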

This massive shift will not happen overnight, and it will require hardware, software, silicon, and application industry players to come together in an ecosystem to ensure success. There is already a lot of activity around memory hierarchy and prioritization challenges: for example, VMware's Project Capitola, Open Compute's Hierarchical Memory project, and a number of startups such as MemVerge that are rising to this challenge.
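The hierarchy and prioritization work mentioned above boils down to a placement question: which data earns scarce, fast local DRAM, and which can live in a larger but slower pooled tier? The sketch below illustrates one simple hotness-first policy; the thresholds, names, and two-tier model are my own illustration, not any project's actual algorithm.

```python
# Hedged sketch of memory tiering/prioritization: hot data stays in fast
# local DRAM, cold data spills to a larger, slower pooled tier.

def place_objects(objects, local_capacity):
    """Assign each (name, size, access_count) tuple to 'local' or 'pooled'.

    Hotter objects (higher access_count) get local DRAM first; whatever
    no longer fits spills to the disaggregated pool.
    """
    tiers = {}
    used = 0
    # Sort hottest-first so local DRAM is spent on frequently touched data.
    for name, size, hits in sorted(objects, key=lambda o: -o[2]):
        if used + size <= local_capacity:
            tiers[name] = "local"
            used += size
        else:
            tiers[name] = "pooled"
    return tiers


objs = [("index", 4, 9000), ("cache", 8, 5000), ("archive", 32, 10)]
print(place_objects(objs, local_capacity=16))
# The two hot objects fit in 16 units of local DRAM; "archive" spills
# to the pool: {'index': 'local', 'cache': 'local', 'archive': 'pooled'}
```

Real tiering systems refine this constantly at runtime, migrating pages as access patterns shift, but the core trade-off is the same: pay local-DRAM latency only for the data that deserves it.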

We follow and actively participate in these trends in high-performance computing. In fact, we have already built a future-ready computing platform in the Cisco UCS X-Series Modular System, with an updated converged fabric and a new X-Fabric that takes advantage of memory, storage, and networking disaggregation to enable more efficient, more compute-intensive operations. So whatever path this exciting development takes, we will be ready.

Be sure to stay tuned for my third blog in this series, where we'll dive even deeper into modern advances in Cisco's storage hardware. If you missed the first blog in this series on the future of hardware, check out Today's Advancements in Computer Hardware Can Power the Next Generation of "Moonshots."


Future of Hardware Blog Series: Blog 1: Today’s Advancements in Computer Hardware Can Power the Next Generation of “Moonshots”


Sherry J. Basler