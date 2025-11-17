

"As AI workloads and model sizes explode, the limiting factor is no longer just GPU count, it's how much memory can be shared, how fast it can be accessed, and how cost-efficiently it can scale," said Gerry Fan, CEO of XConn Technologies. "Our collaboration with MemVerge demonstrates that CXL memory pooling at 100 TiB and beyond is production-ready, not theoretical. This is the architecture that makes large-scale AI inference truly feasible."

To address these challenges, XConn and MemVerge are demonstrating a rack-scale CXL memory pooling solution built around XConn's Apollo hybrid CXL/PCIe switch and MemVerge's GISMO technology, optimized for NVIDIA's Dynamo architecture and NIXL software stack. The demo showcases how AI inference workloads can offload and share massive KV cache resources dynamically across GPUs and CPUs, achieving greater than 5× performance improvements compared with SSD-based caching or RDMA-based KV cache offloading, while reducing total cost of ownership. In particular, the demo shows a scalable memory architecture for AI inference workloads in which the prefill and decode stages are disaggregated.

"Memory has become the new frontier of AI infrastructure innovation," said Charles Fan, CEO and co-founder of MemVerge. "By using MemVerge GISMO with XConn's Apollo switch, we're showcasing software-defined, elastic CXL memory that delivers the performance and flexibility needed to power the next wave of agentic AI and hyperscale inference. Together, we're redefining how memory is provisioned and utilized in AI data centers."

As AI becomes increasingly data-centric and memory-bound rather than compute-bound, traditional server architectures can no longer keep up. CXL memory pooling addresses these limitations by enabling dynamic, low-latency memory sharing across CPUs, GPUs, and accelerators. It scales up to hundreds of terabytes of shared memory, reduces TCO through better utilization, reduces over-provisioning, and enhances throughput for inference-first workloads, generative AI, real-time analytics, and in-memory databases.

SC25 attendees can experience the joint demo featuring a CXL memory pool dynamically shared across CPUs and GPUs, with inferencing benchmarks illustrating significant performance and efficiency gains for KV cache offload and AI model execution. For more details about SC25 and to register, visit https://sc25.supercomputing.org.

XConn Technologies Holdings, Inc. (XConn) is the innovation leader in next-generation interconnect technology for high-performance computing and AI applications. The company is the industry's first to deliver a hybrid switch supporting both CXL and PCIe on a single chip. Privately funded, XConn is setting the benchmark for data center interconnect with scalability, flexibility, and performance. For more information visit: xconn-tech.com.

MemVerge is a leading provider of AI memory software. MemVerge solutions help enterprises stand up long-term memory for their agentic AI initiatives, and help AI data centers improve performance and efficiency by expanding and sharing memory across GPUs. For more information about MemVerge software, please visit memverge.ai.

