Performance Intensive Computing

Capture the full potential of IT

AMD intros CPUs, accelerators, networking for end-to-end AI infrastructure -- and Supermicro supports

AMD expanded its end-to-end AI infrastructure products for data centers with new CPUs, accelerators and network controllers. And Supermicro is already offering supporting servers. 

AMD today held a roughly two-hour conference in San Francisco during which CEO Lisa Su and other executives introduced a new generation of server processors, the next model in the Instinct MI300 Accelerator family, and new data-center networking devices.

As Su told the live and online audience, AMD is committed to offering end-to-end AI infrastructure products and solutions in an open, partner-dependent ecosystem.

Su further explained that AMD’s new AI strategy has 4 main goals:

  • Become the leader in end-to-end AI
  • Create an open AI software platform of libraries and models
  • Co-innovate with partners including cloud providers, OEMs and software creators
  • Offer all the pieces needed for a total AI solution, all the way from chips to racks to clusters and even entire data centers.

And here’s a look at the new data-center hardware AMD announced today.

5th Gen AMD EPYC CPUs

The EPYC line, originally launched in 2017, has become a big success for AMD. As Su told the event audience, there are now more than 950 EPYC instances at the largest cloud providers; also, AMD hardware partners now offer EPYC processors on more than 350 platforms. Market share is up, too: Roughly one in three servers worldwide (34%) now runs on EPYC, Su said.

The new EPYC processors, formerly codenamed Turin and now known as the AMD EPYC 9005 Series, are now available for data center, AI and cloud customers.

The new CPUs also have a new core architecture known as Zen5. AMD says Zen5 outperforms the previous Zen4 generation by 17% on enterprise instructions-per-clock and up to 37% on AI and HPC workloads.

The new 5th Gen line has over 25 SKUs, and core count ranges widely, from as few as 8 to as many as 192. For example, the new AMD EPYC 9575F is a 64-core, 5GHz CPU designed specifically for GPU-powered AI solutions.

AMD Instinct MI325X Accelerator

About a year ago, AMD introduced the Instinct MI300 Accelerators, and the company has since committed itself to introducing new models on a yearly cadence. Sure enough, today Lisa Su introduced the newest model, the AMD Instinct MI325X Accelerator.

Designed for Generative AI performance and built on the AMD CDNA3 architecture, the new accelerator offers up to 256GB of HBM3E memory, and bandwidth up to 6TB/sec.

Shipments of the MI325X are set to begin in this year’s fourth quarter. Partner systems with the new AMD accelerator are expected to start shipping in next year’s first quarter.

Su also mentioned the next model in the line, the AMD Instinct MI350, which will offer up to 288GB of HBM3E memory. It’s set to be formally announced in the second half of next year.

Networking Devices

Forrest Norrod, AMD’s head of data-center solutions, introduced two networking devices designed for data centers running AI workloads.

The AMD Pensando Salina DPU is designed for front-end connectivity. It supports throughput of up to 400 Gbps.

The AMD Pensando Pollara 400, designed for back-end networks connecting multiple GPUs, is the industry’s first Ultra Ethernet Consortium-ready AI NIC.

Both parts are sampling with customers now, and AMD expects to start general shipments in next year’s first half.

Both devices are needed, Norrod said, because AI dramatically raises networking demands. He cited studies showing that connectivity currently accounts for 40% to 75% of the time needed to run certain AI training and inference models.
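
One way to see why that share matters is Amdahl's law, a standard model that was not part of Norrod's remarks. The sketch below applies it to the 40% and 75% shares cited above; the 2x network speedup is a hypothetical figure chosen only for illustration.

```python
def overall_speedup(network_fraction: float, network_speedup: float) -> float:
    """Amdahl's law: speedup of the whole job when only the networking share gets faster."""
    if not 0.0 <= network_fraction <= 1.0:
        raise ValueError("network_fraction must be between 0 and 1")
    if network_speedup <= 0:
        raise ValueError("network_speedup must be positive")
    # Time left after the speedup, as a fraction of the original run time.
    remaining = (1.0 - network_fraction) + network_fraction / network_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # 40% and 75% are the shares cited above; the 2x network speedup is hypothetical.
    for fraction in (0.40, 0.75):
        print(f"{fraction:.0%} network share -> {overall_speedup(fraction, 2.0):.2f}x overall")
```

The larger the share of time spent on connectivity, the more a faster network improves the whole job.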

Supermicro Support

Supermicro is among the AMD partners that already have systems ready, based on the new AMD processors and accelerator.

Wasting no time, Supermicro today announced new H14 series servers, including both Hyper and FlexTwin systems, that support the 5th Gen AMD EPYC 9005 processors and AMD Instinct MI325X Accelerators.

The Supermicro H14 family includes three systems for AI training and inference workloads. Supermicro says the systems can also accommodate the higher thermal requirements of the new AMD EPYC processors, which are rated at up to 500W. Liquid cooling is an option, too.

The AMD Instinct MI300X Accelerator draws top marks from leading AI benchmark

In the latest MLPerf testing, the AMD Instinct MI300X Accelerator with ROCm software stack beat the competition with strong GenAI inference performance. 

New benchmarks using the AMD Instinct MI300X Accelerator show impressive performance that surpasses the competition.

This is great news for customers operating demanding AI workloads, especially those underpinned by large language models (LLMs) that require super-low latency.

Initial platform tests using MLPerf Inference v4.1 measured AMD’s flagship accelerator against the Llama 2 70B benchmark. This test is an indicator of performance in real-world applications, including natural language processing (NLP) and large-scale inferencing.

MLPerf is the industry’s leading benchmarking suite for measuring the performance of machine learning and AI workloads from domains that include vision, speech and NLP. It offers a set of open-source AI benchmarks, including rigorous tests focused on Generative AI and LLMs.

Gaining high marks from the MLPerf Inference benchmarking suite represents a significant milestone for AMD. It positions the AMD Instinct MI300X accelerator as a go-to solution for enterprise-level AI workloads.

Superior Instincts

The results of the Llama 2 70B test are particularly significant. That’s due to the benchmark’s ability to produce an apples-to-apples comparison of competing solutions.

In this benchmark, the AMD Instinct MI300X was compared with NVIDIA’s H100 Tensor Core GPU. The test concluded that AMD’s full-stack inference platform was better than the H100 at delivering high-performance LLM inference, a workload that requires both robust parallel computing and a well-optimized software stack.

The testing also showed that because the AMD Instinct MI300X offers the largest GPU memory available (192GB of HBM3), it was able to fit the entire Llama 2 70B model into memory. Doing so avoided splitting the model across GPUs, along with the network overhead that splitting brings. This, in turn, maximized inference throughput, producing superior results.
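
The memory argument can be checked with rough arithmetic. The sketch below counts model weights only and assumes 16-bit weights (2 bytes per parameter); the precision used in the MLPerf submission is not stated here, and the KV cache and activations would need additional memory on top.

```python
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


def fits_on_one_gpu(params_billions: float, bytes_per_param: float, gpu_memory_gb: float) -> bool:
    """True if the weights alone fit in a single GPU's memory."""
    return weights_gb(params_billions, bytes_per_param) <= gpu_memory_gb


if __name__ == "__main__":
    MI300X_HBM3_GB = 192  # capacity cited in the article
    needed = weights_gb(70, 2)  # 70B parameters, assumed 2 bytes each
    fits = fits_on_one_gpu(70, 2, MI300X_HBM3_GB)
    print(f"Weights: {needed:.0f} GB; fits in {MI300X_HBM3_GB} GB: {fits}")
```

Under these assumptions the weights take about 140GB, which leaves headroom on a 192GB device but would have to be split across several GPUs with less memory each.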

Software also played a big part in the success of the AMD Instinct series. The AMD ROCm software platform accompanies the AMD Instinct MI300X. This open software stack includes programming models, tools, compilers, libraries and runtimes for AI solution development on the AMD Instinct MI300 accelerator series and other AMD GPUs.

The testing showed that scaling efficiency was nearly linear when going from a single AMD Instinct MI300X, combined with the ROCm software stack, to a complement of eight AMD Instinct accelerators. In other words, the system’s performance improved almost proportionally as more GPUs were added.
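
Scaling efficiency is usually expressed as measured multi-GPU throughput divided by the ideal of single-GPU throughput times the GPU count. The sketch below shows that calculation; the throughput numbers in it are hypothetical placeholders, not MLPerf results.

```python
def scaling_efficiency(single_gpu_throughput: float,
                       multi_gpu_throughput: float,
                       gpu_count: int) -> float:
    """Measured throughput relative to ideal linear scaling (1.0 = perfectly linear)."""
    if single_gpu_throughput <= 0 or gpu_count < 1:
        raise ValueError("throughput must be positive and gpu_count at least 1")
    return multi_gpu_throughput / (single_gpu_throughput * gpu_count)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only: tokens per second on 1 and 8 GPUs.
    efficiency = scaling_efficiency(1000.0, 7800.0, 8)
    print(f"Scaling efficiency: {efficiency:.1%}")
```

A value close to 1.0 is what "nearly linear" means in practice.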

That test demonstrated the AMD Instinct MI300X’s ability to handle the largest MLPerf inference models to date, containing over 70 billion parameters.

Thinking Inside the Box

Benchmarking the AMD Instinct MI300X required AMD to create a complete hardware platform capable of addressing strenuous AI workloads. For this task, AMD engineers chose as their testbed the Supermicro AS -8125GS-TNMR2, a massive 8U complete system.

Supermicro’s GPU A+ Client Systems are designed for both versatility and redundancy. Designers can outfit the system with an impressive array of hardware, starting with two AMD EPYC 9004-series processors and up to 6TB of ECC DDR5 main memory.

Because AI workloads consume massive amounts of storage, Supermicro has also outfitted this 8U server with 12 front hot-swap 2.5-inch NVMe drive bays. There’s also the option to add four more drives via an additional storage controller.

The Supermicro AS -8125GS-TNMR2 also includes room for two hot-swap 2.5-inch SATA bays and two M.2 drives, each with a capacity of up to 3.84TB.

Power for all those components is delivered courtesy of six 3,000-watt redundant titanium-level power supplies.

Coming Soon: Even More AI Power

AMD engineers continually push the limits of silicon and human ingenuity to expand the capabilities of their hardware. So it should come as little surprise that new iterations of the AMD Instinct series are expected to be released in the coming months. This past May, AMD officials said they plan to introduce AMD Instinct MI325, MI350 and MI400 accelerators.

Forthcoming Instinct accelerators, AMD says, will deliver advances including additional memory, support for lower-precision data types, and increased compute power.

New features are also coming to the AMD ROCm software stack. Those enhancements should include kernel improvements and advanced quantization support.

Are your customers looking for a high-powered, low-latency system to run their most demanding HPC and AI workloads? Tell them about these benchmarks and the AMD Instinct MI300X accelerators.

Developing AI and HPC solutions? Check out the new AMD ROCm 6.2 release

The latest release of AMD’s free and open software stack for developing AI and HPC solutions delivers 5 important enhancements. 

If you develop AI and HPC solutions, you’ll want to know about the most recent release of AMD ROCm software, version 6.2.

ROCm, in case you’re unfamiliar with it, is AMD’s free and open software stack. It’s aimed at developers of artificial intelligence and high-performance computing (HPC) solutions on AMD Instinct accelerators. It's also great for developing AI and HPC solutions on AMD Instinct-powered servers from Supermicro. 

First introduced in 2016, ROCm open software now includes programming models, tools, compilers, libraries, runtimes and APIs for GPU programming.

ROCm version 6.2, announced recently by AMD, delivers 5 key enhancements:

  • Improved vLLM support 
  • Boosted memory efficiency & performance with Bitsandbytes
  • New Offline Installer Creator
  • New Omnitrace & Omniperf Profiler Tools (beta)
  • Broader FP8 support

Let’s look at each separately and in more detail.

vLLM Support

To enhance the efficiency and scalability of its Instinct accelerators, AMD is expanding vLLM support. vLLM is an easy-to-use library for inference and serving with the large language models (LLMs) that power Generative AI.

ROCm 6.2 lets AMD Instinct developers integrate vLLM into their AI pipelines. The benefits include improved performance and efficiency.

Bitsandbytes

Developers can now integrate Bitsandbytes with ROCm for AI model training and inference, reducing their memory and hardware requirements on AMD Instinct accelerators. 

Bitsandbytes is an open source Python library that quantizes LLMs to lower-precision formats, boosting memory efficiency and performance. AMD says this will let AI developers work with larger models on limited hardware, broadening access, saving costs and expanding opportunities for innovation.

Offline Installer Creator

The new ROCm Offline Installer Creator aims to simplify the installation process. This tool creates a single installer file that includes all necessary dependencies.

That makes deployment straightforward with a user-friendly GUI that allows easy selection of ROCm components and versions.

As the name implies, the Offline Installer Creator can be used on developer systems that lack internet access.

Omnitrace and Omniperf Profiler

The new Omnitrace and Omniperf Profiler Tools, both now in beta release, provide comprehensive performance analysis and a streamlined development workflow.

Omnitrace offers a holistic view of system performance across CPUs, GPUs, NICs and network fabrics. This helps developers identify and address bottlenecks.

Omniperf delivers detailed GPU kernel analysis for fine-tuning.

Together, these tools help to ensure efficient use of developer resources, leading to faster AI training, AI inference and HPC simulations.

FP8 Support

Broader FP8 support can improve the performance of AI inferencing.

FP8 is an 8-bit floating point format that provides a common, interchangeable format for both AI training and inference. It lets AI models operate and perform consistently across hardware platforms.

In ROCm, FP8 support improves the process of running AI models, particularly in inferencing. It does this by addressing key challenges such as the memory bottlenecks and high latency associated with higher-precision formats. In addition, FP8's reduced-precision calculations can decrease the latency involved in data transfers and computations, with little to no loss of accuracy.
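
To give a feel for what 8-bit precision means, the sketch below rounds a value to 4 significant binary digits (1 implicit plus 3 stored mantissa bits), as in the E4M3 variant of FP8 defined by the Open Compute Project. It is a simplification and not ROCm code: it assumes the OCP E4M3 range (maximum finite value 448), ignores subnormals and special values, and simply clamps out-of-range inputs. Other FP8 variants have different ranges.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the OCP E4M3 format (assumed variant)


def round_to_e4m3(x: float) -> float:
    """Round x to 4 significant binary digits and clamp to the E4M3 range.

    Simplified model: subnormals, NaN and infinity are not handled.
    """
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)    # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    rounded = round(mantissa * 16) / 16   # keep 4 significant bits of the mantissa
    value = math.ldexp(rounded, exponent)
    return max(-E4M3_MAX, min(E4M3_MAX, value))


if __name__ == "__main__":
    for value in (0.3, 1.0, 1000.0):
        print(f"{value} -> {round_to_e4m3(value)}")
```

Each stored value needs one byte instead of the two used by 16-bit formats, which is where the memory and data-transfer savings come from; the price is the coarser rounding the function illustrates.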

ROCm 6.2 expands FP8 support across its ecosystem, from frameworks to libraries and more, enhancing performance and efficiency.

Research Roundup, AI Edition: platform power, mixed signals on GenAI, smarter PCs

Catch the latest AI insights from leading researchers and market analysts.

Sales of artificial intelligence platform software show no sign of a slowdown. The road to true Generative AI disruption could be bumpy. And PCs with built-in AI capabilities are starting to sell.

Those are some of the latest AI insights from leading market researchers, analysts and pollsters. And here’s your research roundup.

AI Platforms Maintain Momentum

Is the excitement around AI overblown? Not at all, says market watcher IDC.

“The AI platforms market shows no sign of slowing down,” says IDC VP Ritu Jyoti.

IDC now believes that the market for AI platform software will maintain its momentum through at least 2028.

By that year, IDC expects, worldwide revenue for AI platform software will reach $153 billion. If so, that would mark a five-year compound annual growth rate (CAGR) of nearly 41%.

The market really got underway last year. That’s when worldwide AI platform software revenue hit $27.9 billion, an annual increase of 44%, IDC says.
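
The two IDC figures are consistent with each other, as a quick check shows. The sketch assumes the five-year window runs from the $27.9 billion base year to the $153 billion forecast for 2028.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 = 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Revenue in billions of dollars, as cited by IDC above.
    rate = cagr(27.9, 153.0, 5)
    print(f"Implied five-year CAGR: {rate:.1%}")
```

The result lands between 40% and 41%, in line with the "nearly 41%" that IDC reports.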

Since then, lots of progress has been made. Fully half the organizations now deploying GenAI in production have already selected an AI platform. And IDC says most of the rest will do so in the next six months.

All that has AI software suppliers looking pretty smart.

Mixed Signals on GenAI

There’s no question that GenAI is having a huge impact. The question is how difficult it will be for GenAI-using organizations to achieve their desired results.

GenAI use is already widespread. In a global survey conducted earlier this year by management consultants McKinsey & Co., 65% of respondents said they use GenAI on a regular basis.

That was nearly double the percentage from McKinsey’s previous survey, conducted just 10 months earlier.

Also, three quarters of McKinsey’s respondents said they expect GenAI will lead their industries to significant or disruptive changes.

However, the road to GenAI could be bumpy. Separately, researchers at Gartner are predicting that by the end of 2025, at least 30% of all GenAI projects will be abandoned after their proof-of-concept (PoC). 

The reason? Gartner points to several factors: poor data quality, inadequate risk controls, unclear business value, and escalating costs.

“Executives are impatient to see returns on GenAI investments,” says Gartner VP Rita Sallam. “Yet organizations are struggling to prove and realize value.”

One big challenge: Many organizations investing in GenAI want productivity enhancements. But as Gartner points out, those gains can be difficult to quantify.

Further, implementing GenAI is far from cheap. Gartner’s research finds that a typical GenAI deployment costs anywhere from $5 million to $20 million.

That wide range of costs is due to several factors. These include the use cases involved, the deployment approaches used, and whether an organization seeks to be a market disruptor.

Clearly, an intelligent approach to GenAI can be a money-saver.

PCs with AI? Yes, Please

Leading PC makers hope to boost their hardware sales by offering new, built-in AI capabilities. It seems to be working.

In the second quarter of this year, 8.8 million PCs (14% of all shipped globally in the quarter) were AI-capable, says market analyst Canalys.

Canalys defines “AI-capable” pretty simply: It’s any desktop or notebook system that includes a chipset or block for one or more dedicated AI workloads.

By operating system, nearly 40% of the AI-capable PCs shipped in Q2 were Windows systems, 60% were Apple macOS systems, and just 1% ran ChromeOS, Canalys says.

For the full year 2024, Canalys expects some 44 million AI-capable PCs to be shipped worldwide. In 2025, the market watcher predicts, these shipments should more than double, rising to 103 million units worldwide. There's nothing artificial about that boost.

Why Lamini offers LLM tuning software on Supermicro servers powered by AMD processors

Lamini, provider of an LLM platform for developers, turns to Supermicro’s high-performance servers powered by AMD CPUs and GPUs to run its new Memory Tuning stack.

Generative AI systems powered by large language models (LLMs) have a serious problem: Their answers can be inaccurate—and sometimes, in the case of AI “hallucinations,” even fictional.

For users, the challenge is equally serious: How do you get precise factual accuracy—that is, correct answers with zero hallucinations—while upholding the generalization capabilities that make LLMs so valuable?

A California-based company, Lamini, has come up with an innovative solution. And its software stack runs on Supermicro servers powered by AMD CPUs and GPUs.

Why Hallucinations Happen

Here’s the premise underlying Lamini’s solution: Hallucinations happen because the right answer is clustered with other, incorrect answers. As a result, the model doesn’t know that a nearly right answer is in fact wrong.

To address this issue, Lamini’s Memory Tuning solution teaches the model that getting the answer nearly right is the same as getting it completely wrong. Its software does this by tuning literally millions of expert adapters with precise facts on top of any open-source LLM, such as Llama 3 or Mistral 3.

The Lamini model retrieves only the most relevant experts from an index at inference time. The goal is high accuracy, high speed and low cost.

More than Fine-Tuning

Isn’t this just LLM fine-tuning? No, says Lamini; its Memory Tuning is fundamentally different.

Fine-tuning can’t ensure that a model’s answers are faithful to the facts in its training data. By contrast, Lamini says, its solution has been designed to deliver output probabilities that are not just close, but exactly right.

More specifically, Lamini promises its solution can deliver 95% LLM accuracy with 10x fewer hallucinations.

In the real world, Lamini says one large customer used its solution and raised LLM accuracy from 50% to 95%, and reduced the rate of AI hallucinations from an unreliable 50% to just 5%.

Investors are certainly impressed. Earlier this year Lamini raised $25 million from an investment group that included Amplify Partners, Bernard Arnault and AMD Ventures. Lamini plans to use the funding to accelerate its expert AI development and expand its cloud infrastructure.

Supermicro Solution

As part of its push to offer superior LLM tuning, Lamini chose Supermicro’s GPU server, model number AS -8125GS-TNMR2, to train LLMs in a reasonable time.

This Supermicro 8U system is powered by dual AMD EPYC 9000 series CPUs and eight AMD Instinct MI300X GPUs.

The GPUs connect with CPUs via a standard PCIe 5 bus. This gives fast access when the CPU issues commands or sends data from host memory to the GPUs.

Lamini has also benefited from Supermicro’s capacity and quick delivery schedule. With other GPU makers facing serious capacity issues, that’s an important benefit for both Lamini and its customers.

“We’re thrilled to be working with Supermicro,” says Lamini co-founder and CEO Sharon Zhou.

Could your customers be thrilled by Lamini, too?

Why CSPs Need Hyperscaling

Today’s cloud service providers need IT infrastructures that can scale like never before.

Hyperscaling IT infrastructure may be one of the toughest challenges facing cloud service providers (CSPs) today.

The term hyperscale refers to an IT architecture’s ability to scale in response to increased demand.

Hyperscaling is tricky, in large part because demand is a constantly moving target. Without much warning, a data center’s IT demand can increase exponentially due to a myriad of factors.

That could mean a public emergency, the failure of another CSP’s infrastructure, or simply the rampant proliferation of data—a common feature of today’s AI environment.

To meet this growing demand, CSPs have a lot to manage. That includes storage measured in exabytes, AI workloads of massive complexity, and whatever hardware is needed to keep system uptime as close to 100% as possible.

The hardware alone can be a real challenge. CSPs now oversee both air- and liquid-based cooling systems, redundant power sources, diverse networking gear, and miles of copper and fiber-optic cabling. It’s a real handful.

Design with CSPs in Mind

To help CSPs cope with this seemingly overwhelming complexity, Supermicro offers purpose-built hardware designed to tackle the world’s most demanding workloads.

Enterprise-class servers like Supermicro’s H13 and A+ server series offer CSPs powerful platforms built to handle the rigors of resource-intensive AI workloads. They’ve been designed to scale quickly and efficiently as demand and data inevitably increase.

Take the Supermicro GrandTwin. This innovative solution puts the power and flexibility of multiple independent servers in a single enclosure.

The design helps lower operating expenses by enabling shared resources, including a space-saving 2U enclosure, heavy-duty cooling system, backplane and N+1 power supplies.

To help CSPs tackle the world’s most demanding AI workloads, Supermicro offers GPU server systems. These include a massive—and massively powerful—8U eight-GPU server.

Supermicro H13 GPU servers are powered by 4th-generation AMD EPYC processors. These cutting-edge chips are engineered to help high-end applications perform better and return faster.

To make good on those lofty promises, AMD included more and faster cores, higher bandwidth to GPUs and other devices, and the ability to address vast amounts of memory.

Theory Put to Practice

Capable and reliable hardware is a vital component for every modern CSP, but it’s not the only one. IT infrastructure architects must consider not just their present data center requirements but how to build a bridge to the requirements they’ll face tomorrow.

To help build that bridge, Supermicro offers an invaluable list: 10 essential steps for scaling the CSP data center.

A few highlights include:

  • Standardize and scale: Supermicro suggests CSPs standardize around a preferred configuration that offers the best compute, storage and networking capabilities.
  • Plan ahead for support: To operate a sophisticated data center 24/7 is to embrace the inevitability of technical issues. IT managers can minimize disruption and downtime when something goes wrong by choosing a support partner who can solve problems quickly and efficiently.
  • Simplify your supply chain: Hyperscaling means maintaining the ability to move new infrastructure into place fast and without disruption. CSPs can stack the odds in their favor by choosing a partner that is ever ready to deliver solutions that are integrated, validated, and ready to work on day one.

Do More:

Hyperscaling for CSPs will be the focus of a session at the upcoming Supermicro Open Storage Summit ‘24, which streams live Aug. 13 - Aug. 29.

The CSP session, set for Aug. 20, will cover the ways in which CSPs can seamlessly scale their AI operations across thousands of GPUs while ensuring industry-leading reliability, security and compliance capabilities. The speakers will feature representatives from Supermicro, AMD, Vast Data and Solidigm.

Learn more and register now to attend the 2024 Supermicro Open Storage Summit.

 

HBM: Your memory solution for AI & HPC

High-bandwidth memory shortens the information commute to keep pace with today’s powerful GPUs.

As AI powered by GPUs transforms computing, conventional DDR memory can’t keep up.

The solution? High-bandwidth memory (HBM).

HBM is memory chip technology that essentially shortens the information commute. It does this using ultra-wide communication lanes.

An HBM device contains vertically stacked memory chips. They’re interconnected by microscopic wires known as through-silicon vias, or TSVs for short.

HBM also provides more bandwidth per watt. And, with a smaller footprint, the technology can also save valuable data-center space.

Here’s how: A single HBM stack can contain up to eight DRAM modules, with each module connected by two channels. This makes an HBM implementation of just four chips roughly equivalent to 30 DDR modules, and in a fraction of the space.

All this makes HBM ideal for workloads that utilize AI and machine learning, HPC, advanced graphics and data analytics.

Latest & Greatest

The latest iteration, HBM3, was introduced in 2022, and it’s now finding wide application in market-ready systems.

Compared with the previous version, HBM3 adds several enhancements:

  • Higher bandwidth: Up to 819 GB/sec., up from HBM2’s max of 460 GB/sec.
  • More memory capacity: 24GB per stack, up from HBM2’s 8GB
  • Improved power efficiency: Delivering more data throughput per watt
  • Reduced form factor: Thanks to a more compact design

However, it’s not all sunshine and rainbows. For one, HBM-equipped systems are more expensive than those fitted out with traditional memory solutions.

Also, HBM stacks generate considerable heat. Advanced cooling systems are often needed, adding further complexity and cost.

Compatibility is yet another challenge. Systems must be designed or adapted to HBM3’s unique interface and form factor.

In the Market

As mentioned above, HBM3 is showing up in new products. That very definitely includes both the AMD Instinct MI300A and MI300X series accelerators.

The AMD Instinct MI300A accelerator combines a CPU and GPU for running HPC/AI workloads. It offers HBM3 as the dedicated memory with a unified capacity of up to 128GB.

Similarly, the AMD Instinct MI300X is a GPU-only accelerator designed for low-latency AI processing. It contains HBM3 as the dedicated memory, but with a higher capacity of up to 192GB.

For both of these AMD Instinct MI300 accelerators, the peak theoretical memory bandwidth is a speedy 5.3TB/sec.

The AMD Instinct MI300X is also the main processor in Supermicro’s AS -8125GS-TNMR2, an H13 8U 8-GPU system. This system offers a huge 1.5TB of HBM3 memory in single-server mode, and an even huger 6.144TB at rack scale.
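
The capacity figures follow from simple multiplication. The sketch below reproduces them from the per-GPU capacity given above. Note that the four-server rack it arrives at is an inference from the 6.144TB figure, not something the article states, and that the 1.5TB figure appears to use 1TB = 1,024GB while 6.144TB uses 1TB = 1,000GB.

```python
GPUS_PER_SERVER = 8      # 8-GPU system, per the article
HBM3_PER_GPU_GB = 192    # MI300X capacity, per the article


def server_hbm_gb(gpus: int = GPUS_PER_SERVER, per_gpu_gb: int = HBM3_PER_GPU_GB) -> int:
    """Total HBM3 capacity of one server, in GB."""
    return gpus * per_gpu_gb


def servers_implied(rack_total_gb: float, per_server_gb: float) -> float:
    """Number of servers needed to reach a given rack-level total."""
    return rack_total_gb / per_server_gb


if __name__ == "__main__":
    per_server = server_hbm_gb()
    print(f"Per server: {per_server} GB ({per_server / 1024:.1f} TB at 1,024 GB per TB)")
    print(f"Servers implied by 6.144 TB: {servers_implied(6144, per_server):.0f}")
```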

Are your customers running AI with fast GPUs, only to have their systems held back by conventional memory? Tell them to check out HBM.

Tech Explainer: What is CXL — and how can it help you lower data-center latency?

High latency is a data-center manager’s worst nightmare. Help is here from an open industry standard known as CXL. It works by maintaining “memory coherence” between the CPU’s memory and memory on attached devices.

Latency is a crucial measure for every data center. Because latency measures the time it takes for data to travel from one point in a system or network to another, lower is generally better. A network with high latency has slower response times—not good.

Fortunately, the industry has come up with an open-standard solution that provides a low-latency link between processors, accelerators and memory devices such as RAM and SSD storage. It’s known as Compute Express Link, or CXL for short.

CXL is designed to solve a couple of common problems. Once a processor uses up the capacity of its direct-attached memory, it relies on an SSD. This introduces a three-order-of-magnitude latency gap that can hurt both performance and total cost of ownership (TCO).

Another problem is that multicore processors are starving for memory bandwidth. This has become an issue because processors have been scaling in terms of cores and frequencies faster than their main memory channels. The resulting deficit leads to suboptimal use of the additional processor cores, as the cores have to wait for data.

CXL overcomes these issues by introducing a low-latency, memory cache coherent interconnect. CXL works for processors, memory expansion and AI accelerators such as the AMD Instinct MI300 series. The interconnect provides more bandwidth and capacity to processors, which increases efficiency and enables data-center operators to get more value from their existing infrastructure.

Cache-coherence refers to IT architecture in which multiple processor cores share the same memory hierarchy, yet retain individual L1 caches. The CXL interconnect reduces latency and increases performance throughout the data center.

The latest iteration of CXL, version 3.1, adds features to help data centers keep up with high-performance computational workloads. Notable upgrades include new peer-to-peer direct memory access, enhancements to memory pooling, and CXL Fabric improvements.

3 Ways to CXL

Today, there are three main types of CXL devices:

  • Type 1: Any device without integrated local memory. CXL protocols enable these devices to communicate with the host processor and coherently access its memory.
  • Type 2: These devices include integrated memory, but also share CPU memory. They leverage CXL to enable coherent memory-sharing between the CPU and the CXL device.
  • Type 3: A class of devices designed to augment existing CPU memory. CXL enables the CPU to access external sources for increased bandwidth and reduced latency.
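
The three device types can also be described by which CXL sub-protocols each one uses. The sketch below models that mapping. The protocol names (CXL.io, CXL.cache, CXL.mem) come from the CXL specification rather than from this article, and the Python names are hypothetical, chosen only for illustration.

```python
from enum import Flag, auto


class CxlProtocol(Flag):
    IO = auto()     # CXL.io: discovery, configuration and I/O, similar to PCIe
    CACHE = auto()  # CXL.cache: the device coherently caches host memory
    MEM = auto()    # CXL.mem: the host accesses memory attached to the device


# Sub-protocols used by each device type.
DEVICE_TYPES = {
    1: CxlProtocol.IO | CxlProtocol.CACHE,                    # no local memory exposed
    2: CxlProtocol.IO | CxlProtocol.CACHE | CxlProtocol.MEM,  # shares memory both ways
    3: CxlProtocol.IO | CxlProtocol.MEM,                      # memory expansion
}


def exposes_memory_to_host(device_type: int) -> bool:
    """True if the host CPU can address memory that lives on the device."""
    return CxlProtocol.MEM in DEVICE_TYPES[device_type]


if __name__ == "__main__":
    for device_type in sorted(DEVICE_TYPES):
        print(f"Type {device_type}: host-visible memory = {exposes_memory_to_host(device_type)}")
```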

Hardware Support

As data-center architectures evolve, more hardware manufacturers are supporting CXL devices. One such example is Supermicro’s line of All-Flash EDSFF and NVMe servers.

Supermicro’s cutting-edge appliances are optimized for resource-intensive workloads, including data-center infrastructure, data warehousing, hyperscale/hyperconverged and software-defined storage. To facilitate these workloads, Supermicro has included support for up to eight CXL 2.0 devices for advanced memory-pool sharing.

Of course, CXL can be utilized only on server platforms designed to support communication between the CPU, memory and CXL devices. That’s why CXL is built into the 4th gen AMD EPYC server processors.

These AMD EPYC processors include up to 96 ‘Zen 4’ 5nm cores. Each core complex die (CCD) carries 32MB of L3 cache, and the processors support up to 12 DDR5 channels and as much as 12TB of memory.
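
To put the 12 DDR5 channels and 96 cores in perspective, here is a rough peak-bandwidth estimate. It is a sketch under stated assumptions: the DDR5-4800 transfer rate and the 8-byte (64-bit) data path per channel are not given in this article, and the function name `ddr5_peak_gb_per_s` is made up for the example. Real sustained bandwidth will be lower than this theoretical peak.

```python
def ddr5_peak_gb_per_s(channels: int,
                       mega_transfers_per_s: int = 4800,  # assumption: DDR5-4800
                       bytes_per_transfer: int = 8) -> float:  # assumption: 64-bit channel
    """Theoretical peak memory bandwidth in GB/s across all channels."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    total = ddr5_peak_gb_per_s(channels=12)
    per_core = total / 96  # spread evenly over a 96-core processor
    print(f"{total:.1f} GB/s total, {per_core:.1f} GB/s per core")
```

Under those assumptions, a fully populated socket has only a few gigabytes per second of memory bandwidth for each core. That is the gap CXL-attached memory is meant to help close.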

CXL memory expansion is built into the AMD EPYC platform. That makes these CPUs ideally suited for advanced AI and GenAI workloads.

Crucially, AMD also includes 256-bit AES-XTS and secure multikey encryption. This enables hypervisors to encrypt address space ranges on CXL-attached memory.

The Near Future of CXL

Like many add-on devices, CXL devices are often connected via the PCI Express (PCIe) bus. However, implementing CXL over PCIe 5.0 in large data centers has some drawbacks.

Chief among them is the way its memory pools remain isolated from each other. This adds latency and hampers significant resource-sharing.

The next generation of PCIe, version 6.0, is coming soon and will offer a solution. CXL over PCIe 6.0 will offer twice as much throughput as CXL over PCIe 5.0.
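
The doubling can be checked with simple arithmetic. This is a minimal sketch: the per-lane signaling rates (32 GT/s for PCIe 5.0, 64 GT/s for PCIe 6.0) come from the PCIe specifications rather than from this article, the x16 link width is an assumption, and the figures are raw one-direction rates that ignore encoding and protocol overhead.

```python
# Raw per-lane signaling rates in gigatransfers per second (one bit per transfer per lane).
RAW_GT_PER_LANE = {"PCIe 5.0": 32, "PCIe 6.0": 64}


def raw_gb_per_s(generation: str, lanes: int = 16) -> float:
    """Raw one-direction link bandwidth in GB/s, before any overhead."""
    return RAW_GT_PER_LANE[generation] * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    gen5 = raw_gb_per_s("PCIe 5.0")
    gen6 = raw_gb_per_s("PCIe 6.0")
    print(f"x16 PCIe 5.0: {gen5:.0f} GB/s")
    print(f"x16 PCIe 6.0: {gen6:.0f} GB/s")
    print(f"ratio: {gen6 / gen5:.1f}x")
```

For an x16 link this works out to roughly 64 GB/s per direction on PCIe 5.0 and 128 GB/s on PCIe 6.0. Usable throughput is somewhat lower on both once overhead is counted.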

The new PCIe standard will also add new memory-sharing functionality within the transaction layer. This will help reduce system latency and improve accelerator performance.

CXL is also leading to the start of disaggregated computing. There, resources that reside in different physical enclosures can be available to several applications.

Are your customers suffering from too much latency? The solution could be CXL.

At Computex, AMD & Supermicro CEOs describe AI advances you’ll be adopting soon

At Computex Taiwan, Lisa Su of AMD and Charles Liang of Supermicro delivered keynotes that focused on AI, liquid cooling and energy efficiency.

The chief executives of both AMD and Supermicro used their Computex keynote addresses to describe their companies’ AI products and, in the case of AMD, pre-announce important forthcoming products.

Computex 2024 was held this past week in Taipei, Taiwan, with the conference theme of “connecting AI.” Exhibitors included some 1,500 companies from around the world, and keynotes were delivered by some of the IT industry’s top executives.

They included Lisa Su, chairman and CEO of AMD, and Charles Liang, founder and CEO of Supermicro. Here's some of what they previewed at Computex 2024.

Lisa Su, AMD: Top priority is AI

Su of AMD presented one of this Computex’s first keynotes. Anyone who thought she might discuss topics other than AI was quickly set straight.

“AI is our number one priority,” Su told the crowd. “We’re at the beginning of an incredibly exciting time for the industry as AI transforms virtually every business, improves our quality of life, and reshapes every part of the computing market.”

AMD intends to lead in AI solutions by focusing on three priorities, she added: delivering a broad portfolio of high-performance, energy-efficient compute engines (including CPUs, GPUs and NPUs); enabling an open and developer-friendly ecosystem; and co-innovating with partners.

The last point was supported during Su’s keynote by brief visits from several partner leaders. They included Pavan Davuluri, corporate VP of Windows devices at Microsoft; Christian Laforte, CTO of Stability AI; and (via a video link) Microsoft CEO Satya Nadella.

Fairly late in Su’s hour-plus keynote, she held up AMD’s forthcoming 5th gen EPYC server processor, codenamed Turin. It’s scheduled to ship by year’s end.

As Su explained, Turin will feature up to 192 cores and 384 threads, up from the current generation’s max of 128 cores and 256 threads. Turin will contain 13 chiplets built on both 3nm and 6nm process technology. Yet it will be available as a drop-in replacement for existing EPYC platforms, Su said.

Turin processors will use AMD’s new ‘Zen 5’ cores, which Su also announced at Computex. She described AMD’s ‘Zen 5’ as “the highest performance and most energy-efficient core we’ve ever built.”

Su also discussed AMD’s MI3xx family of accelerators. The MI300, introduced this past December, has become the fastest-ramping product in AMD’s history, she said. Microsoft’s Nadella, during his short presentation, bragged that his company’s cloud was the first to deliver general availability of virtual machines using the AMD MI300X accelerator.

Looking ahead, Su discussed three forthcoming Instinct accelerators on AMD’s road map: The MI325, MI350 and MI400 series.

The AMD Instinct MI325, set to launch later this year, will feature more memory (up to 288GB) and higher memory bandwidth (6TB/sec.) than the MI300. But the new component will still use the same infrastructure as the MI300, making it easy for customers to upgrade.

The next series, MI350, is set for launch next year, Su said. It will use AMD’s new CDNA4 architecture, which Su said “will deliver the biggest generational AI leap in our history.” The MI350 will be built on 3nm process technology, but will still offer a drop-in upgrade from both the MI300 and MI325.

The last of the three, the MI400 series, is set to start shipping in 2026. That’s also when AMD will deliver a new generation of CDNA, according to Su.

Both the MI325 and MI350 series will leverage the same industry-standard universal baseboard OCP server design used by the MI300. Su added: “What that means is, our customers can adopt this new technology very quickly.”

Charles Liang, Supermicro: Liquid cooling is the AI future

Liang dedicated his Computex keynote to the topics of liquid cooling and “green” computing.

“Together with our partners,” he said, “we are on a mission to build the most sustainable data centers.”

Liang predicted a big change from the present, where direct liquid cooling (DLC) has a less-than-1% share of the data center market. Supermicro is targeting 15% of new data center deployments in the next year, and Liang hopes that will hit 30% in the next two years.

Driving this shift, he added, are several trends. One, of course, is the huge uptake of AI, which requires high-capacity computing.

Another is the improvement of DLC technology itself. Where DLC system installations used to take 4 to 12 months, Supermicro is now doing them in just 2 to 4 weeks, Liang said. Where liquid cooling used to be quite expensive, now—when TCO and energy savings are factored in—“DLC can be free, with a big bonus,” he said. And where DLC systems used to be unreliable, now they are high performing with excellent uptime.

Supermicro now has the capacity to ship 1,000 rack-scale solutions with liquid cooling per month, Liang said. In fact, the company is shipping over 50 liquid-cooled racks per day, with installations typically completed within just 2 weeks.

“DLC,” Liang said, “is the wave of the future.”

Research Roundup: AI edition

Catch up on the latest research and analysis around artificial intelligence.

Generative AI is the No. 1 AI solution being deployed. Three in 4 knowledge workers are already using AI. The supply of workers with AI skills can’t meet the demand. And supply chains can be helped by AI, too.

Here’s your roundup of the latest in AI research and analysis.

GenAI is No. 1

Generative AI isn’t just a good idea, it’s now the No. 1 type of AI solution being deployed.

In a survey recently conducted by research and analysis firm Gartner, more than a quarter of respondents (29%) said they’ve deployed and are now using GenAI.

That was a higher percentage than any other type of AI in the survey, including natural language processing, machine learning and rule-based systems.

The most common way of using GenAI, the survey found, is embedding it in existing applications, for example by using Microsoft Copilot for 365. This was cited by about 1 in 3 respondents (34%).

Other approaches mentioned by respondents included prompt engineering (cited by 25%), fine-tuning (21%) and using standalone tools such as ChatGPT (19%).

Yet respondents said only about half of their AI projects (48%) make it into production. Even when that happens, it’s slow. Moving an AI project from prototype to production took respondents an average of 8 months.

Other challenges loom, too. Nearly half the respondents (49%) said it’s difficult to estimate and demonstrate an AI project’s value. They also cited a lack of talent and skills (42%), lack of confidence in AI technology (40%) and lack of data (39%).

Gartner conducted the survey in last year’s fourth quarter and released the results earlier this month. In all, valid responses were culled from 644 executives working for organizations in the United States, the UK and Germany.

AI ‘gets real’ at work

Three in 4 knowledge workers (75%) now use AI at work, according to the 2024 Work Trend Index, a joint project of Microsoft and LinkedIn.

Among these users, nearly 8 in 10 (78%) are bringing their own AI tools to work. That’s inspired a new acronym: BYOAI, short for Bring Your Own AI.

“2024 is the year AI at work gets real,” the Work Trend report says.

2024 is also a year of real challenges. Like the Gartner survey, the Work Trend report finds that demonstrating AI’s value can be tough.

In the Microsoft/LinkedIn survey, nearly 8 in 10 leaders agreed that adopting AI is critical to staying competitive. Yet nearly 6 in 10 said they worry about quantifying the technology’s productivity gains. About the same percentage also said their organization lacks an AI vision and plan.

The Work Trend report also highlights the mismatch between AI skills demand and supply. Over half the leaders surveyed (55%) say they’re concerned about having enough AI talent. And nearly two-thirds (65%) say they wouldn’t hire someone who lacked AI skills.

Yet fewer than 4 in 10 users (39%) have received AI training from their company. And only 1 in 4 companies plan to offer AI training this year.

The Work Trend report is based on a mix of sources: a survey of 31,000 people in 31 countries; labor and hiring trends on the LinkedIn site; Microsoft 365 productivity signals; and research with Fortune 500 customers.

AI skills: supply-demand mismatch

The mismatch between AI skills supply and demand was also examined recently by market watcher IDC. It expects that by 2026, 9 of every 10 organizations will be hurt by an overall IT skills shortage. This will lead to delays, quality issues and revenue loss that IDC predicts will collectively cost these organizations $5.5 trillion.

To be sure, AI skills are currently the most in-demand skills for most organizations. The good news, IDC finds, is that more than half of organizations are now using or piloting training for GenAI.

“Getting the right people with the right skills into the right roles has never been more difficult,” says IDC researcher Gina Smith. Her prescription for success: Develop a “culture of learning.”

AI helps supply chains, too

Did you know AI is being used to solve supply-chain problems?

It’s a big issue. Over 8 in 10 global businesses (84%) said they’ve experienced supply-chain disruptions in the last year, finds a survey commissioned by Blue Yonder, a vendor of supply-chain solutions.

In response, supply-chain executives are making strategic investments in AI and sustainability, Blue Yonder finds. Nearly 8 in 10 organizations (79%) said they’ve increased their investments in supply-chain operations. Their 2 top areas of investment were sustainability (cited by 48%) and AI (41%).

The survey also identified the top supply-chain areas for AI investment. They are planning (cited by 56% of those investing in AI), transportation (53%) and order management (50%).

In addition, 8 in 10 respondents to the survey said they’ve implemented GenAI in their supply chains at some level. And more than 90% said GenAI has been effective in optimizing their supply chains and related decisions.

The survey, conducted by an independent research firm with sponsorship by Blue Yonder, was fielded in March, with the results released earlier this month. The survey received responses from more than 600 C-suite and senior executives, all of them employed by businesses or government agencies in the United States, UK and Europe.
