AMD drives AI with Instinct MI300X, Instinct MI300A, ROCm 6


AMD this week formally introduced its AMD Instinct MI300X and AMD Instinct MI300A accelerators, two important elements of the company’s new push into AI.

During the company’s two-hour “Advancing AI” event, held live in Silicon Valley and live-streamed on YouTube, CEO Lisa Su asserted that “AI is absolutely the No. 1 priority at AMD.”

She also said that AI is both “the future of computing” and “the most transformative technology of the last 50 years.”

AMD is leading the AI charge with its Instinct MI300 Series accelerators, designed for both cloud and enterprise AI and HPC workloads. These accelerators combine GPU compute, large and fast HBM memory, and 3D packaging based on the 4th Gen AMD Infinity Architecture.

AMD is also relying heavily on cloud, OEM and software partners that include Meta, Microsoft and Oracle Cloud. Another partner, Supermicro, announced additions to its H13 generation of accelerated servers powered by 4th Gen AMD EPYC CPUs and AMD Instinct MI300 Series accelerators.

MI300X

The AMD Instinct MI300X is based on the company’s CDNA 3 architecture. It packs 304 GPU compute units and up to 192GB of HBM3 memory with a peak memory bandwidth of 5.3TB/sec. It’s available as a set of 8 GPUs on an OAM baseboard.

The accelerator connects over the latest PCIe Gen 5 bus at 128GB/sec.

For AI, AMD rates total theoretical peak FP8 performance at 20.9 PFLOPS. For HPC, peak double-precision matrix (FP64) performance is 1.3 PFLOPS.

Compared with competing products, the AMD Instinct MI300X delivers nearly 40% more compute units, 1.5x more memory capacity, and 1.7x more peak theoretical memory bandwidth, AMD says.

AMD is also offering a full system it calls the AMD Instinct Platform. It packs 8 MI300X accelerators to offer up to 1.5TB of HBM3 memory capacity. And because it’s built on the industry-standard OCP design, the AMD Instinct Platform can be dropped easily into existing server infrastructure.

The AMD Instinct MI300X is shipping now. So is a new Supermicro 8-GPU server with this new AMD accelerator.

MI300A

AMD describes its new Instinct MI300A as the world’s first data-center accelerated processing unit (APU) for HPC and AI. It combines 228 AMD CDNA 3 GPU compute units, 24 AMD ‘Zen 4’ CPU cores, and 128GB of HBM3 memory with a memory bandwidth of up to 5.3TB/sec.

AMD says the Instinct MI300A APU gives customers an easily programmable GPU platform, high-performing compute, fast AI training, and impressive energy efficiency.

The energy savings are said to come from the APU’s efficiency. As HPC and AI workloads are both data- and resource-intensive, a more efficient system means users can do the same or more work with less hardware.

The AMD Instinct MI300A is also shipping now. So are two new Supermicro servers that feature the APU: one air-cooled, the other liquid-cooled.

ROCm 6

As part of its push into AI, AMD intends to maintain an open software platform. During CEO Su’s presentation, she said that openness is one of AMD’s three main priorities for AI, along with offering a broad portfolio and working with partners.

Victor Peng, AMD’s president, said the company has set as a goal the creation of a unified AI software stack. As part of that, the company is continuing to enhance ROCm, the company’s software stack for GPU programming. The latest version, ROCm 6, will ship later this month, Peng said.

AMD says ROCm 6 running on MI300 Series accelerators can increase AI acceleration performance by approximately 8x for Llama 2 text generation, compared with the previous generation of hardware and software.

ROCm 6 also adds support for several new key features for generative AI. These include FlashAttention, HIPGraph and vLLM.

AMD is also leveraging open-source AI software models, algorithms and frameworks such as Hugging Face, PyTorch and TensorFlow. The goal: simplify the deployment of AMD AI solutions and help customers unlock the true potential of generative AI.
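
To see what that looks like in practice, here is a minimal, illustrative sketch of text generation with Hugging Face Transformers running on PyTorch. It assumes a ROCm build of PyTorch (AMD GPUs appear under the torch.cuda namespace) and the transformers package; the model ID is a generic, openly available placeholder, not an AMD-specific release.

```python
# Minimal sketch: text generation with Hugging Face Transformers on an AMD GPU.
# Assumes a ROCm build of PyTorch (AMD GPUs appear under the torch.cuda
# namespace) plus the "transformers" package. The model ID is a generic,
# openly available placeholder -- not an AMD-specific release.
import torch
from transformers import pipeline

has_gpu = torch.cuda.is_available()
generator = pipeline(
    "text-generation",
    model="facebook/opt-350m",                       # placeholder model
    torch_dtype=torch.float16 if has_gpu else torch.float32,
    device=0 if has_gpu else -1,                     # 0 = first GPU, -1 = CPU
)

print(generator("High-performance computing is", max_new_tokens=40)[0]["generated_text"])
```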

Shipments of ROCm are set to begin later this month.


Research Roundup: GenAI use, public-cloud spend, tech debt’s reach, employee cyber violations


Catch up on the latest research from leading IT market watchers and analysts. 


Generative AI is already used by two-thirds of organizations. Public-cloud spending levels are forecast to rise 20% next year. Technical debt is a challenge for nearly 75% of organizations. And info-security violations by staff are nearly as common as attacks by external hackers.

That’s some of the latest research from leading IT market watchers and analysts. And here’s your Performance Intensive Computing roundup.

GenAI already used by 2/3 of orgs

You already know that Generative AI is hot, but did you also realize that over two-thirds of organizations are already using it?

In a survey of over 2,800 tech professionals, publisher O’Reilly found that fully 67% of respondents say their organizations currently use GenAI. Of this group, about 1 in 3 also say their organizations have been working with AI for less than a year.

Respondents to the survey were users of O’Reilly products worldwide. About a third of respondents (34%) work in the software industry; 14% in financial services; 11% in hardware; and the rest in industries that include telecom, public sector/government, healthcare and education. By region, nearly three-quarters of respondents (74%) are based in either North America or Europe.

Other key findings from the O’Reilly survey (multiple replies were permitted):

  • GenAI’s top use cases: Programming (77%); data analysis (70%); customer-facing applications (65%)
  • GenAI’s top use constraints: Lack of appropriate use cases (53%); legal issues, risk and compliance (38%)
  • GenAI’s top risks: Unexpected outcomes (49%); security vulnerabilities (48%); safety and reliability (46%)

Public-cloud spending to rise 20% next year

Total worldwide spending by end users on the public cloud will rise 20% between this year and next, predicts Gartner. This year, the market watcher adds, user spending on the public cloud will total $563.6 billion. Next year, this spend will rise to $678.8 billion.

“Cloud has become essentially indispensable,” says Gartner analyst Sid Nag.

Gartner predicts that all segments of the public-cloud market will grow in 2024. But it also says two segments will grow especially fast next year: Infrastructure as a Service (IaaS), predicted to grow nearly 27%; and Platform as a Service (PaaS), forecast to grow nearly 22%.

What’s driving all this growth? One factor: industry cloud platforms. These combine Software as a Service (SaaS), PaaS and IaaS into product offerings aimed at specific industries.

For example, enterprise software vendor SAP offers industry clouds for banking, manufacturing, HR and more. The company says its life-sciences cloud helped Boston Scientific, a manufacturer of medical devices, reduce inventory and order-management operational workloads by as much as 45%.

Gartner expects that by 2027, industry cloud platforms will be used by more than 70% of enterprises, up from just 15% of enterprises in 2022.

Technical debt: a big challenge

Technical debt—older hardware and software that no longer supports an organization’s strategies—is a bigger problem than you might think.

In a recent survey of 523 IT professionals, conducted for IT trade association CompTIA, nearly three-quarters of respondents (74%) said their organizations find tech debt to be a challenge.

An even higher percentage of respondents (78%) say their work is impeded by “cowboy IT,” shadow IT and other tech moves made without the IT department’s involvement. Not incidentally, these are among the main causes of technical debt, largely because the resulting technology isn’t acquired as part of the organization’s strategic plans.

Fortunately, IT pros are also fighting back. Over two-thirds of respondents (68%) said they’ve made erasing technical debt a moderate or high priority.

Cybersecurity: Staff violations nearly as widespread as hacks

Employee violations of organizations’ information-security policies are nearly as common as attacks by external hackers, finds a new survey by security vendor Kaspersky.

The survey reached 1,260 IT and security professionals worldwide. It found that 26% of cyber incidents in business occurred due to employees intentionally violating their organizations’ security protocols. By contrast, hacker attacks accounted for 30%—not much higher.

Here’s the breakdown of those policy violations by employees, according to Kaspersky (multiple replies were permitted):

  • 25%: Using weak passwords or failing to change passwords regularly
  • 24%: Visiting unsecured websites
  • 24%: Using unauthorized systems for sharing data
  • 21%: Failing to update system software and applications
  • 21%: Accessing data with an unauthorized device
  • 20%: Sending data (such as email addresses) to personal systems
  • 20%: Intentionally engaging in malicious behavior for personal gain

The issue is far from theoretical. Among respondents to the Kaspersky survey, fully a third (33%) say they’ve suffered 2 or 3 cyber incidents in the last 2 years. And a quarter (25%) say that during the same time period, they’ve been the subject of at least 4 cyberattacks.


Tech Explainer: How does design simulation work? Part 2


Cutting-edge technology powers the virtual design process.


The market for simulation software is hot, growing at a compound annual growth rate (CAGR) of 13.2%, according to MarketsandMarkets. The research firm predicts that the global market for simulation software, worth an estimated $18.1 billion this year, will rise to $33.5 billion by 2027.

No surprise, then, that tech titans AMD and Supermicro would design an advanced hardware platform to meet the demands of this burgeoning software market.

AMD and Supermicro have teamed up with Ansys Inc., a U.S.-based designer of engineering simulation software. One result of this three-way collaboration is the Supermicro SuperBlade.

Shanthi Adloori, senior director of product management at Supermicro, calls the SuperBlade “one of the fastest simulation-in-a-box solutions.”

Adloori adds: “With a high core count, large memory capacity and faster memory bandwidth, you can reduce the time it takes to complete a simulation.”

One very super blade

Adloori isn’t overstating the case.

Supermicro’s SuperBlade can house up to 20 hot-swappable nodes in its 8U chassis. Each of those blades can be equipped with AMD EPYC CPUs and AMD Instinct GPUs. In fact, SuperBlade is the only platform of its kind designed to support both GPU and non-GPU nodes in the same enclosure.

Supermicro SuperBlade’s other tech specs may be less glamorous, but they’re no less impressive. When it comes to memory, each blade can address a maximum of 8TB or 16TB of DDR5-4800 memory, depending on the configuration.

Each node can also house two NVMe/SAS/SATA drives, and the chassis supports as many as eight 3000W Titanium Level power supplies.

Because networking is an essential element of enterprise-grade design simulation, SuperBlade includes redundant 25Gb/10Gb/1Gb Ethernet switches and up to 200Gbps/100Gbps InfiniBand networking for HPC applications.

For smaller operations, the Supermicro SuperBlade is also available in more compact configurations, including 6U and 4U. These versions pack fewer nodes, which ultimately means they bring less compute power to bear. But, hey, not every design team makes passenger jets for a living.

It’s all about the silicon

If Supermicro’s SuperBlade is the tractor-trailer of design simulation technology, then AMD CPUs and GPUs are the engines under the hood.

The differing designs of these chips lend themselves to specific core competencies. CPUs can focus tremendous power on a few tasks at a time. Sure, they can multitask. But there’s a limit to how many simultaneous operations they can address.

AMD bills its EPYC 7003 Series CPUs as the world’s highest-performing server processors for technical computing. The addition of AMD 3D V-Cache technology delivers an expanded L3 cache to help accelerate simulations.

GPUs, on the other hand, shine when a simulation requires many operations to be performed simultaneously. The AMD Instinct MI250X accelerator contains 220 compute units with 14,080 stream processors.

Instead of throwing a ton of processing power at a small number of operations, the AMD Instinct can address thousands of less resource-intensive operations simultaneously. It’s that capability that makes GPUs ideal for HPC and AI-enabled operations, an increasingly essential element of modern design simulation.
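
As a rough illustration of that difference, the sketch below applies one simple element-wise operation across millions of values in a single data-parallel step, then does the same work with a serial loop. It assumes PyTorch (a ROCm or CUDA build for the GPU path) and falls back to the CPU if no GPU is present.

```python
# Illustrative sketch of the CPU-vs-GPU trade-off described above: a GPU applies
# one simple operation across millions of values at once, while a serial loop
# touches them one at a time. Assumes PyTorch (a ROCm or CUDA build for the GPU
# path); it falls back to CPU tensors if no GPU is present.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand(10_000_000, device=device)

# One data-parallel operation: every element is processed in a single launch.
y = torch.sqrt(x) * 2.0 + 1.0

# The serial equivalent handles each element in turn (shown on a small slice,
# since a pure-Python loop over 10 million elements would be painfully slow).
serial = [v ** 0.5 * 2.0 + 1.0 for v in x[:1000].tolist()]

print(y[:3], serial[:3])
```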

The future of design simulation

The development of advanced hardware like SuperBlade and the AMD CPUs and GPUs that power it will continue to progress as more organizations adopt design simulation as their go-to product development platform.

That progression will continue to manifest in global companies like Boeing and Volkswagen. But it will also find its way into small startups and single users.

Also, as the required hardware becomes more accessible, simulation software should become more efficient.

This confluence of market trends could empower millions of independent designers with the ability to perform complex design, testing and validation functions.

The result could be nothing short of a design revolution.

Part 1 of this two-part Tech Explainer explores the many ways design simulation is used to create new products, from tiny heart valves to massive passenger aircraft. Read Part 1 now.


Tech Explainer: How does design simulation work? Part 1


Design simulation lets designers and engineers create, test and improve designs of real-world airplanes, cars, medical devices and more while working safely and quickly in virtual environments. This workflow also reduces the need for physical tests and allows designers to investigate more alternatives and optimize their products.


Design simulation is a type of computer-aided engineering used to create new products, reducing the need for physical prototypes. The result is a faster, more efficient design process in which complex physics and math do much of the heavy lifting.

Rapid advances in the CPUs, GPUs and software used to perform simulation have made it possible to shift product design from the physical world to a virtual one.

In this virtual space, engineers can create and test new designs as quickly as their servers can calculate the results and then render them with visualization software.

Getting better all the time

Designing via AI-powered virtual simulation offers significant improvements over older methods.

Back in the day, it might have taken a small army of automotive engineers years to produce a single new model. Prototypes were often sculpted from clay and carted into a wind tunnel to test aerodynamics.

Each new model went through a seemingly endless series of time-consuming physical simulations. The feedback from those tests would literally send designers back to the drawing board.

It was an arduous and expensive process. And the resources necessary to accomplish these feats of engineering often came at the expense of competition. Companies whose pockets weren’t deep enough might fail to keep up.

Fast-forward to the present. Now, we’ve got smaller design teams aided by increasingly powerful clusters of high-performance systems.

These engineers can tweak a car’s crumple zone in the morning … run the new version through a virtual crash test while eating lunch … and send revised instructions to the design team before day’s end.

Changing designs, saving lives

Faster access to this year’s Ford Mustang is one thing. But if you really want to know how design simulation is changing the world, talk to someone whose life was saved by a mechanical heart valve.

Using the latest tech, designers can simulate new prosthetics in relation to the physiology they’ll inhabit. Many factors come into play here, including size, shape, materials, fluid dynamics, failure models and structural integrity over time.

What’s more, it’s far better to theorize how a part will interact with the human body before the doctor installs it. Simulations can warn medical pros about potential infections, rejections and physical mismatches. AI can play a big part in these types of simulations and manufacturing.

Sure, perfection may be unattainable. But the closer doctors get to a perfect match between a prosthetic and its host body, the better the patient will fare after the procedure.

Making the business case

Every business wants to cut costs, increase efficiency and get an edge over the competition. Here, too, design simulation offers a variety of ways to achieve those lofty goals.

As mentioned above, simulation can drastically reduce the need for expensive physical prototypes. Creating and testing a new airplane design virtually means not having to come within 100 miles of a runway until the first physical prototype is ready to take flight. 

The aerospace and automotive industries rely heavily on simulating both the structural integrity of an assembly and its computational fluid dynamics. In this way, simulation can potentially save an aerospace company billions of dollars over the long run.

What’s more, virtual airplanes don’t crash. They can’t be struck by lightning. And in a virtual passenger jet, test pilots don’t need to worry about their safety.

By the time a new aircraft design rolls onto the tarmac, it’s already been proven air-worthy—at least to the extent that a virtual simulation can make those kinds of guarantees.

Greater efficiency

Simulation makes every aspect of design more efficient. For instance, iteration, a vital element of the design process, becomes infinitely more manageable in a simulated environment.

Want to find out how a convertible top will affect your new supercar’s 0-to-60 time? Simulation allows engineers to quickly replace the hard-top with some virtual canvas and then create a virtual drag race against the original model.
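
Here’s a toy illustration of that kind of iteration, not real CAE: it integrates a simplified equation of motion (constant drive force minus aerodynamic drag) and compares 0-to-60 times for two hypothetical variants of the same car. Every parameter below is invented for the example.

```python
# Toy illustration of simulated iteration -- not real CAE. We integrate a
# simplified equation of motion (constant drive force minus aerodynamic drag)
# and compare 0-60 mph times for two hypothetical variants of the same car.
# All parameters are invented for the example.

def zero_to_sixty(mass_kg, drag_coeff, frontal_area_m2, drive_force_n,
                  rho=1.225, dt=0.01):
    v, t = 0.0, 0.0
    target = 60 * 0.44704            # 60 mph expressed in m/s
    while v < target:
        drag = 0.5 * rho * drag_coeff * frontal_area_m2 * v * v
        v += (drive_force_n - drag) / mass_kg * dt
        t += dt
    return t

hardtop     = zero_to_sixty(mass_kg=1500, drag_coeff=0.30, frontal_area_m2=2.1, drive_force_n=7000)
convertible = zero_to_sixty(mass_kg=1560, drag_coeff=0.34, frontal_area_m2=2.1, drive_force_n=7000)
print(f"hard-top: {hardtop:.2f} s, convertible: {convertible:.2f} s")
```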

Simulation can take a product to the manufacturing phase, too. Once a design is finished, engineers can simulate its journey through a factory environment.

This virtual factory, or digital twin, can help determine how long it will take to build a product and how it will react to various materials and environmental conditions. It can even determine how many moves a robot arm will need to make and when human intervention might become necessary. All of this helps engineers optimize the manufacturing process.

In countless ways, simulation has never been more real.

In Part 2 of this 2-part blog, we’ll explore the digital technology behind design simulation. This cutting-edge technology is made possible by the latest silicon, vast swaths of high-speed storage, and sophisticated blade servers that bring it all together.


Why M&E content creators need high-end VDI, rendering & storage


Content creators in media and entertainment need lots of compute, storage and networking. Supermicro servers powered by AMD EPYC processors support these creators with improved rendering and high-speed storage, helping them take creative ideas into production.

 


When content creators at media and entertainment (M&E) organizations create videos and films, they’re also competing for attention. And today that requires a lot of technology.

Making a full-length animated film involves no fewer than 14 complex steps, including 3D modeling, texturing, animating, visual effects and rendering. The whole process can take years. And it requires a serious quantity of high-end compute, storage and software.

From an IT perspective, three of the most compute-intensive activities for M&E content creators are VDI, rendering and storage. Let’s take a look at each.

* Virtual desktop infrastructure (VDI): While content creators work on personal workstations, they need the kind of processing power and storage capacity available from a rackmount server. That’s what they get with VDI.

VDI separates the desktop and associated software from the physical client device by hosting the desktop environment and applications on a central server. These assets are then delivered to the desktop workstation over a network.

To power VDI setups, Supermicro offers a 4U GPU server with up to 8 PCIe GPUs. The Supermicro AS-4125GS-TNRT server packs a pair of AMD EPYC 9004 processors, Nvidia RTX 6000 GPUs, and 6TB of DDR5 memory.

* Rendering: The last stage of film production, rendering is where the individual 3D images created on a computer are transformed into the stream of 2D images ready to be shown to audiences. This process, conducted pixel by pixel, is time-consuming and resource-hungry. It requires powerful servers, lots of storage capacity and fast networking.

For rendering, Supermicro offers its 2U Hyper system, the AS-2125HS-TNR. It’s configured with dual AMD EPYC 9004 processors, up to 6TB of memory, and your choice of NVMe, SATA or SAS storage.

* Storage: Content creation involves creating, storing and manipulating huge volumes of data. So the first requirement is simply having a great deal of storage capacity. But it’s also important to be able to retrieve and access that data quickly.

For these kinds of storage challenges, Supermicro offers Petascale storage servers based on AMD EPYC processors. They can pack up to 16 hot-swappable E3.S (7.5mm) NVMe drive bays. And they’ve been designed to store, process and move vast amounts of data.

M&E content creators are always looking to attract more attention. They’re getting help from today’s most advanced technology.


Tech Explainer: What’s the difference between Machine Learning and Deep Learning? Part 2


In Part 1 of this 2-part Tech Explainer, we explored the difference between how machine learning and deep learning models are trained and deployed. Now, in Part 2, we’ll get deeper into deep learning to discover how this advanced form of AI is changing the way we work, learn and create.


Where Machine Learning is designed to reduce the need for human intervention, Deep Learning—an extension of ML—removes much of the human element altogether.

If ML were a driver-assistance feature that helped you parallel park and avoid collisions, DL would be an autonomous, self-driving car.

The human intervention we’re talking about has much to do with categorizing and labeling the data used by ML models. Producing this structured data is both time-consuming and expensive.

DL shortens the time and lowers the cost by learning from unstructured data. This eliminates much of the data pre-processing performed by humans for ML.

That’s good news for modern businesses. Market watcher IDC estimates that as much as 90% of corporate data is unstructured.

DL is particularly good at processing unstructured data. That includes information coming from the edge, the core and millions of both personal and IoT devices.

Like a brain, but digital

Deep Learning systems “think” with a neural network—multiple layers of interconnected nodes designed to mimic the way the human brain works. A DL system processes data inputs in an attempt to recognize, classify and accurately describe objects within data.

The layers of a neural network are stacked vertically. Each layer builds on the work performed by the one below it. By pushing data through each successive layer, the overall system improves its predictions and categorizations.

For instance, imagine you’ve tasked a DL system to identify pictures of junk food. The system would quickly learn—on its own—how to differentiate Pringles from Doritos.

It might do this by learning to recognize Pringles’ iconic tubular packaging. Then the system would categorize Pringles differently than the family-size sack of Doritos.

What if you fed this hypothetical DL system with more pictures of chips? Then it could begin to identify varying angles of packaging, as well as colors, logos, shapes and granular aspects of the chips themselves.

As this example illustrates, the longer a DL system operates, the more intelligent and accurate it becomes.
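
For a concrete, if simplified, picture of those stacked layers, here is a minimal PyTorch sketch. Each layer transforms the output of the one below it, and the final layer produces scores for two classes (say, Pringles vs. Doritos). A production image classifier would use convolutional layers and vast amounts of training data; this only shows how data flows upward through the stack.

```python
# Minimal sketch of the "stacked layers" idea using PyTorch. Each linear layer
# plus activation transforms the output of the layer below it; the final layer
# produces scores for two classes. A production image model would use
# convolutional layers and far more data -- this only shows the data flow.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),                  # turn a 3x64x64 image into a flat vector
    nn.Linear(3 * 64 * 64, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 2),              # two output classes
)

fake_batch = torch.rand(8, 3, 64, 64)     # 8 random "images"
logits = model(fake_batch)                # forward pass through every layer
print(logits.shape)                       # torch.Size([8, 2])
```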

Things we used to do

DL tends to be deployed when it’s time to pull out the big guns. This isn’t tech you throw at a mere spam filter or recommendation engine.

Instead, it’s the tech that powers the world’s finance, biomedical advances and law enforcement. For these verticals, failure is simply not an option.

For these verticals, here are some of the ways DL operates behind the scenes:

  • BioMed: DL helps healthcare staff analyze medical imaging such as X-rays and CT scans. In many cases, the technology is more accurate than well-trained physicians with decades of experience.
  • Finance: For those seeking a market edge (read: everyone), DL powers algorithm-based predictive analytics. This helps modern-day robber barons manage their portfolios based on insights from data so vast, they couldn’t leverage it themselves. DL also helps financial institutions assess loans, detect fraud and manage credit.
  • Law Enforcement: In the 2002 movie “Minority Report,” Tom Cruise played a police officer who could arrest people before they committed a crime. With DL, this fiction could turn into an unsettling reality. DL can be used to analyze millions of data points, then predict who is most likely to break the law. It might even give authorities an idea of where, when and how it could happen.

The future…?

Looking into a crystal ball—which these days probably uses DL—we can see a long succession of similar technologies coming. Just as ML begat DL, so too will DL beget the next form of AI—and the one after that.

The future of DL isn’t a question of if, but when. Clearly, DL will be used to advance a growing number of industries. But just when each sector will come to be ruled by our new smarty-pants robots is less clear.

Keep in mind: Even as you read this, DL systems are working tirelessly to help data scientists make AI more accurate and able to provide more useful assessments of datasets for specific outcomes. And as the science progresses, neural networks will continue to become more complex—and more like human brains.

That means the next generation of DL will likely be far more capable than the current one. Future AI systems could figure out how to reverse the aging process, map distant galaxies, even produce bespoke food based on biometric feedback from hungry diners.

For example, the upcoming AMD Instinct MI300 accelerators promise to usher in a new era of computing capabilities. That includes the ability to handle large language models (LLMs), the key approach behind generative AI systems such as ChatGPT.

Yes, the robots are here, and they want to feed you custom Pringles. Bon appétit!

 


What’s inside Supermicro’s new Petascale storage servers?


Supermicro has introduced a new class of storage servers that support E3.S Gen 5 NVMe drives. These storage servers offer up to 256TB of high-throughput, low-latency storage in a 1U enclosure, and up to half a petabyte in a 2U.

Supermicro has designed these storage servers to be used with large AI training and HPC clusters. Those workloads require that unstructured data, often in extremely large quantities, be delivered quickly to the system’s CPUs and GPUs.

To do this, Supermicro has developed a symmetrical architecture that reduces latency. It does so in 2 ways. One, by ensuring that data travels the shortest possible signal path. And two, by providing the maximum airflow over critical components, allowing them to run as fast and cool as possible.

1U and 2U for you 

Supermicro’s new lineup of optimized storage systems includes 1U servers that support up to 16 hot-swap E3.S drives. An alternate configuration could be up to eight E3.S drives, plus four E3.S 2T 16.8mm bays for CMM and other emerging modular devices.

(CMM is short for Chassis Management Module. These devices provide management and control of the chassis, including basic system health, inventory information and basic recovery operations.)

The E3.S form factor calls for a short and thin NVMe SSD drive that is 76mm high, 112.75mm long, and 7.5mm thick.

In the 2U configuration, Supermicro’s server supports up to 32 hot-swap E3.S drives. A single-processor system, it supports the latest 4th Gen AMD EPYC processors.

Put it all together, and you can have a standard rack that stores up to an impressive 20 petabytes of data for high-throughput NVMe over fabrics (NVMe-oF) configurations.

30TB drives coming

When new 30TB drives become available—a move expected later this year—the new Supermicro storage servers will be able to handle them. Those drives will bring the storage total to 1 petabyte in a compact 2U server.
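
A quick back-of-the-envelope check of those capacity figures, assuming 30.72TB drives, 32 E3.S bays per 2U server and, as a further assumption, twenty such servers per rack:

```python
# Back-of-the-envelope check of the capacity figures above: 32 E3.S bays per 2U
# server, 30.72TB per drive, and (as an assumption) twenty 2U servers per rack.
drive_tb = 30.72
bays_per_2u = 32
servers_per_rack = 20            # assumed; depends on rack height and switches

per_server_tb = drive_tb * bays_per_2u                  # ~983 TB, roughly 1 PB
per_rack_pb = per_server_tb * servers_per_rack / 1000
print(f"{per_server_tb:.0f} TB per 2U server, ~{per_rack_pb:.0f} PB per rack")
```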

Two storage-drive vendors working closely with Supermicro are Kioxia America and Solidigm, both of which make E3.S solid-state drives (SSDs). Kioxia has announced a 30.72TB SSD called the Kioxia CD8P Series. And Solidigm says its D5-P5336 SSD will ship in an E3.S form factor with up to 30.72TB in the first half of 2024.

The new Supermicro Petascale storage servers are shipping now in volume worldwide.

Learn more about the Supermicro E3.S Petascale All-Flash NVMe Storage Systems.

 


Can liquid-cooled servers help your customers?


Liquid cooling can offer big advantages over air cooling. According to a new Supermicro solution guide, these benefits include up to 92% lower electricity costs for a server’s cooling infrastructure, and up to 51% lower electricity costs for an entire data center.


The thinking used to be that liquid cooling was only for supercomputers and high-end gaming PCs. No more.

Today, many large-scale cloud, HPC, analytics and AI servers combine CPUs and GPUs in a single enclosure, generating a lot of heat. Liquid cooling can carry that heat away, often at lower cost and with greater efficiency than air.

According to a new Supermicro solution guide, liquid’s advantages over air cooling include:

  • Up to 92% lower electricity costs for a server’s cooling infrastructure
  • Up to 51% lower electricity costs for the entire data center
  • Up to 55% less data center server noise

What’s more, the latest liquid cooling systems are turnkey solutions that support the highest GPU and CPU densities. They’re also fully validated and tested by Supermicro under demanding workloads that stress the server. And unlike some other components, they’re ready to ship to you and your customers quickly, often in mere weeks.

What are the liquid-cooling components?

Liquid cooling starts with a cooling distribution unit (CDU). It incorporates two modules: a pump that circulates the liquid coolant, and a power supply.

Liquid coolant travels from the CDU through flexible hoses to the cooling system’s next major component, the coolant distribution manifold (CDM). It’s a unit with distribution hoses to each of the servers.

There are 2 types of CDMs. A vertical manifold is placed on the rear of the rack, is directly connected via hoses to the CDU, and delivers coolant to another important component, the cold plates. The second type, a horizontal manifold, is placed on the front of the rack, between two servers; it’s used with systems that have inlet hoses on the front.

The cold plates, mentioned above, are placed on top of the CPUs and GPUs in place of their typical heat sinks. With coolant flowing through their channels, they keep these components cool.

Supermicro’s CDU offers two valuable features. First, it has a cooling capacity of 100kW, which enables very high rack compute densities. Second, it features a touchscreen for monitoring and controlling rack operation via a web interface. It’s also integrated with the company’s SuperCloud Composer data-center management software.
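
To put that 100kW figure in perspective, here is a rough back-of-the-envelope sketch of the coolant flow it implies, using the standard heat-transfer relation Q = mass flow x specific heat x temperature rise. The 10 C coolant temperature rise is an assumed value for illustration, not a Supermicro specification.

```python
# Rough sketch of what a 100 kW cooling capacity implies for coolant flow,
# using Q = m_dot * c_p * delta_T. The 10 C coolant temperature rise is an
# assumed, illustrative figure, not a Supermicro specification.
q_watts = 100_000        # CDU cooling capacity cited above
c_p = 4186               # specific heat of water, J/(kg*K)
delta_t = 10             # assumed coolant temperature rise across the rack, K

m_dot = q_watts / (c_p * delta_t)          # required mass flow, kg/s
print(f"~{m_dot:.1f} kg/s of water (~{m_dot * 60:.0f} liters per minute)")
```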

What does it work on?

Supermicro offers several liquid-cooling configurations to support different numbers of servers in different size racks.

Among the Supermicro servers available for liquid cooling are the company’s GPU systems, which can combine up to eight Nvidia GPUs with AMD EPYC 9004 Series CPUs. Direct-to-chip (D2C) coolers are mounted on each processor, then routed through the manifolds to the CDU.

D2C cooling is also a feature of the Supermicro SuperBlade. This system supports up to 20 blade servers, which can be powered by the latest AMD EPYC CPUs in an 8U chassis. In addition, the Supermicro Liquid Cooling solution is ideal for high-end AI servers such as the company’s 8-GPU 8125GS-TNHR.

To manage it all, Supermicro also offers its SuperCloud Composer’s Liquid Cooling Consult Module (LCCM). This tool collects information on the physical assets and sensor data from the CDU, including pressure, humidity, and pump and valve status.

This data is presented in real time, enabling users to monitor the operating efficiency of their liquid-cooled racks. Users can also employ SuperCloud Composer to set up alerts, manage firmware updates, and more.


Meet Supermicro’s Petascale Storage, a compact rackmount system powered by the latest AMD EPYC processors


Supermicro’s H13 Petascale Storage System is a compact 1U rackmount system powered by the AMD EPYC 97X4 processor (formerly codenamed Bergamo) with up to 128 cores.

 

 


Your customers can now implement Supermicro Petascale Storage, an all-flash NVMe storage system powered by the latest 4th Gen AMD EPYC 9004 Series processors.

The Supermicro system has been specifically designed for AI, HPC, private and hybrid cloud, in-memory computing and software-defined storage.

Now Supermicro is offering the first of these systems. It's the Supermicro H13 Petascale Storage System. This compact 1U rackmount system is powered by an AMD EPYC 97X4 processor (formerly codenamed Bergamo) with up to 128 cores.

For organizations with data-storage requirements approaching petascale capacity, the Supermicro system was designed with a new chassis and motherboard that support a single AMD EPYC processor, 24 DIMM slots for up to 6TB of main memory, and 16 hot-swap E3.S slots. E3.S belongs to the Enterprise and Datacenter Standard Form Factor (EDSFF) E3 family of SSD form factors designed for specific use cases. E3.S drives are short and thin (7.5mm wide), draw up to 25W, and use a PCIe 5.0 interface.

The Supermicro Petascale Storage system can deliver more than 200 GB/sec. bandwidth and over 25 million input-output operations per second (IOPS) from a half-petabyte of storage.
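
Rough per-drive arithmetic shows how those system-level numbers add up across 16 E3.S bays. The per-drive figures below are ballpark assumptions for PCIe 5.0 SSDs, not specifications for any particular drive:

```python
# Rough per-drive arithmetic behind the system-level figures above, assuming the
# 16 E3.S bays are populated with PCIe 5.0 SSDs. The per-drive numbers are
# ballpark assumptions, not specifications for any particular drive.
drives = 16
seq_read_gbs = 13              # assumed sequential read per PCIe 5.0 x4 drive, GB/s
random_read_iops = 2_000_000   # assumed 4K random read IOPS per drive
capacity_tb = 32               # assumed per-drive capacity, TB

print(f"aggregate bandwidth: ~{drives * seq_read_gbs} GB/s")                    # ~208 GB/s
print(f"aggregate IOPS:      ~{drives * random_read_iops / 1e6:.0f} million")   # ~32 million
print(f"raw capacity:        ~{drives * capacity_tb} TB")                       # ~512 TB
```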

Here's why 

Why might your customers need such a storage system? Several reasons, depending on what sorts of workloads they run:

  • Training AI/ML applications requires massive amounts of data for creating reliable models.
  • HPC projects use and generate immense amounts of data, too. That's needed for real-world simulations, such as predicting the weather or simulating a car crash.
  • Big-data environments need substantial datasets. These gain intelligence from real-world observations ranging from sensor inputs to business transactions.
  • Enterprise applications need large amounts of data located close to compute, accessible at NVMe-over-Fabrics (NVMe-oF) speeds.

Also, the Supermicro H13 Petascale Storage System offers significant performance, capacity, throughput and endurance, all while maintaining excellent power efficiency.


Interview: How NEC Germany keeps up with the changing HPC market


In an interview, Oliver Tennert, director of HPC marketing and post-sales at NEC Germany, explains how the company keeps pace with a fast-developing market.


The market for high performance computing (HPC) is changing, meaning system integrators that serve HPC customers need to change too.

To learn more, PIC managing editor Peter Krass spoke recently with Oliver Tennert, NEC Germany’s director of HPC marketing and post-sales. NEC Germany works with hardware vendors including AMD (for processors) and Supermicro (for servers). This interview has been lightly edited for clarity.

First, please tell me about NEC Germany and its relationship with parent company NEC Corp.?

I work for NEC Germany, which is a subsidiary of NEC Europe. Our parent company, NEC Corp., is a Japanese company with a focus on telecommunications, which is still a major part of our business. Today NEC has about 100,000 employees around the world.

HPC as a business within NEC is done primarily by NEC Germany and our counterparts at NEC Corp. in Japan. The Japanese operation covers HPC in Asia, and we cover EMEA, mainly Europe.

What kinds of HPC workloads and applications do your customers run?

It’s probably 60:40 — that is, about 60% of our customers are in academia, including universities, research facilities, and even DWD, Germany’s weather-forecasting service. The remaining 40% are industrial, including automotive and engineering companies. 

The typical HPC use cases of our customers come in two categories. The most important HPC category of course is simulation. That can mean simulating physical processes. For example, what does a car crash look like under certain parameters? These simulations are done in great detail.

Our other important HPC category is data analytics. For example, that could mean genomic analysis.

How do you work with AMD and Supermicro?

To understand this, you first have to understand how NEC’s HPC business works. For us, there are two aspects to the business.

One, we’ve got our own vector technology. Our NEC vector engine is a PCIe card designed and produced in Japan. The latest incarnation of our vector supercomputer is the NEC SX-Aurora TSUBASA. It was designed to run applications that are both vectorizable and profit from high bandwidth to main memory. One of our big customers in this area is the German weather service, DWD.

The other part of the business is what we call “pizza boxes,” the x86 architecture. For this, we need industry-standard servers, including processors from AMD and servers from Supermicro.

For that second part of the business, what is NEC’s role?

The answer has to do with how the HPC business works operationally. If a customer intends to purchase a new HPC cluster, typically they need expert advice on designing an optimized HPC environment. What they do know is the application they run. And what they want to know is, ‘How do we get the best, most optimized system for this application?’

This implies doing a lot of configuration. Essentially, we optimize the design based on many different components. Even if we know that an AMD processor is the best for a particular task, still, there are dozens of combinations of processor SKUs and server model types which offer different price/performance ratios. The same applies to certain data-storage solutions. For HPC, storage is more than just picking an SSD. What’s needed is a completely different kind of technology.

Configuring and setting up such a complex solution takes a lot of expertise. We’re being asked to run benchmarks. That means the customer says, ‘Here’s my application, please run it on some specific configurations, and tell me which one offers the best price/performance ratio.’ This takes a lot of time and resources. For example, you need the systems on hand to just try it out. And the complete tender process—from pre-sales discussions to actual ordering and delivery—can take anywhere from weeks to months.

And this is just to bid, right? After all this work, you still might not get the order?

Yes, that can happen. There are lots of factors that influence your chances. In general, if you have a good working relationship with a private customer, it’s easier. They have more discretion than academic or public customers. For public bids, everything must be more transparent, because it’s more strictly regulated. Normally, that means you have more work, because you have to test more setups. Your competition will be doing the same.

When working with the second group, the private industry customers, do customers specify parts from specific vendors, such as AMD and Supermicro?

It depends on the factors that will influence the customer’s final selection. Price and performance, that’s one thing. Power consumption is another. Then, sometimes, it’s the vendors. Also, certain projects are more attractive to certain vendors because of market visibility—so-called lighthouse projects. That can have an influence on the conditions we get from vendors. Vendors also honor the amount of effort we have put in to getting the customer in the first place. So there are all sorts of external factors that can influence the final system design.

Also, today, the majority of HPC solutions are similar from an architectural point of view. So the difference between competing vendors is to take all the standard components and optimize from these, instead of providing a competing architecture. As a result, the soft skills—such as the ability to implement HPC solutions in an efficient and professional way—also have a large influence on the final order.

How about power consumption and cooling? Are these important considerations for your HPC customers?

It’s become absolutely vital. As a rule of thumb, we can say that the larger an HPC project is going to be, the more likely that it is going to be cooled by liquid.

In the past, you had a server room that you cooled with air conditioning. But those times are nearly gone. Today, when you think of a larger HPC installation—say, 1,000 or 2,000 nodes—you’re talking about a megawatt of power being consumed, or even more. And that also needs to be cooled.

The challenge in cooling a large environment is to get the heat away from the server and out of the room to somewhere else, whether outside or to a larger cooling system. This cannot be done by traditional cooling with air. Air is too inefficient for transporting heat. Water is much better. It’s a more efficient means for moving heat from Point A to Point B.

How are you cooling HPC systems with liquid?

There are a few ways to do this. There’s cold-water cooling, mainly indirect. You bring in water with what’s known as an “inlet temperature” of about 10 C and it cools down the air inside the server racks, with the heat getting carried away with the water now at about 15 or 20 C. The issue is, first you need energy just to cool the water down to 10 C. Also, there’s not much you can do with water at 15 or 20 C. It’s too warm for cooling anything else, but too cool for heating a room.

That’s why the new approach is to use hot-water cooling, mainly direct. It sounds like a paradox. But what might seem hot to a human being is in fact pretty cool for a CPU. For a CPU, an ambient temperature of 50 or 60 C is fine; it would be absolutely not fine for a human being. So if you have an inlet temperature for water of, say, 40 or 45 C, that will cool the CPU, which runs at an internal temperature of 80 or 90 C. The outbound temperature of the water is then maybe 50 C. Then it becomes interesting. At that temperature, you can heat a building. You can reuse the heat, rather than just throwing it away. So this kind of infrastructure is becoming more important and more interesting.

Looking ahead, what are some of your top projects for the future?

Public customers such as research universities have to replace their HPC systems every three to five years. That’s the normal cycle. In that time the hardware becomes obsolete, especially as the vendors optimize their power consumption to performance ratio more and more. So it’s a steady flow of new projects. For our industrial customers, the same applies, though the procurement cycle may vary.

We’re also starting to see the use of computational HPC capacity from the cloud. Normally, when people think of cloud, they think of public clouds from Amazon, Microsoft, etc. But for HPC, there are interim approaches as well. A decade ago, there was the idea of a dedicated public cloud. Essentially, this meant a dedicated capacity that was for the customer’s exclusive use, but was owned by someone other than the customer. Now, between the dedicated cloud and public cloud, there are all these shades of grey. In the past two years, we’ve implemented several larger installations of this “grey-shaded” cloud approach. So more and more, we’re entering the service-oriented market.

There is a larger trend away from customers wanting to own a system, and toward customers just wanting to utilize capacity. For vendors with expertise in HPC, they have to change as well. Which means a change in the business and the way they have to work with customers. It boils down to, Who owns the hardware? And what does the customer buy, hardware or just services? That doesn’t make you a public-cloud provider. It just means you take over responsibility for this particular customer environment. You have a different business model, contract type, and set of responsibilities.

 
