Tech Explainer: What is the intelligent edge? Part 1


The intelligent edge moves compute, storage and networking capabilities close to end devices, where the data is being generated. Organizations gain the ability to process and act on that data in real time, without having to first transfer it to a centralized data center.


The term intelligent edge refers to remote server infrastructures that can collect, process and act on data autonomously. In effect, it’s a small, remote data center.

Compared with a more traditional data center, the intelligent edge offers one big advantage: It locates compute, storage and networking capabilities close to the organization’s data collection endpoints. This architecture speeds data transactions. It also makes them more secure.

The approach is not entirely new. Deploying an edge infrastructure has long been an effective way to gather data in remote locations. What’s new with an intelligent edge is that you gain the ability to process and act on that data (if necessary) in real time—without having to first transfer that data to the cloud.

The intelligent edge can also save an organization money. It makes particular sense for organizations that spend a sizable chunk of their operating budget transferring data between the edge and public or private data centers, including cloud infrastructure (often referred to as “the core”). Reducing bandwidth consumption in both directions, along with storage charges, helps them control costs.

3 steps to the edge

Today, an intelligent edge typically gets applied in one of three areas:

  • Operational Technology (OT): Hardware and software used to monitor and control industrial equipment, processes and events.
  • Information Technology (IT): Digital infrastructure—including servers, storage, networking and other devices—used to create, process, store, secure and transfer data.
  • Internet of Things (IoT): A network of smart devices that communicate and can be controlled via the internet. Examples include smart speakers, wearables, autonomous vehicles and smart-city infrastructure.

The highly efficient edge

There’s yet another benefit to deploying intelligent edge tech: It can help an organization become more efficient.

One way the intelligent edge does this is by obviating the need to transfer large amounts of data. Instead, data is stored and processed close to where it’s collected.

For example, a smart lightbulb or fridge can communicate with the intelligent edge instead of contacting a data center. Staying in constant contact with the core is unnecessary for devices that don’t change much from minute to minute.
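To make the pattern concrete, here’s a minimal sketch of the filter-and-forward logic an edge node might run. The thresholds and field names are illustrative assumptions, not any vendor’s API: raw readings are summarized locally, and only the compact aggregate (plus anomalies) leaves the edge.

```python
import statistics

# Hypothetical edge-node logic: summarize raw sensor readings locally and
# forward only a compact aggregate (plus anomalies) to the core, instead of
# streaming every reading to a central data center.

ANOMALY_THRESHOLD = 90.0  # assumed domain-specific limit


def process_batch(readings: list[float]) -> dict:
    """Reduce a batch of raw readings to the small payload worth sending."""
    anomalies = [r for r in readings if r > ANOMALY_THRESHOLD]
    return {
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "max": max(readings),
        "anomalies": anomalies,  # only exceptional data leaves the edge
    }


if __name__ == "__main__":
    raw = [71.2, 69.8, 70.5, 93.1, 70.0]  # e.g., temperature readings
    print(process_batch(raw))  # this summary, not the raw stream, goes to the core
```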

Another way the intelligent edge boosts efficiency is by reducing the time needed to analyze and act on vital information. This, in turn, can lead to enhanced business intelligence that informs and empowers stakeholders. It all gets done faster and more efficiently than with traditional IT architectures and operations.

For instance, imagine that an organization serves a large customer base from several locations. By deploying an intelligent edge infrastructure, the organization could collect and analyze customer data in real time.

Businesses that gain insights from the edge instead of from the core can also respond quickly to market changes. For example, an energy company could analyze power consumption and weather conditions at the edge (down to the neighborhood), then determine whether a power outage is likely.

Similarly, a retailer could use the intelligent edge to support inventory management and analyze customers’ shopping habits. Using that data, the retailer could then offer customized promotions to particular customers, or groups of customers, all in real time.

The intelligent edge can also be used to enhance public infrastructure. For instance, smart cities can gather data that helps inform lighting, public safety, maintenance and other vital services. That data could then be used for preventive maintenance or to allocate city resources and services as needed.

Edge intelligence

As artificial intelligence (AI) becomes increasingly ubiquitous, many organizations are deploying machine learning (ML) models at the edge to help analyze data and deliver insights in real time.

In one use case, running AI and ML systems at the edge can help an organization reduce the service interruptions that often come with transferring large data sets to and from the cloud. The intelligent edge keeps things running locally, giving distant data centers a chance to catch up. This, in turn, can help the organization provide a better experience for the employees and customers who rely on that data.
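As a minimal sketch of that “keep things running locally” idea, consider the following. The model and queue here are stand-ins (assumptions, not a specific product): requests are scored by a locally deployed model, and results are queued for the core to pick up later rather than blocking on a cloud round trip.

```python
import queue

# Sketch: inference happens on the edge node; results are queued for an
# eventual sync to the core instead of blocking on a cloud round trip.

sync_queue: "queue.Queue[dict]" = queue.Queue()


def edge_model(features: list[float]) -> float:
    # Stand-in for a real trained model deployed at the edge
    # (e.g., an exported ONNX or scikit-learn model).
    return sum(features) / len(features)


def handle_request(features: list[float]) -> float:
    score = edge_model(features)  # the decision is made locally, in real time
    sync_queue.put({"features": features, "score": score})  # core catches up later
    return score


print(handle_request([0.2, 0.9, 0.4]))
```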

Deploying AI at the edge can also help with privacy, security and compliance issues. Transferring data to and from the core presents an opportunity for hackers to intercept data in transit. Eliminating this data transfer deprives cyber criminals of a threat vector they could otherwise exploit.

Part 2 of this two-part blog series dives deep into the biggest, most popular use of the intelligent edge today—namely, the internet of things (IoT). We also look at the technology that powers the intelligent edge, as well as what the future may hold for this emerging technology.


Supermicro introduces edge, telco servers powered by new AMD EPYC 8004 processors


Supermicro has introduced five Supermicro H13 WIO and short-depth servers powered by the new AMD EPYC 8004 Series processors. These servers are designed for intelligent edge and telco applications.


Supermicro is supporting the new AMD EPYC 8004 Series processors (previously code-named Siena) on five Supermicro H13 WIO and short-depth telco servers. Taking advantage of the new AMD processor, these new single-socket servers are designed for use with intelligent edge and telco applications.

The new AMD EPYC 8004 processors enjoy a broad range of operating temperatures and can run at lower DC power levels, thanks to their energy-efficient ‘Zen4c’ cores. Each processor features from 8 to 64 simultaneous multithreading (SMT) capable ‘Zen4c’ cores.

The new AMD processors also run quietly. With a TDP as low as 80W, the CPUs don’t need much in the way of high-speed cooling fans.

Compact yet capacious

Supermicro’s new 1U short-depth version is designed with I/O in the front and a form factor that’s compact yet still offers enough room for three PCIe 5.0 slots. It also has the option of running on either AC or DC power.

The short-depth systems also feature a NEBS-compliant design for telco operations. NEBS, short for Network Equipment Building System, is an industry requirement for the performance levels of telecom equipment.

The new WIO servers use Titanium-level power supplies for increased energy efficiency, which Supermicro says will deliver higher performance per watt for the entire system.

Supermicro WIO systems offer a wide range of I/O options to deliver optimized systems for specific requirements. Users can optimize the storage and networking alternatives to accelerate performance, increase efficiency and find the perfect fit for their applications.

Here are Supermicro’s five new models:

  • AS -1015SV-TNRT: Supermicro H13 WIO system in a 1U format
  • AS -1115SV-TNRT: Supermicro H13 WIO system in a 1U format
  • AS -2015SV-TNRT: Supermicro H13 WIO system in a 2U format
  • AS -1115S-FWTRT: Supermicro H13 telco/edge short-depth system in a 1U format, running on AC power and including system-management features
  • AS -1115S-FDWTRT: Supermicro H13 telco/edge short-depth system in a 1U format, this one running on DC power

Shipments of the new Supermicro servers supporting AMD EPYC 8004 processors start now.


Meet the new AMD EPYC 8004 family of CPUs


The new 4th gen AMD EPYC 8004 family extends the ‘Zen4c’ core architecture into lower-count processors with TDP ranges as low as 80W. The processors are designed especially for edge-server deployments and form factors.


AMD has introduced a family of EPYC processors for space- and power-constrained deployments: the 4th Generation AMD EPYC 8004 processor family. Formerly code-named Siena, these lower core-count CPUs can be used in traditional data centers as well as for edge compute, retail point-of-sale and running a telco network.

The new AMD processors have been designed to run at the edge with better energy efficiency and lower operating costs. The CPUs enjoy a broad range of operating temperatures and can run at lower DC power levels, thanks to their energy-efficient ‘Zen4c’ cores. These new CPUs also run quietly. With a TDP as low as 80W, the CPUs don’t need much in the way of high-speed cooling fans.

The AMD EPYC 8004 processors are purpose-built to deliver high performance and are energy-efficient in an optimized, single-socket package. They use the new SP6 socket. Each processor features from 8 to 64 simultaneous multithreading (SMT) capable ‘Zen4c’ cores.

AMD says these features, along with a streamlined memory and I/O feature set, let servers based on this new processor family deliver compelling system cost/performance metrics.

Heat-tolerant

The AMD EPYC 8004 family is also designed to run in environments with fluctuating and at times high ambient temperatures. That includes outdoor “smart city” settings and NEBS-compliant communications network sites. (NEBS, short for Network Equipment Building System, is an industry requirement for the performance levels of telecom equipment.) What AMD is calling “NEBS-friendly” models have an operating range of -5 C (23 F) to 85 C (185 F).

The new AMD processors can also run in deployments where both the power levels and available physical space are limited. That can include smaller data centers, retail stores, telco installations, and the intelligent edge.

The performance gains are impressive. Using the SPECpower benchmark, which measures power efficiency, the AMD EPYC 8004 CPUs deliver more than 2x the energy efficiency of the top competitive product for telco. This can result in 34% lower energy costs over five years, saving organizations thousands of dollars.
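AMD hasn’t published the arithmetic behind that five-year figure here, but the shape of such a calculation is straightforward. Here’s a sketch with invented wattage and electricity pricing (not AMD’s test parameters):

```python
# Back-of-the-envelope five-year energy cost comparison. All inputs are
# illustrative assumptions, not AMD's published benchmark parameters.

HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.15   # assumed electricity price, $/kWh
YEARS = 5


def five_year_cost(avg_watts: float) -> float:
    kwh = avg_watts / 1000 * HOURS_PER_YEAR * YEARS
    return kwh * PRICE_PER_KWH


baseline = five_year_cost(400)   # hypothetical competing server's average draw
efficient = five_year_cost(264)  # hypothetical draw for the same work, 34% lower

print(f"baseline:  ${baseline:,.0f}")
print(f"efficient: ${efficient:,.0f}")
print(f"savings:   ${baseline - efficient:,.0f} ({1 - efficient / baseline:.0%})")
```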

Multiple models

In all, the AMD EPYC 8004 family currently offers 12 SKUs. Those ending with the letter “P” support single-CPU designs. Those ending in “PN” support NEBS-friendly designs and offer broader operating temperature ranges.

The various models offer a choice of 8, 16, 24, 48 or 64 ‘Zen4c’ cores; from 16 to 128 threads; and L3 cache sizes ranging from 32MB to 128MB. All the SKUs offer 6 channels of DDR memory with a maximum capacity of 1.152TB; a maximum DDR5 frequency of 4800 MHz; and 96 lanes of PCIe Gen 5 connectivity. Security features are offered by AMD Infinity Guard.

Selected AMD partners have already announced support for the new EPYC 8004 family. This includes Supermicro, which introduced new WIO servers based on the new AMD processors for diverse data center and edge deployments.


Tech Explainer: What’s the difference between Machine Learning and Deep Learning? Part 1


What’s the difference between machine learning and deep learning? That’s the subject of this 2-part Tech Explainer. Here, in Part 1, learn more about ML. 


As the names imply, machine learning and deep learning are types of smart software that can learn. Perhaps not the way a human does. But close enough.

What’s the difference between machine and deep learning? That’s the subject of this 2-part Tech Explainer. Here in Part 1, we’ll look in depth at machine learning. Then in Part 2, we’ll look more closely at deep learning.

Both, of course, are subsets of artificial intelligence (AI). To understand their differences, it helps to first understand something of the AI hierarchy.

At the very top is overarching AI technology. It powers both popular generative AI models such as ChatGPT and less famous but equally helpful systems such as the suggestion engine that tells you which show to watch next on Netflix.

Machine learning is a subset of AI. It can perform specific tasks without first needing explicit instructions.

As for deep learning, it’s actually a subset of machine learning. DL is powered by so-called neural networks, multiple node layers that form a system inspired by the structure of the human brain.

Machine learning for smarties

Machine learning is defined as the use and development of computer systems designed to learn and adapt without following explicit instructions.

Instead of requiring human input, ML systems use algorithms and statistical models to analyze and draw inferences from patterns they find in large data sets.

This form of AI is especially good at identifying patterns in structured data. It can then analyze those patterns to make predictions that are usually reliable.

For example, let’s say an organization wants to predict when a particular customer will unsubscribe from its service. The organization could use ML to make an educated guess based on previous data about customer churn.
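Here’s a toy version of that churn use case with scikit-learn. The features and data are synthetic stand-ins; a real model would be trained on the organization’s actual customer history.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy churn model. Features: tenure (months), support tickets filed,
# monthly spend. Labels: 1 = churned, 0 = stayed. Data is synthetic.

X = np.array([
    [24, 0, 50.0],
    [ 2, 5, 80.0],
    [36, 1, 45.0],
    [ 3, 4, 95.0],
    [18, 2, 60.0],
    [ 1, 6, 99.0],
])
y = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# Probability that a new customer will unsubscribe:
new_customer = np.array([[4, 3, 85.0]])
print("churn probability:", model.predict_proba(new_customer)[0, 1])
```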

The machinery of ML

Like all forms of AI, machine learning uses lots of compute and storage resources. Enterprise-scale ML models are powered by data centers packed to the gills with cutting-edge tech. The most vital of these components are GPUs and AI data-center accelerators.

GPUs, though initially designed to process graphics, have become the preferred tool for AI development. They offer high core counts—sometimes numbering in the thousands—as well as massively parallel processing. That makes them ideally suited to performing vast numbers of simple calculations simultaneously.

As AI gained acceptance, IT managers sought ever more powerful GPUs. That demand led to new technologies like AMD’s Instinct MI200 Series accelerators. These purpose-built GPUs are designed to power discoveries in mainstream servers and supercomputers, including some of the largest exascale systems in use today.

AMD’s forthcoming Instinct MI300A will go one step further, combining GPU and AMD EPYC CPU cores in a single component. It’s set to ship later this year.

State-of-the-art CPUs are important for ML-optimized systems. The CPUs need as many cores as possible, running at high frequencies to keep the GPU busy. AMD’s EPYC 9004 Series processors excel at this.

In addition, the CPUs need to run the application’s other tasks and threads. When looking at a full system, PCIe 5.0 connectivity and DDR5 memory are important, too.

The GPUs that power AI are often installed in integrated servers that have the capacity to house their constituent components, including processors, flash storage, networking tech and cooling systems.

One such monster server is the Supermicro AS -4125GS-TNRT. It brings together eight direct attached, double-width, full-length GPUs; up to 6TB of RAM; and two dozen 2.5-inch solid-state drives (SSDs). This server also supports the AMD Instinct MI210 accelerator.

ML vs. DL

The difference between machine learning and deep learning begins with their all-important training methods. ML is trained using four primary methods: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Deep learning, on the other hand, relies on more complex architectures and training techniques. These include convolutional neural networks, recurrent neural networks, generative adversarial networks and autoencoders.

When it comes to performing real-world tasks, ML and DL offer different core competencies. For instance, ML is the type of AI behind the most effective spam filters, like those used by Google and Yahoo. Its ability to adapt to varying conditions allows ML to generate new rules based on previous operations. This functionality helps it keep pace with highly motivated spammers and cybercriminals.
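That adapt-as-you-go behavior maps to incremental (online) learning. Below is a minimal sketch using scikit-learn, with toy messages standing in for real mail; the point is that `partial_fit` lets the filter keep updating as newly labeled spam arrives, without retraining from scratch.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy online spam filter: the model is updated incrementally as newly
# labeled messages arrive, which is how a filter can adapt to spammers'
# changing tactics without retraining from scratch.

vectorizer = HashingVectorizer(n_features=2**16)
clf = SGDClassifier(loss="log_loss")

# Initial batch: 1 = spam, 0 = ham.
batch1 = ["win a free prize now", "meeting at 3pm tomorrow"]
clf.partial_fit(vectorizer.transform(batch1), [1, 0], classes=[0, 1])

# Later batch reflecting new spam tactics:
batch2 = ["urgent crypto offer click here", "lunch on friday?"]
clf.partial_fit(vectorizer.transform(batch2), [1, 0])

print(clf.predict(vectorizer.transform(["free crypto prize click now"])))
```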

More complex inferencing tasks like medical imaging recognition are powered by deep learning. DL models can capture intricate relationships within medical images, even when those relationships are nonlinear or difficult to define. In other words, deep learning can quickly and accurately identify abnormalities not visible to the human eye.

Up next: a Deep Learning deep dive

In Part 2, we’ll explore more about deep learning. You’ll find out how data scientists develop new models, how various verticals leverage DL, and what the future holds for this emerging technology.


What’s inside Supermicro’s new Petascale storage servers?


Supermicro has a new class of storage servers that support E3.S Gen 5 NVMe drives. They offer up to 256TB of high-throughput, low-latency storage in a 1U enclosure, and up to half a petabyte in a 2U.


Supermicro has introduced a new class of storage servers that support E3.S Gen 5 NVMe drives. These storage servers offer up to 256TB of high-throughput, low-latency storage in a 1U enclosure, and up to half a petabyte in a 2U.

Supermicro has designed these storage servers to be used with large AI training and HPC clusters. Those workloads require that unstructured data, often in extremely large quantities, be delivered quickly to the system’s CPUs and GPUs.

To do this, Supermicro has developed a symmetrical architecture that reduces latency in two ways: by ensuring that data travels the shortest possible signal path, and by providing maximum airflow over critical components, allowing them to run as fast and cool as possible.

1U and 2U for you 

Supermicro’s new lineup of optimized storage systems includes 1U servers that support up to 16 hot-swap E3.S drives. An alternate configuration could be up to eight E3.S drives, plus four E3.S 2T 16.8mm bays for CMM and other emerging modular devices.

(CMM is short for CXL Memory Module. These E3.S 2T devices expand a server’s memory capacity over the CXL interconnect.)

The E3.S form factor calls for a short and thin NVMe SSD drive that is 76mm high, 112.75mm long, and 7.5mm thick.

In the 2U configuration, Supermicro’s server supports up to 32 hot-swap E3.S drives. A single-processor system, it supports the latest 4th Gen AMD EPYC processors.

Put it all together, and you can have a standard rack that stores up to an impressive 20 petabytes of data for high-throughput NVMe over fabrics (NVMe-oF) configurations.

30TB drives coming

When new 30TB drives become available—a move expected later this year—the new Supermicro storage servers will be able to handle them. Those drives will bring the storage total to 1 petabyte in a compact 2U server.
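A quick back-of-the-envelope check on that rack-level math, assuming a 42U rack fully populated with 2U systems holding 32 drives each (assumed values, not a Supermicro spec sheet):

```python
# Rack capacity estimate under assumed values: a 42U rack filled with
# 2U servers, 32 E3.S bays per server.

RACK_UNITS = 42
SERVER_UNITS = 2
DRIVES_PER_SERVER = 32


def rack_capacity_pb(drive_tb: float) -> float:
    servers = RACK_UNITS // SERVER_UNITS
    return servers * DRIVES_PER_SERVER * drive_tb / 1000


print(f"15.36 TB drives: {rack_capacity_pb(15.36):.1f} PB")  # ~10.3 PB
print(f"30.72 TB drives: {rack_capacity_pb(30.72):.1f} PB")  # ~20.6 PB, in line with the 20 PB figure above
```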

Two storage-drive vendors working closely with Supermicro are Kioxia America and Solidigm, both of which make E3.S solid-state drives (SSDs). Kioxia has announced a 30.72TB SSD called the Kioxia CD8P Series. And Solidigm says its D5-P5336 SSD will ship in an E3.S form factor with up to 30.72TB in the first half of 2024.

The new Supermicro Petascale storage servers are shipping now in volume worldwide.

Learn more about the Supermicro E3.S Petascale All-Flash NVMe Storage Systems.


Can liquid-cooled servers help your customers?


Liquid cooling can offer big advantages over air cooling. According to a new Supermicro solution guide, these benefits include up to 92% lower electricity costs for a server’s cooling infrastructure, and up to 51% lower electricity costs for an entire data center.


The previous thinking was that liquid cooling was only for supercomputers and high-end gaming PCs. No more.

Today, many large-scale cloud, HPC, analytics and AI servers combine CPUs and GPUs in a single enclosure, generating a lot of heat. Liquid cooling can carry away that heat, often at lower cost and more efficiently than air.

According to a new Supermicro solution guide, liquid’s advantages over air cooling include:

  • Up to 92% lower electricity costs for a server’s cooling infrastructure
  • Up to 51% lower electricity costs for the entire data center
  • Up to 55% less data center server noise

What’s more, the latest liquid cooling systems are turnkey solutions that support the highest GPU and CPU densities. They’re also fully validated and tested by Supermicro under demanding workloads that stress the server. And unlike some other components, they’re ready to ship to you and your customers quickly, often in mere weeks.

What are the liquid-cooling components?

Liquid cooling starts with a cooling distribution unit (CDU). It incorporates two modules: a pump that circulates the liquid coolant, and a power supply.

Liquid coolant travels from the CDU through flexible hoses to the cooling system’s next major component, the coolant distribution manifold (CDM). It’s a unit with distribution hoses to each of the servers.

There are 2 types of CDMs. A vertical manifold is placed on the rear of the rack, is directly connected via hoses to the CDU, and delivers coolant to another important component, the cold plates. The second type, a horizontal manifold, is placed on the front of the rack, between two servers; it’s used with systems that have inlet hoses on the front.

The cold plates, mentioned above, are placed on top of the CPUs and GPUs in place of their typical heat sinks. With coolant flowing through their channels, they keep these components cool.

Supermicro’s CDU offers two valuable features. First, it has a cooling capacity of 100kW, which enables very high rack compute densities. Second, it features a touchscreen for monitoring and controlling rack operation via a web interface. It’s also integrated with the company’s SuperCloud Composer data-center management software.

What does it work on?

Supermicro offers several liquid-cooling configurations to support different numbers of servers in different size racks.

Among the Supermicro servers available for liquid cooling are the company’s GPU systems, which can combine up to eight Nvidia GPUs with AMD EPYC 9004 Series CPUs. Direct-to-chip (D2C) coolers are mounted on each processor, with coolant lines routed through the manifolds to the CDU.

D2C cooling is also a feature of the Supermicro SuperBlade. This system supports up to 20 blade servers, which can be powered by the latest AMD EPYC CPUs in an 8U chassis. In addition, the Supermicro Liquid Cooling solution is ideal for high-end AI servers such as the company’s 8-GPU 8125GS-TNHR.

To manage it all, Supermicro also offers its SuperCloud Composer’s Liquid Cooling Consult Module (LCCM). This tool collects information on the physical assets and sensor data from the CDU, including pressure, humidity, and pump and valve status.

This data is presented in real time, enabling users to monitor the operating efficiency of their liquid-cooled racks. Users can also employ SuperCloud Composer to set up alerts, manage firmware updates, and more.
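Supermicro doesn’t document the LCCM interface here, so treat the following as a sketch of the general monitoring pattern only: poll telemetry, compare readings against thresholds, raise an alert. The endpoint URL, JSON fields and limits are all invented for illustration.

```python
import json
import time
from urllib.request import urlopen

# Illustrative polling loop. The endpoint, field names and thresholds are
# hypothetical; SuperCloud Composer's real API may differ.

CDU_TELEMETRY_URL = "http://cdu.example.local/api/telemetry"  # hypothetical
MAX_PRESSURE_KPA = 300.0                                      # assumed limit


def poll_once() -> None:
    with urlopen(CDU_TELEMETRY_URL) as resp:
        data = json.load(resp)
    if data.get("pump_status") != "ok":
        print("ALERT: pump fault reported")
    if data.get("pressure_kpa", 0.0) > MAX_PRESSURE_KPA:
        print(f"ALERT: loop pressure {data['pressure_kpa']} kPa over limit")


if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(30)  # poll every 30 seconds
```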


Meet Supermicro’s Petascale Storage, a compact rackmount system powered by the latest AMD EPYC processors


Supermicro’s H13 Petascale Storage System is a compact 1U rackmount system powered by the AMD EPYC 97X4 processor (formerly codenamed Bergamo) with up to 128 cores.


Your customers can now implement Supermicro Petascale Storage, an all-Flash NVMe storage system powered by the latest 4th gen AMD EPYC 9004 series processors.

The Supermicro system has been specifically designed for AI, HPC, private and hybrid cloud, in-memory computing and software-defined storage.

Now Supermicro is offering the first of these systems. It's the Supermicro H13 Petascale Storage System. This compact 1U rackmount system is powered by an AMD EPYC 97X4 processor (formerly codenamed Bergamo) with up to 128 cores.

For organizations with data-storage requirements approaching petascale capacity, the Supermicro system was designed with a new chassis and motherboard that support a single AMD EPYC processor, 24 DIMM slots for up to 6TB of main memory, and 16 hot-swap E3.S slots. E3.S is part of the Enterprise and Datacenter Standard Form Factor (EDSFF) E3 family of SSD form factors designed for specific use cases. E3.S drives are short and thin (7.5mm wide), draw up to 25W, and use a PCIe 5.0 interface.

The Supermicro Petascale Storage system can deliver more than 200 GB/sec. bandwidth and over 25 million input-output operations per second (IOPS) from a half-petabyte of storage.
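Spread evenly across a 16-drive 1U configuration, those headline numbers imply roughly the following per-drive load (a rough sanity check, not vendor data):

```python
# Per-drive sanity check, assuming the system-level figures are spread
# evenly across a 16-drive 1U configuration.

TOTAL_GBPS = 200          # GB/s, system-level claim
TOTAL_IOPS = 25_000_000   # system-level claim
DRIVES = 16

print(f"per-drive bandwidth: {TOTAL_GBPS / DRIVES:.1f} GB/s")  # ~12.5 GB/s
print(f"per-drive IOPS:      {TOTAL_IOPS / DRIVES:,.0f}")      # ~1.6 million
```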

Here's why 

Why might your customers need such a storage system? Several reasons, depending on what sorts of workloads they run:

  • Training AI/ML applications requires massive amounts of data for creating reliable models.
  • HPC projects use and generate immense amounts of data, too. That’s needed for real-world simulations, such as predicting the weather or simulating a car crash.
  • Big-data environments need substantial datasets. These gain intelligence from real-world observations ranging from sensor inputs to business transactions.
  • Enterprise applications need to keep large amounts of data close to compute, accessible at NVMe-over-Fabrics (NVMe-oF) speeds.

Also, the Supermicro H13 Petascale Storage System offers significant performance, capacity, throughput and endurance, all while maintaining excellent power efficiency.


Interview: How NEC Germany keeps up with the changing HPC market


In an interview, Oliver Tennert, director of HPC marketing and post-sales at NEC Germany, explains how the company keeps pace with a fast-developing market.


The market for high performance computing (HPC) is changing, meaning system integrators that serve HPC customers need to change too.

To learn more, PIC managing editor Peter Krass spoke recently with Oliver Tennert, NEC Germany’s director of HPC marketing and post-sales. NEC Germany works with hardware vendors including AMD and Supermicro. This interview has been lightly edited for clarity.

First, please tell me about NEC Germany and its relationship with parent company NEC Corp.?

I work for NEC Germany, which is a subsidiary of NEC Europe. Our parent company, NEC Corp., is a Japanese company with a focus on telecommunications, which is still a major part of our business. Today NEC has about 100,000 employees around the world.

HPC as a business within NEC is done primarily by NEC Germany and our counterparts at NEC Corp. in Japan. The Japanese operation covers HPC in Asia, and we cover EMEA, mainly Europe.

What kinds of HPC workloads and applications do your customers run?

It’s probably 60:40 — that is, about 60% of our customers are in academia, including universities, research facilities, and even DWD, Germany’s weather-forecasting service. The remaining 40% are industrial, including automotive and engineering companies. 

The typical HPC use cases of our customers come in two categories. The most important HPC category of course is simulation. That can mean simulating physical processes. For example, what does a car crash look like under certain parameters? These simulations are done in great detail.

Our other important HPC category is data analytics. For example, that could mean genomic analysis.

How do you work with AMD and Supermicro?

To understand this, you first have to understand how NEC’s HPC business works. For us, there are two aspects to the business.

One, we’ve got our own vector technology. Our NEC vector engine is a PCIe card designed and produced in Japan. The latest incarnation of our vector supercomputer is the NEC SX-Aurora TSUBASA. It was designed to run applications that are both vectorizable and benefit from high bandwidth to main memory. One of our big customers in this area is the German weather service, DWD.

The other part of the business is what we call “pizza boxes,” the x86 architecture. For this, we need industry-standard servers, including processors from AMD and servers from Supermicro.

For that second part of the business, what is NEC’s role?

The answer has to do with how the HPC business works operationally. If a customer intends to purchase a new HPC cluster, typically they need expert advice on designing an optimized HPC environment. What they do know is the application they run. And what they want to know is, ‘How do we get the best, most optimized system for this application?’

This implies doing a lot of configuration. Essentially, we optimize the design based on many different components. Even if we know that an AMD processor is the best for a particular task, still, there are dozens of combinations of processor SKUs and server model types which offer different price/performance ratios. The same applies to certain data-storage solutions. For HPC, storage is more than just picking an SSD. What’s needed is a completely different kind of technology.

Configuring and setting up such a complex solution takes a lot of expertise. We’re being asked to run benchmarks. That means the customer says, ‘Here’s my application, please run it on some specific configurations, and tell me which one offers the best price/performance ratio.’ This takes a lot of time and resources. For example, you need the systems on hand to just try it out. And the complete tender process—from pre-sales discussions to actual ordering and delivery—can take anywhere from weeks to months.
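As a toy illustration of the price/performance ranking Tennert describes, consider the sketch below. The configurations, runtimes and prices are invented; in practice the runtimes come from benchmarking the customer’s actual application.

```python
# Toy price/performance ranking. All numbers are invented; real values come
# from running the customer's application on candidate configurations.

configs = [
    {"name": "32-core + fast SSD", "runtime_s": 410.0, "price_eur": 12_000},
    {"name": "64-core + fast SSD", "runtime_s": 250.0, "price_eur": 19_000},
    {"name": "64-core + std SSD",  "runtime_s": 280.0, "price_eur": 16_500},
]

# Performance is taken as 1/runtime; divide by price to get perf per euro.
for c in configs:
    c["perf_per_eur"] = (1.0 / c["runtime_s"]) / c["price_eur"]

for c in sorted(configs, key=lambda c: c["perf_per_eur"], reverse=True):
    print(f'{c["name"]}: {c["perf_per_eur"]:.2e} (higher is better)')
```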

And this is just to bid, right? After all this work, you still might not get the order?

Yes, that can happen. There are lots of factors that influence your chances. In general, if you have a good working relationship with a private customer, it’s easier. They have more discretion than academic or public customers. For public bids, everything must be more transparent, because it’s more strictly regulated. Normally, that means you have more work, because you have to test more setups. Your competition will be doing the same.

When working with the second group, the private industry customers, do customers specify parts from specific vendors, such as AMD and Supermicro?

It depends on the factors that will influence the customer’s final selection. Price and performance, that’s one thing. Power consumption is another. Then, sometimes, it’s the vendors. Also, certain projects are more attractive to certain vendors because of market visibility—so-called lighthouse projects. That can have an influence on the conditions we get from vendors. Vendors also honor the amount of effort we have put in to getting the customer in the first place. So there are all sorts of external factors that can influence the final system design.

Also, today, the majority of HPC solutions are similar from an architectural point of view. So the difference between competing vendors is to take all the standard components and optimize from these, instead of providing a competing architecture. As a result, the soft skills—such as the ability to implement HPC solutions in an efficient and professional way—also have a large influence on the final order.

How about power consumption and cooling? Are these important considerations for your HPC customers?

It’s become absolutely vital. As a rule of thumb, we can say that the larger an HPC project is going to be, the more likely that it is going to be cooled by liquid.

In the past, you had a server room that you cooled with air conditioning. But those times are nearly gone. Today, when you think of a larger HPC installation—say, 1,000 or 2,000 nodes—you’re talking about a megawatt of power being consumed, or even more. And that also needs to be cooled.

The challenge in cooling a large environment is to get the heat away from the server and out of the room to somewhere else, whether outside or to a larger cooling system. This cannot be done by traditional cooling with air. Air is too inefficient for transporting heat. Water is much better. It’s a more efficient means for moving heat from Point A to Point B.

How are you cooling HPC systems with liquid?

There are a few ways to do this. There’s cold-water cooling, mainly indirect. You bring in water with what’s known as an “inlet temperature” of about 10 C and it cools down the air inside the server racks, with the heat getting carried away with the water now at about 15 or 20 C. The issue is, first you need energy just to cool the water down to 10 C. Also, there’s not much you can do with water at 15 or 20 C. It’s too warm for cooling anything else, but too cool for heating a room.

That’s why the new approach is to use hot-water cooling, mainly direct. It sounds like a paradox. But what might seem hot to a human being is in fact pretty cool for a CPU. For a CPU, an ambient temperature of 50 or 60 C is fine; it would be absolutely not fine for a human being. So if you have an inlet temperature for water of, say, 40 or 45 C, that will cool the CPU, which runs at an internal temperature of 80 or 90 C. The outbound temperature of the water is then maybe 50 C. Then it becomes interesting. At that temperature, you can heat a building. You can reuse the heat, rather than just throwing it away. So this kind of infrastructure is becoming more important and more interesting.
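For readers who want the numbers behind “water is much better”: the heat a coolant carries is Q = ṁ · c_p · ΔT (mass flow times specific heat times temperature rise). A rough worked example for the megawatt-class installation described above, with assumed ΔT:

```python
# Rough heat-transport numbers: Q = mdot * c_p * dT. Values are assumed
# for illustration, not from NEC or any specific installation.

Q_WATTS = 1_000_000   # ~1 MW cluster, as mentioned above
CP_WATER = 4186.0     # specific heat of water, J/(kg*K)
CP_AIR = 1005.0       # specific heat of air, J/(kg*K)
DT = 10.0             # assumed coolant temperature rise, K

water_kg_s = Q_WATTS / (CP_WATER * DT)  # ~24 kg/s of water
air_kg_s = Q_WATTS / (CP_AIR * DT)      # ~100 kg/s of air

print(f"water flow needed: {water_kg_s:.0f} kg/s (~{water_kg_s * 3.6:.0f} m^3/h)")
print(f"air flow needed:   {air_kg_s:.0f} kg/s (~{air_kg_s / 1.2:.0f} m^3/s of air)")
```

Water’s far higher heat capacity per unit volume is why a megawatt-class room can’t realistically be cooled by air alone.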

Looking ahead, what are some of your top projects for the future?

Public customers such as research universities have to replace their HPC systems every three to five years. That’s the normal cycle. In that time the hardware becomes obsolete, especially as the vendors optimize their power consumption to performance ratio more and more. So it’s a steady flow of new projects. For our industrial customers, the same applies, though the procurement cycle may vary.

We’re also starting to see the use of computational HPC capacity from the cloud. Normally, when people think of cloud, they think of public clouds from Amazon, Microsoft, etc. But for HPC, there are interim approaches as well. A decade ago, there was the idea of a dedicated public cloud. Essentially, this meant a dedicated capacity that was for the customer’s exclusive use, but was owned by someone other than the customer. Now, between the dedicated cloud and public cloud, there are all these shades of grey. In the past two years, we’ve implemented several larger installations of this “grey-shaded” cloud approach. So more and more, we’re entering the service-oriented market.

There is a larger trend away from customers wanting to own a system, and toward customers just wanting to utilize capacity. For vendors with expertise in HPC, they have to change as well. Which means a change in the business and the way they have to work with customers. It boils down to, Who owns the hardware? And what does the customer buy, hardware or just services? That doesn’t make you a public-cloud provider. It just means you take over responsibility for this particular customer environment. You have a different business model, contract type, and set of responsibilities.


Supermicro H13 JumpStart remote access program adds latest AMD EPYC processors


Get remote access to the next generation of AMD-powered servers from Supermicro.


Supermicro’s H13 JumpStart Remote Access program—which lets you use Supermicro servers before you buy—now includes the latest Supermicro H13 systems powered by 4th gen AMD EPYC 9004 processors.

These include servers using the two new AMD EPYC processor series introduced in June. One, previously codenamed Bergamo, is optimized for cloud-native workloads. The other, previously codenamed Genoa-X, is equipped with AMD 3D V-Cache technology and is optimized for technical computing.

Supermicro’s free H13 JumpStart program lets you and your customers validate, test and benchmark workloads remotely on Supermicro H13 systems powered by these new AMD processors.

The latest Supermicro H13 systems deliver performance and density with some cool technologies. These include AMD EPYC processors with up to 128 “Zen 4c” cores per socket, DDR5 memory, PCIe 5.0, and CXL 1.1 peripherals support.

Those AMD “Zen 4c” cores are designed for the sweet spot of both density and power efficiency. Compared with standard “Zen 4” cores, the new design offers substantially improved performance per watt.

Get started

Getting started with Supermicro’s H13 JumpStart program is simple. Just sign up with your name, email and a brief description of what you plan to do with the system.

Next, Supermicro will verify your information and your request. Assuming you qualify, you’ll receive a welcome email from Supermicro, and you’ll be scheduled to gain access to the JumpStart server.

Next, you’ll be given a unique username, password and URL to access your JumpStart account. Then you can run your test, try new features, and benchmark your application.

Once you’re done, Supermicro will ask you to complete a quick survey for your feedback on the program. That’s it.

The H13 JumpStart program now offers 3 server configurations. These include Supermicro’s dual-processor 2U Hyper (AS -2025HS-TNR); single-processor 2U Cloud DC (AS -2015CS-TNR); and single-processor 2U Hyper-U (AS -2115HS-TNR).


Interview: How German system integrator SVA serves high performance computing with AMD and Supermicro


In an interview, Bernhard Homoelle, head of the HPC competence center at German system integrator SVA, explains how his company serves customers with help from AMD and Supermicro. 


SVA System Vertrieb Alexander GmbH, better known as SVA, is among Germany’s leading IT system integrators. Headquartered in Wiesbaden, the company employs more than 2,700 people in 27 branch offices. SVA’s customers include organizations in automotive, financial services and healthcare.

To learn more about how SVA works jointly with Supermicro and AMD on advanced technologies, PIC managing editor Peter Krass spoke recently with Bernhard Homoelle, head of SVA’s high performance computing (HPC) competence center. Their interview has been lightly edited.

For readers outside of Germany, please tell us about SVA?

First of all, SVA is an owner-operated system integrator. We offer high-quality products, we sell infrastructure, we support certain types of implementations, and we offer operational support to help our customers achieve optimum solutions.

We work with partners to figure out what might be the best solution for our customers, rather than just picking one vendor and trying to convince the customer they should use them. Instead, we figure out what is really needed. Then we go in the direction where the customer can really have their requirements met. The result is a good relationship with the customer, even after a particular deal has been closed.

Does SVA focus on specific industries?

While we do support almost all the big industries—automotive, transportation, public sector, healthcare and more—we are not restricted to any specific vertical. Our main business is helping customers solve their daily IT problems, deal with the complexity of new IT systems, and implement new things like AI and even quantum computing. So we’re open to new solutions. We also offer training with some of our partners.

Germany has a robust auto industry. How do you work with these clients?

In general, they need huge HPC clusters and machine learning. For example, autonomous driving demands not only more computing power, but also more storage. We’re talking about petabytes of data, rather than terabytes. And this huge amount of data needs to be stored somewhere and finally processed. That puts pressure on the infrastructure—not just on storage, but also on the network infrastructure as well as on the compute side. As their way into the cloud, some of these customers are saying, “Okay, offer me HPC as a Service.”

How do you work with AMD and Supermicro?

It’s a really good relationship. We like working with them because Supermicro has all these various types of servers for individual needs. Customers are different, and therefore they have their own requirements. Figuring out what might be the best server for them is difficult if you have limited types of servers available. But with Supermicro, you can get what you have in mind. You don’t have to look for special implementations because they have these already at hand.

We’re also partnering with AMD, and we have access to their benchmark labs, so we can get very helpful information. We start with discussions with the customer to figure out their needs. Typically, we pick up an application from the customer and then use it as a kind of benchmark. Next, we put it on a cluster with different memory, different CPUs, and look for the best solution in terms of performance for their particular application. Based on the findings, we can recommend a specific CPU, number of cores, memory type and size, and more.

With HPC applications, core memory bandwidth is almost as important as the number of cores. AMD’s new Genoa-X processors should help to overcome some of these limitations. And looking ahead, I’m keen to see what AMD will offer with the Instinct MI300.

Are there special customer challenges you’re solving with Supermicro and AMD solutions?

With HPC workloads, our academic customers say, “This is the amount of money available, so how many servers can you really give us for this budget?” Supermicro and AMD really help here with reasonable prices. They’re a good choice for price/performance.

With AI and machine learning, the real issue is software tools. It really depends what kinds of models you can use and how easy it is to use the hardware with those models.

This discussion is not easy, because for many of our customers today, AI means Nvidia. But I really recommend alternatives, and AMD is bringing some alternatives that are great. They offer a fast time to solution, but they also need to be easy to switch to.

How about "green" computing? Is this an important issue for your customers now?

Yes, more and more we’re seeing customers ask for this green computing approach. Typically, a customer has a thermal budget and a power-price budget. They may say, “In five years, the expenses paid for power should not exceed a certain limit.”

In Europe, we also have a supply-chain discussion. Vendors must increasingly provide proof that they’re taking care in their supply chain with issues including child labor and working conditions. This is almost mandatory, especially in government calls. If you’re unable to answer these questions, you’re out of the bid.

With green computing, we see that the power needed for CPUs and GPUs is going up and up. Five years ago, the maximum a CPU could burn was 200W, but now even 400W might not be enough. Some GPUs are as high as 700W, and there are super-chips beyond even that.

All this makes it difficult to use air-cooled systems. Customers can use air conditioning to a certain extent, but there’s only so much air you can press through the rack. Then you need either on-chip water cooling or some kind of immersion cooling. This can help in two dimensions: saving energy and getting density — you can put the components closer together, and you don’t need the big heat sink anymore.

One issue now is that each vendor offers a different cooling infrastructure. Some of our customers run multi-vendor data centers, so this could create a compatibility issue. That’s one reason we’re looking into immersion cooling. We think we could do some of our first customer implementations in 2024.

Looking ahead, what do you see as a big challenge?

One area is that we want to help customers get easier access to their HPC clusters. That’s done on the software side.

In contrast to classic HPC users, machine learning and AI engineers are not that interested in Linux stuff, compiler options or any other infrastructure details. Instead, they’d like to work on their frameworks. The challenge is getting them to their work as easily as possible—so that they can just log in, and they’re in their development environment. That way, they won’t have to care about what sort of operating system is underneath or what kind of scheduler, etc., is running.
