TSMC is chasing a trillion-transistor AI bonanza

TSMC has announced that its semiconductor manufacturing operations are rapidly recovering from the disruption caused by the earthquake that hit Taiwan on April 3 and that its revenue target for 2024 remains unchanged. The company’s factories were built with a high degree of earthquake resistance.

Management is still conducting a comprehensive review of the damage, but as things stand, we should step back from the headlines and make sure that the 10-year technology development scenario recently laid out by Chairman Mark Liu and Chief Scientist Philip Wong does not get lost in the shuffle.

On March 28, IEEE Spectrum, the magazine of the Institute of Electrical and Electronics Engineers, published an essay, “How We’ll Reach a 1 Trillion Transistor GPU,” which explains how “advances in semiconductors are feeding the AI boom.”

First, note that Nvidia’s new Blackwell architecture AI processor combines two reticle-limited 104-billion-transistor graphics processing units (GPUs) with a 10-terabytes-per-second interconnect and other circuitry in a single system-on-chip (SoC). 

Reticle-limited means limited by the maximum size of the photomask used in the lithography process, which transfers the design to the silicon wafer. TSMC is therefore aiming for a roughly tenfold increase in the number of transistors per GPU in the coming decade.
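As a back-of-the-envelope check of that tenfold figure, the arithmetic can be sketched from the numbers quoted above (Blackwell's 104-billion-transistor reticle-limited dies versus a one-trillion-transistor target); the calculation below is illustrative, not TSMC data:

```python
# Rough scale-up implied by the trillion-transistor goal, using figures
# quoted in the article. Illustrative arithmetic only.

BLACKWELL_TRANSISTORS_PER_DIE = 104e9  # one reticle-limited Blackwell GPU die
TARGET_TRANSISTORS = 1e12              # the essay's one-trillion-transistor GPU

# How many of today's reticle-limited dies a trillion-transistor GPU
# corresponds to, all else being equal.
scale_up = TARGET_TRANSISTORS / BLACKWELL_TRANSISTORS_PER_DIE
print(f"Scale-up vs. one Blackwell die: {scale_up:.1f}x")  # ~9.6x
```

That ratio, a bit under ten, is where the "roughly tenfold" framing comes from.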

The essay starts off with a review of the progress of semiconductor manufacturing and artificial intelligence so far:

  • The IBM Deep Blue supercomputer that defeated world chess champion Garry Kasparov in 1997 used 0.6- and 0.35-micron node technology.
  • The AlexNet neural network that won the ImageNet Large Scale Visual Recognition Challenge in 2012, launching the era of machine learning, used 40 nanometer (nm) technology.
  • The AlphaGo software program that defeated European Go Champion Fan Hui in 2015 was implemented using 5-nm technology, as was the initial version of ChatGPT.
  • Blackwell GPUs are made using a refined version of the 4-nm process used by TSMC to fabricate its predecessor, the Nvidia Hopper GPU.

With the computation and memory capacity required for AI training increasing by orders of magnitude, Liu and Wong note that “If the AI revolution is to continue at its current pace, it’s going to need even more from the semiconductor industry.” 

This will require not only moving to the 2-nm process node, which is scheduled for 2025, and then the 1.4-nm (or 14A, A for angstrom) node in 2027 or 2028, but also advancing from 2D scaling to 3D system integration:

“We are now putting together many chips into a tightly integrated, massively interconnected system. This is a paradigm shift in semiconductor-technology integration,” say the two executives. They explain this as follows:

In the era of AI, the capability of a system is directly proportional to the number of transistors integrated into that system. One of the main limitations is that lithographic chipmaking tools have been designed to make ICs of no more than about 800 square millimeters, what’s called the reticle limit. But we can now extend the size of the integrated system beyond lithography’s reticle limit.

By attaching several chips onto a larger interposer—a piece of silicon into which interconnects are built—we can integrate a system that contains a much larger number of devices than what is possible on a single chip. For example, TSMC’s chip-on-wafer-on-substrate (CoWoS) technology can accommodate up to six reticle fields’ worth of compute chips, along with a dozen high-bandwidth-memory (HBM) chips.
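The silicon-area arithmetic behind that claim is straightforward. Using the ~800-square-millimeter reticle limit the essay cites, a sketch of what six reticle fields of compute silicon amounts to (the figures are the essay's; the calculation is illustrative):

```python
# Silicon-area arithmetic for the CoWoS figures quoted above.
# The ~800 mm^2 reticle limit and the six-field capacity come from
# the Liu/Wong essay; this is illustrative arithmetic, not TSMC data.

RETICLE_LIMIT_MM2 = 800   # maximum single-chip area under the reticle limit
COMPUTE_FIELDS = 6        # reticle fields of compute chips on one CoWoS package
HBM_CHIPS = 12            # high-bandwidth-memory stacks alongside them

compute_area_mm2 = COMPUTE_FIELDS * RETICLE_LIMIT_MM2
print(f"Compute silicon on one CoWoS package: {compute_area_mm2} mm^2")
# 4800 mm^2 of compute silicon -- six times what lithography allows
# on any single chip, before counting the dozen HBM stacks.
```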

TSMC has already used CoWoS in its transition from 7-nm to 4-nm technology, putting 50% more transistors in the same area for Nvidia and other customers. It also uses a technology called system-on-integrated-chips (SoIC) to make the HBM used with GPUs. 

A high-bandwidth memory chip consists of a stack of vertically interconnected dynamic random-access memory chips atop a control logic integrated circuit. According to TSMC, 12-layer HBM test structures have been created using 3D SoIC technology.

Next, we are told, optical interfaces based on silicon photonics “will allow the scaling up of energy- and area-efficient bandwidths for direct, optical GPU-to-GPU communication, such that hundreds of servers can behave as a single giant GPU with a unified memory.” 

These innovations, plus advances in electronic design automation (EDA), materials science and fab equipment, should keep the energy-efficient performance (EEP) of semiconductor systems rising at its historical rate of about three times every two years. EEP expresses a combination of energy efficiency and processing speed.
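What "three times every two years" compounds to over the essay's ten-year horizon can be worked out directly from the growth rate quoted above; the calculation is simple compounding, not a figure from the essay itself:

```python
# Compounding the EEP growth rate quoted in the article over a decade.
# Pure arithmetic on the stated rate; the result is implied, not quoted.

GROWTH_FACTOR = 3    # EEP multiplier per period
PERIOD_YEARS = 2     # length of one period
HORIZON_YEARS = 10   # the essay's roughly ten-year outlook

eep_gain = GROWTH_FACTOR ** (HORIZON_YEARS / PERIOD_YEARS)
print(f"Implied EEP gain over {HORIZON_YEARS} years: {eep_gain:.0f}x")  # 243x
```

In other words, if the historical rate holds, a decade buys roughly a 243-fold improvement in energy-efficient performance.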

If this sounds complicated, that’s because it is. Liu and Wong themselves say, “From here, semiconductor technology will get harder to develop.” But help is on the way in the form of 3Dblox, an open-standard 3D IC design system sponsored by TSMC, Intel, EDA companies Cadence, Siemens and Synopsys, and engineering software company Ansys. They call this “A Mead-Conway Moment for 3D Integrated Circuits.”

In 1978, Professor Carver Mead of the California Institute of Technology and Lynn Conway of the Xerox PARC research and development company created a computer-aided design system that enabled engineers to design very large-scale integrated circuits without much knowledge of the semiconductor process technology required to make them. 3Dblox does the same for 3D ICs and packaging, say Liu and Wong, giving designers “a free hand to work on a 3D IC system design, regardless of the underlying technology.” 

“In the era of artificial intelligence,” Liu and Wong predict, “an integrated AI system can be composed of as many energy-efficient transistors as is practical, an efficient system architecture for specialized compute workloads and an optimized relationship between software and hardware.” That sounds like AI-enabled design of AI processors, most of them made by TSMC. 

Meanwhile, Taiwanese media report that most of TSMC’s manufacturing capacity is back online. Buildings, some pieces of equipment and wafers in process were damaged, but the most important parts of the production lines, including the advanced (and very expensive) EUV lithography systems, were not.

For the past 25 years, TSMC has been implementing what are called seismic management measures to protect its operations from earthquakes. As an indicator of their success, Taiwan’s DigiTimes reports that TSMC’s estimated loss from the April 3 earthquake, after insurance payments, is likely to be about NT$2 billion, or only US$62.2 million at the current exchange rate.

Follow this writer on X: @ScottFo83517667
