Nvidia deeply unhappy with TSMC, claims 20nm essentially worthless
By Joel Hruska on March 23, 2012
One of the unspoken rules of customer-foundry relations is that you virtually never see the former speak poorly of the latter. Only when things have seriously hit the fan do partners like AMD or Nvidia admit to manufacturing problems, and typically only after postponed launches and poor availability have made protestations that everything is fine unsustainable.
That’s why we were surprised — and our source testified to being stunned — that Nvidia gave the following presentation at the International Trade Partner Conference (ITPC) forum last November. Many of the company’s complaints regarding its current partnership with TSMC are exactly what you’d expect given the manufacturing problems the entire industry is facing. What’s surprising are Nvidia’s remarks concerning TSMC’s current cost curves and manufacturing ramps. This is normally the sort of information discussed quietly between a foundry and its customers or by the press with help from various anonymous sources. Discussing the problems publicly is a sign of just how frustrated the company has become.
Watch the underlines for emphasis
TSMC builds hardware for a huge number of companies, but those customers have very different needs and use a wide range of process technologies. Historically, Nvidia (and ATI/AMD) have been regular early adopters. The nature of graphics is that it can easily soak up new processes and the higher transistor counts they enable.
The flip side of that situation is that companies like AMD and Nvidia have also been responsible for assuming the risks associated with “risk production” and footing a hefty bill for the privilege. As those risks mount and costs skyrocket, Nvidia is increasingly unhappy with being asked to shoulder the burden. Nvidia’s slides talk about the need for “real” understanding, compromises on “rough justice,” and a closer relationship that looks more like that of an IDM (Integrated Device Manufacturer). For those of you who don’t know the term, Intel is an IDM — it handles both manufacturing and design. AMD used to be.
According to Nvidia, the current model is unsustainable. Here’s the company’s projected analysis for transistor costs at current and new nodes.
As process nodes shrink, it takes longer and longer for the cost per transistor to fall below that of the previous generation. At 20nm, the gains all but vanish. Want to know why Nvidia rearchitected Fermi with a new emphasis on efficiency and performance/watt? You’re looking at the reason. If per-transistor costs remain constant, the only way to improve your cost structure is to make better use of the transistors you’ve got.
As for wafer costs, they’ve become part of the problem.
What this slide states (we can’t even call it a suggestion) is that smaller processes no longer pay for themselves by fitting a greater number of chips on each wafer. Instead, the complexities and difficulties of manufacturing at the new node create a cost structure that provides precious little incentive to move to it.
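To make the arithmetic behind that argument concrete, here is a minimal sketch of how wafer cost, density, and yield combine into a relative cost per working transistor. The function name and every number in it are hypothetical, chosen only for illustration; none of them come from Nvidia’s slides or from TSMC.

```python
def relative_transistor_cost(wafer_cost, density, yield_fraction):
    """Relative cost per working transistor on a wafer of fixed area.

    wafer_cost     -- wafer price, normalized to the old node (hypothetical)
    density        -- transistors per unit area, normalized to the old node
    yield_fraction -- share of the wafer's transistors that land in good dies
    """
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return wafer_cost / (density * yield_fraction)


# Illustrative, made-up inputs: the new node doubles density, but the
# wafer costs 80% more and yields are somewhat lower early in the ramp.
old_node = relative_transistor_cost(wafer_cost=1.0, density=1.0, yield_fraction=0.8)
new_node = relative_transistor_cost(wafer_cost=1.8, density=2.0, yield_fraction=0.7)

print(f"old node: {old_node:.3f}")
print(f"new node: {new_node:.3f}")
print("new node is cheaper per transistor:", new_node < old_node)
```

With these assumed inputs, the doubled density is entirely eaten by the higher wafer price and lower yield, which is the shape of the problem the slide describes. Different inputs would give a different answer; the point is only that density gains alone no longer guarantee a cheaper transistor.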
If openly criticizing a foundry partner is unusual, showing data that suggests that your foundry partner can’t provide a cost-effective strategy for building hardware at next-generation process nodes is… a few steps past that point. The recent launch of the GTX 680, with that card’s trifecta of price, performance, and power efficiency, actually strengthens the impact of this data. NV would’ve had a good idea how the GK104 was shaping up when it spoke at ITPC in November; this isn’t a case where a company is angry about the performance of a particular part and looking for someone to blame.
The GK104 is great, but it doesn’t change the nature or severity of the underlying problems. As for whether Nvidia’s unhappiness with TSMC heralds a potential alliance with GlobalFoundries, we’re dubious. Not only has GF only recently ironed out its own 28nm issues, but the nature of the foundry business doesn’t allow for quick shifts. Indeed, part of the reason that manufacturers like TSMC have historically exercised such control over their partners’ press releases is that once you’ve committed to a foundry, you’re locked in for a substantial period of time. The fact that there are now two foundries available with cutting-edge technology doesn’t change that, and the Common Platform Alliance favored by IBM, Samsung, and GloFo only mitigates some of the problems with moving a design from foundry to foundry; it doesn’t remove them.
The real question, at least for TSMC’s other customers, is whether the graphs and charts Nvidia has shown are specific to the company’s own products or reflect universal trends. There’s good reason to suspect the latter; Nvidia may have had more trouble than some of TSMC’s other customers, but our analysis of semiconductor industry roadmaps revealed a great deal of uncertainty about the road forward. Nvidia opted to aggressively optimize GK104 precisely because the old strategy of bolting on more cores and ratcheting up transistor counts isn’t sustainable.
Further evidence for the accuracy of NV’s presentation comes, ironically, from the company’s primary GPU competitor. At AMD’s Financial Analyst Day, CEO Rory Read made a point of saying that the company no longer intends to aggressively transition to new process nodes given the diminishing marginal returns from doing so.
Change the color scheme, and Nvidia’s graphs could’ve dropped right into AMD’s presentations in early February.
Nvidia’s willingness to stand up and talk about these problems is an “Emperor’s new clothes” sort of moment. The long-term repercussions, if any, are still unclear.