Monday, July 21, 2014

Bottlenecks: DRAM & Moore's Law

In addition to Moore's Law slowing, there are bottlenecks between the memory device and the CPU.




The article below asks:
"How long will it take to find a technology so fundamentally different from and better than anything we have today that we can do away with the DRAM latency and power consumption bottleneck?"

It argues there is a need for:
"A new RAM technology that cut main memory accesses by an order of magnitude would be reason enough to reevaluate the entire balance of resources on a microprocessor. If accessing main memory was as fast as accessing the CPU's cache, you might not need cache on the CPU die or package at all — or at least, you wouldn't need anything beyond L1 and maybe a small L2."

More about the Moore's Law bottleneck in the March 2012 post: Moore's Law End? (Next semiconductors gen. cost $10 billion)

Ron
Insightful, timely, and accurate semiconductor consulting.
Semiconductor information and news at - 
http://www.maltiel-consulting.com/




DRAM is pretty amazing stuff. The basic structure of the RAM we still use today was invented more than forty years ago and, just like its CPU cousin, it has continually benefited from the huge improvements that have been made in fabrication technology and density improvements. Less than ten years ago, 2GB of RAM was considered plenty for a typical desktop system — today, a high-end smartphone offers the same amount of memory but at a fifth of the power consumption.
After decades of scaling, however, modern DRAM is starting to hit a brick wall. Much in the same way that the CPU gigahertz race ran out of steam, the high latency and power consumption of DRAM is one of the most significant bottlenecks in modern computing. As supercomputers move towards exascale, there are serious doubts about whether DRAM is actually up to the task, or whether a whole new memory technology is required. Clearly there are some profound challenges ahead — and there’s disagreement about how to meet them.

What’s really wrong with DRAM?

A few days ago, Vice ran an article that actually does a pretty good job of talking about potential advances in the memory market, but includes a graph I think is fundamentally misleading. That’s not to sling mud at Vice — do a quick Google search, and you’ll find this picture has plenty of company:
[Image: DRAM scaling]
The point of this image is ostensibly to demonstrate how DRAM performance has grown at a much slower rate than CPU performance, thereby creating an unbridgeable gap between the two systems. The problem is, this graph no longer properly illustrates CPU performance or the relationship between it and memory. Moore's Law has stopped functioning at anything like its historic level for CPUs or DRAM, and "memory performance" is simply too vague to accurately describe the problem.
The first thing to understand is that modern systems have vastly improved the bandwidth-per-core ratio compared to where we sat 14 years ago. In 2000, a fast P3 or Athlon system had a 64-bit memory bus connected to an off-die memory controller clocked at 133MHz. Peak bandwidth was 1.06GB/s while CPU clocks were hitting 1GHz. Today, a modern processor from AMD or Intel is clocked between 3-4GHz, while modern RAM is running at 1066MHz (2133MHz effective for DDR3) — or around 10GB/sec peak. Meanwhile we’ve long since started adding multiple memory channels, brought the memory controller on die, and clocked it at full CPU speed as well.
[Image: DDR memory data rates]
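As a sanity check on those bandwidth figures, peak bandwidth is just bus width times transfer rate times channel count; a minimal sketch with round illustrative numbers, not vendor specs:

```python
# Back-of-envelope peak memory bandwidth: bus width (bytes) x transfers/sec x channels.
# The example figures are illustrative round numbers, not vendor datasheet values.

def peak_bandwidth_gb_s(bus_bits: int, mega_transfers_per_sec: float,
                        channels: int = 1) -> float:
    """Peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 * channels / 1e9

# Year-2000 system: 64-bit bus, SDR-133 off-die controller
print(peak_bandwidth_gb_s(64, 133))               # ~1.06 GB/s
# DDR3-1333, single channel
print(peak_bandwidth_gb_s(64, 1333))              # ~10.7 GB/s
# DDR3-2133, dual channel on a modern desktop
print(peak_bandwidth_gb_s(64, 2133, channels=2))  # ~34 GB/s
```

The same arithmetic shows why adding channels and raising transfer rates kept bandwidth-per-core growing even as core clocks stalled.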
The problem isn’t memory bandwidth — it’s memory latency and memory power consumption. As we’ve previously discussed, DDR4 actually moves the dial backwards as far as the former is concerned, while improving the latter only modestly. It now looks as though the first generation of DDR4 will have some profoundly terrible latency characteristics; Micron is selling DDR4-2133 timed at 15-15-15-50. For comparison, DDR3-2133 can be bought at 11-11-11-27 — and that’s not even highest-end premium RAM. This latency hit means DDR4 won’t actually match DDR3′s performance for quite some time, as shown here:

This is where the original graph does have a point — latency has only improved modestly over the years, and we’ll be using DDR4-3200 before we get back to DDR3-1600 latencies. That’s an obvious issue — but it’s actually not the problem that’s holding exascale back. The problem for exascale is that DRAM power consumption is currently much too high for an exascale system.
The current goal is to build an exascale supercomputer within a 20MW power envelope, sometime between 2018 and 2020. Exascale describes a system with an exaflop of processing power and perhaps hundreds of petabytes of RAM (current systems max out at around 30 petaflops and only a couple of petabytes of RAM). If today's best DDR3 were used for the first exascale systems, the DRAM alone would consume 54MW of power. Clearly, massive improvements are needed. So how do we find them?
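The power math can be sketched from the article's own figures (the 100PB / 52MW DDR3-1333 estimate cited below); the 10x-improvement scenario is purely illustrative:

```python
# Rough scaling of the article's DRAM power estimates: if ~100 PB of
# standard DDR3-1333 draws ~52 MW, the implied power density is ~0.5 W/GB.
# The 20 MW exascale budget then bounds how much power DRAM may consume.
# All inputs are the article's estimates, not measurements.

GB_PER_PB = 1e6  # decimal gigabytes per petabyte

watts_per_gb = 52e6 / (100 * GB_PER_PB)  # ~0.52 W/GB for DDR3-1333
print(round(watts_per_gb, 2))

# Hypothetical: a 100 PB machine after a 10x DRAM efficiency improvement
improved_mw = watts_per_gb / 10 * 100 * GB_PER_PB / 1e6
print(round(improved_mw, 1), "MW")  # comfortably under the 20 MW envelope
```

At today's ~0.5 W/GB, the 20MW budget would be blown by the memory alone, which is why an order-of-magnitude improvement is the target.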

Reinvent the wheel — or iterate like crazy

There are two ways to attack this problem, and they both have their proponents. One method is to keep building on the existing approaches that have given us DDR4 and the Hybrid Memory Cube. It’s reasonably likely that we can squeeze a great deal of additional improvement out of the basic DRAM structure by stacking dies, further optimizing trace layouts, using through-silicon vias (TSVs), and adapting 3D designs. According to a recent research paper, this could cut the RAM power consumption of a 100-petabyte supercomputer from 52MW (assuming standard DDR3-1333) to well below 10MW depending on the precise details of the technology.
While 100PB is just one tenth of the way to exascale, reducing the RAM's power consumption by an order of magnitude is unquestionably on the right track.
[Image: DRAM types]
The other, more profound challenge is the idea of finding a complete DRAM replacement. You may have noticed that while we cover new approaches and alternatives to conventional storage technologies, virtually all the proposed methods address the shortcomings of NAND storage — not DRAM. There's a good reason for that — DRAM has survived more than 40 years precisely because it's been very, very hard to beat.
The argument for reinventing the wheel is anchored in concepts like memristors, MRAM, FeRAM, and a host of other potential next-generation technologies. Some of them have the potential to replace DRAM altogether, while others, like phase change memory, would be used as a further buffer between DRAM and NAND. The big-picture fact that Vice does get right is that discovering a new memory technology that was faster and lower power than DRAM really would change the fundamental nature of computing — over time.
It’s easy to forget that the trends we’re talking about today have literally been true for decades. 11 years ago, computer scientist David Patterson presented a paper entitledLatency Lags Bandwidth, in which he measured the improvements in bandwidth against data accesses across CPUs, DRAM, LAN, and hard drives (SSDs weren’t a thing at that time). What he found is summarized below:
[Image: Latency lags bandwidth]
In every case — and in a remarkably consistent fashion — latency improved by 20-30% in the same time that it took bandwidth to double. This problem is one we've been dealing with for decades — it's been addressed via branch prediction, instruction sets, and ever-expanding caches. It's been observed that we add one layer of cache roughly every 10 years, and we're on track to keep that pace with Intel's 128MB eDRAM cache on certain Haswell processors.
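Compounding Patterson's observation over several technology generations shows how quickly the gap opens; a minimal sketch, where the 25% per-generation latency gain is an assumed midpoint of his 20-30% range:

```python
# Patterson's "Latency Lags Bandwidth": per technology generation, bandwidth
# roughly doubles while latency improves only ~20-30%. Compounding that gap
# over generations illustrates why caches keep growing.

def gap_after(generations: int, latency_gain: float = 0.25) -> float:
    """Ratio of cumulative bandwidth improvement to cumulative latency speedup."""
    bandwidth_x = 2.0 ** generations
    latency_speedup = (1.0 / (1.0 - latency_gain)) ** generations
    return bandwidth_x / latency_speedup

for g in (1, 5, 10):
    print(g, round(gap_after(g), 1))
```

After one generation the gap is only 1.5x; after ten it is more than 50x, which is roughly the story the chart above tells for CPU versus DRAM.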
A new main memory with even half standard DRAM latency would give programmers an opportunity to revisit decades of assumptions about how microprocessors should be built. A new RAM technology that cut main memory accesses by an order of magnitude would be reason enough to reevaluate the entire balance of resources on a microprocessor. If accessing main memory was as fast as accessing the CPU's cache, you might not need cache on the CPU die or package at all — or at least, you wouldn't need anything beyond L1 and maybe a small L2.
How long will it take to find a technology so fundamentally different from and better than anything we have today that we can do away with the DRAM latency and power consumption bottleneck? Given how vital such a fundamental breakthrough would be to our ability to reach exascale computing and beyond, I hope it's soon.

Thursday, July 17, 2014

DRAM as Non-Volatile Memory => Longer Battery Life

If your cell phone has a non-removable battery, its DRAM can be treated as battery-backed nonvolatile DRAM. This improves the battery life and speed of mobile devices: "data committed to flash was reduced by about 40 percent." See more in the article below.

Actually, these benefits should be achievable in any mobile device with an improved operating system.

The prototype includes "several innovations":
·         Quasi-NVRAM. They set aside a portion of system DRAM to act as a battery backed up nonvolatile DRAM.
·         Device driver. A new device driver and library that manage I/O between the qNVRAM and system flash memory.
·         Persistent Page Cache. A new data structure in SQLite using quasi-NVRAM to perform in-place updates to the database files. 
·         Relaxed data flushing. Absorbs repeated writes to table files to further reduce I/O."






Summary: You still hear complaints about nonremovable batteries in mobile devices – mostly Apple – but there is an upside: the ability to eliminate performance overhead. Here's how.
By Robin Harris for Storage Bits | July 14, 2014

Three researchers, Hao Luo, Lei Tian and Hong Jiang of the University of Nebraska, asked a simple and seemingly obvious question: since our mobile devices have non-removable batteries, why don't we treat DRAM as if it were nonvolatile?
Their paper, qNVRAM: quasi Non-Volatile RAM for Low Overhead Persistency Enforcement in Smartphones, was presented at the USENIX HotStorage conference last month.

Background
Typically, Android mobile devices rely on SQLite, a shared-preference key-value store, or the filesystem API to save persistent data on local flash. These employ journaling or file-level double-writes to ensure persistency.
The problem is that these techniques require multiple writes to storage, incurring substantial system overhead in devices that are already performance and power constrained.
For example, they found that more than 75 percent of Twitter data was written for persistency reasons. Looking at a group of common mobile apps they found that anywhere from 37 percent to 78 percent of the data writes were for atomicity. From the paper:


[Image from the paper, courtesy the authors]
Furthermore, it turns out that the Android kernel — where these data structures reside — is quite reliable, based on bug fixes and user support calls. They analyzed Android issue reports and found that only 10 reports, or 0.05 percent of the 19,670 reported issues, related to unexpected or random power-off defects in Android. That implies only a small chance that an unexpected power failure will occur.
The test
The researchers constructed a prototype test system with several innovations:
·         Quasi-NVRAM. They set aside a portion of system DRAM to act as a battery backed up nonvolatile DRAM.
·         Device driver. A new device driver and library that manage I/O between the qNVRAM and system flash memory.
·         Persistent Page Cache. A new data structure in SQLite using quasi-NVRAM to perform in-place updates to the database files. 
·         Relaxed data flushing. Absorbs repeated writes to table files to further reduce I/O.
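To see why an in-place persistent page cache cuts flash traffic, here is an illustrative sketch (not the authors' code; the page size and flush fraction are hypothetical stand-ins for the paper's measured behavior):

```python
# Illustrative sketch of the qNVRAM idea, NOT the authors' implementation.
# With journaling, every committed page is written to flash twice (journal
# copy plus the in-place database write). With a persistent page cache in
# battery-backed DRAM, pages are updated in place and flushed to flash
# lazily, so repeated writes to hot pages are absorbed.

PAGE = 4096  # bytes; a typical SQLite page size (assumption)

def journaled_write_bytes(pages_committed: int) -> int:
    # journal copy + database file write = 2x the committed data
    return pages_committed * PAGE * 2

def qnvram_write_bytes(pages_committed: int, flush_fraction: float = 0.5) -> int:
    # in-place update in qNVRAM; only a fraction of pages ever reach flash
    # (flush_fraction is a hypothetical parameter, not a paper result)
    return int(pages_committed * PAGE * flush_fraction)

print(journaled_write_bytes(100))  # 819200 bytes to flash
print(qnvram_write_bytes(100))     # 204800 bytes to flash
```

The paper's measured outcome (about 40 percent less data committed to flash) comes from exactly this kind of write absorption, combined with eliminating the journal's double write.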
Results
Implemented on an Android smartphone, they found that:
"...qNVRAM speeds up the insert, update and delete transactions by up to 16.33x, 15.86x and 15.76x respectively."

The Storage Bits take 
The amount of data committed to flash was also reduced by about 40 percent. Given how common constant feed updates are on mobile devices, this is a significant result.
Some are miffed that many smartphones don't have easily removable batteries. This research shows the upside of such designs: all DRAM can be treated as NVRAM whether on Android or Apple's iOS.
Note that qNVRAM can't replace flash. DRAM is more power-hungry and costly than flash.
But research shows that by reducing the I/O overhead of the system with qNVRAM, significant gains in performance — and presumably battery life — can be achieved at very little cost. It also simplifies the problem of extending flash endurance.
It was obvious five years ago with the advent of non-removable batteries on phones and notebooks that engineers could take a new look at achieving persistency. Congratulations to the researchers for taking a rigorous approach to the problem.


Wednesday, July 9, 2014

Will NAND DIMMs take off?

Products like ULLtraDIMM (see article below) can take off and become a major NAND SSD product line as SanDisk proves it in the market and Diablo Technologies licenses it to additional SSD suppliers.

More about ULLtraDIMM: "SanDisk's future is far from ULLtraDIMM: Diablo tie-up holds promise" and "Goldmine in the making for flash-DIMM server shop"


This product advances memory system design along lines similar to the discussion in the May 2012 post: Apple NAND Storage in Upcoming MacBook Pros






Will SanDisk Corp's New Product Revolutionize the Storage Industry?

In January, storage provider SanDisk (NASDAQ: SNDK) announced ULLtraDIMM, a new form of flash storage that promises drastic speed increases for high-performance applications. SanDisk already has one big customer, IBM (NYSE: IBM), for its ULLtraDIMMs, giving some early validation to this new technology. Let's take a closer look at ULLtraDIMMs and the effect they might have on the industry.

A bit of background
Storage, such as flash SSDs or hard disk drives, typically sits far away from the CPU and main memory. This means that there is latency due to the time it takes data to travel between the storage device and the CPU. For hard drives, this is not particularly relevant because their own internal latency is much greater than the transport latency. However, for flash, which is completely electronic and therefore much faster than hard disks, this becomes an issue.

The first approach to reducing this latency was to move flash from the drive and onto the PCIe communications bus, a step closer to the CPU. This provided an improvement, but it is not the optimal solution. PCIe still has latency compared to the memory bus, which sits right next to the CPU.

The new technology
Enter SanDisk's ULLtraDIMM, a flash storage that connects directly to the memory bus through the DIMM form factor, just like DRAM memory. Connecting to the memory bus gives ULLtraDIMMs a write latency of under 5 microseconds, compared to around 50 microseconds for PCIe SSDs. The shorter communication distance also means less power consumption, a crucial concern in densely packed enterprise servers.
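Those latency figures translate directly into an upper bound on dependent, serialized operations; a rough sketch using the article's round numbers (the HDD seek time is an added illustrative figure):

```python
# For a serialized (dependent) access pattern, throughput is bounded by
# 1/latency: each operation must finish before the next begins. Latencies
# are the article's round figures; the HDD number is a typical assumption.

def max_serial_iops(latency_us: float) -> int:
    """Upper bound on back-to-back operations per second at a given latency."""
    return int(1e6 / latency_us)

tiers = {
    "HDD (~5 ms seek)":  5000.0,  # microseconds
    "PCIe SSD write":      50.0,
    "ULLtraDIMM write":     5.0,
}

for name, lat in tiers.items():
    print(f"{name}: {max_serial_iops(lat):,} ops/s")
```

The tenfold latency drop from PCIe flash to the memory bus means a tenfold gain for latency-bound workloads such as transactional databases, which is why the applications listed below skew toward trading and analytics.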

SanDisk currently provides 200 GB and 400 GB ULLtraDIMMs, and users can add as many ULLtraDIMMs as they have memory slots. ULLtraDIMMs are implemented in a way that makes them look like a normal storage device to the operating system, but achieving this requires modifications to the BIOS, which means that ULLtraDIMMs are not yet plug-and-play devices.

Current and future applications
IBM has already signed up as a customer for SanDisk's ULLtraDIMMs. The enterprise giant has added some functionality to ULLtraDIMMs and rebranded them as eXFlash memory-channel storage. The new technology is available in IBM's X6 family of servers.

According to SanDisk and IBM, memory-channel storage will be useful for applications such as big data analytics, transactional databases, high-frequency trading, and virtualized environments. So far, it seems that IBM's X6 servers have been particularly popular with Wall Street firms. Speaking with EnterpriseTech, SanDisk senior director of marketing Brian Cox said, "we have all kinds of hedge funds begging us to deliver this."

What this means for the industry
It will take several more computer manufacturers to sign on in order to fully validate ULLtraDIMMs as a technology and to reveal the applications where it might best be used. SanDisk's management said that this will require evangelization and a surrounding ecosystem, and that they are continuing to invest in order to bring ULLtraDIMMs to wider adoption.  
Revenue from ULLtraDIMMs is estimated to be small in 2014. However, if ULLtraDIMMs catch on, competitors such as Micron (NASDAQ: MU) (which is increasingly focusing on providing SSDs rather than selling chips as a commodity) will certainly follow with their own versions of memory bus flash. Currently, no other major flash provider is talking about this kind of technology; this means that SanDisk will have first-mover advantage for at least a few quarters.

It's also not clear yet to what extent memory-channel storage might displace PCIe or server-side flash. If the total cost of ownership for an ULLtraDIMM is comparable to or lower than current SSDs, the technology might eventually become the dominant form of flash storage. However, it is also possible that ULLtraDIMMs will complement rather than replace other forms of storage, much like flash itself did with hard disk drives in the enterprise space.

In conclusion
SanDisk has introduced a new form of flash storage called ULLtraDIMM, which sits right next to the CPU on the memory bus and provides much smaller latencies than existing solutions. In a partnership with IBM, ULLtraDIMMs have been shipping to enterprise customers, with expected applications such as high-frequency trading. While the technology certainly sounds promising and might be valuable for both SanDisk and IBM down the line, it's still too early to tell what effect this will have on the storage industry and enterprise computing in general. 