
Friday, November 22, 2013

Facebook: Flash Storage in Database

Facebook continues to propel NAND flash memory usage in servers and storage through open-source implementations. See below for more details on its flash-based database.
“With the advent of flash storage, we are starting to see newer applications that can access data quickly by managing their own dataset on flash instead of accessing data over a network. These new applications are using what we call an embedded database.
“… When database requests are frequently served from memory or from very fast flash storage, network latency can slow the query response time. Accessing the network within a data center can take about 50 microseconds, as can fast-flash access latency. This means that accessing data over a network could potentially be twice as slow as an application accessing data locally.”
Additional technical details are available in Under the Hood: Building and open-sourcing RocksDB.


More at Facebook Propels SSD Flash Storage


Ron
Insightful, timely, and accurate semiconductor consulting.
Semiconductor information and news at - http://www.maltiel-consulting.com/



 

Facebook’s latest open source effort: a flash-powered database called RocksDB

by Derrick Harris



SUMMARY:

Facebook has open sourced a new embedded database called RocksDB that’s meant to take advantage of all the performance flash has to offer, from right on the application server. It might be a sign of best practices to come.

Facebook is on an open source roll lately, and on Thursday announced its latest open source project — an embedded key-value store called RocksDB. The company uses it to power certain user-facing applications that would suffer too much from having to access an external database over the network, and to eliminate problems related to underutilized I/O performance on flash storage devices.

Facebook database engineer Dhruba Borthakur describes the design of and rationale behind RocksDB in some detail in a blog post, but the biggest factor leading to its creation might be the emergence of relatively inexpensive flash storage cards for servers (or, in Facebook’s case, custom-built servers packed entirely with flash).

“With the advent of flash storage, we are starting to see newer applications that can access data quickly by managing their own dataset on flash instead of accessing data over a network. These new applications are using what we call an embedded database.

“… When database requests are frequently served from memory or from very fast flash storage, network latency can slow the query response time. Accessing the network within a data center can take about 50 microseconds, as can fast-flash access latency. This means that accessing data over a network could potentially be twice as slow as an application accessing data locally.”

RocksDB was designed with these new hardware realities in mind, so it can take full advantage of the IOPS potential of flash memory as well as the computing power of many-core servers, Borthakur explains. Facebook has posted the results of a benchmark test running on a Fusion-io-powered server on the RocksDB GitHub page, and claims it’s significantly faster than Google’s LevelDB embedded key-value store.
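To make the "embedded" idea concrete, here is a minimal sketch of the basic RocksDB C++ API (inherited from LevelDB): the database is a library linked into the application and opened against a local path, so reads and writes never leave the machine. The path below is a hypothetical stand-in, not a production configuration.

#include <cassert>
#include <string>

#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DB* db = nullptr;
  // Hypothetical local path; production would point at a flash-backed volume.
  rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/rocksdb_demo", &db);
  assert(status.ok());

  // Reads and writes are in-process library calls against local storage --
  // no network round trip is involved.
  status = db->Put(rocksdb::WriteOptions(), "user:42:last_seen", "2013-11-22");
  assert(status.ok());

  std::string value;
  status = db->Get(rocksdb::ReadOptions(), "user:42:last_seen", &value);
  assert(status.ok() && value == "2013-11-22");

  delete db;
  return 0;
}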

From a broader IT perspective, RocksDB signals that the shifts in storage and computing economics that made the big data movement possible are now making their way into web application development, albeit on a storage medium most organizations would associate with storing “big data.” Facebook is performance hungry, but it’s also cost-sensitive, and it wouldn’t be storing “close to a petabyte of data across different applications,” as Borthakur writes, if the cost of doing so were out of control.


He offered a handful of application types for which an embedded database like RocksDB is suitable, including:

1. A user-facing application that stores the viewing history and state of a website’s users (a code sketch of this case follows the list).

2. A spam-detection application that needs fast access.

3. A graph-search query that needs to scan a data set in real time.

4. A cache for data from Hadoop, allowing an app to query Hadoop data in real time.

5. A message queue that supports a high number of inserts and deletes.
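As a toy illustration of the first use case above, the sketch below logs per-user viewing history under sorted keys and replays it with a prefix scan. The key layout ("view:<user>:<timestamp>") is an assumed schema chosen for illustration, not anything Facebook has published; the iterator calls are standard RocksDB API.

#include <iostream>
#include <string>

#include "rocksdb/db.h"

// Record one viewing event. "view:<user>:<timestamp>" is a hypothetical
// key layout; ISO-style timestamps sort lexicographically, i.e. in time order.
void LogView(rocksdb::DB* db, const std::string& user,
             const std::string& timestamp, const std::string& item) {
  db->Put(rocksdb::WriteOptions(), "view:" + user + ":" + timestamp, item);
}

// Replay one user's history. RocksDB keeps keys sorted, so a single seek
// plus a prefix check walks exactly that user's events.
void PrintHistory(rocksdb::DB* db, const std::string& user) {
  const std::string prefix = "view:" + user + ":";
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix);
       it->Next()) {
    std::cout << it->key().ToString() << " -> " << it->value().ToString()
              << "\n";
  }
  delete it;
}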

In fact, Facebook has been finding all sorts of new ways to utilize flash as a stepping stone between slow disks on one hand and expensive-but-fast RAM on the other.

Facebook is no doubt an early adopter of flash-heavy application architectures, but it’s also probably serving as a guiding light for other companies and their developers who want to achieve Facebook-like performance. As flash prices continue to drop — and now that Amazon Web Services is offering a whole suite of flash-backed instances on EC2 (the prices of which should also drop) — it’s conceivable we’re approaching an era of ever-better web and mobile applications that communicate with the network and the hard drive as little as possible.

Thursday, October 10, 2013

Facebook Propels SSD Flash Storage

Facebook continues to advance the frontier of NAND flash SSDs in servers and data centers. The article below discusses Flashcache and other open-source projects.

"internally developed caching software, called Flashcache, to more efficiently use the thousands of solid-state drives (SSDs) that the social networking giant deploys to store frequently consulted data.
The newly released Flashcache 3.0 is able to make better decisions about what data to cache, while reducing the amount of wear and tear on expensive flash disks."
Earlier in the year, Facebook asked for the cheapest and slowest flash memory; see also Seven Questions for Facebook Infrastructure Guru Frank Frankovsky.


Ron
Insightful, timely, and accurate semiconductor consulting.
Semiconductor information and news at - http://www.maltiel-consulting.com/





Facebook open-source cache squeezes more from flash disks

Facebook continues to push the boundaries of storage and server technology in order to more quickly serve its billion users, and the results are being offered as open-source technology that can also benefit other companies.
Recently, Facebook updated its internally developed caching software, called Flashcache, to more efficiently use the thousands of solid-state drives (SSDs) that the social networking giant deploys to store frequently consulted data.
The newly released Flashcache 3.0 is able to make better decisions about what data to cache, while reducing the amount of wear and tear on expensive flash disks.
“With these improvements, Flashcache has become a building block in the Facebook stack,” wrote Domas Mituzas, a Facebook database engineer who authored a blog post explaining the updates to the open-source software.
The work aims to improve overall Facebook performance without unduly driving up operating costs.
“While the cost per GB for flash is coming down, it’s still not where it needs to be,” Mituzas wrote. Given the premium prices commanded for SSDs, Facebook doesn’t want to wear out these disks too quickly. “SSDs have limited write cycles, so we have to make sure that we’re not writing too much.”

Other open-source projects

Flashcache is one of a number of software projects that Facebook originally developed in house that the company has also released as open source. Earlier this year, for instance, the company also released a virtual machine, called HipHop, that speeds the processing of PHP code.
The company hopes that other organizations will reuse programs such as HipHop and Flashcache and eventually contribute to their further development. Like other open-source caching software, such as memcached and Redis, Flashcache can be used to speed the responsiveness of a heavily visited website or popular Web application.
Facebook originally created Flashcache to boost the responsiveness of the MySQL databases that store user data. The software can be loaded onto the Linux kernel as a module without making any changes to the kernel itself.
The idea behind Flashcache is to use SSDs to hold the material most requested by users. SSDs tend to be faster than traditional rotating-platter hard drives, though they are also more expensive per gigabyte. So it would not be cost-effective for Facebook to store all of its data on SSDs, especially since the vast majority of Facebook user data is rarely consulted.

Facebook found hotspots in its cache, where frequently consulted data could cluster in small areas, causing bottlenecks.
Although designed to work with MySQL and the MySQL InnoDB database storage engine, Flashcache can be used as a general caching mechanism for Linux systems.

Flashcache can also speed up the time it takes to write data to disk, from the user’s perspective, by saving newly updated data to the SSD first and then writing it to the hard drives later.
The updated Flashcache module improves performance in read-write distribution, cache eviction and write efficiency.
Analyzing Flashcache performance, Facebook found that most of its caches have a small subset of data that is read much more frequently than the rest.
With the previous version of Flashcache, 50 percent of a cache’s contents accounted for 80 percent of disk operations. Such a concentration of frequently consulted material could cause performance bottlenecks.
To improve Flashcache’s read-write distribution, the engineers developed a number of techniques to automatically position the data so that cache reads are distributed more evenly across the SSD. Now 50 percent of the cache accounts for 50 percent of the disk operations.
To improve cache eviction—the process of determining which data to move off the cache—Flashcache switched from a FIFO (first in, first out) algorithm, in which the oldest data in the cache is removed first to make room for new data, to an LRU (least recently used) algorithm, which discards the data that hasn’t been requested for the longest period of time.
Improvements were also made in write efficiency.
Previously the software would write to disk only when it had a certain amount of data that was ready to be written. This resulted in uneven performance across different caches, however. So, Facebook engineers developed an approach that would write the cached data to disk whenever a copy of that data was requested by a user, which resulted in a smoother flow of write operations.
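To make the two policies concrete, here is a toy user-space sketch (not Flashcache's actual kernel implementation) of an LRU cache that buffers writes as dirty entries, writes them back lazily on eviction, and, mirroring the read-triggered flush described above, cleans a dirty entry when it is read:

#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>

// Toy illustration only: LRU eviction plus write-back of dirty entries.
class WriteBackLruCache {
 public:
  using Writer = std::function<void(const std::string&, const std::string&)>;

  WriteBackLruCache(std::size_t capacity, Writer write_to_disk)
      : capacity_(capacity), write_to_disk_(std::move(write_to_disk)) {}

  // Writes land in the cache (the "SSD") first; disk is updated later.
  void Put(const std::string& key, const std::string& value) {
    Touch(key);
    entries_[key] = {value, /*dirty=*/true};
    EvictIfNeeded();
  }

  // A read moves the entry to the most-recently-used slot. As in the
  // behavior described above, a dirty entry is flushed to disk when it
  // is read, smoothing the flow of write operations.
  bool Get(const std::string& key, std::string* value) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;  // cache miss
    Touch(key);
    if (it->second.dirty) {
      write_to_disk_(key, it->second.value);
      it->second.dirty = false;
    }
    *value = it->second.value;
    return true;
  }

 private:
  struct Entry {
    std::string value;
    bool dirty = false;
  };

  void Touch(const std::string& key) {
    lru_.remove(key);      // O(n); fine for a sketch
    lru_.push_front(key);  // front = most recently used
  }

  void EvictIfNeeded() {
    while (entries_.size() > capacity_) {
      const std::string victim = lru_.back();  // least recently used
      lru_.pop_back();
      auto it = entries_.find(victim);
      if (it->second.dirty) write_to_disk_(victim, it->second.value);
      entries_.erase(it);
    }
  }

  std::size_t capacity_;
  Writer write_to_disk_;
  std::list<std::string> lru_;                     // recency order
  std::unordered_map<std::string, Entry> entries_;
};

Swapping FIFO for LRU in a structure like this only changes Touch: under FIFO a read would not reorder the list, so long-lived hot data would eventually be evicted even while still in heavy use.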
Thanks to these improvements, the updated caching mechanism has an average hit rate—the share of requested data found in the cache—of 80 percent, up from 60 percent in the previous version. This means more data is served more quickly.
Updating the software has also slashed server I/O (input/output) required to read data by 40 percent, and reduced the I/O required to write data by 75 percent. For a company that is running thousands of servers, such a reduction in traffic can help make more efficient use of servers and keep hardware costs manageable.

Monday, April 16, 2012

Intel, AMD, or ARM Servers?

The article below about low-power servers doesn't mention Intel's large growth in database servers. Intel had more than $10 billion in revenue from ICs sold into data centers, including servers, storage products, and networking.

Considering the high power usage of databases, it is not surprising that Intel wants to dominate the low-power server market. It is likely that database and cloud servers (20% of sales, growing at three times the rate of the PC segment) contributed to Intel's growth in 2011.


Ron Maltiel

 

 

Intel to Face Off Against AMD, ARM with New Low-Power Server



Intel may think extremely low-power servers are only a small part of the server market, but that isn't stopping the company from competing in this segment.

At its Intel Developer Forum in Beijing, the company said it will release a six-watt, dual-core processor known as Centerton, based on a dual-core Atom design, in the second half of this year.
Atom is Intel's low-power core design and the company has been pushing variants of it for everything from smartphones and tablets to set-top boxes and netbooks. Centerton will be a 32nm, dual-core system-on-chip (SoC) that will draw six watts of power. This will be a 64-bit chip with support for the large amounts of memory that many server applications require.

The microserver market is particularly interesting because of a theory that massive "scale out" deployments—typically large Web farms—would do better with a larger number of physical cores if those processors used much less power than traditional servers. SeaMicro was perhaps the biggest early advocate of this approach, coming out with its SM10000 server using up to 512 Atom cores and its proprietary fabric for connecting multiple processors, along with the associated memory and input/output. Last year, it upped the ante by switching to 64-bit Atom N570s.

SeaMicro was acquired by AMD earlier this year, and the company is widely expected to start using that fabric with its own processors and offer the technology to its OEMs.

Meanwhile, a number of companies—notably Calxeda, with its EnergyCore ARM processor— have been talking about using multiple ARM-based cores to compete in the microserver market. The concept is good, and Calxeda says its processor can draw as little as 1.5 watts for a dual-core server, although it is 32-bit only. This, too, is expected to go into real production in the second half of this year.

ARM has announced a 64-bit architecture that many vendors are expected to embrace, but 64-bit ARM cores aren't expected to be available for volume production until 2014.

Intel also said it will be shipping a new version of its Xeon E3 processor based on the Ivy Bridge architecture, manufactured on a 22nm process using "tri-gate" transistors. These chips are aimed at slightly larger servers, usually a single-socket 1U rack or blade server that is effectively the equivalent of a high-end desktop. Currently, Intel offers 45- and 25-watt versions of the 32nm Sandy Bridge-based E3s, along with a 15-watt Pentium 350. It hasn't yet listed power requirements for the next-generation E3s, but suggests the 22nm process will be more power efficient.

These should compete more directly with AMD's recently announced Opteron 3200, which is meant to be a lower-power, lower-price variant of the company's Opteron 4200 chip, based on the Bulldozer architecture. The four-core versions of these chips are rated at 45 watts.

Whether from Intel, AMD, or one of a variety of ARM providers, we're seeing more competition in the server market, particularly in the low-power segment. That's leading to lots of innovation and the potential for companies to save a lot on power bills.