A Blog by Jonathan Low

 

Dec 16, 2013

How to Manage for Efficiency As Amount of Internet User Data Far Surpasses User Numbers

For all of the warp speed efficiencies the digital era has brought to our lives, when it comes to data, we remain our old analog selves.

The amount of data being generated by and about users far surpasses the growth in users themselves. We are turning the cloud into a fusty old attic, spurred on by the fact that most of the storage cost is absorbed by the companies offering services rather than consumers.

It seems apparent that charging for these services will come some day, causing the most spectacularly gigantic intellectual yard sale in the history of all life forms. But not yet. Because the money to be made from all those users still depends on getting them to spend as much time and income on the various services as possible. Which means offering inducements to keep them coming. Or, to be more accurate, eliminating any obstacle to the unfettered convenience of life on the net.

As a result, companies like Facebook must become ruthlessly efficient in deploying and using the systems installed to manage all of this dreck: grandchildren's photos from Christmases past; fraternity party videos unwisely uploaded to YouTube; mountains of internal memos, strategy papers and annual reports from enterprises that did not survive the passionate but ultimately vain efforts to salvage them. The list is as long as the human imagination is fertile. But the effort to keep up with the mass of accumulating data is both never-ending and utterly crucial. JL

Pankaj Mehra reports in GigaOm:

The growth in analytics surpasses even the growth in Internet traffic. User data in Facebook’s warehouses grew 4,000 times between 2009 and 2013, even as the number of users grew 5.5 times, from 200 million to 1.115 billion.
In the Internet.org paper issued in September 2013, Facebook, Ericsson and Qualcomm noted that boosting datacenter efficiency is essential to bringing down costs and expanding Internet access across the globe. Offering many lessons for enterprises working to improve real-time responsiveness to compete in our information economy without breaking the bank, the paper noted that integrated efforts have the potential to increase datacenter efficiency by an astounding 100x over the next five to 10 years.
This achievement requires two key innovations outlined by Facebook. First of all, the underlying costs of delivering data need to be reduced. Secondly, we need to streamline data use by building more efficient applications. It is no coincidence that flash memory in the datacenter is playing an important role in both areas.

The Need for Efficiency: Very Big Data

It is clear that efficiency is essential when you examine the volume and velocity of the data we now create daily. In the Internet.org paper, Facebook notes that every day, more than 4.75 billion content items are shared on Facebook. Intel’s “Internet Minute” infographic notes that there are 1.3 million YouTube views every 60 seconds. And Cisco predicts that today’s astounding Internet traffic will grow 4.5x by 2017.
Data about data is now frequently shared online, ironically quantifying something IT professionals know very well: At work and at play, we generate huge amounts of information daily.
Facebook notes that analytics-driven personalization of each user’s experience required processing “tens of thousands of pieces of data, spread across hundreds of different servers, in a few tens of microseconds.” The complexity meant Facebook was processing 1,000 times more traffic inside its data centers, as information traveled between servers and clusters, than traveled into and out of its facilities. Cisco predicts that data center traffic will grow 3x, reaching 7.7 zettabytes by 2017.
While few enterprises are faced with the challenge of processing this much information today, Facebook’s solutions for managing data at scale present important lessons for companies of all sizes, on boosting IT efficiency while guarding the bottom line.

Saving millions while scaling services

Flash memory in data center servers is quickly becoming critical to achieving efficiency in scalable IT. Facebook and other leading hyperscale and enterprise companies use flash to process more transactions with fewer servers, thereby reducing operating costs. Besides being a thousand times faster than disk drives, modern flash uses much less energy than yesterday’s spinning disk storage, or even the DRAM memory in servers. Flash creates less heat, and thus requires far less energy to cool. It also takes up less space in the datacenter than racks of disks, for even greater savings and efficiency.
All three Facebook data centers easily beat the industry gold standard of 1.5 Power Usage Effectiveness (PUE), with Facebook reporting that its PUEs range from 1.04 to 1.09. In a recent report, Businessweek noted that Facebook’s Sweden data center is, “By all public measures…the most energy-efficient computing facility ever built.”
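To put those PUE figures in perspective, here is a minimal sketch using the standard definition of PUE: total facility energy divided by the energy consumed by the IT equipment itself. The function names and the sample energy figures are illustrative assumptions, not numbers from Facebook's report.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_per_it_kwh(pue_value: float) -> float:
    """Energy spent on cooling, power conversion, etc. for each kWh of IT load."""
    return pue_value - 1.0


# Hypothetical facility: 150 kWh drawn in total to deliver 100 kWh to servers.
example_pue = pue(150.0, 100.0)

# Overhead at the 1.5 benchmark versus the 1.04 and 1.09 figures quoted above.
for value in (1.5, 1.09, 1.04):
    print(f"PUE {value}: {overhead_per_it_kwh(value):.2f} kWh overhead per IT kWh")
```

Read this way, a PUE of 1.5 means half a kilowatt-hour of overhead for every kilowatt-hour of computing, while a PUE of 1.04 means roughly a twenty-fifth of one.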
As Facebook notes in its Internet.org paper, “Making the delivery of data faster and more efficient is crucial to ensuring access to more people in more diverse regions of the world.” Beyond the philanthropic appeal of expanding internet access and improving quality of life for billions, these lower-cost infrastructures create new markets for businesses of all sizes, as more and more consumers are able to participate in the global information economy.

The importance of optimizing applications for efficiency


While Facebook’s attention is focused on reducing the amount of data used by devices on the consumer side of the Internet, there are also interesting lessons for enterprises to learn on how data can be streamlined on the server side. For example, flash memory finally makes it possible to move beyond disk-era application code. By streamlining software code and removing the unnecessary layers of complexity associated with archaic storage architectures, applications perform faster while using less data. For instance, data management systems can shortcut costly address translation layers, avoid costly double buffering, and eliminate locking and resource overheads associated with input-output operations that remain outstanding for long periods of time.
To break free from the limitations of disk-era architectures, new open source APIs available on GitHub make it possible for enterprise software developers to add flash-aware operations to their applications. These APIs speed up applications by optimizing for the efficient protocols and processes of all-electronic solid-state flash memory, rather than assuming data access paradigms best suited to mechanical disk drive heads moving across spinning platters. Flash APIs can also lower capital expenditures by helping flash last longer, because they cut the number of write operations to the data volume in half. Flash-aware APIs further simplify development, helping developers get applications to market faster. Open-source data-tier application innovators MariaDB and Percona already exploit the Open NVM Atomic Writes API in their code.
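One plausible reading of the halving of write operations is that disk-era databases often write each page twice (once to a scratch area, then to its final location) to guard against partially written pages, and a device that guarantees atomic page writes makes the first copy unnecessary. The toy model below only counts writes under that assumption. `CountingDevice` and both flush functions are hypothetical names for illustration and are not part of the Open NVM API.

```python
class CountingDevice:
    """Hypothetical in-memory stand-in for a storage device; counts page writes."""

    def __init__(self):
        self.pages = {}
        self.writes = 0

    def write_page(self, location, data):
        self.pages[location] = data
        self.writes += 1


def flush_with_double_write(device, dirty_pages):
    """Disk-era pattern: stage each page in a scratch area, then write it in place."""
    for page_id, data in dirty_pages.items():
        device.write_page(("scratch", page_id), data)  # protective copy
        device.write_page(("data", page_id), data)     # final location


def flush_with_atomic_write(device, dirty_pages):
    """Assumes the device guarantees all-or-nothing page writes, so no staging copy."""
    for page_id, data in dirty_pages.items():
        device.write_page(("data", page_id), data)


dirty = {page_id: b"payload" for page_id in range(100)}

legacy, flash_aware = CountingDevice(), CountingDevice()
flush_with_double_write(legacy, dirty)
flush_with_atomic_write(flash_aware, dirty)

print(legacy.writes, flash_aware.writes)
```

Fewer writes matter on flash in particular because each cell tolerates a limited number of program/erase cycles, which is how writing less translates into the longer device life described above.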

Capitalizing on Strategic Change in IT

As enterprises balance the need to scale with the need to manage costs, Facebook’s lessons are increasingly important for companies operating in the information economy. Across open source and traditional computing, flexible solutions optimized with intelligence will be critical assets for tomorrow’s business leaders.
I personally see this transition as a shift in emphasis from the technology to the information in IT, as information intelligence and analytics finally take the lead in a story that has long been overshadowed by the complexities of hardware and software.
With the democratization of data being led by Facebook and the Open Compute Project, many leaders are already working to understand and adopt these key practices to ensure they remain competitive amid ongoing IT industry transitions. At OCPSummitV, it will be interesting to see what breakthroughs continue to add uncommon value to common standards.
