Cloud Data Centers: Bigger, Faster, Cheaper!
Paul Sagawa / Artur Pylak
203.901.1633 / .1634
sagawa@ / firstname.lastname@example.org
SEE LAST PAGE OF THIS REPORT FOR IMPORTANT DISCLOSURES
July 30, 2012
- Data processing architecture has swung like a pendulum, from a centralized mainframe era, to decentralized PCs, and now back to the cloud. Modern cloud data centers feature distributed architectures that share computing and storage resources across widely dispersed locations at nearly limitless scale, delivering exceptional performance at extraordinarily low cost. Their considerable advantages stem from four major factors. Scale economies in purchasing technology and hiring talent yield cost and skill advantages. Scope puts servers much closer to users, improving response time and cutting costs. Innovative data center design can dramatically reduce capital and operating costs. Finally, software innovation led by Google, and echoed in the open source Hadoop framework, allows applications to run on processor arrays and data sets of nearly limitless size. Against these advantages, enterprises will weigh transition costs, slowly receding security concerns and application-specific idiosyncrasies to determine which applications should be shifted to the cloud and how quickly. As the change proceeds, private data center investment will wane and traditional IT will commoditize, with the leading cloud providers, IT consultants and their customers the big winners.
- Data processing has shifted from the decentralization of the PC era back to centralized data centers and, ultimately, the cloud. In the mainframe era, computing was expensive, scarce and rationed by central IT departments. Minicomputers reduced cost and complexity, making it reasonable to dedicate computing to departmental groups, a trend that culminated in the PC ethos of pushing processing right to the desktop. Improving networks started the pendulum back with client/server approaches that augmented local processing with shared resources, a concept extended by virtualization, which gives users a private slice of a shared data center. Cloud computing takes this furthest, allowing minimally capable access devices to harness the near limitless processing and storage available on the web.
- Key innovations – e.g. ubiquitous access, distributed architecture, MapReduce and portable platforms – mitigate the disadvantages and amplify the advantages of centralization. Cloud processing is only possible because of innovations that have emerged over the past decade. Fast, ubiquitous internet access is now a given; distributed data center architecture puts servers close by; MapReduce processing keeps users and applications distinct while spreading their demands in parallel across processors and storage; and portable platforms are cheap and easy to use. The access, performance and ease-of-use burdens of centralized processing are largely eliminated, while its cost and control benefits are magnified.
- Cloud hosts have substantial scale advantages in constructing data centers, connecting them and building world-class expertise in running them. The advantages of cloud architecture start with economies of scale. The top cloud hosts have become the largest IT buyers in the world, pooling their own data center needs with those of their commercial customers, and as such can extract concessions on hardware, software and communications that are unavailable to most enterprises. These companies can also justify building world-class IT design and management expertise.
- Distributed data centers put servers closer to users, improving application response times and reducing communications costs. The distance between a user and a server has a dramatic impact on performance: latency and error compound geometrically with each router hop, favoring the distributed data center architecture exemplified by the top cloud players. Proximity also reduces communications costs and better serves increasingly mobile workers.
- Leading edge data center designs cut power needs and costs dramatically, while minimizing equipment and facilities spending. With technology costs falling, electricity has become a major expense for data center operators. The top cloud players, such as Google and Microsoft, are achieving power efficiency levels that private data centers cannot even approach. These designs also use commodity hardware and open source software, customized to the operators’ own proprietary specs and deployed for maximum efficiency: low cost technology choices unavailable to smaller and less sophisticated buyers.
- Software innovations pioneered by Google and carried into open source solutions make cloud data centers nearly limitlessly scalable and unusually efficient. Google’s search franchise rests on MapReduce, a revolutionary way of breaking application tasks into small pieces allocated evenly across a huge array of processors sharing an equally huge bank of storage. Thus, IT resources can be used as efficiently as possible, scaling from a single core to many thousands as needed. This functionality is mimicked by the open source standard Hadoop, which forms the basis for most cloud processing architectures. The technology amplifies the benefits of scale and scope for the cloud relative to smaller data centers.
- Migration of enterprise apps to the cloud will be slow but substantial, proceeding as transition costs and security concerns are addressed. While cloud providers can offer dramatic cost and performance improvements vs. private data centers, transition costs can be significant, potentially requiring changes to existing applications and/or infrastructure software. Cloud providers are addressing the security issues that have been the biggest obstacle to adoption, but many enterprises will require a longer track record before making the jump.
- The biggest and most technically sophisticated hosts will dominate, commoditizing traditional IT but providing opportunities for IT consultants. Companies with big, consumer-driven web franchises, namely Google, Amazon, Facebook and Microsoft, enter enterprise cloud hosting with huge scale and scope advantages. Most hardware vendors will be commoditized as demand shifts from private to cloud data centers, while many software vendors will struggle to transition as the cloud opens cheaper and better alternatives for their customers. The exceptions will be component hardware (e.g. disk drives and processors), SaaS software and IT consulting.
Straight, No Chaser
Through history, data processing has moved like a pendulum. The first computers were single user devices devoted to specific applications. The advent of batch processing allowed computers to be shared, and the 1964 launch of IBM’s System/360 architecture popularized the time sharing operating system, bringing in the era of the central mainframe. By the ’70s, the pendulum began to swing back, as users and departments grew frustrated with competing for time on mainframes and began buying relatively inexpensive minicomputers to fill their needs. As microprocessor prices continued to fall, smaller and smaller computers became practical, eventually ushering in the PC era with the 1981 introduction of the DOS/x86 IBM PC.
By the ’90s, local area networks began to facilitate more equitable resource sharing, and the pendulum swung back with the rise of Client-Server architecture. In this era, departmental clusters of processing-heavy servers and storage supported the relatively light capabilities of PCs on desktops and in laptop bags. A decade ago, Client-Server architecture morphed again, as commercial virtual machine technology gave IT managers a tool to allocate data center computing resources to users and applications seamlessly and efficiently, taking the pendulum further back toward centralization.
For enterprise IT, the cloud is the next big thing, completing the round trip back to centralization and taking the concept even further by sharing resources across multiple organizations. The move to the cloud is facilitated by industry innovations that are erasing the traditional drawbacks of a centralized approach. The first key is the ubiquitous availability of fast internet connectivity, which makes cloud-based applications nearly as available as those running directly on a user’s device. Second, modern cloud services distribute their data center resources geographically, putting servers closer to users and minimizing the performance delays caused by latency and error, which build geometrically as distances increase on the Internet. Third, virtualization gives users the appearance and performance of a dedicated machine, requiring little of the user while seamlessly allocating resources to their needs. Finally, devices are changing too, with relatively inexpensive, portable and cloud-friendly tablets beginning to crowd out the more generously spec’ed laptops and desktops of old.
The benefits of cloud computing derive from four major factors. Scale makes the cloud cheaper. Geographic coverage speeds response times. Leading edge design extends the cloud’s cost and performance advantages even further. Finally, proprietary software architectures break applications into discrete tasks that can be solved in parallel by as many individual processor cores as necessary, allowing massive computing firepower to be brought to bear wherever needed, with extraordinary utilization of capacity. It is no coincidence that the most successful cloud hosts – Amazon, Microsoft and Google – carry huge data processing requirements for their own successful consumer internet businesses. The investment in scale to support these massive franchises, and the expertise gained in building them, goes a long way toward establishing a sustainable advantage.
We expect the migration of enterprise applications onto these emerging computing utilities to happen slowly, as IT managers grapple with transition costs, prioritize applications, and resolve lingering concerns over security and reliability. However, the applications best suited to the cloud, those involving either broad, flexible access needs (e.g. CRM) or sheer processing horsepower (e.g. business intelligence/big data), tend to be the ones driving growth in demand for data center capacity. As such, we see incremental IT demand shifting sharply away from private data centers, as fast-growing cloud data centers become the primary market. This is a profound change for IT hardware vendors accustomed to selling high margin, value-added solutions to private data center customers. Cloud operators do not buy those boxes; they buy processor chips and disk drives directly from component manufacturers and have them installed on boards to their specifications by contract manufacturers. Nor do cloud operators buy commercial infrastructure software; they write it themselves or customize open source solutions. Even applications vendors should be worried, as the cloud revolution is a natural break point for long standing customers to reconsider their commitments to expensive-to-maintain legacy programs.
Beyond the biggest and most sophisticated hosts – which begin with Google, Microsoft and Amazon in some order, with Facebook an almost inevitable entrant down the line – commodity component plays (Intel, ARM, Seagate, Western Digital, and others), IT consultants (IBM, Accenture, and others) and SaaS application vendors (Salesforce.com, NetSuite, and others) should be the biggest commercial beneficiaries of the cloud movement. However, the biggest winners will be the enterprise customers that best turn the cost and performance advantages of the cloud to their own benefit.
Exh 1: Major Eras in Computing, 1940s to Present
What Goes Around, Comes Around
The first computers were designed to address specific programs, requiring that the vacuum tubes and wires be reconfigured if the flow of calculations were to be altered. While the technology of the day made these devices the size of a room, their lack of programmability made them singular in their purpose. By the ’50s, programmability based on magnetically stored instructions became common as computers became commercial devices, allowing for batch processing of various computation jobs in sequence, a crude form of resource sharing. The shift from vacuum tubes to transistors and on to integrated circuits set the stage for the 1964 launch of the IBM System/360 architecture, which would dominate computing for two decades. The mainframe era ushered in real time applications, with high priority users transacting with the computer on dedicated terminals rather than waiting for their batch jobs to run overnight. However, access to the corporate mainframe was typically tightly controlled, allocated to specific departments, applications and users on a top down basis (Exhibit 1).
The ’70s saw the rise of minicomputers, shared computers inexpensive enough to be justified by smaller constituencies. This development began a decentralization trend, as departments within larger organizations deployed minicomputers to move their applications off the crowded corporate mainframe. In the ’80s, the pendulum swung even further toward decentralization with the dawn of the PC era. PCs were cheap enough that, eventually, almost everyone in an enterprise might have one of their own. The benefits were clear: no one would have to wait their turn for computer time and computing could be turned toward the needs of ordinary work life. Power to the people.
The advent of local area networks started the pendulum back in the other direction. Even as PCs grew more powerful and users found profound new applications to enhance their productivity, sharing data and coordinating applications across populations of employees were difficult in a sea of independent workstations. Client/Server architecture augmented desktop PCs with beefier counterparts that could provide processing and storage for shared files and applications. As local networks extended into the Internet, the Client/Server model followed, with browsers evolving as tools for PC “clients” to access web site servers. With this, the voracious appetite for more and more desktop processing power began to wane.
The concept of Client/Server was extended with the commercial introduction of virtual machines. New applications could be launched within logical partitions on a large server cluster, with each partition behaving as if it were a separate, distinct machine. Users could be supported with software that appeared to run on their local computer but actually resided on the server, allowing applications to be supported and updated centrally while reducing the need for expensive power on the desktop. These changes saved money and time by leveraging the relatively cheaper processing power of central servers, forestalling user equipment upgrades and bringing software under tighter control. Gartner estimates that more than 40% of all x86 server workloads in private data centers now run on virtual machines, and projects that share could rise to more than 75% in a couple of years (Exhibit 2).
Exh 2: Percentage of x86-Architecture Workloads Running Virtual Machines
Public cloud data centers take the virtualization idea a giant step further (Exhibit 3). Unlike private data centers, public cloud data centers have been designed from the ground up to be nearly infinitely scalable and seamlessly distributed across many distinct physical locations. The resources of these massive networks of servers and storage can be allocated on the fly, keeping users and applications separate while spreading their demands in parallel across as much capacity as is needed. Users with minimally capable access devices can marshal almost limitless power via the Internet, conceptually bringing us full circle to the IBM 360 mainframe and its dumb terminals.
Exh 3: Virtualization to Public Cloud Road Map
Why are we here?
Public cloud computing is only possible because of a set of related but distinct innovations that have played out over the past decade. Fast internet access is nearly ubiquitous in the US today, with wireless 3G, 4G and WiFi networks augmenting fixed broadband. Employees and business partners can use the Internet to log in to enterprise applications, cutting the tether to private network connections at workplace workstations. We expect broadband, and particularly its wireless variants, to continue to get faster and more widely available over time (Exhibit 4).
At the same time, web-based businesses have pushed their computing resources closer to their users, breaking up data centers and distributing processors and storage to geographically dispersed facilities. This approach was pioneered by content delivery network leader Akamai in the late ’90s as a means to avoid the delays caused by network congestion. The Internet works by forwarding data packets from router to router like a bucket brigade between user and server. Each router along the way adds its own latency and error, with a geometric cumulative effect on the delay and response time experienced by the user. Thus, putting the server as close as possible to the user has a dramatic effect on perceived performance. Cloud computing companies, most of them with substantial consumer web franchises of their own to support, ran hard with this concept and have built extensive networks of distributed data centers from which to deliver services to users.
Exh 4: Wireless and Wireline Advances, 2000-present
However, distributed data centers created a problem. Traditional data processing architecture required that application tasks be dedicated to finite, predetermined clusters of server processors. Splitting huge central server locations into many smaller locations would have required dedicating programs to specific sites with no way to balance loads between facilities, meaning that at any given moment some locations would be overtaxed while others lay fallow. Google solved this problem with a program architecture called MapReduce, which breaks application computations into small tasks that can be “mapped” out to many processors in parallel, with the results “reduced” back to a single output (Exhibit 5). This parallelism allowed lightning fast execution, nearly limitless scalability and balanced resource utilization across the entire network of distributed data center resources. Influential technical white papers that Google published on MapReduce inspired free open source implementations of the technology, most notably “Hadoop,” named after the stuffed toy elephant of an early project leader’s young son. The widespread use of Hadoop in distributed data center networks is a primary difference between the architecture of the cloud and that of traditional private data centers.
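The map, shuffle and reduce flow described above can be sketched with the classic word-count example. This is a minimal, single-process Python illustration assuming whitespace-separated words; it is not Google’s implementation or Hadoop’s API, and the function names are hypothetical. In a real cluster each map task would run on a different machine, and the framework, not user code, would perform the grouping step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Each split would be handled by a separate map task on its own machine;
    # here the map tasks simply run one after another.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle_phase(mapped))


print(word_count(["the cloud is big", "The cloud is cheap"]))
```

Because each map task touches only its own split and each reduce task touches only one key’s values, either phase can be spread across as many machines as are available.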
Finally, the rise of smartphones and tablets is beginning to foster workforce mobility as employees demand access to their desktop applications from their personal devices. The adoption of these lean access platforms into enterprises as replacements for aging PCs will play to the benefit of cloud-based architectures that do not presume the availability of local processing or storage, and that accommodate the natural portability of these devices.
Exh 5: MapReduce/Hadoop Parallel Computing Schematic
In data centers, bigger is better. Big means you can buy in bulk – processors, disk drives, switches and fiber links. Big means you can get great deals on real estate and, often, lock up the most advantageous locations in the process. Big means you can justify hiring staffs of Ph.D.s to design your data centers, write customized code, maintain your software and address customer needs, so you don’t need to rely on expensive value-added services from your vendors, while still providing world-class IT support to your customers. Public cloud operators hold significant advantages over private data centers in every one of these cost categories.
For example, Google buys hundreds of thousands of server chips every year directly from Intel and has them mounted onto boards of its own design by Asian contract manufacturers. It buys disk drives directly from Seagate and does the same. Standardized processor and storage boards are then mounted in racks at Google’s many data center sites, running Google’s proprietary software. No value-added bells and whistles from systems vendors – layers of security, management software, etc. – to add bloat and cost. No expensive maintenance contracts, systems support or incompatibilities between vendors. Just the best price/performance hardware, with best-in-class homegrown systems, supported by crack IT professionals. More or less, this is how Amazon, Microsoft, and Facebook run their data centers as well, and it is much, much, MUCH less expensive per unit of computing or storage than the typical private data center.
Getting to the scale of a Google or an Amazon is extraordinarily expensive. Google’s total property, plant and equipment (PP&E) was $9.6B at the end of 2011, almost all of it driven by data center assets (Exhibit 6). Capital spending was $3.4B for the year, also almost all driven by data center expansion (Exhibit 7). Microsoft, which may have more non-data center facilities, is just behind with $8.2B in PP&E and $2.4B in capex. Amazon is no piker with $4.4B in PP&E and $1.8B in 2011 capital spending, while newcomer Facebook reports PP&E of $1.9B as of March, up more than $1B since the beginning of 2011. It is important to note that each of these companies must invest in IT to support rapidly growing consumer franchises – search, email, e-commerce, social networking, media streaming, etc. Commercial cloud services are a sideline of sorts, a way to leverage these massive distributed data center assets into an additional revenue stream (Exhibit 8). Given this, it may be difficult for would-be cloud hosting competitors to achieve competitive costs without a similarly massive anchor tenant, while private data centers have no chance at all.
Exh 6: Property, Plant and Equipment of Major Data Center Players
Exh 7: Capital Expenditures of Major Data Center Players
Exh 8: Global Cloud Infrastructure Services Forecast, 2010-2016 CAGR: 41.7%
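As a rough check of what the Exhibit 8 growth rate implies, assume the 41.7% CAGR compounds annually over the six years from 2010 to 2016, with $V_t$ standing for the market size in year $t$ (the underlying dollar values are not reproduced here). The market would then grow roughly eightfold over the forecast window:

```latex
\frac{V_{2016}}{V_{2010}} = (1 + 0.417)^{6} \approx 8.1
```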
Coast to Coast and Around the World
In a world of increasing employee mobility, the Internet extends the reach of networked data centers into nearly every corner of the earth’s surface. However, proximity still counts for something. On the Internet, distance relates to router hops, with each router stopping to read each data packet before sending it down the line to the next router in the bucket brigade. The more hops, the more times the data stops to be read and processed, adding latency with each queue. Sometimes a router’s queue is already full, and a packet is dropped. In these cases, the receiving computer notices the missing packet and the data must be sent again, while the sender typically slows its transmission rate in response. The effect of latency and error compounds geometrically as the number of router hops between the user and the server increases, to the extent that Akamai estimates that a 4GB file sent from a server 1000 miles away could take more than 12 times longer to download than the same file sent locally (Exhibit 9).
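A toy model shows why resends compound with distance. Assume, hypothetically, that each hop independently drops a packet with probability p and that a dropped packet must be resent end to end. The chance a packet crosses h hops is then (1 - p)^h, so the expected number of sends is 1 / (1 - p)^h, which grows geometrically with h. The parameters are illustrative only and are not Akamai’s methodology.

```python
def delivery_probability(hops, loss_per_hop):
    """Chance one send survives every hop, assuming independent per-hop loss."""
    if not 0.0 <= loss_per_hop < 1.0:
        raise ValueError("loss_per_hop must be in [0, 1)")
    return (1.0 - loss_per_hop) ** hops


def expected_sends(hops, loss_per_hop):
    """Expected sends until one copy gets through (mean of a geometric distribution)."""
    return 1.0 / delivery_probability(hops, loss_per_hop)


def expected_delay_ms(hops, loss_per_hop, per_hop_ms):
    """Simplified: every send, successful or not, is charged the full path latency."""
    return expected_sends(hops, loss_per_hop) * hops * per_hop_ms


for h in (5, 10, 20):
    # Assumed 1% loss and 5 ms per hop, purely for illustration.
    print(h, round(expected_sends(h, 0.01), 3), round(expected_delay_ms(h, 0.01, 5.0), 1))
```

At an assumed 1% loss per hop, a 20-hop path needs about 1.22 sends per packet versus about 1.05 for a 5-hop path, on top of four times as many per-hop delays.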
This has a dramatic effect on the perceived performance of applications – even a fraction of a second of unanticipated delay is noticeable and, if commonplace, frustrating. Distributed cloud data centers are designed to flexibly assign processing tasks to the locations closest to the user, minimizing these delays (Exhibit 10). Of course, large distributed data centers will have more sites and greater geographic coverage adding real performance advantages to applications with geographically diverse or mobile users. Scope can also benefit fault tolerance, leaving the full system less vulnerable to failure in any particular processing location.
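Assigning work to the closest location can be sketched as choosing the site with the smallest great-circle (haversine) distance to the user. Real systems generally route on measured network latency and load rather than pure geography; the site names and coordinates below are hypothetical placeholders, not actual data center locations.

```python
import math

EARTH_RADIUS_MILES = 3958.8

# Hypothetical sites (latitude, longitude in decimal degrees), for illustration only.
SITES = {
    "west": (45.6, -121.2),
    "central": (41.3, -95.9),
    "east": (38.9, -77.4),
}


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_site(user_lat, user_lon, sites=SITES):
    """Return the name of the site with the smallest great-circle distance to the user."""
    return min(sites, key=lambda name: great_circle_miles(user_lat, user_lon, *sites[name]))


print(nearest_site(40.7, -74.0))  # a user near New York
```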
Exh 9: Effect of Distance on Content Delivery Performance
Exh 10: Known US Data Center Locations of Apple, Amazon, Facebook, Google, and Microsoft
T’aint Whatcha Do, It’s the Way Thatcha Do it
The mainframe era had mandated expensive computer rooms with raised floors and precise climate control systems. The PC era first saw boxes with their own self-contained cooling fans, morphing into server “blades” that can be fit in racks, increasing density to save space, but increasing the burden for dissipating heat. Meanwhile, storage, which had been deployed as dedicated disk drives integrated into the box servers, was broken out into specialized systems in racks of their own, networked back to the blade servers as a shared resource, with a specialized storage server to manage the process. This is, more or less, the typical private data center design, and the bread and butter products for the IT vendors that sell into that market.
Meanwhile, Google had its own ideas. Servers didn’t need to be quite as dense, as electricity costs were a bigger part of data center expenses than floor space. At the same time, Google saw no need to pay for extraneous anything – bells, whistles, even sheet metal – and custom designed its own “skinless” server, manufactured to its own specifications with 30% lower materials costs than typical blade servers. Notably, Google’s servers include disk drives on the same motherboard as the processor. This design works because of software that splits both processing and storage into small tasks that can be mapped out to widely dispersed resources, but reduced back to their original form quickly and as needed. The proximity of processing and storage maintains an intrinsic balance between the two, while shaving valuable time and eliminating a separate storage system as an expensive and vulnerable point of failure. Google’s whole system works on redundancy: data is replicated three times over, and if a drive fails, Google replaces it and moves on. This approach has clear cost advantages – Google’s cost per server is likely at most several hundred dollars versus over $1000 for an entry level blade server (Exhibit 11).
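Three-way replication can be illustrated with a simple policy: store each chunk of data on three distinct machines, and when a machine fails, copy its chunks onto other survivors until three replicas exist again. This is a hypothetical sketch with random placement; Google’s actual placement logic is not described in this report, and production systems typically also spread replicas across racks.

```python
import random

REPLICATION_FACTOR = 3  # the report cites three copies of each piece of data


def place_replicas(chunk_ids, machines, k=REPLICATION_FACTOR, seed=0):
    """Hypothetical policy: put each chunk on k distinct, randomly chosen machines."""
    rng = random.Random(seed)
    return {chunk: set(rng.sample(machines, k)) for chunk in chunk_ids}


def fail_machine(placement, machine):
    """Drop a failed machine from every chunk's replica set."""
    for replicas in placement.values():
        replicas.discard(machine)


def re_replicate(placement, surviving_machines, k=REPLICATION_FACTOR, seed=1):
    """Top each chunk back up to k replicas using machines that still hold no copy."""
    rng = random.Random(seed)
    for replicas in placement.values():
        candidates = [m for m in surviving_machines if m not in replicas]
        replicas.update(rng.sample(candidates, k - len(replicas)))


machines = [f"m{i}" for i in range(10)]
placement = place_replicas(range(100), machines)
fail_machine(placement, "m3")  # a drive dies; every chunk still has >= 2 copies
re_replicate(placement, [m for m in machines if m != "m3"])
```

Because losing one machine removes at most one copy of any chunk, data stays readable throughout, and replacing the failed drive is a routine task rather than an emergency.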
In recent years, Google has taken its modular approach even further: standardized server/storage boards fit into standardized racks, which fit into standardized containers that can be quickly and cheaply deployed to data center facilities all over the globe. Older hardware with less processing power is first repurposed to less computationally intensive business areas, such as Gmail or Picasa, and is eventually taken out of service and recycled.
Exh 11: Server / Component Cost Comparison
Energy use and cooling are other considerations data centers take seriously. Most data center operators track Power Usage Effectiveness (PUE), defined as total facility power divided by IT equipment power. A PUE of 2.0 means that for every watt of IT power consumed, an additional watt is consumed to cool and distribute power to the IT equipment. Though the ideal PUE would be 1.0, many data centers are still far from it. According to a 2011 Uptime Institute survey of 500 data centers, the average PUE was 1.8. The most recent figures peg Amazon at 1.45, Facebook at 1.5 and Microsoft at 1.25. Google leads at 1.13, as its architectural innovations drive the numerator lower (Exhibit 12). Power is conserved in a number of ways. Choosing a location in a cool climate is ideal since it reduces cooling costs, but this is not always practical given the goal of distributing data centers geographically to reduce latency between the centers and their users. Other innovations that reduce cooling energy include operating data centers at higher temperatures and humidity, managing airflow, and using recycled water for cooling. Power is further saved by optimizing power distribution and reducing power conversion steps. Google is known to include battery units in each of its servers to provide uninterruptible power and has taken other steps to reduce line losses.
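The PUE definition translates directly into a few lines of arithmetic. The sketch below assumes a hypothetical 10 MW (10,000 kW) IT load and compares the 1.8 survey average cited above with Google’s 1.13: the same IT load would draw about 18 MW versus about 11.3 MW of total facility power, a gap of roughly 6.7 MW.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_per_it_watt(pue_value):
    """Watts spent on cooling, power distribution, etc. per watt delivered to IT gear."""
    return pue_value - 1.0


def facility_kw(it_equipment_kw, pue_value):
    """Total facility draw implied by a given IT load and PUE."""
    return it_equipment_kw * pue_value


IT_LOAD_KW = 10_000  # hypothetical 10 MW IT load
saving_kw = facility_kw(IT_LOAD_KW, 1.8) - facility_kw(IT_LOAD_KW, 1.13)
print(f"Facility power saved: {saving_kw:,.0f} kW")
```

The same IT work is delivered in both cases; only the cooling and power-distribution overhead differs, which is why operators focus on driving the numerator down.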
Exh 12: Calculating Power Usage Effectiveness (PUE)
While Google is the undisputed leader in re-imagining the data center, other major cloud operators (e.g. Amazon, Microsoft and Facebook) are known to be investing in the same direction and gaining some of the same advantages over private data centers. Still, Google itself holds over 300 patents related to data center design, and perhaps more tied to entities like “Exaflop LLC,” which is likely one of several LLCs set up so Google could secure patents for its inventions under the radar. According to the US Patent Office, Exaflop holds 27 patents, mostly around data center cooling and power management (Exhibit 13). Major patents held by Google include “large scale data processing in a distributed and parallel processing environment,” “Container-based data center,” “Motherboards with integrated cooling,” and “Apparatus and method for data warehousing.” It would seem that the gap between private data centers and the cloud may continue to widen.
Exh 13: Data Center Patents Assigned to Exaflop LLC (Mountain View, CA), a.k.a. Google, Last 12 Months
The Secret Sauce
Google’s other Hall of Fame contribution to the history of data processing goes hand in glove with its data center design. Google’s MapReduce technology takes computational problems and breaks them into smaller sub-problems that can be distributed or “mapped” to individual processors working in parallel. The results of these calculations are then collated or “reduced” to generate a solution. This technique allows any problem that CAN be logically broken into pieces to be solved as quickly and cheaply as possible, with the potential of applying nearly limitless firepower to unusually complicated problems, such as regenerating Google’s index of the entire Internet. Google keeps the specifics of its MapReduce and related software technologies closely guarded, but shared the basics in a series of technical white papers that inspired the creation of the open source technology Hadoop, which is implemented by Google’s cloud competitors.
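The pattern of splitting a problem into sub-problems, solving them in parallel and combining the partial answers can be sketched with Python’s standard-library thread pool. The sub-problem here (a sum of squares) is a hypothetical stand-in, and a local pool only mimics on one machine what MapReduce does across thousands of servers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Break the input into roughly equal sub-problems."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def solve_chunk(chunk):
    """Hypothetical sub-problem: the sum of squares over one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    chunks = split_into_chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(solve_chunk, chunks))  # "map" each chunk to a worker
    return sum(partials)  # "reduce" the partial results to one answer


print(parallel_sum_of_squares(list(range(1001))))
```

Because CPython threads share one interpreter lock, CPU-bound chunks like these will not actually run faster; swapping in `ProcessPoolExecutor` would use multiple cores, at the cost of pickling inputs and outputs. Adding workers only requires more chunks, which is the scaling property the paragraph describes.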
MapReduce is not a panacea, and there are highly structured, transactional applications that are not well served by its batch-oriented nature, but it is a revolutionary tool for a wide range of important applications, in particular the “big data” problems that have become a central focus for many enterprises. For enterprises looking to transition applications to the cloud, MapReduce and Hadoop offer the horsepower to solve problems that cannot be addressed within a private data center, and unmatched computational speed on appropriate applications. While legacy applications ported to a cloud environment are not designed to make use of this technology, we expect new commercial applications to be developed on a hosted SaaS model that will make it sing.
Watching Paint Dry
We believe that the scale, scope and architectural advantages of the leading cloud data center operators are so profound that private solutions will eventually be entirely uncompetitive on both cost and performance. That said, there are considerable impediments to a speedy transition. First, it is expensive to move: at the very least, programs and data have to be carefully ported to the new environment, and users trained on any changes in the way they are used. For many applications, it would be better or even necessary to scrap the legacy code and start fresh, an even more daunting proposition. IT managers will need to be very confident of the benefits before taking on these costs. Moreover, the public cloud raises suspicions about security and reliability, with stories of hacks and outages posing considerable obstacles to rapid adoption. Commercial cloud operators are investing to address these concerns, which we do not see as intrinsic disadvantages of cloud hosted services.
While cloud operators work to mitigate transition costs, demonstrate more robust security and reliability, and nurture purpose-built hosted applications that make best use of the strengths of the distributed cloud data center architecture, enterprises are slowly dipping their toes in the water. Periodic surveys of IT managers and CIOs have shown the transition to the cloud moving steadily toward the top of the priority list. Software development is one of the most active areas of enterprise engagement in the cloud, taking advantage of the scalability and easy provisioning of cloud services. Broadly accessed and non-transactional applications like CRM are also early targets for the cloud, with SaaS leader Salesforce.com a pioneer (Exhibit 14).
Exh 14: Total Software Revenue Forecast for Cloud Delivery Within Enterprise Application Software Markets, 2007-2015
We also believe “big data” applications, like analyzing web logs, pattern searching, or document sorting and indexing, will be easy candidates for transition to the cloud. It is important to note that these are amongst the fastest growing and most resource intensive applications in the enterprise market, and that moving them to the cloud will remove the need for many organizations to continue investing in internal data center capacity. With this, we expect demand for traditional IT to wane soon, as cloud data centers and their commoditized approach to data center hardware and software become the primary industry growth driver (Exhibit 15).
Exh 15: Reasons for Transitioning from On-Premises to the Cloud, 2011
Winners and Losers
The advantages of the major cloud hosts (Google, Amazon, Microsoft and Facebook) are substantial, given the high upfront costs of building data centers and the years of experience and lead time it takes to optimize a large data center operation. It’s difficult to imagine new small hosts emerging, let alone competing with Google. Aside from Google’s head start and lead in data center architecture, its mega data centers are substantial investments, estimated to have cost over $600M apiece. Given the fast pace of technological change and the continued investment required, $600M is a big price tag, particularly when you need to buy more than one. Smaller data center operators like Rackspace and Savvis may be attractive acquisitions for the many traditional IT names looking to make a play in the business. Perhaps some of the biggest winners will be IT consultants that help companies saddled with legacy code migrate to new cloud based platforms and applications. Accenture and IBM are well positioned in this regard, and perhaps Rackspace. Companies like Infosys and Wipro may also benefit, as these migrations will demand outside expertise (Exhibit 16).
Other winners include cloud applications and Software as a Service (SaaS) companies like Salesforce.com and NetSuite. New purpose-built cloud application vendors are also likely to emerge and benefit from the paradigm shift to the cloud. Finally, we see companies playing in cheap component hardware as well positioned, as big data center companies turn to contract manufacturers to build customized hardware at scale. This would include companies that sell commoditized semis, storage and networking equipment (e.g. Intel, AMD, Seagate and Western Digital) as well as commodity distributors (e.g. Arrow, Avnet, Tech Data and Ingram Micro).
The losers in the shift to the cloud are traditional IT players and premium hardware makers. Old line software companies like Oracle and CA will struggle to transition to the cloud. Traditional hardware names like Dell, HP and EMC are notable losers, as each is overly exposed to the old paradigm and their efforts to play the cloud have come late.
Exh 16: Winners and Losers