December 20, 2010
Ahead in the Cloud
Before the word “cloud” had escaped the lips of those information technologists who would create it, the rainmakers at UCSB’s Department of Computer Science were helping to shape it.
Whether migrating services to distant computer clusters, spinning out scholars to found startups like RightScale or Eucalyptus Systems (or bolstering nascent heavyweights like Google), working on grid servers, helping design the Alexandria Digital Library or theorizing about the next generation, UC Santa Barbara computer scientists have had an impact on the cloud.
Now that the cloud metaphor has established itself in the popular mind—even as its exact definition bedevils the IT crowd and its promise bewitches investors—the university’s impact is more recognized and celebrated.
In the field itself, UCSB computer scientists have a strong tradition in both distributed systems—parceling out resources among disparate machines connected in a network—and database management.
“Not many places have these two disciplines so close,” says Amr El Abbadi, chair of the Department of Computer Science. “We actually have the same faculty working together on distributed systems and on databases, and together this is the magic that makes the cloud exciting—and makes it succeed.”
“We think of the cloud as the next level of distributed systems,” says Professor Chandra Krintz, whose own research makes programming on the cloud easier.
“From a research perspective,” reflects Professor Divyakant Agrawal, “if you think about people working in the area of networking, people like us who have been working on databases and distributed systems, and such, this was the goal.”
At its most basic, the cloud means moving computer hardware and software somewhere else, and being able to easily manage them remotely. Anyone with a Hotmail or Gmail e-mail account—which means logging into a server somewhere else to operate the account—has tasted the cloud. “Although the term cloud may be new,” observes El Abbadi, “we have been living in this world for a while.”
A second inroad came through software as a service, programs reached and run over the Web, say a Google application that replaces the latest iteration of a Microsoft product that would be downloaded onto the user’s own computer.
Pointing to local software-as-a-service companies like AppFolio and Citrix Online, El Abbadi explains, “You go to them to provide a service. I don’t have it here.”
So-called SaaS has been followed by “platform as a service” (exemplified by Krintz’s Research on Adaptive Compilation Environments (RACE) Laboratory’s work on AppScale, which creates Google applications in the cloud) and “infrastructure as a service” (as seen with Eucalyptus, which allows easy connection to the public cloud).
System administrators worldwide—from small but data-intensive businesses to the behemoths of social networking—salivate over the cost savings and scalability of the cloud, hoping to move an entire nest of IT resources into a data center “somewhere else,” accessing it on demand via the Internet or a local area network. Although it’s easiest to think just of machines and data storage offsite, the “cloud” of resources—which is where the name hails from—can include a network, operating systems, application programs and even a place, like AppScale, to develop programs.
Where is that “somewhere else”? Anywhere you can drop a bunch of servers, as long as there’s sufficient bandwidth and connectivity. As RightScale CEO Michael Crandell told business news network CNBC, “We’re a cloud computing company, so we actually manage compute and storage resources that can be anywhere in the world, and it really doesn’t matter where we’re located.”
Amazon, Google and Microsoft, among others, rent out space in what’s known as the public cloud; with no on-campus data center for a private cloud at present, UCSB’s computer scientists usually frolic at Amazon. In the public cloud, one enterprise’s data and code aren’t alone; in one of the miracles of the cloud-seeding breakthrough known as virtualization, they jostle alongside data and services for lots of other enterprises residing in those same third-party data centers.
Private clouds, meanwhile, are common where security, better performance or sheer bulk make it sensible for a business or agency to run its own discrete cluster of machines. In between there are lots of hybrid models and tangoing between public and private clouds.
Such outsourcing of the heavy computing recalls the old mainframe model, where a (then) supercomputer labored away in an air-conditioned room somewhere in the bowels of the building as users at “dumb” terminals in the same building fed it requests for processing.
“We are evolving to where this ‘mainframe’ now is a data center, with huge numbers of machines, racks full of machines,” says El Abbadi. The terminals are now desktops, laptops, maybe even mobile devices, “and whenever they need something, they go and do it ‘over there.’ ”
When shoveling data to the cloud, it’s not going to an even more spectacular next-generation mainframe, but to a bunch of smaller, albeit quite powerful, servers that may not even be in the same physical data center. The data is distributed over this cluster in a way that dynamically maximizes the efficiency of the cloud, creating “virtual computers” handling client requests. This virtualization of units of computing—remember the “distributed systems” that UCSB has excelled at—cleared the air for the cloud.
Virtualization and access to broadband, Agrawal says, were the critical technological advances that allowed the cloud to form in the last few years. “What has happened with virtualization,” he explains, “is the concern about matching software to the underlying hardware—the type of machine you are running, the type of disk you have—those concerns have become a nonissue.”
And while he sees those advances as necessary, they aren’t sufficient for a paradigm change like the cloud, he believes, noting the hopes attached to grid computing a decade ago and the marketplace’s yawn. “In my mind, whenever there is a technology transformation that has happened, it doesn’t happen just because the technology is feasible. It also has to happen from a business model,” and the business model here is to remove “the really non-essential parts of running a business” to the cloud professionals.
“The convergence of these two aspects makes the cloud feasible, and I think at this point it will gain more and more momentum,” Agrawal says.
Giving Them The Business
Such is cloud computing’s potential that Merrill Lynch famously predicted the cloud computing marketplace will exceed $150 billion next year. Forrester Research’s Rich Fichera, in a more recent—and measured—prediction for SearchCloudComputing.com, said he expects a majority of enterprises to use a private cloud in the next five years, while a “substantial minority” will be using the public cloud for serious business.
A survey by RightScale finds the biggest drivers are scalability and cost: rapidly expanding (or declining) enterprises can find all the computing resources they need as they need them, with someone else worrying about having enough total capacity or up-to-date infrastructure. No more buying much more than you need for a worst-case scenario, or not buying for a worst-case scenario and then facing one. Data centers, by the way, usually putter along using less than half their own prodigious capacity, with the excess set aside for bursts of activity or failures of individual machines.
What’s holding IT managers back? Security, for one. Professor Ben Y. Zhao recalls a survey last year in which four out of five executives considering the cloud—but remaining aloof—cited security concerns.
Zhao and Professor Christopher Kruegel are collaborating on a way to allow users to move their non-cloud applications to the cloud while protecting the privacy of their data through encryption.
“Our tool,” Zhao explains, “will tell you which pieces of logic cannot be encrypted and that you must take care to actually protect. Everything that’s left can be fully encrypted and moved up. …Even if someone was to compromise the cloud— which is not difficult to believe at all—your data would still be safe at night.”
Their research finds that 80 to 90 percent of the data can be fully encrypted, and much of what can’t—say, directions to your terminal’s display about what color to make the screen—doesn’t present a security concern.
“We have a very strong security group, and it is a very practical security group and that’s one of the things that make this department kind of special,” says El Abbadi, citing scholars such as Zhao, Kruegel, Giovanni Vigna and Richard A. Kemmerer.
Because the department has always had, as El Abbadi says, “a practical bent,” its connection with Main Street has always been exceptionally strong. “We can be traditional, we can be entrepreneurial.”
He cites such local lights as RightScale (which offers management of applications in the cloud), AppFolio (Web-based property management service) and Eucalyptus (cloud infrastructure) as cloud-specific success stories born at UCSB.
“UCSB has a great history of building great startups, especially in the cloud space,” says Zhao; private industry has shown it agrees.
Recently CNBC spotlighted the College of Engineering—and the Department of Computer Science in particular—in a segment dubbed “Welcome to Techtopia.” Guests Kevin O’Connor, founder of Internet advertising pioneer DoubleClick, and cloud CEOs Michael Crandell of RightScale and Brian Donahoo of AppFolio stressed the joys of running young tech companies in the shadow of UCSB.
That strong connection to private industry can be a garden of delight for scholars. “One of the challenges of the space,” says Zhao, “is that for academics like us it’s hard to find the interesting problems by ourselves because we don’t have the large-scale data and the large-scale machines that companies do.
“And the problems are interesting when things are very large. …Industry has the first exposure to some of the interesting problems. We’re relying on them to tell us what the problems are.”
“We’ve really got a good group of people here to solve these really complex questions in distributed systems,” notes Krintz. “We’re on the cutting edge of what industry wants to take forward. I think it’s exciting for students as well, because we’re having practical impact immediately, important impact—these are hard problems to solve if we’re really going to get to the next level of technology.”
She and Zhao epitomize the joy-of-the-scientific-hunt mentality at UCSB.
“I want to solve really hard computer science problems,” says Krintz. “I wake up every morning looking forward to that. And if I have to be driven by the bottom line, then I necessarily have to solve problems in a particular way. Staying away from that gives me the freedom to find the right solution, and then someone else can commercialize it.”
While it currently has no specific Cloud 101 course to join the Cloud Expos and Cloud Journals that have blown in over the decade or so since “cloud” joined the IT glossary, huge swaths of UCSB’s computer science research sustain, improve, expand or leverage the cloud.
One exemplar of serving the cloud without focusing on it is the story of startup Eucalyptus Systems.
In a purely academic project sponsored by the National Science Foundation, a team led by Professor Rich Wolski studied how NSF’s supercomputers could be combined with Amazon’s market-leading public cloud to perform complex computations in weather forecasting. Insights from that project, and the open-source software artifacts it generated, paved the way for the commercial enterprise that is Eucalyptus—an open-source software platform that enables companies, universities and other enterprises to turn their computing resources into their own Amazon-compatible clouds.
Wolski has not been lost to UCSB; he’s slated to teach again this coming year.
Krintz’s RACE Lab, through the support of Google, IBM and the NSF, has also been working on making life easier for cloud dwellers—in this case, application developers. Her lab’s AppScale project, also open-source, gives them “a giant virtual machine” in the cloud where they can write, deploy and debug a program against a set of interfaces and have it work anywhere the platform exists, “from a laptop to the Google cloud.” While tempting to private industry, AppScale has not been commercialized.
Although it was designed specifically for Google’s popular public cloud, it also works with other clouds and interfaces so a wider range of computation problems can be solved.
“Scientists,” Krintz says, “should not be bogged down in how to program their algorithm; they should be able to express it in a very high level language and we should be able to make it very efficient and scale automatically.”
Nor should they worry about the hardiness of their data or fret about retrieving it. In the first-generation cloud, Agrawal says, “maintaining the consistency and the reliability and keeping data meaningful was all thrown in the hands of the person who is writing the application at a high level. You can think that as a user, you are responsible if things go wrong. And I think people are now recognizing that as a significant problem.”
Noting that “technology over the last 20 years has worked very hard making these things very safe, very stable and very reliable,” he and El Abbadi are working on keeping that so. “When it’s your own data, things are very simple. But when it’s shared data, being accessed by multiple people at the same time, then you have to start worrying about making sure you are getting data that has been modified correctly.”
Meanwhile, Krintz sees two big challenges to the cloud going forward: her forte, programming for the cloud, “and how do you do it in a way that’s energy efficient?”
While the cloud generally reduces electricity consumed by the end user, data centers are voracious energy users. They are most energy efficient at minimum or maximum use, and both states are the exception, not the rule. At UCSB, the Institute for Energy Efficiency and its Greenscale Center for Energy-Efficient Computing are addressing energy consumption and computer cooling head on.
Greenscale has paired computer scientists like Fred Chong and Timothy Sherwood with computer engineers to essentially redesign the building blocks of the entire chain so that energy use in the cloud is as scalable as its data-handling.
Given the amount of energy consumed by data centers—by some estimates they use $30 billion of power a year—companies like Google are sponsoring research to cut the bill without harming the centers’ speed or elasticity. Chong, for instance, is leading an effort to develop an experimental miniature data center where researchers can conduct “radical experiments” impossible at a working center.
Those energy hogs meanwhile produce inordinate amounts of heat, which hurts performance and degrades electronics. Greenscale is tackling the issue on two tracks, aiming to reduce the heat generated in the first place and also to find innovative methods of cooling through things like better heat sinks and improved air flow.
Some of the biggest challenges, from the hardware point of view, involve network bandwidth—the flow of data—which, as Agrawal notes, underlies the cloud’s raison d’être. Within the College of Engineering, experts in computer networking and photonics are beavering away at those fundamental issues of computer engineering, focusing on more, faster and cheaper.
On the software end, Professor Kevin Almeroth and Professor Elizabeth Belding, through their Networking and Multimedia Systems Lab, are examining other networking challenges, including those posed by mobile devices.
And as the virtual computers may span multiple sites, so too will the client resources they handle. That’s no problem if the data is pretty much independent of other data, notes Zhao. But when the data is “highly connected,” breaking it apart, reconstituting it and interpreting it on the fly gets commensurately hairier. And as luck would have it, two common data sets loaded with interdependencies are graphs and social networks like the uber-popular Facebook.
“There are some solutions out there, but none of them are that good,” says Zhao, who is working on a better one.
“That’s why our department has risen over the last 10 years, because we focus on hard problems but they also have practical significance,” Krintz says with conviction. “So they’re useful in the near term, but we are still taking risks and figuring out how to evolve to the next generation of hardware and software.
“Santa Barbara is unique: there is no other institution in the United States, in the world, that is working on cloud software at this level,” Krintz says. “It’s all the pieces of the cloud—databases, the theoretical foundations, the scheduling and the infrastructure and the platforms, the programming languages and the security. In a positive way, it’s been the perfect storm.”