Why are some people uncomfortable with cloud computing? What are the limitations and is there a way forward?



The recent sudden change in Twitter's terms of service for developers has unleashed a debate about the current model for web development. Despite attempts to backtrack, the consensus is that the new terms are aimed against third-party clients. In this model, big server farms hold the data and do the number crunching, while users access services through thin clients, mostly browsers. At the same time, each web service is run by a single company in a monolithic fashion: Google and Microsoft for search, Facebook and LinkedIn for social networking, Twitter for microblogging.
But it doesn't have to be that way. We all use email, and there is ample choice of email providers; you can even run your own server. On top of that, all email providers participate in the same network: you can email anyone, whatever their email address is. This is possible because all email providers use protocols like POP, IMAP and SMTP, which their inventors, notably the late Jon Postel, donated to the commons. The internal workings of the servers can be completely different, as long as they use SMTP to talk to each other and POP or IMAP to talk to clients. It didn't have to go this way. At one point there were several competing, incompatible email services: AOL, CompuServe and so on. We could all be using AOL email addresses today if AOL had won, but an open standard won instead.
Email is not the only example. The DNS, the system that translates domain names into numeric addresses, is the product of an international collaborative effort. It is distributed in both architecture and ownership and is based on open, free protocols. Skype, while proprietary, has a decentralized architecture: central servers provide directory and search services, but the heavy lifting of voice and video communication is done directly by the software running on users' machines. This works well for Skype, which retains total control of the network while keeping its server and network costs in check.
Then there is the extreme example, the BitTorrent network. There, almost every feature works in a completely decentralized fashion, with the partial exception of search: there need be no central servers, and the clients talk to each other for all their needs. The protocol is open, and several companies offer BitTorrent-related software and services. Even so, none of them owns or controls a significant portion of the network. Among the advantages of decentralized solutions:
  • Competition: different providers can compete on price and service (I see ad-supported models as a degradation of service).
  • No lock-in: users can switch to alternative providers of equivalent services.
  • Privacy: with appropriate design choices, data can be kept close to its owner and protected with cryptography when en route elsewhere.
  • No need for giant server farms.
  • More use of local resources, and therefore less sensitivity to network latency.
  • Disconnected operation.
  • Reliability: it is harder, though not impossible, for an individual, an organization or an accident to take down a distributed service.
  • Can't go out of business: if there is no central owner, there is nobody to file for bankruptcy. Even distributed services can succumb to spam or obsolescence, though; think of Usenet or Gopher.
Disadvantages include three things. First, there are limits on which features can be implemented; distributed web search, for instance, is still an open problem, despite claims to the contrary. Second, the system is difficult to police, as the spam problem in email and the malware problem on BitTorrent show. Third, there are none of the economic incentives that come from controlling a centralized network.
In particular, people have been discussing the role and future of open source software in the age of centralized web services. The promise of freedom rings hollow for users if the free software runs on a central server and the only code they run themselves is a browser or some other thin client.
Several projects have focused on alternatives for specific services, like Diaspora for Facebook and identi.ca or rstatus for Twitter. These are positive developments, and I hope they get the traction they deserve (I can be reached as piccolbo@identi.ca). But, at least as far as I understand them, they fall short in two respects. The first is that they are still bound to a multi-server model. If you are an open source developer, you still have to run a server to make your software available. If your software takes off, you need to find money to host the system, possibly a lot of money. As a user, your data is still in somebody else's hands: the operator could sell it, destroy it or worse. The second problem is that each of these projects solves one specific problem, but none provides a programming environment that would let open source go distributed.
The Wave protocol, now open source, is still based on a multi-server approach, but it is a step in the right direction. It is a distributed framework on which a variety of networked applications can be built. It doesn't help, though, that Google shut down Wave, the application built on it. Another general answer is proposed by unhosted. Their system is still multi-server, but the servers provide only passive, encrypted storage with discovery and authentication services on top, and all processing happens in the client. It's not clear what limitations follow from having passive, non-programmable servers. Consider, for instance, how a user would receive a message when no server can run code on their behalf. Still, advantages like increased privacy and reduced lock-in are clear. Another interesting solution is to build CouchApps. These are JavaScript applications backed by CouchDB, a document store with replication and revision tracking with conflict detection, which can be extended with server-side JavaScript. Again, it is not clear to me how general CouchApps are as a basis for web development, but the approach is interesting.
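To make the unhosted division of labor concrete, here is a minimal Python sketch. It shows a passive server that only stores opaque blobs, with all the logic in the client. The class and function names are hypothetical and are not unhosted's actual API. Addressing blobs by their SHA-256 hash is my own choice for illustration. A real client would also encrypt each blob before uploading it. That step is left out because the Python standard library has no vetted cipher, and writing one by hand would be a bad idea.

```python
import hashlib


class PassiveStore:
    """Stand-in for an unhosted-style server: it stores and returns opaque
    blobs under a key and never interprets them."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._blobs[key] = blob

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def client_save(store: PassiveStore, data: bytes) -> str:
    """Client side: derive the key from the content itself (SHA-256),
    so the server needs no logic to name or index the data."""
    # A real client would encrypt `data` here before uploading it.
    key = hashlib.sha256(data).hexdigest()
    store.put(key, data)
    return key


def client_load(store: PassiveStore, key: str) -> bytes:
    """Client side: fetch a blob and verify it against its key."""
    data = store.get(key)
    # A server that altered the blob is caught because the hash no longer matches.
    if hashlib.sha256(data).hexdigest() != key:
        raise ValueError("blob does not match its content hash")
    return data


if __name__ == "__main__":
    store = PassiveStore()
    key = client_save(store, b"my private note")
    print(client_load(store, key))  # b'my private note'
```

Because the key is derived from the content, a tampered blob is caught on load. This is the kind of check a client can do without trusting the server.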
These are all very encouraging developments. Still, it seems to me that no single solution, or small set of solutions, has yet attracted the open source community in significant numbers, let alone gained a user base. Let me list some elements that such a solution should include.
  1. A directory service. A user should be addressable by an email-style address, user@domain, for a variety of services besides email, without having to run a server at that domain. For instance, my address is antonio@piccolboni.info. You couldn't tell from this address that I use Gmail, and I could switch to another provider without you having to know. This little magic is performed by the DNS (through MX records), so the DNS seems the first candidate to look into.
  2. Authentication. It should be easy to communicate with a party while being confident that they are who they say they are. On the other hand, it should also be possible to interact anonymously when both parties agree to it.
  3. Communication. Applications must be able to send messages to each other. Candidates include XMPP and its extensions, such as Wave, or RSS and PubSubHubbub.
  4. Storage. Data should be kept near its users, for performance and ownership, but also copied elsewhere for reliability and availability. For instance, in a distributed social network each of your friends could hold a copy of your data. That way, with high probability, at least one copy is available at any time.
  5. Encryption. Since your data is going to be all over the place, to keep it secure we need encryption and a system for key distribution.
  6. Availability. In the most fully distributed scenario, nodes can be laptops that go on and off all the time. So availability must be achieved not only by replication, but also by delegation. That is, when your laptop is off, one of your friends' machines would take over and, for instance, send you reminders. This means that the discovery service needs to find an alternate server when the main one is off. It also means that communication must either fail (if synchronous) or be delayed (if asynchronous). Another option is to use smartphones or wireless routers as supplementary servers, since they are always on. They might not be able to store or serve a lot of data, but they could queue messages or send alerts when the main server is offline.
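The indirection in item 1 above can be modeled in a few lines of Python. This sketch is hypothetical. A plain dictionary stands in for DNS, and the provider hostnames are placeholders. A real implementation would query actual DNS records instead, for example MX records looked up with a resolver library such as dnspython.

```python
# Hypothetical directory: a dict standing in for DNS, mapping each domain
# to whichever provider currently serves it (the role MX records play for mail).
DIRECTORY: dict[str, str] = {
    "piccolboni.info": "provider-a.example",
}


def resolve(address: str, directory: dict[str, str]) -> str:
    """Map a stable user@domain address to the host currently serving it."""
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a user@domain address: {address!r}")
    return directory[domain]


if __name__ == "__main__":
    print(resolve("antonio@piccolboni.info", DIRECTORY))  # provider-a.example
    # Switching providers only touches the directory entry, not the address.
    DIRECTORY["piccolboni.info"] = "provider-b.example"
    print(resolve("antonio@piccolboni.info", DIRECTORY))  # provider-b.example
```

Switching providers means updating one directory entry, while correspondents keep using the same address.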
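The "high probability" in item 4 above can be quantified under a simplifying assumption: each friend holding a copy is online independently with probability p. Then all k copies are unreachable with probability (1 - p)^k, so at least one is reachable with probability 1 - (1 - p)^k. The sketch below computes that value and the smallest k that reaches a target. The function names are mine. Independence is an optimistic assumption, since friends in the same time zone tend to switch off their laptops at the same hours.

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable,
    ASSUMING each holder is online independently with probability p_online."""
    if not 0.0 <= p_online <= 1.0:
        raise ValueError("p_online must be between 0 and 1")
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    # All copies are unreachable with probability (1 - p)^k.
    return 1.0 - (1.0 - p_online) ** replicas


def replicas_needed(p_online: float, target: float) -> int:
    """Smallest number of replicas (at least one) whose availability
    reaches `target`, under the same independence assumption."""
    if not 0.0 < p_online <= 1.0:
        raise ValueError("p_online must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    k = 1
    while availability(p_online, k) < target:
        k += 1
    return k


if __name__ == "__main__":
    print(availability(0.5, 10))  # 0.9990234375
    print(replicas_needed(0.5, 0.999))  # 10
```

With p = 0.5, for example, ten replicas give 1 - 2^-10, or about 99.9% availability. Nine replicas fall just short of that target.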
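Delegation, as described in item 6 above, can be sketched as asynchronous delivery through a store-and-forward helper. Everything here is a hypothetical illustration. The `Node` and `Delegate` classes stand for a laptop and an always-on machine such as a friend's computer, a router or a phone. A real system would also need discovery, authentication and persistent queues.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A personal server, such as a laptop, that is often offline."""
    name: str
    online: bool = True
    inbox: list[str] = field(default_factory=list)


@dataclass
class Delegate:
    """An always-on helper (a friend's machine, a router or a phone) that
    only queues messages while the primary node is offline."""
    pending: deque[str] = field(default_factory=deque)

    def flush_to(self, node: Node) -> int:
        """Forward queued messages, oldest first, once the node is back online."""
        delivered = 0
        while node.online and self.pending:
            node.inbox.append(self.pending.popleft())
            delivered += 1
        return delivered


def send(message: str, primary: Node, delegate: Delegate) -> str:
    """Asynchronous delivery: delay instead of failing when the primary is off."""
    if primary.online:
        # Drain anything queued earlier first, so messages stay in order.
        delegate.flush_to(primary)
        primary.inbox.append(message)
        return "delivered"
    delegate.pending.append(message)
    return "queued"


if __name__ == "__main__":
    laptop, helper = Node("laptop", online=False), Delegate()
    print(send("reminder: call Bob", laptop, helper))  # queued
    laptop.online = True
    print(helper.flush_to(laptop), laptop.inbox)  # 1 ['reminder: call Bob']
```

The delegate only queues and never processes the messages. That is why a small always-on device like a router is a plausible candidate for the job.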
It seems to me that most or all of the pieces of the puzzle exist. We have Tahoe-LAFS for distributed storage, fully P2P networks such as BitTorrent, XMPP or RSS for messaging, and public-key cryptography. Key distribution is still a problem, and relying on unreliable personal servers is also a challenge. Hopefully somebody will package all these things into an open source P2P platform. Then people who have product ideas, but cannot solve all these problems on their own, can start building and unleash another wave of innovation and freedom.