Decentralizing the Web… Again

…cloud computing represents centralization of information and computing resources, which can be easily controlled by corporations and governments. [Jaeger, et al. Link]

In the wake of Prismgate or the Snowden Affair or whatever we’re going to call this kerfuffle, I’ve been struck by how the current centralized nature of the World Wide Web has facilitated the surveillance. While the Web’s technical architecture is distributed—no single server is essential for the continued functioning of the overall system—in practice the economic realities of web-scale computing have encouraged a centralization of user data in a relatively small number of providers. These are the Googles and Facebooks of the world. These kingpins of the Internet also happen, by and large, to be American corporations. What a windfall this provided the NSA!

This intense concentration of personal information is simply too valuable—for companies, governments, and individuals alike. It’s being abused, and will continue to be abused as long as it exists. But the Web and, more generally, the Internet are all about distributed systems. World Wide Web. Internetwork. It’s about lots of little nodes connected by the network. Would it be possible to reclaim the distributed heritage of the Web?

Companies like Google actually serve your web requests from huge datacenters that run distributed computation internally. What if that computation were moved from its central location out to the nodes of the wider network? There are at least two obstacles to this happening: the first is technical, the second is economic.

Technical Requirements

How can you run a world-class web application like those provided by Google, with no central servers? Many others have thought about this and worked toward a solution. Here’s the sort of system I would like to see:

  • Globally Distributed. That’s the point—no single node contains all or even a substantial minority of the data. Nor does any single nation.
  • Redundant. Because data is backed up redundantly, the loss of individual nodes is extremely unlikely to lead to data loss.
  • General. It can run an email app, a social networking app, a web search app, a calendar app, and so on.
  • Private. Users decide what data to share with whom and under what circumstances.
  • Anonymous. Participation on an anonymous basis is possible.
  • Secure. Replicas of data are encrypted so the compromise of a distant node does not reveal personal information to those not authorized to view it.
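
To make the "Redundant" requirement a little more concrete, here is a rough back-of-the-envelope sketch. It assumes each node fails independently with the same probability, which is a simplification (real failures are often correlated), and the function names are hypothetical, chosen only for this illustration.

```python
def loss_probability(node_failure_prob: float, replicas: int) -> float:
    """Chance that every replica is lost, ASSUMING independent node failures."""
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Data is lost only if all replicas fail, so the probabilities multiply.
    return node_failure_prob ** replicas


def replicas_needed(node_failure_prob: float, target_loss_prob: float) -> int:
    """Smallest replica count whose loss probability is at or below the target."""
    if not 0.0 <= node_failure_prob < 1.0:
        raise ValueError("node_failure_prob must be in [0, 1)")
    if not 0.0 < target_loss_prob <= 1.0:
        raise ValueError("target_loss_prob must be in (0, 1]")
    replicas = 1
    while loss_probability(node_failure_prob, replicas) > target_loss_prob:
        replicas += 1
    return replicas
```

Under these assumptions, even unreliable nodes add up quickly: if each node has a 50% chance of disappearing, five replicas bring the chance of losing everything down to about 3%.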

Many of these conditions are already met in cloud computing environments, but in controlled, centralized conditions. We should move distributed computing technologies out of the datacenter and onto the broader Internet.

Economic Implications

Now, the economics.

The current centralized model is supported almost entirely through the advertising revenues of the central provider. You don’t pay for a Gmail account—at least, not with money. You pay by being subjected to advertising. And, if you respond to that advertising, you pay by buying things from advertisers. If you think about it, in this model, you aren’t even the customer—you are the product. Google sells access to you to advertisers. But all of this advertising revenue pays for the infrastructure so you don’t have to—the hardware, the manpower, the electricity, etc. This arrangement is easy for the average guy or gal, but has some definite downsides. The immortal words of Jeff Hammerbacher come to mind:

“The best minds of my generation are thinking about how to make people click ads. That sucks.” [link]

How could the average web user be induced to pay for their own server in a distributed web application? It should be noted that web users already pay for their web access: $50 or more per month to the ISP. What if that fee included a server that was their home base on the web? A cheap, fault-tolerant photo storage service? A highly secure social networking endpoint? A super-fast email app, without the creepy targeted ads? I admit it’s a tough sell. I don’t know the whole answer. If it requires more than minimal additional work by users, the prospect is doomed. But if it provides a better, easier, safer experience (the premium web experience), then perhaps people will pay a little more? Dalton Caldwell’s experiment is very relevant here.

But what if that’s the wrong question, and we should be asking, How could the average web user continue to receive free web applications without the support of advertising revenue? How could this possibly be done? By establishing a global-scale computation marketplace. So you buy a computer (tablet, phone, laptop, desktop, it doesn’t matter) and connect it to a distributed social network application. It contains your social network data and serves it to anyone requesting information about you (only giving out the information you want it to, of course). You want your data to be available while you’re offline, though, so you offer payment (via Bitcoin or something similar) to anyone who will host your data, up to a limit of 5 copies, with payment depending on the historical uptime of each node. But others on the network also want backups, and you take payments in exchange for hosting their data. Want to search the social network? Provide micropayments to nodes to induce them to participate; receive micropayments for helping other nodes make their own searches.
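
The backup-hosting part of this marketplace can be sketched in a few lines. This is only an illustration under stated assumptions: the `HostOffer` type, the function names, and the rule that payment is split in proportion to uptime are all made up here; the only details taken from the text are the limit of 5 copies and the idea that payment depends on each node's historical uptime.

```python
from dataclasses import dataclass

MAX_REPLICAS = 5  # "up to a limit of 5 copies"


@dataclass(frozen=True)
class HostOffer:
    """A node volunteering to host a copy of your data (hypothetical type)."""
    node_id: str
    uptime: float  # historical uptime as a fraction between 0.0 and 1.0


def choose_hosts(offers, max_replicas=MAX_REPLICAS):
    """Pick the most reliable nodes, up to the replica limit."""
    ranked = sorted(offers, key=lambda offer: offer.uptime, reverse=True)
    return ranked[:max_replicas]


def split_budget(hosts, budget):
    """Divide a hosting budget among hosts in proportion to uptime.

    Proportional payment is an ASSUMPTION; the text only says that
    payment depends on historical uptime.
    """
    total_uptime = sum(host.uptime for host in hosts)
    if total_uptime == 0:
        return {host.node_id: 0.0 for host in hosts}
    return {host.node_id: budget * host.uptime / total_uptime for host in hosts}
```

Given six offers, `choose_hosts` keeps the five with the best track record, and `split_budget` then pays the steadier nodes a larger share of whatever you are willing to spend.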

Those who require more resources will spend money to facilitate searches, backups, etc. Those who require fewer resources may earn money by renting out their mostly-idle server. Perhaps the average user, by renting their computer out to users of various distributed applications, earns as much as they spend. Thus the application is free and is funded not by advertisers but by power users, whose interests are more aligned with the interests of the general userbase.

Seduced by Big Data

What do you need if you want to be Big Brother? Big Data, of course!

Data is powerful, and “big data” is very powerful. I deal with it every day in my work as a research scientist at Adobe, where I write and utilize algorithms capable of processing petabytes of data. I’ve been actively recruited by the creepily named data mining company, Palantir (named for the all-seeing stones in Lord of the Rings). In grad school I learned about powerful statistical methods for discovering latent information hiding in plain sight in ordinary data, and I learned just how easy it is to infer entire social networks from pairwise relationships, like who you call on the phone or who you email.

The Bush and Obama administrations have been culling records of billions of phone calls, emails, web searches, and more every day for years, with shocking disregard for your and my right to privacy. (Your local senators and congressmen have almost universally gone along with this invasive practice.) Billions a day for years is most definitely big data, and just as definitely is the reason for the construction of NSA’s huge new data center in Utah, just across the freeway from my workplace.

Defenders of these surveillance programs say that all of this monitoring is okay because it’s only gathering metadata. It’s true that the actual content of your phone calls is not available without obtaining a more traditional sort of warrant. But the metadata being collected—phone numbers, IP addresses, which number called which number when—is extremely powerful. Phone numbers are very easily mapped to names and addresses. It would also be trivial to discover the social networks behind the phone calls. You and your friends and family would show up together on a “map” of connections, like the one I created of American senators in an earlier post. Please forgive a little reductio ad Hitlerum, but in the wrong hands, such a tool would have made Hitler’s “final solution” a simple matter of searching the computer for Jewish names and sending the Gestapo knocking. Those helping people escape would have been exposed by their connections to non-approved groups on the social graph—another easy search!
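
To show how little machinery the "map of connections" takes, here is a minimal sketch that groups people using nothing but who-called-whom pairs. The phone numbers are fictional and the function names are hypothetical; it uses plain connected components, the simplest possible grouping, not whatever methods an agency actually runs.

```python
from collections import defaultdict, deque


def build_graph(call_records):
    """Turn (caller, callee) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return dict(graph)


def connected_groups(graph):
    """Return the sets of numbers linked by any chain of calls."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search outward from an unvisited number.
        group = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            number = queue.popleft()
            group.add(number)
            for neighbor in graph[number]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Fictional metadata: no call content, just who called whom.
records = [
    ("555-0001", "555-0002"),
    ("555-0002", "555-0003"),
    ("555-0100", "555-0101"),
]
groups = connected_groups(build_graph(records))
```

With these three records, the first three numbers fall into one group and the last two into another, even though 555-0001 and 555-0003 never called each other directly.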

My point is that big data allows government to build tools of immense and invasive power, and that such power will prove too great a temptation for an ambitious politician to resist. And the more complete the government’s vision, the more full its grasp of every citizen’s life and relationships, the more cataclysmic the consequences should the government itself fall into unscrupulous hands.

But maybe that’s already happened.

“A Republic, If You Can Keep It”

NSA 9000

I’m really disturbed (surprised isn’t quite the right word) at what I’m learning about the government’s massive, untargeted surveillance of millions of American citizens over the last 7 years. I thought we had this debate during the Bush administration and all decided it was illegal and should stop. Apparently, folks at the NSA and in the Bush and Obama administrations had different ideas.

In case you aren’t aware, a secret court has created secret law supposedly authorizing the federal government to spy on you, your friends, and your family. That means every email you read or write, every search you run on Google, every call you make on Skype. And the bureaucracy asking the secret FISA court for approval to do this is so massive and so obscured by secrecy that there exists no single list of all of its activities. It’s called Big Brother, after all.

In my mind this is a clear violation of the Fourth Amendment:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

No, it doesn’t mention electronic communications (which had not been invented by 1789), but they are the modern analogue to “papers”. The wide sweep of surveillance as currently conducted seems to blatantly violate the requirement that no warrant be issued without a specific description of the people, places, and things involved. There is nothing specific about monitoring all phone calls.

The Constitution provides strong protections on privacy that are in this case being clearly disregarded in the name of national security. Combine it with the recent revelations of IRS targeting of conservative groups, and Justice Department intimidation of journalists, and a picture emerges of a gargantuan bureaucracy in which the systems, processes, and perceived mandates of government overwhelm by their very nature the interests of the individual.

Sign me up for the class-action lawsuit.