In Neal Stephenson’s “Snow Crash,” Hiro Protagonist, the aptly named protagonist of the book, side hustles as a “Stringer for CIC.” He is, essentially, a gig economy worker collecting digital intelligence and posting it to a massive data marketplace. Users of the data marketplace can then search this library for any information they want.
This concept of a community “dataverse” where people or corporations freely share data is almost universal in sci-fi but is obviously missing in the real world. While the early internet gave us Wikipedia, it clearly falls far short of a structured, real-time, global database of collective intelligence. All of which raises the question: if this idea is ubiquitous in sci-fi culture, why isn’t there a billion-dollar company built around this exact paradigm today?
The cynical answer is that this is not how multi-billion dollar companies have become multi-billion dollar companies to date. On the contrary, fortunes have been made by building the richest gated internal dataverses possible. A slightly less cynical hypothesis is that we have lacked the economic incentive to fund it, the organizational structures to manage it, and the technology to build it.
Self-Sovereign Data: The Next Killer Application
A dataverse providing unbounded access to information for all will only emerge from a mature data economy where data can be valued, purchased, and utilized seamlessly. Unfortunately, until recently the data economy has looked more like a barter system where each transaction is opaque and slow, and rather than being distributed, most data-related value is concentrated in a few monopolies. But the rapid adoption of blockchain technologies has made it clear that this is about to change.
Web3, the decentralized internet built on the back of blockchain technology, shifts ownership and power back to users and away from monolithic platforms (e.g., Facebook, Google, Amazon). It is a world where people own the platforms they use, earn money fairly for content they generate, and are sovereign over their data rather than held hostage by it. Users who are sovereign over their data can then choose to freely trade or monetize it, giving rise to a rich data economy.
This evolution begins with users claiming more rights over their data from large platforms, but also extends to the ways organizations will monetize and build around newly available data. Data will emerge as the third major category of assets next to physical and financial assets. It has the potential to bring the next wave of capital and users over to Web3 and represents the next killer application of blockchain much in the way that decentralized finance was the first.
How Web3 Unlocks the Data Economy
There are three ways Web3 could transform how we build a vibrant data economy:
Data, like all digital assets, is incredibly hard to value given that it can be replicated at almost zero cost. Yet fundamental to any economy is a widely accepted mechanism for valuing an asset and tracking its ownership. Fortunately, digital scarcity is a core value proposition of blockchains. By tokenizing access to data, we can track ownership and lineage and create open markets to determine fair value.
One of the most striking achievements of Web3 has been the massive shift in how communities self-organize. Building on patterns pioneered in open source software, Web3 organizations are able to rapidly enlist communities to contribute in small increments. This is driven by tokens which serve as payment and shares of ownership that directly align incentives without the need for formal employment or contracts. These organizations could be immensely valuable for creating shared data protocols that incentivize the curators, maintainers, and contributors necessary for success, without creating the threat of lock-in.
- Shared State:
One of the largest technical issues with data is that it is siloed. Siloed data tends to diverge in structure, quality, and standards. The internet is the ultimate example: a networked array of data silos that leaves information trapped within each application and company. A better model would be a shared database, with tight paradigms for access control, that we could all draw from and contribute to. This would require broad buy-in and incentives (see the first two points above) and a massively distributed database. As Balaji Srinivasan’s essay “Yes, You May Need a Blockchain” explains, blockchains can behave as exactly that. While there are practical problems around scaling, it is a pattern that could change data engineering dramatically.
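To make the tokenization idea above more concrete, here is a minimal Python sketch of access tokens recorded in a shared ledger. It is an illustration under stated assumptions, not how any particular protocol works: `AccessToken`, `DataLedger`, and all of their methods are hypothetical names, and the "ledger" is an in-memory dictionary rather than a real blockchain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessToken:
    """Hypothetical token granting access to one dataset."""
    token_id: int
    dataset_id: str
    owner: str
    parent_id: Optional[int] = None  # token this dataset was derived from, if any
    history: List[str] = field(default_factory=list)  # every owner so far, in order


class DataLedger:
    """In-memory stand-in for a shared ledger (assumption: not a real chain)."""

    def __init__(self) -> None:
        self._tokens: Dict[int, AccessToken] = {}
        self._next_id = 1

    def mint(self, dataset_id: str, owner: str,
             parent_id: Optional[int] = None) -> AccessToken:
        # A derived dataset must point at a token that already exists.
        if parent_id is not None and parent_id not in self._tokens:
            raise KeyError(f"unknown parent token {parent_id}")
        token = AccessToken(self._next_id, dataset_id, owner, parent_id, [owner])
        self._tokens[token.token_id] = token
        self._next_id += 1
        return token

    def transfer(self, token_id: int, sender: str, recipient: str) -> None:
        token = self._tokens[token_id]
        if token.owner != sender:
            raise PermissionError("only the current owner can transfer a token")
        token.owner = recipient
        token.history.append(recipient)

    def can_access(self, token_id: int, user: str) -> bool:
        token = self._tokens.get(token_id)
        return token is not None and token.owner == user

    def lineage(self, token_id: int) -> List[str]:
        """Dataset ids from this token back to its original source."""
        chain: List[str] = []
        current = self._tokens.get(token_id)
        while current is not None:
            chain.append(current.dataset_id)
            if current.parent_id is None:
                break
            current = self._tokens.get(current.parent_id)
        return chain


ledger = DataLedger()
raw = ledger.mint("weather-raw", owner="alice")
forecast = ledger.mint("weather-forecast", owner="bob", parent_id=raw.token_id)
ledger.transfer(raw.token_id, sender="alice", recipient="carol")
print(ledger.can_access(raw.token_id, "carol"))
print(ledger.lineage(forecast.token_id))
```

Because every mint and transfer is recorded in one shared place, ownership and lineage can be read back by anyone, which is the property an open market would need in order to price access. A real system would also need payments, finer-grained access rights, and consensus, all of which this sketch leaves out.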
An Internet-Native Data Layer
Until Bitcoin, payments and cash were not native to the internet. Instead, companies like Stripe built infrastructure to bridge the gap between banks and the internet. Financial applications were still beholden to the underlying infrastructure, which was slow, costly, and wrapped in a complex layer of regulation and bureaucracy. Ethereum introduced programmable smart contracts with its own token (internet money), which created a modern, composable financial infrastructure. This unlocked an explosion of DeFi applications that offered much more attractive financial options and drove massive global user adoption.
While an internet-native financial economy is a prerequisite for a new internet-native data economy, it is not a complete solution in and of itself.
So what then are the building blocks of an internet-native data layer?
In his vision for Ocean Tokens, “From Money Legos to Data Legos,” Trent McConaghy, co-founder of Ocean Protocol, provides an overview of how data can build on top of a structure similar to DeFi. Ocean and other early leaders in the space are beginning to converge around a number of key elements that compose this data layer:
- Self-Sovereign IDs: A shared universal system for the identification of people, organizations and devices that is independently owned by its constituents and not a third party.
- Data Wallets: Interfaces for the secure management of personal data assets.
- Protocols for Tokenization & Data Exchanges: Agreed upon ways of allowing configurable access to data through tokens and a listing marketplace for those tokens.
- Secure Data Enclaves: Neutral compute areas that allow a party to send an algorithm to be trained or run on a particular set of data and get results without ever disclosing the underlying data.
- Data Oracles: The equivalent of data APIs for developers to access data on the blockchain from external sources.
- Data Unions (DAOs): Decentralized autonomous organizations governing a contributory data network.
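Of the building blocks above, the secure data enclave is perhaps the least intuitive, so here is a small Python sketch of the calling pattern it implies: the algorithm travels to the data, and only aggregate results come back. This is a toy under loud assumptions. `DataEnclave`, `run`, `min_rows`, and `average_heart_rate` are hypothetical names, and a plain Python object provides none of the isolation a real enclave would. The output check is also only a rough guard, since a determined algorithm could still encode raw values as numbers.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
Aggregate = Dict[str, float]


class DataEnclave:
    """Toy stand-in for a secure enclave: callers never receive the rows."""

    def __init__(self, records: Sequence[Record], min_rows: int = 5) -> None:
        self._records = [dict(r) for r in records]
        self._min_rows = min_rows  # refuse to answer over very small datasets

    def run(self, algorithm: Callable[[List[Record]], Aggregate]) -> Aggregate:
        if len(self._records) < self._min_rows:
            raise ValueError("too few rows to release an aggregate result")
        # Hand the algorithm copies so it cannot mutate the stored data.
        result = algorithm([dict(r) for r in self._records])
        # Release only a flat mapping of names to numbers, never row-level data.
        is_flat_numeric = isinstance(result, dict) and all(
            isinstance(key, str)
            and isinstance(value, (int, float))
            and not isinstance(value, bool)
            for key, value in result.items()
        )
        if not is_flat_numeric:
            raise TypeError("algorithm must return a dict of named numeric aggregates")
        return dict(result)


def average_heart_rate(rows: List[Record]) -> Aggregate:
    """Example algorithm a data buyer might submit."""
    return {"mean_heart_rate": mean(row["heart_rate"] for row in rows)}


enclave = DataEnclave(
    [{"heart_rate": 60.0}, {"heart_rate": 70.0}, {"heart_rate": 80.0}],
    min_rows=3,
)
print(enclave.run(average_heart_rate))
```

The design choice worth noticing is the direction of travel: the data owner keeps custody and decides what kinds of results may leave, while the buyer only ever sees the output of the computation.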
Data Economy Landscape
Inspired by Matt Turck’s data landscapes, I’ve compiled a landscape of the organizations, technologies and products that represent the Data Economy as it stands today. Inclusion is not based on the use of data alone, nor does data need to be the only product or value proposition. Instead, blockchain-based Data Economy organizations fall into one or more of the following groups:
- Decentralized infrastructure essential to the use and collection of data in Web3 applications.
- Protocols for facilitating the portability of user data between applications.
- Blockchains for whom data privacy is a primary differentiator.
- Applications for utilizing data beyond blockchain metrics (e.g., weather, medical records, wearable IoT, social media).
- Marketplaces for the monetization of data enabled by blockchain tokenization.
- Some organizations could be placed in multiple locations based on their products and features; they are placed according to their primary focus.
- Identity could be a wide landscape in itself; the projects included are a selection that have shown promise or a particular focus on data.
Today the data economy has its roots in DeFi, and therefore the largest players (e.g., Chainlink) concentrate on tooling and infrastructure for it. But we are already beginning to see a shift. Insurance applications that were originally targeted at fellow DeFi applications have grown to include more traditional lines like weather (Arbol) or travel (Koala). With the launch of projects like Sign in with Ethereum from Spruce, users will begin to keep their data in wallets rather than storing it in applications. This will allow users to monetize their data through new organizations called data unions, which allow groups of consumers to pool their data and share in its monetization. Similarly, while most of the data being put on oracle networks today is DeFi-oriented, demand for traditional third-party data companies to provide more “real world” data (weather, traffic, commerce, movement) will grow and open new revenue streams for these companies. Infrastructure like Helium will also allow low-cost IoT devices to help collect data and transact with each other directly on chain.
When Will The Dataverse Surpass Web2 Monopolies?
Network effects are difficult to break and getting users to adopt Web3-based social media and e-commerce platforms won't be easy. Yet, I believe that progress will compound rapidly once consumers begin to have real ownership in the networks they use. There will be no shortage of entrepreneurs and builders seeking to capture some of the more than $4 trillion in equity market capitalization tied up in companies built upon captive data.
The “dataverse”, or the Data Economy, is a big, audacious goal, and our current patterns of thinking lead us to believe that big problems must be solved by big companies. Even Stephenson conceived of it as being controlled by a centralized entity: the CIC was the CIA merged with the Library of Congress. However, much in the way Web3 has grown, the dataverse will not be the product of a centralized roadmap; instead, it will emerge from an ecosystem of complementary projects.