Small (and Big) Data

Every time a new hacking incident hits the news, the response follows a similar path: news reports about millions of customer data points stolen, quotes from security consultants, breathless reports from news anchors about how many millions of people’s records were taken, assurances from the hacked company that critical customer information was not taken, commitments to shore up the data breach, offers of credit and identity monitoring services, and promises to be better in the future.

And we go back to our lives and forget, until the next one, and the cycle repeats.

But what if the response was different?

What if we, as users, reached a breaking point and consciously decided to care about the security of our data?

* * *

Any company that stores lots of information about people is a target for hackers, and whatever is worth hacking will get hacked.

The incentives lie in the hands of the hackers. Fast-growing companies need to feed the growth engine, and under-investment in structure and security is almost inevitable. Significant events drive course corrections, at least until growth reasserts its primacy over security and structure, and the cycle repeats.

But the under-investment in security is natural for a fast-growing company. “Security” is a tough value proposition for companies, because it’s hard for prospective users and customers to understand, verify, and price appropriately. How much will we pay on the margin to insure against an improbable event, and how should we price a security premium? It’s something that people simply aren’t equipped to understand and value appropriately, and even with significant market education (“marketing”) by companies, it’s unknown whether it’s a value proposition that will shift consumer preferences. In some sense, as users, we don’t have the mental bandwidth required to evaluate the negative repercussions of every single data point we put into every service we use. And so we continue on.

But if the cost-benefit analysis changed, it could have a seismic effect, a step-change instead of a gradual shift. If a future data security breach finally pushed us to care about the security of our data, how would we react? It’s unlikely we’d believe companies that claim to be “more secure”; more likely IMO is that we would shift to using services that are more secure through data abstinence. If the service doesn’t hold our data, then there’s nothing to steal. ^[1]

* * *

At the beginning of this year I contributed to Canvas8’s 2015 outlook into technology, shopping, transportation, beauty, communication, and more (PDF download here):

Our Expert Outlook is going live today. It includes ideas from 33 Canvas8 Network experts on what to expect in 2015 http://t.co/4k2zvz9dAF
— Canvas8 (@Canvas8) January 21, 2015

In the Outlook I focused on passwords, wearables, and the potential of “small data”, on the reasoning that newer mobile technologies are opening up the potential to build smart services that don’t depend on big data. As I wrote earlier:

… perhaps the idea of storing less data will move towards broader uses. Earlier this year, I gave a talk about the Internet of Things at Startup Iceland, and in it, I posited the idea that as devices and sensors get smarter, they have the potential to reduce our reliance on the smart cloud and push more decisions and processing to the edge, away from centralized cloud services, and away from hackers. Most “smart things” today are dumb sensors connected to the clever cloud, meaning that the devices push data to cloud services, which then draw inferences and matches and push decisions back down to the sensors. The future of smart things could be smarter devices that are able to use data locally without pushing and storing it in the cloud. If we build more powerful devices at the edge, we won’t need to depend on the cloud to the same degree. And if less data is shared and stored in the cloud, there is less data to be hacked.
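
To make the contrast concrete, here is a minimal sketch of the "decide at the edge" pattern. The thermostat scenario, the function name `decide_on_device`, and the threshold are all invented for illustration; nothing here comes from the talk or from a real device API. The point is only that the raw readings stay on the device and the cloud receives a single decision.

```python
from statistics import mean


def decide_on_device(readings, threshold=25.0):
    """Hypothetical edge logic: return a decision, never the raw readings."""
    # The inference happens locally, on the device itself.
    return "cool" if mean(readings) > threshold else "idle"


# Raw sensor data lives only on the device (illustrative values).
raw_temperatures = [24.1, 26.3, 27.0, 25.8]

decision = decide_on_device(raw_temperatures)

# The only thing a cloud service would ever see or store.
cloud_payload = {"decision": decision}
print(cloud_payload)
```

A breach of the cloud service in this sketch would expose a list of decisions, not the underlying sensor history.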

* * *

What I didn’t mention in the Outlook, however, was artificial intelligence. Like many investors and entrepreneurs, I’ve recently started to pay much more attention to artificial intelligence and machine learning as the number of pitches and applications of the technology has mushroomed in the last couple of months. While we’ve seen hundreds of millions of dollars invested into big data technologies, the value of big data has always come from the insights derived from data that lead to actions that make our lives better, not from the data itself. While many of the first insights have been fairly simple, we’re starting to see deeper insights and offerings from companies applying artificial intelligence, deep learning, and other technologies to the datasets that big data technologies provide. We’re at the cusp of many more practical applications of artificial intelligence, and I believe that we’re going to see more AI, not less.

With that in mind, is it preposterous to think that the big data trend could be replaced by small data?

The better question is where small data makes sense. For many applications and systems, an all-encompassing, permanent data store shouldn’t be necessary. One of the reasons I’m bullish on wearables and proximity-based technologies is that they could facilitate “small data” strategies: one doesn’t need the cloud to understand who a person is when devices can identify that locally, devices won’t need to store all observed intent data, and they won’t need big data if small data will do. Small data isn’t strictly about less data being held in the cloud about us; it’s about a more nuanced sense of what, when, where, and why the data we share with services will make our lives better.

And my bet right now is that AI will enhance that understanding; even as AI needs big data and its datasets to “learn”, it could also help educate people about how and when it’s valuable to share data, and when small data is enough.

The example I often use is tokenization, which replaces sensitive credit card data with “token” values that are useless for hackers to steal (more about that here). ↩︎
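
As a rough sketch of how tokenization works, assuming a hypothetical in-memory `TokenVault` class (real payment tokenization is run by a separate, hardened service, and the names and the test card number below are illustrative only): the merchant keeps a random token, and only the vault can map that token back to the card number.

```python
import secrets


class TokenVault:
    """Hypothetical vault: the only place real card numbers are held."""

    def __init__(self):
        self._cards_by_token = {}

    def tokenize(self, card_number: str) -> str:
        # The token is random, so it reveals nothing about the card.
        token = "tok_" + secrets.token_hex(16)
        self._cards_by_token[token] = card_number
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can turn a token back into a card number.
        return self._cards_by_token[token]


vault = TokenVault()
card_number = "4111111111111111"  # a commonly used test number, not a real card
token = vault.tokenize(card_number)

# What the merchant stores: no card number anywhere in the record.
merchant_record = {"customer": "alice", "card_token": token}
print(merchant_record)
```

If the merchant's database is stolen, the attacker gets tokens that are worthless without access to the vault.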