Mo Data, Mo Problems

If you’ve spent any time in high tech over the last few years, you’ve probably heard the term “big data” more than you care to recall.  It’s become a constant refrain, and the subject of plenty of breathless cheerleading, much like “the cloud”, “social media”, and countless other trends that preceded it.  This is not to say that big data is unimportant, but context and meaning are essential.  Big data has many roles to play, but it’s not an end in itself, as Shira Ovide explains so concisely in her recent Wall Street Journal piece.

“Data for data’s sake” is the first major weakness of the big data obsession cited by Ovide, and it’s probably the most salient.  This is a classic case of valuing inputs over outputs – the idea that if we only collect enough data, good things will happen.  This sort of magical thinking is somewhat reminiscent of past crazes for purely A.I./algorithmic approaches to data science, but at least in those cases there was some concept of outputs and programmatic attempts at sense-making.

Of course, big data also isn’t going anywhere, and many worthy analytical endeavors demand that we address it.  However, it is essential to distinguish between warehousing, searching and indexing, and actual analysis.  Focusing solely on storage and performance creates a sort of computational uncertainty principle, where the more we know, the less we understand.

As Ovide notes, there is also a critical gap in analytical talent, which big data has done more to expose than mitigate.  Computing power can go a long way towards making big data manageable and facilitating insight – if paired with a sufficient dose of human ingenuity.  Simply put, humans and computers need each other.  “Pattern recognition” is frequently cited as a benefit of a big data approach, but computers can’t learn to spot patterns they’ve never seen.  As a result, the value of the analyst in defining the correct patterns and heuristics becomes all the more important.

Appropriately enough, the most valuable and elusive elements lurking within big datasets are often human: fast-moving targets such as terrorists, cyber criminals, rogue traders, and disease carriers who tend to slip through the cracks when algorithms are deployed as-is and left unattended.  The old playground retort that it “takes one to know one” actually applies quite well to these types of situations.

Human capital is a key part of the equation, but it’s not enough to acquire the right talent – you need to address the inevitable organizational challenges that come with retooling for a big data future.  Ovide notes that many companies are installing “Chief Analytics Officers”, and while I want to reserve judgment, the cynic in me suspects this reflects the bias of large organizations to centralize power and create new titles as a first line of defense against unfamiliar problems.  A chief analytics officer could be the catalyst to instill readiness and analytical rigor throughout the organization, but whether this reinforces or dilutes the perception that big data is everyone’s concern is a fair question.

More than anything else, I would analogize the challenges of big data to the differences between conventional warfare and counter-insurgency.  In conventional warfare, the targets are distinct and obvious.  In counter-insurgency, the enemy is hiding among the population.  Much as you can occupy an entire country without knowing what’s really going on outside the wire, you can warehouse and perhaps even index massive data stores without producing actionable insights.  Effective big data approaches, like effective counter-insurgency, require the right balance of resources, sheer power, and ingenuity, along with a strong and constant focus on outcomes.  In the long run, the willingness to pursue a population-centric strategy may well prove to be the difference.