We are what we measure

How minor choices in data measurement and analysis can dramatically impact our identities, economies, and lives

7 min read · Sep 15, 2017

We’re in a golden age of “big data.” From sensors on our bodies to cookies on the web, data is so ubiquitous that it’s measured in “zettabytes” (a very, very large unit that they didn’t teach us back in the day).

Big data begs to be simplified, to have zettabytes of complexity transformed into cold, hard, and measured truths: One simple headline can explain the economic cost of a massive hurricane. An annual series of charts has become the holy word on “Internet Trends.” Fashion forecasts offer near certainty on trends, and thus on designer budgets.

But statistics are not just cold, hard, and measured truths. Behind every statistic is an opinion, and behind every “data-driven” decision are a whole lot of people who decide:

What to measure and how to collect the data
How to interpret, visualize, and present the results
Where to distribute the results and amplify the reach

…and how to finance the analysis, of course.

Research biases are well understood and documented in the scientific community. This is not an article about how to avoid or identify bias in research, or a referendum on the rise of data.

Instead, my goal is to remind us that data is defining the fabric of modern identity, from the mundane to the existential. It subtly shapes who we are, what we believe, and how we spend our time.

Here are a few examples, from 90s hip hop to unemployment.

1. What to measure — and the rise of 90s hip-hop

A top 10 list is intended to reflect how popular something has been in the past, like the most popular books from the last week or most popular movies of the last month. But popularity begets popularity, and being on a top list can also be a leading indicator of future success.

Take the Billboard Hot 100. Since 1958, the list has been the dominant ranking system for American music. In the early days, the songs that made the Hot 100 had strong sales in record stores (according to store owners) and a lot of plays on the radio (according to DJs). But record store owners and radio DJs are not impervious to bias. “Music labels nudged or outright bribed radio DJs to plug certain records [to Billboard],” according to Derek Thompson’s Hit Makers. Why? Music labels wanted songs to churn through the top lists, so they could keep making money on new music.

Thompson shows that by changing how data was collected for the Billboard Hot 100, popular music changed as well.

Specifically, in 1991, Billboard transitioned away from self-reported data to point-of-sale data from cash registers and to Nielsen’s radio airplay monitoring. After this change, popular music became far more lasting because there was no longer a bias toward song churn in how the data was collected. Thompson notes, “The ten songs that have spent the most time on the Hot 100 were all released after 1991.”

A more surprising change? Hip-hop began surging in the rankings.

On June 22, 1991, the week after Billboard updated its chart methodology, Niggaz4life by N.W.A. beat Out of Time by R.E.M., marking the first time a rap group had the most popular album in the country…In markets where popularity matters, information is marketing. When music listeners learned how popular hip-hop really was, it made hip-hop even more popular.

Again, one simple choice in what to measure and how to collect the data had a profound impact on American music taste.

2. How to interpret data — and the strength of the American workforce

We rely on monthly unemployment numbers to signal the health of the entire American economy. But there are actually multiple government-sanctioned measures of unemployment that are intended to jointly paint a clear picture of the American workforce. As Drew Desilver explained for the Pew Research Center this past March:

But the unemployment rate is just one indicator of how the U.S. economy is doing, and it’s not always the best one. Simply being out of work isn’t enough for a person to be counted as unemployed; he or she also has to be available to work and actively looking for work (or on temporary layoff)…There are, in fact, five other monthly measures of what the BLS calls “labor underutilization” besides the official unemployment rate, as well as scores of other measurements — labor force participation rates, employment-population ratios, average weekly wages, average hours worked and more. Knowing what those other data points are, where they come from and how they’re calculated is critical in understanding what they do — and don’t — tell us about the nation’s workers.

The additional data collected by BLS enables us to calculate not just the “unemployment rate” as we know it, but a total of “six different measures of labor underutilization, labeled U-1 through U-6, with broader or narrower parameters than the official unemployment rate.”
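To make the gap between these measures concrete, here is a minimal sketch comparing the official rate (U-3) with the broadest one (U-6), using the standard BLS definitions. The class name, field names, and all of the counts are hypothetical and chosen only for illustration; they are not real BLS figures.

```python
from dataclasses import dataclass


@dataclass
class LaborCounts:
    """Hypothetical headcounts (in thousands); not real BLS data."""
    employed: float             # includes involuntary part-timers
    unemployed: float           # jobless, available, and actively looking
    marginally_attached: float  # want work, looked recently, but not in the last 4 weeks
    part_time_economic: float   # part-time only because full-time work is unavailable


def u3(c: LaborCounts) -> float:
    """Official unemployment rate: unemployed as a percent of the labor force."""
    labor_force = c.employed + c.unemployed
    return 100 * c.unemployed / labor_force


def u6(c: LaborCounts) -> float:
    """Broadest measure: adds the marginally attached and involuntary part-timers."""
    labor_force = c.employed + c.unemployed
    underutilized = c.unemployed + c.marginally_attached + c.part_time_economic
    return 100 * underutilized / (labor_force + c.marginally_attached)


# Made-up numbers for illustration only.
sample = LaborCounts(employed=950, unemployed=50,
                     marginally_attached=20, part_time_economic=60)

print(f"U-3: {u3(sample):.1f}%")  # U-3: 5.0%
print(f"U-6: {u6(sample):.1f}%")  # U-6: 12.7%
```

Same survey, same people, and yet the headline number more than doubles depending on who gets counted.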

In this case, having an overly simplistic perspective on how to interpret, visualize, and present unemployment data might be obfuscating bigger trends, from the gig economy to whether “robots are taking our jobs.”

Having a clear picture of the US workforce is critical. One seemingly simple measure like the unemployment rate can influence life-altering policy decisions about minimum wage, welfare, and Medicare and Medicaid. And while the correlation between unemployment and election results is not always clear, there was a relationship between people’s perception of their future ability to work and Trump’s electoral success in certain counties. Perhaps part of the oversight in 2016 was overlooking the measures that mattered.

3. Amplifying results — and the glorification of startup failure

“The truth is, 9 out of 10 startups fail.” This stat is tossed around the startup world as an excuse, explanation, and exclamation. It helps serial entrepreneurs explain away past failures and successful entrepreneurs applaud their improbable success.

9 in 10 fail, according to Forbes, Fortune, Mashable, Entrepreneur and more.

But the truth is, 9 out of 10 startups don’t fail. According to a great column by Erin Griffith, this stat is far from fact or fate:

Cambridge Associates, a global investment firm based in Boston, tracked the performance of venture investments in 27,259 startups between 1990 and 2010. Its research reveals that the real percentage of venture-backed startups that fail — as defined by companies that provide a 1X return or less to investors — has not risen above 60% since 2001. Even amid the dotcom bust of 2000, the failure rate topped out at 79%.
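Notice that the failure rate here depends entirely on how “failure” is defined. As a rough sketch of that sensitivity, the snippet below computes a failure rate from a list of return multiples under different thresholds. The function name and the multiples are hypothetical, made up purely for illustration; they are not Cambridge Associates data.

```python
def failure_rate(return_multiples, threshold=1.0):
    """Share of startups returning `threshold`x or less to investors."""
    if not return_multiples:
        raise ValueError("need at least one startup")
    failed = sum(1 for multiple in return_multiples if multiple <= threshold)
    return failed / len(return_multiples)


# Made-up return multiples for ten imaginary startups.
hypothetical_multiples = [0.0, 0.0, 0.3, 0.8, 1.0, 1.5, 2.0, 3.0, 5.0, 12.0]

print(failure_rate(hypothetical_multiples, threshold=0.0))  # 0.2 (total loss only)
print(failure_rate(hypothetical_multiples, threshold=1.0))  # 0.5 (1X or less, as in the quote)
print(failure_rate(hypothetical_multiples, threshold=3.0))  # 0.8 (anything short of a big win)
```

The same ten companies “fail” 20%, 50%, or 80% of the time, depending on where the line is drawn.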

Where did the original stat on “9 in 10” come from? Not clear. Perhaps it was a rule of thumb devised by VCs, or a result of an analysis with limited data.

In this case, old or inaccurate measurement had amplified reach.

In perpetuating the 9 in 10 myth, not only has the startup community normalized failure, it has perhaps discouraged individuals with great ideas but lower risk tolerances from starting their own companies.

These are just a few examples that remind us that behind every statistic is an opinion, and behind every “data-driven” decision are a whole lot of people who decide:

What to measure and how to collect the data
How to interpret, visualize, and present the results
Where to distribute the results and amplify the reach

So now what?

I’m a big believer in the power of data to empower individuals, businesses, governments, scientists, and social movements. In fact, I work at Amino, a data-driven healthcare company whose entire mission is to use data for good by shedding light on the cost and quality of healthcare.

Because of that belief, and the scores of people working to delegitimize data as fake news or phony facts, it’s more important than ever that individuals and institutions are cognizant of the influence data has in our lives.

Perhaps it’s time for everyone to start thinking like data scientists and become more acutely aware of the biases that inform layers of data-driven decision making.

This knowledge can be used for good.

Earlier this year, Trump called for deep cuts in foreign aid. The Bill & Melinda Gates Foundation saw that media coverage of foreign aid often focused on impact investment failures, while the data shows a very different picture. “In the past three decades, the annual number of deaths of children under age 5 has dropped from about 11 million deaths to fewer than 6 million,” Gates shared with NPR this week. The Foundation used historical data from the past few decades to estimate “how many additional children will die in 2030 if the world scales back investments in global health.”

We are what we measure, and we can’t fix what we don’t see. Let’s be sure we’re looking at the right numbers moving forward — lives and livelihoods are depending on it.
