Tuesday, May 31, 2011

Data vs. information

Today's post from Dr. Groves, Director of the U.S. Census Bureau, encapsulates what this blog, data insights, is all about.

What’s the difference between “data” and “information?”

We’re entering a world where data will be the cheapest commodity around, simply because the society has created systems that automatically track transactions of all sorts. For example, internet search engines build data sets with every entry, Twitter generates tweet data continuously, traffic cameras digitally count cars, scanners record purchases, RFID’s signal the presence of packages and equipment, and internet sites capture and store mouse clicks. Collectively, the society is assembling data on massive amounts of its behaviors. Indeed, if you think of these processes as an ecosystem, it is self-measuring in increasingly broad scope. Indeed, we might label these data as “organic,” a now-natural feature of this ecosystem.

Information is produced from data by uses. Data streams have no meaning until they are used. The user finds meaning in data by bringing questions to the data and finding their answers in the data...

To read the entire post, see: http://blogs.census.gov/2010census/2011/05/designed-data-and-organic-data.html

In an era when there are more sources of data available than ever before, we analysts are challenged to use those data well and in innovative ways. In recent years I have also found that simply because lots of data exist, and the public knows that lots of data exist, analysts are expected to HAVE everything and to KNOW everything with an immediacy that is often impractical and sometimes impossible. In other words, we are expected to synthesize "information" on any given topic simply because data exist.

The challenge for analysts has gone from turning tailored research data into research findings to taking streams of sometimes incomplete, and clearly not "tailored," data and turning them into useful information. This change, in some ways, is like having an open fire hydrant and being asked to use the geyser to water an orchid. What you need is there, but certainly not in the form you need.

The ability to use data well requires both strong traditional analytical training and a clever and creative streak. If anything, careful analysis is more important than ever before, but that alone is no longer sufficient. To be able to capitalize on these new waves of data, analysts will need to develop an ability to synthesize statistics from multiple sources, and also to be critical of the data available. (What pieces are missing? How has the data source changed over time? Are the data representative of a whole population or a selected subset? How can, or should, or shouldn't, the data be extrapolated to other groups?)

These questions and others will certainly keep us busy for a long time.
