The MIT Sloan Management Review recently asked Does Data Have a Shelf Life? According to the article:
Creating insights from data is an important, and costly, issue for many companies. They spend time and effort collecting data, cleaning it, and using resources to find something meaningful from it. But what about after the insights have been generated? Do insights have a shelf life? If so, when should knowledge gleaned from old data be refreshed with new data?
[Researchers] suggest that for real-world Knowledge Discovery in Databases (KDD) — applications like customer purchase patterns or public health surveillance — new data is imperative. “It could bring in new knowledge and invalidate part or even all of earlier discovered knowledge. As a result, knowledge discovered using KDD becomes obsolete over time. To support effective decision making, knowledge discovered using KDD needs to be kept current with its dynamic data source.”
What do the researchers suggest companies do? “Model an optimal knowledge refresh policy.”
The author of the article wisely asked the researcher to explain what this is, in lay terms, and was told:
“The model itself aims at deciding when to run KDD to refresh knowledge such that the combined cost of knowledge loss and running KDD is minimized,” wrote Fang in an email. He explained that knowledge loss refers to the phenomenon that knowledge discovered by a previous run of KDD becomes obsolete gradually, as new data are continuously added after the KDD run. Knowledge loss has impacts on several levels: if KDD is run too infrequently, for instance, customers may not respond to promotions that are based on obsolete customer purchase patterns; yet there is a personal cost of managing the KDD process, and there are computation costs of running KDD, regardless of whether it’s run in-house or in the cloud, so running it frequently isn’t the answer.
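For the code-minded, here is one way to read that description. To be clear, this is my own toy sketch and not the researchers' model: I'm assuming that the daily cost of knowledge loss grows linearly with the number of days since the last KDD run, and that each run costs a fixed amount. The function names and the parameters `run_cost` and `loss_growth` are made up for illustration.

```python
def average_daily_cost(interval_days, run_cost, loss_growth):
    """Average cost per day if KDD is rerun every `interval_days` days.

    Assumption (mine, not the article's): on day t after a run, stale
    knowledge costs loss_growth * t, so the loss over one full cycle is
    loss_growth * interval_days**2 / 2.
    """
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    loss_per_cycle = loss_growth * interval_days ** 2 / 2
    return (run_cost + loss_per_cycle) / interval_days


def best_refresh_interval(run_cost, loss_growth, max_days=365):
    """Try every whole-day interval up to max_days and keep the cheapest."""
    return min(
        range(1, max_days + 1),
        key=lambda days: average_daily_cost(days, run_cost, loss_growth),
    )


if __name__ == "__main__":
    # Hypothetical numbers: each run costs 5000, and the daily loss
    # grows by 100 for every day the knowledge goes unrefreshed.
    print(best_refresh_interval(run_cost=5000, loss_growth=100))  # 10
```

Under these assumptions, refreshing too often wastes money on runs and refreshing too rarely lets the loss pile up, which is the trade-off Fang describes. Whether anyone can actually put a number on `loss_growth` is another matter.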
My take: I love it when the geniuses at MIT create stuff too complicated for any Harvard MBA to understand.
I can’t speak to data or “knowledge” regarding public health surveillance. I can tell you, though, that the use of the word “surveillance” by the researchers was not the smartest choice of words right now.
I can speak, however, to using data about customer purchase patterns to generate “knowledge.”
Why is “knowledge” in quotes? Because I have no idea what the researchers are talking about when they use the word. Ask marketers what their current “knowledge” regarding customer purchase patterns is, and 999 out of 1000 will say “Huh?” (The other one will cite his firm’s Net Promoter Score.)
Using data in marketing doesn’t go through some neat and orderly process (e.g., Data -> Insights -> Knowledge) like some academics would like to think.
Roughly speaking, there are two paths data does go through: 1) Data -> Model -> Action, and 2) Data -> Human Intervention -> Decision.
The first path describes database marketing efforts, where data is input into (and used to develop) a marketing model, and after the model is run, action (contact/no contact) is taken (I could have called this third step “decision”, but it might be worth distinguishing an automated decision from a human decision).
This might sound like a straightforward process, but choosing the data elements that go into any model is messy: it involves testing, and it is subject to a cost/benefit analysis of acquiring the data.
The second path describes the other trillion ways in which marketing decisions get made. It’s messy. Lots of data, some elements more relevant (and/or timely) than others. But lots of human intervention. And lots of iterations.
But nowhere in these paths do marketing decision-makers stop and think about what “knowledge” they’ve gained.
In the context of the first path, the model could be thought of as “knowledge.” Since I don’t know of any marketer who would argue that the relevancy and accuracy of any marketing model is constant over an infinite period of time, you could say that that knowledge has a shelf-life.
Many marketers evaluate the effectiveness of their models at various stages in the life cycle of the model. A well-performing model isn’t likely to get messed with. As a result, a model to predict the shelf-life of the model isn’t something I see too many marketing departments adopting.
In the context of the second path, good luck identifying the “knowledge.”
Marketing practitioners just don’t think in terms of “knowledge.”
What drew my attention to the Sloan article was the title: Does Data Have a Shelf Life? The article, however, isn’t really about data; it’s about identifying the shelf-life of knowledge.
Too bad, because the shelf-life of data is the more interesting topic.
The question, as stated, however, is a no-brainer. Of course data has a shelf-life. The challenge isn’t figuring out whether or not data has a shelf-life; it’s figuring out what that shelf-life is. Reducing the problem to a formula or model just isn’t realistic. Why not? Because of religion and politics.
If you don’t think there’s religion in marketing, you’re a naive fool. There are countless marketers who believe something about marketing that can’t be empirically proved. And if you believe something on faith alone, that’s called religion.
As for the politics of data, assume for a moment that I have data that proves the marketing channel you manage produces superior results compared to other channels. Would you care if that data is three years old? You wouldn’t. But the managers heading up the other channels (looking to increase their budget) would care.
Bottom line: There’s no question that marketing data has a shelf-life. But determining that shelf-life is subjective, and I can’t imagine any marketing department relying on a model to figure it out.
As more data sources become available and are used by marketers — and the need to act on that data on a more real-time basis grows — figuring out the shelf-life of marketing data will become a bigger issue for marketers.
It may turn into an advantage for data providers, however. Those that can demonstrate the shelf-life of their data (as well as the shelf-life of competitive data sets) — and successfully defend the determination of that shelf-life — may gain competitive advantages.