“Garbage in, garbage out.” We’re all familiar with this phrase. When we analyze some of the rougher datasets, we often hear it from people who question the results. But does the old adage always ring true? Should all “bad” data be dismissed as “garbage” that is useless for analysis? And what exactly counts as “garbage” data?
The truth is that we live in an imperfect world, and the data around us reflects it. Getting a perfect dataset may be the goal of many people, but in reality it is close to impossible. Data can be refined again and again, but what about everything that gets thrown away? What experience is beginning to show us is that, if we use imperfect datasets correctly, so-called “garbage” may turn out to be a gift that has gone unnoticed.
If you’ve done a data collection job, you’ll often find many deficiencies, even when the data was collected directly in the field by experienced data collectors. Sometimes the data isn’t “neat” enough (not all in the same format, or not yet digitized); sometimes it isn’t complete in every attribute, or the sample just isn’t big enough. And when we’re lucky and the data looks good in every sense, you may still end up screaming, “Oh, no! It’s three years old already!” Ultimately we have to face the fact that no matter how good the data is, by the time we have big chunks of it on hand, it is usually already history or fast becoming so. The clock never stops ticking.
Good data is important; that’s a given. But we also need to learn how to leverage all the imperfect data that surrounds us, because it represents the vast majority of data and always will. The key is to focus on the problems we want to solve instead of chasing precise figures. For example, if we can still pull a big trend or a key conclusion out of a dataset, then we should probably accept the result and make decisions based on it. The other trick is to look for possible distinguishing characteristics in the data instead of trying to identify every factor with rigorous statistical proof. Many new ideas being put forth these days are leading to interesting applications of data that was once considered fairly useless. Once we find such unique characteristics, we should be able to test them. After all, in today’s fast-changing world, timing usually matters more than precision in business decisions.
Sometimes, when no direct data is available, we can use surrogate data. Surrogate data is simply indirect data that can at least give us some clues and point us in the right direction for answering a question. For example, here in Asia, direct income data is not very prevalent or granular in most countries, so we usually use real estate property data or even electricity consumption figures to indicate which areas are wealthier. Similarly, we use the density of office buildings, financial companies, and/or bank branches to identify key commercial areas and clusters of dense working population. This kind of surrogate data is easier to get and maintain; however, it takes a bit more effort to make it useful for decision-making.
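To make the surrogate-data idea a little more concrete, here is a minimal Python sketch of one possible way to combine two proxies (average property price and household electricity consumption) into a single relative “wealth proxy” score per area. This is not getchee’s method; the `Area` type, the `wealth_proxy_ranking` function, the equal weighting, the min-max normalization, and the sample figures are all hypothetical assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A hypothetical area described only by surrogate (indirect) indicators."""
    name: str
    avg_property_price: float  # hypothetical units, e.g. price per square metre
    electricity_kwh: float     # hypothetical units, e.g. avg monthly household kWh


def min_max(values):
    """Rescale values to the 0..1 range so different proxies are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All areas look identical on this proxy, so it carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def wealth_proxy_ranking(areas, weight_property=0.5):
    """Rank areas by a weighted blend of normalized proxies (assumed weighting)."""
    prices = min_max([a.avg_property_price for a in areas])
    power = min_max([a.electricity_kwh for a in areas])
    scores = [
        (area.name, weight_property * p + (1 - weight_property) * e)
        for area, p, e in zip(areas, prices, power)
    ]
    # Highest score first: a relative indication of wealth, not a measured income.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical sample data, not real market figures.
areas = [
    Area("Area A", avg_property_price=12000, electricity_kwh=300),
    Area("Area B", avg_property_price=8000, electricity_kwh=450),
    Area("Area C", avg_property_price=5000, electricity_kwh=200),
]

print([name for name, _ in wealth_proxy_ranking(areas)])
# → ['Area B', 'Area A', 'Area C']
```

Normalizing each proxy before blending keeps one indicator with large raw numbers (property prices) from drowning out another (kWh). The output is only a relative ordering; in practice the weights would need to be checked against whatever ground truth is available.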
Good data takes time and effort to accumulate. The first steps are rarely easy, because their benefits are hard to see at first. You may need to invest heavily in hardware, software, and human resources, and even consider adjusting your working habits and processes. At getchee, we often encourage our clients to start collecting data where their consumers live and spend most of their time. We believe it’s an investment that slowly matures over a few years, becoming valuable both in itself and for the experience gained along the way.
Imperfect, or “garbage,” data forces us to think harder and become more skilled at making intelligent predictions. Reliable, or “good,” data will then lead us to better decisions and results. There will always be data headaches to deal with in emerging markets, but the truth is that without any data at all, you’re left making blind guesses. So before you consider throwing out that “garbage,” think again. There might be more gold in it than you think. To learn more about data, check out our article “5 Easy Ways To Learn More About Big Data.”