Earlier today I came across an article intriguingly titled "Big Data is Just a Scam." Big data is my business, so I read it and actually found it to be quite funny and well written. However, as it is Captain Dash's job to preserve, protect, and defend data, I feel it is my duty to respond. Article: http://www.pcmag.com/article2/0,2817,2455435,00.asp
First of all, I would like to reiterate that the article was both intelligently written and effective in addressing several key points. Namely: 1) big data is not well defined, 2) there's public confusion about whether "big data value" lies in the collection and analysis or in the data itself, 3) that confusion leads to a lot of hype, and 4) big data can be blind.
The author uses an example that I've heard several times before in big-data-bashing dissertations: the ever-present Amazon frustration. If you research a product on Amazon on Monday, you will continue to be bombarded by ads for that product for the rest of the week. "Big data" doesn't know if you looked up that product, read a ton of horrible reviews about it, and then decided that you definitely don't want it. "Big data" also doesn't know if, upon researching the product, you decided to buy it, and so you'll have ads pushed at you for a purchase you already made.
Let's rewind a little bit. The word value means "the importance, worth, or usefulness of something." You can't derive value from something you don't use. If I have a hairbrush and I don't use it, then I'm getting no value from it. In that sense, if I have a storage system for my "big data" but I do nothing with it, then yeah, it's worth nothing. If I have a broken hairbrush and I DO use it, then I'm probably still not getting value from it, in the same way that if I have poor-quality data, no analytics tool in the world will be able to fix it. However, it's equally possible to have a perfect hairbrush and not use it properly. In Amazon's case, the data is pure: simply records of your behavior on their website. However, they've received criticism because (1) they don't figure out whether you've already purchased a product so that they can stop sending you ads for it and (2) they can't determine your personal reactions to what you see on their website. They don't know your motivations, and they see no difference between you looking at a product page and loving the product and you looking at a product page and deciding you never want to see that product again. Which leads us to my next point:
Big data can't know how you're feeling. Data collection and analysis can be skewed by the human element at any stage. The path to compromised data can begin when the collector chooses to collect survey data, which often has limited accuracy because people tend to lie on surveys and respond subjectively. That path continues if the analyst draws far-fetched conclusions from inconclusive data (e.g., the fact that 40% of people searched Google for "flu symptoms" doesn't mean that those people have the flu).
Big data can't filter truth through a psychological lens (yet), and it can't be perfect. Your data is only as accurate as your collection methods.
I believe that the author is really trying to say that a lot of the hype surrounding big data is unwarranted. However, calling it a "scam" and blaming companies' collection and analytics mistakes on the concept of big data just isn't logical. A stupid company blindly trusts its numbers, but an even stupider company ignores the opportunity to gain insight from the untapped value that its numbers hold. Obviously, outliers and extraordinary circumstances exist that can skew data in a certain direction and compromise its quality and integrity. Really, "big data" is just the fact that companies have so much internal and external data waiting for analytics tools to tap into it. Once the proper analytics tools are applied, you get a bird's-eye view of your company. I don't know about you, but I'd rather be looking down from above than be lost on the ground.