Big Data and Causation: The Curly Fries Case Study

Big data can be an extremely accurate predictive tool and often reveals patterns and outliers, but the relationships between big data and the information it attempts to represent are frequently convoluted and misleading. To delve through the data we must ask "why": figure out the reasons a particular big data finding exists and connect the numbers to their real-life representations. The fact that big data is often open to interpretation leaves room for human error, meaning that we must be especially aware and make a concerted effort to apply intelligence to the data in order to glean as much accurate value from it as possible.

An interesting case came to light in a recent TEDx MidAtlantic talk, when scientist Jennifer Golbeck revealed a phenomenon dubbed the "Curly Fries" Case Study. The concept is simple but quite bizarre. When scientists were looking into the possibility of selling people's personal social media activity data to future employers and marketers, they found an astronomically high percentage of "smart" people who had liked curly fries on Facebook. The question is why, and the thought processes used to answer that question reveal a common analytical error made when formulating explanations to support big data findings.

A mechanical, robotic approach to analysis would draw a straight line between the two points and conclude that smart people like curly fries. The hypotheses that could follow: If my child develops a penchant for curly fries at a young age, does this predict his future intelligence? If I feed my child curly fries, will he become smarter?

In reality, we have to take the big data finding at face value and not form irrational conclusions. The study has told us only that a lot of smart people liked curly fries on Facebook. So what intelligent and realistic explanations could support this finding? If you think about the mechanics of Facebook and the trickle-down effect through friend networks on social media sites, you'll likely realize that a person with a large friend network of equally smart people and a high influence factor "liked" curly fries, and the rest followed.
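The trickle-down explanation above can be sketched in code. The simulation below is purely hypothetical (the population size, the share of "smart" users, and the copy probability are all invented for illustration): one influential user likes curly fries, the like spreads through a friend network that is mostly made up of similarly smart people, and a striking correlation appears with no causal link at all.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# A hypothetical population: roughly 30% of users are "smart".
N = 1000
users = [{"smart": random.random() < 0.3, "likes_fries": False} for _ in range(N)]

# The influencer's friends are mostly smart too (homophily): 200 smart
# friends and 40 others, all made-up numbers.
influencer_friends = [u for u in users if u["smart"]][:200] + \
                     [u for u in users if not u["smart"]][:40]

# Half of the influencer's friends copy the "like".
for friend in influencer_friends:
    if random.random() < 0.5:
        friend["likes_fries"] = True

def like_rate(group):
    return sum(u["likes_fries"] for u in group) / len(group)

smart = [u for u in users if u["smart"]]
other = [u for u in users if not u["smart"]]
print(f"like rate among smart users: {like_rate(smart):.2%}")
print(f"like rate among other users: {like_rate(other):.2%}")
```

The "like rate" ends up far higher among the smart users, even though nothing in the simulation makes smart users prefer curly fries: the entire gap comes from who the influencer happened to be friends with.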

It’s important to distinguish between correlation and causation. It’s very easy to assume that an anomaly in a dataset indicates a causal relationship. In reality, smart people are not inherently more likely to like curly fries; the pattern can stem from a coincidence of social influence that is independent of any causal link in the data.

Lesson learned, and enjoy your Tuesday!

Data-Fully Yours,

Captain Dash


Big Data & Education Collide: Standardized Testing and Child Self Esteem

(Disclaimer…this is a rant) Big data in education…what a mighty subject to attempt to cover. There's no denying that big data is touching every industry, and of course education is a field teeming with profitable opportunities. The formation of the future generation is not a responsibility that any country takes lightly, and so companies (especially those in big data) are realizing the potential to improve and perfect the learning process through analytics and trend spotting in order to optimize students' education.

It would really take all day to weigh the merits and flaws of the collision between big data and education, which is why I will discuss one relevant issue in specifics: standardized testing.

In the United States, the entire national school system uses standardized testing to measure the performance of students individually and in comparison to one another, rank teacher performance, and rank schools (which then affects their funding). What not many people know is that standardized testing is also used on a trial/error experimental basis to measure the effectiveness of curriculums, teaching methods, and textbooks.

These usages may seem innocent at worst, yet pervasive issues still lurk behind big data application, particularly when it relates to a subject as provocative and controversial as children's education.

There are many issues with this that span from privacy to the unfairness of funding allocation to the fact that teachers and administrators are given financial incentives for having higher-scoring students. Yes, those are all massive problems but today I'm not going to talk about any of that. Rather, I'm going to address the elephant in the room which is that...

The testing is being used to prove something it doesn't measure. A standardized test may be administered at 10 different schools in 10 different cities. The school that performs the best is viewed as the "best" school with the most effective teachers and the best curriculum. It's assumed that the teachers and the lessons are better at this school.

Equally, 5 teachers could teach the same lesson. Whichever classroom scores the highest is thought to have the best teacher.

5 different lessons could be taught. The highest scoring classroom is thought to have the best lesson for learning the concept.

So what is the problem with all of these assumptions? The problem is that a standardized test is black and white. When it's used to measure varying performances and pick out best practices, it's assumed that all variables have been held constant and that only ONE variable is changed (ex: teachers thought to be best at one school because they have consistently higher test scores). In reality, there are many factors that influence a child's score on a test that span beyond the simple effectiveness of the teacher or lesson or textbook or how intelligent the child is.

There exists a very well-known theory called Maslow's hierarchy of needs. Basically, it says that all human needs can be visualized as a pyramid, where the most basic needs sit on the bottom rung, and only once those basic needs have been met can a person move up to the higher rungs.


This theory can also be roughly applied to children. If self-actualization (the top rung) is defined in terms of a child's fulfillment of their own potential and effective learning, then all of the needs on the lower rungs of the pyramid must be met in order for them to achieve their academic potential. There can be many reasons why a child, or an entire school's worth of children, is blocked at the lower rungs of the pyramid. They could live in a low-income or at-risk neighborhood, or the problem could lie with their parents or home life. Or the school environment itself could be harmful: there may be a problem with bullying (safety) or with students feeling that they don't belong. These factors aren't and can't be factored into the results of a standardized test…which makes a standardized test an inaccurate representation of what it's meant to measure.

I really believe that this hierarchy of needs theory plays a huge role in influencing how effectively children learn. Low self esteem in particular is a major issue that is actually propagated by standardized testing. Basically, a child's self esteem is influenced by how their parents see them, how their school sees them, and how they see themselves. If at a young age a child receives low test scores, they know that they're performing lower than "average" compared to their peers and their parents are aware of it as well. At first their parents will think that the child is having a "hard time" but if this issue persists, they will inevitably come to the realization that their child has academic difficulties indicative of lower intelligence or just a lack of "book smarts." The child will consider him/herself as less intelligent or worse in school, and this will affect their identity and self esteem and perpetuate the cycle of low performance.

Now, the problem with this is twofold. These test scores damage a child's self esteem because they and their parents are made aware of these test scores. Now, it's not realistic to think that there's a way to administer these tests yet keep the scores from the parents (the argument would be that they have "a right to know"), so how would the education system go about minimizing the impact that these scores have on the child's perception of him or herself? The answer to that would be that the school shouldn't put such a heavy weight on these test scores. Subjective measures of performance should be highlighted just as much as standardized, objective measures. These measures could be essays, creativity, the child's personal qualities (a good leader, etc). It would certainly be an idea worth considering to require (on a national level) that teachers pay attention to students' personal qualities and report those back to students and their parents during parent-teacher conferences. Schools with more funding (typically in high-income areas) usually do take this approach to student development, but it would be interesting if national policies reflected a shift in attempting to improve children's self esteem.

The OTHER problem that hinders children's self esteem is the fact that they really only have one job…to be a student. They aren't old enough to have a job or have discovered and nurtured natural talents yet. So if a student is not successful at the one thing they do all day (go to school) then their sense of identity and self esteem will be damaged.

Now, most schools require that students participate in athletics, but what often occurs is that children who are younger than the rest or less physically developed tend to be overshadowed by their older, stronger classmates in sports just as in the classroom. So what is an alternative way to help a student build a sense of identity independent of their "book smarts," one that can't be damaged by test scores? Creative pursuits, of course! Many schools that lack funding have cut creative and art programs in favor of allocating their funding toward more "practical" subjects that they feel will have a more direct effect on students' academic performance. However, what they don't realize is that a student's self esteem is essential to their ability to learn and the quality of their academic performance, and that sense of self worth can in turn be nurtured by creativity.

Earlier I also touched on the fact that schools (particularly in at-risk areas) may have a bullying problem or the students feel like they don't belong. On a national level, policies require that bullying be met with strict discipline and that resources like school counseling must be available to students who are dealing with social problems. Sounds like a good idea, but when you were 6-14 years old did you ever think to seek out help from a school counselor? Realistically, children aren't going to look for help from an adult if they're having difficulty…in fact, a student might not even realize that there's a problem in the first place. The solution to this is a focus on workshops, seminars, and possibly even required subjects that emphasize empathy and teach students the tools to set personal boundaries, deal with social issues, and strive for social harmony.

I apologize for the long rant, but I really feel like it's time to highlight the issue of big data not from a frantic parent's perspective ("My child is not a data point!") or the administration's perspective ("Big data helps leverage and standardize the national schooling system") or from the perspective of a big data education company looking to make a profit. Students are the purpose of education, so we must try to gain a deeper and more holistic understanding of how children learn and how their environment affects them in order to figure out exactly how much value certain big data initiatives are actually adding.


Signing off for now,

The Captain



Big Data Leveraging Across All Industries Levels the Playing Field

Today I saw an interesting headline informing the public that "leveraging big data is the new price of entry for the manufacturing industry." Indeed, in an evolving world where digital technology is touching virtually every industry, that statement holds true. Big data implementation used to be more of a leg-up that gave companies a competitive edge; now, on this leveled playing field, companies must leverage big data simply to keep pace with their competitors. Although the article in question specifies the manufacturing industry, is it possible that different industries must implement big data analytics differently to achieve the same value? How?

E-commerce- You don't have surveillance cameras to track your customers' journey through your store. You don't have the ability to see or hear your consumers travel through your physical space, making comments or stating complaints. You need some way to track their path online while connecting this to behaviors and reactions, like purchases, conversions, and bounces.

A stupid company doesn't continuously evolve, improve, and address mishaps. Data analytics is necessary for any company that wants to figure out what it's doing right, what it's doing wrong, and what it has potential to do differently. A webpage has a high bounce rate? Boom. Change it. Your conversion rates are low? Boom. Make your products look more attractive with better images or more specific content.
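The two diagnostics just mentioned are simple to compute once you track sessions. Here is a minimal sketch using a hypothetical session log (the field names and sample sessions are invented): bounce rate is the share of single-page sessions, and conversion rate is the share of sessions that end in a purchase.

```python
# Hypothetical session log: pages visited and whether a purchase happened.
sessions = [
    {"pages": ["/home"], "purchased": False},                       # bounce
    {"pages": ["/home", "/product/42"], "purchased": True},
    {"pages": ["/product/7"], "purchased": False},                  # bounce
    {"pages": ["/home", "/product/7", "/cart"], "purchased": True},
    {"pages": ["/home", "/about"], "purchased": False},
]

# A bounce is a session that viewed exactly one page.
bounce_rate = sum(len(s["pages"]) == 1 for s in sessions) / len(sessions)
# A conversion is a session that ended in a purchase.
conversion_rate = sum(s["purchased"] for s in sessions) / len(sessions)

print(f"bounce rate:     {bounce_rate:.0%}")      # 2 of 5 sessions -> 40%
print(f"conversion rate: {conversion_rate:.0%}")  # 2 of 5 sessions -> 40%
```

Real analytics suites compute these per page and per traffic source, but the underlying arithmetic is exactly this.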

Brick and mortar retail- These companies profit from the benefits of existing in the sensory physical world but are also burdened with several challenges that accompany a more physical experience. For one, they have a clear, eye-to-eye view in and around their store and a clear ability to connect physical actions and behaviors to purchases. However, they also have more moving parts at play that influence their success. They have to ensure that their location is appropriate, determine whether people in the surrounding area will want to buy their products (a.k.a. whether their target demographic is accessible), and manage their store layout, setup, physical branding, etc.

E-commerce sites exist in competition on more of an equal playing field in the sense that they all exist in cyberspace and are accessible to all. Their location is where they show up on Google search, and their store "interior" is the digital space that they can craft to reflect their branding. In fact, this actually heightens the level of competition because the benefits of physical location (like being a neighborhood store) will not influence buyers to purchase from them over others. Basically, a consumer's only deciding factors when choosing to purchase or pursue a relationship with an e-commerce retailer are what's on their site and their online reviews. Data analytics can help them gain a more focused sense of their digital surroundings - who is searching for them, what consumers say about them, and how people respond to and interact with their online presence.

What do these two have in common? 

1. Brick-and-mortar retailers increasingly need an online presence to survive- They need analytics to manage that online presence for the same reasons e-commerce sites do, in addition to forming a cohesive unit between it and their physical presence.

2. With e-commerce, the "company" may be online, but the product is still physical. E-commerce and brick-and-mortar retail alike have physical products and supply chains, physical people behind the computer screens, and need data to make sense of their physical operations. The physical world is a large jumbled space - relationships and correlations are often difficult, if not impossible, to identify and tease out in that world. We need data analytics to record information, lay the pieces out side by side, and see what we find. Valuable activities and indicators to watch include departmental spending, revenue per employee, trends in purchases, etc.

And manufacturing?

3. Manufacturing has no website to worry about or social media presence to manage, so manufacturers can use data analytics on a strictly internal basis. Manufacturing is a complicated process, involving many automated systems that all have unique relationships with each other, from each step of the supply chain to the production processes. Big data analytics can be used to connect operations by continent, country, and city to figure out lag times, isolate problems, and identify patterns, in addition to measuring productivity in terms of value added per employee or piece of equipment.
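As one concrete illustration of the internal analysis described above, here is a small sketch that computes average lag time per plant from a hypothetical event log (the plant names, timestamps, and field names are all invented): how long each site takes between receiving an order and starting production.

```python
from datetime import datetime
from statistics import mean

# Hypothetical internal event log: when each order was received and when
# production actually started, per plant.
events = [
    {"plant": "Lyon",  "received": "2014-03-01 08:00", "started": "2014-03-01 10:30"},
    {"plant": "Lyon",  "received": "2014-03-02 09:00", "started": "2014-03-02 10:00"},
    {"plant": "Osaka", "received": "2014-03-01 08:00", "started": "2014-03-01 14:00"},
    {"plant": "Osaka", "received": "2014-03-03 07:00", "started": "2014-03-03 11:00"},
]

def hours(ts_start, ts_end):
    """Elapsed hours between two timestamp strings."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(ts_end, fmt) - datetime.strptime(ts_start, fmt)
    return delta.total_seconds() / 3600

# Group lag times by plant, then average them.
lags = {}
for e in events:
    lags.setdefault(e["plant"], []).append(hours(e["received"], e["started"]))

for plant, values in sorted(lags.items()):
    print(f"{plant}: average lag {mean(values):.2f} h")
```

Once lags are grouped by site, a plant that is consistently slower stands out immediately, which is exactly the kind of problem isolation the paragraph above describes.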


These industries may be different and implement big data for varying purposes, but the message is still the same: big data deals with numbers, and every company across every industry needs to sit down and look at their numbers if they want a hope of increasing a special little number called revenue.


Big Data Makes Way for the Segment of One

As the famous Henry Ford quote goes, "I can make a car in any color you want as long as it's black." While we can all praise him for his confidence, the truth is that in his time he had every reason to be confident. He had complete control over his domain: he was the only one making cars, everybody wanted one, and to boot he had mass production on his side with his pioneering of the assembly line.

In Henry Ford's world it was all about making the product then watching the consumers jump on it. Soon afterward, the small and relatively young marketing world required personal selling, one-on-one on the salesroom floor. Then it became about automation, mass-advertising and pushing the product at the masses.

Now it's come full circle and we're back to stressing personal relationships with single customers. Enter the segment of one, a marketing concept that basically means that a company micro-segments into such fine, small groups that each one only has room for a single person.

So who is doing this? Of course, the major players like Amazon, Netflix, and Pandora target users on an individual basis. Amazon, for example, collects your search history and past purchase history to create a consumer profile that becomes more detailed the more you interact with its website. It then applies predictive algorithms to its vast stores of customer interaction data in order to accurately predict what you will want to see and buy, which it delivers to you through a plethora of channels: ads, recommended products, emails, etc.

Netflix follows a similar approach. They know your show and movie watching preferences based on your past viewing material. They figure out (through comparison against their database) that if you like to watch Show X you will also like Show Y. Obviously their algorithms are more complicated than that, but that's the basic predictive principle behind their customized treatment of customers.
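The "if you like Show X you will also like Show Y" idea can be sketched with simple co-occurrence counting. This is a toy illustration with invented watch histories, not Netflix's actual algorithm: count how often pairs of shows appear together in users' histories, then recommend the show most often watched alongside the one you liked.

```python
from collections import Counter
from itertools import combinations

# Hypothetical watch histories: which shows each user has watched.
histories = [
    {"Show X", "Show Y", "Show Z"},
    {"Show X", "Show Y"},
    {"Show X", "Show Y"},
    {"Show Y", "Show Z"},
]

# Count how often each pair of shows is watched by the same user.
co_counts = Counter()
for history in histories:
    for a, b in combinations(sorted(history), 2):
        co_counts[(a, b)] += 1

def recommend(show):
    """Return the show most often watched alongside `show`."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a == show:
            scores[b] += n
        elif b == show:
            scores[a] += n
    return scores.most_common(1)[0][0]

print(recommend("Show X"))  # Show Y co-occurs with Show X most often
```

Production recommenders weight these counts, normalize for popularity, and blend in many other signals, but pairwise co-occurrence is the seed of the idea.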

The thing about these processes, however, is that the companies get more accurate results the larger the data set is. Similar to the way that a larger focus group or amount of trials leads to more accurate results within an experiment, the key to these sorts of analytics is that the larger the customer base and the more data on each customer, the more accurate their targeting is. And the more you use their website, the more data they have on you, the more accurate the consumer profile they create, the smaller the group they can segment you into, and thus the more pertinent information they'll be able to offer you.

So what does the future look like for the Segment of One? Basically segmenting into groups of a single user is the height of personalized relationships and the closest thing we can get to having one-on-one conversations with every client in this modern world. Although the process is actually quite impersonal and automated, the more accurate the segmentation is the less automated it will feel to the consumer. The larger the database of comparison and the more accurate the analytics processes, the higher the ability to create a fairly accurate estimate of your likes and dislikes early on and the higher likelihood that you can be converted into a long-term high-value asset.

Relationships are the present and certainly the future of marketing. Companies need to focus their efforts not only on throwing their net out into the market to snare strays but also on luring in and sustaining repeat customers through conversation-centered marketing. If you know what a customer wants, you'll be able to imbue your encounters with them with relevant, interesting, and thoughtful information that applies only to them. And voila! You have a nest of repeat customers, a web of segments of one.

Challenging the 'Big Data is Just a Scam' Argument

Earlier today I came across an article intriguingly titled "Big Data is Just a Scam." Big data is my business, so I read it and actually found it to be quite funny and well written. However, as it is Captain Dash's job to preserve, protect, and defend data, I feel it is my duty to respond. Article:,2817,2455435,00.asp

First of all, I would like to reiterate that it was both intelligently written and effective in addressing several key points. Namely: 1) Big data is not well defined; 2) there's public confusion surrounding whether "big data value" lies in the collection/analysis or in the data itself; 3) the confusion leads to a lot of hype; and 4) big data can be blind.

He uses the example that I've heard several times before in big-data-bashing dissertations: the ever-present Amazon frustration example. If you research a product on Amazon on Monday, you will continue to be bombarded by ads for that product for the rest of the week. "Big data" doesn't know if you looked up that product, read a ton of horrible reviews about it, then decided that you definitely don't want it. "Big data" also doesn't know if upon researching the product you decided to buy it, and so you'll have ads pushed at you for a purchase you already made.

Let's rewind a little bit. The word value means "the importance, worth, or usefulness of something." You can't derive value from something you don't use. If I have a hairbrush and I don't use it, then I'm getting no value from it. In that sense, if I have a storage system for my "big data" but I do nothing with it, then yeah, it's worth nothing. If I have a broken hairbrush and I DO use it, then I'm probably still not getting value from it, in the same way that if I have poor-quality data, no analytics tools in the world will be able to fix it. However, it's equally possible to have a hairbrush that's perfect but that I don't use properly. In Amazon's case, their data is pure: simply records of your behavior on their website. However, they've received criticism because (1) they don't know how to figure out whether you've already purchased a product in order to stop sending you ads, and (2) they don't know how to determine your personal reactions to what you see on their website. They don't know your motivations, and they see no difference between you looking at a product page and loving the product or you looking at a product page and deciding you never want to see that product again. Which leads us to my next point:

Big data can't know how you're feeling. Data collection and analysis can be skewed by the human element at any stage. The path to compromised data can begin when the collector chooses to collect survey data, which often has limited accuracy because people tend to lie on surveys and respond subjectively. That path continues if the analyst makes far-fetched conclusions based on inconclusive data (e.g., the fact that 40% of people searched Google for 'flu symptoms' doesn't mean that those people have the flu).

Big data can't filter truth through a psychological lens (yet) and it can't be perfect. Your data quality is only as accurate as your collection methods.

I believe that the author is really trying to say that a lot of the hype surrounding big data is unwarranted. However, calling it a "scam" and blaming companies' collection/analytics mistakes on the concept of big data just isn't logical. A stupid company blindly trusts its numbers, but an even stupider company ignores the opportunity to gain insight from the untapped value that its numbers hold. Obviously outliers and extraordinary circumstances exist that can skew data in a certain direction and compromise its quality and integrity. Really, "big data" just means that companies have so much internal and external data waiting for analytics tools to tap into it. Once the proper analytics tools are applied, you can see a bird's eye view of your company. I don't know about you, but I'd rather be looking down from above than be lost on the ground.

Signing Off,

Captain Dash


Backlash over Google's Flu Tracker Reveals Issues with Big Data Privacy and Corporate Agendas

From 2011 to 2013, Google ran Google Flu Tracker, which supposedly tracked and reported the occurrences of flu cases by area. It emerged right at the first crest of the big data wave and was promoted by the tech giant as proof of powerful and relevant big data applications.

Recently, however, an article published in the journal Science attacks the validity of the GFT report and calls into question not the concept of big data itself but the integrity of Google’s data collection and reporting methods. The data in the GFT report was based on the volume of flu-related Google search queries; Google claimed that it “had found a close relationship between” flu-related search terms and actual flu cases, and so it used flu search terms to indicate areas with higher occurrences of flu cases.

It turns out the relationship between the two was not close at all. The Science article reveals that Google overestimated the occurrence of flu cases by a whopping 50% on average.

The responses to this news have varied. Some think that this big data malfunction was due to an overhyping of big data’s abilities, while others believe that Google fell victim to the trap of simply collecting the wrong information and mistranslating it.

Others like myself believe that this “misunderstanding” is actually representative of something a bit more sinister and troubling than a simple case of misinterpreted data. In my opinion this scandal shines a public light on the all-too-common (yet rarely publicized) danger of “big data wishful thinking,” which has little to do with errors in the data itself but rather highlights the irresponsibility on the part of large companies like Google who release these “big data reports.”

Google basically collected a mass of data about the flu and thought to themselves “How great would it be if this data could actually tell people who has the flu?” The potential for a logical connection existed between flu-related search terms and actual cases of the flu, so they decided to bridge that connection without conducting proper research or due-diligence to confirm the validity of their claim.

The solution would have been simple: Google could have tracked the Internet activity of a group of test subjects, with and without the flu, and compared the instances of flu-related search terms between the two groups in order to paint a very realistic picture of the relationship between searches and actual flu cases.
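The validation study proposed above boils down to a simple comparison. Here is a sketch with entirely invented numbers (the group sizes and search counts are hypothetical, not from the Science article): measure how often each group issues flu-related searches, and check whether the flu group's rate is meaningfully higher before trusting search terms as a proxy for real cases.

```python
# Hypothetical validation data: a group known to have the flu and a
# control group known not to, with counts of flu-related searchers.
flu_group     = {"size": 500, "flu_searchers": 310}
control_group = {"size": 500, "flu_searchers": 140}

def search_rate(group):
    """Fraction of the group that issued flu-related searches."""
    return group["flu_searchers"] / group["size"]

# The lift tells us how much more often sick people search for flu terms.
lift = search_rate(flu_group) / search_rate(control_group)

print(f"flu group search rate:     {search_rate(flu_group):.0%}")
print(f"control group search rate: {search_rate(control_group):.0%}")
print(f"lift: {lift:.1f}x")
```

If the control group searches nearly as often as the flu group (a lift close to 1), search terms alone will badly overstate real flu cases, which is exactly the failure mode GFT exhibited.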

Google is a massive company with more than enough intelligence and resources to conduct this sort of authentication study. So why didn’t it?

Because for some companies, big data isn’t about the truth. Proper experimental processes dictate that data must be collected and objectively analyzed in order to come to a conclusion. However it appears that in Google’s case they began with the end in mind, a desired conclusion that would improve their image, credibility, and ranking within the big-data-sphere. With their conclusion already chosen, they filled in the blanks with data that they surely knew was not completely suitable or relevant to support their claim.

This entire case highlights one of the main dangers of big data. Even just the term “big data” sounds so technical and sterile that it’s hard to believe that it could be manipulated to reflect a company’s agenda. We live in a technocracy where the public blindly puts their unquestioning faith in tech giants like Google and Microsoft to the point that these companies have more than ample opportunity to delude the public for the sake of furthering their own agenda.

It’s difficult to believe that big data is a tool that can be manipulated by these companies, yet we must recognize that data collection methods can be flawed and analysis processes skewed in favor of a desired conclusion that benefits the analyzer.

Big data privacy laws pose a unique challenge in the sense that, while they’re meant to keep our personal information safe, they also limit the amount of information that companies are allowed to reveal pertaining to their reports. In the case of the Google Flu Tracker, Google wasn’t allowed to reveal the search terms upon which the entire report was based. The problem was that we only saw Step 3, the conclusion of the report, yet Steps 1 and 2, the raw data and analytics processes, remained shrouded in the darkness of stringent privacy regulations.

And with that, dear readers, I wish you a Happy Friday and advise you to always consider the source of your information before you believe it at face value.

Data-Fully Yours,

The Captain

The Science article can be viewed here:

How Big Data is Helping People: From Disaster Relief to Human Trafficking


In this world nothing is ever truly one-dimensional. Where there is light there is also darkness, and while some people would have you believe that big data is a menace and a privacy hazard, there’s no denying that it helps people.

Here are a few of the ways that Big Data has stretched its little digital fingers into the humanitarian realm and helped make our world better.

Disaster Relief:

With the widespread use of social media during times of national disaster, it’s possible for aid organizations to pick out trending words and phrases amongst messages posted to Twitter and Facebook to discover new developments in the after-effects of the disaster or pinpoint which areas need help.  With the location known, aid organizations can bring aid directly to the places that need it, which cuts response time and improves efficiency.

In addition, specific organizations have developed data analytics tools to assist during times of crisis as well. Google Crisis Maps shows the locations of resources or damaged areas and Google People Finder can help people reconnect with family and friends in the aftermath of a natural disaster.

Human Trafficking:

Big data has made major strides in improving the effectiveness of anti-human trafficking efforts. Google Giving (the humanitarian branch of the organization) gave $3 million to team up 3 international anti-human trafficking organizations with tech companies in order to amass and analyze all relevant data and create new technologies and processes to gather more accurate human trafficking data in the future. The partnership aims to gain a clear view of the areas, the patterns, and the flows of human trafficking through the amassing of data across borders.

One of the key developments of the partnership has been the creation of a global human trafficking hotline. Currently several national hotlines exist but they are too isolated and uni-modal to be truly effective at stopping international human trade. They accept only phone calls and only from specific areas or countries. A portion of Google’s grant will go toward the development of a global multi-modal hotline that can accept data inputs from email, SMS, mobile, etc.

The new system will push data at the hotline instead of having to pull it from the caller (ex: asking the caller to describe their surroundings). If a victim manages to text or call the hotline, data pertaining to their exact location will be pushed automatically which can significantly cut down response time.

With uniformity across the world along with big data amassing and mapping abilities, human traffickers will hopefully find greater difficulty in running operations across borders.

While big data can certainly help by streamlining processes, no major progress will be made until the issue of awareness has been tackled. The problem is that a lot of people think that slavery doesn’t exist anymore in the Western world. A hotline won’t be effective until the issue of human trafficking has been publicized to the point where it’s possible to get the phone number into victims’ hands. The general population also needs to be informed of the prevalence of human trafficking so that they’ll be able to identify signs of it and report them to the hotline.

Saving Children:

A $1.3 million grant was given to the Children’s Hospital of Los Angeles to fund a massive data mining/storage project. The pediatric facility will store and mine data about children’s physical conditions through sensors. The data can be processed and analyzed to learn more about each sickness and identify symptom patterns. For example, it could be discovered that if a child with pneumonia hits a temperature of over 102 degrees, it’s likely that the next day they will also report an increase in chest pains and will need to increase their narcotics dosage.
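A pattern like that could be operationalized as a simple rule over incoming sensor readings. Here is a minimal, hypothetical sketch; the field names and the 102°F threshold are illustrative, not taken from the hospital's actual system:

```python
# Hypothetical sketch: flag pediatric patients whose latest sensor reading
# matches a learned symptom pattern. All names and thresholds are illustrative.

FEVER_THRESHOLD_F = 102.0  # pattern: fever above this predicts worsening symptoms

def patients_to_watch(readings):
    """readings: list of dicts like {"patient": "A", "temp_f": 102.4},
    ordered oldest to newest. Returns the set of patients whose most
    recent temperature exceeds the threshold."""
    latest = {}
    for r in readings:  # keep only the most recent reading per patient
        latest[r["patient"]] = r["temp_f"]
    return {p for p, t in latest.items() if t > FEVER_THRESHOLD_F}

readings = [
    {"patient": "A", "temp_f": 101.2},
    {"patient": "B", "temp_f": 102.6},
    {"patient": "A", "temp_f": 102.9},
]
print(sorted(patients_to_watch(readings)))  # ['A', 'B']
```

In practice, of course, rules like this would be mined from historical data rather than hard-coded.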


One of the key challenges will be the adoption of big data processes in humanitarian aid organizations. Google Giving and several other progressive aid entities have recognized that technology has great potential to spur advancements in these areas; however, many other organizations either don’t have the skills and competencies necessary to implement similar initiatives or feel that their current systems can’t integrate with or support big data analytics.

Additionally, it’s very difficult to amass data in the areas that are most at risk in the case of natural disasters - namely marginalized areas where many people have limited access to cell phones and the Internet. It’s not realistic to believe that these people will be able to report their locations during times of crisis. Clearly these new technologies have limitations and won’t be miraculously adopted overnight, but that doesn’t mean that disaster relief through data is impossible. Through a combination of gradual technological adoption in marginalized areas and concerted efforts to amass as much relevant information as possible, big data will pave the road to recovery in times of need.



Big Data Algorithms Generate Entirely Creative Recipes

Big data has been hellbent on revolutionizing every industry from fashion to exercise to automobile construction, and now it's turned its beady digital eye to cooking! Researchers at IBM have created a program that can generate creative, entirely new recipes by way of an algorithm and a massive database of recipes and gastronomic information.

Pick a key ingredient, choose the type of cuisine you want to cook (Mexican, Indian, Chinese, etc.) and the type of dish (quiche, casserole, salad, etc.), then watch as it invents an entirely novel recipe that challenges traditional food pairings.

The cognitive capabilities of the program result from a combination of algorithms that mimic creativity when designing a recipe. First, a genetic algorithm mutates and hybridizes food types, pairings, and processes in accordance with the food pairing principle, which holds that foods sharing a common flavor molecule pair well together. Then the program optimizes for Bayesian surprisal, a mathematical tool that quantifies the element of surprise: a greater element of surprise can be achieved by maximizing the contrast between existing beliefs about food combinations and the beliefs introduced by the new recipe.
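Bayesian surprise is commonly formalized as the Kullback-Leibler divergence between beliefs before and after new evidence. A minimal sketch of that calculation, with invented food-pairing numbers (IBM's actual model is not public in this form):

```python
# Sketch of Bayesian surprisal as KL divergence D(posterior || prior), in bits.
# The belief distributions below are invented for illustration only.
import math

def surprisal(prior, posterior):
    """prior and posterior are dicts mapping outcomes to probabilities;
    returns how much the posterior diverges from the prior, in bits."""
    return sum(q * math.log2(q / prior[k]) for k, q in posterior.items() if q > 0)

# Hypothetical beliefs about which cuisine an unusual pairing belongs to,
# before and after tasting the generated recipe.
prior     = {"mexican": 0.1, "italian": 0.6, "indian": 0.3}
posterior = {"mexican": 0.7, "italian": 0.1, "indian": 0.2}
print(round(surprisal(prior, posterior), 3))  # prints 1.59
```

The bigger the divergence, the more the recipe upends what we thought we knew about food combinations, which is exactly the quantity the program maximizes.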

There's no doubt that the ability to generate countless recipes that are mathematically GUARANTEED to taste good will reform the food development industry. Schools will also certainly benefit. It’s always been a challenge to find foods that are nutritious but also satisfy the taste buds, and this program will allow schools to do just that in an efficient, cost-effective manner.

However, the societal implications of this program are harrowing indeed. Food and cooking are integral aspects of our culture and have been based on creativity, inspiration, and ancestry for as long as humans have existed. We still safeguard recipes that have been passed down for generations, a reminder of our history and a way to remember our roots. If we allow computers to take over such a crucial part of our cultural identity, then we will essentially be relegating ourselves to a computer culture.

We must understand that there's a fine line between developments that make our lives better and those that could potentially lead us down a slippery slope toward technological over-dependence. A recipe-generating program may seem innocent enough, but in reality it's the development of computational creativity in the quest to create cognitive systems that is mildly unnerving.

And with that, dear readers, I wish you a wonderful week and a good night's sleep!

Data-Fully Yours,

The Captain



The Human Brain Project and the Big Data Snowball


How the Needs of Big Data Projects Spur Technological Advancements


What is more “disruptive” than innovation? What can interrupt a growth trajectory and alter the course of technological evolution in a way that a new phone or  holographic chart never could? A change in perspective. 

In 1400, the world was flat. Human knowledge of anything beyond province borders was shrouded in darkness, misunderstanding, and misconception. Science and technology were not yet advanced enough to permit discoveries beyond the capabilities of our own senses: we could only prove what we could see, hear, or touch.

By 1900, the earth's nature was largely understood, but our knowledge of space and the universe remained hazy and uncertain.

By 2014, we’ve made massive discoveries into areas as abstract and distant as outer space, yet we are still in the dark about fields as close to home as our own minds. This lifetime will be revolutionary - projects are in motion that will certainly illuminate the cavities in human knowledge and unveil the mysteries of our own consciousness.

The Human Brain Project

In October 2013, 80 top universities and research institutions joined together in Lausanne, Switzerland to embark on the most ambitious neuroscience project in history: the 10 year, 1.19 billion euro Human Brain Project. The project aims to create a perfect simulation of a human brain on a supercomputer, allowing scientists to study brain diseases and mutations, simulate the effects of drugs on the brain, and query beyond the confines of possible experimentation without the need for human or animal subjects.

Big Data

With 100 billion neurons and 100,000 billion synapses, the human brain is the most complex and mysterious machine on earth. To make a blueprint of the molecular architecture that defines the organization and development of the human brain, neuroscientists, doctors, data scientists, and roboticists will work together to gather and mine brain data.

These scientists will collaborate within 6 interconnected departments: neuroinformatics, brain simulation, high-performance computing, medical informatics, neuromorphic computing and neurorobotics.

The most challenging activity will be high-performance computing, largely because the technology required to advance the project beyond the initial stages does not yet exist. The computing department will have to make massive strides in terms of memory and storage capabilities. The supercomputer necessary to store the information pertaining to the intricacies of the human brain would have to be 1,000 times more powerful than the best existing technology. In this sense, the Human Brain Project will not only explore medical possibilities but also stimulate the development of neuromorphic computers, which are modeled directly after the brain and combine human intelligence with the capacity of computers.

In many respects, this project seems impossible, more of a government-funded initiative to bring Europe to the forefront of science than an actual project intended to be supervised to completion. Yes, the organization of the project has been roughly structured, but the technological developments needed to complete each step may not be feasible. There are so many ambiguous, hazy areas connecting one step of the project to the next that it could go in infinite directions depending on the discoveries made during the initial stages. The project could easily shift focus from brain disease research to an exploration of human consciousness depending on the discoveries made along the way.

The possibilities are endless and the ethical implications somewhat alarming, but with such a massive budget and the world's leading scientists on board, it’s likely that the next ten years will prompt insights into the workings of the human mind that will change our perspective toward cognition, consciousness, and every other scientific realm....and when perspective shifts, the world never looks the same again.


How Big Data Will Change Your Company Culture


It's not a question of why, but of how. 

A company has two basic functions- to create a product and sell it.  With the Big Data Boom allowing companies to gain an accurate insight into the viability of their activities, the way that companies approach the latter function has been entirely revolutionized.

Traditional marketing practices dictated that a marketing division must use every platform possible to publicize the company and advertise the product- internet, TV, radio, billboards, taxi top adverts, etc.   Traditional marketers were blindfolded, shooting their advertisements off into the darkness. In the face of current data analytics capabilities, that school of thought has been rendered obsolete.

Now marketers have the analytics tools to see all of their past activities spread out before them, qualified and valued. They are able to see directly which of their marketing efforts are futile and which are valuable - which campaigns were total busts and which marketing outlets drive the most revenue.

As a result, marketing departments become more streamlined. They stop spending money on activities that don’t directly contribute to revenue. The company begins to trim its activities, cut costs, and approach advertising with clear-cut, surgical precision. Traditional marketing could only try to help the consumer; now marketers know their efforts are able, and designed, to do so. They become completely customer-centric and have the time and resources to find innovative ways to address the needs of the customer and advertise their product. The focus shifts from quantity of advertising campaigns to quality, creativity, and innovation. And there is no shortage of possibilities: in the realm of social media alone there is a vast number of platforms and varying methods for advertising on each - Vine videos, news tweets, a company documentary on YouTube, and so on.
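The triage described above boils down to comparing return on spend across channels. A minimal sketch, with figures invented purely for illustration:

```python
# Sketch: rank marketing channels by attributed revenue per unit of spend.
# All channel names and figures are invented for illustration.

def rank_channels(results):
    """results: {channel: (spend, attributed_revenue)}.
    Returns channel names sorted best-first by return on spend."""
    return sorted(results, key=lambda c: results[c][1] / results[c][0], reverse=True)

results = {
    "tv":        (50_000, 60_000),   # return: 1.2x
    "social":    (10_000, 45_000),   # return: 4.5x
    "billboard": (20_000, 12_000),   # return: 0.6x
}
print(rank_channels(results))  # ['social', 'tv', 'billboard']
```

Anything at the bottom of that list with a return below 1.0 is a candidate for trimming; attribution in real analytics tools is far messier, but the decision logic is the same.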

Big Data won't make your company an automated entity fueled by, and relying on, only numbers for direction. Rather, the deeper understanding of your company's activities made available through advanced data analytics tools will shift its focus toward creative ways to build on the known best practices - namely creativity, innovation, and ingenuity.


Signing off!

Captain Dash

Big Data + Sports = Big Win?


A recent article once again revealed the endless big data application possibilities - now in the arena of sports! In August, the NFL teamed up with software company SAP to launch a fantasy football analytics tool that allows fantasy footballers to access historical and third-party data about their favorite teams. Just after, the NBA commissioned SportVU - a high-accuracy player and ball tracking system that can capture 25 data points from each player every second - to be used in all games in 2014, with the results freely available to all teams. Another sports technology called ClearSky employs wearable wireless sensors to gather 1,000 data points per second on each player and has already been adopted by handball teams in Norway.

The application possibilities of this data are immense. Teams can analyze game results to see patterns in specific players’ performance, quantify the value of particular plays, and gain insight into their competition’s weaknesses. So what will this mean for the future of sports? Will the teams’ ability to bring their performance to a whole new level of heightened ability and precision make the games more thrilling or more predictable? Will the scientification of sports kill the thrilling experience that has attracted people to sporting events for thousands of years, or will it serve to engage fans further?

And finally the ever-present question- are these sports leagues going too far or are they simply riding the wave of the big data boom in order to capitalize on the inevitable?


Big Data- In the Eyes of HR, Are We People Or Numbers?


Here at Captain Dash we never cease to be amazed at how big data continues to revolutionize every aspect of business processes- including Human Resources. Some big data doomsdayers fear that HR departments rely too heavily on data to aid in their hiring decisions. By using data analytics tools, HR departments can cut down the time and effort needed to hire new recruits. Data can be used to analyze the locations of possible candidates and discern who may need to relocate, scan online networking sites to see which candidates are active and respected in their fields, and even piece together a candidate personality profile based on their online profiles.

Likewise, analysis of past years' employee performance data can piece together a construction of the ideal candidate based on elements ranging from which universities produce the most successful employees to which personality traits have lent themselves to success at the company.

These big data wet blankets have expressed concern that data analytics eliminates the human element of HR processes. Indeed, there are some companies that rely solely on numbers and scores to influence their hiring decisions. However, some of the most powerful companies with the deepest pockets (and therefore the most technology at their disposal) still rely heavily on human interaction when seeking recruits. Google, for example, uses data analytics to identify potential candidates and then narrow the pool of applicants in their initial round of recruitment. However, past those preliminary rounds they revert to the "human approach" and conduct telephone and in-person interviews with potentials. Google has a sensible, moderate approach- they understand that data analytics can eliminate a whole lot of work yet also understand that people are not a sum of their achievement scores. There are many subtleties to a person that only an in-person interview can reveal.

Some people fear that the rise of big data means that they too will soon be viewed as just a number. However, what stands between us and a George Orwell "1984"-esque future is the value placed on the human element. Common sense tells us that a 30-minute interview can reveal things that years of data would fail to predict. Tech powerhouses like Google know this and so do we, so I think we can rest easy knowing that we are safe for now!

Truth, Justice, and the Data Way!

Your Captain

#OpenData: Open Discussion

Open Data is the way forward in altering the knowledge of modern society. It can help us attain knowledge of important public sectors such as healthcare, sports, and politics, and through this openness, value is created in the form of transparency, innovation, and measurement of policies. Already there have been some great examples of how Open Data has benefited our society, but is it benefiting everyone?

At Captain Dash, we try to look at things with a global perspective. Approach them with a pinch of salt, if you will. We love how Open Data can benefit various social groups, and how it can reveal data we may not have known before. Some great examples of this are the Finnish 'Tax Tree', Britain's 'Where Does My Money Go?', and the video game Watch Dogs' 'We Are Data' site, and of course the current uprisings and political unrest in Egypt and Brazil have led many to dig deep into the Open Data available to them.

If you are going to benefit from Open Data, it is important to quash some of the myths that surround it. Even its title, "Open Data", is a bit vague, no? Open to whom, exactly? It can be assumed that even when Open Data is available to the public, only those who possess the skills to analyze a lot of unorganized data will comprehend and benefit from it.

One of the main myths surrounding Open Data is its functionality for every constituent. This assumes that Open Data users have the resources, expertise, and capabilities to make use of the data. In other words, some data requires the use of statistical techniques, a deep understanding of the underlying data, and an understanding of the types of (causal) relationships involved. Only people with an understanding of statistical techniques and the other knowledge needed for processing Open Data are able to make sense of it and understand its implications.

Another myth surrounding Open Data is how it will apparently result in an Open Government. Be that as it may, while an Open Government would allow for transparency and engagement and enable effective oversight, in reality it may result in an information overload. Large differences will appear as obstacles to Open Data analysts, and different conclusions may result from different individual analyses. If we are to have an Open Government, the public sector would need to undergo a complete transformation. What's strange is that more information does not always translate to better, more democratic, or more rational decision making.

Finally, a major myth that surrounds Open Data is that all information should be publicized with no restrictions. Issues such as privacy legislation, limited publishing resources, the quality of information, and the complexity of data structures can all have an effect on publicizing information without restrictions. The paradox is that regulation and policies can on the one hand enhance the publicizing of data, whereas on the other hand they can inhibit data sharing. Also, data sets generate income for some public organizations. For example, in the Netherlands, some organizations' revenue models are based on the income generated by charging users a fee for access.

What is your own take on Open Data? Are you benefiting from it or do you see it as only beneficial to a select group of people? Let us know via twitter (@captain_dash).

Yours Faithfully,

The Captain

Data Analytics to make it BIG in 2013

Ah, Tuesday. The most productive day of the week, according to a survey taken by Accountemps. You know what else is productive? Marketers who know how to operate more efficiently. Meghan Keaney Anderson knows 27 ways Marketers can do this.

What's more, we have good news. It appears marketers are planning to spend more on data in 2013, and more than half say they’re planning to hire new employees for data/analytics jobs this year. This is according to a survey report titled "Data-Rich and Insight-Poor." 700 marketers were surveyed, and it was found that website analytics, email campaigns, and social media have become the preferred routes for marketers to generate their customer data, while display advertising, direct mail, print, and telemarketing are not as effective.

With such brilliant news that more people will be looking to invest time into data analytics, it comes as no surprise to hear Gartner Research Vice President Kurt Schlegel state that "many midsize enterprises have yet to even start their BI and analytic initiatives, we expect the market for BI and analytics platforms will remain one of the fastest-growing software markets."

Are you ready to make it BIG in 2013? This is the year business moves from SaaS to Data as a Service. The world's data is doubling every year, and the 5.1 billion people who own a cell phone aren't just consuming that data; they're creating it. Marketers will be able to benefit enormously from the large amount of data created by mobile users. We can expect smartphones to be the epicenter of our lives in the years to come, from banking to tracking, social media to even starting a car. All of these factors will tie in with marketers making use of the vast amounts of data being consumed and created around the globe in the coming years.

Big Data is USELESS...



Big Data. You either love it or you hate it. Like the Star Wars prequels. Or licorice...

Either way, big data doesn't have to be the worst thing ever to happen to you. Too many marketers, managers, and IT professionals are Tweeting, Facebooking, and blogging about how much Big Data "is, like, the worst." 

Here at Captain Dash, we obviously disagree wholeheartedly, and thankfully this article at TLNT disagrees (to an extent) as well. Big Data will remain nothing but a buzzword until people start to PUT IT TO WORK. UTILIZE. UNDERSTAND. What's the point of letting big data just keep buzzing around you and your company if you're not going to snatch it out of the air and make it work for you? It will probably be your most profitable employee. Big Data's favorite form of salary? Analytics. And a required bonus? Human insight. Put it all together and you have a long-standing match made in heaven.

If you Google search the sentence "I hate big data," some results will include articles titled Why We Need To Kill Big Data and Fanning The Flames Of Big Data.

Why all the hate? Don't hate, correlate! Seriously, though. Sure, the word has been used so much that the babies of data scientists say it as their first words, but hate the name, not the game. Big data is unbelievably useful to any company, and any individual. Instead of lamenting the sheer mass of data waiting to be analyzed, just start analyzing. There is a myriad of tools out there to help you. Tools that help with analytics, tools that help with organization, services that help with collection, organizations that help with data management. The trick is to find the services that work for you. Big data will forever remain an annoying fly buzzing around us until we understand what it can do for us.

One of the biggest misconceptions of big data is that it will remove the human element from business interactions - that "insight" refers to the results gleaned from a computer algorithm. Well, no offense to whoever thinks that way, but HUH?! The human element, especially in marketing, will forever decide the difference between mathematically sound decisions and inspired, discerning, and shrewd decisions. So it's time to stop being afraid of obsolescence resulting from big data, and start accepting the value and back-up that big data can provide. Humans make outstanding decisions all the time. Reinforce those outstanding decisions with big data-corroborated facts? Big win.

So try not to pay too much attention to "big data" as a phrase. It's a label that stuck. Perhaps it's misleading and an inadequate description of a veritable phenomenon, but that is irrelevant to how this phenomenon will change our world. You could call it "OMGDATA" and it would still be life-changing. Hate the name, love the game.

Faithfully yours,

Captain Dash

Blogosphere on Big Data

big data tips blogs word cloud

As a blog devoted to big data and all that comes with it, we thought it would be a great idea to check out the other blogs on the blogosphere and see what they had to say on the subject. An overwhelming majority of the blogs we found offered tips on how to climb out from underneath the data deluge, so we decided to put together all of the best and most relevant tips from those blogs. Above is an actual word cloud from the entirety of the text of those blogs. (Click to zoom.) Some very interesting words that popped up multiple times were retention, loyalty, and time. Wonder how those fit into the Big Data puzzle...
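Under the hood, a word cloud is nothing more than a word-frequency count over the text. A minimal sketch of that counting step using Python's collections.Counter (the sample text and stopword list are illustrative):

```python
# Sketch of the counting behind a word cloud: tokenize, drop stopwords, tally.
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "to", "of", "is", "for", "takes", "take"}

def top_terms(text, n=3):
    """Return the n most frequent non-stopword terms as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)

# Illustrative stand-in for the scraped blog text
posts = ("Retention and loyalty take time. "
         "Loyalty drives retention; retention takes time and strategy.")
print(top_terms(posts))  # [('retention', 3), ('loyalty', 2), ('time', 2)]
```

A word-cloud renderer then just scales each word's font size by its count; the analytics is entirely in the tally.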

What's that idiom about a dead horse?

Unsurprisingly, other popular terms were business, mobile, analytics, customers, big, and strategy. While the surprising terms were indicators of the unspoken benefits of successful big data strategies, these expected terms relate to the future of 2013 and the partners inherent in the big data transition: mobile and analytics. We won't rehash the obvious; Big Data is essential for the success of all companies in the near future. We know it, you know it, heck, we've all been shouting it to the skies so much we wouldn't be surprised if aliens on Pluto know it. The trick to this success is going about it in the right way: not forgetting the essential cogs in this machine, and keeping it well oiled with flowing processes, carefully obtained and classified data, and a constant human presence.

How, Captain Dash? Oh won't you tell us?

Simmer down, of course we'll tell you. We want you to be prepared for the revolution, because only the STRONGEST WILL SURVIVE. Okay, just kidding, but the most efficiently-adapted will definitely have more competitive advantage than ever before. And what's the one goal of any successful marketer? I'll wait for you to go grab your university textbook.... Competitive Advantage! And so with great aplomb and satisfaction, we present to you our top tips for tackling Big Data. May the odds be ever in your favor!

Spring Cleaning

If your company has any data-centric plan for 2013, you best believe that it will include some sort of data cleaning. According to Mary Shacklett, who blogged for TechRepublic on the matter, you'll need to do/have a few things if you want to get to a place of peace amidst the chaos: data retention, classification, deduplication, tiered storage, quality data, and a sense of fierceness and relentlessness. We couldn't agree more. Think about it: you need to be one hundred percent clear on what type of data you'll collect and keep, and for how long. Without these protocols clearly established, you'll be buried underneath the avalanche, brave data skier. Buried. And then there's the obvious need for classification: which data is more important for the business, which data requires certain security clearances over others, etc. What's cool is that this paves the way for automated tools to quickly "tag" the data for you. After the initial backlog, this process will run itself. Deduplication: totally obvious. Duplicates of data skew statistics and analytics! As far as tiered storage goes, this is just a question of economization: most frequently accessed data on top, less frequent on the bottom, which minimizes time spent retrieving important data. As for quality data, this just refers to the completeness of your data. You can deduplicate all you want, but if your data is incomplete, broken, or inaccurate, it will not only skew your analytics and business intelligence but also screw up any application you use it in. Last but not least, BE FIERCE. No, we're not quoting a Beyonce lyric. We're serious. If you, as a non-IT professional, aren't part of regular IT data meetings, get in there. Apply these cleaning principles from now on to incoming data as well as legacy data. Take initiative and do it now!
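Deduplication, at its simplest, means normalizing a key and keeping one record per key. A minimal sketch (the record fields are illustrative; real tools also do fuzzy matching):

```python
# Sketch of record deduplication: normalize a key, keep the first record per key.
# Field names and sample data are invented for illustration.

def dedupe(records, key_fields=("email",)):
    """Keep the first record seen for each normalized key; drop the rest."""
    seen, kept = set(), []
    for rec in records:
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

customers = [
    {"name": "Ada",    "email": "ada@example.com"},
    {"name": "Ada L.", "email": "ADA@example.com "},  # duplicate after normalization
    {"name": "Grace",  "email": "grace@example.com"},
]
print(len(dedupe(customers)))  # 2
```

The normalization step matters as much as the dropping step: without lowercasing and trimming, the two Ada records above would both survive and quietly inflate your customer counts.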

Tri-fecta: Mobile, Big Data, Cloud

Sanjay Poonen, guest contributor for Gigaom, wrote a blog post that cleverly advised businesses to treat these three concepts as one in order to tackle them properly. If these three things are transforming IT at the same time, which they obviously are, then why not tackle them at the same time? Only logical, right? Take, for example, a smartphone. Mobile, right? It holds an average of 41 applications, according to the post, and each of those applications produces huge amounts of data via its internet connection. These huge amounts of data are being transmitted through the cloud. So you see, to retain that competitive advantage, a company needs to find a way to optimize the relationship between these three seemingly unrelated IT concepts. If you needed any further convincing that neither one of these problems will go away while the other exists (sensing a Batman/Joker metaphor in here somewhere...or maybe Harry Potter/Voldemort?), think about this: there is no saturation point for the mobile phenomenon on the horizon. 87% of the world's population owns a mobile phone, and that percentage is not close to stagnation. The sooner a business starts to think of these three rising stars as one massive undertaking, the sooner it will be able to prepare before its competition, and...let's see if we've been paying attention....what do we want? Competitive Advantage!

Dude, that's so meta....

METADATA! The first thing you'll think, hopefully, is data about data. That's more or less the gist of it. What does metadata have to do with big data? We find out in a great blog post written by Nicole Laskowski at TechTarget. Metadata is incredibly useful, and has been for years, even before this big data boom. The post cites the example of metadata use in solving crimes: John McAfee flees Belize to escape a murder investigation, and the police find him through the geotagged metadata in a published photo of McAfee. This principle applies to business just as much as to crime fighting. If you have an unwieldy group of data, wouldn't it be nice to quickly see the basic information without having to delve deep into the data itself? "You don't have that option with big data," says Gwen Thomas, founder and president of the Data Governance Institute. "It's like drinking from a fire hose. You'll get knocked over." By utilizing metadata, companies can craft basic rules or definitions for a data concept and attach them to enterprise-wide data used for analytics. And herein lies the potential for big data: if we can get all of big data with metadata under it, our analytical capabilities will increase in power and decrease in the time they take.
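Attaching "basic rules or definitions" to a dataset can be pictured as a tiny metadata catalog. Everything below is hypothetical, not drawn from any particular governance tool:

```python
# Hypothetical sketch: a metadata catalog that lets analysts inspect a dataset's
# basics (size, schema, owner, retention) without opening the data itself.

catalog = {}

def register(dataset_name, rows, *, owner, retention_days, classification):
    """Record descriptive metadata about a dataset in the catalog."""
    catalog[dataset_name] = {
        "rows": len(rows),
        "columns": sorted(rows[0]) if rows else [],
        "owner": owner,
        "retention_days": retention_days,
        "classification": classification,
    }

# Illustrative dataset and registration
sales = [{"order_id": 1, "amount": 19.9}, {"order_id": 2, "amount": 5.0}]
register("q1_sales", sales, owner="marketing",
         retention_days=365, classification="internal")

print(catalog["q1_sales"]["columns"])  # ['amount', 'order_id']
```

An analyst querying the catalog sips from a glass instead of the fire hose: schema, ownership, and retention rules are visible at a glance, and the heavy data only gets opened when it's actually needed.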

But what does it all mean, Captain?

It means, it's time to stop rejoicing that Big Data is here to put your company on the map, and start managing the nigh-unmanageable flow that's coming through your business doors. It's simpler than you think, and with the right mindset and the right tools, 2013 will be less a year for reporting, and more a year for doing.

As Always,

Captain Dash




Big Data: Clairvoyant or Blind?

light at the end of big data tunnel

A recent article published in Forbes, written by gyro President Rick Segal, is interestingly titled Big Data's Big Blind Spot. In a nutshell, the article explores the potential pitfalls of Big Data as well as the potential for Big Data to be just hype, hope, or hassle. Many CMOs think all three.

Big Data has been promoted and analyzed and revered by many, and in almost equal amounts, relegated to just a passing fad inspired by a particularly poignant buzz word. Now, at Captain Dash, we firmly stand with those ready to brace themselves for the ever-present digital revolution. However, we also promote the human element.

The article makes a fantastic point: in the end, Big Ideas trump Big Data, and humanly relevant ideas most of all. Big Data will help the world make history, but not before the world takes the reins of its own data outflows. This is no surprise of course; even our dashboards promote this idea. Big data can give you the visualization and the tools you need; however, it is up to the human element to actually make the connections and find the insights using their intelligence.

It might seem to many CMOs, or anyone caught up in the maelström that is Big Data, that they're in a tunnel, surrounded by dataflows upon dataflows they don't know how to control. The thing is, with the right tools (dashboards, aggregators, and various other innovative technologies that rein in the data), a person can get a handle on the chaos and make the most intelligent decisions they've ever made. The trick is to understand that human insight is the most integral and fundamental building block in dealing with Big Data. There is a light at the end of that chaotic tunnel.

Once we remember that we drive the data, and fully understand what it could mean for our future, we can start to make history.

Data-fully yours,

Captain Dash


Quantify your... wife?

Have you ever considered the Quantified Self movement as a... couple's tool? Start considering, because the Quantified Spouse is here.

The Quantified Self, if you forgot, is a movement in which people use data acquisition to track different aspects of their daily lives, such as caloric intake or miles jogged. This has gotten easier with incredibly innovative technology such as a fork that vibrates if you eat too quickly and sends your information wirelessly to an app, or a bracelet that sends your daily activity to the Nike cloud so you can reach your exercise goals. And so one can track, quantify, and collate one's life to get a better understanding of one's body and mind.

You're welcome for refreshing your memory!
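The "track, quantify, and collate" loop above can be sketched in a few lines of Python. The device readings, metric names, and days below are all made up; the sketch only shows how scattered readings get collated into per-day summaries.

```python
# Hypothetical raw readings from quantified-self gadgets, one per metric
# per day, collated into a single summary dict for each day.
from collections import defaultdict

logs = [
    {"day": "mon", "metric": "calories", "value": 2200},
    {"day": "mon", "metric": "steps", "value": 9500},
    {"day": "tue", "metric": "calories", "value": 1900},
    {"day": "tue", "metric": "steps", "value": 12000},
]

def collate(entries):
    """Group raw device readings into one summary per day."""
    days = defaultdict(dict)
    for e in entries:
        days[e["day"]][e["metric"]] = e["value"]
    return dict(days)

summary = collate(logs)
print(summary["tue"])
```

Once the readings are collated this way, spotting patterns (the whole point of the movement) becomes a matter of comparing day-level summaries.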

The Huffington Post recently reported on a... shall we say... slightly upgraded version of the Quantified Self movement: the Quantified Spouse. This progression was helped into the mainstream by David Asprey, a Silicon Valley entrepreneur who is a very strict "Quantified Self-er." He, apparently, put himself on a... well... a very particular kind of diet. Read the article to find out more, but know that Asprey was measuring the effect of a certain something-something on his overall life satisfaction. Very interesting stuff.

As it turns out, Asprey didn't stop at his own body. He tracks the sleep patterns of himself and his wife using Zeo sleep bands. He encourages other husbands to track their wives' ovulation cycles to detect patterns in their wives' happiness! And other couples are catching on. Whenever two Quantified Selfers come together, there is sure to be some track-on-track action. These couples insist that data is an indispensable part of their relationship and that it removes some of the personal emotion. It remains an objective third-party factor that can oftentimes help solve, or even eliminate, arguments altogether.

As if this trend could go any further, there's more. Some couples use tracking to keep an eye on their spouses. For example, one woman uses the data collected from her WiFi-connected scale to monitor her husband's weight gain while she's on a business trip. A husband uses his wife's sleep score to determine whether he should be extra careful about what he says that day. These couples contend that because the data removes the subjective emotions that can usually cloud judgment, it keeps the peace in the relationship.

So we all know that Big Data and the data acquisition trend are here to stay. Can't really call it a trend if it's just going to continue growing...

But the question here is, how deeply will data permeate our lives? Just how accessible will it become? No doubt we will find the answer sooner rather than later...

As always,

The Captain

P.S. We released a short promo video, check it out! Click HERE for the video. 

Don't forget to follow us on Twitter and Facebook!

Recruiting Big Data: Trends in IT Jobs

For the first time ever, jobs relating to big data are ranking high on the IT skills wish lists of top executives. With the increased demand for big data analytic capabilities and the increased need to control the data bottleneck, enterprises are hiring skilled coders, programmers, and "data scientists" more than ever.

In fact, data analysis/analytics ranked 4th among the skills recruiters desire for 2013, according to a survey by a technology career site; it was not even in the top 10 in 2012. Gartner analysts project that by 2015, 4.4 million IT jobs will be related to managing big data. Interestingly enough, one of the most popular job titles, "Software Engineer," is dwindling on job boards as companies look for more specific skills.

Regina Casonato, managing vice president of information management research at Gartner, says that "In IT, we believe there will be need for individuals skilled in data integration, data preparation, enterprise content management, data warehousing, large-scale database management systems, and data sourcing -- how to get data from social, public sites and censuses." Casonato even believes that the role of "Chief Data Officer" will soon emerge. Unfortunately, due to the lack of training and education in these specialties, a third of the 4.4 million big data-related jobs will remain unfilled. So to stay ahead of the trend and find innovative and unique job opportunities, unemployed IT professionals should get ahead of the curve and specialize in big data.

Is Datavolution within reach... or not?

We noticed recently that lots of people dispute the idea that data are widely accessible to the masses. They believe that the most substantial data are hidden in very well-protected stronghold datacenters. In addition, the common belief is that even if the data were accessible, it wouldn't mean they are of any use to a mass audience.

These arguments are partially true, and we found it interesting to try to summarize the main points blocking mass usage of data.

We believe there are three such bottlenecks:

1° Deal with the absence of standards

Data come in many forms and even more numerous formats. There have been some attempts to standardize data into international, unified formats, but the least we can say is that these efforts haven't paid off yet. As a result, any platform that wants to use heterogeneous data has to reformat it before any consistent usage. That is part of the work that platforms like ours perform, among many other players. There are good reasons to be optimistic here: not that data are becoming standardized, but that an increasing number of technologies can reformat data so that it can be exploited more easily. On another note, and contrary to common belief, data are widely available. One can access almost any type of data about traffic, weather, pollution, demographics, exported and imported goods (in deep detail), energy, etc. On this topic, the article "The Data Deluge" from The Economist is quite accurate.
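A minimal sketch of that reformatting step, in Python: two hypothetical sources deliver the "same" measurement in different shapes, and a small adapter maps each into one common schema before any analysis. The field names and sample values are invented.

```python
# Two sources, same underlying data, different formats.
import json

CSV_ROW = "Paris;2013-05-01;18.5"  # city;date;temperature from source A
JSON_ROW = '{"location": "Lyon", "when": "2013-05-01", "celsius": 21.0}'

def from_csv(row):
    """Adapt source A's semicolon-separated format to the common schema."""
    city, date, temp = row.split(";")
    return {"city": city, "date": date, "temp_c": float(temp)}

def from_json(row):
    """Adapt source B's JSON field names to the common schema."""
    d = json.loads(row)
    return {"city": d["location"], "date": d["when"], "temp_c": d["celsius"]}

records = [from_csv(CSV_ROW), from_json(JSON_ROW)]
# Both records now share one format and can be compared directly.
```

In practice each new source needs its own small adapter, which is exactly why the absence of standards is a bottleneck: the adapters multiply with the formats.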

2° Ease access to data

Although almost unlimited amounts of data transit through the Internet, it remains very difficult to find them. There is no real search engine for data and no real one-stop shop for them yet. A few initiatives are currently trying to overcome this issue, but we still lack a Datapedia offering.

3° Provide fun

Data is usually boring. Nothing comparable to typing your friend's name into Google or looking at Twitter. Generally, few people spend their weekends playing with data series in Excel. To let anyone play with data, it has to come through a very friendly interface, one that speaks more to the right side of our brain (the emotional one) than to the left (the rational one). That means new tools must emerge, revolutionary compared to those that have existed over the past quarter of a century. Above all, it means that the era of the data spreadsheet (the Excel standard) shall end, replaced by the era of datavisualization, or in other words, the graphical representation of data.
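Even a crude example shows the gap between a spreadsheet of numbers and a picture of them. The toy sketch below (invented data, stdlib Python only) renders a series as character bars; real datavisualization tools go far beyond this, but the right-brain appeal is already visible.

```python
# The same series as a table of numbers versus a (very crude) bar view.
visits = {"Mon": 12, "Tue": 30, "Wed": 21, "Thu": 45}

def bar_chart(data, width=20):
    """Render each value as a row of '#' scaled to the largest value."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label} {bar}")
    return "\n".join(lines)

print(bar_chart(visits))
```

The spike on Thursday jumps out of the bars at a glance; in the raw numbers you have to hunt for it.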

Unlocking these three bottlenecks will be critical to making data the next revolutionary phase of the Internet. It will also probably require that data become more culturally accepted; i.e., that we understand that just because a piece of information is made out of data does not mean it is necessarily hard to understand, compare, and use. This might be the most difficult issue to deal with, but there is no doubt that it will be overcome very soon.