Data Lakes - The Way Forward


Data lakes have been in the big data news lately. They have been called everything from a terrible idea to THE big data trend to watch in 2015, and everyone has something to say about them. Enterprises, especially large ones, are wondering whether to make the switch to this de-structured, easily accessible form of data storage. Consultants stand divided, citing pros and cons, while the fence sitters wait with bated breath to see which way the coin will eventually fall.

So, what are data lakes exactly? Imagine a repository where all the silos have been broken down to create a free-flowing, easily accessible environment for data, one that can scale to meet the company’s future needs as and when that data is required. This is a data lake. It is the opposite of a data warehouse, which collects and sorts data before storing it in the relevant silo, and which ultimately discards older data to make room for newer data.

Now, as much as one may question the risks and advantages of making this big shift, there is no doubt that the shift is inevitable. A huge amount of data is already being produced and stored, and it is only growing. As this data grows, the demand for storage and mining solutions grows with it. That data lakes offer a solution at a low cost makes them very desirable. The other factor that makes them attractive to various industries is the agility and the options they offer for gaining insight into that data. But the most controversial factor is the breakdown of silos: the availability of all data across all levels, leading to true data democratization!

So what happens when data silos are broken down, and why is it so terrible according to most people? With data stored in no particular structure and with no definitions, it is accessible to all departments at all levels to be played with. While this is obviously a security risk, the bigger argument is that not everyone is qualified to make sense of this data, and that affects data lineage and quality. There is also the concern that, since data lakes by themselves are not equipped to provide any clarity, different departments and people will come up with different results, creating more chaos than solutions.
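To make that last concern concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the order records, the field names, and the two departmental rules for what counts as revenue. It only shows how the same raw data, stored without shared definitions, can honestly produce two different answers.

```python
# Hypothetical raw records, as they might sit in a lake with no shared definitions.
raw_orders = [
    {"order_id": 1, "amount": 100.0, "status": "shipped"},
    {"order_id": 2, "amount": 250.0, "status": "cancelled"},
    {"order_id": 3, "amount": 75.0, "status": "shipped"},
    {"order_id": 4, "amount": 40.0, "status": "returned"},
]


def sales_revenue(orders):
    """Sales (assumed rule): count every order that was placed."""
    return sum(order["amount"] for order in orders)


def finance_revenue(orders):
    """Finance (assumed rule): count only orders with status 'shipped'."""
    return sum(order["amount"] for order in orders if order["status"] == "shipped")


# Both departments read the same records but report different totals.
print("Sales view:  ", sales_revenue(raw_orders))
print("Finance view:", finance_revenue(raw_orders))
```

Neither number is wrong. The two departments simply answered different questions, which is exactly the kind of confusion critics expect when definitions are not agreed on up front.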

While these arguments are sound, they tend to obscure the enormous world of possibilities that data lakes open up to us. Massaging data and making it answer our questions is the way we operate at this point in time. But how do we know which questions to ask? How do we know whether the question we are asking is the correct one? The answer is that we don’t. Thus, when we choose to break down silos and play with data across functions and departments, we open ourselves to questions and realizations we never imagined. Up until now everything has been built around simple questions that we defined over the years. Imagine that five years from today, with the help of data lakes, we have access to all our data, old and new, and are able to ask and answer newer, more refined questions. The openness combined with the agility alone makes data lakes one of the better solutions going forward.

So, is it time to roll up our sleeves and turn to data lakes? Are they the real future? We have a long way to go and a lot to learn in the world of free-flowing data, but at the same time we cannot ignore that more and more non-technical departments are turning to data for answers, and these answers can come our way only with a clear, open structure. We can compare this change to the arrival of Amazon on the Internet. At that time many businesses wondered if they really needed a website. Today, 20 years later, that question has very clearly been answered. The companies that survived were those that made the switch. The real question is not whether this is the future, but whether we are truly prepared to make the transition. Making this transition obviously means sharing data, but it also means letting go of your data and letting go of the command of your silo. It is the companies that are prepared to let go of the established rules of data ownership and power that will be able to survive this switch. These are the companies that will gain a competitive advantage in the time to come.