13 Feb Big Data challenges for EIM
2014 was the year Big Data landed on its feet. We saw businesses and many start-ups actually deploying Big Data and gaining a competitive edge through Big Data Analytics. The technology has matured, and there are plenty of great case studies and success stories.
Big Data is no longer a buzzword – it is changing our behaviour and what we expect as consumers.
Companies like Uber, Yelp, Netflix and many more are Big Data success stories: they have captured market share by keeping costs down and offering a better experience and better services to their customers.
As a consumer, I am excited about Netflix launching in Australia this year. With two young kids, it’s great to have the flexibility to rent movies from the convenience of my lounge room, and to pause and resume at any time without having to worry about returning DVDs and paying late fees. By the way, Netflix has over 57 million members in nearly 50 countries.
Uber is another example of a success story built on Big Data technologies. It uses predictive modelling to recommend that its drivers concentrate their efforts on high congestion areas. It’s a great experience for customers, who get their rides faster, and it’s as simple as the push of a button. I came across a blog post by Justin Kintz (Driving Solutions To Build Smarter Cities) on how the trip data can be used to plan better cities and reduce traffic congestion. The concept is fascinating, but it has raised privacy concerns.
I like and use LinkedIn, which has grown to be the biggest professional network, with over 300 million members across the globe. Over time it has also evolved to provide a better user experience. It gives me suggestions on what I should read and on people I may like to connect with. LinkedIn also matches my profile with similar interest groups and opportunities out there. LinkedIn is dealing with a huge volume of data, and there is an interesting article on how its Data Management team has extended its data platform to Big Data. (Gobblin is the framework at LinkedIn for “big data ingestion for Hadoop-based warehouses”.)
The common differentiator for the success of these businesses has been that they were able to gain a better understanding of their customers’ needs, adapt to those needs, and scale rapidly. These businesses have been able to leverage Big Data Analytics and Big Data storage to get ahead of the competition. They have planned for and invested in expansion into Big Data and analysis, and are reaping the benefits.
If you are a service provider – You will have to think about Big Data
“Big data is not for our business, we just have a Terabyte of data and it’s all in our RDBMS database” – I have heard this many times before in our IM meetings. It’s an attitude which I do not agree with.
I believe that if you are an organisation with an established Information Management stream and the business is dependent on insights from your EDW/BI, then you should plan to extend your Information/Data strategy for Big Data. My reasons include:
• Data volume doubles every 2 years (Data to grow more quickly says IDC’s Digital Universe study). Be prepared for the data explosion when it happens.
• We are increasingly seeing data coming in unstructured formats – documents, memos, media files, graphs – for which our RDBMS databases are not always best suited.
• All major software vendors have something to offer in terms of Big Data, and the great thing is that almost all of them use and support Hadoop as the de facto platform for Big Data. Hadoop provides the technology to parallelize and scale horizontally rather than vertically. Google used the same approach to scale to petabytes of data, and now that technology is available for us to use.
• The messaging and offerings from the vendors are becoming clearer.
• Businesses are better understanding the value in Big Data Analytics to explore previously untapped areas.
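To put the first point in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the "doubles every 2 years" rule holds steadily, which is a simplification, and it uses the 1 Terabyte starting point from the quote above. The function name and parameters are my own, purely for illustration.

```python
def projected_volume_tb(current_tb: float, years: float,
                        doubling_period_years: float = 2.0) -> float:
    """Project data volume, assuming it doubles every `doubling_period_years`."""
    return current_tb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Starting from the "just a Terabyte" mentioned above.
    for years in (2, 4, 6, 10):
        volume = projected_volume_tb(1.0, years)
        print(f"After {years} years: {volume:.0f} TB")
```

Under that assumption, a Terabyte today becomes 32 Terabytes in ten years, which is why it pays to plan before the explosion happens.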
In summary – your business may not use Big Data and Big Data Analytics today, but you should have a plan for it. Businesses will seek answers to their Big Data Questions in 2015.
Where to start with a Big Data plan?
Having a Big Data plan means you need to extend your Information Strategy and Operational plans to incorporate the challenges and complexities presented by the vast amount and variety of Big Data. With Big Data, businesses are presented with a whole new range of insights about their services, products and customers. The Information Strategy should align with the company’s strategic objectives. Some typical areas of focus: Big Data Analytics can drive process efficiencies (in both internal and external processes), lead to the discovery of new market segments, and drive product/service innovation.
There is a change involved in business thinking, and it is also a complex task for the EDW/BI team, who need to upskill and be the enablers of the Big Data technology. It is important to be realistic in setting expectations and to start small.
What does it mean for the EDW/BI teams?
Big Data and Analytics do not replace traditional EDW/BI. Rather, they supplement it. I see it as a new set of technologies that has become available to us for handling data characterised by the 3 Vs (Gartner’s definition of big data). Previously, technology (the cost of vertical scaling and storage limitations) restricted our ability to handle large volumes of data. With Hadoop’s horizontal scaling and NoSQL databases, handling unstructured data has become possible in an efficient and cost-effective way.
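To give a feel for what horizontal scaling means, here is a minimal sketch of the map-and-reduce idea in plain Python. It is not Hadoop code and it runs on a single machine; the function names are my own. The point it illustrates is that each chunk of data can be counted independently (so each chunk could, in principle, go to a different node), and the partial results are merged at the end.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Split the input into n_chunks roughly equal parts (one per 'node')."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(lines, n_chunks=4):
    """Run the map step on every chunk, then reduce the partial results."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(lines, n_chunks)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    memos = ["Big Data is here", "big insights from big data"]
    print(word_count(memos, n_chunks=2).most_common(2))
```

Because the map step never needs to see another chunk, adding more data means adding more nodes rather than buying a bigger server.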
Whether you start with Big Data in a silo on a public cloud or integrate it with your EDW, the core stages/processes followed in EDW will extend to Big Data as well. The data journey would still follow the stages of Data Acquisition -> Data Cleansing -> Data Modelling -> Derive Insights. You would, though, add Refine and Iterate to the process, as Big Data uses inductive statistics to infer relationships and provide insights.
So, in summary, the mission for the EDW/BI team is still to provide insights, and the fundamental EDW processes remain the same; they only need to be adapted for Big Data.
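The stages above, including the added refine-and-iterate loop, can be sketched in a few lines of Python. This is a toy illustration only: the records are made up, the "model" is deliberately trivial (an average), and the refinement rule (dropping values more than three times the current estimate) is an assumption I chose for the example, not a recommended practice.

```python
from statistics import mean


def acquire():
    """Data Acquisition: hypothetical raw records, some of them dirty."""
    return [
        {"customer": "a", "spend": "120"},
        {"customer": "b", "spend": "80"},
        {"customer": "c", "spend": ""},      # missing value
        {"customer": "d", "spend": "100"},
        {"customer": "e", "spend": "n/a"},   # unparseable value
        {"customer": "f", "spend": "5000"},  # suspicious outlier
    ]


def cleanse(records):
    """Data Cleansing: keep only records whose spend parses as a number."""
    clean = []
    for record in records:
        try:
            spend = float(record["spend"])
        except ValueError:
            continue
        clean.append({"customer": record["customer"], "spend": spend})
    return clean


def model(records):
    """Data Modelling: a deliberately trivial model, the average spend."""
    return mean(record["spend"] for record in records)


def refine(records, estimate, tolerance=3.0):
    """Refine: drop records above `tolerance` times the current estimate."""
    return [r for r in records if r["spend"] <= tolerance * estimate]


def run_pipeline(max_iterations=5):
    """Acquire, cleanse and model, then refine and iterate until stable."""
    records = cleanse(acquire())
    estimate = model(records)
    for _ in range(max_iterations):
        refined = refine(records, estimate)
        if len(refined) == len(records):
            break  # nothing left to refine, the estimate is stable
        records = refined
        estimate = model(records)
    return estimate, records


if __name__ == "__main__":
    estimate, records = run_pipeline()
    # Derive insights: report the stable estimate and how much data it rests on.
    print(f"Typical spend: {estimate:.2f} across {len(records)} customers")
```

The shape is the same as a traditional EDW load; the difference is the loop at the end, where the model's own output feeds back into what data you keep.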
In the words of Dr Ralph Kimball: “Many of these practices are recognizable extensions from the EDW/BI world, and admittedly quite a few are new and novel ways of thinking about data and the mission of IT. But the recognition that the mission has expanded is welcome and is in some ways overdue. The current explosion of data-collecting channels, new data types, and new analytic opportunities mean that the list of best practices will continue to grow in interesting ways.” (White Paper: Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics.)
Where do you start?
Big Data creates a big disruption for People, Process and Technology. The challenge is to manage and be prepared in each one of these areas. Start small and scale. Recognise your Big Data assets, and get use case scenarios to start building your first Big Data Analytics. Your Marketing department is likely to be your most critical partner for Big Data.
Do you have the right people to deliver? There is a shortage of Big Data technology skills, and the breadth of skill sets required for building your Big Data capability is unlikely to be found in one person. Hiring a lone Data Scientist may not be the solution. Rather, invest in building a Big Data team that has the breadth of skills to tackle the technical, data and analytical challenges of Big Data.
The HBR article on the Five Roles You Need on Your Big Data Team outlines them as follows:
• Data Hygienists–these people ensure data is clean and accurate always.
• Data Explorers–these people find the data you actually need for your big data project.
• Business Solution Architects–these people “put the discovered data together and organize it so that it’s ready to analyse. They structure the data to ensure it can be usefully queried in appropriate timeframes by all users. Some data needs to be accessed by the minute or hour, for example, so that data needs to be updated every minute or hour.”
• Data Scientists–these people organize the data and build the analytics models for the big data projects. They also revise, update and replace models as necessary.
• Campaign Experts–these people interpret the results and put them into action.
I would suggest upskilling your EDW/BI teams in Big Data technology. Your ETL developers, Data Analysts and BI specialists are the ones who understand the context and have the analytical and interpretive skills, business acumen and creativity to fill these Big Data roles. Invest in upskilling them in Big Data technologies. There are many courses and certifications offered by online universities and software vendors. I found the resources on the Cloudera website helpful. There are free resources and also paid certification paths available for becoming a Big Data Developer/Scientist.
Technology/infrastructure: If you have a tight budget, the cloud is the way to go for building your sandbox environment. There are several cloud providers to get you started. AWS provides Big Data virtual machines and storage. You could start with a 4-node public cluster and then move to a private cloud once you can showcase the benefits and your IM team has matured in the technology and toolsets.
BI and Analytics tools: Most of the BI tools support Big Data. So if you have an existing investment in Tableau, SAP BI, IBM Cognos etc. then your BI team should be able to provide Big Data Insights from within the familiar tools they are already using.
Big Data languages: There are a heap of languages which you could learn for developing MLM (Machine Learning Models). Some of them are Python, R, Hive, Perl etc. My personal pick is Python, which is acknowledged by many practitioners as the simplest to use.
Some other considerations:
• Most of the time and effort is still spent on Data Acquisition and cleansing
• You would be improving your models iteratively
• Ensure that privacy issues are addressed upfront
Summary and conclusion
Data has played, and will continue to play, a critical role in providing executives with the insights to steer the company in the right direction. Traditional EDW has allowed for reporting, historical data analysis and, to some extent, predictive modelling as well. EDW/BI will continue to play an important role in the operations of a company. Big Data should be seen as an extension of EDW and will play a more strategic role. The 3 Vs of Big Data (Volume, Velocity, Variety) provide the perfect ingredients for gathering new insights and driving innovation. Machine Learning models for online ads, new target customers, product pricing and product recommendations are some working examples.
Google, Netflix and Amazon.com are great examples of driving innovation through Big Data. Netflix changed its model to delivering media through online streaming. Amazon.com evolved from an online bookseller into a general online retailer. These companies invested in Big Data technology to innovate and overcome challenges, and they have gained from it. Smaller companies can take a cue from these examples and start building their own capabilities.
Big Data technology has matured, and now is the time to take action and harness the power of Big Data insights. It has immense potential for leading innovation across industries and increasing your profitability. Imagine the possibilities and do not be daunted by the technology itself.