Recently, I developed a handful of demos using open source technologies for detecting and alerting on fraudulent events, incidents of poor customer experience and the arrival of target subjects in geo-fenced locations for marketing purposes. The use cases required detecting individual events from streaming data sources and processing a complex set of rules to identify events of interest and raise alerts that enable data-driven insights and actions.
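To make the pattern concrete, here is a minimal Python sketch of rule-based alerting over a stream of events. It is not the code from the demos; the `Event` fields, the rule names, the payment threshold and the geo-fence coordinates are all hypothetical and chosen only for illustration. A plain Python iterable stands in for the streaming source.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Callable, Iterable, Iterator, Mapping, Tuple


@dataclass(frozen=True)
class Event:
    """One record from the stream (all fields are illustrative assumptions)."""
    subject_id: str
    kind: str            # hypothetical event types: "payment" or "location"
    amount: float = 0.0
    lat: float = 0.0
    lon: float = 0.0


# A rule is any predicate over a single event.
Rule = Callable[[Event], bool]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def large_payment(threshold: float) -> Rule:
    """Flag payments above a threshold (a stand-in for a fraud rule)."""
    return lambda e: e.kind == "payment" and e.amount > threshold


def inside_geofence(lat: float, lon: float, radius_km: float) -> Rule:
    """Flag location events that fall within a circular geo-fence."""
    return lambda e: (e.kind == "location"
                      and haversine_km(lat, lon, e.lat, e.lon) <= radius_km)


def detect(events: Iterable[Event],
           rules: Mapping[str, Rule]) -> Iterator[Tuple[str, Event]]:
    """Yield (rule_name, event) for every rule that an event satisfies."""
    for event in events:
        for name, rule in rules.items():
            if rule(event):
                yield name, event
```

Because `detect` is a generator, it handles events one at a time as they arrive and never needs the whole stream in memory. A production system would spread the same per-event evaluation across a stream processor's workers.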
I selected technologies from the Apache Hadoop ecosystem, namely Kafka, Storm, HDFS and HBase, as they were the best fit for these use cases and had been deployed in large-scale operation by reputable multinational organisations. In addition, I found a vast array of pre-integrated libraries, source code examples and “lessons learned” freely available on the Internet, all of which were instrumental in improving my productivity.
When the rubber hit the road…
As I showed the demos to my colleagues and customers, my decision to use these Big Data technology tools was put to the test. Common questions raised during the demos were: “Why not use Apache Spark Streaming instead of Apache Storm?”; “What would you recommend as the open source technologies to bet on in our three-year Big Data roadmap?”; “Why did you not use Apache Flink, which integrates complex event processing, stream processing and machine learning?”
When all else fails – ask the experts
As Open Source technologies dominate the C-level agenda, questions such as these are not limited to streaming and complex event processing (CEP) technologies; they are expected to become a common prerequisite for developing a Big Data architecture in the enterprise. Unfortunately, for every such question there is a myriad of opinions and no single answer. With new Open Source projects unfolding every week, the task only gets harder! So, I took advice from the experts on Big Data and Open Source, Think Big Analytics.
Find below my key takeaways from their advice.
There is no free lunch – the ‘free puppy’ still needs to be fed though!
Organisations fall into the trap of thinking that selecting and implementing open source is trivial or ‘free’. But open source is as free as a ‘free puppy’ – it comes with all kinds of hidden costs that keep popping up. Every technology comes at a cost that includes acquiring the skillsets to use it, develop for it, and maintain and operate it. Open source is not any different.
Tea leaves and the taste bud
Experimentation to compare open source technologies can open up opportunities, but there is not nearly enough time or resource in an organisation to try everything. While you may be able to install a large number of tools with a few mouse clicks, determining their strengths and limitations can take weeks, even months. Have you acquired a taste for that tea yet?
After all, you only need a wrench or two to fix the sink
There are more than a dozen query engines for SQL on Hadoop. You do not need that many, and selecting the right one is crucial. The ideal approach is to adopt a relatively small number of technologies, be it for SQL engines or other big data technologies, and optimise how the organisation uses them to gain a return on investment. The shiny new wrench may look appealing, but the guy who knew how to use it just left!
Stay the night or build your own blueprint for long-term living
Renting a room may be a good idea for a few nights of temporary stay here and there. But living in your dream home is quite different. It generally starts with a blueprint, a well-defined plan and a path to get there. The same is true of building a big data architecture that differentiates your organisation from its competitors. After all, you want your dream home to be different from that of Mr & Mrs Jones next door.
The problem with adopting an open source tool in response to each immediate need is that it is easy to end up with a multitude of tools that do not work well together. Instead of a “tool mentality,” organisations should take the approach of building a blueprint for big data. Your business objectives and requirements are critical to selecting technologies that meet your needs.
Anyway, these are just a few of the takeaways I picked up as I ventured into the world of Open Source. If you are interested in gaining a detailed understanding of the approach to selecting the right tools and technologies, please download this paper.
Sundara Raman is a Senior Communications Industry Consultant at Teradata. He has 30 years of experience in the telecommunications industry, spanning the fixed line, mobile, broadband and Pay TV sectors. At Teradata, Sundara specialises in Business Value Consulting, business intelligence, Big Data and Customer Experience Management solutions for communication service providers. He frequently writes on a wide range of topics in whitepapers and on the Teradata blog. Prior to joining Teradata, he worked for Etisalat, Telecom New Zealand, Optus Vision, Cable & Wireless Optus, Logica Consulting, 724 Solutions, Motorola Networks and Motorola Connected Home. Sundara has been responsible for product management, solution marketing and pre-sales for new generation networks and services, as well as IT data management strategy, business intelligence, analytics and architecture development. Sundara has lived and worked in Asia, Australia, New Zealand and the Middle East. He has an MBA degree from Massey University, New Zealand.