Hands up who has NOT heard of R? If you work in the data analytics space and have an internet connection, then you will have heard of R, the open source programming language for predictive analytics and statistical computing that has taken the analytics world by storm.
Like most things, it has taken time to reach critical mass, and I would say that R has very much reached that point. It was first released back in 1995, with the first stable version (1.0.0) following in 2000. We had heard about R in various contexts before, but there was no specific requirement to start using the tool in anger, or so we thought.
Enter Machine Learning
All of our predictive modelling was done using other proprietary tools, which were giving us good results. Unfortunately, the predictive models that we were building were offline and static in nature and took some time to develop. Enter Machine Learning.
A ‘proper’ Machine Learning system is dynamic by nature and requires its models to track recent trends in the data. I say ‘proper’ because static modelling can be considered one form of Machine Learning (like predicting the survivors of the Titanic on the data science website Kaggle). When regular retraining is a requirement, one cannot hand-craft every model; it is just too onerous. So one needs to work with a tool that can build predictive models more quickly. Sure, you might lose some predictive power by not binning characteristics in the optimal way or taking extra care with missing values, but what you lose by cutting back on the TLC, you gain by retraining on more recent data.
This is particularly relevant in dynamic environments such as the call centre space, where agents come and go at an alarming rate, diallers change, and the underlying data changes at a fundamental level. By the way, we now have some great tricks in R that dramatically narrow the gap between “quick-and-dirty” and “hand-crafted”, but more on that in another blog post.
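The trade-off between a model that is built once and left alone and a rough model that is retrained often can be illustrated with a toy example. The sketch below is plain Python and makes some loud assumptions: the "model" is nothing more than an average, the data is a made-up series whose level shifts halfway through, and the function names, training size and window size are all invented for illustration. It is not our production approach, just the idea in miniature.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_absolute_error(predictions, actuals):
    """Average absolute gap between predictions and observed values."""
    return mean([abs(p - a) for p, a in zip(predictions, actuals)])


def compare_static_vs_retrained(series, train_size=30, window=10):
    """Score a train-once model against one retrained at every step.

    Both 'models' simply predict an average (an assumption made to keep
    the sketch short). The static model averages the first `train_size`
    points once; the retrained model averages the latest `window` points
    before each prediction.
    """
    actuals = series[train_size:]

    # Static model: fitted once on the earliest data, never updated.
    static_prediction = mean(series[:train_size])
    static_preds = [static_prediction] * len(actuals)

    # Retrained model: refitted on the most recent window before each point.
    retrained_preds = [
        mean(series[t - window:t]) for t in range(train_size, len(series))
    ]

    return (
        mean_absolute_error(static_preds, actuals),
        mean_absolute_error(retrained_preds, actuals),
    )


# Made-up data: the underlying level jumps from 10 to 20 halfway through,
# standing in for a change of dialler or a wave of new agents.
series = [10.0] * 50 + [20.0] * 50
static_mae, retrained_mae = compare_static_vs_retrained(series)
print(f"static MAE: {static_mae:.2f}, retrained MAE: {retrained_mae:.2f}")
# prints: static MAE: 7.14, retrained MAE: 0.79
```

The static model never recovers from the shift, while the crude retrained one catches up within a single window. Real models and real data are messier, but this is the effect we are leaning on.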
There are many Machine Learning tools out there that can do a good job. But they all cost ‘quite a bit’, and in this fast-changing space you cannot be sure that your carefully selected (and expensive) tool will still be top of the pile in a year’s time. Plus, you need to up-skill on that tool, and that takes additional time. One thing is sure though: Microsoft will be around for some time.
What does Microsoft have to do with R?
But what does Microsoft have to do with R? Quite a bit actually. In April 2015, Microsoft took the most amazing leap forward and purchased Revolution Analytics. Revolution Analytics were the ones you contacted if you wanted to integrate R into your business, and they were doing a pretty good job. Let’s just say they knew R pretty well.
In purchasing Revolution Analytics, Microsoft bought the IP that would allow them to incorporate R into all their mainstream products, which they are wasting no time in doing, and we are loving them for it. Take Power BI, Microsoft’s BI solution, as an example. It’s dirt cheap (for now), and Microsoft are taking the BI world by storm by investing millions in its development and upgrading it aggressively in line with user feedback. It is currently in the most favourable position in Gartner’s Magic Quadrant for BI tools. An R console is available on both the back end (data load) and the front end (user interface).
On the back-end side, this means that you can manipulate data using the sqldf package, which is built on SQLite. If you know SQL, you will LOVE this. You can join tables, create new fields, and manipulate tables to your heart’s content. Very few BI tools have this capability (QlikView being the exception, and this is one feature that I love about QlikView). Basically, whatever works in native R works in Power BI. Brilliant!
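To give a flavour of the join-and-derive style described above, here is a minimal sketch. It is written in Python against SQLite directly, using the standard sqlite3 module, rather than in R with sqldf, so that it runs with nothing extra installed. The agents and calls tables, their columns and the function name are all invented for illustration. The SELECT statement is the part that matters: a query of this shape is the kind of thing you would hand to sqldf in an R script.

```python
import sqlite3


def agent_call_summary():
    """Join two small tables and derive new fields with plain SQL.

    The tables and their contents are hypothetical sample data.
    """
    # An in-memory SQLite database, so nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, team TEXT);
        CREATE TABLE calls (
            call_id  INTEGER PRIMARY KEY,
            agent_id INTEGER,
            seconds  INTEGER,
            sale     INTEGER   -- 1 if the call ended in a sale, else 0
        );
        INSERT INTO agents VALUES (1, 'North'), (2, 'South');
        INSERT INTO calls VALUES
            (1, 1, 120, 1),
            (2, 1, 300, 0),
            (3, 2,  60, 1),
            (4, 2, 180, 1);
        """
    )

    # Join calls to agents, then create new fields per team:
    # a call count, total talk time in minutes, and a sale rate.
    rows = conn.execute(
        """
        SELECT a.team,
               COUNT(*)              AS calls,
               SUM(c.seconds) / 60.0 AS minutes,
               AVG(c.sale)           AS sale_rate
        FROM calls AS c
        JOIN agents AS a ON a.agent_id = c.agent_id
        GROUP BY a.team
        ORDER BY a.team
        """
    ).fetchall()
    conn.close()
    return rows


for team, calls, minutes, sale_rate in agent_call_summary():
    print(team, calls, minutes, sale_rate)
```

If you already think in SQL, this is why the approach feels so natural: the join, the grouping and the derived fields all live in one readable statement.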
On the front-end side, things get interesting. Again, anything you can do in R, you can do in Power BI. This throws the door wide open in ways you may not have realised. Here is a link showing just some of the visuals you can achieve using R (note: using R and not Power BI’s built-in functions).
What about SQL2016?
And then there is SQL2016. Dear, dear SQL2016, so happy you arrived. Traditionally, R has been better suited to research and small-scale use cases because of its inability to process and model big data efficiently.
Some clever people have developed pretty cutting-edge R libraries that compete with the big hitters like SAS, but the limitation has always been data size. Bringing R into SQL2016 solves this issue. Retraining using any of the powerful R libraries just got a whole lot quicker. Here is a case study from the Microsoft blog that illustrates this nicely and contains a pretty convincing quote: “PROS Holdings uses SQL Server 2016’s superior performance and built-in R Service to deliver advanced analytics more than 100x faster than before, resulting in higher profits for their customers”.
Here is a great link showing why R and SQL are a match made in heaven (in particular around the 2m30s mark).
R covers more than descriptive, predictive and prescriptive data analytics. There are over 7,000 packages available that make this tool extremely versatile, from image manipulation to heat maps to linking to any type of DB, such as Salesforce.
We started off by asking, “I wonder if there is an R package for that?”, but this has become a running rhetorical question. RStudio have even created a web service offering that allows you to build a very attractive UI around your R code and showcase the resulting product to the outside world. Check out their gallery.
So we are pretty excited about everything that R can bring to our table, and we’d love to put our skills and passion for R to work for your business.
If you’d like us to use our R skills to develop some models that can predict outcomes for your business and answer business critical questions, just drop us a line!
Image credit: Designed by Freepik