As old as I am, about to turn 50, I realise that I have been working in the advanced analytics industry for over 20 years, since May 1998 to be exact. Of course, back then we called it data mining, a name I still think is genius in itself. Since then, what I do has gone by quite a few different names, buzzwords if you will: predictive analytics, big data analytics, machine learning, artificial intelligence and, my preferred one, advanced analytics, to name a few.
Between you and me, not much has changed since 1998. The algorithms are mainly the same, and so are the use cases I worked with back then. What has changed is the amount of data and its accessibility, both of which have improved quite a bit. The difference between data mining and artificial intelligence lies mainly in deployment: we have a wider range of options today than back then, options that allow us to benefit from the cleverness of the algorithms in an intuitive manner that mimics intelligence. But apart from that, we are still working out of the same portfolio of mathematics, with largely the same business issues to resolve.
I am indeed happy to have been along on this journey, and looking back I have gathered a small portfolio of anecdotes that I want to share with you here.
My first exposure to SPSS Clementine, which later became IBM SPSS Modeler, was a retail use case for the classic example of basket analysis: a customer who buys a ping pong racket gives us a great opportunity to increase the sale with a whopping offer on a 3-pack of ping pong balls. In geek speak we call this association analysis. In retail this is of course a given modus operandi, and I reckon that today it is standard procedure for all retailers, online or in store.
But if we expand the horizon just slightly, we see that the same reasoning applies in other areas. Fraud! An online communication company applied the exact same algorithm to identify patterns in the geographical locations of the IP addresses used by customers who turned out to be fraudulent in one way or another.
So an algorithm that mainly helped us cross-sell certain products in a store could also help us understand that when a customer was using IP addresses from two or more nations, specifically country x and country y, the risk of that customer being fraudulent increased by 400%. That might deserve a double check before we let that one through.
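To make the parallel concrete, here is a minimal sketch of the three numbers that association analysis usually reports for a rule: support, confidence and lift. The data, the item names (`country_x`, `country_y`, `fraud`) and the function name `rule_metrics` are made up for illustration; this is not the retailer's or the communication company's actual setup. A lift of 5 would correspond to the "risk increased by 400%" kind of statement above.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    transactions: list of sets of items (a basket, or the attributes of an account).
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n                            # how often the whole rule occurs
    confidence = n_both / n_ante if n_ante else 0.0  # P(consequent | antecedent)
    baseline = n_cons / n                           # P(consequent) overall
    lift = confidence / baseline if baseline else 0.0
    return support, confidence, lift


# Toy retail baskets (hypothetical data).
baskets = [
    {"racket", "balls"},
    {"racket", "balls"},
    {"racket"},
    {"balls"},
    {"shoes"},
]

# Toy customer accounts (hypothetical data): countries seen in the IP
# addresses, plus a "fraud" marker where fraud was confirmed.
accounts = (
    [{"country_x", "country_y", "fraud"}] * 2
    + [{"country_x", "country_y"}]
    + [{"country_x"}] * 3
    + [{"country_y"}] * 3
    + [{"country_y", "fraud"}]
)

basket_rule = rule_metrics(baskets, {"racket"}, {"balls"})
fraud_rule = rule_metrics(accounts, {"country_x", "country_y"}, {"fraud"})

print("racket -> balls:", basket_rule)
print("country_x + country_y -> fraud:", fraud_rule)
```

The function does not know or care whether the items are products or countries, which is the whole point: the same calculation serves both the cross-sell and the fraud case. A lift above 1 means the consequent shows up more often together with the antecedent than it does on average.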
Another anecdote concerns a Finnish grocery retailer that wanted to increase sales by adding a certain dynamic to the assortment. An expected sales value was calculated from the time of year, the socio-demographics of the customers in the catchment area, and the store's location relative to competitors and to other stores in the same chain. This effectively gave every store manager a recommended assortment, and with it the possibility to optimise sales.
For us geeks, the algorithm was linear regression. Naturally that story is a given, but I am currently engaged with a company in the mining industry that wants to estimate Remaining Useful Lifetime (RUL) for its fleet of trucks and, yes, you saw it coming, it is addressed in a very similar way. By combining telematics data, geographical conditions and information on how the truck has been maintained, a linear regression model produces a fairly accurate estimate. No, the target is not optimised sales in that case but minimised downtime; in essence, very similar.
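For the curious, a linear regression of this kind can be sketched in a few lines. Everything below is an assumption for illustration: the two features (engine hours in thousands, days since last service), the numbers, and the helper names `fit_ols` and `predict` are invented, and the toy data is perfectly linear, which real truck data never is. The fit uses plain ordinary least squares via the normal equations, which is one standard way to do it, not necessarily how any particular tool does it.

```python
def fit_ols(rows, targets):
    """Ordinary least squares: solve (X^T X) b = X^T y for the coefficients b.

    rows: list of feature tuples; targets: list of observed values.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coeffs, row):
    """Apply the fitted model to one feature tuple."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Hypothetical trucks: (engine hours in thousands, days since last service)
trucks = [(2, 10), (4, 30), (6, 20), (8, 50), (10, 40)]
# Hypothetical remaining useful lifetime in days, generated from
# RUL = 1000 - 50 * hours - 2 * days so the fit can be checked by hand.
rul_days = [880, 740, 660, 500, 420]

coeffs = fit_ols(trucks, rul_days)
estimate = predict(coeffs, (5, 25))

print("coefficients:", [round(c, 3) for c in coeffs])
print("estimated RUL for a truck at (5, 25):", round(estimate, 1))
```

Swap the target for expected sales and the features for season, socio-demographics and store location, and the same few lines describe the grocery case. With real, noisy data you would also want to look at the residuals before trusting any estimate.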
My well of anecdotes is far from empty; it is actually filling up even as I type, but I will stop there for a moment with a reflection. Data mining is a versatile tool. Honestly, from an algorithmic standpoint I do not really care whether it is a customer that will buy more, a patient that is about to die or a truck that will get maintained. It all boils down to applying the correct math and deploying it so that it is actionable and lucrative.
If in doubt, please do not hesitate to talk to us at Houston Analytics to learn how applied math can best help you. And no, it doesn't really matter whether analytics is a green field yet to be explored for you or you already have a well-functioning data science practice running: fresh inspiration is always a good thing.