To do any sort of business planning, you need forecasting. Our reason for existing is to help our customers drive success from their data, so around two years ago we began researching how to add automated forecasting to our cloud analytics platform.

Forecasting Lifecycle

This did, however, present a huge challenge. Take the typical lifecycle of a forecasting project: it's an iterative process of refining a model, generating a forecast and evaluating the results to understand how to refine the model further. The cycle is repeated until you are happy with the results.

Evaluating the results and refining the model are usually manual tasks. They need an analyst with a good understanding of the model and the business or process that generates the data. That’s a pretty specific skillset.

That’s fine if you are creating one forecast for one organisation. We, however, want to scale our business rapidly, taking on lots of different customers. It’s not practical for us to recruit large numbers of highly skilled analysts in a short space of time. So, we need to automate as much of the process as we can.

We looked at a number of solutions. An obvious first choice, as we already use AWS, was Amazon SageMaker, which includes DeepAR. DeepAR is an ML forecasting model based on a recurrent neural network with long short-term memory (LSTM) cells, allowing it to model trends and seasonality. Importantly for us, the platform offers automatic model tuning: you run your data through the model numerous times, and it automatically selects hyperparameters to improve the forecast.
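As a rough illustration (not our production setup), this is roughly what a SageMaker hyperparameter tuning job for the built-in DeepAR algorithm looks like; the bucket, role and tuning ranges below are placeholders:

```python
# Illustrative sketch: tuning the built-in DeepAR algorithm on SageMaker.
# Bucket names, the IAM role and the hyperparameter ranges are placeholders.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter

session = sagemaker.Session()
image_uri = sagemaker.image_uris.retrieve("forecasting-deepar", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path="s3://my-bucket/deepar-output",            # placeholder bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    time_freq="D",          # daily sales data
    prediction_length=28,   # forecast horizon in days
    context_length=28,
    epochs=100,
)

# Let SageMaker search the hyperparameter space to minimise test RMSE
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="test:RMSE",
    objective_type="Minimize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-1),
        "num_cells": IntegerParameter(30, 200),
        "num_layers": IntegerParameter(1, 4),
    },
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train/", "test": "s3://my-bucket/test/"})
```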

We also looked at Facebook Prophet. Unlike DeepAR, it's a statistical model, similar to a generalised additive model (GAM). It copes particularly well with trend, seasonality and holidays, each modelled as an additive component. It's also open source and available in Python and R, so it was easy for us to integrate into our platform.
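A minimal Prophet sketch, assuming a daily sales file with "date" and "sales" columns (the file and column names are illustrative, not our actual schema):

```python
import pandas as pd
from prophet import Prophet  # older releases use "from fbprophet import Prophet"

sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])
df = sales.rename(columns={"date": "ds", "sales": "y"})  # Prophet expects ds/y columns

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.add_country_holidays(country_name="UK")  # holidays as an additive component
model.fit(df)

future = model.make_future_dataframe(periods=365)  # forecast a year ahead
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```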

Finally, we also considered Amazon Forecast, a fully managed service on AWS: you upload your data and it uses automated machine learning (AutoML) to choose and refine an appropriate model. It includes a growing number of algorithms, including ARIMA, DeepAR+ (an enhancement of DeepAR), Prophet, ETS and NPTS.
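As a rough sketch of how a predictor might be trained with the Amazon Forecast API via boto3; the dataset group is assumed to have been created and populated already, and the ARNs and names below are placeholders:

```python
import boto3

forecast = boto3.client("forecast")

# Ask the service to pick and tune an algorithm for us (AutoML)
predictor = forecast.create_predictor(
    PredictorName="jewellery_sales_automl",
    ForecastHorizon=28,        # days ahead to predict
    PerformAutoML=True,
    InputDataConfig={
        "DatasetGroupArn": "arn:aws:forecast:eu-west-1:123456789012:dataset-group/sales"
    },
    FeaturizationConfig={"ForecastFrequency": "D"},
)

# Training is asynchronous; in practice you poll describe_predictor until
# the predictor is ACTIVE before generating a forecast from it.
result = forecast.create_forecast(
    ForecastName="jewellery_sales_2018",
    PredictorArn=predictor["PredictorArn"],
)
```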

We began the project using real-world data from one of our existing customers, a jewellery retailer. We chose them primarily because they were enthusiastic to take part, but also because their data was particularly challenging to forecast. The nature of their sales means the data is noisy: outliers are often high-value sales that occur at random, yet they account for a large proportion of the value, so they cannot be discounted. The sales are also highly seasonal, with the 5 weeks leading up to Christmas accounting for as much as 50% of the sales.

1 Yr Sales

We trained the models on three years of data, from 2015 to 2017, then tested the resulting forecasts against the real-world results for 2018. For comparison we had the manual forecasts the company had used at the time, but unfortunately these were only detailed enough for the fairly rudimentary measure of total difference over 12 months.
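As a rough sketch of that split (the file and column names are assumptions, continuing the Prophet example above):

```python
# Hold back 2018 for evaluation; train on 2015-2017 only.
import pandas as pd

sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])  # hypothetical file
train = sales[sales["date"] < "2018-01-01"]
actual_2018 = sales[sales["date"] >= "2018-01-01"]
```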

On total error over 12 months, all the methods we tested significantly outperformed the manual forecast. The best result came from Prophet, with a total error of under 300k (out of annual sales of over 100 million). This compares with an error of over 15 million for the manual forecast. However, on more meaningful measures (RMSE and MAE), DeepAR gave us the most accurate results.
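A sketch of the measures we quote, assuming we already have aligned daily actuals and predictions for 2018 (the `predictions_2018` array below is hypothetical and could come from any of the models above):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = actual_2018["sales"].to_numpy()   # the held-back 2018 actuals from the split above
y_pred = np.asarray(predictions_2018)      # hypothetical: one model's daily predictions for 2018

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
total_error = abs(y_true.sum() - y_pred.sum())  # the crude 12-month measure
print(f"MAE: {mae:,.0f}  RMSE: {rmse:,.0f}  Total error over 12 months: {total_error:,.0f}")
```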

Results

In the 9 months since we conducted this research, we've integrated automated forecasting into our platform. We initially chose Prophet, even though DeepAR was arguably more accurate. Our reasoning was that Prophet was significantly easier to configure and less resource-hungry to train; given the relatively small increase in accuracy DeepAR gave us, the ease with which we could scale the solution with Prophet won the day. In more recent months, with the added complication of forecasting during the Covid-19 crisis, we've begun using a combination of Prophet and LightGBM, with some success.
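We don't describe the exact combination here, but one common pattern, shown purely as an illustration, is to let Prophet model trend, seasonality and holidays, then train LightGBM on the residuals using extra calendar features:

```python
import pandas as pd
import lightgbm as lgb
from prophet import Prophet

sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])  # hypothetical file
df = sales.rename(columns={"date": "ds", "sales": "y"})

base = Prophet(yearly_seasonality=True, weekly_seasonality=True)
base.fit(df)
fitted = base.predict(df[["ds"]])

# Residuals = what Prophet couldn't explain; model them with gradient boosting
df["residual"] = df["y"].to_numpy() - fitted["yhat"].to_numpy()
df["dow"] = df["ds"].dt.dayofweek
df["month"] = df["ds"].dt.month

booster = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
booster.fit(df[["dow", "month"]], df["residual"])

# Final forecast = Prophet's component forecast + LightGBM's residual correction
future = base.make_future_dataframe(periods=28)
prophet_fc = base.predict(future)
features = pd.DataFrame({"dow": future["ds"].dt.dayofweek, "month": future["ds"].dt.month})
combined = prophet_fc["yhat"].to_numpy() + booster.predict(features)
```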

It has been fascinating to see how quickly this field has developed in the last year. The proliferation of available models, if anything, has gathered pace. We’ve continued to test different algorithms and techniques, and the level of automation and accuracy of our forecasting increases almost daily. We are already much closer to the goal of completely automating forecasting for industry than we thought possible 12 months ago. Who knows where we’ll be in another 12 months!

The market chatter about Big Data and AI is relentless. For Big Data, the statistics that many of us in the tech industry see bandied about are certainly eye-catching: 2.7 zettabytes of data exist in the digital universe today, 571 new websites are created every minute of the day, and by 2020 business transactions on the internet will reach 450 billion per day. For AI, they are no less impressive: more than $300 million in venture capital was invested in AI startups in 2014, a 300% increase over the year before; by 2018, 75% of developer teams will include AI functionality in one or more applications or services; and by 2020, 30% of all companies will employ AI to augment at least one of their primary sales processes.
 
However, for many people not directly involved in the tech industry or the IT department of a huge multinational, it's difficult to see how these grandiose claims have any relevance to their day-to-day tasks. The real issue is that, until recently, to do anything innovative with Big Data or AI you needed highly skilled data scientists versed in seemingly impenetrable technologies like NoSQL, R, MapReduce or Scala. These specialists are hard to come by, expensive, and not getting cheaper. IBM predicts that demand for data professionals in the US alone will reach 2.7 million by 2020.
 
However, that's not the complete picture. Computers entered the business world as the preserve of large organisations like J. Lyons & Company and the U.S. Census Bureau; they were later used more widely as companies that could afford the huge cost of buying them provided services to others; and finally the productisation of computers by the likes of IBM allowed almost every organisation to buy their own. Big Data and AI are going through the same process of democratisation.
 
The three major cloud providers, Microsoft, Google and Amazon, are amongst a host of vendors that now offer scalable and affordable Big Data platforms that can be spun up in seconds. In the last few years, all three have also started offering API-driven AI services bound into their cloud platforms. More importantly, those Big Data platforms and AI APIs are now becoming easily accessible from more traditional development environments like .NET. This means that millions of traditional developers can now leverage Big Data and AI without leaving the comfort of their familiar development environment.
 
The natural consequence of this will be an explosion of products that leverage Big Data and AI technologies, available to even the smallest organisations and allowing the huge opportunities to filter down to all. In fact, here at DataPA we have spent the last twelve months working hard on a new automated analytics product leveraging Big Data and AI techniques, which we are hugely excited about launching in the coming months. The world is on the cusp of a change that will rival the industrial revolution, and we are excited about sharing that journey with all our customers and partners in the coming months and years.
 