Before building a forecasting model, the team had to resolve discrepancies in the data to ensure accurate predictions. The data scientists at i2e first tackled geographic definition: they cross-referenced the historical sales data with the annual calendars of various regions to understand why sales varied from one geography to another. The team used AWS SageMaker to carry out this geolocation cross-referencing.
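The cross-referencing step can be illustrated with a minimal sketch. It assumes that "annual calendars" means per-region public-holiday lists and that sales arrive as (region, day, units) records. The data, the `tag_holidays` and `holiday_effect` helpers, and the region codes are hypothetical and are not taken from i2e's SageMaker pipeline. The idea is to flag each sale that falls on a regional holiday, then compare average volumes so that calendar-driven dips are not mistaken for data errors.

```python
from datetime import date
from statistics import mean

# Hypothetical daily sales records: (region, day, units sold).
SALES = [
    ("IN", date(2023, 11, 12), 40),
    ("IN", date(2023, 11, 13), 95),
    ("DE", date(2023, 12, 25), 10),
    ("DE", date(2023, 12, 27), 120),
]

# Hypothetical regional calendars: public-holiday dates per region (assumption).
HOLIDAYS = {
    "IN": {date(2023, 11, 12)},
    "DE": {date(2023, 12, 25), date(2023, 12, 26)},
}


def tag_holidays(sales, holidays):
    """Append an is_holiday flag to each record, using that region's calendar."""
    return [
        (region, day, units, day in holidays.get(region, set()))
        for region, day, units in sales
    ]


def holiday_effect(tagged):
    """Mean units sold on holidays vs. ordinary days, grouped by region."""
    by_region = {}
    for region, _day, units, is_holiday in tagged:
        bucket = by_region.setdefault(region, {True: [], False: []})
        bucket[is_holiday].append(units)
    return {
        region: {
            "holiday": mean(b[True]) if b[True] else None,
            "regular": mean(b[False]) if b[False] else None,
        }
        for region, b in by_region.items()
    }


if __name__ == "__main__":
    print(holiday_effect(tag_holidays(SALES, HOLIDAYS)))
```

In a real pipeline the holiday flag would more likely become an exogenous regressor or a cleaning rule than a simple average comparison. The averages here only make the regional calendar effect visible.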
Once the discrepancies were cleaned up, the team applied the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests to determine whether the cleaned series were stationary. This step also served as a check on the data after cleaning. The team then built several models of increasing complexity, from a simple Moving Average to ARIMA, SARIMA and SARIMAX. It also used pmdarima (formerly Pyramid ARIMA) and Prophet (fbprophet), along with LSTM networks, to train on the sales history and predict future sales trends for various API compounds. Finally, the team used deep learning models to estimate upper and lower bounds on the forecasts, reaching a prediction accuracy of 82%.
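The two tests have opposite null hypotheses. ADF assumes a unit root (non-stationary), while KPSS assumes stationarity. For this reason they are often run together, and their joint verdict guides the differencing order d of an ARIMA-family model. The sketch below is only illustrative. The 0.05 significance level, the policy of treating conflicting results as "inconclusive", and the helper names are assumptions, not the team's documented procedure. `run_tests` is a hypothetical caller-supplied function returning `(adf_pvalue, kpss_pvalue)`. In practice it might wrap statsmodels' `adfuller(x)[1]` and `kpss(x)[1]`.

```python
from typing import Callable, Optional, Sequence, Tuple


def stationarity_verdict(adf_pvalue: float, kpss_pvalue: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into a single verdict (assumed policy)."""
    adf_says_stationary = adf_pvalue < alpha      # reject ADF's unit-root null
    kpss_says_stationary = kpss_pvalue >= alpha   # fail to reject KPSS's stationarity null
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "inconclusive"                         # the tests disagree


def difference(series: Sequence[float], lag: int = 1) -> list:
    """Lagged differencing y[t] - y[t - lag]; lag=1 is ordinary first differencing."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def choose_d(
    series: Sequence[float],
    run_tests: Callable[[Sequence[float]], Tuple[float, float]],
    max_d: int = 2,
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the smallest d <= max_d after which both tests agree on stationarity.

    Inconclusive results are treated like non-stationary ones, so the
    series is differenced again. Returns None if no d up to max_d passes.
    """
    current = list(series)
    for d in range(max_d + 1):
        adf_p, kpss_p = run_tests(current)
        if stationarity_verdict(adf_p, kpss_p, alpha) == "stationary":
            return d
        current = difference(current)
    return None
```

The resulting d could then be passed as the middle term of an ARIMA(p, d, q) order. Tools such as pmdarima's `auto_arima` automate a similar search, although their internal test choices may differ from this sketch.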