Methodology and Workflow for Models in this Project

Methodology

Artificial neural networks, particularly recurrent neural networks, are inherently suitable for capturing the inter-temporal dependencies of sequential data.

Also, particular types of recurrent units, such as the Long Short-Term Memory (LSTM) unit, are designed to capture relevant information across varying lengths of lagged periods.

To predict the value of one period, we use historical observations as the features and train the model to fit the value in the target period.

We use various optimization algorithms, typically the Adam optimizer, to train the model to minimize the mean squared error between predicted values and actual values.

After each training session, an expanded set of metrics, including mean squared error (MSE), mean absolute percentage error (MAPE), and root mean squared error (RMSE), is calculated from the neural network's predictions and the actual series. To measure performance, we also implemented various time series models as benchmarks and compare their accuracies.
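For concreteness, these metrics can be computed directly from the predicted and actual series. The sketch below uses NumPy and a hypothetical `evaluate_forecast` helper rather than the project's own evaluation code.

```python
import numpy as np

def evaluate_forecast(y_true, y_pred):
    """Compute MSE, RMSE, and MAPE for a forecast (illustrative sketch only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)                       # mean squared error
    rmse = np.sqrt(mse)                              # root mean squared error
    mape = np.mean(np.abs(errors / y_true)) * 100.0  # mean absolute percentage error (%)
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape}
```

Note that MAPE divides by the actual values, so series containing zeros would need special handling.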

We have implemented several baseline neural networks, including a multi-layer LSTM; detailed demonstrations are available on the demo page.
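As a rough sketch of what such a baseline looks like, a multi-layer LSTM for one-step-ahead forecasting can be written in Keras as follows. The layer sizes, the `build_baseline_lstm` name, and the use of Keras are illustrative assumptions, not the exact demo code.

```python
import tensorflow as tf

def build_baseline_lstm(lag, hidden_units=(64, 32)):
    """Multi-layer LSTM sketch: `lag` historical periods in, one prediction out."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(lag, 1)),                          # univariate series, `lag` time steps
        tf.keras.layers.LSTM(hidden_units[0], return_sequences=True),
        tf.keras.layers.LSTM(hidden_units[1]),
        tf.keras.layers.Dense(1),                                # predicted value for the target period
    ])
    # Adam optimizer minimizing mean squared error, as described above.
    model.compile(optimizer="adam", loss="mse")
    return model
```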

Basic Workflow

I. Data Preprocessing

i. Differencing the Raw Dataset

In our baseline neural network, we use a univariate time series as our main dataset.

A typical dataset would look like a sequence of observations indexed by time, $x_1, x_2, \ldots, x_T$.

We first apply differencing to the dataset to ensure the stationarity of our time series, so that the value at each period $t$ in the transformed series is defined as

$$x^{(d)}_t = \Delta^d x_t, \qquad \Delta x_t = x_t - x_{t-1},$$

where $d$ is the degree of differencing.

The above process can be applied recursively if the first-order differenced time series is still suspected of non-stationarity.

The total number of differencing iterations is governed by the order parameter in data pre-processing.
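A minimal sketch of this differencing step, assuming a NumPy array as input; the `difference` helper and its `order` argument mirror the description above but are not the project's actual pre-processing code.

```python
import numpy as np

def difference(series, order=1):
    """Apply first differencing `order` times (the degree of differencing d)."""
    diffed = np.asarray(series, dtype=float)
    for _ in range(order):
        diffed = diffed[1:] - diffed[:-1]   # x'_t = x_t - x_{t-1}
    return diffed
```

Equivalently, `np.diff(series, n=order)` performs the same transformation in a single call; note that the real pipeline also needs to keep the dropped leading values in order to invert the transform later.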

ii. Generating a Supervised Learning Problem

To train our model, we first convert the time series into a typical supervised learning problem so that we can train neural networks with error-minimization-oriented algorithms.

With a user-specified lag variable $L$, the supervised learning problem (SLP) generator loops over the entire dataset and, for each period $t$, marks the series ranging from $x_{t-L}$ to $x_{t-1}$ as the training feature and the value $x_t$ at period $t$ as the label.

For each training sample, using sequence notation, the feature would be

$$(x_{t-L}, x_{t-L+1}, \ldots, x_{t-1})$$

and the label is

$$x_t.$$

By dropping the first few observations in the time series (since we don't have sufficient historical data to make predictions on them), we can generate roughly as many feature-label pairs as the length of the time series.
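A sketch of the SLP generator described above, using the hypothetical name `make_slp`; the trailing feature dimension is added so the output can feed an LSTM directly.

```python
import numpy as np

def make_slp(series, lag):
    """Turn a (differenced) univariate series into feature/label pairs.

    Each feature is the window (x_{t-lag}, ..., x_{t-1}) and the label is x_t.
    The first `lag` observations are dropped because they have no full window.
    """
    series = np.asarray(series, dtype=float)
    features, labels = [], []
    for t in range(lag, len(series)):
        features.append(series[t - lag:t])
        labels.append(series[t])
    X = np.stack(features)[..., np.newaxis]  # shape: (samples, lag, 1) for the LSTM
    y = np.asarray(labels)
    return X, y
```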

iii. Splitting SLP

After generating the collection of samples, we split them into three subsets for training, testing, and validation purposes. Typically, ratios of 0.6:0.2:0.2 or 0.7:0.15:0.15 are chosen, depending on the total number of observations in the raw dataset.
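For illustration, assuming the split is done chronologically (the document does not state whether samples are shuffled), a hypothetical `split_slp` helper might look like:

```python
def split_slp(X, y, ratios=(0.6, 0.2, 0.2)):
    """Split samples into training / testing / validation subsets in time order."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    n = len(X)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:n_train + n_test], y[n_train:n_train + n_test]
    X_val, y_val = X[n_train + n_test:], y[n_train + n_test:]
    return (X_train, y_train), (X_test, y_test), (X_val, y_val)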

II. Training the Model

In each training session, the specified optimizer with the given hyperparameters tries to minimize the mean squared error between predictions and actual values on the training set only.

Moreover, loss metrics are evaluated and recorded periodically to avoid over-fitting.

The model structure (graph) and weights are stored after the training session finishes.
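Putting the training step together, a minimal, self-contained Keras sketch of one session, with periodic validation-loss monitoring and saving of the graph and weights, could look like the following. All names, shapes, file paths, and hyperparameters are placeholders, not the project's configuration.

```python
import numpy as np
import tensorflow as tf

lag = 12  # hypothetical lag length for a univariate series
model = tf.keras.Sequential([
    tf.keras.Input(shape=(lag, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# X_train/y_train and X_val/y_val would come from the SLP split described above;
# random stand-ins are used here so the sketch runs on its own.
X_train, y_train = np.random.rand(100, lag, 1), np.random.rand(100)
X_val, y_val = np.random.rand(20, lag, 1), np.random.rand(20)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),          # validation loss recorded every epoch
    epochs=50, batch_size=16,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                                restore_best_weights=True)],
    verbose=0,
)

model.save("baseline_lstm.keras")            # persists both the graph and the weights
```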

III. Evaluating and Visualizing the Model

After the training session is finished, we evaluate the model with various performance metrics and compare it with benchmark models from time series analysis.
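The document does not specify which time series benchmarks are used; as one illustrative possibility, an ARIMA benchmark could be fitted with statsmodels and scored on the same held-out range (the `(p, d, q)` order below is a placeholder).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def arima_benchmark(series, test_len, order=(1, 1, 0)):
    """Fit an ARIMA benchmark on the training portion and forecast the test range."""
    series = np.asarray(series, dtype=float)
    train, test = series[:-test_len], series[-test_len:]
    fitted = ARIMA(train, order=order).fit()
    forecast = fitted.forecast(steps=test_len)
    rmse = float(np.sqrt(np.mean((forecast - test) ** 2)))
    return forecast, rmse
```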

Also, TensorBoard log files are stored together with the model structure and weights after training, so one can use TensorBoard to navigate the structure of the model and view detailed performance metrics.
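Assuming the training pipeline uses Keras, the TensorBoard event files mentioned above would typically be produced with the built-in callback; the log directory below is a placeholder.

```python
import tensorflow as tf

# Write TensorBoard event files alongside the saved model during training.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs/baseline_lstm")
# model.fit(..., callbacks=[tensorboard_cb])
```

After training, running `tensorboard --logdir logs/baseline_lstm` from a shell opens the dashboard for browsing the model graph and the recorded metric curves.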