This article details the study of time series. The objective is to analyze the behavior of these series in order to understand their components and to make predictions.
Definition
A time series is a set of data which represents the evolution of a phenomenon over time. It is characterized by:
Component 1, the trend: general evolution of the series
Component 2, seasonality: variation of values over a defined period of time (week / month / year)
Component 3, noise (or residual): events that cannot be predicted
With these three components and a suitable statistical model, it is possible to summarize the data and forecast future values.
How to choose your model?
There are two main types of model:
The additive model where we sum the three components
The multiplicative model where we multiply the three components
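Written out, the two models can be sketched as follows. The symbols are our own notation, not taken from the text above: y_t is the observed value at time t, and T_t, S_t and R_t are the trend, seasonal and noise components.

```latex
\text{Additive: } y_t = T_t + S_t + R_t
\qquad
\text{Multiplicative: } y_t = T_t \times S_t \times R_t
```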
To choose which model to use, you have to observe whether the amplitude of the seasonal variation grows or shrinks as the trend changes.
The method to make this observation is as follows:
Connect the maxima to one another
Connect the minima to one another
Study the parallelism between the two lines
If the lines are parallel, the additive model is the most appropriate. If the lines diverge, the multiplicative model should be chosen.
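The visual check above can be approximated numerically. The sketch below is one possible reading of it, not a standard test: it takes the maximum and the minimum of each season, fits a straight line through each set with NumPy, and compares the two slopes. The function names, the tolerance tol and the synthetic series are all assumptions made for illustration, and the period is assumed to be known in advance.

```python
import numpy as np


def envelope_slopes(y, period):
    """Fit a straight line through the per-season maxima and minima.

    Returns (slope of the maxima line, slope of the minima line),
    both expressed per season.
    """
    y = np.asarray(y, dtype=float)
    n_seasons = len(y) // period
    # One row per complete season; an incomplete final season is dropped.
    seasons = y[: n_seasons * period].reshape(n_seasons, period)
    idx = np.arange(n_seasons)
    slope_max = np.polyfit(idx, seasons.max(axis=1), 1)[0]
    slope_min = np.polyfit(idx, seasons.min(axis=1), 1)[0]
    return slope_max, slope_min


def suggest_model(y, period, tol=0.1):
    """Suggest 'additive' if the two lines are roughly parallel.

    tol is a hypothetical relative tolerance on the slope difference.
    """
    slope_max, slope_min = envelope_slopes(y, period)
    scale = max(abs(slope_max), abs(slope_min), 1e-12)
    if abs(slope_max - slope_min) / scale <= tol:
        return "additive"
    return "multiplicative"


# Synthetic monthly series (illustrative data only).
t = np.arange(120)
season = np.sin(2 * np.pi * t / 12)
additive_series = 0.5 * t + 10 + 5 * season
multiplicative_series = (0.5 * t + 10) * (1 + 0.3 * season)

print(suggest_model(additive_series, period=12))
print(suggest_model(multiplicative_series, period=12))
```

On real data the tolerance would need tuning, and a plot remains the most reliable way to judge whether the lines diverge.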
Example applications of these models:
In the example above, for the plot on the left, the gap between the two lines remains approximately the same, so the additive model is the most suitable.
Statistical Methodology of the Decomposition
Thus the time series can be broken down into 3 components.
In the additive model, we start by calculating the trend. It can be estimated in several ways, for example with a parametric method such as a least-squares fit. Depending on the model, the trendline may be:
linear: y = at + b
quadratic (order 2): y = at² + bt + c
exponential: y = a exp(wt)
ARIMA: for non-stationary series
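As a minimal sketch of the least-squares estimate for the first three forms, the example below uses NumPy's polyfit. The function names and the sample data are our own assumptions. Note that the exponential fit is done by fitting a straight line to ln(y), which minimizes the error in log space rather than on y itself and requires strictly positive values. ARIMA is not covered here.

```python
import numpy as np


def fit_polynomial_trend(y, degree=1):
    """Least-squares polynomial trend (degree 1: linear, degree 2: quadratic).

    Returns the coefficients (highest power first) and the fitted trend.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    coeffs = np.polyfit(t, y, degree)
    return coeffs, np.polyval(coeffs, t)


def fit_exponential_trend(y):
    """Fit y = a * exp(w * t) by a linear least-squares fit on ln(y).

    Requires y > 0. Returns (a, w).
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    w, log_a = np.polyfit(t, np.log(y), 1)
    return np.exp(log_a), w


# Illustrative data only: an exact line and an exact exponential.
t = np.arange(50)
linear_coeffs, linear_trend = fit_polynomial_trend(2.0 * t + 3.0, degree=1)
a, w = fit_exponential_trend(3.0 * np.exp(0.05 * t))

print(linear_coeffs)  # slope first, then intercept
print(a, w)
```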
For seasonality, the objective is to find a pattern that repeats at a fixed interval. We must first remove the trend component, then identify the period of the season and the shape of its pattern.
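One simple way to estimate the seasonal pattern in the additive case, assuming the period is already known, is to subtract the trend and average the remaining values position by position within the season. The sketch below illustrates this. The function name and the sample data are our own assumptions, and other estimators exist.

```python
import numpy as np


def seasonal_pattern(y, trend, period):
    """Average the detrended values at each position within the season.

    Returns an array of length `period`, centred so that it sums to zero.
    Assumes the series covers at least one full season.
    """
    detrended = np.asarray(y, dtype=float) - np.asarray(trend, dtype=float)
    phase = np.arange(len(detrended)) % period
    pattern = np.array([detrended[phase == k].mean() for k in range(period)])
    return pattern - pattern.mean()


# Illustrative data only: a linear trend plus a repeating 4-step pattern.
t = np.arange(48)
trend = 0.2 * t + 5.0
true_pattern = np.array([1.0, -1.0, 2.0, -2.0])
y = trend + np.tile(true_pattern, 12)

estimated = seasonal_pattern(y, trend, period=4)
print(estimated)
```

Centring the pattern keeps the overall level of the series in the trend rather than in the seasonal component.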
The noise, or residual, is what is left after removing the trend and seasonal components. It is generally assumed to be Gaussian white noise.
Note: A multiplicative model can be reduced to an additive model by taking the natural logarithm of the time series, after which the previous decomposition applies.
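In symbols, the note above amounts to the following identity, assuming every value of the series is strictly positive so that the logarithm exists. Here y_t, T_t, S_t and R_t are our own notation for the observed value, the trend, the seasonality and the noise at time t.

```latex
y_t = T_t \times S_t \times R_t
\quad\Longrightarrow\quad
\ln y_t = \ln T_t + \ln S_t + \ln R_t
```

The additive decomposition is then applied to ln y_t, and taking the exponential of each estimated term returns the multiplicative components.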
We can evaluate the share of each component by calculating its variance and comparing it with the variance of the time series. Variance measures how far the values of a curve spread around their mean. Dividing the variance of each component by the variance of the time series gives the proportion of variance attributable to that component. The greater a component's share of the variance, the more it explains the phenomenon. Thus, a market with strong seasonality will have a seasonal component with a high variance.
Note: The variance shares of the three components do not necessarily sum to 100%, because the variance of a sum equals the sum of the variances only when the components are uncorrelated. However, the shares can be rebased to 100%.
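The calculation can be sketched as follows, under the assumption that the three additive components have already been estimated. The function name, the dictionary keys and the synthetic data are ours. The raw shares are each component's variance divided by the variance of the series, and the rebased shares are the raw shares divided by their sum.

```python
import numpy as np


def variance_shares(y, trend, seasonal, residual):
    """Return (raw shares, shares rebased to sum to 1) for each component."""
    total = np.var(y)
    components = {"trend": trend, "seasonal": seasonal, "residual": residual}
    raw = {name: np.var(values) / total for name, values in components.items()}
    raw_sum = sum(raw.values())
    rebased = {name: share / raw_sum for name, share in raw.items()}
    return raw, rebased


# Illustrative data only: strong trend, moderate seasonality, weak noise.
rng = np.random.default_rng(0)
t = np.arange(48)
trend = 0.5 * t
seasonal = np.tile([3.0, -3.0, 1.0, -1.0], 12)
residual = rng.normal(0.0, 0.5, size=48)
y = trend + seasonal + residual

raw, rebased = variance_shares(y, trend, seasonal, residual)
print(raw)
print(rebased)
```

The raw shares here generally do not add up to exactly 1, because the components are not perfectly uncorrelated in a finite sample.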
Predictive Model Application
Once the three components of a time series are identified, it is possible to build a predictive model.
With the three parts determined, the series can be computed step by step through the days: we evaluate the model for the day after the end date, then for the following day, and so on.
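A minimal sketch of this step for the additive model is shown below. It assumes a polynomial trend (coefficients in NumPy's polyval order) and a seasonal pattern of fixed length whose first element corresponds to time index 0. The function name and the numbers are illustrative assumptions. The noise is not forecast, since by definition it cannot be predicted.

```python
import numpy as np


def forecast_additive(trend_coeffs, pattern, n_observed, horizon):
    """Extend trend + seasonality for `horizon` steps after the last observation.

    trend_coeffs: polynomial coefficients, highest power first.
    pattern: seasonal values for one full period, starting at time index 0.
    n_observed: number of points in the observed series.
    """
    pattern = np.asarray(pattern, dtype=float)
    future_t = np.arange(n_observed, n_observed + horizon)
    trend = np.polyval(trend_coeffs, future_t)
    seasonal = pattern[future_t % len(pattern)]
    return trend + seasonal


# Illustrative values only: trend y = 2t + 3 and a 4-step seasonal pattern.
forecast = forecast_additive(
    trend_coeffs=[2.0, 3.0],
    pattern=[1.0, -1.0, 2.0, -2.0],
    n_observed=48,
    horizon=3,
)
print(forecast)
```

For a multiplicative model, the same idea applies with the trend and seasonal terms multiplied instead of summed.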
A careful decomposition of the time series is therefore essential for obtaining the most accurate prediction afterwards.