Data play a very important role in every area of a company. A distinction is made primarily between operational data and dispositive data. Operational data play an important role, especially in day-to-day business. For analysis, however, they are not nearly as relevant as dispositive data, because dispositive data are collected over a longer period of time and therefore provide an insight into the company's history.
The advantage of dispositive data is not only the view into the past but also, depending on which statistical model is applied, the opportunity to gain an insight into the future. One such statistical model is discussed in more detail in this Newsroom Article: simple regression analysis.
Identifying trends at any price – does that make sense?
Having a lot of data is all well and good, but every company must first ask itself what it would like to do with this data. It is therefore advisable to formulate hypotheses before evaluating or analysing the data. Depending on how the hypothesis is formulated, one can already see which statistical model has to be applied. An example hypothesis would be “There is a correlation between advertising expenditure and company turnover”.
Well-formulated hypotheses can be recognised by the fact that they are not formulated as a question and can still be responded to with a simple “true” or “false”.
It therefore makes no sense to want to recognise trends at any price without first formulating hypotheses. Of course, the data basis also plays a major role. Many companies are not aware that their current data basis is not suitable for calculating trends or a simple regression. To get a first impression of whether one's own company data are actually suitable for a simple regression analysis, calculating the correlation coefficient is an appropriate first step.
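As a minimal sketch of this first check, the Pearson correlation coefficient can be computed with a few lines of Python. The function name pearson_r and the advertising and turnover figures are invented purely for illustration and are not taken from any real company. The coefficient ranges from -1 to +1, and values close to 0 indicate that there is hardly any linear relationship between the two series.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary, otherwise r is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical example data (assumption): advertising expenditure and turnover.
advertising = [10, 20, 30, 40, 50]
turnover = [120, 150, 210, 240, 290]

r = pearson_r(advertising, turnover)
print(f"correlation coefficient r = {r:.3f}")
```

For these invented figures the coefficient comes out close to 1, which would suggest a strong linear relationship.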
Correlations and regression
When conducting a simple regression analysis, two variables play a major role. One is the independent variable (x), and the other is the dependent variable (y). In the example above, advertising expenditure is the independent variable and company turnover is the dependent variable.
At this point, many make the mistake of jumping straight into the calculation of the regression. It is important to know that a regression analysis is of little use if there is no correlation between the independent variable (x) and the dependent variable (y). What matters is the strength of the linear relationship, not its sign: a clearly negative correlation can be modelled just as well as a clearly positive one, whereas a correlation coefficient close to zero means that a simple regression analysis is not appropriate.
The causal relationship + Conclusion
If the two variables are clearly correlated, a regression analysis can be carried out. For the regression, several quantities have to be calculated, of which the slope and the intercept of the regression line are central. Once these have been calculated, any value can be entered for x in order to obtain the predicted value of y (i.e. the trend). Returning to the example from above, a good question would be: “What turnover (y) can the company expect for a given advertising expenditure (x)?” Solving the regression equation for x answers the reverse question of how much advertising expenditure is needed to reach a certain turnover.
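The calculation of slope and intercept can be sketched as follows, using the ordinary least-squares formulas. The function name fit_simple_regression and all figures are assumptions made up for illustration. The sketch only shows the mechanics and is not a complete analysis.

```python
def fit_simple_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary, otherwise the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example data (assumption): advertising expenditure and turnover.
advertising = [10, 20, 30, 40, 50]
turnover = [120, 150, 210, 240, 290]

slope, intercept = fit_simple_regression(advertising, turnover)

# Predicted turnover for a chosen advertising expenditure.
predicted_turnover = intercept + slope * 60

# Reverse question: expenditure needed for a target turnover (equation solved for x).
target_turnover = 300
required_advertising = (target_turnover - intercept) / slope

print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"predicted turnover at x = 60: {predicted_turnover:.2f}")
print(f"advertising needed for y = {target_turnover}: {required_advertising:.2f}")
```

Predictions for x values far outside the observed range (here 10 to 50) should be treated with particular caution, since the data say nothing about whether the relationship stays linear there.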
Another quality measure of the simple regression analysis, which should always be calculated, is the coefficient of determination (R²). It compares the dispersion of the actual values around the predicted values with the dispersion of the actual values around their mean. The coefficient of determination is often expressed as a percentage. A value close to 100 per cent indicates that the regression model explains almost all of the variation in the dependent variable and is therefore likely to deliver acceptable forecast values.
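A minimal sketch of this calculation, assuming the common definition R² = 1 - SS_res / SS_tot, is shown below. The function names and the figures are again hypothetical and serve only as an illustration.

```python
def fit_simple_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Return the coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual dispersion: actual values around the predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total dispersion: actual values around their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values must vary, otherwise R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical example data (assumption): advertising expenditure and turnover.
advertising = [10, 20, 30, 40, 50]
turnover = [120, 150, 210, 240, 290]

slope, intercept = fit_simple_regression(advertising, turnover)
predictions = [intercept + slope * x for x in advertising]
r2 = r_squared(turnover, predictions)

print(f"R^2 = {r2:.1%}")
```

In a simple regression with one independent variable, R² equals the square of the correlation coefficient, which makes it easy to cross-check the result.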
Finally, it should be pointed out that a correlation between two variables, however strong, does not mean that there is a causal relationship between these two variables.