Causality Vs Correlation

Correlation and causation are terms that are often misunderstood and used interchangeably, so it is worth understanding both. In statistics, the phrase 'correlation does not imply causation' refers to the inability to legitimately deduce a cause-and-effect relationship between two variables solely on the basis of an observed association or correlation between them. The idea that 'correlation implies causation' is an example of a questionable-cause logical fallacy, in which two events occurring together are taken to have a cause-and-effect relationship.

Of all of the misunderstood statistical issues, the one that’s perhaps the most problematic is the misuse of the concepts of correlation and causation. Correlation, as a statistical term, is the extent to which two numerical variables have a linear relationship (that is, a relationship that increases or decreases at a constant rate). Following are three examples of correlated variables:

The number of times a cricket chirps per second is strongly related to temperature; when it’s cold outside, crickets chirp less frequently, and as the temperature warms up, they chirp at a steadily increasing rate. In statistical terms, you say the number of cricket chirps and temperature have a strong positive correlation.
The number of crimes (per capita) has often been found to be related to the number of police officers in a given area. When more police officers patrol the area, crime tends to be lower, and when fewer police officers are present in the same area, crime tends to be higher. In statistical terms we say the number of police officers and the number of crimes have a strong negative correlation.
The consumption of ice cream (pints per person) and the number of murders in New York are positively correlated. That is, as the amount of ice cream sold per person increases, the number of murders increases. Strange but true!

But correlation as a statistic isn’t able to explain why or how the relationship between two variables, x and y, exists; only that it does exist.

Causation goes a step further than correlation, stating that a change in the value of the x variable will cause a change in the value of the y variable. Too many times in research, in the media, or in the public consumption of statistical results, that leap is made when it shouldn’t be. For instance, you can’t claim that consumption of ice cream causes an increase in murder rates just because they are correlated. In fact, the study showed that temperature was positively correlated with both ice cream sales and murders, which suggests that warm weather, not ice cream, is the common factor behind both. When can you make the causation leap? The most compelling case is when a well-designed experiment is conducted that rules out other factors that could be related to the outcomes.
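The ice cream example can be illustrated with a small simulation. The sketch below is a toy built on assumptions, not the study's data: it invents a temperature series and makes both ice cream sales and murder counts depend on temperature plus independent noise. The coefficients, noise levels, and the helper names pearson_r and partial_r are all hypothetical. It compares the raw correlation between the two outcomes with the partial correlation that holds temperature fixed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from mean-centered sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up data: temperature drives both outcomes; neither drives the other.
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
murders = [0.5 * t + rng.gauss(0, 3) for t in temperature]

r_raw = pearson_r(ice_cream, murders)
r_ice_temp = pearson_r(ice_cream, temperature)
r_murder_temp = pearson_r(murders, temperature)
r_controlled = partial_r(r_raw, r_ice_temp, r_murder_temp)

print(f"ice cream vs murders:            r = {r_raw:.2f}")
print(f"same, holding temperature fixed: r = {r_controlled:.2f}")
```

Under these assumptions the raw correlation should come out strongly positive while the temperature-adjusted value should sit close to zero, which is the pattern a common cause produces.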

You may find yourself wanting to jump to a cause-and-effect relationship when a correlation is found; researchers, the media, and the general public do it all the time. However, before making any conclusions, look at how the data were collected and/or wait to see if other researchers are able to replicate the results (the first thing they try to do after someone else’s “groundbreaking result” hits the airwaves).

A correlation is a measure of the degree of relationship between two variables. A set of data can be positively correlated, negatively correlated, or not correlated at all. If one set of values tends to increase as the other increases, the relationship is called a positive correlation.

If one set of values tends to decrease as the other increases, it is called a negative correlation.

If a change in the values of one set is not accompanied by any consistent change in the values of the other, the variables are said to have 'no correlation' or 'zero correlation.'
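The three cases can be told apart by the sign of the summed products of deviations from the mean, which has the same sign as the correlation. The sketch below is a minimal illustration; the function name direction and all of the numbers are made up for the example.

```python
def direction(xs, ys, tolerance=1e-12):
    """Classify the direction of a linear relationship between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Positive when the variables tend to sit on the same side of their
    # means together, negative when they tend to sit on opposite sides.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > tolerance:
        return "positive"
    if co_movement < -tolerance:
        return "negative"
    return "none"


# Hypothetical numbers, chosen only to show each case.
temperature = [10, 15, 20, 25, 30]
chirps = [1, 2, 3, 4, 5]

officers = [10, 20, 30, 40, 50]
crimes = [90, 70, 60, 40, 30]

position = [1, 2, 3, 4, 5]
symmetric = [2, 1, 4, 1, 2]

print(direction(temperature, chirps))
print(direction(officers, crimes))
print(direction(position, symmetric))
```

The sign only gives the direction; the strength of the relationship needs a standardized measure such as a correlation coefficient.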

A causal relation between two events exists if the occurrence of the first causes the other. The first event is called the cause and the second event is called the effect. A correlation between two variables does not imply causation. On the other hand, if there is a causal relationship between two variables, they will usually show some association, although not necessarily a linear one that a correlation coefficient picks up.
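A caveat that is easy to check numerically: Pearson's correlation coefficient measures only linear association, so one variable can fully determine another while the coefficient is still zero. In the sketch below (made-up data, with y defined as x squared purely for illustration), the numerator of the coefficient, n Σxy − Σx Σy, comes out to exactly zero.

```python
# Hypothetical data: y is completely determined by x (y = x squared).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Numerator of Pearson's r: n * sum(xy) - sum(x) * sum(y).
numerator = n * sum_xy - sum_x * sum_y
print(numerator)  # prints 0, so r is 0 even though x determines y
```

The x values are symmetric around zero, so the positive and negative products cancel. A scatter plot would show the relationship immediately, which is one reason to look at the data and not only at the coefficient.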

Example:

A study shows that there is a negative correlation between a student's anxiety before a test and the student's score on the test. But we cannot say that the anxiety causes a lower score on the test; there could be other reasons—the student may not have studied well, for example. So the correlation here does not imply causation.

However, consider the positive correlation between the number of hours you spend studying for a test and the grade you get on the test. Here, a causal link is plausible as well; spending more time studying generally results in a higher grade.

One of the most commonly used measures of correlation is the Pearson product-moment correlation coefficient, also called Pearson's correlation coefficient. For n paired observations of x and y, it is computed using the formula

r_{x y} = \frac{n \sum x y - \sum x \sum y}{\sqrt{(n \sum x^{2} - {(\sum x)}^{2}) (n \sum y^{2} - {(\sum y)}^{2})}}

The value of Pearson's correlation coefficient varies from -1 to +1, where -1 indicates a perfect negative correlation, +1 indicates a perfect positive correlation, and a value near 0 indicates little or no linear correlation.
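The formula for Pearson's correlation coefficient translates almost term by term into code. The following is a minimal sketch in plain Python; the function name pearson_r and the sample numbers (hours studied and test scores) are invented for illustration and do not come from any study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from raw sums, mirroring the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when either variable never changes.
        raise ValueError("r is undefined when a variable is constant")
    return numerator / denominator


# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]
r_study = pearson_r(hours, scores)
print(f"hours vs scores: r = {r_study:.3f}")

# Points lying exactly on a line give the extreme values.
r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])    # rising line
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])  # falling line
print(r_up, r_down)
```

The raw-sums form matches the formula as written, but it can lose precision when the values are large relative to their spread; a mean-centered version or a library routine is a safer choice for real data.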