Of some statistical words and their misuse in modern society

We live in a post-truth world: it is becoming ever clearer that there is less and less agreed-upon truth, and much of what was previously accepted as true is increasingly questioned. Personally, I think that from a long-term perspective this is a good thing. I am a radical optimist, and I think that if individuals question truth and facts for themselves, this may lead to a society where more people know more. In the short term, however, there are many examples where this increasing questioning of truth and facts creates problems. Take people who do not believe that human-made climate change is a fact, or people who (still or again) believe that the earth is flat. Interestingly, I often notice that the people who question the traditional knowledge-producing institutions are the ones who still use, and misuse, the tools of these institutions. Here, I present three examples of words from science, or more specifically statistics, that I hear being used very often, and mostly incorrectly.

The first example is the word „significant“. In statistics, this word refers to an agreed-upon threshold for the probability attached to whatever it is you test or model. In other words, a result is „significant“ if, assuming nothing but chance were at work, a result at least this strong would turn up less than 5 % of the time (the conventional cut-off). Scientists use this approach all the time. For instance, if Ibuprofen has relieved headaches in hundreds of patients in a clinical trial, and clearly more often than a placebo did, then it helps significantly against headaches, and there is a good chance that it will also help against yours. See what I did here! I used the word „significant“ correctly, at least under the assumption that the clinical trial followed the standards of medical studies, had a large enough sample size, and that the data were analysed using statistics. Today, many people use the word „significant“ incorrectly. „The US economy grew significantly since I took office“, the Donald would say. Did he run a statistical test? And did he account for other effects, such as previous efforts to trigger growth, global dynamics, and the weather, which can explain an increase in construction jobs in an early spring? I assume the Donald made no such rigorous analysis. Still, many people seem to like using the word „significant“. It gives a certain air of confidence, which, strangely enough, is exactly what statistics would yield.
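
To make this concrete, here is a minimal sketch of what such a test could look like. It uses a one-sided Fisher's exact test, which is only one of several tests a real trial might use, and all patient numbers are invented for illustration; they do not come from any actual Ibuprofen study. The function name `one_sided_p_value` and the constant `ALPHA` are my own choices.

```python
from math import comb

ALPHA = 0.05  # conventional significance threshold (an agreed-upon cut-off)


def one_sided_p_value(relieved_drug, n_drug, relieved_placebo, n_placebo):
    """One-sided Fisher's exact test.

    Returns the probability of seeing at least `relieved_drug` relieved
    patients in the drug group if the drug made no difference at all,
    i.e. if relief were spread over both groups purely by chance.
    """
    total = n_drug + n_placebo
    relieved = relieved_drug + relieved_placebo
    most_possible = min(relieved, n_drug)
    # Count all ways to pick the drug group that put k relieved patients in it,
    # for every k at least as large as the one observed.
    tail = sum(
        comb(relieved, k) * comb(total - relieved, n_drug - k)
        for k in range(relieved_drug, most_possible + 1)
    )
    return tail / comb(total, n_drug)


# Hypothetical trial: 70 of 100 patients relieved with the drug,
# 50 of 100 relieved with a placebo.
p_drug_helps = one_sided_p_value(70, 100, 50, 100)

# Hypothetical trial in which the drug does no better than the placebo.
p_no_difference = one_sided_p_value(50, 100, 50, 100)

print(f"70 vs 50 relieved: p = {p_drug_helps:.4f}, significant: {p_drug_helps < ALPHA}")
print(f"50 vs 50 relieved: p = {p_no_difference:.4f}, significant: {p_no_difference < ALPHA}")
```

Only the first of the two made-up trials falls below the threshold. Anyone who says „significantly“ without a calculation of this kind behind it is using the word in its everyday sense, not its statistical one.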

The second word is „correlated“. A correlation is a statistical measure of whether, and how strongly, two continuous variables are related. In my perception, about 99 % of all non-scientists who use the word are not talking about the relation of two continuous variables. Instead they may say that groups are correlated. Or simply that two things are correlated. These people may want to look into an ANOVA or a chi-square test, but I assume they are not really interested in statistics. They are interested in making their speech sound confident. Again. To me, they discredit themselves. But when it comes to statistics, I am probably in the minority.
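
For readers who want to see what a correlation actually computes, here is a minimal sketch of the Pearson correlation coefficient, the most common variant, written in plain Python. The data are invented for illustration, and the names `pearson_r`, `sunshine_hours` and `ice_cream_sales` are hypothetical. Note that both inputs are continuous measurements, not groups or categories.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences.

    Returns a value between -1 (perfect negative linear relation)
    and +1 (perfect positive linear relation).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined if one variable never varies")
    return co_deviation / sqrt(spread_x * spread_y)


# Invented example data: daily hours of sunshine and ice cream sales in euros.
sunshine_hours = [2.0, 4.5, 6.0, 7.5, 9.0, 11.0]
ice_cream_sales = [120.0, 180.0, 260.0, 240.0, 330.0, 400.0]

r = pearson_r(sunshine_hours, ice_cream_sales)
print(f"r = {r:.2f}")
```

A value of r close to +1 or -1 indicates a strong linear relation, and a value near 0 indicates little or none. Whether that relation is also significant is a separate question that needs its own test.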

Word number three: „clustered“. This word indicates a statistical analysis that sorts multivariate data into groups, or in other words, clusters. The way many people use it is not as if the „clustering“ is done by an algorithm, but as if it is some sort of a Bazinga. Clustering is a quantitative method. Dividing something into groups by qualitative criteria is not clustering in a statistical sense. But it sure makes them sound cool to most people. But not to me. I am an outlier. By the way, this is also a word from statistics, meaning a data point that deviates from the representative group. Oh, deviate and representative are also from statistics. But you probably know that, if you are reading this. Probably is another one. The list goes on. You decide what to do with it. Make it count.