

STATISTICS CORNER 

Year: 2022, Volume: 34, Issue: 2, Page: 178-181

Statistical analysis of quantitative and qualitative variables—A quick glimpse
Sandhya Somasundaran
Department of Ophthalmology, Govt Medical College Kozhikode, Kerala, India
Date of Submission: 23-Feb-2022
Date of Decision: 10-May-2022
Date of Acceptance: 15-May-2022
Date of Web Publication: 30-Aug-2022
Correspondence Address: Dr. Sandhya Somasundaran Department of Ophthalmology, Govt Medical College Kozhikode, Kerala India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/kjo.kjo_37_22
How to cite this article: Somasundaran S. Statistical analysis of quantitative and qualitative variables: A quick glimpse. Kerala J Ophthalmol 2022;34:178-81
Karl Pearson, one of the founding fathers of statistics, once said, "Statistics is the grammar of science." Just as we need good grammar for effective communication, we need proper statistics to advance science.^{[1],[2]}
In previous issues, we dealt with types of variables and risk. In this issue, we deal with the analysis of both quantitative and qualitative variables [Figure 1] and [Figure 2].
Figure 2: Analysis of correlation between quantitative variables and analysis of qualitative variables
Analysis of Quantitative Variables   
Normality tests
The distribution of continuous quantitative variables should be checked for normality. This can be done with charts (histogram, Q-Q plot, or box plot) or with statistical tests (Kolmogorov–Smirnov or Shapiro–Wilk test). For normally distributed data, means are compared using parametric tests. For non-normally distributed data, medians are compared using nonparametric tests.^{[3]}
Descriptive statistics
In descriptive statistics, we describe the baseline parameters of the study groups. We use the mean and standard deviation to describe normally distributed data. For non-normal distributions, the median and interquartile range are better measures.
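As a minimal sketch of the two kinds of summary, the snippet below uses Python's standard `statistics` module on a made-up, right-skewed set of intraocular pressure readings. The values are hypothetical and only serve to show how a single outlier inflates the mean and standard deviation while the median and interquartile range stay close to the bulk of the data.

```python
import statistics

# Hypothetical intraocular pressure readings (mmHg); illustrative only.
iop = [12, 14, 15, 16, 18, 20, 45]

mean_iop = statistics.mean(iop)      # pulled upward by the outlier (45)
sd_iop = statistics.stdev(iop)       # sample standard deviation
median_iop = statistics.median(iop)  # robust to the outlier

# Quartiles; "inclusive" treats the data as the whole population of interest.
q1, q2, q3 = statistics.quantiles(iop, n=4, method="inclusive")
iqr_iop = q3 - q1

print(f"mean = {mean_iop}, SD = {sd_iop:.2f}")
print(f"median = {median_iop}, IQR = {iqr_iop}")
```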
Analytic statistics^{[1],[2],[3],[4]}
Analytic tests can be carried out for finding:
 Difference between two groups [Figure 1]
 Correlation between variables.
Parametric tests
Difference between the means
 Comparing two groups: T-tests
 Comparing more than two groups: analysis of variance (ANOVA)
Both T-tests and ANOVA can be performed only if the data fulfill the following assumptions:
 Data is quantitative
 Variables are normally distributed.
 Samples are random
 Variance is equal in all samples^{[1],[2],[5]}
T-test
There are two types of T-tests:
 Unpaired T-test: two unrelated samples
 Paired T-test: two related samples (before and after)
Unpaired T-test
This is subdivided into the one-sample T-test and the two-sample T-test.
 The one-sample T-test compares the mean of a sample to a standard (reference) value, for example, comparing the mean hemoglobin value of a group of diabetic patients to a standard value of 13.
 The two-sample T-test compares the means of two unrelated samples, for example, comparing central foveal thickness in two groups (bevacizumab group and ranibizumab group).
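The two statistics described above can be sketched in a few lines of standard-library Python. The helper names `one_sample_t` and `two_sample_t` and all data values are hypothetical, and the two-sample version assumes equal variances (the pooled form). The sketch returns only the t statistic; turning it into a P value needs the t distribution, which is normally taken from a statistics package.

```python
import math
import statistics

def one_sample_t(sample, reference):
    """t statistic for a sample mean against a reference value (df = n - 1)."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - reference) / se

def two_sample_t(a, b):
    """Pooled-variance t statistic for two unrelated samples (df = na + nb - 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical hemoglobin values compared with a standard value of 13.
hemoglobin = [11, 12, 13, 12, 12]
t_one = one_sample_t(hemoglobin, 13)

# Hypothetical central foveal thickness (micrometres) in two injection groups.
bevacizumab = [250, 260, 270]
ranibizumab = [280, 290, 300]
t_two = two_sample_t(bevacizumab, ranibizumab)

print(f"one-sample t = {t_one:.3f}, two-sample t = {t_two:.3f}")
```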
Paired T-test
This test is used when there are two related samples, for example, comparing the intraocular pressure before and after exercise.
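A paired comparison works on the within-subject differences rather than on the two sets of readings separately. The sketch below, with a hypothetical helper `paired_t` and made-up intraocular pressure values, shows that idea; it is an illustration under those assumptions, not a full test with a P value.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical intraocular pressure (mmHg) in the same five subjects.
iop_before = [14, 16, 15, 17, 18]
iop_after = [12, 13, 14, 14, 17]

t_paired = paired_t(iop_before, iop_after)  # df = number of pairs - 1
print(f"paired t = {t_paired:.3f}")
```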
ANOVA
ANOVA is used to detect a difference in means between three or more independent groups. There are three types of ANOVA:
 One-way ANOVA: It detects a difference in means between three or more groups defined by a single factor, for example, comparing central foveal thickness measurement in three intravitreal injection groups (bevacizumab, ranibizumab, and aflibercept).
However, the one-way ANOVA only shows whether the means differ from each other. To know which pairs differ, post-hoc tests such as the Tukey or Scheffé test are used.^{[1],[4],[6]}
 Two-way ANOVA: It compares means across groups defined by two independent categorical variables (factors), for example, comparing central foveal thickness in three injection groups (bevacizumab, ranibizumab, and aflibercept) and also between males and females in these three groups. Here, we have two independent variables: type of injection and gender.
 Repeated-measures ANOVA: It is similar to the paired T-test. It is used to find the difference in means when more than two observations are made in the same subject, for example, comparing intraocular pressure before exercise, 1 hour after exercise, and 3 hours after exercise.
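For the one-way case, the F statistic is the ratio of between-group to within-group variability. The following is a minimal sketch with a hypothetical helper `one_way_anova_f` and invented foveal thickness values; it does not compute a P value or any post-hoc comparison.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical central foveal thickness (micrometres) in three injection groups.
bevacizumab = [250, 260, 270]
ranibizumab = [260, 270, 280]
aflibercept = [270, 280, 290]

f_stat, df_b, df_w = one_way_anova_f([bevacizumab, ranibizumab, aflibercept])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```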
Correlation between variables
 Correlation tests
Correlation tests are used to find relationships between variables and to quantify this relation. The correlation coefficient ranges from −1 to +1. If the correlation coefficient is zero, there is no correlation. If it is −1, there is perfect negative correlation, and if it is +1, there is perfect positive correlation.
Cohen suggested the following interpretation of the absolute value of correlation:
−0.3 to +0.3: Weak
−0.5 to −0.3 or +0.3 to +0.5: Moderate
−0.9 to −0.5 or +0.5 to +0.9: Strong
−1.0 to −0.9 or +0.9 to +1.0: Very strong
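The coefficient and the interpretation bands above can be sketched as follows. The helpers `pearson_r` and `strength` and the data are hypothetical. Because the bands share their end points, the sketch assumes that a value exactly on a boundary falls into the higher category.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label the absolute value of r using the bands listed above.

    Assumption: boundary values (0.3, 0.5, 0.9) go to the higher band.
    """
    a = abs(r)
    if a < 0.3:
        return "weak"
    if a < 0.5:
        return "moderate"
    if a < 0.9:
        return "strong"
    return "very strong"

# Hypothetical paired measurements of two continuous variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f} ({strength(r)})")
```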
The most common test used for finding correlation is the Pearson correlation coefficient. For this test, both the dependent and independent variables should be continuous and normally distributed.
 Regression
Regression is the mathematical prediction of a dependent variable using the value of an independent variable. There are different types of regression models.
 Simple linear regression is used for continuous variables with one independent and one dependent variable.
 Multiple linear regression is used for continuous variables with multiple independent variables and a single dependent variable.
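A simple linear regression fits a straight line, y = intercept + slope × x, by least squares. The sketch below uses a hypothetical helper `fit_line` and invented data to show the fit and a prediction; multiple linear regression needs matrix algebra and is usually left to a statistics package.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one independent variable."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
predicted = intercept + slope * 6  # prediction for a new value x = 6
print(f"y = {intercept:.1f} + {slope:.1f} * x; predicted y at x = 6: {predicted:.1f}")
```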
Nonparametric tests
If the continuous variable is not normally distributed, we should use nonparametric tests. These include:
 Mann–Whitney test (also called the Mann–Whitney–Wilcoxon, Wilcoxon rank-sum, or Wilcoxon–Mann–Whitney test): This test is the nonparametric equivalent of the unpaired T-test.
 Wilcoxon signed-rank test: This test is the nonparametric equivalent of the paired T-test.
 Kruskal–Wallis test: This is the nonparametric equivalent of one-way ANOVA.
 Friedman test: This test is the nonparametric equivalent of repeated-measures ANOVA.
 Spearman correlation.
 Logistic regression.
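To give a feel for how these rank-based tests work, here is a minimal sketch of the Mann–Whitney U statistic, which simply counts how often a value in one group exceeds a value in the other. The helper `mann_whitney_u` and the data are hypothetical, and the sketch stops at the statistic without computing a P value.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): pairwise wins for each group, ties counted as half."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical skewed measurements in two unrelated groups.
group_a = [12, 15, 18]
group_b = [14, 20, 22, 25]

u_a, u_b = mann_whitney_u(group_a, group_b)
u = min(u_a, u_b)  # the smaller value is conventionally reported
print(f"U_a = {u_a}, U_b = {u_b}, U = {u}")
```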
Analysis of qualitative variables
The tests used for categorical data are given in [Table 1].
Pearson's chi-square test
It is the most commonly used test for categorical variables. It determines the difference in the proportion between two or more independent categorical groups.
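The statistic compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. Below is a minimal sketch with a hypothetical helper `chi_square_statistic` and an invented 2 × 2 table; no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = treatment groups, columns = improved / not improved.
observed = [[20, 10],
            [10, 20]]

chi2, df = chi_square_statistic(observed)
# For 1 degree of freedom, the 5% critical value is 3.841.
print(f"chi-square = {chi2:.2f}, df = {df}, exceeds 3.841: {chi2 > 3.841}")
```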
Fisher's exact test is an alternative to the chi-square test. It is used for small sample sizes.
Chi-square test for trend is used when one variable is binary and the other is ordinal. It is used to assess whether the association between variables follows a trend, for example, to know the trend of uveitis among the different age groups.
Chi-square goodness-of-fit test
This test is applied to a single categorical variable drawn from a population through random sampling. It determines whether the distribution observed in the sample is consistent with the distribution expected in the actual population.
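As a minimal sketch, the goodness-of-fit statistic can be computed from the observed counts and the proportions expected in the population. The helper `goodness_of_fit` and the numbers are hypothetical.

```python
def goodness_of_fit(observed, expected_proportions):
    """Return (chi-square, df) for one categorical variable."""
    n = sum(observed)
    chi2 = 0.0
    for count, proportion in zip(observed, expected_proportions):
        expected = n * proportion
        chi2 += (count - expected) ** 2 / expected
    return chi2, len(observed) - 1

# Hypothetical sample of 100 patients in three categories, compared with
# assumed population proportions of 25%, 50%, and 25%.
observed = [30, 50, 20]
population = [0.25, 0.50, 0.25]

chi2_gof, df_gof = goodness_of_fit(observed, population)
print(f"chi-square = {chi2_gof:.2f}, df = {df_gof}")
```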
Spearman correlation
Spearman correlation is the nonparametric equivalent of the Pearson correlation test. It measures the strength and direction of the monotonic relationship between two variables using their ranks. The test statistic varies from −1 (perfect negative correlation) to +1 (perfect positive correlation).
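One common way to obtain the Spearman coefficient is to replace each value by its rank and then apply the Pearson formula to the ranks. The sketch below follows that approach with hypothetical helpers (`average_ranks`, `pearson_r`, `spearman_rho`) and invented data in which the relationship is monotonic but clearly not linear.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]

rho = spearman_rho(x, y)
r_linear = pearson_r(x, y)
print(f"Spearman rho = {rho:.2f}, Pearson r = {r_linear:.2f}")
```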
Logistic regression
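This test is similar to linear regression, but the outcome (dependent variable) is categorical, most often binary. It is used to predict the outcome from one or more predictor (independent) variables. The predictors may be categorical or continuous, including continuous variables with a skewed distribution.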
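Fitting a logistic model requires iterative estimation and is normally done with a statistics package. The sketch below only illustrates how an already fitted model turns predictor values into a predicted probability. The intercept, the coefficients, and the helper `predicted_probability` are hypothetical and chosen purely for illustration.

```python
import math

def predicted_probability(intercept, coefficients, predictors):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, predictors))
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted values: one continuous predictor (age in years)
# and one categorical predictor (diabetes: 1 = yes, 0 = no).
intercept = -4.0
coefficients = [0.05, 1.0]

p_outcome = predicted_probability(intercept, coefficients, [60, 1])
odds_ratio_diabetes = math.exp(coefficients[1])  # odds ratio for diabetes

print(f"predicted probability = {p_outcome:.2f}")
print(f"odds ratio for diabetes = {odds_ratio_diabetes:.2f}")
```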
Researchers must plan the statistical analysis at the beginning of the study itself. Choosing the statistical test is of paramount importance in a scientific paper. We hope that this article will give an overview of the various analytical tests in statistics.
Figures below summarize the statistical tests.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1.  Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 6, Correlation and regression. Ann Emerg Med 1990;19:1462-8.
2.  Hazra A, Gogtay N. Biostatistics series module 3: Comparing groups: Numerical variables. Indian J Dermatol 2016;61:251-60.
3.  Neideen T, Brasel K. Understanding statistical tests. J Surg Educ 2007;64:93-6.
4.  Hazra A, Gogtay N. Biostatistics series module 9: Survival analysis. Indian J Dermatol 2017;62:251-7.
5.  Hoffman JI. Biostatistics for Medical and Biomedical Practitioners. 2nd ed. Elsevier Science; 2019.
6.  Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. Springer; 2011.
[Figure 1], [Figure 2]
[Table 1]
