STATISTICS CORNER

Year: 2022 | Volume: 34 | Issue: 2 | Page: 178-181
Statistical analysis of quantitative and qualitative variables—A quick glimpse
Sandhya Somasundaran
Department of Ophthalmology, Govt Medical College Kozhikode, Kerala, India
Date of Submission: 23-Feb-2022
Date of Decision: 10-May-2022
Date of Acceptance: 15-May-2022
Date of Web Publication: 30-Aug-2022
Correspondence Address: Dr. Sandhya Somasundaran, Department of Ophthalmology, Govt Medical College Kozhikode, Kerala, India
Source of Support: None; Conflict of Interest: None
DOI: 10.4103/kjo.kjo_37_22
How to cite this article: Somasundaran S. Statistical analysis of quantitative and qualitative variables—A quick glimpse. Kerala J Ophthalmol 2022;34:178-81
Karl Pearson, one of the founding fathers of statistics, once said, "Statistics is the grammar of science." Just as we need good grammar for effective communication, we need proper statistics for advancing science.[1],[2]
In previous issues, we dealt with types of variables and risk. In this issue, we will be dealing with the analysis of both quantitative and qualitative variables [Figure 1] and [Figure 2].

Figure 2: Analysis of correlation between quantitative variables and analysis of qualitative variables
Analysis of Quantitative Variables
Normality tests
The distribution of continuous quantitative variables should be checked. This can be performed with charts (histogram, Q-Q plot, or box plot) or with statistical tests (Kolmogorov–Smirnov or Shapiro–Wilk test). In normal distributions, we compare the mean using parametric tests. In non-normal distributions, the median is compared using non-parametric tests.[3]
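As a minimal sketch, a Shapiro–Wilk normality check could be run as follows (the intraocular-pressure values here are simulated, purely for illustration; scipy is assumed to be available):

```python
# Checking normality with the Shapiro-Wilk test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
iop = rng.normal(loc=16, scale=3, size=40)  # hypothetical IOP readings (mmHg)

stat, p = stats.shapiro(iop)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
# p > 0.05: no evidence against normality -> parametric tests on the mean
# p <= 0.05: non-normal -> non-parametric tests on the median
```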
Descriptive statistics
In descriptive statistics, we describe the baseline parameters of the study groups. We use mean and standard deviation to describe the data in normal distributions. For non-normal distributions, median and interquartile range are a better measure.
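A brief sketch of both summaries, using a small hypothetical set of central foveal thickness values (note how the single outlier pulls the mean above the median, which is why the median/IQR pair is preferred for skewed data):

```python
# Descriptive statistics: mean/SD for normal data, median/IQR otherwise.
import numpy as np

cft = np.array([250, 255, 260, 262, 270, 275, 410])  # hypothetical CFT values (um)

mean, sd = cft.mean(), cft.std(ddof=1)     # parametric summary
median = np.median(cft)
q1, q3 = np.percentile(cft, [25, 75])      # interquartile range
print(f"mean = {mean:.1f}, SD = {sd:.1f}")
print(f"median = {median:.1f}, IQR = {q1:.1f}-{q3:.1f}")
```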
Analytic statistics[1],[2],[3],[4]
Analytic tests can be carried out for finding:
- Difference between two groups [Figure 1]
- Correlation between variables.
Parametric tests
Difference between the means
- Comparing two groups—T-tests
- Comparing more than two groups—analysis of variance (ANOVA)
Both T-tests and ANOVA can be performed only if the data fulfills the following assumptions:
- Data is quantitative
- Variables are normally distributed
- Samples are random
- Variance is equal in all samples.[1],[2],[5]
T-test
There are two types of T-tests:
- Unpaired T-test—two unrelated samples
- Paired T-test—two related samples (before and after)
Unpaired T-test
This is sub-divided into one-sample T-test and two-sample T-test.
- The one-sample T-test compares the mean of a sample population to a standard value, for example, comparing the mean hemoglobin value of a group of diabetic patients to a standard value of 13.
- The two-sample T-test compares the means of two unrelated samples, for example, comparing central foveal thickness in two groups (bevacizumab group and ranibizumab group).
Paired T-test
This test is used when there are two related samples, for example, comparing the intraocular pressure before and after exercise.
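The three T-test variants above can be sketched with scipy as follows (all data here are simulated stand-ins for the examples in the text, not real measurements):

```python
# One-sample, two-sample (unpaired), and paired T-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One-sample: mean haemoglobin of diabetics vs a standard value of 13
hb_diabetics = rng.normal(12.1, 1.2, 30)        # hypothetical values
t1, p1 = stats.ttest_1samp(hb_diabetics, popmean=13)

# Two-sample (unpaired): central foveal thickness in two injection groups
cft_bev = rng.normal(300, 25, 25)
cft_ran = rng.normal(290, 25, 25)
t2, p2 = stats.ttest_ind(cft_bev, cft_ran)

# Paired: intraocular pressure before and after exercise in the same eyes
iop_before = rng.normal(16, 3, 20)
iop_after = iop_before - rng.normal(1.5, 0.5, 20)  # simulated post-exercise drop
t3, p3 = stats.ttest_rel(iop_before, iop_after)

print(p1, p2, p3)
```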
ANOVA
ANOVA is used to detect the difference in means between three or more independent groups. There are three types of ANOVA:
- One-way ANOVA: It can detect the difference in means between three or more groups, for example, comparing central foveal thickness measurement in three intravitreal injection groups—bevacizumab, ranibizumab, and aflibercept.
However, the one-way ANOVA only shows that the means differ; it does not show which pair differs. To know which pair is different, post-hoc tests such as the Tukey or Scheffe tests are used.[1],[4],[6]
- Two-way ANOVA: It compares two or more independent categorical groups that can be divided into sub-groups, for example, comparing central foveal thickness in three injection groups (bevacizumab, ranibizumab, and aflibercept) and also between males and females in these three groups. Here, we have two independent factors: type of injection and gender.
- Repeated-measures ANOVA: It is similar to the paired T-test. It is used to find the difference in means when more than two observations are made on the same subject, for example, comparing intraocular pressure before exercise, 1 hour after exercise, and 3 hours after exercise.
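The one-way ANOVA and Tukey post-hoc sequence described above can be sketched as follows (simulated data for the three injection groups; `tukey_hsd` assumes scipy 1.8 or later):

```python
# One-way ANOVA across three hypothetical injection groups, followed by
# a Tukey post-hoc test to locate which pair of means differs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
bev = rng.normal(300, 20, 30)   # hypothetical central foveal thickness (um)
ran = rng.normal(295, 20, 30)
afl = rng.normal(270, 20, 30)

f, p = stats.f_oneway(bev, ran, afl)
print(f"ANOVA: F = {f:.2f}, p = {p:.4f}")

if p < 0.05:                    # means differ somewhere; locate the pair
    res = stats.tukey_hsd(bev, ran, afl)   # requires scipy >= 1.8
    print(res)
```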
Correlation between variables
- Correlation tests
Correlation tests are used to find relationships between variables and to quantify this relation. The correlation coefficient ranges from −1 to +1. If the correlation coefficient is zero, there is no correlation. If it is −1, there is perfect negative correlation, and if it is +1, there is perfect positive correlation.
Cohen suggested the following interpretation of the absolute value of correlation:
−0.3 to +0.3: Weak
−0.5 to −0.3 or +0.3 to +0.5: Moderate
−0.9 to −0.5 or +0.5 to +0.9: Strong
−1.0 to −0.9 or +0.9 to +1.0: Very strong
The most common test used for finding correlation is the Pearson correlation coefficient. In this test, both dependent and independent variables should be continuous and normally distributed.
- Regression
Regression is the mathematical prediction of a dependent variable using the value of an independent variable. There are different types of regression analytical models.
- Simple linear regression is used for continuous variables with one independent and one dependent variable.
- Multiple linear regression is used for continuous variables with multiple independent variables and a single dependent variable.
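Pearson correlation and simple linear regression can be sketched together; the axial-length and refraction figures below are invented solely to show a strong negative correlation:

```python
# Pearson correlation and simple linear regression with scipy.
import numpy as np
from scipy import stats

axial_length = np.array([22.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5])
refraction = np.array([2.0, 1.5, 0.5, 0.0, -0.5, -1.5, -2.5, -3.0])

r, p = stats.pearsonr(axial_length, refraction)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")   # strong negative correlation

# Simple linear regression: predict refraction from axial length
reg = stats.linregress(axial_length, refraction)
print(f"refraction = {reg.slope:.2f} * axial_length + {reg.intercept:.2f}")
```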
Non-parametric tests
If the continuous variable is not normally distributed, we should use non-parametric tests. These include:
- Mann–Whitney test (Mann–Whitney–Wilcoxon, Wilcoxon rank-sum, or Wilcoxon–Mann–Whitney test): This is the non-parametric equivalent of the unpaired T-test.
- Wilcoxon signed-rank test: This is the non-parametric equivalent of the paired T-test.
- Kruskal–Wallis test: This is the non-parametric equivalent of one-way ANOVA.
- Friedman test: This test is the non-parametric equivalent of repeated-measure ANOVA.
- Spearman correlation.
- Logistic regression.
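Three of these non-parametric counterparts can be sketched with scipy (the values are hypothetical, skewed samples chosen to illustrate when rank-based tests apply):

```python
# Non-parametric analogues: Mann-Whitney U (unpaired), Wilcoxon
# signed-rank (paired), and Kruskal-Wallis (three or more groups).
import numpy as np
from scipy import stats

grp_a = np.array([12, 15, 14, 18, 30, 45, 16])   # skewed hypothetical sample
grp_b = np.array([20, 22, 25, 28, 50, 60, 24])

u, p_u = stats.mannwhitneyu(grp_a, grp_b)        # unpaired T-test analogue

before = np.array([16, 18, 15, 20, 17, 19])
after = np.array([14, 17, 13, 18, 16, 17])
w, p_w = stats.wilcoxon(before, after)           # paired T-test analogue

h, p_h = stats.kruskal(grp_a, grp_b, grp_a + 5)  # one-way ANOVA analogue

print(p_u, p_w, p_h)
```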
Analysis of Qualitative Variables
The tests used for categorical data are given in [Table 1].
Pearson's chi-square test
It is the most commonly used test for categorical variables. It determines the difference in the proportion between two or more independent categorical groups.
Fisher's exact test is an alternative to the chi-square test. It is used for small sample sizes.
The chi-square test for trend is used when one variable is binary and the other is ordinal. It assesses whether the association between the variables follows a trend, for example, the trend of uveitis across different age groups.
Chi-square goodness-of-fit test
This test is applied to a single categorical variable drawn from a population through random sampling. It determines whether the sample data represents the actual population data.
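The chi-square family above can be sketched on hypothetical counts (a 2x2 treatment-outcome table and a single categorical variable with assumed expected proportions):

```python
# Pearson's chi-square on a 2x2 table, Fisher's exact test for small
# counts, and a goodness-of-fit test on one categorical variable.
import numpy as np
from scipy import stats

# rows: treated / untreated; columns: improved / not improved (hypothetical)
table = np.array([[30, 10],
                  [18, 22]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

odds, p_fisher = stats.fisher_exact(table)   # preferred when cells are small

# Goodness of fit: do observed counts match the expected distribution?
observed = np.array([52, 28, 20])
expected_counts = np.array([50, 30, 20])     # assumed population proportions
gof_chi2, gof_p = stats.chisquare(observed, f_exp=expected_counts)
print(f"goodness-of-fit p = {gof_p:.3f}")
```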
Spearman correlation
Spearman correlation is the non-parametric equivalent of the Pearson correlation test. It measures the association between two ranked variables. The test statistic varies from −1 (perfect negative correlation) to +1 (perfect positive correlation).
Logistic regression
This test is similar to linear regression. It is used to predict a categorical outcome from one or more predictor variables. The predictors may be categorical or continuous variables, including those with skewed distributions.
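A minimal logistic-regression sketch, fitted by plain gradient descent with only NumPy (in practice one would use a statistics package; the age-cataract data here are entirely hypothetical):

```python
# Logistic regression sketch: does age (years) predict cataract (0/1)?
import numpy as np

age = np.array([40, 45, 50, 55, 60, 65, 70, 75, 80, 85], dtype=float)
cataract = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

x = (age - age.mean()) / age.std()           # standardise the predictor
X = np.column_stack([np.ones_like(x), x])    # intercept + slope columns
beta = np.zeros(2)

for _ in range(5000):                        # plain gradient ascent on
    p = 1.0 / (1.0 + np.exp(-X @ beta))      # the log-likelihood
    beta += 0.1 * X.T @ (cataract - p) / len(cataract)

z70 = beta[0] + beta[1] * (70 - age.mean()) / age.std()
prob_at_70 = 1.0 / (1.0 + np.exp(-z70))
print(f"estimated P(cataract | age 70) = {prob_at_70:.2f}")
```

A positive fitted slope here corresponds to an odds ratio above 1, i.e., higher age predicts higher odds of cataract.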
Researchers must plan the statistical analysis at the beginning of the study itself. Choosing the statistical test is of paramount importance in a scientific paper. We hope that this article will give an overview of the various analytical tests in statistics.
Figures below summarize the statistical tests.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
1. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 6, Correlation and regression. Ann Emerg Med 1990;19:1462-8.
2. Hazra A, Gogtay N. Biostatistics series module 3: Comparing groups: Numerical variables. Indian J Dermatol 2016;61:251-60.
3. Neideen T, Brasel K. Understanding statistical tests. J Surg Educ 2007;64:93-6.
4. Hazra A, Gogtay N. Biostatistics series module 9: Survival analysis. Indian J Dermatol 2017;62:251-7.
5. Hoffman JI. Biostatistics for Medical and Biomedical Practitioners. 2nd ed. Elsevier Science; 2019.
6. Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. Springer; 2011.