Logistic regression and linear discriminant analysis in the evaluation of factors associated with stunting in children: Divergence and similarity of the statistical methods

Type Thesis or Dissertation - Master of Science in Biostatistics
Title Logistic regression and linear discriminant analysis in the evaluation of factors associated with stunting in children: Divergence and similarity of the statistical methods
Author(s)
Publication (Day/Month/Year) 2016
URL http://ir.uz.ac.zw/bitstream/handle/10646/2658/RUTUNGA_Logistic_regression_and_linear_discriminant_analysis_in_the_evaluation-of_factors_associated_with_stunting_in_children_.pdf?sequence=1&isAllowed=y
Abstract
Background: Stunting is a well-established child health indicator of chronic malnutrition
which is associated with biological, environmental and socioeconomic factors. Logistic
regression and linear discriminant analysis are two statistical methods that can be used to
predict or classify subjects as either stunted or not stunted based on all or a subset of
measured predictor variables. The predictive accuracy of the two methods was compared
with respect to several attributes of each method.
Methods: Data for the study were extracted from the Zvitambo trial data set. The
multivariable logistic regression and linear discriminant models were fitted using 20
bootstrap samples for cross-validation of the coefficients. The two models were compared
with respect to the variables selected, the sign and magnitude of the coefficients, sensitivity,
specificity, overall classification rate and area under the ROC curve. The two methods were
also applied in combination to check whether predictive accuracy would improve.
Results: Logistic regression and linear discriminant analysis had nearly the same predictive
accuracy, with classification rates of 78.76% and 78.86% respectively. Both methods
identified two common factors, sex and birth weight. The coefficients of the two factors
had the same negative sign, but their magnitudes differed significantly. Both methods had
low sensitivity (13.19% and 8.68%) and high specificity (97.44% and 98.24%). Combining the
two methods did not improve predictive accuracy (71.5% before and 70.24% after).
Conclusion: The two multivariable techniques tend to converge in classification accuracy,
mainly when the sample size is large (>50). When a choice must be made between the two,
it is recommended to use the method whose assumptions for application are fulfilled.
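
Illustration: the kind of comparison described in the abstract can be sketched in Python as
follows. This is a minimal sketch under stated assumptions, not the procedure used in the
thesis. The data are synthetic (the Zvitambo data are not reproduced here), the 0/1 coding of
sex and the simulated coefficients are hypothetical, and evaluating each bootstrap fit on its
out-of-bag rows is one plausible reading of "20 bootstrap samples". Only NumPy is assumed.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns [intercept, coefficients...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        hess = Xb.T @ (Xb * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, Xb.T @ (y - p))
    return beta


def fit_lda(X, y):
    """Two-class LDA with pooled covariance; returns [intercept, coefficients...]."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    pooled = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -0.5 * (mu1 + mu0) @ w + np.log(n1 / n0)  # log prior odds shift the cut-off
    return np.concatenate([[b], w])


def evaluate(beta, X, y):
    """Sensitivity, specificity, classification rate and AUC for a linear score."""
    score = beta[0] + X @ beta[1:]
    pred = score > 0  # score > 0 corresponds to predicted probability > 0.5
    pos, neg = y == 1, y == 0
    diff = score[pos][:, None] - score[neg][None, :]
    return np.array([
        np.mean(pred[pos]),                            # sensitivity
        np.mean(~pred[neg]),                           # specificity
        np.mean(pred == pos),                          # overall classification rate
        np.mean(diff > 0) + 0.5 * np.mean(diff == 0),  # AUC (Mann-Whitney form)
    ])


def compare_methods(n=2000, n_boot=20, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic stand-in data: all values below are hypothetical.
    sex = rng.integers(0, 2, n)
    birth_weight = rng.normal(3.0, 0.5, n)
    logit = -0.8 - 0.8 * sex - 1.5 * (birth_weight - 3.0)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    X = np.column_stack([sex, birth_weight]).astype(float)

    fitters = {"logistic": fit_logistic, "lda": fit_lda}
    coefs = {name: [] for name in fitters}
    stats = {name: [] for name in fitters}
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # bootstrap resample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)    # rows not drawn: used for evaluation
        for name, fit in fitters.items():
            beta = fit(X[idx], y[idx])
            coefs[name].append(beta)
            stats[name].append(evaluate(beta, X[oob], y[oob]))

    labels = ["sensitivity", "specificity", "accuracy", "auc"]
    return {
        name: {"coef": np.mean(coefs[name], axis=0),
               **dict(zip(labels, np.mean(stats[name], axis=0)))}
        for name in fitters
    }


if __name__ == "__main__":
    for name, result in compare_methods().items():
        print(name, result)
```

Both models reduce to a linear score in the predictors, so their coefficients can be compared
by sign and magnitude directly, and the same 0.5 probability cut-off yields the sensitivity,
specificity and classification rate for each.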
