Logistic regression and linear discriminant analysis in the evaluation of factors associated with stunting in children: Divergence and similarity of the statistical methods

Type	Thesis or Dissertation - Master of Science in Biostatistics
Title	Logistic regression and linear discriminant analysis in the evaluation of factors associated with stunting in children: Divergence and similarity of the statistical methods
Author(s)	Loyce Rutunga
Publication (Day/Month/Year)	2016
URL	http://ir.uz.ac.zw/bitstream/handle/10646/2658/RUTUNGA_Logistic_regression_and_linear_discriminant_analysis_in_the_evaluation-of_factors_associated_with_stunting_in_children_.pdf?sequence=1&isAllowed=y
Abstract	Background: Stunting is a well-established child health indicator of chronic malnutrition and is associated with biological, environmental and socioeconomic factors. Logistic regression and linear discriminant analysis are two statistical methods that can be used to predict or classify subjects as stunted or not stunted from all or a subset of the measured predictor variables. The predictive accuracy of the two methods was compared with respect to several attributes of each method. Methods: Data for the study were extracted from the Zvitambo trial data set. Multivariable logistic regression and linear discriminant models were fitted, using 20 bootstrap samples for cross-validation of the coefficients. The two models were compared with respect to the variables selected, the sign and magnitude of the coefficients, sensitivity, specificity, overall classification rate and area under the ROC curve. The two methods were also applied in combination to check whether predictive accuracy would improve. Results: Logistic regression and linear discriminant analysis had essentially the same predictive accuracy, with classification rates of 78.76% and 78.86%, respectively. Both methods identified two common factors, sex and birth weight. The coefficients of these two factors had the same negative sign in both models, but their magnitudes differed significantly. Both methods had low sensitivity (13.19% and 8.68%) and high specificity (97.44% and 98.24%). Combining the two methods did not improve predictive accuracy (71.5% before and 70.24% after). Conclusion: The two multivariable techniques tend to converge in classification accuracy, mainly when the sample size is large (>50). When a choice between them must be made, the recommendation is to use the method whose assumptions are fulfilled.
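The comparison measures named in the abstract (sensitivity, specificity, overall classification rate and area under the ROC curve) can be illustrated with a minimal Python sketch. It is not the thesis's code. The 0/1 coding (1 = stunted), the function names and the toy data are hypothetical, and the AUC is computed with the Mann-Whitney pairwise formulation, which gives the same area as tracing the empirical ROC curve.

```python
"""Hypothetical sketch of the accuracy measures used to compare two classifiers.

Assumptions (not from the thesis): labels are coded 1 = stunted, 0 = not
stunted; each model yields a 0/1 prediction and a score such as a predicted
probability of being stunted.
"""
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def classification_summary(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity and overall classification rate."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        # share of stunted children the model flags as stunted
        "sensitivity": tp / (tp + fn),
        # share of non-stunted children the model flags as not stunted
        "specificity": tn / (tn + fp),
        # share of all children classified correctly
        "classification_rate": (tp + tn) / (tp + fn + tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    stunted child scores higher than a randomly chosen non-stunted child
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only, not from the Zvitambo data set.
    y = [1, 1, 0, 0, 0, 0]
    pred = [1, 0, 0, 0, 0, 1]
    score = [0.9, 0.4, 0.35, 0.2, 0.1, 0.8]
    print(classification_summary(y, pred))  # sensitivity 0.5, specificity 0.75, rate ~0.667
    print(roc_auc(y, score))  # 0.875
```

Running the same functions on each model's predictions for the same children puts logistic regression and linear discriminant analysis on a common footing. A model that rarely predicts "stunted" can still reach a high classification rate when most children are not stunted, which is why sensitivity and specificity are reported alongside it.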

Related studies

»	Zimbabwe - Demographic and Health Survey 2010-2011