## Describe each variable using appropriate descriptive statistics; no need to recode anything but make sure dummy coding is correct

University of Maryland
School of Nursing
NRSG 795

Analysis #4: Multiple linear and logistic regression and sensitivity and specificity

Clients seeking care at a weight loss clinic answered a short survey about their experience. The survey collected information on their age, gender, and diet adherence, and also contained a few short screening tools that assessed the minutes they exercised per day, the minutes they were sedentary per day, and their confidence in losing weight. These variables were used to predict weight loss and whether they would recommend the clinic to a friend.

Part A. Multiple Linear Regression
Task: Evaluate if there is a relationship (predict) between the personal characteristics and their weight loss. Prepare a short description of what was done and what you found.

Conduct a multiple linear regression to predict weight loss using all of the personal characteristic variables (if appropriate). Use the steps provided in module 9 as a guide of how to conduct this analysis and include in your description what you did, such as the following:
a. Define the hypothesis
H0: β1 = β2 = … = βk = 0
There are no significant associations at all. That is, all of the coefficients are zero and none of the variables belong in the model.
None of the personal characteristics will be associated with weight loss.
H1: At least one βj is not zero
The alternative hypothesis is not that every variable belongs in the model but that at least one of the variables belongs in the model.
Example: At least one of the independent variables significantly contributes to weight loss.

b. Describe each variable using appropriate descriptive statistics; no need to recode anything, but make sure the dummy coding is correct

Table 1. Characteristics of selected clients of the weight loss clinic (N=51)

| Characteristic | n | % | Range | Mean (SD) |
|---|---|---|---|---|
| Age (years) | 51 | | 19-46 | 26.9 (8.1) |
| Exercise minutes per day | 51 | | 30-50 | 39.6 (5.5) |
| Confidence in success | 51 | | 11-26 | 17.8 (3.4) |
| Sedentary minutes per day | 51 | | 106-124 | 114.4 (4.4) |
| Diet adherence | | | | |
| No | 19 | 37.3 | | |
| Yes | 32 | 62.7 | | |
| Gender | | | | |
| Male | 22 | 43.1 | | |
| Female | 29 | 56.9 | | |
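
As a quick check on the categorical rows of Table 1, the percentages can be recomputed from the counts and N = 51. This is a minimal sketch, not the course's software output; the dictionary labels are illustrative, and the yes/no row is assumed to be diet adherence because Part C reports 32 clients who self-identified as adherent.

```python
# Counts copied from Table 1; N = 51 clients.
N = 51
counts = {
    "diet adherence: no": 19,   # label assumed from Part C (32 adherent, 19 not)
    "diet adherence: yes": 32,
    "gender: male": 22,
    "gender: female": 29,
}


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {label: percent(n, N) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: n={counts[label]}, {pct}%")
```

Each pair of categories should also sum to N, which is a simple way to confirm that no client is missing from a dummy-coded variable.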

c. Run the bivariate associations; these are needed to check for and avoid multicollinearity

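Table 2. Correlation table (Spearman values shown when a nominal variable is included)

| | exercise | confid | sedentary | sex* | diet* | age |
|---|---|---|---|---|---|---|
| exercise | 1 | | | | | |
| confid | -.456 | 1 | | | | |
| sedentary | .837 | -.352 | 1 | | | |
| sex | .005 | -.058 | -.039 | 1 | | |
| diet | -.242 | .447 | -.297 | -.098 | 1 | |
| age | .322 | .163 | .281 | -.028 | .033 | 1 |
| lbslost | -.542 | .515 | -.454 | .139 | .438 | -.06 |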
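
One way to screen the predictor correlations in Table 2 for multicollinearity is to flag any pair whose absolute correlation exceeds a cutoff. The sketch below is illustrative only: the 0.8 cutoff is an assumed rule of thumb, not a value given in the assignment, and the function name is hypothetical.

```python
# Correlations among the predictors, copied from the lower triangle of Table 2.
predictor_corr = {
    ("confid", "exercise"): -0.456,
    ("sedentary", "exercise"): 0.837,
    ("sedentary", "confid"): -0.352,
    ("sex", "exercise"): 0.005,
    ("sex", "confid"): -0.058,
    ("sex", "sedentary"): -0.039,
    ("diet", "exercise"): -0.242,
    ("diet", "confid"): 0.447,
    ("diet", "sedentary"): -0.297,
    ("diet", "sex"): -0.098,
    ("age", "exercise"): 0.322,
    ("age", "confid"): 0.163,
    ("age", "sedentary"): 0.281,
    ("age", "sex"): -0.028,
    ("age", "diet"): 0.033,
}

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff, not from the assignment


def flag_collinear(corr, threshold):
    """Return the predictor pairs whose |r| is at or above the threshold."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= threshold)


flagged = flag_collinear(predictor_corr, THRESHOLD)
print(flagged)
```

With these values, only the sedentary and exercise pair (r = .837) is flagged, so under this assumed cutoff it would be reasonable to consider keeping just one of the two in the model.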

Intellectus Statistics output

Table 1. Pearson Correlation Results Among age, confid, sedentary, exercise, and lbslost

| Combination | r_p | Lower | Upper | p |
|---|---|---|---|---|
| age-confid | 0.16 | -0.12 | 0.42 | .254 |
| age-sedentary | 0.28 | 0.01 | 0.52 | .046 |
| age-exercise | 0.32 | 0.05 | 0.55 | .021 |
| age-lbslost | -0.06 | -0.33 | 0.22 | .674 |
| confid-sedentary | -0.35 | -0.57 | -0.08 | .011 |
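
The Lower and Upper bounds in the Pearson table are consistent with a 95% confidence interval built from the Fisher z transformation with n = 51. That the software uses this method is an assumption; the sketch below only shows that the method reproduces the reported interval for age-confid.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via the Fisher z transform.

    z_crit=1.96 corresponds to a 95% interval (assumed, not stated in the output).
    """
    z = math.atanh(r)              # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)      # standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lower, upper = pearson_ci(0.16, 51)  # age-confid row
print(round(lower, 2), round(upper, 2))
```

For r = 0.16 this gives roughly -0.12 to 0.42, matching the table. Because the interval includes zero, the correlation is not significant, which agrees with p = .254.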

Part B Multiple logistic regression
Task: Now we would like to see if we can find a relationship (predict) between these baseline measures and whether clients will recommend the clinic to others. No model needs to be run; the output is provided, so we focus on interpreting it.
1. Is running a multiple logistic regression appropriate for this task? Explain why it is or is not appropriate.
Yes, it is appropriate, because logistic regression requires a categorical dependent variable. Recommending the clinic (yes/no) is a binary dependent variable.
2. Define the hypotheses
Null hypothesis is that none of the independent variables affects the probability of the dependent variable (yes or no). This implies that all of the coefficients are zero.

H0: β1 = β2 = … = βk = 0. None of the client variables will be associated with recommending the clinic.
H1: At least one βj is not zero. At least one of the independent variables significantly contributes to the recommendation result.
3. How many and what percent of clients indicated they would recommend the clinic?
25 clients (49%)
4. Using the output as shown below write a summary of what we found.
DV: recommend clinic to others (1=yes vs 0=no)

| Step 1a | B | S.E. | Sig. | OR | 95% C.I. for OR: Lower | 95% C.I. for OR: Upper |
|---|---|---|---|---|---|---|
| lbslost | .248 | .095 | .009 | 1.282 | 1.064 | 1.544 |
| Sex (female vs male) | 1.393 | .694 | .045 | 4.028 | 1.034 | 15.683 |
| age | .011 | .044 | .801 | 1.011 | .928 | 1.102 |
| diet | .113 | .757 | .882 | 1.119 | .254 | 4.935 |
| Constant | -7.228 | 2.780 | .009 | .001 | | |

Two variables (lbslost and sex) were significant predictors in this logistic regression analysis. The odds ratio for sex is 4.03 (95% CI: 1.03 to 15.68). It indicates that, holding the other variables constant, the odds of recommending the clinic to friends are about four times higher for female clients than for male clients. The odds ratio for weight loss is 1.28 (95% CI: 1.06 to 1.54). It indicates that for every additional pound lost, the odds of a client recommending the clinic go up by 28%. No association was found between age and recommending the clinic (p>.05) or between diet adherence and recommending the clinic (p>.05).
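
The odds ratios and confidence intervals in the output follow from the B and S.E. columns: OR = exp(B), with 95% limits exp(B ± 1.96 × S.E.). The sketch below recomputes them as a check; the 1.96 multiplier and the function name are assumptions for illustration, and small differences from the table come from rounding of B and S.E.

```python
import math

# (B, S.E.) pairs copied from the logistic regression output.
coefs = {
    "lbslost": (0.248, 0.095),
    "sex": (1.393, 0.694),
    "age": (0.011, 0.044),
    "diet": (0.113, 0.757),
}


def odds_ratio_ci(b, se, z_crit=1.96):
    """Return (OR, lower, upper) for a logistic coefficient and its standard error."""
    return math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)


results = {name: odds_ratio_ci(b, se) for name, (b, se) in coefs.items()}

for name, (or_, lower, upper) in results.items():
    pct_change = (or_ - 1) * 100  # percent change in odds per one-unit increase
    print(f"{name}: OR={or_:.3f} (95% CI {lower:.3f} to {upper:.3f}), "
          f"odds change {pct_change:+.0f}%")
```

A predictor is significant at the .05 level when its interval excludes 1, which holds here for lbslost and sex but not for age or diet.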

Part C. Sensitivity & Specificity
Our survey used a self-report measure of diet adherence. We want to assess whether the results are valid and accurate by comparing the self-report with a gold standard (a stool sample assessing the microbiome, which should show only small amounts of fats, sugars, etc.). We identify 15 true positives out of the 32 clients who self-identified as adherent, and 18 true negatives.

1. Fill in the following table

| | Gold standard positive | Gold standard negative | Total |
|---|---|---|---|
| Self-report adherence (+) | 15 | 17 | 32 |
| Self-report nonadherence (-) | 1 | 18 | 19 |
| Total | 16 | 35 | 51 |

Key to the cells:

| | Gold standard positive | Gold standard negative | Total |
|---|---|---|---|
| Self-report adherence (+) | a (True positive) | b (False positive) | 32 |
| Self-report nonadherence (-) | c (False negative) | d (True negative) | 19 |
| Total | 16 | 35 | 51 |

2. Calculate the sensitivity of the self-report. a/(a+c) = 15/16 = 94%
3. Calculate the specificity of the self-report. d/(b+d) = 18/35 = 51%
4. What does this mean? Was our self-report response OK?
Low specificity implies many false positives. In d/(b+d), a low d means a larger b.
A limitation of the self-report is that people may want us to think they are following the diet when they are not (the gold standard contradicts their self-report). These people would be misclassified as dieting. If we assume the diet works and that people who follow it should lose weight, this misclassification may influence our findings such that we do not see the expected association. It is harder to find differences when true negatives (non-dieters) are mixed in with the true positives (for whom the diet is working); this tends to bias any relationship toward the null (making it harder to reject the null hypothesis).
If we do find an association, we may be incorrectly concluding that the diet works, because too many of the clients classified as adherent were not actually dieting. The self-report measure is therefore not valid.
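
The sensitivity and specificity arithmetic in Part C can be checked with a short sketch. The cell names a, b, c, d follow the key to the 2x2 table; the variable names are otherwise illustrative.

```python
# Cells of the 2x2 table (self-report vs gold standard).
a = 15  # true positives: self-report adherent, gold standard positive
b = 17  # false positives: self-report adherent, gold standard negative
c = 1   # false negatives: self-report nonadherent, gold standard positive
d = 18  # true negatives: self-report nonadherent, gold standard negative

sensitivity = a / (a + c)  # proportion of truly adherent clients identified
specificity = d / (b + d)  # proportion of truly nonadherent clients identified

print(f"sensitivity = {a}/{a + c} = {sensitivity:.0%}")
print(f"specificity = {d}/{b + d} = {specificity:.0%}")
print(f"false positives among self-reported adherent: {b} of {a + b}")
```

The last line makes the practical problem visible: 17 of the 32 clients who reported adherence were not adherent by the gold standard.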

Part D. Run Chart

In addition to creating a figure that illustrates the run chart, provide a summary of the context of the analysis and your run chart findings that include answers to a few questions.

Clinic X set out to improve the health of their diabetic patients. While developing an evidence-based educational program for their clinic, they monitored the proportion of A1C levels that were less than 7% for the first nine months of 2017 (mean=0.30, standard deviation=0.07). In October, November, and December of 2017 the clinicians made improvements in their diabetes education program. The run chart displayed above illustrates that after the educational program the mean proportion of patients with an A1C level less than 7% increased to 0.64 (SD=0.08). We can't be sure the changes led to the improvements, as more refined measurement is not available to determine what specifically led to the improvement. Additional data useful for the next steps in this quality improvement initiative could include the range of hemoglobin A1C levels of all patients and the natural cut-off points for different groups of patients, e.g., those less than 7.0%, those 7.0% to less than 8.0%, those 8.0% to less than 9.0%, and those 9.0% or more. This would show whether all groups of patients improved or whether improvements were made in only a few of the groups.