For this analysis I wanted to explore the connection between bone health and alcohol consumption. From my non-medical professional prospective, I assumed that there would be decreased bone health measures as alcohol consumption increased, but some papers actually show the opposite effect. I decided to see if the NHANES 2011 to 2016 data reflects this patter as well, and also to practice analyzing complex surveys and explore more that the survey package has to offer.
Load Packages
1
2
3
4
5
6
7
|
suppressPackageStartupMessages({
library(tidyverse)
library(nhanesA)
library(survey)
library(janitor)
library(tableone)
})
|
Load Data
I combined three cycles of NHANES data from the years 2011 to 2016, in doing this I saw in the documentation that new sample weights would have to be created to accommodate for this. Also, in reading the documentation I saw that the questionnaire that included the questions about alcohol consumption were administered in the mobile examination center, rather than in the in home interview questionnaire and therefore the exam weights would have to be used.
I performed this calculation when I was manipulating the other columns using dplyr’s mutate function, and for now I’m starting off with importing the relevant data using the nhanesA package.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
# Demographic Data
demo11 <- nhanes("DEMO_G") %>%
dplyr::select(SEQN, RIAGENDR, RIDAGEYR, RIDEXPRG, RIDRETH3, SDMVPSU, SDMVSTRA, WTMEC2YR)
demo13 <- nhanes("DEMO_H") %>%
dplyr::select(SEQN, RIAGENDR, RIDAGEYR, RIDEXPRG, RIDRETH3, SDMVPSU, SDMVSTRA, WTMEC2YR)
demo15 <- nhanes("DEMO_I") %>%
dplyr::select(SEQN, RIAGENDR, RIDAGEYR, RIDEXPRG, RIDRETH3, SDMVPSU, SDMVSTRA, WTMEC2YR)
# Alcohol Consumption Data
alc11 <- nhanes("ALQ_G") %>%
dplyr::select(SEQN, ALQ101, ALQ110, ALQ120Q, ALQ120U, ALQ130)
alc13 <- nhanes("ALQ_H") %>%
dplyr::select(SEQN, ALQ101, ALQ110, ALQ120Q, ALQ120U, ALQ130)
alc15 <- nhanes("ALQ_I") %>%
dplyr::select(SEQN, ALQ101, ALQ110, ALQ120Q, ALQ120U, ALQ130)
# Bone Density Data
bone11 <- nhanes("DXX_G") %>%
dplyr::select(SEQN, DXDTOBMC, DXDTOBMD)
bone13 <- nhanes("DXX_H") %>%
dplyr::select(SEQN, DXDTOBMC, DXDTOBMD)
bone15 <- nhanes("DXX_I") %>%
dplyr::select(SEQN, DXDTOBMC, DXDTOBMD)
|
Merge Data
1
2
3
4
5
6
7
|
df11 <- merge(demo11, alc11, by = c("SEQN"), all = TRUE) %>% merge(bone11, by = c("SEQN"), all = TRUE)
df13 <- merge(demo13, alc13, by = c("SEQN"), all = TRUE) %>% merge(bone13, by = c("SEQN"), all = TRUE)
df15 <- merge(demo15, alc15, by = c("SEQN"), all = TRUE) %>% merge(bone15, by = c("SEQN"), all = TRUE)
df <- rbind(df11, df13, df15)
df$SEQN <- NULL
|
Data Cleaning
Assess Variables
First I wanted to know the values of my selected variables so that I could make more concise categories and also assess for missing values and how they are coded.
1
2
3
4
5
6
7
|
## RIDRETH3 n percent
## Mexican American 5006 0.16741355
## Other Hispanic 3344 0.11183198
## Non-Hispanic White 9713 0.32482777
## Non-Hispanic Black 7079 0.23674002
## Non-Hispanic Asian 3398 0.11363788
## Other Race - Including Multi-Racial 1362 0.04554879
|
1
2
3
4
5
6
7
8
9
10
|
## RIDEXPRG n
## Yes, positive lab pregnancy test or self-reported pregnant at exam 192
## The participant was not pregnant at exam 3341
## Cannot ascertain if the participant is pregnant at exam 272
## <NA> 26097
## percent valid_percent
## 0.006420975 0.05045992
## 0.111731657 0.87805519
## 0.009096382 0.07148489
## 0.872750987 NA
|
1
2
3
4
5
|
## ALQ101 n percent valid_percent
## Yes 10844 0.3626513277 0.6948609509
## No 4748 0.1587853655 0.3042419582
## Don't know 14 0.0004681961 0.0008970909
## <NA> 14296 0.4780951107 NA
|
1
2
3
4
5
6
|
## ALQ110 n percent valid_percent
## Yes 2047 6.845696e-02 0.4298614028
## No 2700 9.029496e-02 0.5669886602
## Don't know 14 4.681961e-04 0.0029399412
## Refused 1 3.344258e-05 0.0002099958
## <NA> 25140 8.407464e-01 NA
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
|
## ALQ120Q n percent valid_percent
## 0 2581 8.631530e-02 0.2002172058
## 1 3001 1.003612e-01 0.2327980762
## 2 2527 8.450940e-02 0.1960282368
## 3 1508 5.043141e-02 0.1169808393
## 4 855 2.859341e-02 0.0663253433
## 5 753 2.518226e-02 0.0584128462
## 6 427 1.427998e-02 0.0331238849
## 7 606 2.026620e-02 0.0470095415
## 8 77 2.575079e-03 0.0059731596
## 9 9 3.009832e-04 0.0006981615
## 10 192 6.420975e-03 0.0148941122
## 11 3 1.003277e-04 0.0002327205
## 12 69 2.307538e-03 0.0053525716
## 13 2 6.688516e-05 0.0001551470
## 14 13 4.347535e-04 0.0010084555
## 15 63 2.106882e-03 0.0048871306
## 16 4 1.337703e-04 0.0003102940
## 17 1 3.344258e-05 0.0000775735
## 18 3 1.003277e-04 0.0002327205
## 19 1 3.344258e-05 0.0000775735
## 20 53 1.772457e-03 0.0041113955
## 21 1 3.344258e-05 0.0000775735
## 22 1 3.344258e-05 0.0000775735
## 24 8 2.675406e-04 0.0006205880
## 25 8 2.675406e-04 0.0006205880
## 28 3 1.003277e-04 0.0002327205
## 29 1 3.344258e-05 0.0000775735
## 30 35 1.170490e-03 0.0027150725
## 35 2 6.688516e-05 0.0001551470
## 36 1 3.344258e-05 0.0000775735
## 40 3 1.003277e-04 0.0002327205
## 42 1 3.344258e-05 0.0000775735
## 45 2 6.688516e-05 0.0001551470
## 48 4 1.337703e-04 0.0003102940
## 50 2 6.688516e-05 0.0001551470
## 52 1 3.344258e-05 0.0000775735
## 60 6 2.006555e-04 0.0004654410
## 61 1 3.344258e-05 0.0000775735
## 64 1 3.344258e-05 0.0000775735
## 72 1 3.344258e-05 0.0000775735
## 80 2 6.688516e-05 0.0001551470
## 90 4 1.337703e-04 0.0003102940
## 96 1 3.344258e-05 0.0000775735
## 97 1 3.344258e-05 0.0000775735
## 100 5 1.672129e-04 0.0003878675
## 108 1 3.344258e-05 0.0000775735
## 112 1 3.344258e-05 0.0000775735
## 120 1 3.344258e-05 0.0000775735
## 138 1 3.344258e-05 0.0000775735
## 140 1 3.344258e-05 0.0000775735
## 150 2 6.688516e-05 0.0001551470
## 160 1 3.344258e-05 0.0000775735
## 168 1 3.344258e-05 0.0000775735
## 178 1 3.344258e-05 0.0000775735
## 180 2 6.688516e-05 0.0001551470
## 182 1 3.344258e-05 0.0000775735
## 200 4 1.337703e-04 0.0003102940
## 220 1 3.344258e-05 0.0000775735
## 245 1 3.344258e-05 0.0000775735
## 250 1 3.344258e-05 0.0000775735
## 300 1 3.344258e-05 0.0000775735
## 305 1 3.344258e-05 0.0000775735
## 340 1 3.344258e-05 0.0000775735
## 350 3 1.003277e-04 0.0002327205
## 365 8 2.675406e-04 0.0006205880
## 777 2 6.688516e-05 0.0001551470
## 999 12 4.013109e-04 0.0009308820
## NA 17011 5.688917e-01 NA
|
1
2
3
4
5
|
## ALQ120U n percent valid_percent
## Week 4140 0.13845228 0.4020979
## Month 2880 0.09631463 0.2797203
## Year 3276 0.10955789 0.3181818
## <NA> 19606 0.65567521 NA
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
|
## ALQ130 n percent valid_percent
## 1 3601 1.204267e-01 3.493064e-01
## 2 2858 9.557889e-02 2.772335e-01
## 3 1498 5.009698e-02 1.453099e-01
## 4 777 2.598488e-02 7.537104e-02
## 5 467 1.561768e-02 4.530022e-02
## 6 525 1.755735e-02 5.092638e-02
## 7 89 2.976390e-03 8.633233e-03
## 8 139 4.648518e-03 1.348336e-02
## 9 20 6.688516e-04 1.940052e-03
## 10 103 3.444586e-03 9.991270e-03
## 11 10 3.344258e-04 9.700262e-04
## 12 128 4.280650e-03 1.241634e-02
## 13 7 2.340981e-04 6.790183e-04
## 14 9 3.009832e-04 8.730236e-04
## 15 33 1.103605e-03 3.201086e-03
## 16 7 2.340981e-04 6.790183e-04
## 17 1 3.344258e-05 9.700262e-05
## 18 5 1.672129e-04 4.850131e-04
## 20 8 2.675406e-04 7.760210e-04
## 21 2 6.688516e-05 1.940052e-04
## 23 1 3.344258e-05 9.700262e-05
## 24 4 1.337703e-04 3.880105e-04
## 25 1 3.344258e-05 9.700262e-05
## 30 2 6.688516e-05 1.940052e-04
## 64 1 3.344258e-05 9.700262e-05
## 82 1 3.344258e-05 9.700262e-05
## 777 2 6.688516e-05 1.940052e-04
## 999 10 3.344258e-04 9.700262e-04
## NA 19593 6.552405e-01 NA
|
This was a useful part of data cleaning, as I was able to see that certain missing values were coded as 999 or 777 rather than NA, this would’ve seriously thrown off my calculations when I tried to create the variable to capture average drinks over last 12 months time period.
Creating New Variables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
dff <- df %>%
mutate(weight = 1/3*df$WTMEC2YR,
preg = as.factor(case_when(
RIDEXPRG == "Yes, positive lab pregnancy test or self-reported pregnant at exam" ~ "Yes",
RIDEXPRG == "The participant was not pregnant at exam" ~ "No",
TRUE ~ NA)),
race = as.factor(case_when(
RIDRETH3 == "Mexican American" | RIDRETH3 == "Other Hispanic" ~ "Hispanic",
RIDRETH3 == "Non-Hispanic White" ~ "White",
RIDRETH3 == "Non-Hispanic Black" ~ "Black",
RIDRETH3 == "Non-Hispanic Asian" ~ "Asian",
RIDRETH3 == "Other Race - Including Multi-Racial" ~ "Other")),
drinksyr = if_else(ALQ101 == "Don't know", NA, ALQ101),
drinkslife = if_else(ALQ110 == "Don't know" | ALQ110 == "Refused", NA, ALQ110),
drinksfreq = if_else(ALQ120Q == 777 | ALQ120Q == 999, NA, ALQ120Q),
drinksamt = if_else(ALQ130 == 777 | ALQ130 == 999, NA, ALQ130),
drinksfreqcalc = case_when(
ALQ120U == "Week" ~ drinksfreq * 52,
ALQ120U == "Month" ~ drinksfreq * 12,
ALQ120U == "Year" ~ drinksfreq,
ALQ120Q == 0 & ALQ120U == NA ~ 0,
ALQ120Q == NA & ALQ120U == NA ~ NA),
drinkstotalyr = drinksfreqcalc * drinksamt,
drinktype = as_factor(case_when(
RIAGENDR == "Female" & ALQ130 >= 4 ~ "Binge Drinker",
RIAGENDR == "Female" & ALQ130 < 4 ~ "Drinker",
RIAGENDR == "Male" & ALQ130 >= 5 ~ "Binge Drinker",
RIAGENDR == "Male" & ALQ130 < 5 ~ "Drinker",
ALQ110 == "No" ~ "Non Drinker"))
) %>%
rename(
gender = RIAGENDR,
age = RIDAGEYR,
psu = SDMVPSU,
strata = SDMVSTRA,
bmc = DXDTOBMC,
bmd = DXDTOBMD
) %>%
dplyr::select(-RIDRETH3, -RIDEXPRG, -ALQ101, -ALQ110, - ALQ120Q, - ALQ120U, -ALQ130, -WTMEC2YR)
|
Now I’m going to check out how my new columns look and make sure that there’s nothing unexpected that I didn’t cover through all of my case_when()s and if_else()s.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
## 'data.frame': 29902 obs. of 16 variables:
## $ gender : Factor w/ 2 levels "Male","Female": 1 2 1 2 2 1 1 1 1 1 ...
## $ age : num 22 3 14 44 14 9 0 6 21 15 ...
## $ psu : num 1 3 3 1 2 1 2 2 1 3 ...
## $ strata : num 91 92 90 94 90 91 92 103 92 91 ...
## $ bmc : num 2589 NA 1931 2039 1944 ...
## $ bmd : num 1.21 NA 1.02 1.07 1.06 ...
## $ weight : num 34746 5372 2623 42655 4461 ...
## $ preg : Factor w/ 2 levels "No","Yes": NA NA NA 1 NA NA NA NA NA NA ...
## $ race : Factor w/ 5 levels "Asian","Black",..: 5 3 1 5 2 5 1 4 1 4 ...
## $ drinksyr : Factor w/ 3 levels "Yes","No","Don't know": 2 NA NA NA NA NA NA NA 1 NA ...
## $ drinkslife : Factor w/ 4 levels "Yes","No","Don't know",..: 2 NA NA NA NA NA NA NA NA NA ...
## $ drinksfreq : num NA NA NA NA NA NA NA NA 1 NA ...
## $ drinksamt : num NA NA NA NA NA NA NA NA 2 NA ...
## $ drinksfreqcalc: num NA NA NA NA NA NA NA NA 12 NA ...
## $ drinkstotalyr : num NA NA NA NA NA NA NA NA 24 NA ...
## $ drinktype : Factor w/ 3 levels "Non Drinker",..: 1 NA NA NA NA NA NA NA 2 NA ...
|
It looks like factor levels that have no observations are still appearing as a valid level, I’m going to use the droplevels() function to clean this up.
1
2
|
dff$drinksyr <- droplevels(dff$drinksyr)
dff$drinkslife <- droplevels(dff$drinkslife)
|
And now re-checking.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
## 'data.frame': 29902 obs. of 16 variables:
## $ gender : Factor w/ 2 levels "Male","Female": 1 2 1 2 2 1 1 1 1 1 ...
## $ age : num 22 3 14 44 14 9 0 6 21 15 ...
## $ psu : num 1 3 3 1 2 1 2 2 1 3 ...
## $ strata : num 91 92 90 94 90 91 92 103 92 91 ...
## $ bmc : num 2589 NA 1931 2039 1944 ...
## $ bmd : num 1.21 NA 1.02 1.07 1.06 ...
## $ weight : num 34746 5372 2623 42655 4461 ...
## $ preg : Factor w/ 2 levels "No","Yes": NA NA NA 1 NA NA NA NA NA NA ...
## $ race : Factor w/ 5 levels "Asian","Black",..: 5 3 1 5 2 5 1 4 1 4 ...
## $ drinksyr : Factor w/ 2 levels "Yes","No": 2 NA NA NA NA NA NA NA 1 NA ...
## $ drinkslife : Factor w/ 2 levels "Yes","No": 2 NA NA NA NA NA NA NA NA NA ...
## $ drinksfreq : num NA NA NA NA NA NA NA NA 1 NA ...
## $ drinksamt : num NA NA NA NA NA NA NA NA 2 NA ...
## $ drinksfreqcalc: num NA NA NA NA NA NA NA NA 12 NA ...
## $ drinkstotalyr : num NA NA NA NA NA NA NA NA 24 NA ...
## $ drinktype : Factor w/ 3 levels "Non Drinker",..: 1 NA NA NA NA NA NA NA 2 NA ...
|
Variable Summary
My final dataset includes the following variables that I’m going to use to assess alcohol consumption
- drinksyr: have you had at least 12 drinks in a year?
- drinkslife: have you had at least 12 drinks in your life?
- drinksamt: ALQ130 with missing values recoded as NA
- drinksfreq: ALQ120Q with missing values recoded as NA
- drinksfreqcalc: drinking days during a year period
- drinkstotalyr: average total drinks a year created by multiplying average drinks/drinking day with average amount of drinking days in a year
- drinktype: if an average drinking day for an individual is considered binge drinking (note that these amounts are different for men and women)
Table
1
2
3
4
|
init.table <- CreateTableOne(data = dff,
includeNA = TRUE)
print(init.table,
showAllLevels = TRUE)
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
##
## level Overall
## n 29902
## gender (%) Male 14751 (49.3)
## Female 15151 (50.7)
## age (mean (SD)) 31.60 (24.59)
## psu (mean (SD)) 1.54 (0.56)
## strata (mean (SD)) 111.13 (13.03)
## bmc (mean (SD)) 2080.01 (604.26)
## bmd (mean (SD)) 1.05 (0.15)
## weight (mean (SD)) 10414.87 (10842.12)
## preg (%) No 3341 (11.2)
## Yes 192 ( 0.6)
## <NA> 26369 (88.2)
## race (%) Asian 3398 (11.4)
## Black 7079 (23.7)
## Hispanic 8350 (27.9)
## Other 1362 ( 4.6)
## White 9713 (32.5)
## drinksyr (%) Yes 10844 (36.3)
## No 4748 (15.9)
## <NA> 14310 (47.9)
## drinkslife (%) Yes 2047 ( 6.8)
## No 2700 ( 9.0)
## <NA> 25155 (84.1)
## drinksfreq (mean (SD)) 3.49 (14.26)
## drinksamt (mean (SD)) 2.77 (2.64)
## drinksfreqcalc (mean (SD)) 72.23 (96.89)
## drinkstotalyr (mean (SD)) 227.84 (465.05)
## drinktype (%) Non Drinker 2700 ( 9.0)
## Drinker 8482 (28.4)
## Binge Drinker 1827 ( 6.1)
## <NA> 16893 (56.5)
|
Survey Object
Here I used the survey package to create a survey design object with my clean dataset.
1
2
3
4
5
6
|
alc.svy <- svydesign(id = ~psu,
strata = ~strata,
weights = ~weight,
nest = TRUE,
survey.lonely.psu = "adjust",
data = dff)
|
Exclusion Criteria
As survey data is only supposed to be filtered through the subset() method of an already existing survey object, I used this method to apply my exclusion criteria of age being over 21 and the participant not being pregnant. I also filtered out observations where the first question about alcohol consumption behaviors was missing, and missing information about bone mineral density and bone mineral content.
1
2
3
4
5
6
|
alc.subset <- subset(alc.svy,
age >= 21
& preg == "No" | is.na(preg)
& !is.na(drinksyr)
& !is.na(bmc)
& !is.na(bmd))
|
Graphics
Box Plot of Bone Health Metrics by Type of Drinker
Here I used the built in graphics in the survey package to create a box plot.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
par(mfrow = c(1, 2),
xpd = NA)
svyboxplot(bmd ~ drinktype,
alc.subset,
ylab = "Bone Mineral Density",
col = c("#98bad5", "#305674", "#c6d3e3"),
show.names = FALSE)
svyboxplot(bmc ~ drinktype,
alc.subset,
ylab = "Bone Mineral Content",
col = c("#98bad5", "#305674", "#c6d3e3"),
show.names = FALSE,
bty = "L")
legend("bottomleft",
legend = c("Non Drinker", "Normal Drinker", "Binge Drinker"),
fill = c("#98bad5", "#305674", "#c6d3e3"),
inset = c(-1.25, -0.3),
cex = 0.9,
horiz = TRUE)
mtext("Bone Mineral Content and Density by Type of Drinker",
side = 3,
line = -2.5,
cex = 1.2,
outer = TRUE)
|
Bone Health Metrics by Average Amount of Alcoholic Drinks Consumed in One Year
Here I use the built in graphics to take a look at the relationship between the two different bone health measurements and total average alcohol consumption over the course of a year.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
par(mfrow = c(1, 2))
svyplot(bmd ~ drinkstotalyr,
alc.subset,
xlim = c(0, 5000),
xlab = "",
ylab = "Bone Mineral Density (g/cm^2)",
style = c("transparent"))
lines(svysmooth(bmd ~ drinkstotalyr,
alc.subset,
bandwidth = 200),
col = "#2C54A7",
lwd = 2)
svyplot(bmc ~ drinkstotalyr,
alc.subset,
xlim = c(0, 5000),
ylab = "Bone Mineral Content (g)",
xlab = "",
style = c("transparent"))
lines(svysmooth(bmc ~ drinkstotalyr,
alc.subset,
bandwidth = 200),
col = "#2C54A7",
lwd = 2)
mtext("Bone Mineral Density(g/cm^2) and Bone Mineral Content by
Average Alcoholic Drinks in One Year",
side = 3,
line = -2.5,
cex = 1.2,
outer = TRUE)
mtext("Average Alcoholic Drinks in One Year",
side = 1,
line = -2.5,
outer = TRUE)
|
Average Total Drinks in One Year Regression
Now I’m going to fit some models to assess the relationship between my selected responses and covariates to see if there are any relationships between alcohol consumption and bone health measures when adjusting for other variables.
I also plot the diagnostic plots to assess these models.
Bone Density
1
2
3
4
|
bmd.glm <- svyglm(bmd ~ drinkstotalyr + gender + age + race,
alc.subset)
par(mfrow = c(2, 2))
plot(bmd.glm)
|
Diagnostics look good, the data is relatively normal and there are no obvious patterns in the other values that would throw off any assumptions. Now I'm going to take a look at a summary of the model to get the coefficients.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
##
## Call:
## svyglm(formula = bmd ~ drinkstotalyr + gender + age + race, design = alc.subset)
##
## Survey design:
## subset(alc.svy, age >= 21 & preg == "No" | is.na(preg) & !is.na(drinksyr) &
## !is.na(bmc) & !is.na(bmd))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.124e+00 7.061e-03 159.139 < 2e-16 ***
## drinkstotalyr -3.275e-06 3.813e-06 -0.859 0.39562
## genderFemale -6.502e-02 3.100e-03 -20.974 < 2e-16 ***
## age -4.602e-04 1.337e-04 -3.441 0.00137 **
## raceBlack 1.052e-01 5.846e-03 17.991 < 2e-16 ***
## raceHispanic 2.298e-02 4.843e-03 4.744 2.66e-05 ***
## raceOther 4.947e-02 9.054e-03 5.463 2.68e-06 ***
## raceWhite 3.928e-02 5.236e-03 7.502 3.79e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.01040399)
##
## Number of Fisher Scoring iterations: 2
|
So it looks like drinkstotalyr is not statistically significant.
Bone Mineral Content
Now to check out the same things for the bone mineral content variable instead of the bone mineral density.
1
2
3
4
|
bmc.glm <- svyglm(bmc ~ drinkstotalyr + gender + age + race,
alc.subset)
par(mfrow = c(2, 2))
plot(bmc.glm)
|
This looks okay, I'm going to look at the summary.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
##
## Call:
## svyglm(formula = bmc ~ drinkstotalyr + gender + age + race, design = alc.subset)
##
## Survey design:
## subset(alc.svy, age >= 21 & preg == "No" | is.na(preg) & !is.na(drinksyr) &
## !is.na(bmc) & !is.na(bmd))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2397.58331 19.63872 122.085 < 2e-16 ***
## drinkstotalyr -0.01464 0.01079 -1.357 0.1824
## genderFemale -501.15816 9.84552 -50.902 < 2e-16 ***
## age -0.76125 0.37994 -2.004 0.0519 .
## raceBlack 467.51804 19.12565 24.445 < 2e-16 ***
## raceHispanic 139.32230 17.15159 8.123 5.39e-10 ***
## raceOther 278.44977 38.19013 7.291 7.40e-09 ***
## raceWhite 264.39021 17.33046 15.256 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 128137.6)
##
## Number of Fisher Scoring iterations: 2
|
This is not statistically significant either.
Type of Drinker Regression
I’m now going to look at my created drinktype variable to see if there is any signifigance within a model that has that variable instead of total drinks consumed in a year on average.
Bone Mineral Density
1
2
3
|
bmd.glm2 <- svyglm(bmd ~ drinktype + gender + age + race,
alc.subset)
summary(bmd.glm2)
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
##
## Call:
## svyglm(formula = bmd ~ drinktype + gender + age + race, design = alc.subset)
##
## Survey design:
## subset(alc.svy, age >= 21 & preg == "No" | is.na(preg) & !is.na(drinksyr) &
## !is.na(bmc) & !is.na(bmd))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1014758 0.0066615 165.350 < 2e-16 ***
## drinktypeDrinker 0.0188781 0.0041735 4.523 5.56e-05 ***
## drinktypeBinge Drinker 0.0144244 0.0046685 3.090 0.003686 **
## genderFemale -0.0655285 0.0030256 -21.658 < 2e-16 ***
## age -0.0004467 0.0001254 -3.561 0.000992 ***
## raceBlack 0.1079738 0.0054775 19.712 < 2e-16 ***
## raceHispanic 0.0251814 0.0036067 6.982 2.27e-08 ***
## raceOther 0.0545921 0.0081870 6.668 6.14e-08 ***
## raceWhite 0.0428369 0.0047636 8.992 4.74e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.00985392)
##
## Number of Fisher Scoring iterations: 2
|
Bone Mineral Content
1
2
3
|
bmc.glm2 <- svyglm(bmc ~ drinktype + gender + age + race,
alc.subset)
summary(bmc.glm2)
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
##
## Call:
## svyglm(formula = bmc ~ drinktype + gender + age + race, design = alc.subset)
##
## Survey design:
## subset(alc.svy, age >= 21 & preg == "No" | is.na(preg) & !is.na(drinksyr) &
## !is.na(bmc) & !is.na(bmd))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2321.5997 17.1247 135.571 < 2e-16 ***
## drinktypeDrinker 65.2218 17.4097 3.746 0.00058 ***
## drinktypeBinge Drinker 61.9370 17.8532 3.469 0.00129 **
## genderFemale -500.5703 9.3263 -53.673 < 2e-16 ***
## age -0.6723 0.3902 -1.723 0.09281 .
## raceBlack 464.5450 18.3295 25.344 < 2e-16 ***
## raceHispanic 138.4712 14.0689 9.842 4.01e-12 ***
## raceOther 291.6212 37.1853 7.842 1.54e-09 ***
## raceWhite 270.4601 15.9841 16.921 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 121702.2)
##
## Number of Fisher Scoring iterations: 2
|
So these models do show a positive increase of bone mineral content and density when the individual is classified as a drinker and also, to a lesser extent, as a binge drinker, which reinforces the findings from previous papers about this topic.
Conclusion
Things that are generally considered bad for you aren’t always bad for everything, sometimes they can also be neutral when you’re looking at very specific measures of someone’s health rather than the big picture.
I’m definitely used to using ggplot2 for graphics, but using R’s default graphics syntax was a bit challenging and definitely took me a bit to find out how to do certain things, like putting the legend on the box plot for example. I was originally hesitant to use these built in graphics due to that, but I decided to go ahead and use them as it is meant to be used with survey data and the content was correct, it just took me longer to customize the plots.