STAT 218 - Week 6, Lecture 4 Lab 4
We learned that we can estimate unknown parameters in two ways:
Point estimation: A single value calculated from the sample (e.g., \(\bar{y}\)).
Confidence intervals: A range of values within which the parameter is expected to fall, with a certain degree of confidence (e.g., 95% CI, 90% CI).
We also learned that we can use hypothesis testing to test whether a parameter equals a specific value.
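As a quick refresher, here is a minimal sketch of both kinds of estimate in base R. The vector `y` is made up for illustration and is not the lab data; the interval uses the usual \(\bar{y} \pm t^{*} \cdot SE\) form with \(n - 1\) degrees of freedom.

```r
# Hypothetical bill lengths in mm (made up for illustration, not the lab data).
y <- c(39.1, 41.3, 44.5, 46.2, 38.7, 47.9, 42.0, 45.1, 40.6, 43.8)

n     <- length(y)
y_bar <- mean(y)            # point estimate of the population mean
se    <- sd(y) / sqrt(n)    # standard error of the mean

# 95% confidence interval: y_bar +/- t* x SE, with n - 1 degrees of freedom.
t_star <- qt(0.975, df = n - 1)
ci_95  <- c(lower = y_bar - t_star * se, upper = y_bar + t_star * se)

y_bar
ci_95
```

The same interval is returned by `t.test(y)$conf.int`, which is a handy way to check the hand calculation.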
Important
Before running a hypothesis test, we check three validity conditions:
1) Random Sampling: the data can be regarded as coming from independently chosen random sample(s),
2) Independence of Observations: the observations should be independent, and
3) Normal Distribution: Many of the methods depend on the data being from a population that has a normal distribution.
If the only source of information is the data at hand, then normality can be checked only roughly, using the sample itself.
In any case, a rudimentary check is better than none, and every data analysis should begin with inspection of a graph of the data, with special attention to any observations that lie very far from the center of the distribution.
We check assumptions/validity conditions before conducting any statistical analysis. To check the normality assumption, we first need to look at the sample size.
\(1^{st}\) option - small samples: Check the \(p\)-value of the Shapiro-Wilk test. It is best used with a sample size of less than 50 (Shapiro & Wilk, 1965; Uttley, 2019).
\(2^{nd}\) option - large samples: Check the visual plots (e.g., histogram, normal quantile plot) if your sample size is more than 50.
Today we will work with large samples, so we will learn the Shapiro-Wilk test next week.
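For the large-sample option, a visual check might look like the sketch below. It uses base R graphics and a simulated sample (`y` is hypothetical, drawn with `rnorm()`, not the lab data), so the plots show what roughly normal data tend to look like.

```r
# Simulated sample of 200 values; hypothetical, for illustration only.
set.seed(218)
y <- rnorm(200, mean = 44, sd = 5.5)

# Histogram: look for a roughly symmetric, bell-shaped pattern
# and for observations far from the center.
h <- hist(y, main = "Histogram of y", xlab = "y")

# Normal quantile plot: points close to the reference line
# suggest the normality condition is reasonable.
qqnorm(y)
qqline(y)
```

With real data, replace `y` with the variable you want to check (after dropping missing values).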
Project Manager:
Note Taker:
Coder:
We will use the infer package to conduct t tests. Install the infer package by using the install.packages() function. After loading that package, let’s run the library functions and load the data sets that we will use today.
Example of a Case:
Imagine that you are a biologist studying penguins, particularly their bill lengths. You hypothesize that the average bill length of penguins is 40 mm and you collect a random sample of 344 penguins, measure and record their bill length in mm.
Perform a one sample \(t\)-test to investigate whether the bill length of the penguins differs from the test value of 40 mm.
Open the assignment file and answer the following questions.
Question 1. What is the research question of this study?
Type your answer in the assignment file.
Question 2. What type of variable is used in this study?
Type your answer in the assignment file.
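The output that follows can be produced with a call along these lines. This is a sketch, not necessarily the exact code used in lecture: it assumes the `penguins` data come from the palmerpenguins package, that the response column is `bill_length_mm`, and that the confidence level is infer's default of 0.95.

```r
# Assumption: penguins is the data set shipped with palmerpenguins.
# install.packages(c("infer", "palmerpenguins"))  # run once if needed
library(infer)
library(palmerpenguins)

# One-sample t test of H0: mu = 40 against a two-sided alternative.
# Rows with a missing bill length are dropped by the test.
one_sample <- t_test(x = penguins,
                     response = bill_length_mm,
                     mu = 40,
                     alternative = "two-sided",
                     conf_level = 0.95)
one_sample
```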
# A tibble: 1 × 7
  statistic  t_df  p_value alternative estimate lower_ci upper_ci
      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>    <dbl>
1      13.3   341 9.58e-33 two.sided       43.9     43.3     44.5
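To see how the columns of this output fit together, the sketch below rebuilds the confidence interval from the rounded values in the table. It assumes a 95% confidence level, which the output does not state explicitly; because the inputs are rounded, the result should land close to, not exactly on, the reported bounds.

```r
# Rounded values copied from the output above.
estimate  <- 43.9   # sample mean bill length (mm)
statistic <- 13.3   # t statistic
t_df      <- 341    # degrees of freedom (n - 1)
mu_0      <- 40     # test value from the null hypothesis

# The t statistic is (estimate - mu_0) / SE, so SE can be recovered from it.
se <- (estimate - mu_0) / statistic

# Assumed 95% confidence level: estimate +/- t* x SE.
t_star <- qt(0.975, df = t_df)
ci <- estimate + c(-1, 1) * t_star * se
round(ci, 1)
```

Note that the test value of 40 mm lies well below the interval, which agrees with the very small p-value.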
Conclusion: Type your hypothesis testing conclusion in your assignment file!
Confidence Interval: Type your confidence interval statement in your assignment file!
Example of a Case:
Now, you’re curious about the difference in the body mass of penguins based on their sex. You hypothesize that body mass differs between the sexes. To test your hypothesis, you collect a random sample of 344 penguins, measure their body mass, and record their sex.
Perform an independent samples \(t\)-test to investigate whether the body mass of penguins differs between the sexes.
Continue with the assignment file and answer the following questions.
Question 1. What is the research question of this study?
Type your answer in the assignment file.
Question 2. What type of variable is used in this study?
Type your answer in the assignment file.
t_test(x = penguins,
       formula = body_mass_g ~ sex,
       order = c("male", "female"),
       alternative = "two-sided",
       conf_level = 0.90)
# A tibble: 1 × 7
  statistic  t_df  p_value alternative estimate lower_ci upper_ci
      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>    <dbl>
1      8.55  324. 4.79e-16 two.sided       683.     552.     815.
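The degrees of freedom in this output (about 324) are lower than a pooled test on the same data would give, which suggests the unequal-variance (Welch) version of the test. The sketch below works through that calculation by hand in base R on two small made-up groups (`male` and `female` here are hypothetical vectors, not the lab data) and compares it with `t.test()`.

```r
# Hypothetical body masses in grams for two small groups; NOT the lab data.
male   <- c(4300, 4650, 3950, 5200, 4800, 4450, 5050, 4100)
female <- c(3700, 3450, 3900, 4250, 3600, 3800, 4050)

# Difference in sample means (male - female), matching order = c("male", "female").
diff_means <- mean(male) - mean(female)

# Welch standard error: the two group variances are not pooled.
v_m <- var(male) / length(male)
v_f <- var(female) / length(female)
se  <- sqrt(v_m + v_f)

t_stat <- diff_means / se

# Welch-Satterthwaite degrees of freedom (usually not a whole number).
df <- (v_m + v_f)^2 /
  (v_m^2 / (length(male) - 1) + v_f^2 / (length(female) - 1))

# Two-sided p-value and 90% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df)
ci_90   <- diff_means + c(-1, 1) * qt(0.95, df) * se

# Base R's t.test() uses the Welch version by default (var.equal = FALSE).
welch <- t.test(male, female, conf.level = 0.90)
```

Comparing `t_stat`, `df`, `p_value`, and `ci_90` with the corresponding parts of `welch` shows where each column of the output comes from.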
Conclusion: Type your hypothesis testing conclusion in your assignment file!
Confidence Interval: Type your confidence interval statement in your assignment file!