STAT 218 - Week 7, Lecture 4 Lab 4
Important
Download the assignment and dataset from Canvas.
The .csv file from Canvas can be tricky. Try opening it in a new tab to download it. The .csv file should then download without a fight (hopefully).
Save both files to your STAT 218 folder (VERY IMPORTANT). Otherwise, you won’t be able to import the dataset.
Ensure your assignment and dataset are in the same folder, with file extensions .qmd and .csv.
Follow the instructions in this slideshow to complete the assignment.
Let’s use library() to load our packages and then read in the dataset that we will use today.
IMPORTANT: If you don’t see stream_data in your Environment pane, your dataset and Quarto file are not in the same folder!
You may have downloaded stream.csv multiple times, so its name could look like stream (2)(1).csv.
Check the file name, correct it if necessary, and try again. Remember, R is very stubborn—it won’t read the data if the name doesn’t match exactly what you typed in your code.
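If the import still fails, a quick check from the console can show what R actually sees. This is a minimal sketch that assumes the file should be named stream.csv and the data frame stream_data, as above; adjust both if your code uses different names.

```r
# Which folder is R currently looking in?
getwd()

# Which .csv files are in that folder?
# Look for accidental names such as "stream (2)(1).csv".
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# Read the file only if the name matches exactly (assumed name: stream.csv)
if (file.exists("stream.csv")) {
  stream_data <- read.csv("stream.csv")
  str(stream_data)  # quick look at the columns that were imported
} else {
  message("stream.csv was not found in ", getwd())
}
```

If the message appears, rename the file or move it next to your .qmd file, then run the chunk again.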
Example of a Case: Pollutants in a stream may accumulate or attenuate as water flows down the stream. In a study to monitor the accumulation and attenuation of fecal contamination in a stream running through cattle rangeland, monthly water specimens were collected at two locations along the stream over a period of 21 months.
The data set stream contains the total coliform count (MPN/100 mL) for each water specimen.
Perform a paired sample \(t\)-test to assess whether the mean total coliform count is consistent across the two locations.
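One way to write the hypotheses and the test statistic, assuming \(d_i\) denotes the difference in total coliform count between the two locations in month \(i\) (this notation is introduced here for illustration):

\[
H_0: \mu_d = 0 \qquad \text{versus} \qquad H_A: \mu_d \neq 0
\]

\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1
\]

Here \(\bar{d}\) and \(s_d\) are the sample mean and standard deviation of the differences, and \(n\) is the number of pairs (months).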
1) Random Sampling: the data can be regarded as coming from independently chosen random sample(s),
2) Independence of Observations: the observations should be independent within each sample, and
3) Normal Distribution: Many of the methods depend on the data being from a population that has a normal distribution.
Check your sample size first!
Shapiro–Wilk Test is a statistical method that provides a numerical assessment of evidence for certain types of nonnormality in data.
The output of the Shapiro–Wilk test includes a P-value. Interpretation:
Shapiro-Wilk normality test
data: stream$Difference
W = 0.9641, p-value = 0.6022
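The sketch below shows the kind of code that produces output in this format. The numbers are simulated stand-ins, not the real stream measurements, so the results will not match the output above. The column names upstream, downstream, and Difference follow the names that appear in the printed output; the direction of the subtraction is an assumption.

```r
# Simulated stand-in for the lab data (NOT the real stream measurements)
set.seed(218)
stream <- data.frame(
  upstream   = rnorm(21, mean = 3000, sd = 900),
  downstream = rnorm(21, mean = 2000, sd = 900)
)

# Paired design: one difference per month (direction assumed here)
stream$Difference <- stream$upstream - stream$downstream

# Check the sample size first: number of non-missing differences
n <- sum(!is.na(stream$Difference))
n

# Shapiro-Wilk test on the differences
sw <- shapiro.test(stream$Difference)
sw$statistic  # the W statistic
sw$p.value    # the P-value
```

For a paired design, the differences are what should look roughly normal, which is why the test is run on the Difference column rather than on each location separately. In the output shown above, the P-value of 0.6022 is large, so the test gives no strong evidence of nonnormality in the differences.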
As you may recall, last week we used the t_test() function from the infer package to conduct an independent-samples t-test.
Today, we will use t.test(), which is available by default in R (some BIO courses use it too).
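As a rough sketch of how t.test() can be called for paired data, the example below uses simulated numbers (not the real stream measurements), so its results will differ from the lab output. It shows two calls that should agree: a paired test on the two columns, and a one-sample test on their differences. The object names res_paired and res_diff are made up for this illustration.

```r
# Simulated stand-in data (NOT the real stream measurements)
set.seed(218)
upstream   <- rnorm(21, mean = 3000, sd = 900)
downstream <- rnorm(21, mean = 2000, sd = 900)

# Option A: give both columns and tell R that the samples are paired
res_paired <- t.test(upstream, downstream, paired = TRUE)

# Option B: one-sample t-test on the differences, testing against 0
res_diff <- t.test(upstream - downstream, mu = 0)

# Both options should report the same t, df, and P-value
res_paired$statistic
res_paired$parameter   # degrees of freedom = number of pairs - 1
res_paired$p.value
all.equal(unname(res_paired$statistic), unname(res_diff$statistic))
```

With paired = TRUE, R subtracts the second argument from the first, so the sign of the mean difference depends on the order in which the columns are given.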
Paired t-test
data: stream$upstream and stream$downstream
t = 4.6092, df = 20, p-value = 0.0001697
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
603.7724 1602.0562
sample estimates:
mean difference
1102.914
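To see how the pieces of this output fit together, the interval and P-value can be rebuilt from the printed mean difference, t statistic, and degrees of freedom. This is only a sanity-check sketch: it starts from the rounded values shown above, so small rounding differences from the printed interval are expected.

```r
# Values copied from the printed output above
mean_diff <- 1102.914
t_stat    <- 4.6092
df        <- 20

# Standard error of the mean difference: t = mean_diff / se
se <- mean_diff / t_stat

# Critical value for a 95% confidence interval
t_crit <- qt(0.975, df)

# Confidence interval: mean difference plus or minus t_crit * se
ci <- mean_diff + c(-1, 1) * t_crit * se
ci

# Two-sided P-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)
p_value
```

The rebuilt interval and P-value should land within rounding error of the values R printed.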
Compare and contrast how the code in Question 4.2 and the code in Question 4.3 work. Comment on the
If they produce the same results, explain what the two pieces of code did differently.
Conclusion: Type your conclusion statement in your assignment!
Confidence Interval: Type your confidence interval statement in your assignment!