QTM project using R

---
title: "QTM Project"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

For this QTM project, **I am interested in whether the amount of fries in a large order versus a small order justifies the higher cost**, and I will conduct a **two sample t test** with the data I have gathered. With this experiment, I want to figure out which size of fries is the better deal, so that next time I know which size to order with my 20 piece chicken nuggets.

Assumptions for the two sample t test:

* **Randomization**:
For this experiment, I visited three different McDonald’s locations (Duluth, Suwanee, Decatur) on three different days to randomize my sample. At each location, my friends/family and I ordered 5 small and 5 large fries, for a total sample of 15 small fries and 15 large fries.

* **Independence**:
The sample values are independent of each other. The price of small fries did not affect the price of large fries and vice versa. The prices at one location did not affect the prices at a different location.

* **Normality**:
I forcibly...I mean kindly asked my friends to help me with this project by buying and eating fries with me. For the sampling distribution of the sample mean to be treated as approximately normal, each sample size should be greater than or equal to 30, and my experiment fails to meet this condition. However, 15 small and 15 large fries was the extent of the budget, and I did not want monetary restrictions to deter me from exploring my interest in McFries, so I proceeded with the experimental project.


## Data Table of the Raw Data

--------------------------------------------------------------
   Location         Size            Price        Fry Count     
-------------- --------------- --------------- ---------------
     Duluth         small          $1.59             45         

     Duluth         small          $1.59             41   

     Duluth        small           $1.59             40   

     Duluth         small          $1.59             48   

     Duluth        small            $1.59            37   

     Suwanee       small            $1.39            50   
   
     Suwanee        small           $1.39            39
   
     Suwanee       small            $1.39            42
   
     Suwanee       small            $1.39            40
   
     Suwanee       small            $1.39            45
   
     Decatur      small             $1.59            41
   
     Decatur      small             $1.59            44
   
     Decatur      small             $1.59            39
 
     Decatur      small             $1.59            49
   
     Decatur      small             $1.59            42
   
     Duluth       large             $2.39            98         

     Duluth       large             $2.39            85   

     Duluth       large             $2.39            77   

     Duluth        large            $2.39            96   

     Duluth        large            $2.39            86   

     Suwanee       large            $2.39           75 
   
     Suwanee       large            $2.39            83
   
     Suwanee       large            $2.39           91
   
     Suwanee       large            $2.39            87
   
     Suwanee       large            $2.39           96
   
     Decatur       large            $2.89           80
   
     Decatur        large           $2.89           97
   
     Decatur        large           $2.89           88
         
     Decatur        large           $2.89           83
   
     Decatur         large          $2.89           93
------------------------------------------------------------


## Creating a dataframe

```{r}
# Create a vector for the count for all small fries
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)

# Create a vector for the count for all large fries
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)

# Create a vector for the prices of all small fries (Duluth, Suwanee, Decatur)
small_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5))

# Create a vector for the prices of all large fries (Duluth, Suwanee, Decatur)
large_price <- c(rep(2.39, 5), rep(2.39, 5), rep(2.89, 5))

# Create new variables for the unit price (price per fry)
# by dividing each price by the corresponding count
small_unit_price <- small_price / small_count
large_unit_price <- large_price / large_count

# Create a dataframe in order to see all of the variables
fries <- data.frame(small_count, small_price, small_unit_price,
                    large_count, large_price, large_unit_price)
```

## Calculated Data
```{r}
# Print price per small fry
print(small_unit_price)

# Print price per large fry
print(large_unit_price)

# Summary of the price per small fry
summary(small_unit_price)

# Summary of the price per large fry
summary(large_unit_price)
```


+ The summary tells us that the sample mean for the price per small fry is $0.03588 and the sample mean for the price per large fry is $0.02935.
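
To put the gap between those two means in perspective, it can be expressed as a percentage. The chunk below is only a minimal sketch: it retypes the data from the raw table so that it runs on its own, it takes every Duluth small order at $1.59 (the value that reproduces the reported means), and the names `mean_small`, `mean_large`, and `pct_cheaper` are illustrative rather than part of the original analysis.

```{r}
# Data retyped from the raw table so that this chunk runs on its own
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)
# Assumption: all five Duluth small orders cost $1.59
small_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5))
large_price <- c(rep(2.39, 10), rep(2.89, 5))

small_unit_price <- small_price / small_count
large_unit_price <- large_price / large_count

# Sample means of the price per fry
mean_small <- mean(small_unit_price)
mean_large <- mean(large_unit_price)

# Percentage by which the mean price per fry drops from small to large
pct_cheaper <- 100 * (mean_small - mean_large) / mean_small
round(pct_cheaper, 1)
```

This is a descriptive summary only; whether the gap is statistically significant is a separate question for the hypothesis test.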


## Data Visualization of Price per Fries
```{r}
# Histogram of the price per small fry
hist(fries$small_unit_price, col = "light blue", xlab = "Price per Small Fry", main = "Distribution of Price per Fry of a Small Order")

# Histogram of the price per large fry
hist(fries$large_unit_price, col = "light yellow", xlab = "Price per Large Fry", main = "Distribution of Price per Fry of a Large Order")
```

The histograms show the frequency of the unit prices for small and large fries. Because each of my samples has fewer than 30 observations, I cannot conclude that the distribution of the data for small fries or large fries is normal. However, I can instead report the shape of each distribution as seen in the histograms.

* For Price per Fry of a Small Order, the distribution of the data is unimodal and very close to being symmetric.

* For Price per Fry of a Large Order, the distribution of the data is right skewed.
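
With only 15 observations per group, the shape of a histogram depends heavily on how the bins fall, so a second look can help. One possible complement, sketched here rather than taken from the original analysis, is a normal Q-Q plot together with a Shapiro-Wilk test, using base R's `qqnorm()`, `qqline()`, and `shapiro.test()`. The data are retyped from the raw table (with every Duluth small order taken at $1.59) so that the chunk runs on its own; `sw_small` and `sw_large` are illustrative names.

```{r}
# Data retyped from the raw table so that this chunk runs on its own
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)
# Assumption: all five Duluth small orders cost $1.59
small_unit_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5)) / small_count
large_unit_price <- c(rep(2.39, 10), rep(2.89, 5)) / large_count

# Q-Q plots: points close to the reference line are consistent with normality
qqnorm(small_unit_price, main = "Normal Q-Q Plot: Price per Small Fry")
qqline(small_unit_price)
qqnorm(large_unit_price, main = "Normal Q-Q Plot: Price per Large Fry")
qqline(large_unit_price)

# Shapiro-Wilk test: a small p value would be evidence against normality
sw_small <- shapiro.test(small_unit_price)
sw_large <- shapiro.test(large_unit_price)
sw_small
sw_large
```

A large p value here would not prove that the data are normal; it would only mean that the test found no strong evidence against normality in a sample this small.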

```{r}
# Create a boxplot
boxplot(small_unit_price, large_unit_price, col = c("light blue", "light yellow"), names=c("small", "large"), xlab="Size of Fries", ylab="Price Per Fry", main= "Price per Fry by Size")
```

I also created a side by side box plot to see if there is an association between the size and the unit price of fries. A box plot is appropriate because I am comparing a categorical variable (size) with a quantitative variable (unit price per fry). The box plot suggests that there is indeed an association between the size of the fries and the unit price: the fries in the larger size have a lower price per fry.
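
The same comparison can also be written with R's formula interface, which makes the categorical-versus-quantitative structure explicit. This is only a sketch: the long-format data frame `fries_long` and its column names `size` and `unit_price` are hypothetical and not part of the original code, and the data are retyped from the raw table (every Duluth small order taken at $1.59) so that the chunk runs on its own.

```{r}
# Data retyped from the raw table so that this chunk runs on its own
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)
# Assumption: all five Duluth small orders cost $1.59
small_unit_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5)) / small_count
large_unit_price <- c(rep(2.39, 10), rep(2.89, 5)) / large_count

# One row per order: size is the categorical variable, unit_price the quantitative one
fries_long <- data.frame(
  size = factor(rep(c("small", "large"), each = 15), levels = c("small", "large")),
  unit_price = c(small_unit_price, large_unit_price)
)

# Side by side box plot of unit price grouped by size
boxplot(unit_price ~ size, data = fries_long,
        col = c("light blue", "light yellow"),
        xlab = "Size of Fries", ylab = "Price Per Fry",
        main = "Price per Fry by Size")

# Median price per fry for each size
size_medians <- aggregate(unit_price ~ size, data = fries_long, FUN = median)
size_medians
```

Keeping one row per order also makes it easy to add a location column later without changing the plotting code.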


## Hypothesis Testing

**Are large fries more or less valuable than small fries? Does the amount of fries in a large order versus a small order justify the higher cost?**

* **Parameters of Interest**

    + μ1 = true population mean of the price per fry for a small order of fries

    + μ2 = true population mean of the price per fry for a large order of fries

* **Hypotheses**:

    + H0: μ1 − μ2 = 0

          * The null hypothesis is that there is NO difference between the true mean price per small fry and the true mean price per large fry.

    + Ha: μ1 − μ2 ≠ 0

          * The alternative hypothesis is that there IS a difference between the true mean price per small fry and the true mean price per large fry.

* **Appropriate test**

    + Hypothesis test comparing two population means

    + Two-sample t-test
 

```{r}
# Standard deviation for the small fries sample
sd(small_unit_price)

# Standard deviation for the large fries sample
sd(large_unit_price)
```


+ To assess the equality of variances for the two sample t test, we look at the standard deviations of the two sample groups: price per small fry and price per large fry. Because the standard deviations of the two independent groups (0.004043859 and 0.003605305) are very similar, with a ratio of about 1.12, equal variances can be assumed for the two sample t test.
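
Comparing standard deviations by eye is informal. As a hedged sketch of two more formal options in base R, `var.test()` runs an F test for equal variances, and `t.test()` without `var.equal = TRUE` runs Welch's version of the test, which does not assume equal variances. Neither call appears in the original analysis; the data are retyped from the raw table (every Duluth small order taken at $1.59), and `sd_ratio`, `f_test`, and `welch` are illustrative names.

```{r}
# Data retyped from the raw table so that this chunk runs on its own
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)
# Assumption: all five Duluth small orders cost $1.59
small_unit_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5)) / small_count
large_unit_price <- c(rep(2.39, 10), rep(2.89, 5)) / large_count

# Ratio of the larger standard deviation to the smaller one
sd_ratio <- sd(small_unit_price) / sd(large_unit_price)
sd_ratio

# F test of the null hypothesis that the two variances are equal
f_test <- var.test(small_unit_price, large_unit_price)
f_test

# Welch two sample t test, which does not assume equal variances
welch <- t.test(small_unit_price, large_unit_price)
welch
```

The F test is itself sensitive to departures from normality, so with samples this small it is best read as a rough check. If the pooled and Welch results agree, the equal variance assumption is not driving the conclusion.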


## Two Sample T Test and Conclusion
```{r}
# Run a two sample T test
t.test(small_unit_price, large_unit_price, var.equal=TRUE)
```

The two sample t test tells us that t = 4.6682 and p value = 6.868e-05. The p value is the probability of observing a difference in sample means at least as extreme as the one in my data if the null hypothesis were true (that is, if the true difference were 0), and 6.868e-05 is very small. The smaller the p value, the stronger the evidence is against the null hypothesis. Since the p value is less than the significance level of .05 (6.868e-05 < .05), the null hypothesis can be rejected, and it can be concluded that the experiment yielded statistically significant results.
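
To show where t = 4.6682 comes from, the pooled two sample t statistic can be rebuilt by hand. The chunk below is a sketch under the same equal variance assumption used above; the data are retyped from the raw table (every Duluth small order taken at $1.59), and the names `pooled_var`, `std_error`, `t_stat`, `deg_free`, and `p_value` are illustrative.

```{r}
# Data retyped from the raw table so that this chunk runs on its own
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)
# Assumption: all five Duluth small orders cost $1.59
small_unit_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5)) / small_count
large_unit_price <- c(rep(2.39, 10), rep(2.89, 5)) / large_count

n_small <- length(small_unit_price)
n_large <- length(large_unit_price)
deg_free <- n_small + n_large - 2

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_small - 1) * var(small_unit_price) +
               (n_large - 1) * var(large_unit_price)) / deg_free

# Standard error of the difference in sample means
std_error <- sqrt(pooled_var * (1 / n_small + 1 / n_large))

# Test statistic and two-sided p value from the t distribution
t_stat <- (mean(small_unit_price) - mean(large_unit_price)) / std_error
p_value <- 2 * pt(-abs(t_stat), df = deg_free)

c(t = t_stat, df = deg_free, p = p_value)
```

The p value is doubled because the alternative hypothesis is two-sided: a difference in either direction would count as evidence against the null.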

The 95% confidence interval ranges from 0.003664707 to 0.009395467, which means that we are 95% confident that the difference between the true means (small minus large) is between $0.003664707 and $0.009395467 per fry. Since the null value of 0 is not included in the interval, we can reject the null hypothesis in favor of the alternative.
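
The interval itself is the observed difference in means plus or minus a critical t value times the pooled standard error. A minimal sketch of that calculation follows, again assuming equal variances and retyping the data from the raw table (every Duluth small order taken at $1.59); `diff_means`, `t_crit`, and `ci` are illustrative names.

```{r}
# Data retyped from the raw table so that this chunk runs on its own
small_count <- c(45, 41, 40, 48, 37, 50, 39, 42, 40, 45, 41, 44, 39, 49, 42)
large_count <- c(98, 85, 77, 96, 86, 75, 83, 91, 87, 96, 80, 97, 88, 83, 93)
# Assumption: all five Duluth small orders cost $1.59
small_unit_price <- c(rep(1.59, 5), rep(1.39, 5), rep(1.59, 5)) / small_count
large_unit_price <- c(rep(2.39, 10), rep(2.89, 5)) / large_count

n_small <- length(small_unit_price)
n_large <- length(large_unit_price)
deg_free <- n_small + n_large - 2

# Pooled standard error of the difference in sample means
pooled_var <- ((n_small - 1) * var(small_unit_price) +
               (n_large - 1) * var(large_unit_price)) / deg_free
std_error <- sqrt(pooled_var * (1 / n_small + 1 / n_large))

# Observed difference (small minus large) and the 97.5th percentile of t
diff_means <- mean(small_unit_price) - mean(large_unit_price)
t_crit <- qt(0.975, df = deg_free)

# 95% confidence interval for the true difference in mean price per fry
ci <- diff_means + c(-1, 1) * t_crit * std_error
ci
```

Using the 97.5th percentile leaves 2.5% in each tail, which is what makes the interval a two-sided 95% interval.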

In rejecting the null hypothesis, we can conclude that there **IS** a difference between the true mean price per small fry and the true mean price per large fry. We are 95% confident that the mean price per fry in a small order is between $0.003664707 and $0.009395467 higher than the mean price per fry in a large order. Keep in mind that this conclusion could be a type 1 error, in which the null hypothesis is rejected when it is actually true. Assuming that no such error occurred, this experiment shows convincing evidence that large fries are indeed the better value because the price per fry for a large order is lower than the price per fry for a small order. In other words, I can get more bang for my buck if I order large fries instead of small fries, and the higher cost for a large order is justified since the price per individual fry is lower in a large order than in a small order. Knowing this information will be crucial for the next time I am in dire need of greasy potato goodness from McDonald's.




