Select a dataset that contains the following:
a) at least 10,000 rows
b) at least 12 columns
c) at least 6 quantitative variables and 2 qualitative (binary) variables

Additionally, the data must be from 2020 or later.
Notes: Qualitative binary variables have only 2 possible values, usually 0 and 1. To check whether a variable is binary, you can look at the values in a specific column or run the unique() function on that column (selected with $) in RStudio.

Provide a link to where the file was downloaded.
Write a short description (3-5 sentences) about the data.
Import your data into RStudio. Run str() to check that your quantitative variables are imported as numeric variables (NOT char).
Start by running the summary() function to find the descriptive statistics for the variables.
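The import checks above could look roughly like the sketch below. It uses a small simulated data frame in place of a real download, so every column name (price, weight, on_sale) is hypothetical; with a real file, my_data would typically come from read.csv() instead.

```r
# Simulated stand-in for an imported dataset. All column names are hypothetical.
set.seed(1)
my_data <- data.frame(
  price   = rnorm(100, mean = 50, sd = 10),    # quantitative
  weight  = runif(100, min = 1, max = 5),      # quantitative
  on_sale = rbinom(100, size = 1, prob = 0.3)  # binary (0/1)
)

# str() shows each column's type; quantitative columns should be num or int, not chr.
str(my_data)

# unique() lists the distinct values in one column; a binary variable has exactly two.
print(unique(my_data$on_sale))
is_binary <- length(unique(my_data$on_sale)) == 2

# summary() reports the minimum, quartiles, mean, and maximum of each numeric column.
print(summary(my_data))
```

If str() reports chr for a column that should be numeric, the file probably contains non-numeric text in that column, which is worth checking before any conversion.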

Evaluate the data for any errors, specifically the number of missing values, outliers, and inaccurate entries.
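One possible way to count these problems is sketched below. The data frame is made up with deliberate errors, the column names are hypothetical, and the 1.5 * IQR rule is only one common convention for flagging outliers, not necessarily the one the assignment expects.

```r
# Small made-up data frame with deliberate problems. Column names are hypothetical.
my_data <- data.frame(
  age    = c(25, 31, NA, 47, 52, 38, 29, 300, 41, 36),
  income = c(42, 55, 61, NA, 70, 48, 39, 58, -5, 63)
)

# Missing values: count NA entries in each column.
missing_counts <- colSums(is.na(my_data))
print(missing_counts)

# Outliers: count values more than 1.5 * IQR below Q1 or above Q3.
count_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}
outlier_counts <- sapply(my_data, count_outliers)
print(outlier_counts)

# Inaccurate entries: values that are impossible for the variable,
# for example a negative income.
negative_income <- sum(my_data$income < 0, na.rm = TRUE)
print(negative_income)
```

What counts as an inaccurate entry depends on the variable, so the last check has to be adapted to each column of the chosen dataset.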

Select the dependent variable.
Select 3 quantitative variables and 1 binary/qualitative variable you believe can help predict the dependent variable.
Find the correlation between each of the independent variables and the dependent variable.
Create a scatterplot between each of the independent variables and the dependent variable.
Create a correlation matrix with all 4 quantitative variables.
Create a scatterplot matrix with all 4 quantitative variables.

Model 1:
Select 2 variables and create a new variable, model1, using the lm() function.
Now, use the summary() and anova() functions for each of the three regression models. Submit the output of the lm() and anova() functions for the first model. Then answer the following questions.
What is the regression equation?
Which variables are statistically significant?
What are the R squared and Adjusted R Squared values?
What is the F Statistic and the RSE?
Based on the information in the output, is this a good model? Why or why not?

Model 2:
Create a new variable, model2, using the lm() function with all 3 quantitative variables.
Now, use the summary() and anova() functions for each of the three regression models. Submit the output of the lm() and anova() functions for the second model. Then answer the following questions.

What is the regression equation?
Which variables are statistically significant?
What are the R squared and Adjusted R Squared values?
What is the F Statistic and the RSE?
Based on the information in the output, is this a good model? Why or why not?

Model 3:
Create a new model, model3, with the 3 quantitative variables and the qualitative variable.

Now, use the summary() and anova() functions for each of the three regression models. Submit the output of the lm() and anova() functions for the third model. Then answer the following questions.
What is the regression equation?
Which variables are statistically significant?
What are the R squared and Adjusted R Squared values?
What is the F Statistic and the RSE?
Based on the information in the output, is this a good model? Why or why not?

Recall that the lm() function in R is the main function we will use to estimate a Linear Model (hence the function name lm).

The function takes the format of:
lm(dv ~ iv, data = my_data) for simple linear regression
lm(dv ~ iv1 + iv2, data = my_data) for multiple linear regression
Assign a model, then run the summary() and anova() functions to get additional information about the model.
For instance:
model <- lm(dv ~ iv, data = my_data)
summary(model)
anova(model)
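A fuller sketch of the workflow, from the correlation matrix to the model statistics the questions ask about, might look like this. The data are simulated and the names dv, iv1, iv2, iv3, and group are hypothetical placeholders, so the printed numbers say nothing about any real dataset.

```r
# Simulated data. dv, iv1, iv2, iv3, and group are hypothetical names.
set.seed(42)
n <- 200
my_data <- data.frame(
  iv1   = rnorm(n),
  iv2   = rnorm(n),
  iv3   = rnorm(n),
  group = rbinom(n, size = 1, prob = 0.5)  # binary 0/1 variable
)
my_data$dv <- 2 + 1.5 * my_data$iv1 - 0.8 * my_data$iv2 +
  0.5 * my_data$group + rnorm(n)

# Correlation matrix for the four quantitative variables.
quantitative <- c("dv", "iv1", "iv2", "iv3")
cor_matrix <- cor(my_data[, quantitative])
print(round(cor_matrix, 2))

# Scatterplot matrix (drawn only in an interactive session such as RStudio).
if (interactive()) pairs(my_data[, quantitative])

# Three quantitative predictors plus the binary one, as in Model 3.
model3 <- lm(dv ~ iv1 + iv2 + iv3 + group, data = my_data)
model_summary <- summary(model3)
print(model_summary)
print(anova(model3))

# Pull out the statistics the questions ask about.
coefficients  <- coef(model3)             # terms of the regression equation
r_squared     <- model_summary$r.squared
adj_r_squared <- model_summary$adj.r.squared
f_statistic   <- model_summary$fstatistic[["value"]]
rse           <- model_summary$sigma      # residual standard error
```

The regression equation is read off the coefficients: the intercept plus each coefficient multiplied by its variable. The p-values in the summary() coefficient table indicate which variables are statistically significant.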

For model 1:

Check the model assumptions using residual analysis. Create the residual plots and explain whether there is a violation by analyzing each plot. For instance, check for heteroskedasticity by looking at the Residuals vs Fitted Values Plot.
Checking for multicollinearity is a fairly straightforward process. What we do is check the VIF of our model after we run a regression. When a VIF is greater than 10, that is an indication that there is multicollinearity. Find the VIF for the model:
vif(model1)
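The residual plots can be produced with base R's plot() method for lm objects, as in the sketch below. The built-in mtcars data and the predictors wt and hp are arbitrary stand-ins for the assignment dataset.

```r
# Built-in mtcars data stands in for the assignment dataset.
# The choice of predictors is arbitrary.
model1 <- lm(mpg ~ wt + hp, data = mtcars)

# The raw ingredients of the residual plots.
fitted_values   <- fitted(model1)
model_residuals <- resid(model1)

# Plots are drawn only in an interactive session such as RStudio.
if (interactive()) {
  plot(model1, which = 1)  # Residuals vs Fitted: a funnel shape suggests heteroskedasticity
  plot(model1, which = 2)  # Normal Q-Q: points far from the line suggest non-normal residuals
  plot(model1, which = 3)  # Scale-Location: a trend suggests non-constant variance
}
```

In the Residuals vs Fitted plot, a roughly even band of points around zero is consistent with constant variance, while a spread that widens or narrows across the fitted values points to a violation.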
For model 2:

Check the model assumptions using residual analysis. Create the residual plots and explain whether there is a violation by analyzing each plot. For instance, check for heteroskedasticity by looking at the Residuals vs Fitted Values Plot.
Checking for multicollinearity is a fairly straightforward process. What we do is check the VIF of our model after we run a regression. When a VIF is greater than 10, that is an indication that there is multicollinearity.
vif(model2)
For model 3:

Check the model assumptions using residual analysis. Create the residual plots and explain whether there is a violation by analyzing each plot. For instance, check for heteroskedasticity by looking at the Residuals vs Fitted Values Plot.

Checking for multicollinearity is a fairly straightforward process. What we do is check the VIF of our model after we run a regression. When a VIF is greater than 10, that is an indication that there is multicollinearity.
vif(model3)
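vif() is not part of base R. It is commonly loaded from the car package, and the assignment presumably assumes that package, although it does not say so. To show what the number means, the sketch below computes the VIF by hand as 1 / (1 - R squared), where R squared comes from regressing each predictor on the others. manual_vif is a hypothetical helper name, mtcars is a stand-in dataset, and the helper assumes a model with at least two plain (untransformed) numeric predictors.

```r
# Built-in mtcars data stands in for the assignment dataset.
# am is a binary 0/1 column.
model3 <- lm(mpg ~ wt + hp + disp + am, data = mtcars)

# Hypothetical helper: VIF for predictor j is 1 / (1 - R^2_j), where R^2_j
# comes from regressing predictor j on the remaining predictors.
manual_vif <- function(model) {
  predictors <- model.frame(model)[, -1, drop = FALSE]  # drop the response column
  sapply(names(predictors), function(name) {
    others <- setdiff(names(predictors), name)
    aux <- lm(reformulate(others, response = name), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

vif_values <- manual_vif(model3)
print(round(vif_values, 2))

# Apply the rule of thumb from the text: a VIF above 10 indicates multicollinearity.
print(vif_values > 10)
```

A VIF close to 1 means a predictor shares little variance with the others. The larger the value, the more that predictor's coefficient estimate is inflated by overlap with the rest.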
