
In this problem we will once again examine a sample of 5,000 home sales and attempt to develop a model to explain the drivers for home prices. Use the Excel file Home Sales Sample.
• Eliminate the variables Record, Town, and University.
• Eliminate all rows for Multi-Family and Multiple Occupancy homes.
• Add the variable Sale_Month by applying the function month() to Sale_Date.
• Bin the variable Build_Year into deciles (10 bins of equal count).
• Convert the categorical variables Binned_Build_Year and Sale_Month to dummy variables.
• Now partition the data into Training and Validation sets using the percentage and seed defaults. Remove Sale_Month_1 and Binned_Build_Year_1 as the base case variables. Also remove Sale_Date and Type.
• Run a regression model to predict Sale_amount that includes all the potential predictors, with the Forward Selection option selected. Paste the Best Subsets Detail results into the template.
• Repeat the process using Backward Elimination. Paste the Best Subsets Detail results into the template.
• Run the best model selected by the Backward Elimination algorithm making sure to deselect Feature Selection. Paste the Coefficient Summary and Validation Score into the template. Note: please adjust column widths to make all variable names legible.
• Now go to the Forward Selection variable selection page. Pick the model with 6 coefficients (5 variables) and run it. Paste the Coefficient Summary and Validation Score into the template.
• Comment on which is the preferred model and why.
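
For readers who want to check the data-preparation steps outside Excel, they can be sketched in Python with pandas. This is only an illustration under stated assumptions: the column names follow the spelling used in the problem, the helper prepare_home_sales and the demo data are hypothetical, and the 60/40 split with seed 12345 is a placeholder for whatever "defaults" your tool uses. A random split made in pandas will not reproduce the exact rows chosen by the Excel add-in. Build_Year itself is left in place because the problem does not say to remove it.

```python
import numpy as np
import pandas as pd


def prepare_home_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the preprocessing bullets to a home-sales table (hypothetical helper)."""
    # Eliminate the variables Record, Town, and University.
    out = df.drop(columns=["Record", "Town", "University"])
    # Eliminate Multi-Family and Multiple Occupancy rows.
    out = out[~out["Type"].isin(["Multi-Family", "Multiple Occupancy"])].copy()
    # Sale_Month = month(Sale_Date).
    out["Sale_Month"] = pd.to_datetime(out["Sale_Date"]).dt.month
    # Bin Build_Year into deciles; codes are shifted to run from 1 to 10.
    out["Binned_Build_Year"] = (
        pd.qcut(out["Build_Year"], q=10, labels=False, duplicates="drop") + 1
    )
    # Convert both categorical variables to dummy variables.
    out = pd.get_dummies(out, columns=["Binned_Build_Year", "Sale_Month"], dtype=int)
    # Drop the base-case dummies, plus Sale_Date and Type.
    return out.drop(
        columns=["Binned_Build_Year_1", "Sale_Month_1", "Sale_Date", "Type"]
    )


# Deterministic stand-in data (NOT the real Home Sales Sample file).
n = 720
sqft = 800 + (np.arange(n) * 37) % 3200
demo = pd.DataFrame(
    {
        "Record": np.arange(n),
        "Town": "A",
        "University": 0,
        "Type": np.tile(["Single Family", "Multi-Family", "Multiple Occupancy"], n // 3),
        "Sale_Date": pd.date_range("2015-01-01", periods=n, freq="D"),
        "Build_Year": 1900 + np.arange(n) % 115,
        "Sqft": sqft,
        "Sale_amount": 100_000 + 150 * sqft,
    }
)

prepared = prepare_home_sales(demo)

# Assumed partition: 60% training, 40% validation, fixed seed.
train = prepared.sample(frac=0.6, random_state=12345)
valid = prepared.drop(train.index)
```

With the real workbook, the demo frame would be replaced by something like pd.read_excel on the Home Sales Sample file, and the resulting train and valid tables can be compared against the partition sheets produced in Excel.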