assignment4-1
March 28, 2024
0.1 Assignment 4 - Spark ML
0.2 Learning Outcomes
In this assignment you will:
• Use ML pipelines
• Improve a Random Forest model
• Perform hyperparameter tuning
Question 1: (5 marks) In this module, we identified a fairly significant link by leveraging the ML pipeline, a more sophisticated model, and better hyperparameter tuning. However, these results are still a bit disappointing. We are working with very few features, and we have likely made some assumptions that are not quite valid (like zip code shortening). Also, just because a rich zip code exists doesn’t mean that the farmers market would be held in that zip code too. In fact, we might want to start looking at neighboring zip codes, or use some sort of distance measure to predict whether a farmers market exists within a certain mile radius of a wealthy zip code.
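One way to picture the distance-measure idea is the sketch below. It uses the haversine formula in plain Python and assumes each zip code and each market can be reduced to a (latitude, longitude) pair in degrees. The function names, the Earth-radius constant, and the sample coordinates are illustrative assumptions, not part of the course notebook.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumed constant)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def has_market_within(zip_centroid, market_coords, radius_miles):
    """True if any market lies within radius_miles of the zip code centroid."""
    lat, lon = zip_centroid
    return any(
        haversine_miles(lat, lon, mlat, mlon) <= radius_miles
        for mlat, mlon in market_coords
    )


# Hypothetical coordinates, for illustration only.
centroid = (40.0, -75.0)
markets = [(40.1, -75.1), (42.0, -71.0)]
label = has_market_within(centroid, markets, radius_miles=15)
```

A boolean like `label` could replace the "market in the same zip code" target, or the distance to the nearest market could be added as a numeric feature.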
That said, there are plenty of other potential features and other random forest parameters to tune, so play around with the above pipeline and see if you can improve it further! Note: adding a feature for the distance measure is just an example and not a mandatory change to improve the model’s performance. We also aren’t concerned about whether the model’s performance actually improves! We simply want to see that changes have been made to the code for possible improvements.
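To get a feel for how quickly a tuning search grows, the sketch below enumerates a hyperparameter grid in plain Python. In Spark ML, `ParamGridBuilder` builds the same kind of Cartesian product and `CrossValidator` fits a model for each combination on each fold. `numTrees`, `maxDepth`, and `maxBins` are real random forest parameters in Spark ML; the candidate values and the `build_param_grid` helper are assumptions chosen for illustration.

```python
from itertools import product


def build_param_grid(param_values):
    """Return one dict per combination of the candidate values (Cartesian product)."""
    names = list(param_values)
    combos = product(*(param_values[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Illustrative candidate values, not recommendations.
grid = build_param_grid({
    "numTrees": [20, 50, 100],
    "maxDepth": [5, 10],
    "maxBins": [32, 64],
})

for params in grid:
    print(params)
```

This grid has 3 × 2 × 2 = 12 combinations, so k-fold cross-validation would fit 12 × k models. Adding one more parameter with a few candidate values multiplies that cost again, which is worth keeping in mind before widening the search.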
Learn more about the Farmers Markets dataset here: https://catalog.data.gov/dataset/farmersmarkets-directory-and-geographic-data
You may use the same classifier we built in the notebook in this module.
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/59159900904
Question 2: (7 marks) Using the Apache Spark ML pipeline, build a model to predict the price of a diamond based on the available features.
Refer to the following notebook for details about the dataset: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/59159900904
Note:
- Please submit the published notebook link in a Word/PDF document. Do not submit HTML, IPython notebook, or archive (DBC) formats.
- If you receive an R_Squared value that is negative, that is okay. This may occur due to the low sample size of the data.
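A small worked example may help explain why a negative R_Squared is possible. The sketch below assumes the usual definition, R² = 1 - SS_res / SS_tot, and uses made-up numbers rather than the diamonds data. R² turns negative whenever the model's squared error is larger than that of always predicting the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # mean-baseline error
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0]
close_predictions = [1.1, 1.9, 3.2]     # small errors, R² near 1
mean_predictions = [2.0, 2.0, 2.0]      # always predict the mean, R² of 0
reversed_predictions = [3.0, 2.0, 1.0]  # worse than the mean, R² below 0

print(r_squared(actual, close_predictions))
print(r_squared(actual, mean_predictions))
print(r_squared(actual, reversed_predictions))
```

With only a handful of test rows, a few poor predictions can dominate SS_res, which is consistent with the note that a small sample can produce a negative value.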
