
IE6600: Computation and Data Visualisation
Homework 2
Prof. Mohammad Dehghani

Assignment Guidelines
1. Students need to complete the assignment individually.
2. All assignments must be completed in RStudio.
3. Comment your script using ‘#’ so that it is easy to follow.
4. The code should follow the tidyverse style guide (the tidyverse style guide sets standards for naming objects, indentation, and breaking long lines of code, among other things).
5. If you take help from any external sources, cite them in the references. Violating academic integrity policies may result in zero credit for the work.
6. The assignment report needs to include the following sections:
• Problem statement: A brief summary of your understanding of the assignment questions (maximum 3 lines)
• Result: Your findings after writing the code and running it in R. This section may include:
– Graphs / charts / plots
– Final data frame for your result
– Results obtained
• Conclusion: The statistical inferences and observations drawn from the results obtained.
– Students are not required to include code in the report.
7. Submit either a *.rmd file that includes your code and can be knit into a PDF (recommended), or a *.zip file containing the following items:
i. R script (a single file containing all your code)
ii. HW report: a maximum of 10 pages, including all appendices, tables, and graphs, if any.
8. Label all of the above files as: ‘HW # - IE 6600 – Sec # - Student Name’
9. Submit your HW deliverables via CANVAS.
Direct-to-consumer marketing is an effective strategy for distributing agricultural and farm products to consumers. Farmers markets form an important link between farmers and consumers and help foster farmer-consumer relationships. The United States Department of Agriculture (USDA) has recognized the importance of farmers markets, and through its many programs it has supported their growth across the country. As of this writing, 8,791 farmers markets are listed in USDA’s National Farmers Market Directory. The data is stored in fm.csv.
The data file contains the following details:
1. Variables indicating the geographical location of the farmers market (lat, long, street, county, state etc.)
2. Variables indicating types of products (herbs, vegetables, seafood etc.)
3. Variables indicating type of payment accepted (cash, WIC, SNAP, SFMNP etc.)
4. Variables indicating online social media presence
5. Variables indicating date and time
The file contains the directory of farmers markets across the US. Answer the following questions using the dataset fm.csv.
Task 1
Write code to compute the number of farmers markets in each city in the state of Florida, and arrange the cities in ascending order of the number of farmers markets. Omit NA values.
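One possible dplyr sketch for Task 1 is shown below. It assumes the city and state columns in fm.csv are named `city` and `State`, so check `names()` on the real file. It runs on a small made-up tibble that stands in for `readr::read_csv("fm.csv")`.

```r
# Sketch for Task 1; the column names `city` and `State` are assumptions.
library(dplyr)

# Toy stand-in for readr::read_csv("fm.csv"); replace with the real data.
fm <- tibble(
  city  = c("Miami", "Miami", "Tampa", NA, "Orlando", "Tampa", "Albany"),
  State = c("Florida", "Florida", "Florida", "Florida", "Florida",
            "Florida", "New York")
)

fl_by_city <- fm %>%
  filter(State == "Florida", !is.na(city)) %>% # keep Florida, drop NA cities
  count(city, name = "n_markets") %>%          # number of markets per city
  arrange(n_markets)                           # ascending by number of markets

print(fl_by_city)
```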
Task 2
Write code to compute the number of farmers markets in each state and display the top ten states.
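A minimal sketch for Task 2 follows, again assuming a `State` column. It uses a made-up tibble with twelve states, so the cut to ten rows is visible.

```r
# Sketch for Task 2; the column name `State` is an assumption.
library(dplyr)

# Toy data: "State 1" appears 12 times, "State 2" 11 times, ..., "State 12" once.
fm <- tibble(State = rep(paste("State", 1:12), times = 12:1))

top_states <- fm %>%
  count(State, name = "n_markets", sort = TRUE) %>% # counts, largest first
  slice_head(n = 10)                                # keep the top ten rows

print(top_states)
```

`slice_head(n = 10)` always returns exactly ten rows. If several states tie for tenth place and all of them should be shown, `slice_max(n_markets, n = 10)` keeps ties by default.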
Task 3
Filter the data to the state of New York and generate the following table using a pivot function. The first column should contain the payment system, the second column should list the type of product, and the third column should give the number of farmers markets offering that payment service. For the payment system, consider the columns “Credit”, “WIC”, “WICcash”, and “SNAP” from the original farmers market data.
Sample output:
The table below is only a reference for what the output should look like. Students need to generate the entire long-form table.
Payment System   Product      #Farmers Market
Credit           Organic      2162
Credit           Bakedgoods   4366
Credit           Cheese       2605
Credit           Crafts       3086
Credit           Flowers      3397
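The sketch below shows one reading of Task 3. After filtering to New York, it pivots the four payment columns and the product columns into long form. It then counts, for each payment/product pair, the markets that both accept the payment and sell the product. It assumes the flags are stored as "Y"/"N" strings. The made-up data has only two product columns, and the real table would include every product column.

```r
# Sketch for Task 3; the "Y"/"N" encoding and the column names are assumptions.
library(dplyr)
library(tidyr)

# Toy stand-in for the farmers market data.
fm <- tibble(
  State      = c("New York", "New York", "New York", "Ohio"),
  Credit     = c("Y", "Y", "N", "Y"),
  WIC        = c("N", "Y", "N", "Y"),
  WICcash    = c("N", "N", "N", "Y"),
  SNAP       = c("Y", "Y", "Y", "Y"),
  Organic    = c("Y", "N", "Y", "Y"),
  Bakedgoods = c("Y", "Y", "N", "Y")
)

payments <- c("Credit", "WIC", "WICcash", "SNAP")
products <- c("Organic", "Bakedgoods") # extend with the other product columns

ny_table <- fm %>%
  filter(State == "New York") %>%
  # one row per market x payment system
  pivot_longer(all_of(payments), names_to = "payment_system",
               values_to = "accepts") %>%
  # one row per market x payment system x product
  pivot_longer(all_of(products), names_to = "product",
               values_to = "offers") %>%
  group_by(payment_system, product) %>%
  # markets that accept the payment AND sell the product (zeros are kept)
  summarise(n_markets = sum(accepts == "Y" & offers == "Y", na.rm = TRUE),
            .groups = "drop")

print(ny_table)
```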
Task 4
Create two new columns and add them to the farmers market data frame. The first column should be named “Startdate” and the second “Enddate”. Most entries in the Season1Date column have the form “05/05/2015 to 10/27/2015”. Split each Season1Date entry and assign the first date to Startdate and the second date to Enddate.
Sample output:
The table below is only a reference for what the output should look like. Students need to generate the entire long-form table.
Season1Date                 Startdate    Enddate
05/05/2015 to 10/27/2015    05/05/2015   10/27/2015
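`tidyr::separate()` can split Season1Date on the literal " to " separator. In this sketch, `remove = FALSE` keeps the original column. `fill = "right"` leaves Enddate as NA for entries that contain only one date. The toy values are made up.

```r
# Sketch for Task 4 on made-up Season1Date values.
library(dplyr)
library(tidyr)

fm <- tibble(
  Season1Date = c("05/05/2015 to 10/27/2015", "06/01/2016 to 09/30/2016",
                  NA, "07/04/2015")
)

fm <- fm %>%
  separate(Season1Date,
           into   = c("Startdate", "Enddate"),
           sep    = " to ",  # split on the literal " to "
           remove = FALSE,   # keep Season1Date alongside the new columns
           fill   = "right") # single-date entries get NA in Enddate

print(fm)
```

In tidyr 1.3 and later, `separate()` is superseded by `separate_wider_delim()`, which does the same job.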
Task 5
Using the NY collision data in nycollision.csv, compute and tabulate the following for each borough:
• Number of pedestrians injured in each borough, with all stats (total, min, max, mean, median, mode, quartiles). All the stats have to be calculated in a single line of code. (10 points)
• List the number of accidents by type of vehicle involved in each borough (5 points)
• List the factors responsible for the accidents in each borough in descending order (5 points)
• List the number of accidents for each hour of the day (5 points)
• Give the number of accidents by month and year (5 points)
• For Queens, list the number of persons injured and killed, pedestrians injured and killed, cyclists injured and killed, and motorists injured and killed in long form with the columns (Borough, type of outcome, i.e., injured/killed, number). Do not include rows with empty values.
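The sketch below covers several Task 5 bullets using a toy tibble. All column names (`borough`, `date`, `time`, `persons_injured`, and so on) are hypothetical stand-ins for the headers in nycollision.csv. The "H:MM" time format and the "MM/DD/YYYY" date format are also assumptions. Base R has no function for the statistical mode, so the sketch defines a small helper. The Queens part reads "number" as the total for each outcome type; the cyclist and motorist columns would join the pivot in the same way.

```r
# Sketch for Task 5; every column name and format here is an assumption.
library(dplyr)
library(tidyr)

nyc <- tibble(
  borough             = c("QUEENS", "QUEENS", "BROOKLYN", "BROOKLYN",
                          "QUEENS", NA),
  date                = c("01/15/2019", "01/20/2019", "02/03/2019",
                          "02/03/2019", "03/10/2020", "03/11/2020"),
  time                = c("8:05", "14:30", "8:45", "23:10", "14:00", "0:15"),
  persons_injured     = c(1, 2, 1, 4, 1, 0),
  persons_killed      = c(0, 0, 0, 1, 0, 0),
  pedestrians_injured = c(0, 2, 1, 3, 1, 0),
  pedestrians_killed  = c(0, 0, 0, 1, NA, 0)
)

# Statistical mode; returns the first mode when values tie.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Pedestrians injured per borough: all stats in a single pipeline.
ped_stats <- nyc %>%
  filter(!is.na(borough)) %>%
  group_by(borough) %>%
  summarise(
    total      = sum(pedestrians_injured, na.rm = TRUE),
    min_val    = min(pedestrians_injured, na.rm = TRUE),
    q1         = unname(quantile(pedestrians_injured, 0.25, na.rm = TRUE)),
    median_val = median(pedestrians_injured, na.rm = TRUE),
    mean_val   = mean(pedestrians_injured, na.rm = TRUE),
    q3         = unname(quantile(pedestrians_injured, 0.75, na.rm = TRUE)),
    max_val    = max(pedestrians_injured, na.rm = TRUE),
    mode_val   = stat_mode(pedestrians_injured),
    .groups    = "drop"
  )

# Accidents by hour of day, taking the hour from an "H:MM" string.
hourly <- nyc %>%
  mutate(hour = as.integer(sub(":.*", "", time))) %>%
  count(hour)

# Accidents by year and month, parsing "MM/DD/YYYY" dates.
monthly <- nyc %>%
  mutate(
    crash_date = as.Date(date, format = "%m/%d/%Y"),
    year       = as.integer(format(crash_date, "%Y")),
    month      = as.integer(format(crash_date, "%m"))
  ) %>%
  count(year, month)

# Queens outcomes in long form; NA values are dropped during the pivot.
queens_long <- nyc %>%
  filter(borough == "QUEENS") %>%
  pivot_longer(
    cols           = persons_injured:pedestrians_killed,
    names_to       = "outcome",
    values_to      = "number",
    values_drop_na = TRUE
  ) %>%
  group_by(borough, outcome) %>%
  summarise(number = sum(number), .groups = "drop")
```

The vehicle-type and contributing-factor bullets follow the same `count()` pattern: for example, `count(borough, factor_col, sort = TRUE)` on the relevant (hypothetically named) column.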