
Please, I really need the solution urgently, sir. I would be grateful if I could get the solution.
1. Hand et al.’s A Handbook of Small Data Sets provides a data set of summary information on working hours, prices and salaries for 48 major world cities. The data were gathered by the Economic Research Department of the Union Bank of Switzerland, Zurich, in 1991. Hand et al. describe the variables as follows: Working hours = weighted average of working hours across 12 different occupations. Price level = cost of a basket of 112 goods and services, weighted by consumer habits, excluding rent (Zurich = 100). Salary level = level of hourly earnings in 12 different occupations, weighted according to occupational distributions, net after deducting tax and social insurance contributions (Zurich = 100). For brevity, the variables “Salary level” and “Price level” have been named “Salary” and “Price”, respectively, in the CSV file provided (“Hand_415 Comparisons between 48 major cities in 1991”).
(a) Obtain the correlation matrix for the three quantitative variables. If you try to obtain this on the original data frame you import, you will encounter two problems. Two cities have missing data, and the first column (the City names) is not a quantitative variable. You can clean up your data frame either within R or outside of R, but you should be able to do this within R. (Hints on how to do this will be given in class.)
(b) Based on the sample correlation coefficients obtained in (a), which pair of variables is most strongly correlated? Which pair of variables is most weakly correlated? (Be careful here: two pairs are almost tied.)
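Strength of a linear association is judged by the magnitude |r|, so a large negative r still counts as a strong correlation. A small Python sketch that sorts the distinct pairs by |r|; the labels X, Y, Z and the r values in the usage lines are made-up placeholders, not results from the cities data.

```python
from itertools import combinations


def rank_pairs_by_strength(corr, columns):
    """Sort distinct variable pairs from strongest to weakest linear association.

    `corr` maps (a, b) -> r; strength is |r|, so the sign is ignored when ranking.
    """
    pairs = [(a, b, corr[(a, b)]) for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Placeholder correlations for illustration only (not the cities results).
example = {("X", "Y"): -0.80, ("X", "Z"): 0.31, ("Y", "Z"): -0.30}
ranking = rank_pairs_by_strength(example, ["X", "Y", "Z"])
strongest, weakest = ranking[0], ranking[-1]
```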
(c) Obtain a plot of Salary level (on the Y-axis) versus Working Hours, with the least squares regression line superimposed.
(d) Test for a linear association between Salary level and Working Hours at the 5% level of significance, assuming the usual SLR (simple linear regression) model. Be sure to state your null and alternative hypotheses, the evidence you use to make a decision, and your conclusion.
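For reference, the slope test that R's `summary(lm(...))` reports is t = b1 / se(b1) on n − 2 degrees of freedom, where b1 is the estimated slope and se(b1) its standard error. The Python sketch below recomputes it from scratch with only the standard library; the two-sided p-value uses the identity P(|T| ≥ t) = I_{ν/(ν+t²)}(ν/2, 1/2), with the regularized incomplete beta function evaluated by the standard continued-fraction (modified Lentz) method. It is an illustration of the calculation, not a replacement for R's output.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def slope_t_test(x, y):
    """SLR test of H0: beta1 = 0 against Ha: beta1 != 0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_b1 = math.sqrt(sse / df) / math.sqrt(sxx)
    t = b1 / se_b1
    return {"slope": b1, "se": se_b1, "t": t, "df": df, "p": t_two_sided_p(t, df)}
```

Reject H0 at the 5% level when the returned p is below 0.05.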
2. In this exercise you will analyze a subset of a “green brain” data set I compiled in Fall 2019. The data set will be explained further in class, but for now it is sufficient to state that the data set you will work with here consists of wet weight and dry weight measurements (in grams) for 60 “green brains” (fruits of the osage orange tree). Measuring a wet weight is easy, whereas measuring a dry weight requires a drying oven and considerable time, so initially we will treat wet weight as the predictor variable and dry weight as the response variable.
(a) Plot the scattergram of dry weight on wet weight, and superimpose the estimated simple linear regression equation. (This will get you started but you won’t hand this in.)
(b) Specify the simple linear regression model (model equation plus model assumptions) that you will base subsequent statistical inference on.
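One standard way to write the model, with Y_i the dry weight and x_i the wet weight (in grams) of the i-th green brain and n = 60; the symbols are conventional choices, not notation fixed by the assignment:

```latex
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, 60,
\qquad \varepsilon_1, \dots, \varepsilon_{60} \overset{\text{iid}}{\sim} N(0, \sigma^2).
```

In words: a linear mean function, errors that are independent, normally distributed, and of constant variance σ², with the x_i treated as fixed.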
(c) Obtain 95% (individual) confidence intervals for the intercept and slope parameters.
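For reference, the intervals that R's `confint()` reports for an `lm` fit follow the usual t-based SLR formulas below, stated here with n = 60 and therefore n − 2 = 58 error degrees of freedom:

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{S_{xx}}, \qquad
b_0 = \bar{Y} - b_1 \bar{x},

s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 x_i)^2, \qquad
\operatorname{se}(b_1) = \frac{s}{\sqrt{S_{xx}}}, \qquad
\operatorname{se}(b_0) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}},

b_j \pm t_{0.975,\, n-2} \, \operatorname{se}(b_j), \qquad j = 0, 1.
```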
(d) Test the null hypotheses that each of the intercept and slope parameters is equal to 0; use a two-sided alternative hypothesis for each test. These tests are conducted on your R output; you simply need to identify the appropriate test statistic values and interpret the associated p-values. Use a 5% level of significance.
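The "t value" and "Pr(>|t|)" columns of R's `summary()` output correspond, for each coefficient b_j with estimated standard error se(b_j), to the following test on n − 2 = 58 degrees of freedom:

```latex
H_0: \beta_j = 0 \quad \text{vs.} \quad H_a: \beta_j \neq 0, \qquad
t_j = \frac{b_j}{\operatorname{se}(b_j)} \sim t_{n-2} \text{ under } H_0,

p_j = 2\, P\!\left(T_{n-2} \ge |t_j|\right), \qquad
\text{reject } H_0 \text{ at the 5\% level if } p_j < 0.05.
```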
(e) In many situations where a simple linear regression analysis is conducted, the test that the slope parameter is equal to 0 is of much greater interest than the test that the intercept is equal to 0. For our data set, it can be argued that the reverse is true; that is, testing whether the intercept parameter is equal to 0 is of much greater interest than testing whether the slope parameter is equal to 0. Briefly summarize the key points of this argument.
(f) Briefly explain (i) why a one-sided test for the slope parameter would have been more sensible than the two-sided test conducted by R (more sensible does not mean it was really that sensible), and (ii) why the 95% confidence interval for the slope parameter provides more useful information than the corresponding hypothesis test.
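A one-sided p-value can be recovered from the two-sided one that R prints. Assuming the alternative H_a: β_1 > 0 and writing t_1 for the observed slope t statistic on 58 degrees of freedom:

```latex
p_{\text{one}} = P(T_{58} \ge t_1) =
\begin{cases}
p_{\text{two}} / 2, & t_1 > 0, \\
1 - p_{\text{two}} / 2, & t_1 \le 0.
\end{cases}
```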
(g) In certain cases an argument can be made for forcing the fitted regression equation to pass through the origin. A good discussion on this topic, and how to do so in R, is available at: https://rpubs.com/aaronsc32/regression-through-the-origin
What evidence do you have that fitting a regression equation through the origin might be a good thing to do with the “green brain” data? Obtain this regression equation for our data.
(i) Superimpose the regression equation of (g) on your graph in (a). Label your graph properly and include a legend for the two equations. Hand this graph in. [This is most easily submitted as a separate PDF file.]
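Fitting through the origin changes both the estimate and the error degrees of freedom: the least squares slope becomes b = Σ x_i Y_i / Σ x_i², and the residual variance is estimated on n − 1 degrees of freedom because only one parameter is fitted. A minimal standard-library Python sketch of those standard results; in R the same model is `lm(dry ~ 0 + wet)` (equivalently `dry ~ wet - 1`), where `dry` and `wet` are hypothetical column names. The helper `fitted_through_origin` is likewise a hypothetical name for producing points along the second line.

```python
import math


def fit_through_origin(x, y):
    """Least squares fit of y = b * x with no intercept.

    Returns (b, se_b, s): slope, its standard error, and the residual standard
    error, which uses n - 1 degrees of freedom because only b is estimated.
    """
    n = len(x)
    sxx0 = sum(xi * xi for xi in x)  # uncentered sum of squares of x
    sxy0 = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy0 / sxx0
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 1))
    return b, s / math.sqrt(sxx0), s


def fitted_through_origin(x, y, new_x):
    """Predicted values b * x at new_x, e.g. to draw the second line on the plot from (a)."""
    b, _, _ = fit_through_origin(x, y)
    return [b * xi for xi in new_x]
```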
(j) In the future we will look at a formal test for comparing the two fitted models. At this point, though, we ask: what evidence do you have that the SLR equation that includes an estimated Y-intercept is not much better than the SLR equation that is forced through the origin? [Note: the multiple R² is not a useful measure here!]
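On the R² caveat: when a model has no intercept, R's `summary.lm` computes R² from the uncentered total sum of squares Σ Y_i² rather than Σ (Y_i − Ȳ)², so the two R² values are not on the same scale. One quantity that is on a common scale is each model's residual standard error (in grams of dry weight). A minimal Python sketch that computes both, assuming x and y are the wet and dry weights as plain lists:

```python
import math


def residual_standard_errors(x, y):
    """Residual standard error of the intercept model (n - 2 df) and the origin model (n - 1 df)."""
    n = len(x)
    # Ordinary SLR with an estimated intercept.
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse_full = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    # Regression forced through the origin.
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse_origin = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

    return math.sqrt(sse_full / (n - 2)), math.sqrt(sse_origin / (n - 1))
```

The origin model's SSE can never be smaller than the full model's, since it is the special case β_0 = 0; nearly equal residual standard errors suggest that dropping the intercept costs little in fit.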
3. For this question, you are to find two examples of regression graphs in journals of your choice. One graph should show a fitted simple linear regression equation; the other should show some form of fitted regression equation that is not a straight line. Save the graphs in some way, along with the associated figure captions; for each, also record the full reference information, including the article title. How do the authors use the graphs to support their conclusions? For the graph of the regression equation that is not a straight line, was the model fit using some form of pseudolinear model, or was it a true nonlinear model?