Q1 Kernel Function

15 Points

(i) Explain the intuitive concept of a kernel function, as you understand it, in the context of SVM learning.

(ii) Given two data points xi and xj, and a kernel function K, what is K(xi,xj) intended to represent?

(iii) For the RBF kernel, increasing the hyperparameter gamma makes the class boundaries tighter. By looking at the algebraic expression of the RBF kernel, explain why.
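As a starting point for part (iii), the short sketch below evaluates the commonly used RBF form K(xi, xj) = exp(-gamma * ||xi - xj||^2) for one pair of points at several gamma values. The function name `rbf_kernel`, the two points, and the gamma values are assumptions chosen for illustration; they are not part of the question.

```python
import math

def rbf_kernel(x_i, x_j, gamma):
    """Commonly used RBF form: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

# Hypothetical points, chosen only for illustration.
x_i = (0.0, 0.0)
x_j = (1.0, 1.0)

# Same pair of points, increasing gamma: the kernel value shrinks.
values = {gamma: rbf_kernel(x_i, x_j, gamma) for gamma in (0.1, 1.0, 10.0)}
for gamma, k in values.items():
    print(f"gamma={gamma}: K={k:.6f}")
```

Because gamma multiplies the squared distance inside the exponent, a larger gamma makes the kernel value fall off faster as two points move apart, which is the behaviour the question asks you to relate to the class boundaries.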

Q2 Information Gain

15 Points

The data above is given to you: the first four columns represent features, and the last column (F) represents the class each data point belongs to. For example, the first row represents the feature vector (0, 0, 0, 0), and that data point belongs to class 0.

Now suppose that you have to fit a decision tree model to this training data, so you must decide which of the features (A, B, C, or D) is the best one to split on. Using information gain based on entropy, determine which of these features gives the best split.

[Hint: If you split on A, the left subtree has 6 data points of class 1 and the right subtree has 4 data points of class 1. If you split on B, you have 5 data points of class 1 in the left subtree and 5 in the right subtree, and so on. Use entropy to determine which feature gives the split with maximum information gain.]
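The calculation the hint describes can be sketched as follows. The class counts below are hypothetical and are not taken from the question's data table; the function names `entropy` and `information_gain` are likewise illustrative. Substitute the real per-subtree counts for each of A, B, C, and D to compare them.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of class counts, e.g. [n_class0, n_class1]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# HYPOTHETICAL counts as [class 0, class 1]; not the exam's data.
parent = [10, 10]
split_x = [[8, 2], [2, 8]]  # children are fairly pure
split_y = [[5, 5], [5, 5]]  # children are as mixed as the parent

gain_x = information_gain(parent, split_x)
gain_y = information_gain(parent, split_y)
print(f"gain_x={gain_x:.4f}, gain_y={gain_y:.4f}")
```

In this made-up case the split whose children are purer has the larger gain, while a split that leaves both children as mixed as the parent has zero gain.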

Q3 Random Forest

20 Points

(i) Describe, in your own words, how repeated construction of decision trees leads to the random forest model. Outline the steps of training a random forest model from training data.

(ii) If you decrease the size of the bootstrap samples, what do you expect in terms of the model's performance?
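The training loop asked about in (i) can be sketched in a few lines. This is a deliberately simplified illustration, not a full random forest: each "tree" is a one-level stump on a single randomly chosen binary feature, whereas a real random forest grows deeper trees and samples a random subset of features at every split. The toy data set and all function names are assumptions made for the example; `sample_size` is the bootstrap sample size referred to in (ii).

```python
import random
from collections import Counter

def bootstrap_sample(rows, size, rng):
    """Draw `size` rows with replacement from the training data."""
    return [rng.choice(rows) for _ in range(size)]

def fit_stump(rows, feature):
    """One-level tree on a binary feature: majority label for each feature value."""
    overall = Counter(label for _, label in rows).most_common(1)[0][0]
    leaf = {}
    for value in (0, 1):
        labels = [label for x, label in rows if x[feature] == value]
        # Fall back to the overall majority if the sample has no such rows.
        leaf[value] = Counter(labels).most_common(1)[0][0] if labels else overall
    return feature, leaf

def fit_forest(rows, n_trees, sample_size, rng):
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, sample_size, rng)  # 1. bootstrap the data
        feature = rng.randrange(n_features)                # 2. randomize the features
        forest.append(fit_stump(sample, feature))          # 3. fit one tree
    return forest

def predict(forest, x):
    """4. Aggregate the trees by majority vote."""
    votes = Counter(leaf[x[feature]] for feature, leaf in forest)
    return votes.most_common(1)[0][0]

# HYPOTHETICAL toy data: (features, label) pairs with two binary features.
rows = [((0, 0), 0), ((1, 1), 1)] * 10
rng = random.Random(0)
forest = fit_forest(rows, n_trees=15, sample_size=len(rows), rng=rng)
print(predict(forest, (0, 0)), predict(forest, (1, 1)))
```

To explore (ii), rerun `fit_forest` with a smaller `sample_size` on a realistic data set and compare held-out accuracy; this toy data is too easy to show a difference.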

