Q1 Kernel Function
(i) Explain your intuitive understanding of a kernel function in the context of SVM learning.
(ii) Given two data points xi and xj and a kernel function K, what is K(xi, xj) intended to represent?
(iii) For the RBF kernel, increasing the hyperparameter gamma makes the class boundaries tighter.
By examining the algebraic expression of the RBF kernel, explain why this happens.
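To see the effect numerically, here is a minimal sketch of the RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) in plain Python; the point values and gamma settings are illustrative, not part of the question.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (0.0, 0.0), (1.0, 0.0)  # two points at distance 1

# A small gamma keeps similarity high at distance 1; a large gamma drives
# it toward zero, so each training point influences only a tight
# neighbourhood and the decision boundary hugs the data more closely.
print(rbf_kernel(x, z, gamma=0.1))   # ~0.905
print(rbf_kernel(x, z, gamma=10.0))  # ~4.5e-05
print(rbf_kernel(x, x, gamma=10.0))  # 1.0: every point is maximally similar to itself
```

Because the squared distance is multiplied by gamma inside the exponential, a larger gamma shrinks the radius within which K is non-negligible.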
Q2 Information Gain
The above is the data given to you, where the first four columns represent features and the last column (F) represents the class the data belongs to. For example, the first row represents the feature vector 0,0,0,0, and the class this data point belongs to is class 0.
Now suppose you have to fit a decision tree to this training data, and must decide which of the features (A, B, C, or D) is best to split the data on. Using information gain based on entropy, determine which of the features A, B, C, D gives the best split.
[Hint: If you split on A, you get a left subtree with 6 data points of class 1 and a right subtree with 4 data points of class 1. If you split on B, you get 5 data points of class 1 in the left subtree and 5 in the right subtree, etc. Use entropy to determine which feature yields the split with maximum information gain.]
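The entropy and information-gain computations the hint refers to can be sketched as follows. Since the data table is not reproduced here, the toy labels and feature columns below are hypothetical, chosen only to show the calculation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the children
    obtained by splitting on the given feature column."""
    n = len(labels)
    children = {}
    for value, label in zip(feature_values, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(child) / n * entropy(child) for child in children.values())
    return entropy(labels) - weighted

# Hypothetical toy data (NOT the table from the question): feature a
# separates the classes perfectly, feature b carries no information.
y = [0, 0, 1, 1]
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
print(information_gain(y, a))  # 1.0 bit: both children are pure
print(information_gain(y, b))  # 0.0 bits: children as mixed as the parent
```

Applying `information_gain` to each of the four feature columns of the actual table and picking the maximum answers the question.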
Q3 Random Forest
(i) Describe, in your own words, how the repeated construction of decision trees leads to the random forest model. Outline the steps of training a random forest model from training data.
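The training steps in question (bootstrap the rows, restrict each tree to a random feature subset, aggregate by majority vote) can be sketched with a deliberately tiny base learner. A one-level "stump" stands in for a full decision tree, and all names and parameters here are illustrative, not from the question.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw n rows with replacement from an n-row dataset."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_stump(X, y, feature_idx):
    """Toy base learner: map each value of one feature to its majority class."""
    by_value = {}
    for x, label in zip(X, y):
        by_value.setdefault(x[feature_idx], []).append(label)
    rules = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
    return rules, feature_idx

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)       # step 1: bootstrap the rows
        f = rng.randrange(n_features)              # step 2: random feature choice
        forest.append(train_stump(Xb, yb, f))      # step 3: fit a tree on the sample
    return forest

def predict(forest, x, default=0):
    """Step 4: aggregate the trees' predictions by majority vote."""
    votes = [rules.get(x[f], default) for rules, f in forest]
    return Counter(votes).most_common(1)[0][0]
```

A real random forest grows full (or depth-limited) trees and samples a feature subset at every split, but the bootstrap-then-vote structure is the same.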
(ii) If you decrease the size of the bootstrap samples, what do you expect in terms of the model's performance?
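One way to build intuition for this question: measure how much of the training set a bootstrap sample actually covers as the sample size shrinks. The simulation below is a sketch with illustrative parameter values; the ~63% figure for a full-size bootstrap is the classic 1 - 1/e coverage.

```python
import random

def mean_unique_fraction(n, sample_size, trials=2000, seed=0):
    """Average fraction of the n distinct training points that appear in a
    bootstrap sample of the given size (drawn with replacement)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        seen = {rng.randrange(n) for _ in range(sample_size)}
        total += len(seen) / n
    return total / trials

# A full-size bootstrap covers about 1 - 1/e ~ 63% of the training set;
# halving the sample size drops coverage to roughly 39%. Each tree then
# sees less data (higher bias per tree) but the trees become more
# diverse, which changes the bias-variance trade-off of the ensemble.
print(mean_unique_fraction(100, 100))  # ~0.63
print(mean_unique_fraction(100, 50))   # ~0.39
```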