Homework 7 Solution


All questions have multiple-choice answers ([a], [b], [c], …). You can collaborate with others, but do not discuss the selected or excluded choices in the answers. You can consult books and notes, but not other people’s solutions. Your solutions should be based on your own work. Definitions and notation follow the lectures.

Note about the homework

The goal of the homework is to facilitate a deeper understanding of the course material. The questions are not designed to be puzzles with catchy answers. They are meant to make you roll up your sleeves, face uncertainties, and approach the problem from different angles.

The problems range from easy to difficult, and from practical to theoretical. Some problems require running a full experiment to arrive at the answer.

The answer may not be obvious or numerically close to one of the choices, but one (and only one) choice will be correct if you follow the instructions precisely in each problem. You are encouraged to explore the problem further by experimenting with variations on these instructions, for the learning benefit.

You are also encouraged to take part in the forum http://book.caltech.edu/bookforum

where there are many threads about each homework set. We hope that you will contribute to the discussion as well. Please follow the forum guidelines for posting answers (see the “BEFORE posting answers” announcement at the top there).

© 2012-2015 Yaser Abu-Mostafa. All rights reserved. No redistribution in any format. No translation or derivative products without written permission.

Validation

In the following problems, use the data provided in the files in.dta and out.dta for Homework #6. We are going to apply linear regression with a nonlinear transformation for classification (without regularization). The nonlinear transformation is given by φ0 through φ7, which transform (x1, x2) into

1,   x1,   x2,   x1^2,   x2^2,   x1 x2,   |x1 − x2|,   |x1 + x2|

To illustrate how taking out points for validation affects the performance, we will consider the hypotheses trained on Dtrain (without restoring the full D for training after validation is done).
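As a concrete starting point for Problems 1-4, here is a minimal sketch of the experiment. It assumes in.dta and out.dta sit in the working directory and that each row is x1, x2, y; the helper names and output format are illustrative, not prescribed by the problems.

```python
import numpy as np

def transform(X):
    """Apply the full nonlinear transform phi_0 ... phi_7 to the columns (x1, x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2,
                            x1 * x2, np.abs(x1 - x2), np.abs(x1 + x2)])

def class_error(w, Z, y):
    """Fraction of points misclassified by sign(Z w)."""
    return np.mean(np.sign(Z @ w) != y)

# assumed file layout: each row is "x1 x2 y"
data_in, data_out = np.loadtxt("in.dta"), np.loadtxt("out.dta")

# Problem 1 split: first 25 examples for training, last 10 for validation
Z_train, y_train = transform(data_in[:25, :2]), data_in[:25, 2]
Z_val,   y_val   = transform(data_in[25:, :2]), data_in[25:, 2]
Z_out,   y_out   = transform(data_out[:, :2]),  data_out[:, 2]

for k in range(3, 8):
    cols = slice(0, k + 1)                            # use only phi_0 ... phi_k
    w = np.linalg.pinv(Z_train[:, cols]) @ y_train    # linear regression, no regularization
    e_val = class_error(w, Z_val[:, cols], y_val)
    e_out = class_error(w, Z_out[:, cols], y_out)
    print(f"k={k}: validation error = {e_val:.3f}, out-of-sample error = {e_out:.3f}")
```

Reversing the roles of the two splits (Problems 3 and 4) only changes which slices of data_in are used for Z_train and Z_val.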

1. Split in.dta into training (first 25 examples) and validation (last 10 examples). Train on the 25 examples only, using the validation set of 10 examples to select between five models that apply linear regression to φ0 through φk, with k = 3, 4, 5, 6, 7. For which model is the classification error on the validation set smallest?

  [a] k = 3

  [b] k = 4

  [c] k = 5

  [d] k = 6

  [e] k = 7

2. Evaluate the out-of-sample classification error using out.dta on the 5 models to see how well the validation set predicted the best of the 5 models. For which model is the out-of-sample classification error smallest?

  [a] k = 3

  [b] k = 4

  [c] k = 5

  [d] k = 6

  [e] k = 7

3. Reverse the role of training and validation sets; now training with the last 10 examples and validating with the first 25 examples. For which model is the classification error on the validation set smallest?

  [a] k = 3

  [b] k = 4

  [c] k = 5

  [d] k = 6

  [e] k = 7

4. Once again, evaluate the out-of-sample classification error using out.dta on the 5 models to see how well the validation set predicted the best of the 5 models. For which model is the out-of-sample classification error smallest?

  [a] k = 3

  [b] k = 4

  [c] k = 5

  [d] k = 6

  [e] k = 7

5. What values are closest in Euclidean distance to the out-of-sample classification error obtained for the model chosen in Problems 1 and 3, respectively?

  [a] 0.0, 0.1

  [b] 0.1, 0.2

  [c] 0.1, 0.3

  [d] 0.2, 0.2

  [e] 0.2, 0.3

Validation Bias

6. Let e1 and e2 be independent random variables, distributed uniformly over the interval [0, 1]. Let e = min(e1, e2). The expected values of e1, e2, e are closest to

  [a] 0.5, 0.5, 0

  [b] 0.5, 0.5, 0.1

  [c] 0.5, 0.5, 0.25

  [d] 0.5, 0.5, 0.4

  [e] 0.5, 0.5, 0.5
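If you prefer to sanity-check the expectations empirically before reasoning them out, a quick Monte Carlo sketch looks like the following (the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
e1 = rng.uniform(0.0, 1.0, n)
e2 = rng.uniform(0.0, 1.0, n)
e = np.minimum(e1, e2)

# Sample means approximate E[e1], E[e2], and E[min(e1, e2)].
# Analytically, P[min(e1, e2) > t] = (1 - t)^2 for t in [0, 1], which pins down E[e] exactly.
print(e1.mean(), e2.mean(), e.mean())
```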

Cross Validation

7. You are given the data points (x, y): (−1, 0), (ρ, 1), (1, 0), where ρ ≥ 0, and a choice between two models: constant { h0(x) = b } and linear { h1(x) = ax + b }. For which value of ρ would the two models be tied using leave-one-out cross-validation with the squared error measure?

  [a] √(√3 + 4)

  [b] √(√3 − 1)

  [c] √(9 + 4√6)

  [d] √(9 − √6)

  [e] None of the above
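One way to check the candidate values of ρ numerically is to compute the leave-one-out squared error of both models directly. The sketch below uses least-squares polynomial fits (degree 0 for the constant model h0, degree 1 for the linear model h1), which matches the two hypothesis sets above; the candidate list simply mirrors the answer choices.

```python
import numpy as np

def loocv_error(points, degree):
    """Leave-one-out cross-validation squared error for a least-squares polynomial
    fit of the given degree (0 = constant model h0, 1 = linear model h1)."""
    errors = []
    for i, (x_out, y_out) in enumerate(points):
        rest = [p for j, p in enumerate(points) if j != i]
        xs, ys = np.array([p[0] for p in rest]), np.array([p[1] for p in rest])
        coeffs = np.polyfit(xs, ys, degree)        # fit on the two remaining points
        errors.append((np.polyval(coeffs, x_out) - y_out) ** 2)
    return float(np.mean(errors))

# candidate values of rho taken from the answer choices above
candidates = [np.sqrt(np.sqrt(3) + 4), np.sqrt(np.sqrt(3) - 1),
              np.sqrt(9 + 4 * np.sqrt(6)), np.sqrt(9 - np.sqrt(6))]
for rho in candidates:
    pts = [(-1.0, 0.0), (rho, 1.0), (1.0, 0.0)]
    print(f"rho = {rho:.4f}: E_cv(h0) = {loocv_error(pts, 0):.4f}, "
          f"E_cv(h1) = {loocv_error(pts, 1):.4f}")
```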

PLA vs. SVM

Notice: Quadratic Programming packages sometimes need tweaking and have numerical issues, and this is characteristic of packages you will use in practical ML situations. Your understanding of support vectors will help you get to the correct answers.

In the following problems, we compare PLA to SVM with hard margin (for hard margin in SVM packages, set C → ∞) on linearly separable data sets. For each run, you will create your own target function f and data set D. Take d = 2 and choose a random line in the plane as your target function f (do this by taking two random, uniformly distributed points on [−1, 1] × [−1, 1] and taking the line passing through them), where one side of the line maps to +1 and the other maps to −1. Choose the inputs xn of the data set as random points in X = [−1, 1] × [−1, 1], and evaluate the target function on each xn to get the corresponding output yn. If all data points are on one side of the line, discard the run and start a new run.

Start PLA with the all-zero vector and pick the misclassified point for each PLA iteration at random. Run PLA to find the final hypothesis gPLA and measure the disagreement between f and gPLA as P[f(x) ≠ gPLA(x)] (you can either calculate this exactly, or approximate it by generating a sufficiently large, separate set of points to evaluate it). Now, run SVM on the same data to find the final hypothesis gSVM by solving

    min_{w,b}   (1/2) w^T w

    s.t.        yn (w^T xn + b) ≥ 1,   n = 1, …, N

using quadratic programming on the primal or the dual problem. Measure the disagreement between f and gSVM as P[f(x) ≠ gSVM(x)], and count the number of support vectors you get in each run.
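Below is a minimal end-to-end sketch of one way to set up Problems 8-10, under a few assumptions not fixed by the problem statement: scikit-learn is available, the hard-margin SVM is approximated by a linear-kernel SVC with a very large C, and P[f(x) ≠ g(x)] is estimated on a fresh random test set rather than computed exactly. The function names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def generate_run(N, rng):
    """Random target line f through two uniform points in [-1, 1] x [-1, 1], plus N labeled
    points. Returns target weights w_f = (w0, w1, w2), with label sign(w0 + w1*x1 + w2*x2),
    the inputs X, and the labels y. Redraws if all points land on one side of the line."""
    while True:
        p, q = rng.uniform(-1, 1, size=(2, 2))
        w_f = np.array([p[1] * q[0] - p[0] * q[1],  # line through p and q, written as
                        q[1] - p[1],                #   w0 + w1*x1 + w2*x2 = 0
                        p[0] - q[0]])
        X = rng.uniform(-1, 1, size=(N, 2))
        y = np.sign(np.c_[np.ones(N), X] @ w_f)
        if np.abs(y.sum()) < N:                     # both classes present
            return w_f, X, y

def pla(X, y, rng):
    """Perceptron learning algorithm, started from the all-zero vector,
    picking a misclassified point uniformly at random at each iteration."""
    Z = np.c_[np.ones(len(X)), X]
    w = np.zeros(Z.shape[1])
    while True:
        mis = np.where(np.sign(Z @ w) != y)[0]
        if mis.size == 0:
            return w
        i = rng.choice(mis)
        w = w + y[i] * Z[i]

def disagreement(w_f, g_labels, X_test):
    """Estimated P[f(x) != g(x)] over the given test points."""
    f_labels = np.sign(np.c_[np.ones(len(X_test)), X_test] @ w_f)
    return np.mean(g_labels != f_labels)

def experiment(N, runs=1000, n_test=10_000, seed=0):
    rng = np.random.default_rng(seed)
    svm_wins, n_support = 0, []
    for _ in range(runs):
        w_f, X, y = generate_run(N, rng)
        X_test = rng.uniform(-1, 1, size=(n_test, 2))

        w_pla = pla(X, y, rng)
        e_pla = disagreement(w_f, np.sign(np.c_[np.ones(n_test), X_test] @ w_pla), X_test)

        svm = SVC(C=1e6, kernel="linear").fit(X, y)   # large C approximates the hard margin
        e_svm = disagreement(w_f, svm.predict(X_test), X_test)

        svm_wins += e_svm < e_pla
        n_support.append(svm.support_.size)
    print(f"N={N}: SVM beats PLA in {100.0 * svm_wins / runs:.1f}% of runs; "
          f"average number of support vectors = {np.mean(n_support):.2f}")

experiment(N=10)    # Problem 8
experiment(N=100)   # Problems 9 and 10
```

Running 1000 fits with C = 1e6 can be slow or numerically touchy for N = 100; that is exactly the kind of package issue the notice above warns about, and solving the dual with a dedicated QP solver is a reasonable alternative.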

8. For N = 10, repeat the above experiment for 1000 runs. How often is gSVM better than gPLA in approximating f? The percentage of time is closest to:

  [a] 20%

  [b] 40%

  [c] 60%

  [d] 80%

  [e] 100%

9. For N = 100, repeat the above experiment for 1000 runs. How often is gSVM better than gPLA in approximating f? The percentage of time is closest to:

  [a] 10%

  [b] 30%

  [c] 50%

  [d] 70%

  [e] 90%

10. For the case N = 100, which of the following is the closest to the average number of support vectors of gSVM (averaged over the 1000 runs)?

  [a] 2

  [b] 3

  [c] 5

  [d] 10

  [e] 20
