Homework 5 Solution


All questions have multiple-choice answers ([a], [b], [c], …). You can collaborate with others, but do not discuss the selected or excluded choices in the answers. You can consult books and notes, but not other people's solutions. Your solutions should be based on your own work. Definitions and notation follow the lectures.

Note about the homework

The goal of the homework is to facilitate a deeper understanding of the course material. The questions are not designed to be puzzles with catchy answers. They are meant to make you roll up your sleeves, face uncertainties, and approach the problem from different angles.

The problems range from easy to difficult, and from practical to theoretical. Some problems require running a full experiment to arrive at the answer.

The answer may not be obvious or numerically close to one of the choices, but one (and only one) choice will be correct if you follow the instructions precisely in each problem. You are encouraged to explore the problem further by experimenting with variations on these instructions, for the learning benefit.

You are also encouraged to take part in the forum http://book.caltech.edu/bookforum

where there are many threads about each homework set. We hope that you will contribute to the discussion as well. Please follow the forum guidelines for posting answers (see the "BEFORE posting answers" announcement at the top there).

© 2012-2015 Yaser Abu-Mostafa. All rights reserved. No redistribution in any format. No translation or derivative products without written permission.

Linear Regression Error

Consider a noisy target y = w^T x + ε, where x ∈ R^d (with the added coordinate x0 = 1), y ∈ R, w is an unknown vector, and ε is a noise term with zero mean and variance σ^2. Assume ε is independent of x and of all other ε's. If linear regression is carried out using a training data set D = {(x1, y1), …, (xN, yN)}, and outputs the parameter vector w_lin, it can be shown that the expected in-sample error E_in with respect to D is given by:

E_D[E_in(w_lin)] = σ^2 ( 1 − (d + 1)/N )

1. For σ = 0.1 and d = 8, which among the following choices is the smallest number of examples N that will result in an expected E_in greater than 0.008?

[a] 10

[b] 25

[c] 100

[d] 500

[e] 1000
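As a quick numerical check, here is a minimal Python sketch (nothing assumed beyond the expression above) that plugs σ = 0.1 and d = 8 into the expected in-sample error for each candidate N:

```python
# Expected in-sample error of linear regression on a noisy target:
# E_D[E_in(w_lin)] = sigma^2 * (1 - (d + 1) / N)

sigma = 0.1
d = 8

for N in [10, 25, 100, 500, 1000]:
    expected_ein = sigma**2 * (1 - (d + 1) / N)
    print(f"N = {N:5d}  ->  expected E_in = {expected_ein:.4f}")
```

Comparing each printed value against the 0.008 threshold singles out the smallest qualifying N among the choices.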

Nonlinear Transforms

In linear classification, consider the feature transform Φ: R^2 → R^2 (plus the added zeroth coordinate) given by:

Φ(1, x1, x2) = (1, x1^2, x2^2)

2. Which of the following sets of constraints on the weights in the Z space could correspond to the hyperbolic decision boundary in X depicted in the figure?

[Figure: hyperbolic decision boundary in the (x1, x2) plane, with regions labeled +1 and −1.]

You may assume that w̃0 can be selected to achieve the desired boundary.

[a] w̃1 = 0, w̃2 > 0

[b] w̃1 > 0, w̃2 = 0

[c] w̃1 > 0, w̃2 > 0

[d] w̃1 < 0, w̃2 > 0

[e] w̃1 > 0, w̃2 < 0
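To build intuition for Problem 2, a minimal sketch that evaluates the hypothesis sign(w̃0 + w̃1 x1^2 + w̃2 x2^2) on a grid; the weight values here are arbitrary placeholders, and trying different sign patterns for (w̃1, w̃2) shows what shape of boundary each pattern produces:

```python
import numpy as np

def classify(x1, x2, w0, w1, w2):
    """Hypothesis in the X space after the transform (1, x1^2, x2^2)."""
    return np.sign(w0 + w1 * x1**2 + w2 * x2**2)

# Arbitrary placeholder weights -- swap in different sign patterns for w1 and w2
# and inspect which combinations give a hyperbolic +1/-1 boundary.
xs = np.linspace(-2, 2, 9)
grid = np.array([[classify(x1, x2, w0=-1.2, w1=1.0, w2=1.0) for x1 in xs] for x2 in xs])
print(grid)
```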

Now, consider the 4th order polynomial transform from the input space R^2:

Φ4: x → (1, x1, x2, x1^2, x1 x2, x2^2, x1^3, x1^2 x2, x1 x2^2, x2^3, x1^4, x1^3 x2, x1^2 x2^2, x1 x2^3, x2^4)
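As a quick check on the dimensionality of this Z space, a minimal sketch that counts the monomials x1^i x2^j with i + j ≤ 4, i.e., the components of Φ4 listed above:

```python
# Count the monomials x1^i * x2^j of total degree at most 4
# (these are exactly the components of Phi_4, including the constant 1).
terms = [(i, j) for i in range(5) for j in range(5) if i + j <= 4]
print(len(terms))  # number of free parameters of a linear model in the Z space
```

Recall from the lectures that the VC dimension of a linear model is at most its number of free parameters, which is what Problem 3 below asks you to compare against the choices.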

3. What is the smallest value among the following choices that is not smaller than the VC dimension of a linear model in this transformed space?

[a] 3

[b] 5

[c] 15

[d] 20

[e] 21

Gradient Descent

Consider the nonlinear error surface E(u, v) = (u e^v − 2v e^(−u))^2. We start at the point (u, v) = (1, 1) and minimize this error using gradient descent in the uv space. Use η = 0.1 (learning rate, not step size).
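A minimal sketch of the gradient descent loop described above; it uses a numerical (central-difference) gradient so that the analytic partial derivatives, which Problems 4 through 6 revolve around, are left to the reader:

```python
import math

def E(u, v):
    """Error surface E(u, v) = (u*e^v - 2*v*e^(-u))^2."""
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

def num_grad(f, u, v, h=1e-7):
    """Central-difference approximation of (dE/du, dE/dv)."""
    du = (f(u + h, v) - f(u - h, v)) / (2 * h)
    dv = (f(u, v + h) - f(u, v - h)) / (2 * h)
    return du, dv

eta = 0.1
u, v = 1.0, 1.0
for t in range(1, 51):
    du, dv = num_grad(E, u, v)
    u, v = u - eta * du, v - eta * dv   # simultaneous update of both coordinates
    if E(u, v) < 1e-14:
        print(f"E dropped below 1e-14 after {t} iterations, (u, v) = ({u:.3f}, {v:.3f})")
        break
```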

4. What is the partial derivative of E(u, v) with respect to u, i.e., ∂E/∂u?

[a] (u e^v − 2v e^(−u))^2

[b] 2(u e^v − 2v e^(−u))

[c] 2(e^v + 2v e^(−u))

[d] 2(e^v − 2v e^(−u))(u e^v − 2v e^(−u))

[e] 2(e^v + 2v e^(−u))(u e^v − 2v e^(−u))

5. How many iterations (among the given choices) does it take for the error E(u, v) to fall below 10^(−14) for the first time? In your programs, make sure to use double precision to get the needed accuracy.

[a] 1

[b] 3

[c] 5

[d] 10

[e] 17

6. After running enough iterations such that the error has just dropped below 10^(−14), what are the closest values (in Euclidean distance) among the following choices to the final (u, v) you got in Problem 5?

[a] (1.000, 1.000)

[b] (0.713, 0.045)

[c] (0.016, 0.112)

[d] (−0.083, 0.029)

[e] (0.045, 0.024)

7. Now, we will compare the performance of "coordinate descent." In each iteration, we have two steps along the 2 coordinates. Step 1 is to move only along the u coordinate to reduce the error (assume a first-order approximation holds, as in gradient descent), and step 2 is to reevaluate and move only along the v coordinate to reduce the error (again, assume a first-order approximation holds). Use the same learning rate of η = 0.1 as we did in gradient descent. What will the error E(u, v) be closest to after 15 full iterations (30 steps)?
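A minimal sketch of the coordinate descent variant described in Problem 7, again with finite-difference partial derivatives in place of the analytic ones:

```python
import math

def E(u, v):
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

def dE_du(u, v, h=1e-7):
    return (E(u + h, v) - E(u - h, v)) / (2 * h)

def dE_dv(u, v, h=1e-7):
    return (E(u, v + h) - E(u, v - h)) / (2 * h)

eta = 0.1
u, v = 1.0, 1.0
for _ in range(15):              # 15 full iterations = 30 single-coordinate steps
    u = u - eta * dE_du(u, v)    # step 1: move along the u coordinate only
    v = v - eta * dE_dv(u, v)    # step 2: re-evaluate, then move along v only
print(f"E(u, v) after 15 full iterations: {E(u, v):.3e}")
```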

Logistic Regression

In this problem you will create your own target function f and data set D to see how Logistic Regression works. Take d = 2 so you can visualize the problem, and let X = [−1, 1] × [−1, 1] with uniform probability of picking each x ∈ X. Choose a line in the plane as the boundary between the two classes by taking two random, uniformly distributed points from X and taking the line passing through them as the boundary between y = ±1. Pick N = 100 training points at random from X, and evaluate the outputs yn for each of these points xn.

Run Logistic Regression with Stochastic Gradient Descent to find g, and estimate E_out (the cross entropy error) by generating a sufficiently large, separate set of points to evaluate the error. Repeat the experiment for 100 runs with different targets and take the average. Initialize the weight vector of Logistic Regression to all zeros in each run. Stop the algorithm when ‖w(t−1) − w(t)‖ < 0.01, where w(t) denotes the weight vector at the end of epoch t. An epoch is a full pass through the N data points (use a random permutation of 1, 2, …, N to present the data points to the algorithm within each epoch, and use different permutations for different epochs). Use a learning rate of 0.01.
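A minimal sketch of one run of this experiment. The helper functions and their names are my own; the target is a random line through two uniform points in [−1, 1] × [−1, 1], and the run follows the zero initialization, per-epoch permutations, stopping rule, and learning rate stated above:

```python
import numpy as np

rng = np.random.default_rng()

def random_target():
    """A random line through two uniform points in [-1, 1]^2, used as the class boundary."""
    p, q = rng.uniform(-1, 1, size=(2, 2))
    a, b = q[1] - p[1], p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    return lambda pts: np.sign(a * pts[:, 0] + b * pts[:, 1] + c)

def generate(N, f):
    pts = rng.uniform(-1, 1, size=(N, 2))
    X = np.hstack([np.ones((N, 1)), pts])     # prepend the x0 = 1 coordinate
    return X, f(pts)

def cross_entropy(w, X, y):
    return np.mean(np.log(1 + np.exp(-y * (X @ w))))

def sgd_logistic(X, y, eta=0.01, tol=0.01):
    N = len(y)
    w = np.zeros(X.shape[1])
    epochs = 0
    while True:
        w_prev = w.copy()
        for n in rng.permutation(N):           # fresh permutation each epoch
            grad = -y[n] * X[n] / (1 + np.exp(y[n] * (w @ X[n])))
            w = w - eta * grad
        epochs += 1
        if np.linalg.norm(w_prev - w) < tol:
            return w, epochs

# One run; the homework asks you to average E_out and the epoch count over 100 runs.
f = random_target()
X_train, y_train = generate(100, f)
w, epochs = sgd_logistic(X_train, y_train)
X_test, y_test = generate(10000, f)
print("epochs:", epochs, "  E_out:", cross_entropy(w, X_test, y_test))
```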

8. Which of the following is closest to E_out for N = 100?

[a] 0.025

[b] 0.050

[c] 0.075

[d] 0.100

[e] 0.125

9. How many epochs does it take on average for Logistic Regression to converge for N = 100 using the above initialization and termination rules and the specified learning rate? Pick the value that is closest to your results.

[a] 350

[b] 550

[c] 750

[d] 950

[e] 1750

PLA as SGD

10. The Perceptron Learning Algorithm can be implemented as SGD using which of the following error functions en(w)? Ignore the points w at which en(w) is not twice differentiable.

[a] en(w) = e^(−yn w^T xn)

[b] en(w) = −yn w^T xn

[c] en(w) = (yn − w^T xn)^2

[d] en(w) = ln(1 + e^(−yn w^T xn))

[e] en(w) = −min(0, yn w^T xn)
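For Problem 10, it may help to see a single SGD step next to a single PLA update; this sketch uses a hypothetical placeholder gradient (not one of the choices) just to make the comparison concrete, and the exercise is to substitute the gradient of each candidate en(w):

```python
import numpy as np

def pla_update(w, x, y):
    """One PLA update: add y*x only when (x, y) is misclassified by w."""
    return w + y * x if np.sign(w @ x) != y else w

def sgd_update(w, x, y, grad_en, eta=1.0):
    """One SGD step on a single example: w <- w - eta * gradient of e_n(w)."""
    return w - eta * grad_en(w, x, y)

# Hypothetical placeholder gradient; replace with the gradient of a candidate e_n(w)
# from choices [a]-[e] and compare the resulting step with the PLA update.
grad_placeholder = lambda w, x, y: np.zeros_like(w)

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, -0.7])
y = 1.0
print("PLA update:", pla_update(w, x, y))
print("SGD update:", sgd_update(w, x, y, grad_placeholder))
```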
