Assignment #2 – Solution


Problem 1 (20 Points)

This problem investigates how changing the error measure can change the result of the learning process.

You have $N$ data points $y_1, \ldots, y_N$ and wish to estimate a 'representative' value.

(a) [10 pts] If your algorithm is to find the hypothesis $h$ that minimizes the in-sample sum of squared deviations,

$$E_{\text{in}}(h) = \sum_{n=1}^{N} (h - y_n)^2,$$

then show that your estimate will be the in-sample mean,

$$h_{\text{mean}} = \frac{1}{N} \sum_{n=1}^{N} y_n.$$

(b) [10 pts] If your algorithm is to find the hypothesis $h$ that minimizes the in-sample sum of absolute deviations,

$$E_{\text{in}}(h) = \sum_{n=1}^{N} |h - y_n|,$$

then show that your estimate will be the in-sample median $h_{\text{med}}$, which is any value for which half the data points are at most $h_{\text{med}}$ and half the data points are at least $h_{\text{med}}$.
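As a quick numerical sanity check of both parts (not a substitute for the proofs), the sketch below brute-forces each in-sample error over a grid of candidate values $h$ and compares the minimizers with the sample mean and median. The data, grid resolution, and random seed are made-up choices for illustration.

```python
import numpy as np

# Made-up sample; any real-valued data works. Odd N gives a unique median.
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=101)

# Evaluate both in-sample errors over a fine grid of candidate values h.
h = np.linspace(y.min(), y.max(), 20001)
squared = ((h[:, None] - y[None, :]) ** 2).sum(axis=1)   # sum_n (h - y_n)^2
absolute = np.abs(h[:, None] - y[None, :]).sum(axis=1)   # sum_n |h - y_n|

print("argmin of squared error :", h[squared.argmin()], "| mean  :", y.mean())
print("argmin of absolute error:", h[absolute.argmin()], "| median:", np.median(y))
```

Up to the grid resolution, the first minimizer lands on the sample mean and the second on the sample median.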

Problem 2 (20 Points)

Consider

$$e_n(\mathbf{w}) = \max\left(0,\; 1 - y_n \mathbf{w}^\top \mathbf{x}_n\right).$$

(a) [5 pts] Show that $e_n(\mathbf{w})$ is continuous and differentiable except when $y_n = \mathbf{w}^\top \mathbf{x}_n$.

(b) [5 pts] Show that $e_n(\mathbf{w})$ is an upper bound for the "0-1 loss" $[\![\operatorname{sign}(\mathbf{w}^\top \mathbf{x}_n) \neq y_n]\!]$. Thus, $\frac{1}{N} \sum_{n=1}^{N} e_n(\mathbf{w})$ is an upper bound for the in-sample classification error $E_{\text{in}}(\mathbf{w})$.

(c) [10 pts] Apply stochastic gradient descent on $\frac{1}{N} \sum_{n=1}^{N} e_n(\mathbf{w})$ (ignoring the singular case of $\mathbf{w}^\top \mathbf{x}_n = y_n$) and derive a new perceptron learning algorithm.

Note: $e_n(\mathbf{w})$ corresponds to the "hinge loss" used for maximum-margin classification, most notably for support vector machines (SVMs).
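For part (c), one way to see where the derivation leads: away from the singular case, the gradient of $e_n(\mathbf{w})$ is $-y_n \mathbf{x}_n$ when $y_n \mathbf{w}^\top \mathbf{x}_n < 1$ and $\mathbf{0}$ otherwise, giving a perceptron-style update that also fires on correctly classified points whose margin is below 1. The sketch below is illustrative only; the data, learning rate, and epoch count are made-up assumptions, not part of the problem.

```python
import numpy as np

def hinge_sgd(X, y, eta=0.1, epochs=100, seed=0):
    """SGD on (1/N) * sum_n max(0, 1 - y_n w^T x_n).

    Update rule: if y_n w^T x_n < 1, then w <- w + eta * y_n * x_n
    (the hinge term's gradient is -y_n x_n in that region, 0 otherwise).
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for n in rng.permutation(N):       # one randomized pass per epoch
            if y[n] * (w @ X[n]) < 1:      # margin violation (or misclassification)
                w += eta * y[n] * X[n]
    return w

# Illustrative linearly separable data; first coordinate x_0 = 1 absorbs the bias.
X = np.array([[1.0, 2.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.5, 0.5], [1.0, -2.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w = hinge_sgd(X, y)
print("w =", w, " margins:", y * (X @ w))
```

Compared with the classic perceptron, the only change is the update condition: margin below 1 rather than an outright sign error.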


Problem 3 (20 Points)

There are a number of bounds on the generalization error $\epsilon$, all holding with probability at least $1 - \delta$.

(a) Original VC-bound:

$$\epsilon \le \sqrt{\frac{8}{N} \ln \frac{4\, m_{\mathcal{H}}(2N)}{\delta}}$$

(b) Rademacher penalty bound:

$$\epsilon \le \sqrt{\frac{2 \ln\left(2 N\, m_{\mathcal{H}}(N)\right)}{N}} + \sqrt{\frac{2}{N} \ln \frac{1}{\delta}} + \frac{1}{N}$$

(c) Parrondo and van den Broek:

$$\epsilon \le \sqrt{\frac{1}{N} \left(2\epsilon + \ln \frac{6\, m_{\mathcal{H}}(2N)}{\delta}\right)}$$

(d) Devroye:

$$\epsilon \le \sqrt{\frac{1}{2N} \left(4\epsilon(1 + \epsilon) + \ln \frac{4\, m_{\mathcal{H}}(N^2)}{\delta}\right)}$$

Note that (c) and (d) are implicit bounds in $\epsilon$. Fix $d_{\text{VC}} = 50$ and $\delta = 0.05$. Plot these bounds as a function of $N$. Which is the best?
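A sketch of one way to produce the requested plot. It assumes the usual polynomial bound on the growth function, $m_{\mathcal{H}}(N) \le N^{d_{\text{VC}}}$ (applied in log form to avoid overflow), and resolves the implicit bounds (c) and (d) by fixed-point iteration; the range of $N$ and the iteration count are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

dvc, delta = 50, 0.05

def ln_mH(n):
    # Assumed growth-function bound: m_H(n) <= n^dvc, so ln m_H(n) <= dvc * ln(n).
    return dvc * np.log(n)

N = np.arange(100, 10001, 50, dtype=float)

# (a) Original VC bound.
vc = np.sqrt(8 / N * (np.log(4 / delta) + ln_mH(2 * N)))

# (b) Rademacher penalty bound.
rad = (np.sqrt(2 * (np.log(2 * N) + ln_mH(N)) / N)
       + np.sqrt(2 / N * np.log(1 / delta)) + 1 / N)

# (c) and (d) are implicit in eps: iterate eps <- rhs(eps) to a fixed point.
def fixed_point(rhs, iters=1000):
    eps = np.ones_like(N)
    for _ in range(iters):
        eps = rhs(eps)
    return eps

parrondo = fixed_point(lambda e: np.sqrt((2 * e + np.log(6 / delta) + ln_mH(2 * N)) / N))
devroye = fixed_point(lambda e: np.sqrt((4 * e * (1 + e) + np.log(4 / delta) + ln_mH(N ** 2)) / (2 * N)))

for eps, label in [(vc, "VC"), (rad, "Rademacher"),
                   (parrondo, "Parrondo & van den Broek"), (devroye, "Devroye")]:
    plt.plot(N, eps, label=label)
plt.xlabel("N"); plt.ylabel("bound on generalization error"); plt.legend(); plt.show()
```

Fixed-point iteration works here because the right-hand sides of (c) and (d) vary slowly in $\epsilon$ over this range of $N$, so the iteration settles quickly.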

Problem 4 (20 Points)

The bias-variance decomposition of out-of-sample error is based on squared error measures. Recall that the out-of-sample error is given by

$$E_{\text{out}}\left(g^{(\mathcal{D})}\right) = \mathbb{E}_{\mathbf{x}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right] \qquad (1)$$

where $\mathbb{E}_{\mathbf{x}}$ denotes the expected value with respect to the input $\mathbf{x}$, and the notation $g^{(\mathcal{D})}$ makes explicit the dependence of $g$ on the data set $\mathcal{D}$. From (1) we can remove the dependence on $\mathcal{D}$ by taking the average:

$$\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}\left(g^{(\mathcal{D})}\right)\right] = \mathbb{E}_{\mathcal{D}}\left[\mathbb{E}_{\mathbf{x}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]\right] = \mathbb{E}_{\mathbf{x}}\left[\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]\right].$$

(a) [5 pts] To evaluate $\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]$, we define the 'average' hypothesis $\bar{g}(\mathbf{x})$ as

$$\bar{g}(\mathbf{x}) \triangleq \mathbb{E}_{\mathcal{D}}\left[g^{(\mathcal{D})}(\mathbf{x})\right]. \qquad (2)$$

Now imagine that we have $K$ data sets $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_K$; what will be the average hypothesis $\bar{g}(\mathbf{x})$ estimated using these data sets?

(b) [10 pts] Let us define

$$\operatorname{var}(\mathbf{x}) \triangleq \mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x})\right)^2\right] \quad \text{and} \quad \operatorname{bias}(\mathbf{x}) \triangleq \left(\bar{g}(\mathbf{x}) - f(\mathbf{x})\right)^2.$$

Show that

$$\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right] = \operatorname{var}(\mathbf{x}) + \operatorname{bias}(\mathbf{x}).$$

(c) [5 pts] Show that

$$\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}\left(g^{(\mathcal{D})}\right)\right] = \operatorname{bias} + \operatorname{var},$$

where $\operatorname{bias} \triangleq \mathbb{E}_{\mathbf{x}}[\operatorname{bias}(\mathbf{x})]$ and $\operatorname{var} \triangleq \mathbb{E}_{\mathbf{x}}[\operatorname{var}(\mathbf{x})]$.
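As a numerical illustration of (a)-(c) (not required by the problem), the sketch below draws $K$ data sets, fits a trivial constant hypothesis to each, and checks that the averaged squared error equals $\operatorname{bias}(\mathbf{x}) + \operatorname{var}(\mathbf{x})$ pointwise. The target $f(x) = \sin(\pi x)$, the constant-hypothesis model, and all sizes are made-up choices.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)             # illustrative target function

K, n_pts = 2000, 2                          # K data sets of 2 points each
xs = np.linspace(-1, 1, 1001)               # grid standing in for E_x

# Fit a constant hypothesis h(x) = b to each data set (b = mean of its y's).
X = rng.uniform(-1, 1, size=(K, n_pts))
b = f(X).mean(axis=1)                       # shape (K,): one g^(D_k) per data set

G = np.broadcast_to(b[:, None], (K, xs.size))   # g^(D_k)(x), constant in x
g_bar = G.mean(axis=0)                      # empirical average hypothesis, part (a)

var_x = ((G - g_bar) ** 2).mean(axis=0)     # var(x)
bias_x = (g_bar - f(xs)) ** 2               # bias(x)
lhs = ((G - f(xs)) ** 2).mean(axis=0)       # E_D[(g^(D)(x) - f(x))^2]

print("max |lhs - (bias + var)| over x:", np.abs(lhs - (bias_x + var_x)).max())
print("bias =", bias_x.mean(), " var =", var_x.mean(), " E_D[E_out] =", lhs.mean())
```

The reported maximum discrepancy is at floating-point precision, since the decomposition in (b) holds exactly for the empirical averages used here.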

Problem 5 (20 Points)

In support vector machines, the hyperplane $h$ separates the data if and only if it can be represented by weights $(b, \mathbf{w})$ that satisfy

$$\min_{n=1,\ldots,N} y_n\left(\mathbf{w}^\top \mathbf{x}_n + b\right) = 1. \qquad (3)$$

Consider the data below and a 'hyperplane' $(b, \mathbf{w})$ that separates the data:

$$X = \begin{bmatrix} 0 & 0 \\ 2 & 2 \\ 2 & 0 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} -1 \\ -1 \\ +1 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 1.2 \\ -3.2 \end{bmatrix}, \qquad b = -0.5.$$

(a) [10 pts] Compute

$$\rho = \min_{n=1,\ldots,N} y_n\left(\mathbf{w}^\top \mathbf{x}_n + b\right).$$

(b) [5 pts] Compute the weights $\frac{1}{\rho}(b, \mathbf{w})$ and show that they satisfy Eq. (3).

(c) [5 pts] Plot both hyperplanes to show that they are the same separator.

Note: This problem gives a concrete example of re-normalizing the weights to show that condition (3) and the following condition are equivalent, as covered in class:

$$y_n\left(\mathbf{w}^\top \mathbf{x}_n + b\right) > 0, \quad n = 1, \ldots, N. \qquad (4)$$