CSE 353
Homework I
Due by Mar 29
Theory

I: A "warm up" problem (5 points)

Consider instances of $\mathcal{X}$ drawn from the uniform distribution $\mathcal{D}$ on $[-1, 1]$. Let $f$ denote the actual labeling function mapping each instance to its label $y \in \{-1, +1\}$ with probabilities

$$\Pr(y = +1 \mid x > 0) = 0.9$$

$$\Pr(y = -1 \mid x > 0) = 0.1$$

$$\Pr(y = +1 \mid x \le 0) = 0.1$$

$$\Pr(y = -1 \mid x \le 0) = 0.9$$

The hypothesis $h$ predicts the label for each instance as defined below:

$$h(x) = \begin{cases} +1 & \text{if } x > 0 \\ -1 & \text{otherwise} \end{cases}$$

Measure the success of this predictor by calculating the training error for $h$.
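Though the problem asks for an exact calculation, a quick Monte Carlo check is easy to write. The following is a minimal sketch; the function names and sample size are arbitrary choices, not part of the assignment:

import random

def sample_example():
    """Draw (x, y) from the problem's distribution: x ~ Uniform[-1, 1],
    with Pr(y = +1 | x > 0) = 0.9 and Pr(y = +1 | x <= 0) = 0.1."""
    x = random.uniform(-1.0, 1.0)
    p_pos = 0.9 if x > 0 else 0.1
    y = 1 if random.random() < p_pos else -1
    return x, y

def h(x):
    """The hypothesis from the problem: predict +1 iff x > 0."""
    return 1 if x > 0 else -1

n = 100_000  # arbitrary sample size, chosen only for a stable estimate
errors = sum(1 for x, y in (sample_example() for _ in range(n)) if h(x) != y)
print(errors / n)  # should concentrate around the exact answer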

II: Bayes Optimal Predictor (5 points)

Show that for every probability distribution $\mathcal{D}$, the Bayes optimal predictor $f_{\mathcal{D}}$ is, in fact, optimal. That is, show that for every classifier $g : \mathcal{X} \to \{0, 1\}$,

$$L_{\mathcal{D}}(f_{\mathcal{D}}) \le L_{\mathcal{D}}(g).$$
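For reference, the standard definition of the Bayes optimal predictor over $\mathcal{X} \times \{0, 1\}$, which is presumably the one intended here:

$$f_{\mathcal{D}}(x) = \begin{cases} 1 & \text{if } \Pr[y = 1 \mid x] \ge 1/2 \\ 0 & \text{otherwise.} \end{cases}$$

A natural route is to condition on $x$ and compare $\Pr[f_{\mathcal{D}}(x) \ne y \mid x]$ with $\Pr[g(x) \ne y \mid x]$ pointwise, then take expectations over $x$.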

III: Perceptron with a Learning Rate (5 points)

Let us modify the perceptron algorithm as follows: in the update step, instead of setting $w^{(t+1)} = w^{(t)} + y_i x_i$ every time there is a misclassification, we set $w^{(t+1)} = w^{(t)} + \eta y_i x_i$, for some $0 < \eta < 1$. Show that this modified perceptron will

1. perform the same number of iterations as the original, and

2. converge to a vector that points in the same direction as the output of the original perceptron.
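A hint toward the intended comparison, assuming both runs start from $w^{(0)} = 0$ and process the examples in the same order: one can show by induction that

$$w_\eta^{(t)} = \eta\, w^{(t)},$$

so $\operatorname{sign}\langle w_\eta^{(t)}, x \rangle = \operatorname{sign}\langle w^{(t)}, x \rangle$ for every $x$, and the two runs make mistakes on exactly the same steps.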

IV: Unidentical Distributions (5 points)

Let $\mathcal{X}$ be a domain and let $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_m$ be a sequence of distributions over $\mathcal{X}$. Let $\mathcal{H}$ be a finite class of binary classifiers over $\mathcal{X}$ and let $f \in \mathcal{H}$. Suppose we are getting a sample $S$ of $m$ examples such that the instances are independent but not identically distributed; the $i$th instance is sampled from $\mathcal{D}_i$, and then $y_i$ is set to be $f(x_i)$. Let $\bar{\mathcal{D}}_m$ denote the average distribution, i.e., $\bar{\mathcal{D}}_m = (\mathcal{D}_1 + \cdots + \mathcal{D}_m)/m$.


Fix an accuracy parameter $\epsilon \in (0, 1)$. Show that

$$\Pr\Big[\exists\, h \in \mathcal{H} \text{ s.t. } L_{(\bar{\mathcal{D}}_m, f)}(h) > \epsilon \text{ and } L_{(S, f)}(h) = 0\Big] \le |\mathcal{H}|\, e^{-\epsilon m}.$$
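A hint toward the standard argument, sketched under the usual definition $L_{(\mathcal{D}, f)}(h) = \Pr_{x \sim \mathcal{D}}[h(x) \ne f(x)]$: for any fixed $h$ with $L_{(\bar{\mathcal{D}}_m, f)}(h) > \epsilon$,

$$\Pr\big[L_{(S, f)}(h) = 0\big] = \prod_{i=1}^{m} \big(1 - L_{(\mathcal{D}_i, f)}(h)\big) \le \prod_{i=1}^{m} e^{-L_{(\mathcal{D}_i, f)}(h)} = e^{-m\, L_{(\bar{\mathcal{D}}_m, f)}(h)} < e^{-\epsilon m},$$

and a union bound over $\mathcal{H}$ gives the claim.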

Programming

In this section, you will be working with a Breast Cancer Prediction dataset. Diagnosis of breast cancer is performed when an abnormal lump is found (from self-examination or x-ray) or a tiny speck of calcium is seen (on an x-ray). In this dataset, there are 569 observations, with each observation consisting of 5 features (mean radius, mean texture, mean perimeter, mean area, mean smoothness). The last column is the diagnosis: 0 indicates that the finding was benign, and 1 indicates that it was malignant.

I: Perceptron (25 points)

The first task is to write your own code in Java or Python to implement the perceptron learning algorithm. Your goal is to minimize the empirical risk (i.e., no need to do validation or structural risk minimization) for the perceptron to learn the diagnosis.

As you can tell, this is a binary classification task. For this task, submit your code for the perceptron learning algorithm. This code should be able to read the dataset and eventually provide the learned weight vector as its final output.
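For concreteness, here is a minimal sketch of one way such code could look. The CSV layout (header row, label in the last column) and the convention of prepending a constant 1 for the bias are assumptions, not requirements of the assignment:

import csv
import sys

def load_dataset(path):
    """Read the CSV: five feature columns followed by the 0/1 diagnosis.
    Labels are mapped to {-1, +1}, and a constant 1 is prepended to each
    observation so the bias is learned as part of the weight vector."""
    X, y = [], []
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # assumes a header row; drop this line if there is none
        for row in reader:
            X.append([1.0] + [float(v) for v in row[:-1]])
            y.append(1 if row[-1].strip() == "1" else -1)
    return X, y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def perceptron(X, y, max_passes=1000):
    """Plain perceptron: scan the sample, update on every mistake, and stop
    after a mistake-free pass (or after max_passes, in case the data are
    not linearly separable)."""
    w = [0.0] * len(X[0])
    for _ in range(max_passes):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) <= 0:  # misclassified (ties count as mistakes)
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

if __name__ == "__main__":
    X, y = load_dataset(sys.argv[1])
    print(perceptron(X, y))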

II: A modified Perceptron algorithm (15 points)

The second task is to modify the perceptron algorithm to exploit the observations that do not lead to updated weights. This modified perceptron algorithm does the following: for each weight vector w, (i) it keeps count of the number of consecutive correctly classified observations until the next update, and (ii) it always keeps the weight vector w that had the longest such streak. This variation is called the pocket algorithm because it keeps the best solution seen so far "in the pocket", and returns this rather than the last solution (see Alg. 1).

Your submission for this task should be your code for the pocket algorithm. Just like the original perceptron code, this code too should be able to read the dataset and eventually provide the learned weight vector as its final output. You may write totally separate code for the pocket algorithm, or you may choose to incorporate it as a 'version' of the perceptron algorithm. An example of the latter approach could be something that looks like

$ python perceptron.py --version naive --dataset /path/to/data/filename.csv

$ python perceptron.py --version pocket --dataset /path/to/data/filename.csv
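If you take the single-script route, the flags in the example commands can be handled with a few lines of argparse. A sketch, reusing the hypothetical load_dataset and perceptron helpers from the sketch above and a pocket function like the one given after Algorithm 1:

import argparse

parser = argparse.ArgumentParser(description="Perceptron / pocket learner")
parser.add_argument("--version", choices=["naive", "pocket"], default="naive")
parser.add_argument("--dataset", required=True, help="path to the CSV file")
args = parser.parse_args()

X, y = load_dataset(args.dataset)  # hypothetical loader from the sketch above
w = perceptron(X, y) if args.version == "naive" else pocket(X, y)
print(w)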

III: Linear Regression for Classification (25 points)

The third and final programming task is to use linear regression for classification. In class, we saw how logistic regression has a nice probabilistic interpretation that can be used for binary classification. Linear regression, however, does not.

What you are required to do, instead, is divide the dataset into two sub-datasets according to whether the diagnosis is 0 or 1. Next, run linear regression on each of these two sub-datasets to obtain two linear subspaces that fit the benign findings and the malignant findings, respectively. Once you have obtained these two subspaces (both through ERM), the actual binary classification will be done by computing the Euclidean distance of each observation from the two subspaces, and assigning the label of the closer fit. That is, once you obtain the two vectors $w_0$ (for benign) and


$w_1$ (for malignant) as output of the two linear regressions, an observation will be classified as 0 or 1 depending on which of these two weight vectors it is closer to.

Your submission for this task should be your code for linear regression. This code should be able to perform the whole task defined above, and not require the user to manually divide the dataset and call the 'usual' linear regression code multiple times. For example, the user should be able to do something like

$ python linearregressionclassifier.py --dataset /path/to/data/filename.csv

and obtain the classifier's performance.
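The prompt leaves the exact regression setup open. The sketch below takes one plausible reading: within each class, regress the last feature on the other four, fit by the normal equations, and score each point by its Euclidean distance to the fitted hyperplane. The file layout, the header-row assumption, and the choice of regressed coordinate are all assumptions:

import csv
import math
import sys

import numpy as np  # used for the matrix work; confirm np.linalg.solve counts as an allowed matrix routine

def load(path):
    """Read all rows as floats; the last column is the 0/1 diagnosis."""
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # assumes a header row
        data = np.array([[float(v) for v in row] for row in reader])
    return data[:, :-1], data[:, -1].astype(int)

def fit_hyperplane(X):
    """ERM least squares via the normal equations: regress the last feature
    on [1, remaining features]. Returns (w, n) such that the Euclidean
    distance from a point x to the fitted hyperplane is
    |w[0] + w[1:] . x[:-1] - x[-1]| / n."""
    A = np.hstack([np.ones((len(X), 1)), X[:, :-1]])
    t = X[:, -1]
    w = np.linalg.solve(A.T @ A, A.T @ t)
    n = math.sqrt(1.0 + float(np.sum(w[1:] ** 2)))  # norm of (w[1:], -1)
    return w, n

def distance(x, w, n):
    return abs(w[0] + float(w[1:] @ x[:-1]) - x[-1]) / n

if __name__ == "__main__":
    X, y = load(sys.argv[1])
    model0 = fit_hyperplane(X[y == 0])  # benign subspace
    model1 = fit_hyperplane(X[y == 1])  # malignant subspace
    preds = np.array([0 if distance(x, *model0) <= distance(x, *model1) else 1
                      for x in X])
    print("empirical risk:", float(np.mean(preds != y)))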

IV: Final Report (15 points)

Along with your code, you are also required to submit a short report (no more than 2 pages). The report must contain the following:

• The performance of each model (i.e., the empirical risk).

• A brief discussion (at most a half-page) about what you observed regarding the runtime and termination of the original perceptron algorithm, and what you did if/when the algorithm did not stop after some reasonable amount of time. In this discussion, also investigate why the algorithm did or did not stop, and justify your reasoning.

• A brief discussion (at most a half-page) about what you observed regarding the runtime and termination of the pocket algorithm, especially when compared to the original perceptron algorithm.

• A brief discussion (at most a half-page) about using linear regression in this way, and what might be the potential benefits and/or drawbacks of the approach.

• A README section explaining how to run your code. Keep in mind that the grader will only run the code, and will not do any manual work (e.g., placing the dataset in the same folder as your code, or dividing the dataset into two) to make sure that your code runs "as is"!

The report will be graded based on (a) performance details, (b) replicability of experiments, (c) explanation of how to run your code, and (d) the quality of the discussions about each algorithm.

Notes

What programming languages are allowed?

1. Java (JDK 1.8 or above), Python (3.x).

2. You are NOT allowed to use any data science and/or machine learning library for this assignment. If you have doubts about whether the use of a library is acceptable, please ask before assuming that it can be used!

3. You can use libraries for the matrix multiplication and eigenvalue decomposition steps required in linear regression. Beyond that, though, the ERM code for linear regression must be your own original code.

What should we submit, and how?

Submit a single .zip archive containing (i) one folder simply called "code", containing all your code, (ii) a PDF document for your report, and (iii) a PDF document for the theory part of your submission. Please do NOT handwrite the solutions and scan or take pictures. For the PDF documents, use either LaTeX or MS Word to write the solutions, and export to PDF. Anything else will not be graded.

  • For the report, please stick to single line spacing.


Input:
    Training data: $S = \{(x_i, y_i)\}_{i=1}^{n}$, $\mathcal{X} = \{1\} \times \mathbb{R}^d$, $\mathcal{Y} = \{-1, +1\}$
    Learning rate parameter: $0 < \eta < 1$
    Maximum number of steps: maxiter

Initialization:
    $w \leftarrow 0$
    $w_{pocket} \leftarrow 0$
    $run \leftarrow 0$
    $run_{pocket} \leftarrow 0$

while $w$ has updated $\wedge$ $run <$ maxiter do
    draw the next labeled example $(x, y)$ from $S$;
    if $(\langle w, x \rangle \ge 0 \wedge y = -1) \vee (\langle w, x \rangle < 0 \wedge y = +1)$ then
        if $run > run_{pocket}$ then
            $w_{pocket} \leftarrow w$;
            $run_{pocket} \leftarrow run$;
        end
        $w \leftarrow w + \eta y x$;
        $run \leftarrow 0$;
    else
        $run \leftarrow run + 1$;
    end
end

Output: $w_{pocket} \in \mathbb{R}^{d+1}$.

Algorithm 1: Pocket Algorithm
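A direct Python rendering of Algorithm 1, as a sketch. The cycling order through S and the explicit step cap are assumptions: Alg. 1 only says "draw the next" example, and its while-condition is the streak bound.

def pocket(X, y, eta=0.5, maxiter=1000, step_cap=1_000_000):
    """Pocket algorithm following Algorithm 1: keep 'in the pocket' the
    weight vector with the longest streak of consecutive correct
    classifications. Examples are drawn by cycling through S (an
    assumption), and step_cap is an added safety bound not in Alg. 1,
    so the loop always terminates."""
    w = [0.0] * len(X[0])
    w_pocket, run, run_pocket = list(w), 0, 0
    steps = 0
    # As in Alg. 1, the loop stops once `run` reaches maxiter, i.e. after
    # maxiter consecutive correct classifications with the current w.
    while run < maxiter and steps < step_cap:
        x, yi = X[steps % len(X)], y[steps % len(X)]
        score = sum(wj * xj for wj, xj in zip(w, x))
        if (score >= 0 and yi == -1) or (score < 0 and yi == 1):
            if run > run_pocket:  # new longest streak so far: pocket w
                w_pocket, run_pocket = list(w), run
            w = [wj + eta * yi * xj for wj, xj in zip(w, x)]  # Alg. 1 update
            run = 0
        else:
            run += 1
        steps += 1
    return w_pocket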