CS 540 Fall 2019

Homework Assignment 4


Hand-in Instructions

This homework assignment includes two written problems and a programming problem in Java. Hand in all parts electronically to your Canvas assignments page. For each written question, submit a single pdf file containing your solution. Handwritten submissions must be scanned. No photos or other file types allowed. For the programming question, submit a zip file containing all the Java code necessary to run your program, whether you modified provided code or not.

Submit the following three files (with exactly these names):

<wiscNetID>-HW4-P1.pdf

<wiscNetID>-HW4-P2.pdf

<wiscNetID>-HW4-P3.zip

For example, for someone with UW NetID crdyer@wisc.edu the first file name must be:

crdyer-HW4-P1.pdf

Late Policy

All assignments are due at 11:59 p.m. on the due date. One (1) day late, defined as a 24-hour period from the deadline (weekday or weekend), will result in 10% of the total points for the assignment being deducted. So, for example, if a 100-point assignment is due on a Wednesday and it is handed in at any time on Thursday, 10 points will be deducted. Two (2) days late, 25% off; three (3) days late, 50% off. No homework can be turned in more than three (3) days late. Written questions and program submission have the same deadline. A total of three (3) free late days may be used throughout the semester without penalty. Assignment grading questions must be discussed with a TA or grader within one week after the assignment is returned.

Collaboration Policy

You are to complete this assignment individually. However, you may discuss the general algorithms and ideas with classmates, TAs, peer mentors, and the instructor in order to help you answer the questions. But we require you to:

  • not explicitly tell each other the answers

  • not copy answers or code fragments from anyone or anywhere

  • not allow your answers to be copied

  • not get any code from the Web


Problem 1. Neural Networks [15 points]

The figure below shows a 2-layer, feed-forward neural network with two hidden-layer nodes and one output node. x1 and x2 are the two inputs. For the following questions, assume the learning rate is α = 0.1. Each node also has a bias input value of +1. Assume there is a sigmoid activation function at the hidden-layer nodes and at the output-layer node. A sigmoid activation function takes the form

$$g(z) = \frac{1}{1 + e^{-z}}, \qquad z = \sum_{i=1}^{n} w_i x_i$$

where $w_i$ is the $i$th incoming weight to a node, $x_i$ is the $i$th incoming input value, and $n$ is the number of incoming edges to the node.
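As a quick sanity check of the sigmoid computation (with hypothetical weights, since the actual weights appear only in the figure): if a hidden node had incoming weights $w_1 = 0.3$, $w_2 = -0.2$, and bias weight $0.1$, then for input $\{x_1 = 0, x_2 = 1\}$,

$$z = 0.3 \cdot 0 + (-0.2) \cdot 1 + 0.1 \cdot 1 = -0.1, \qquad g(-0.1) = \frac{1}{1 + e^{0.1}} \approx 0.475.$$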

  1. [5] Calculate the output values at nodes h1, h2, and the output node O of this network for input {x1 = 0, x2 = 1}. Each unit produces as its output the real value computed by the unit’s associated sigmoid function. Show all steps in your calculation.

  2. [10] Compute one (1) step of the backpropagation algorithm for a given example with input {x1 = 0, x2 = 1} and target output y = 1. The network output is the real-valued output of the sigmoid function, so the error on the given example is defined as $E = \frac{1}{2}(y - O)^2$, where O is the real-valued network output for that example at the output node, and y is the integer-valued target output for that example. Compute the updated weights for both the hidden layer and the output layer (there are nine updated weights in total, i.e., the three incoming weights to node h1, the three incoming weights to node h2, and the three incoming weights to the output node O) by performing ONE step of gradient descent. Show all steps in your calculation.
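As a reminder (this is just the standard chain-rule expansion for a sigmoid output unit with squared error, not a new definition), for a weight $w_i$ feeding the output node,

$$\frac{\partial E}{\partial w_i} = -(y - O)\,O\,(1 - O)\,x_i, \qquad w_i \leftarrow w_i - \alpha \frac{\partial E}{\partial w_i},$$

using the sigmoid identity $g'(z) = g(z)\,(1 - g(z))$. Weights into the hidden nodes require one further application of the chain rule through $h_1$ or $h_2$.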


Problem 2. Constraint Satisfaction Problems [20 points]

Consider the following graph representing 7 countries on a map that needs to be colored using three different colors, 1, 2 and 3, so that no adjacent countries have the same color. Adjacencies are represented by edges in the graph. We can represent this problem as a CSP where the variables are the countries and the values are the colors.

  1. [5] What are the domains of all the variables after applying Forward Checking inference with variables ordered alphabetically (from A to G) and values ordered increasingly (from 1 to 3), assuming you start with each variable having all possible values except it is known that A has value 1 and E has value 2?

  2. [10] Apply the Backtracking Search algorithm (Figure 6.5 in the textbook) with Forward Checking inference (Section 6.3.2), assuming you start with each variable having all possible values. Variables and values are chosen following alphabetical ordering of the variables (A to G) and increasing order of the values (1 to 3), respectively. Show your result as a search tree where each node in the tree shows each variable with its set of possible values. Arcs in the search tree should be labeled with an assignment of a selected value to a selected variable. If a solution is found, show the final coloring of the map. The search tree only needs to show nodes and arcs until a single solution is found.

  3. [5] What are the domains of all the variables after applying Arc-Consistency (AC-3) inference (Figure 6.3) with variables ordered alphabetically and values ordered increasingly, assuming you start with each variable having all possible values except it is known that A has value 1, B has value 1, and C has value 2? List all the possible outcomes.
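For reference, a minimal AC-3 sketch in Java (the variable names and the map's adjacency structure are placeholders, since the figure is not reproduced here; for map coloring, arc consistency just means "a different color exists"):

import java.util.*;

public class AC3 {

    // Enforce arc consistency; returns false if some domain becomes empty.
    static boolean ac3(Map<Character, Set<Integer>> domains,
                       Map<Character, Set<Character>> neighbors) {
        Deque<char[]> queue = new ArrayDeque<>();
        for (char x : domains.keySet())
            for (char y : neighbors.get(x))
                queue.add(new char[]{x, y});
        while (!queue.isEmpty()) {
            char[] arc = queue.poll();
            char x = arc[0], y = arc[1];
            if (revise(domains, x, y)) {
                if (domains.get(x).isEmpty()) return false;   // inconsistency detected
                for (char z : neighbors.get(x))
                    if (z != y) queue.add(new char[]{z, x});  // recheck arcs into x
            }
        }
        return true;
    }

    // Remove values of x that have no consistent (different-color) value in y's domain.
    static boolean revise(Map<Character, Set<Integer>> domains, char x, char y) {
        boolean revised = false;
        Iterator<Integer> it = domains.get(x).iterator();
        while (it.hasNext()) {
            int vx = it.next();
            boolean supported = false;
            for (int vy : domains.get(y))
                if (vy != vx) { supported = true; break; }
            if (!supported) { it.remove(); revised = true; }
        }
        return revised;
    }
}

For part 3, you would initialize the domains to {1} for A, {1} for B, {2} for C, and {1, 2, 3} for the remaining variables before calling ac3.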


Problem 3. Back-Propagation for Handwritten Digit Recognition [65 points]

In this problem you are to write a program that builds a 2-layer, feed-forward neural network and trains it using the back-propagation algorithm. The problem that the neural network will handle is a multi-class classification problem for recognizing images of handwritten digits. All inputs to the neural network will be numeric. The neural network has one hidden layer. The network is fully connected between consecutive layers, meaning each unit, which we’ll call a node, in the input layer is connected to all nodes in the hidden layer, and each node in the hidden layer is connected to all nodes in the output layer. Each node in the hidden layer and the output layer will also have an extra input from a “bias node” that has constant value +1. So, we can consider both the input layer and the hidden layer as containing one additional node called a bias node.

All nodes in the hidden layer (except for the bias node) should use the ReLU activation function, while all the nodes in the output layer should use the Softmax activation function. The initial weights of the network will be set randomly (already implemented in the skeleton code). Assuming that input examples (called instances in the code) have m attributes (hence there are m input nodes, not counting the bias node), that we want h nodes (not counting the bias node) in the hidden layer, and that there are o nodes in the output layer, the total number of weights in the network is (m + 1) · h between the input and hidden layers, plus (h + 1) · o connecting the hidden and output layers. The number of nodes to be used in the hidden layer will be given as input.
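To make the weight count concrete (using a hypothetical hidden-layer size, since h is supplied at run time): with m = 256 pixel attributes, h = 10 hidden nodes, and o = 3 output nodes, there are (256 + 1) · 10 = 2570 weights between the input and hidden layers and (10 + 1) · 3 = 33 between the hidden and output layers, i.e., 2603 weights in total.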

You are only required to implement the following methods in the classes NNImpl and Node:

public class Node {
    public void calculateOutput();
    public void calculateDelta();
    public void updateWeight(double learningRate);
}

public class NNImpl {
    public int predict(Instance inst);
    public void train();
    private double loss(Instance inst);
}

void calculateOutput():

calculates the output at the current node and stores that value in a member variable called outputValue

void calculateDelta():

calculates the delta value, δ, at the current node and stores that value in a member variable called delta

void updateWeight(double learningRate):

updates the weights between parent nodes and the current node using the provided learning rate

int predict(Instance inst):

calculates the output (i.e., the index of the class) from the neural network for a given example


void train():

trains the neural network using a training set, fixed learning rate, and number of epochs (provided as input to the program). This function also prints the total Cross-Entropy loss on all the training examples after each epoch.

double loss(Instance inst):

calculates the Cross-Entropy loss from the neural network for a single instance. This function will be used by train().
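Taken together, a minimal sketch of train() and predict() might look like this (assuming names like trainingSet, maxEpoch, learningRate, outputNodes, and a forwardPass/backPropagate pair of helpers; the actual skeleton code may organize these differently):

// Train with gradient descent, printing the average Cross-Entropy
// loss over the training set after each epoch.
public void train() {
    for (int epoch = 0; epoch < maxEpoch; epoch++) {
        for (Instance inst : trainingSet) {
            forwardPass(inst);     // calculateOutput() at each node, layer by layer
            backPropagate(inst);   // calculateDelta(), then updateWeight(learningRate)
        }
        double totalLoss = 0.0;
        for (Instance inst : trainingSet) {
            totalLoss += loss(inst);
        }
        System.out.println("Epoch: " + epoch + ", Loss: " + totalLoss / trainingSet.size());
    }
}

// Predict the class index: run a forward pass, then return the index
// of the output node with the largest activation.
public int predict(Instance inst) {
    forwardPass(inst);
    int best = 0;
    for (int j = 1; j < outputNodes.size(); j++) {
        if (outputNodes.get(j).getOutput() > outputNodes.get(best).getOutput()) {
            best = j;
        }
    }
    return best;
}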

Dataset

The dataset we will use is called Semeion (https://archive.ics.uci.edu/ml/datasets/Semeion+Handwritten+Digit). It contains 1,593 binary images of size 16 x 16, each containing one handwritten digit. Your task is to classify each example image as one of three possible digits: 6, 8, or 9. If desired, you can view an image using the supplied Python script view.py; usage is described at the beginning of that file. (This is entirely optional, for seeing what an image in the dataset looks like; it has nothing to do with your implementation in Java.)

Each dataset will begin with a header that describes the dataset. First, there may be several lines starting with “//” that provide a description of and comments about the dataset. The line starting with “**” lists the digits. The line starting with “##” lists the number of attributes, i.e., the number of input values in each instance (in our case, the number of pixels). You can assume that the number of classes will always be 3 for this homework because we are only considering 3-class classification problems. The first output node should output a large value when the instance is determined to be in class 1 (here meaning it is digit 6). The second output node should output a large value when the instance is in class 2 (i.e., digit 8), and, similarly, the third output node corresponds to class 3 (i.e., digit 9). Following these header lines, there will be one line for each instance, containing the values of each attribute followed by the target/teacher values for each output node. For example, if the last 3 values for an instance are 0 0 1, then the instance is the digit 9. We have written the dataset-loading part for you according to this format, so do not change it.
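For illustration only, a file in this format might begin as follows (hypothetical header text; the real header lines and pixel data come from the provided dataset files):

// Semeion handwritten digit data (subset)
** 6 8 9
## 256
<256 binary pixel values> 0 0 1

where the trailing 0 0 1 marks that instance as the digit 9.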

Implementation Details

We have created four classes to assist your coding, called Instance, Node, NNImpl and NodeWeightPair. Their data members and methods are commented in the skeleton code. An overview of these classes is given next.

The Instance class has two data members: ArrayList<Double> attributes and ArrayList<Integer> classValues. It is used to represent one instance (aka example), as the name suggests. attributes is a list of all the attributes (in our case, binary pixel values) of that instance (all of them are doubles), and classValues is the class (e.g., 1 0 0 for digit 6) for that instance.

The most important data member of the Node class is int type. It can take the value 0, 1, 2, 3, or 4. Each value represents a particular type of node. The meanings of these values are:

  0. an input node

  1. a bias node that is connected to all hidden layer nodes

  2. a hidden layer node

  3. a bias node that is connected to all output layer nodes

  4. an output layer node


  • A step-by-step backpropagation walkthrough (note that its network is not exactly the same as in this homework) is available at https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

  • The ReLU function is defined as

$$g(z) = \max(0, z), \qquad z = \sum_{i=1}^{n} w_i x_i,$$

where the $x_i$ are the inputs to the given node, the $w_i$ are the corresponding weights, and $n$ is the number of inputs including the bias. When updating weights, you’ll need to use the derivative of the ReLU, defined as

$$g'(z) = \begin{cases} 0, & z \le 0 \\ 1, & z > 0 \end{cases}$$
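In Java these are one-liners (a sketch; the derivative at exactly z = 0 is taken to be 0, matching the definition above):

double relu(double z) { return Math.max(0.0, z); }

double reluDerivative(double z) { return z > 0 ? 1.0 : 0.0; }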

  • The Softmax function is defined as

$$g(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{o} e^{z_k}}$$

where $z_j$ is the weighted sum of the inputs at the $j$th output node, and $o$ is the number of output nodes.
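A straightforward Java sketch (subtracting the maximum before exponentiating is a common numerical-stability trick, not something the assignment requires):

// Softmax over the weighted sums z[0..o-1] at the output nodes.
double[] softmax(double[] z) {
    double max = z[0];
    for (double v : z) max = Math.max(max, v);  // stability: exp of a large z overflows
    double sum = 0.0;
    double[] out = new double[z.length];
    for (int j = 0; j < z.length; j++) {
        out[j] = Math.exp(z[j] - max);
        sum += out[j];
    }
    for (int j = 0; j < z.length; j++) out[j] /= sum;
    return out;
}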

  • The Cross-Entropy loss function for a single example $x^{(i)}$ is defined as

$$E(x^{(i)}) = -\sum_{j=1}^{o} y_j^{(i)} \ln o_j^{(i)}$$

where $y^{(i)}$ is the target class value of the $i$th example $x^{(i)}$ and $o^{(i)}$ is the network output for that example. For example, if the target digit is 6, then the target class value is $y^{(i)} = [1, 0, 0]$. The total loss is defined as the average loss across the entire training set:

$$E = \frac{1}{|S|} \sum_{i=1}^{|S|} E(x^{(i)})$$

where $|S|$ is the number of examples in the training set. When updating weights, it is easier to compute the gradient of the Softmax activation function and the Cross-Entropy loss together, which works out to

$$\frac{\partial E(x^{(i)})}{\partial z_j} = o_j^{(i)} - y_j^{(i)}$$

More details can be found at https://deepnotes.io/softmax-crossentropy
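In Java, both the per-instance loss and the resulting output-node deltas are short (a sketch, assuming out[] holds the Softmax outputs and target[] holds the instance's 0/1 class values):

// Cross-Entropy loss for one instance.
double loss = 0.0;
for (int j = 0; j < out.length; j++) {
    loss -= target[j] * Math.log(out[j]);
}

// Combined Softmax + Cross-Entropy gradient: the delta at output node j
// is simply the difference between the Softmax output and the target.
double[] delta = new double[out.length];
for (int j = 0; j < out.length; j++) {
    delta[j] = out[j] - target[j];
}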

  • Based on the above information, if we set the learning rate to α, the weight update rule for this neural network is

$$\Delta w_{ij} = -\alpha\, a_i\, \delta_j$$

where $a_i$ is the output of node $i$, and

$$\delta_j = \begin{cases} o_j - y_j, & \text{if } j \text{ is an output node} \\ g'(z_j) \sum_{k} w_{jk}\, \delta_k, & \text{if } j \text{ is a hidden node} \end{cases}$$

where the sum over $k$ runs over the nodes that node $j$ feeds into.
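Putting the update rule into Java, an updateWeight sketch might look like this (parents and delta come from the skeleton's Node class; field and method names such as pair.node.getOutput() are assumptions):

// Sketch of Node.updateWeight: apply Δw = -α · a_i · δ_j for each incoming edge.
public void updateWeight(double learningRate) {
    if (parents == null) return;   // input and bias nodes have no incoming weights
    for (NodeWeightPair pair : parents) {
        pair.weight -= learningRate * pair.node.getOutput() * delta;
    }
}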
