Homework 1 Solution

Instructions

1. You can use either C/C++, Java, or Python to implement your algorithms.

2. Your implementations should compile on remote.cs.binghamton.edu.

3. Make sure remote.cs.binghamton.edu has the packages that you require before starting to implement.

4. This homework requires you to implement a decision tree. Using an existing decision tree package is not allowed.

5. Your homework should contain the following components:

   (a) A README.txt file with detailed instructions on how to compile and run the code.

   (b) Code source files.

   (c) A type-written document containing the results on the datasets.

   (d) For complete credit, you need to report correct results on both datasets, for both heuristics. If you fail to report results for all the above-mentioned combinations, points will be deducted.

6. Submit the homework as a single zip file: firstname_lastname_hw1.zip.

Inducing Decision Trees

In this homework you will implement and test the decision tree learning algorithm (See Mitchell, Chapter 3).

Download the two datasets available on myCourses. Each dataset is divided into three sets: the training set, the validation set, and the test set. The datasets are in CSV format. The first line in each file gives the attribute names. Each line after that is a training (or test) example containing a list of attribute values separated by commas. The last attribute is the class variable. Assume that all attributes take values from the domain {0,1}.
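For concreteness, a minimal Python loader for files in this format might look like the sketch below; only the format described above is assumed, and the file name in the usage note is hypothetical.

import csv

def load_dataset(path):
    # Return (attribute_names, examples); each example maps an
    # attribute name to its 0/1 value, per the CSV format above.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        names = next(reader)               # first line: attribute names
        examples = [dict(zip(names, map(int, row))) for row in reader]
    return names, examples

# Usage (hypothetical file name):
# names, train = load_dataset("train.csv")
# class_name = names[-1]                   # the last attribute is the class variable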

(100 points) Implement the decision tree learning algorithm. As discussed in class, the main step in decision tree learning is choosing the next attribute to split on. Implement the following two heuristics for selecting the next attribute.

1. Information gain heuristic (see class slides, Mitchell Chapter 3); a sketch is given after this list.

2. Variance impurity heuristic, described below.
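As a point of reference, a minimal sketch of the information gain heuristic for the binary attributes and classes used here, assuming examples are dictionaries as produced by a loader like the one sketched earlier (the helper names are illustrative, not part of the assignment):

from math import log2

def entropy(examples, class_name):
    # H(S) = -p1*log2(p1) - p0*log2(p0) over the class distribution.
    n = len(examples)
    if n == 0:
        return 0.0
    p1 = sum(e[class_name] for e in examples) / n
    if p1 in (0.0, 1.0):                   # a pure set has zero entropy
        return 0.0
    return -p1 * log2(p1) - (1 - p1) * log2(1 - p1)

def information_gain(examples, attr, class_name):
    # Gain(S, X) = H(S) - sum over x of Pr(x) * H(S_x).
    gain = entropy(examples, class_name)
    for v in (0, 1):                       # attributes take values in {0,1}
        subset = [e for e in examples if e[attr] == v]
        gain -= len(subset) / len(examples) * entropy(subset, class_name)
    return gain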

Let K denote the number of examples in the training set. Let K_0 denote the number of training examples that have class = 0 and K_1 denote the number of training examples that have class = 1. The variance impurity of the training set S is defined as:

$VI(S) = \frac{K_0}{K} \cdot \frac{K_1}{K}$

Notice that the impurity is 0 when the data is pure. The gain for this impurity is defined as usual:

$Gain(S, X) = VI(S) - \sum_{x \in Values(X)} \Pr(x) \, VI(S_x)$

where X is an attribute, S_x denotes the set of training examples that have X = x, and Pr(x) is the fraction of the training examples that have X = x (i.e., the number of training examples that have X = x divided by the number of training examples in S).
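A minimal sketch of this heuristic under the same assumptions as the earlier sketches (examples as attribute-name-to-value dictionaries; helper names are illustrative):

def variance_impurity(examples, class_name):
    # VI(S) = (K0 / K) * (K1 / K), per the definition above.
    k = len(examples)
    if k == 0:
        return 0.0
    k1 = sum(e[class_name] for e in examples)
    k0 = k - k1
    return (k0 / k) * (k1 / k)             # 0 when the data is pure

def vi_gain(examples, attr, class_name):
    # Gain(S, X) = VI(S) - sum over x of Pr(x) * VI(S_x).
    gain = variance_impurity(examples, class_name)
    for v in (0, 1):
        subset = [e for e in examples if e[attr] == v]
        gain -= len(subset) / len(examples) * variance_impurity(subset, class_name)
    return gain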

Your implementations should be called from the command line to get complete credit. Points will be deducted if it is not possible to run all the different versions of your implementation from the command line.

Implement a function to print the decision tree to standard output. We will use the following format.

wesley = 0 :
| honor = 0 :
| | barclay = 0 : 1
| | barclay = 1 : 0
| honor = 1 :
| | tea = 0 : 0
| | tea = 1 : 1
wesley = 1 : 0

According to this tree, if wesley = 0 and honor = 0 and barclay = 0, then the class value of the corresponding instance should be 1. In other words, the value appearing before a colon is an attribute value, and the value appearing after a colon is a class value.
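One way to produce this format, assuming a simple (hypothetical) node representation in which a leaf carries a class label and an internal node carries the tested attribute plus one child per attribute value:

class Node:
    def __init__(self, attr=None, label=None):
        self.attr = attr       # attribute tested at this node (None at a leaf)
        self.label = label     # class value at a leaf (None otherwise)
        self.children = {}     # attribute value (0 or 1) -> child Node

def print_tree(node, depth=0):
    # Print one "attr = value :" line per branch; a leaf's class value
    # goes after the colon, while an internal child is recursed into
    # with one more "| " of indentation.
    for value, child in sorted(node.children.items()):
        line = "| " * depth + "%s = %d :" % (node.attr, value)
        if child.attr is None:
            print(line, child.label)
        else:
            print(line)
            print_tree(child, depth + 1)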

Once we compile your code, we should be able to run it from the command line. Your program should take the following arguments as input:

.\program <training-set> <validation-set> <test-set> <to-print> <heuristic>

to-print: {yes, no}

It should output the accuracies on the test set for decision trees constructed using the two heuristics. If to-print equals yes, it should print the decision tree in the format described above to standard output.
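As an illustration only, a driver consistent with this interface might look like the sketch below. Here build_tree is a placeholder for the learner you are asked to implement, and load_dataset/print_tree refer to the earlier sketches; only classify and accuracy are spelled out.

import sys

def classify(node, example):
    # Walk from the root to a leaf, following the example's attribute values.
    while node.attr is not None:
        node = node.children[example[node.attr]]
    return node.label

def accuracy(tree, examples, class_name):
    correct = sum(classify(tree, e) == e[class_name] for e in examples)
    return correct / len(examples)

def main():
    # Argument order follows the specification above.
    train_path, valid_path, test_path, to_print, heuristic = sys.argv[1:6]
    names, train = load_dataset(train_path)
    _, test = load_dataset(test_path)
    tree = build_tree(train, names[:-1], names[-1], heuristic)  # your learner, not sketched here
    print("Test accuracy (%s): %.4f" % (heuristic, accuracy(tree, test, names[-1])))
    if to_print == "yes":
        print_tree(tree)

if __name__ == "__main__":
    main()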

A README.txt file with detailed instructions on compiling the code. A document containing the accuracy on the training, validation, and test sets in both datasets for decision trees constructed using the two heuristics mentioned above.
