Solved–Project 01– Solution

  • Task

The task of this project is to perform classification using machine learning on a two-class problem. The features used for classification are pre-computed from images of a fine needle aspirate (FNA) of a breast mass. Your task is to classify suspected FNA cells as Benign (class 0) or Malignant (class 1) using logistic regression as the classifier. The dataset in use is the Wisconsin Diagnostic Breast Cancer dataset (wdbc.dataset). The code should be written in Python from scratch. The code and the report are to be submitted on the timberlake server by the deadline.
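For reference, the standard logistic regression model and the gradient descent update it leads to can be written as follows. This is a sketch of the usual textbook formulation, not notation taken from the project statement: the symbols $w$, $b$, $\eta$, $N$ and $\hat{y}_i$ are assumptions introduced here.

```latex
% Model: probability that sample x_i is Malignant (class 1)
\hat{y}_i = \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Mean cross-entropy loss over N training samples
L(w, b) = -\frac{1}{N} \sum_{i=1}^{N}
          \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]

% Gradients of the loss
\nabla_w L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)\, x_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)

% Gradient descent update with learning rate \eta
w \leftarrow w - \eta \, \nabla_w L, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
```

A sample is then assigned to class 1 when $\hat{y}_i \ge 0.5$ and to class 0 otherwise; the 0.5 threshold is the common default and is also an assumption here.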

  • Dataset

The Wisconsin Diagnostic Breast Cancer (WDBC) dataset will be used for training, validation and testing. The dataset contains 569 instances with 32 attributes: ID, diagnosis (B/M), and 30 real-valued input features. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe the following characteristics of the cell nuclei present in the image:

    1. radius (mean of distances from center to points on the perimeter)

    2. texture (standard deviation of gray-scale values)

    3. perimeter

    4. area

    5. smoothness (local variation in radius lengths)

    6. compactness (perimeter^2 / area - 1.0)

    7. concavity (severity of concave portions of the contour)

    8. concave points (number of concave portions of the contour)

    9. symmetry

    10. fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
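The attribute layout described above (ID, diagnosis, then the real-valued features) could be parsed roughly as in the sketch below. It assumes a header-less, comma-separated file, which is not stated in the text. The function name `load_wdbc` and the two sample rows are hypothetical placeholders, and the rows carry only 3 features instead of 30 to keep the example short.

```python
import io
import numpy as np

# Made-up placeholder rows in the assumed layout: ID, diagnosis (B/M), features.
# Real rows would carry 30 feature values; 3 are used here for brevity.
SAMPLE_CSV = """1001,M,17.9,10.4,122.8
1002,B,13.5,14.4,87.5
"""

def load_wdbc(file_obj):
    """Parse header-less CSV lines into (ids, X, y).

    y uses the project's encoding: Benign -> 0, Malignant -> 1.
    """
    ids, rows, labels = [], [], []
    for line in file_obj:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        ids.append(fields[0])
        labels.append(1 if fields[1] == "M" else 0)
        rows.append([float(v) for v in fields[2:]])
    return np.array(ids), np.array(rows), np.array(labels)

ids, X, y = load_wdbc(io.StringIO(SAMPLE_CSV))
```

With the actual data file, the same function could be called on an open file handle, for example `load_wdbc(open(path))`, where `path` is wherever the dataset file is stored.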

  • Plan of Work

    1. Extract feature values and image IDs from the data: Process the original CSV data files into a NumPy matrix or Pandas DataFrame.

    2. Data partitioning: Partition your data into training, validation and testing data. Randomly choose 80% of the data for training and the rest for validation and testing.

    3. Train using logistic regression: Use gradient descent for logistic regression to train the model using a group of hyperparameters.

    4. Tune hyperparameters: Validate the performance of your model on the validation set. Change your hyperparameters. Try to find what values those hyperparameters should take so as to give better performance on the validation set.

    5. Test your machine learning scheme on the testing set: After finishing all the above steps, fix your hyperparameters and model parameters and test your model's performance on the testing set. This shows the ultimate effectiveness of your model's generalization power gained by learning.
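The partitioning, training and tuning steps above can be sketched as follows. This is a minimal illustration, not the reference solution. It makes several assumptions: the data is a synthetic stand-in rather than WDBC, the 20% remainder is split equally between validation and testing, features are standardized with training-set statistics, and the only hyperparameter tuned is the learning rate over an arbitrary grid. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def split_data(X, y, rng, train_frac=0.8):
    """Shuffle, take 80% for training, halve the rest (assumed) into val/test."""
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    n_val = (len(X) - n_train) // 2
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

def accuracy(X, y, w, b):
    """Fraction of samples whose thresholded prediction matches the label."""
    return np.mean((sigmoid(X @ w + b) >= 0.5) == y)

def train(X, y, lr, epochs):
    """Batch gradient descent on the mean cross-entropy loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    history = []  # training accuracy after each epoch
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y       # prediction minus label, per sample
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
        history.append(accuracy(X, y, w, b))
    return w, b, history

# Synthetic stand-in data (NOT the WDBC dataset): 200 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([2.0, -1.0, 0.5, 0.0, 1.0]) > 0).astype(int)

train_set, val_set, test_set = split_data(X, y, rng)

# Standardize using training statistics only (an assumed preprocessing step).
mu, sigma = train_set[0].mean(axis=0), train_set[0].std(axis=0)

def scale(A):
    return (A - mu) / sigma

# Pick the learning rate with the best validation accuracy.
best = None
for lr in (0.01, 0.1, 1.0):
    w, b, history = train(scale(train_set[0]), train_set[1], lr, epochs=200)
    val_acc = accuracy(scale(val_set[0]), val_set[1], w, b)
    if best is None or val_acc > best[0]:
        best = (val_acc, lr, w, b, history)

val_acc, lr, w, b, history = best
# The test set is touched once, after the hyperparameters are fixed.
test_acc = accuracy(scale(test_set[0]), test_set[1], w, b)
```

The `history` list holds the training accuracy after every epoch, so it can be passed directly to a plotting library to produce an accuracy-versus-epochs curve.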

  • Evaluation

    1. Print out a graph showing training accuracy versus number of epochs.

    2. Evaluate your solution on the test set using accuracy, precision and recall:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

Where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
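The three metrics can be computed directly from the confusion counts, as in the sketch below. The function names and the example labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the project statement does not specify.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with Malignant (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) following equations (1) to (3)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Made-up labels for illustration: TP = 3, TN = 2, FP = 2, FN = 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
accuracy, precision, recall = evaluate(y_true, y_pred)
# accuracy = 5/8 = 0.625, precision = 3/5 = 0.6, recall = 3/4 = 0.75
```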

  • Deliverables

There are two deliverables: report and code. After finishing the project, you may be asked to demonstrate it to the TAs, particularly if the results and reasoning in your report are not clear enough.

1. Report (30 points)

The report should describe your results, your experimental setup, and a comparison between the results obtained from different settings of the algorithm and dataset. Submit the PDF on a CSE student server with the following script:

submit cse474 proj1.pdf for undergraduates

submit cse574 proj1.pdf for graduates

2. Code (70 points)

The code for your implementation should be in Python only. You can submit multiple files, but the entry file must be named main.ipynb. Please provide the necessary comments in the code. Python code and data files should be packed in a ZIP file named proj1code.zip. Submit the Python code on a CSE student server with the following script:

submit cse474 proj1code.zip for undergraduates

submit cse574 proj1code.zip for graduates
