Week 8

Types of Data for Classification

Qualitative Data

  • Categorical or nominal data

  • Ordinal or Ranked Data

Quantitative Data

  • Discrete Data

  • Continuous measurements
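The four kinds of variables above can be represented explicitly in pandas. The following is a minimal sketch, assuming a small made-up table (the column names and values are hypothetical, chosen only for illustration). It shows how a nominal column differs from an ordinal one once the category order is declared.

```python
import pandas as pd

# Hypothetical toy data covering the four kinds of variables listed above.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],     # nominal (no order)
    "size": ["small", "large", "medium", "small"],  # ordinal (ranked)
    "n_rooms": [2, 4, 3, 2],                        # discrete
    "area_m2": [45.5, 120.0, 80.2, 50.1],           # continuous
})

# Nominal: categories without any order.
df["color"] = pd.Categorical(df["color"])

# Ordinal: categories with an explicit order, so comparisons are meaningful.
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)

# Integer codes follow the declared order: small=0, medium=1, large=2.
size_codes = df["size"].cat.codes

print(df.dtypes)
print(size_codes.tolist())
```

Declaring the order matters for ordinal data: without it, the integer codes would follow alphabetical order rather than the ranking.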

Handling Categorical/Nominal Data: Dummy Encoding

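from statsmodels.tools import categorical

# Note: categorical() is deprecated in newer statsmodels releases and may be
# unavailable there; pandas.get_dummies is the suggested replacement.
# data: array-like holding the categorical variable
cat_encod = categorical(data, dictnames=False, drop=True)
# dictnames: if True, also return a dict mapping each dummy column to its category name
# drop: if True, leave the original categorical column out of the returned array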
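As an alternative to the statsmodels helper, dummy encoding can be sketched with `pandas.get_dummies`. The example below assumes a hypothetical single-column DataFrame named `data`. Passing `drop_first=True` removes one dummy column so that the remaining columns are not redundant.

```python
import pandas as pd

# Hypothetical nominal variable with three categories.
data = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One 0/1 column per category; drop_first=True drops the first category
# ("blue", in sorted order), which then acts as the reference level.
dummies = pd.get_dummies(data["color"], prefix="color", drop_first=True, dtype=int)

print(dummies)
```

A row with zeros in every dummy column belongs to the dropped reference category.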

Support Vector Machine (SVM)

https://oceandatamining.sciencesconf.org/data/program/OBIDAM14_Canu.pdf

https://web.stanford.edu/~hastie/Papers/ESLII.pdf
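A minimal sketch of an SVM classifier with scikit-learn follows. It assumes the built-in Iris dataset as a stand-in for real data; the RBF kernel, `C=1.0`, and the 70/30 split are illustrative choices, not settings prescribed by these notes. Features are standardized first because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption): 150 iris flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale the features, then fit a support vector classifier with an RBF kernel.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm_clf.fit(X_train, y_train)

# Fraction of test samples classified correctly.
svm_accuracy = svm_clf.score(X_test, y_test)
print(f"SVM test accuracy: {svm_accuracy:.2f}")
```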

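Naive Bayes

A family of classification algorithms based on Bayes' theorem. They are called "naive" because they assume the features are conditionally independent given the class.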
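One member of this family, Gaussian Naive Bayes, can be sketched with scikit-learn as below. The Iris dataset and the 70/30 split are assumptions made for illustration; `GaussianNB` models each feature as normally distributed within each class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset (assumption): 4 continuous features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit per-class Gaussian distributions for each feature.
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)

# Class probabilities for each test sample (each row sums to 1).
nb_proba = nb_clf.predict_proba(X_test)
nb_accuracy = nb_clf.score(X_test, y_test)
print(f"Naive Bayes test accuracy: {nb_accuracy:.2f}")
```

Other variants such as `MultinomialNB` and `BernoulliNB` follow the same fit/predict interface but suit count and binary features respectively.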

Random Forest
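A random forest trains many decision trees on bootstrap samples of the data and combines their votes. The sketch below assumes the Iris dataset and scikit-learn's default of 100 trees; both are illustrative choices rather than settings from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# n_estimators is the number of trees in the forest.
rf_clf = RandomForestClassifier(n_estimators=100, random_state=0)
rf_clf.fit(X_train, y_train)

rf_accuracy = rf_clf.score(X_test, y_test)
print(f"Random Forest test accuracy: {rf_accuracy:.2f}")

# Relative contribution of each feature, averaged over the trees.
print(rf_clf.feature_importances_)
```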

Classification Metrics: confusion matrix!

The diagonal elements of the confusion matrix contain the number of correctly classified samples for each class; the off-diagonal elements count the misclassified samples.

Important note: use a heatmap to plot the confusion matrix (CM).
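The following sketch computes a confusion matrix with scikit-learn and plots it as a heatmap with seaborn. The label lists are hypothetical values chosen so the result is easy to check by hand, and the output file name is likewise an assumption.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# The diagonal sums to the number of correctly classified samples.
n_correct = cm.trace()
print(f"Correct: {n_correct} of {cm.sum()}")

# Plot the matrix as an annotated heatmap.
fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Predicted class")
ax.set_ylabel("True class")
fig.tight_layout()
fig.savefig("confusion_matrix.png")  # hypothetical output file name
```

Here 5 of the 7 samples lie on the diagonal; the off-diagonal cells show which classes are confused with one another.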

Exercise

RF

Import the dataset from the following ‘url’ and classify it with a decision tree and with a Random Forest (RF) that uses 5 trees. Then compare the results of the two models on the test data using their confusion matrices.
