Lecture Slides

This section contains lecture slides and background material for learning data science with WEKA. All slides are available in both PowerPoint (pptx) and Portable Document Format (pdf). Most required readings refer to the WEKA book (third edition).

Lecture slides 1: Introduction (pptx, pdf)

Classification, regression, basic concepts, correlations, spurious correlations, decision trees, description versus prediction, WEKA file format and essentials.

Lecture slides 2: Classification (pptx, pdf)

k-Nearest Neighbour classifier, decision boundaries, decision trees, model complexity, Donoho’s paper on 50 years of data science.
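
As a rough illustration of the k-Nearest Neighbour classifier listed above, here is a minimal sketch in plain Python. It is not taken from the slides and is not how WEKA implements the classifier; the function name `knn_predict`, the Euclidean distance, and the majority vote are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; assumes numeric feature tuples
    and Euclidean distance.
    """
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and take the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
```

A small k gives a more jagged decision boundary and a large k a smoother one, which is one way to think about model complexity for this classifier.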

Lecture slides 3: Overfitting and Underfitting (pptx, pdf)

Cross-validation evaluation procedure, classification versus regression, (univariate) regression, multivariate regression, model complexity, underfitting, overfitting, determining model complexity.
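
To make the cross-validation procedure concrete, the sketch below splits sample indices into k folds, each used once as the test set. It is an assumption-laden illustration rather than the slides' or WEKA's procedure: the name `k_fold_indices` is hypothetical, and the sketch neither shuffles nor stratifies the data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper for illustration; folds are contiguous blocks
    with no shuffling or stratification.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))
```

Every sample appears in exactly one test fold, so averaging a score over the folds uses all the data for evaluation without testing on a sample the model was trained on.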

Lecture slides 4: Curse of Dimensionality (pptx, pdf)

Train, test, and validation sets, feature dimensionality, curse of dimensionality, precision, recall, F1 score, principal component analysis (PCA).
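
Precision, recall, and the F1 score can be computed from the counts of true positives, false positives, and false negatives. The following is a minimal sketch for the binary case; the function name `precision_recall_f1` and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class `positive`.

    Hypothetical helper for illustration; returns 0.0 for any measure
    whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```

In this example there are two true positives, one false positive, and one false negative, so all three measures equal 2/3.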

Lecture slides 5: Evaluation (pptx, pdf)

Parameter optimisation in decision trees (J48), comparing classifiers, model selection, evaluation with t-test, WEKA’s Experimenter, paper on significance tests in data science.

Lecture slides 6: Classifiers (pptx, pdf)

Precision and recall (reprise), PCA (reprise), limitations of PCA, random decision forests (RDFs), naive Bayes, support vector machines (SVMs), kernels, using RDFs, naive Bayes, and SVMs in WEKA.