This section contains lecture slides and background material for learning data science with WEKA. All slides are available in both PowerPoint (pptx) and Portable Document Format (pdf). Most required readings refer to the WEKA book (third edition).
Basic concepts, classification, regression, correlations, spurious correlations, decision trees, description versus prediction, WEKA file format and essentials.
k-Nearest Neighbour classifier, decision boundaries, decision trees, model complexity, Donoho’s paper on 50 years of data science.
Cross-validation evaluation procedure, classification versus regression, (univariate) regression, multivariate regression, model complexity, underfitting, overfitting, determining model complexity.
Train, test, and validation sets, feature dimensionality, curse of dimensionality, precision, recall, F1 score, principal component analysis (PCA).
Parameter optimisation in decision trees (J48), comparing classifiers, model selection, evaluation with the t-test, WEKA’s Experimenter, paper on significance tests in data science.
Precision and recall (reprise), PCA (reprise), limitations of PCA, random decision forests (RDFs), naive Bayes, support vector machines (SVMs), kernels, and using RDFs, naive Bayes, and SVMs in WEKA.
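
Precision, recall, and the F1 score appear twice in the topics above. As a quick refresher, the following is a minimal Python sketch of how they follow from the counts of true positives, false positives, and false negatives. It is not taken from the slides or from WEKA. The function name `precision_recall_f1` and the example counts are hypothetical, chosen only for illustration, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    A ratio with a zero denominator is reported as 0.0 (assumed convention).
    """
    # Precision: fraction of predicted positives that are actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With these counts the classifier is precise but misses half of the positives, and the F1 score lies between the two values, closer to the lower one.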