Data Mining and Visualization STAN 45 (7.5p)

Course description:
This course on data mining and visualization covers methodology and applications in this field.
By introducing the principal ideas of statistical learning, the course helps students understand methods in data
mining and the computational aspects of implementing algorithms.
The course also explores the question of what visualization is, and why one should use visualizations for quantitative data.
Students are required to work on projects that practice applying existing software and, to a certain extent, developing
their own algorithms.
Classes are given in three forms: lectures, project discussions, and special topic surveys.
Project discussion will enable students to share and compare ideas with each other and to receive specific guidance
from the instructors.
By surveying special topics, students will be exposed to the literature and become more aware of recent research.
In particular, the basics of classification and clustering, e.g., linear classification methods, prototype methods, decision
trees, and hidden Markov models, will be introduced.
Five computer lab sessions are included, with emphasis on understanding and using existing learning algorithms.
The lab sessions focus on practice with real-world data.
A general introduction and overview of the subject is given in the following series of slides, while the official course description is given here.
Learning outcomes:
Textbooks:
Supporting computer package:
All projects and exercises can be completed using
Lecturer:
Week #  Day  Time and Location  Lecture/Lab/Discussion #  Material  Handouts/Reading material 

36  Monday, 02/09/2019  08:15-10:00, Alfa1:1010  Lecture 1: Overview, Introduction  Overview, Introduction  Data sets to download, Illustrative session in R  Lecture 1, Relevant sections in Textbook 1: 2.1-3, 5.2-3 and Textbook 2: 1, 3.2, 7.11, 8.2.1, Cumulative R code for Week One Lectures
10:15-12:00, Alfa1:1010  Lecture 2: Bootstrap  Bootstrap
37  Monday, 09/09/2019  10:15-12:00, Alfa1:1010  Discussion 1: Overview of computational tools  Monte Carlo method and bootstrap  Assignment 1  Relevant sections in Textbook 1: Sections 7.1-5, Section 7.7; in Textbook 2: Sections 5.1-5, Section 9.1  Assignment 1, Assignment 1 Solutions
Tuesday, 10/09/2019  10:15-12:00, Alfa1:1010  Lecture 3: Splines  Splines
14:15-16:00, Alfa1:1010  Lecture 4: Generalized Additive Models  Generalized Additive Models
38  Monday, 16/09/2019  10:15-12:00, Alfa1:1010  Discussion 2: Splines  Assignment 2  Relevant sections: in Textbook 1, Sections 4.1-5, 7.7; in Textbook 2, Sections 4.1-3, 9.1  Computations in R for Assignment 2, Assignment 2 Solutions, Computer Lab 1 R code, Help file
14:15-16:00, Alfa1:0043  Lab 1: Work on Computer Project 1  Computer Lab 1
Tuesday, 17/09/2019  10:15-12:00, Alfa1:1010  Lecture 5: Classification  Classification
39  Monday, 23/09/2019  14:15-16:00, Alfa1:1010  Lecture 6: Regression Trees  Regression Trees  Computer Project 2 Help file, Relevant sections: in Textbook 1, Section 7.7; in Textbook 2, Section 9.2, Assignment 3 Solutions
Tuesday, 24/09/2019  10:15-12:00, Alfa1:1010  Discussion 3: Generalized Additive Models and Logistic Regression  Assignment 3
14:15-16:00, Alfa 0043  Lab 2: Work on Computer Project 2  Computer Lab 2
40  Monday, 30/09/2019  10:15-12:00, Alfa1:1010  Lecture 7: Boosting  Boosting  Textbook 1: Chapter 4, Sections 1-5; Chapter 8, Sections 1-2; Textbook 2: Chapter 4, Sections 1-4; Chapter 9, Section 2; R code for Project 3, Relevant sections in the textbook: 8.7, 9.2, 10.1, 10.2, 10.7-9, 10.13, 14.1, 14.3, 14.5
14:15-16:00, Alfa1:1010  Lecture 8: Random Forests  Random Forests
Tuesday, 01/10/2019  10:15-12:00, Alfa1:1010  Discussion 4: Classification  Assignment 4
14:15-16:00, Alfa 0043  Lab 3: Work on Computer Project 3  Computer Lab 3
41  Monday, 07/10/2019  10:15-12:00, Alfa1:1010  Lecture 9: Unsupervised learning  Unsupervised Learning  Relevant sections in the textbook: 11.1-6, 12.1-3, R code for Project 4, Assignment 4 Solutions
14:15-16:00, Alfa1:1010  Lecture 10: Neural Networks  Neural Networks
Tuesday, 08/10/2019  10:15-12:00, Alfa1:1010  Discussion 5: Regression Trees, Boosting and Random Forests  Assignment 5
14:15-16:00, Alfa 0043  Lab 4: Work on Computer Project 4  Computer Project 4
42  Monday, 14/10/2019  10:15-12:00, Alfa1:1010  Lecture 11: Support Vector Machines  Support Vector Machines  Assignment 5 Solutions, R code for Computer Project 5, Instructions for oral presentations, Data source for final presentations, Source #1 of Data for the Exam Project: original data, Source #2 of Data for the Exam Project: textbook data, Source #1 of Data for the Exam Project: textbook data
14:15-16:00, Alfa1:1010  Lecture 12: Random Graph Methods  Random Graph Methods
Tuesday, 15/10/2019  10:15-12:00, Alfa 0043  Lab 5: Work on Computer Project 5  Computer Project 5
14:15-16:00, Alfa1:1010  Lecture 13: Review of the material, Evaluation
43  Monday, 21/10/2019  08:00-10:00, Alfa1:1048  Exam Part I
14:00-16:00, Alfa1:1010  Exam Part II
Tuesday, 22/10/2019  08:00-10:00, Alfa1:1010  Exam Part III  Exam Instructions
Comprehensive Exam: The work done throughout the course will be combined into one comprehensive examination paper. It will comprise three parts.
The final grades will be assigned according to the following table:
Percentage  Grade
0-49        F
50-54       E
55-64       D
65-74       C
75-84       B
85-100      A
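The grading table is a set of lower thresholds on the percentage score. The Python sketch below encodes it. It is not part of the course material. The function name `letter_grade` is hypothetical, and the treatment of fractional scores that fall between table rows (for example, 49.5) is an assumption: each grade is taken to start at its lower bound.

```python
# Hypothetical helper (not from the course material): maps a percentage in
# [0, 100] to a letter grade using the lower bounds of the grading table.
# Assumption: fractional scores fall to the grade whose lower bound they reach,
# so 49.5 counts as F and 84.9 counts as B.

GRADE_THRESHOLDS = [
    (85, "A"),
    (75, "B"),
    (65, "C"),
    (55, "D"),
    (50, "E"),
    (0, "F"),
]


def letter_grade(percentage: float) -> str:
    """Return the letter grade for a percentage between 0 and 100."""
    if not 0 <= percentage <= 100:
        raise ValueError(f"percentage must lie in [0, 100], got {percentage}")
    # Thresholds are checked from highest to lowest; the first match wins.
    for lower_bound, grade in GRADE_THRESHOLDS:
        if percentage >= lower_bound:
            return grade
    # Unreachable: the last threshold is 0 and negative input is rejected above.
    raise AssertionError("no grade matched")


if __name__ == "__main__":
    for score in (42, 50, 64, 65, 84.9, 85, 100):
        print(score, letter_grade(score))
```

Listing the thresholds from highest to lowest keeps the table in one place and avoids writing explicit upper bounds for each row.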
Useful links: