Data Mining and Visualization STAN 45 (7.5p)
Course description:
This course on data mining and visualization covers methodology and applications in this
field.
By introducing principal ideas in statistical learning, the course will help students to understand methods in data
mining and computational aspects of algorithm implementation.
The course also explores the question of what visualization is, and why one should use visualizations for quantitative data.
Students are required to work on projects to practice applying existing software and, to a certain extent, developing
their own algorithms.
Classes will be provided in three forms: lecture, project discussion, and special topic survey.
Project discussion will enable students to share and compare ideas with each other and to receive specific guidance
from the instructors.
By surveying special topics, students will be exposed to literature and become more aware of recent research.
In particular, basics for classification and clustering, e.g., linear classification methods, prototype methods, decision
trees, and hidden Markov models, will be introduced.
Five course lab sessions will be included with emphasis on understanding and using existing learning algorithms.
Lab sessions will focus on providing practice using real-world data.
A general introduction and overview of the subject is given in the following series of slides, while the official course description is given here.
Learning outcomes:
Textbooks:
Supporting computer package:
All projects and exercises can be completed using
Lecturer:
Week # | Day | Time and Location | Lecture/Lab/Discussion # | Material | Handouts/Reading material |
---|---|---|---|---|---|
36 | Monday, 02/09/2019 | 08:15-10:00, Alfa1:1010 | Lecture 1: Overview, Introduction | Overview, Introduction | Data sets to download, Illustrative session in R - Lecture 1, Relevant sections in Textbook 1: 2.1-3, 5.2-3 and Textbook 2: 1, 3.2, 7.11, 8.2.1, Cumulative R-code for Week One -- Lectures |
36 | Monday, 02/09/2019 | 10:15-12:00, Alfa1:1010 | Lecture 2: Bootstrap | Bootstrap | |
37 | Monday, 09/09/2019 | 10:15-12:00, Alfa1:1010 | Discussion 1: Overview of computational tools -- Monte Carlo method and bootstrap | Assignment 1 | Relevant sections in Textbook 1: Sections 7.1-5, Section 7.7; in Textbook 2: Sections 5.1-5, Section 9.1; Assignment 1, Assignment 1 - Solutions |
37 | Tuesday, 10/09/2019 | 10:15-12:00, Alfa1:1010 | Lecture 3: Splines | Splines | |
37 | Tuesday, 10/09/2019 | 14:15-16:00, Alfa1:1010 | Lecture 4: Generalized Additive Models | Generalized Additive Models | |
38 | Monday, 16/09/2019 | 10:15-12:00, Alfa1:1010 | Discussion 2: Splines | Assignment 2 | Relevant sections: in Textbook 1, Sections 4.1-5, 7.7; in Textbook 2, Sections 4.1-3, 9.1; Computations in R for Assignment 2, Assignment 2 - Solutions, Computer Lab 1 - R-code Help file |
38 | Monday, 16/09/2019 | 14:15-16:00, Alfa1:0043 | Lab 1: Work on Computer Project 1 | Computer Lab 1 | |
38 | Tuesday, 17/09/2019 | 10:15-12:00, Alfa1:1010 | Lecture 5: Classification | Classification | |
39 | Monday, 23/09/2019 | 14:15-16:00, Alfa1:1010 | Lecture 6: Regression Trees | Regression Trees | Computer Project 2 - Help file, Relevant sections: in Textbook 1, Section 7.7; in Textbook 2, Section 9.2, Assignment 3 - Solutions |
39 | Tuesday, 24/09/2019 | 10:15-12:00, Alfa1:1010 | Discussion 3: Generalized Additive Models and Logistic Regression | Assignment 3 | |
39 | Tuesday, 24/09/2019 | 14:15-16:00, Alfa 0043 | Lab 2: Work on Computer Project 2 | Computer Lab 2 | |
40 | Monday, 30/09/2019 | 10:15-12:00, Alfa1:1010 | Lecture 7: Boosting | Boosting | Textbook 1: Chapter 4, Sections 1-5; Chapter 8, Sections 1-2; Textbook 2: Chapter 4, Sections 1-4; Chapter 9, Section 2; R-Code for Project 3, Relevant sections in the textbook: 8.7, 9.2, 10.1, 10.2, 10.7-9, 10.13, 14.1, 14.3, 14.5 |
40 | Monday, 30/09/2019 | 14:15-16:00, Alfa1:1010 | Lecture 8: Random Forests | Random Forests | |
40 | Tuesday, 01/10/2019 | 10:15-12:00, Alfa1:1010 | Discussion 4: Classification | Assignment 4 | |
40 | Tuesday, 01/10/2019 | 14:15-16:00, Alfa 0043 | Lab 3: Work on Computer Project 3 | Computer Lab 3 | |
41 | Monday, 07/10/2019 | 10:15-12:00, Alfa1:1010 | Lecture 9: Unsupervised Learning | Unsupervised Learning | Relevant sections in the textbook: 11.1-6, 12.1-3, R-code for Project 4, Assignment 4 - Solutions |
41 | Monday, 07/10/2019 | 14:15-16:00, Alfa1:1010 | Lecture 10: Neural Networks | Neural Networks | |
41 | Tuesday, 08/10/2019 | 10:15-12:00, Alfa1:1010 | Discussion 5: Regression Trees, Boosting and Random Forests | Assignment 5 | |
41 | Tuesday, 08/10/2019 | 14:15-16:00, Alfa 0043 | Lab 4: Work on Computer Project 4 | Computer Project 4 | |
42 | Monday, 14/10/2019 | 10:15-12:00, Alfa1:1010 | Lecture 11: Support Vector Machines | Support Vector Machines | Assignment 5 - Solutions, R-code for Computer Project 5, Instruction for oral presentations, Data source for final presentations, Source #1 of Data for the Exam Project -- original data, Source #2 of Data for the Exam Project -- textbook data, Source #1 of Data for the Exam Project -- textbook data |
42 | Monday, 14/10/2019 | 14:15-16:00, Alfa1:1010 | Lecture 12: Random Graph Methods | Random Graph Methods | |
42 | Tuesday, 15/10/2019 | 10:15-12:00, Alfa 0043 | Lab 5: Work on Computer Project 5 | Computer Project 5 | |
42 | Tuesday, 15/10/2019 | 14:15-16:00, Alfa1:1010 | Lecture 13: Review of the material, Evaluation | | |
43 | Monday, 21/10/2019 | 8:00-10:00, Alfa1:1048 | Exam Part I | | |
43 | Monday, 21/10/2019 | 14:00-16:00, Alfa1:1010 | Exam Part II | | |
43 | Tuesday, 22/10/2019 | 8:00-10:00, Alfa1:1010 | Exam Part III | Exam Instructions | |
Comprehensive Exam: The work throughout the course will be combined into one comprehensive examination paper. It will comprise three parts:
The final grades will be assigned according to the following table:
Percentage | Grade |
---|---|
0 - 49 | F |
50 - 54 | E |
55 - 64 | D |
65 - 74 | C |
75 - 84 | B |
85 - 100 | A |
Useful links: