Hong Kong Baptist University
Faculty of Science
Department of Mathematics

Title (Units): MATH 3836 Data Mining (3,3,0)

Course Aims: This course introduces the concept of data mining and data mining techniques (including advanced statistical and machine learning techniques) for solving problems such as data cleaning, clustering, classification, relation detection, and forecasting. It also introduces students to modern data mining applications such as recommendation systems and mining natural language.

Anti-requisite: COMP4027 Data Mining and Knowledge Discovery

Prepared by: Y.D. XU

Course Intended Learning Outcomes (CILOs):

Upon successful completion of this course, students should be able to:

No. | Course Intended Learning Outcomes (CILOs)
1 | Explain the fundamental principles of data mining
2 | Identify a working knowledge of data mining
3 | Interpret information from data mining
4 | Apply data mining skills and techniques
5 | Report the interpretation of findings in a scientific and concise manner
6 | Solve problems logically, analytically, critically and creatively

Teaching & Learning Activities (TLAs)

CILO | TLAs will include the following:
1,2,3,4,5,6 | Lecture: Lectures with rigorous mathematical discussions and concrete examples. The lecturer will regularly ask questions in class to make sure that the majority of students are following the teaching materials. The lectures will also include Python programming examples to illustrate some of the concepts.
1,2,3,4,5,6 | In-class activity: A problem-based approach will be used, with examples from real-life data mining problems presented in lectures to stimulate the learning of concepts, followed by software demos to consolidate the knowledge gained.
1,2,3,4,5,6 | Student-Oriented Case Study: A real-life case study of a data mining application will be conducted, using knowledge gained in class as well as findings from the students' own research.

Assessment:

No. | Assessment | Weighting | CILOs addressed | Description
1 | Tests | 40% | 1,2,3,4,5,6 | There will be 2 tests. Each is designed to assess how well students have learned the concepts and knowledge of the completed part of the course. Students will be required to solve problems by explaining concepts and theories relating to data mining. A large part of each test will be based on the course material, to check whether students can apply what they have learned in class. The rest will assess students' ability to adapt what they have learned to new scenarios.
2 | Project | 35% | 1,2,3,4,5,6 | Students are to work individually (or in small groups) to conduct real-life case studies applying data mining techniques.
3 | Homework | 25% | 1,2,3,4,5,6 | Students will work individually on short questions to showcase their understanding of the theory and practice components of the subject. There will be 5 homework assignments, all given online. They allow the instructor to keep track of how well students have mastered the knowledge covered at different stages of the course.

Course Intended Learning Outcomes and Weighting:

Content | CILO No.
I. Introduction | 1,2,3,4
II. Mining Association Rules In Large Databases | 1,2,3,4,5,6
III. Dimension Reduction Techniques | 1,2,3,4,5,6
IV. Supervised Learning | 1,2,3,4,5,6
V. Unsupervised Learning | 1,2,3,4,5,6
VI. Recommendation Systems and Mining Natural Language | 2,3,4,5,6

Textbook

1. Lecture notes prepared by the instructor

References

1. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Introduction to Data Mining (2nd Edition), Pearson, 2019
2. Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr., Data Mining for Business Analytics: Concepts, Techniques, and Applications, Wiley, 2017
3. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016
4. Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (3rd Edition), Morgan Kaufmann, 2011
5. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

Course Contents in Outline:

Topics

I. Introduction
A. Knowledge Discovery in Databases (KDD)
B. Data and Data Visualization
C. Data Warehouse and Cloud Storage
D. Data Cleaning and Preprocessing
E. Data Mining Principles

II. Mining Association Rules In Large Databases
A. Association Rule Mining
B. Mining Multidimensional Association Rules From Relational Databases

III. Dimension Reduction Techniques
A. Principal Components Analysis
B. Gaussian Process Latent Variable Model
C. t-distributed Stochastic Neighbour Embedding

IV. Supervised Learning
A. Neural Networks
B. Linear and Partial Regression
C. Metric Learning

V. Unsupervised Learning
A. K-Means Clustering
B. Gaussian Mixture Model
C. Latent Dirichlet Allocation

VI. Recommendation Systems and Mining Natural Language
A. Collaborative Filtering
B. Non-negative Matrix Factorization