Hong Kong Baptist University | |

Faculty of Science | |

Department of Mathematics | |

Title (Units): | MATH 3836 Data Mining (3,3,0) |

Course Aims: | This course introduces the concept of data mining and data mining techniques (including advanced statistical and machine learning techniques) for solving problems such as data cleaning, clustering, classification, relation detection, and forecasting. It also introduces students to modern data mining applications such as recommendation systems and natural language mining. |

Anti-requisite: | COMP4027 Data Mining and Knowledge Discovery |

Prepared by: | Y.D. XU |

**Course Intended Learning Outcomes (CILOs):**

Upon successful completion of this course, students should be able to:

No. | Course Intended Learning Outcomes (CILOs) |
---|---|

1 | Explain the fundamental principles of data mining |

2 | Demonstrate a working knowledge of data mining |

3 | Interpret information from data mining |

4 | Apply data mining skills and techniques |

5 | Report the interpretation of findings in a scientific and concise manner |

6 | Solve problems logically, analytically, critically and creatively |

**Teaching & Learning Activities (TLAs)**

CILO | TLAs will include the following: |
---|---|

1,2,3,4,5,6 | Lecture: Lectures with rigorous mathematical discussions and concrete examples. The lecturer will regularly ask questions in class to make sure that the majority of students are following the teaching materials. The lectures will also include Python programming examples to illustrate some of the concepts. |

1,2,3,4,5,6 | In-class activity: A problem-based approach will be used, with examples from real-life data mining problems in lectures to stimulate the learning of concepts, followed by software demos to consolidate the knowledge gained. |

1,2,3,4,5,6 | Student-Oriented Case Study: A real-life case study of a data mining application will be conducted using knowledge gained in class as well as findings from the students' own research. |

**Assessment:**

No. | Assessment Methods | Weighting | CILOs Addressed | Remarks |
---|---|---|---|---|

1 | Tests | 40% | 1,2,3,4,5,6 | There will be 2 tests. Each is designed to assess how well students have learned the concepts and knowledge of the completed part of the course. Students will be required to solve problems by explaining concepts and theories relating to data mining. A large part of each test will be based on the course material, to check whether students can apply what they have learned in class. The rest will assess students' ability to adapt what they have learned to new scenarios. |

2 | Project | 35% | 1,2,3,4,5,6 | Students are to work individually (or in small groups) to conduct real-life case studies that apply data mining techniques. |

3 | Homework | 25% | 1,2,3,4,5,6 | Students will work individually on short questions to showcase their understanding of the theoretical and practical components of the subject. The homework will be given online, and there will be 5 online homework assignments. They allow the instructor to keep track of how well students have mastered the knowledge covered at different stages of the course. |

**Course Intended Learning Outcomes and Weighting:**

Content | CILO No. |
---|---|

I. Introduction | 1,2,3,4 |

II. Mining Association Rules In Large Databases | 1,2,3,4,5,6 |

III. Dimension Reduction techniques | 1,2,3,4,5,6 |

IV. Supervised Learning | 1,2,3,4,5,6 |

V. Unsupervised Learning | 1,2,3,4,5,6 |

VI. Recommendation Systems and mining natural language | 2,3,4,5,6 |

**Textbook**

- Lecture notes prepared by the instructor

- Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Introduction to Data Mining (2nd Edition), Pearson, 2019.
- Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr., Data Mining for Business Analytics: Concepts, Techniques, and Applications, Wiley, 2017.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
- Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (3rd Edition), Morgan Kaufmann, 2011.
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

**Course Contents in Outline:**

Topics | |||
---|---|---|---|

I | Introduction | ||

A | Knowledge Discovery in Databases (KDD) | ||

B | Data and Data Visualization | ||

C | Data Warehouse and Cloud Storage | ||

D | Data Cleaning and Preprocessing | ||

E | Data Mining Principles | ||

II | Mining Association Rules In Large Databases | ||

A | Association Rule Mining | ||

B | Mining Multidimensional Association Rules From Relational Databases | ||

III | Dimension Reduction techniques | ||

A | Principal Components Analysis | ||

B | Gaussian Process Latent Variable Model | ||

C | t-distributed Stochastic Neighbour Embedding | ||

IV | Supervised Learning | ||

A | Neural Networks | ||

B | Linear and Partial Regression | ||

C | Metric Learning | ||

V | Unsupervised Learning | ||

A | K-Means Clustering | ||

B | Gaussian Mixture Model | ||

C | Latent Dirichlet Allocation | ||

VI | Recommendation Systems and mining natural language | ||

A | Collaborative Filtering | ||

B | Non-negative Matrix Factorization | ||

C | Other advanced recommender algorithms | ||

D | Word and Sentence Embedding | ||

Updated on: 2024-05-31 02:02:31

Approved by Faculty Board meeting on 18 May 2022.

Approved by Faculty Board meeting on 31 October 2023.