Hong Kong Baptist University

Faculty of Science

Department of Mathematics

Title (Units): | MATH 3626 Computational Statistics for Data Science (3,3,0) |

Course Aims: | This course introduces data science from a practice-oriented viewpoint. Students will learn statistical concepts and data analytical methods, and implement them in the R programming language, to deal with various facets of data science practice, including data visualization, exploratory data analysis, descriptive modeling, and predictive modeling. To put the learning in context, real datasets from a variety of disciplines will be used. |

Prerequisite: | MATH2005 Calculus, Probability, and Statistics for Computer Science or MATH2006 Calculus, Probability, and Statistics for Science or MATH2206 Probability and Statistics or COMP2865 Fundamental of Data Analysis and Management |

Prepared by: | S. N. Chiu, J. Fan, H. Peng |

**Course Intended Learning Outcomes (CILOs):**

Upon successful completion of this course, students should be able to:

No. | Course Intended Learning Outcomes (CILOs) |
---|---|
1 | Identify the applications and limitations of various data analytical methods. |
2 | Evaluate practical situations from different aspects and select appropriate data analytical methods. |
3 | Use the R programming language to analyze data. |
4 | Interpret the results produced by R. |
5 | Formulate solutions to real-life problems of interest to them. |

**Teaching & Learning Activities (TLAs)**

CILO | TLAs will include the following: |
---|---|
1,2,3,4,5 | Lecture: The instructor will present simple real-life problems to motivate the statistical concepts and data analytical methods, followed by discussion of their implementation in the R programming language. Students will then be required to consolidate this knowledge through further reading and in-lecture discussion. |
1,2,3,4,5 | In-class activities and assignments: The instructor will give problems related to data science in simple real-life situations, both in lectures and in assignments. In lectures, the instructor will demonstrate how to formulate and solve the problems and discuss why a particular data analytical method is used. Students are required to contribute to the discussion by giving their own opinions on which methods can be applied and by working out how the R programming language is useful in solving the problems. |

**Assessment:**

No. | Assessment Methods | Weighting | CILOs Addressed | Remarks |
---|---|---|---|---|
1 | Project | 35% | 1,2,3,4,5 | Students work individually (or in small groups) on real-life case studies, applying computational statistics methods to solve data-related problems. |
2 | Homework | 25% | 1,2,3,4,5 | Students will work individually on short questions to demonstrate their understanding of the theory and practice components of the subject. There will be 5 homework assignments, all given online. They allow the instructor to track how well students have mastered the material covered at different stages of the course. |
3 | Final Examination | 40% | 1,2,3,4,5 | The final exam is designed to assess how well students have learned the concepts and knowledge of the entire course. Students will be required to solve problems by explaining concepts and theories relating to computational statistics and data science. A large part of the exam will be based directly on the course material, to check whether students can apply what they have learned in class. The rest will assess students' ability to adapt what they have learned to new scenarios. |

**Course Intended Learning Outcomes and Weighting:**

Content | CILO No. |
---|---|
I. Introduction to data science | 1 |
II. Introduction to the programming language R | 3,4 |
III. Data visualization using R | 1,2,3,4,5 |
IV. Computational statistics using R | 1,2,3,4,5 |
V. Data analytical methods using R | 1,2,3,4,5 |

- Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
- EMC Education Services (2015). Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. John Wiley & Sons, Inc.
- Robert Kabacoff (2015). R in Action: Data Analysis and Graphics with R. Second Edition. Manning Publications.

**Course Contents in Outline:**

Topics | |
---|---|
I | Introduction to data science |
A | Basic concepts in data science |
B | Big data in real life |
C | Examples of big data analytics |
II | Introduction to the programming language R |
A | R as a language and an environment for statistical computing and graphics |
B | Data handling and storage |
C | Graphics using R packages |
III | Data visualization using R |
A | Data collection and data manipulation |
B | Exploratory data analysis |
C | Big data visualization |
D | Infographics |
IV | Computational statistics using R |
A | Simulation |
B | Resampling methods |
C | Nonparametric methods |
D | Bayesian inference |
E | The EM algorithms |
F | Large-scale inference |
V | Data analytical methods using R |
A | High-dimensional clustering and heatmaps |
B | Data complexity reduction |
C | Linear model selection and regularization |
D | Moving beyond linearity |
E | Tree-based methods |
F | Predictive analytics in big data |

Updated on: 2024-05-31 01:56:49

Approved by Faculty Board meeting on 18 May 2022.

Approved by Faculty Board meeting on 31 October 2023.