Hong Kong Baptist University
Faculty of Science
Department of Mathematics

Title (Units): MATH 3626 Computational Statistics for Data Science (3,3,0)

Course Aims: The course introduces data science from a practice-oriented viewpoint. Students will learn statistical concepts, data analytical methods, and their implementation in the R programming language to deal with various facets of data science practice, including data visualization, exploratory data analysis, descriptive modeling and predictive modeling. To make the learning contextual, real datasets from a variety of disciplines will be used.

Prerequisite: MATH2005 Calculus, Probability, and Statistics for Computer Science or MATH2006 Calculus, Probability, and Statistics for Science or MATH2206 Probability and Statistics or COMP2865 Fundamental of Data Analysis and Management

Prepared by: S. N. Chiu, J. Fan, H. Peng

Course Intended Learning Outcomes (CILOs):

Upon successful completion of this course, students should be able to:

No. Course Intended Learning Outcomes (CILOs)
1. Identify the applications and limitations of various data analytical methods.
2. Evaluate practical situations in different aspects and select appropriate data analytical methods.
3. Use the R programming language to analyze data.
4. Interpret the results obtained from the R programming language.
5. Formulate solutions for real-life problems of interest to them.

Teaching & Learning Activities (TLAs)

TLAs will include the following (with the CILOs each addresses):

Lecture (CILOs 1, 2, 3, 4, 5)
The instructor will present simple real-life problems to motivate the statistical concepts and data analytical methods, followed by discussions of their implementation in the R programming language. Students will then be required to consolidate the knowledge through further reading and discussion within lectures.

In-class activities and assignments (CILOs 1, 2, 3, 4, 5)
The instructor will give problems related to data science in simple real-life situations in lectures and assignments. In lectures, the instructor will demonstrate how to formulate and solve the problems and discuss why a particular data analytical method is used. Students are required to contribute to the discussion by expressing their own opinions about which methods can be applied and by working out how the R programming language is useful in solving the problems.

Assessment:

1. Project (35%; CILOs 1, 2, 3, 4, 5)
Students work individually (or in small groups) on real-life case studies, applying computational statistics methods to solve data-related problems.

2. Homework (25%; CILOs 1, 2, 3, 4, 5)
Students will work individually on short questions to demonstrate their understanding of the theory and practice components of the subject. The homework will be given online, and there will be five online homework assignments. These allow the instructor to keep track of how well students master the knowledge covered at different stages of the course.

3. Final Examination (40%; CILOs 1, 2, 3, 4, 5)
The final exam is designed to assess how well students have learned the concepts and knowledge of the entire course. Students will be required to solve problems by explaining concepts and theories relating to computational statistics and data science. A large part of the exam will be based on the course material, to check whether students can apply what they have learned in class. The rest will assess students' ability to adapt what they have learned to new scenarios.

Course Intended Learning Outcomes and Weighting:

Content (CILO No.)
I. Introduction to data science (CILO 1)
II. Introduction to the programming language R (CILOs 3, 4)
III. Data visualization using R (CILOs 1, 2, 3, 4, 5)
IV. Computational statistics using R (CILOs 1, 2, 3, 4, 5)
V. Data analytical methods using R (CILOs 1, 2, 3, 4, 5)

References
1. Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer.
2. Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
3. EMC Education Services (2015). Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. John Wiley & Sons, Inc.
4. Robert Kabacoff (2015). R in Action: Data Analysis and Graphics with R. Second Edition. Manning Publications.

Course Contents in Outline:

Topics

I. Introduction to data science
A. Basic concepts in data science
B. Big data in real life
C. Examples of big data analytics

II. Introduction to the programming language R
A. R as a language and an environment for statistical computing and graphics
B. Data handling and storage
C. Graphics using R packages

III. Data visualization using R
B. Exploratory data analysis
C. Big data visualization
D. Infographics

IV. Computational statistics using R
A. Simulation
B. Resampling methods
C. Nonparametric methods
D. Bayesian inference
E. The EM algorithms
F. Large-scale inference

V. Data analytical methods using R
A. High-dimensional clustering and heatmaps
B. Data complexity reduction
C. Linear model selection and regularization
D. Moving beyond linearity
E. Tree-based methods
F. Predictive analytics in big data

Updated on: 2024-05-31 01:56:49

Approved by Faculty Board meeting on 18 May 2022.
Approved by Faculty Board meeting on 31 October 2023.