Hong Kong Baptist University
Faculty of Science
Department of Mathematics
  
Title (Units): MATH 3626 Computational Statistics for Data Science (3,3,0)
  
Course Aims: The course introduces data science from a practice-oriented viewpoint. Students will learn statistical concepts and data analytical methods, and how to implement them in the R programming language, in order to deal with various facets of data science practice, including data visualization, exploratory data analysis, descriptive modeling and predictive modeling. To put the learning in context, real datasets from a variety of disciplines will be used.
  
Prerequisite: MATH2005 Calculus, Probability, and Statistics for Computer Science or MATH2006 Calculus, Probability, and Statistics for Science or MATH2206 Probability and Statistics or COMP2865 Fundamental of Data Analysis and Management
  
Prepared by: S. N. Chiu, J. Fan, H. Peng

Course Intended Learning Outcomes (CILOs):

Upon successful completion of this course, students should be able to:

No.  Course Intended Learning Outcomes (CILOs)
  1. Identify the applications and limitations of various data analytical methods.
  2. Evaluate practical situations from different aspects and select appropriate data analytical methods.
  3. Use the R programming language to analyze data.
  4. Interpret the results produced by the R programming language.
  5. Formulate solutions to real-life problems of interest to them.

Teaching & Learning Activities (TLAs)

TLAs will include the following (CILOs addressed in parentheses):

Lecture (CILOs 1,2,3,4,5)
The instructor will present simple real-life problems to motivate the statistical concepts and data analytical methods, followed by a discussion of their implementation in the R programming language. Students will then be required to consolidate their knowledge through further reading and discussion during lectures.

In-class activities and assignments (CILOs 1,2,3,4,5)
The instructor will set data science problems drawn from simple real-life situations in lectures and assignments. In lectures, the instructor will demonstrate how to formulate and solve the problems and discuss why a particular data analytical method is used. Students are required to contribute to the discussion by giving their own opinions on which methods can be applied and by working out how the R programming language helps to solve the problems.

Assessment:

No.  Assessment Method (Weighting; CILOs Addressed) and Remarks
  1. Project (35%; CILOs 1,2,3,4,5)
     Students work individually (or in small groups) on real-life case studies, applying computational statistics methods to solve data-related problems.
  2. Homework (25%; CILOs 1,2,3,4,5)
     Students work individually on short questions that demonstrate their understanding of the theoretical and practical components of the subject. There will be 5 homework assignments, all given online. They allow the instructor to keep track of how well students have mastered the material covered at different stages of the course.
  3. Final Examination (40%; CILOs 1,2,3,4,5)
     The final exam is designed to assess how well students have learned the concepts and knowledge of the entire course. Students will be required to solve problems by explaining concepts and theories relating to computational statistics and data science. A large part of the exam will be based on the course material, to check whether students can apply what they have learned in class. The rest will assess students' ability to adapt what they have learned to new scenarios.

Course Intended Learning Outcomes and Weighting:

Content (CILO No.)
I. Introduction to data science (CILO 1)
II. Introduction to the programming language R (CILOs 3,4)
III. Data visualization using R (CILOs 1,2,3,4,5)
IV. Computational statistics using R (CILOs 1,2,3,4,5)
V. Data analytical methods using R (CILOs 1,2,3,4,5)

 

References
  1. Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer.
  2. Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  3. EMC Education Services (2015). Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. John Wiley & Sons, Inc.
  4. Robert Kabacoff (2015). R in Action: Data Analysis and Graphics with R. Second Edition. Manning Publications.

 

Course Contents in Outline:

Topics 
    
I. Introduction to data science
  A. Basic concepts in data science
  B. Big data in real life
  C. Examples of big data analytics

II. Introduction to the programming language R
  A. R as a language and an environment for statistical computing and graphics
  B. Data handling and storage
  C. Graphics using R packages

III. Data visualization using R
  A. Data collection and data manipulation
  B. Exploratory data analysis
  C. Big data visualization
  D. Infographics

IV. Computational statistics using R
  A. Simulation
  B. Resampling methods
  C. Nonparametric methods
  D. Bayesian inference
  E. The EM algorithms
  F. Large-scale inference

V. Data analytical methods using R
  A. High-dimensional clustering and heatmaps
  B. Data complexity reduction
  C. Linear model selection and regularization
  D. Moving beyond linearity
  E. Tree-based methods
  F. Predictive analytics in big data

Updated on: 2024-05-31 01:56:49

Approved by Faculty Board meeting on 18 May 2022.
Approved by Faculty Board meeting on 31 October 2023.