Statistics Workshop on
Data Dynamic Modeling, Computing and Inference

14 June 2017
Hong Kong Baptist University

Time: 9:30am - 5:00pm
Venue: FSC1217, Fong Shu Chuen Building, Ho Sin Hang Campus

Feng, Yang, Columbia University, USA
Feng, Zhenghui, Xiamen University, China
Tong, Xin, University of Southern California, USA
Wang, Cheng, Shanghai Jiao Tong University, China
Wang, Weining, City, University of London, UK
Zhang, Kun, Carnegie Mellon University, USA
Zhao, Jingxin, Hong Kong Baptist University, Hong Kong
Zhou, Min, Hong Kong Baptist University, Hong Kong


Feng, Yang

Abstract: One of the most fundamental problems in network study is community detection. The stochastic block model (SBM) is a widely used model for network data, for which various estimation methods have been developed and the corresponding community detection consistency results established. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. In this work, we extend the SBM to incorporate covariate information and provide theoretical support for the estimators.

Wang, Weining

Abstract: The complex tail dependency structure in a dynamic network with a large number of nodes is an important object to study. Here we propose a network quantile autoregression (NQAR) model, which characterizes the dynamic quantile behavior. Our NQAR model consists of a system of equations, in which we relate a response to its connected nodes and node-specific characteristics in a quantile autoregression process. We introduce the estimation of NQAR and the asymptotic properties of the estimators under assumptions on the adjacency matrix. Moreover, innovative tail-event driven impulse functions are defined. Finally, we demonstrate the usage of our model by investigating financial contagion in the Chinese stock market, accounting for shared ownership of companies.


Wang, Cheng

Abstract: In the context of sufficient dimension reduction (SDR), sliced inverse regression (SIR) is the first and perhaps one of the most popular tools to reduce the covariate dimension for high dimensional nonlinear regressions. Although the performance of SIR is insensitive to the number of slices when the covariate is low or moderate dimensional, our empirical studies indicate that the performance of SIR relies heavily upon the number of slices when the covariate is high or ultrahigh dimensional. How to select the optimal number of slices for SIR is a longstanding problem in the SDR literature, and a crucial one for SIR to be effective in high and ultrahigh dimensional regressions. In this paper, we work with an improved version of SIR, the cumulative slicing estimation (CUME) method, which does not require selecting the optimal number of slices. We provide a general framework to analyze the phase transition phenomenon for the CUME method. We show that, without any sparsity assumption, CUME is consistent if and only if $p/n\to 0$, where $p$ stands for the covariate dimension and $n$ stands for the sample size. If we make certain sparsity assumptions, then the thresholding estimate for the CUME method is consistent as long as $\log(p)/n\to 0$. We demonstrate the superior performance of our proposals through extensive numerical experiments.

Zhou, Min

Abstract: A simple sure screening procedure is proposed to detect significant interactions between predictor variables and the response variable in high or ultra-high dimensional generalized linear regression models. The sure screening method is a simple but powerful tool to reduce ultra-high dimensional models to relatively large models. We investigate the sure screening properties of our proposed method through theoretical analysis and numerical studies. Furthermore, we suggest an efficient boosting algorithm so that we can fully screen all interactions when the dimension of the data is relatively large. The simulation results and real data analysis demonstrate that the proposed procedure performs no worse than other competing screening procedures.


Tong, Xin

Abstract: In genomic research, statistical measures of association serve as important tools for screening pairwise variables (e.g. genes) that exhibit specific relationships among thousands of variable pairs. Examples of classic association measures include Pearson correlation, Spearman correlation, and maximal correlation, which can identify linear relationships, monotone relationships, and functional relationships, respectively, in increasing order of generalization. While these measures have demonstrated great power in screening pairwise variables in many research settings, there remain some sparse non-functional relationships (i.e., a mixture of a small number of functional relationships) that may also be of interest in some settings. In this talk, I will present ongoing work on the development of a new statistical measure for identifying certain types of sparse non-functional relationships between pairwise variables. The new measure is based on a generalized definition of conditional expectation and can be regarded as an extension of the classic coefficient of determination. We propose an estimator of this new measure under a combination of local regression and clustering frameworks. Consistency of this estimator is established. Simulation and real data studies demonstrate the effectiveness of this new measure in identifying different types of sparse non-functional relationships.

Zhang, Kun

Abstract: Can we find the causal direction between two variables? How can we make optimal predictions in the presence of distribution shift? We are often faced with such causal modeling or prediction problems. Recently, with the rapid accumulation of huge volumes of data, both causal discovery, i.e., learning causal information from purely observational data, and machine learning are seeing exciting opportunities as well as great challenges. This talk will focus on recent advances in causal discovery and how causal information facilitates understanding and solving certain problems of learning from heterogeneous data. In particular, I will talk about conditional independence-based and functional causal model-based approaches to causal discovery, focusing on their underlying assumptions, algorithms, and applications. Practical issues in causal discovery, including selection bias and nonstationarity or heterogeneity of the data, will also be addressed. Finally, I will discuss why and how underlying causal knowledge helps in learning from heterogeneous data when the i.i.d. assumption is dropped, with transfer learning as a particular example.


Feng, Zhenghui

Abstract: For multivariate nonparametric regression models, existing variable selection methods with penalization require high-dimensional nonparametric approximations in their objective functions. When the dimension is high, none of the penalization methods in the literature is readily available. Also, ranking and screening approaches cannot achieve selection consistency when iterative algorithms cannot be used due to inefficient nonparametric approximation. In this paper, a novel and easily implemented approach is proposed to make existing methods feasible for selection without the need for nonparametric approximation. Selection consistency can be achieved. As an application to additive regression models, we then suggest a two-stage procedure that separates the selection and estimation steps. An estimator that adapts to the smoothness of the underlying components can be constructed such that consistency can be attained even at the parametric rate if the underlying model is indeed parametric. Simulations are carried out to examine the performance of our method, and a real data example is analyzed for illustration.

Zhao, Jingxin

Abstract: As a more flexible version of the classical linear model, the varying coefficient model is widely applied in many areas. It is well suited to settings where the regression coefficients do not stay constant. In this paper, we introduce a local average method to estimate the functional coefficients in the varying coefficient model. Moreover, we extend this local average method to the semi-varying coefficient model, which consists of a linear part and a varying coefficient part. The estimation procedures are developed, and their statistical properties are investigated. Extensive simulations are conducted to study the performance of our method.