The 2018 HKBU-ISM
Joint Workshop for Mathematical Data Science

9 March 2018
Hong Kong Baptist University

FSC1217, Fong Shu Chuen Building, Ho Sin Hang Campus,
Hong Kong Baptist University
Institute of Statistical Mathematics: Daisuke Murakami, Mirai Tanaka, Stephen Wu
Hong Kong Baptist University: Ming-Yen Cheng, Charles K. Chui, Weiyang Ding, Jun Fan, Lizhi Liao
09:30-09:40 Opening Talk by Professor Michael Ng
09:40-09:50 Opening Talk by Professor Tomoyuki Higuchi
09:50-10:30 Charles K. Chui
Super-resolution approach to mathematics of big data

Abstracts: Big Data have been around since the Big Bang and the beginning of life, but were not explored until recently. In addition, data are continually being generated in rapidly increasing volumes and complexity, everywhere, and by just about everything around us. Understanding Big Data is indeed a most challenging endeavor for the communities of mathematicians and other scientists. The recent exciting advancement of green fluorescent protein in light microscopy, with the capability of viewing well below the hundredth nanometer scale, which allows us to study the molecular activities in human cells, is truly a profound breakthrough in super-resolution imaging. In this talk, we will describe the background and development of this fascinating subject and present two current mathematical approaches to super-resolution and beyond.

10:30-11:10 Mirai Tanaka
DC algorithm for convex constrained nonconvex regularized sparse optimization problem

Abstracts: We propose a DC (difference of convex functions) algorithm for solving a convex constrained sparse optimization problem. To sparsify solutions, the objective function of our problem contains a nonconvex regularizer expressed as the difference of the l1-norm and a convex function. Our proposed algorithm works efficiently when the resulting convex subproblem, a constrained proximal point problem with respect to the l1-norm, can be solved efficiently at each iteration by exploiting the structure of the constraint. We show such examples arising from machine learning and operations research applications. Numerical results demonstrate the efficiency of our proposed algorithm in comparison with existing methods.
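
As an illustration only, the DC iteration described in this abstract can be sketched on a toy problem. Everything specific here is an assumption and not taken from the talk: the regularizer is lam * (||x||_1 - ||x||_2), the data term is 0.5 * ||x - b||^2, and the convex constraint is a box |x_i| <= bound. With these choices the convex subproblem separates by coordinate and has a closed form (soft-thresholding followed by clipping).

```python
import math

def soft_threshold(v, t):
    """Proximal map of t * |.| applied to one coordinate."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def objective(x, b, lam):
    """0.5 * ||x - b||^2 + lam * (||x||_1 - ||x||_2)."""
    fit = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    l1 = sum(abs(xi) for xi in x)
    l2 = math.sqrt(sum(xi * xi for xi in x))
    return fit + lam * (l1 - l2)

def dc_algorithm(b, lam, bound, n_iter=50):
    """DC iteration for the toy problem (hypothetical setup, see lead-in)."""
    x = [0.0] * len(b)  # feasible starting point
    history = [objective(x, b, lam)]
    for _ in range(n_iter):
        # Linearize the subtracted convex part lam * ||x||_2 at the current x.
        norm = math.sqrt(sum(xi * xi for xi in x))
        g = [lam * xi / norm if norm > 0 else 0.0 for xi in x]
        # Solve the convex subproblem exactly: for each coordinate, minimize
        # 0.5 * (x_i - (b_i + g_i))^2 + lam * |x_i| subject to |x_i| <= bound.
        x = [min(max(soft_threshold(bi + gi, lam), -bound), bound)
             for bi, gi in zip(b, g)]
        history.append(objective(x, b, lam))
    return x, history

b = [3.0, -2.0, 0.3, -0.2, 0.1, 0.0]
x_hat, history = dc_algorithm(b, lam=0.5, bound=2.0)
```

Because each subproblem is solved exactly, the objective values in `history` never increase, and the small entries of `b` are set to zero while the largest one is held at the box bound.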

11:10-11:20 Break
11:20-12:00 Ming-Yen Cheng
A simple and adaptive two-sample test in high dimensions

Abstracts: High-dimensional data are commonly encountered nowadays. Testing the equality of two means is a fundamental problem. The conventional Hotelling's test performs poorly or becomes inapplicable in high dimensions. Several modifications have been proposed to address this challenging issue and have been shown to perform well. However, many of them use a normal approximation to the null distribution, and thus require strong regularity assumptions on the underlying covariance structure. We study this issue and propose an L2-norm based test that works under much milder conditions, even when there are fewer observations than dimensions. In particular, we employ the Welch-Satterthwaite approximation and ratio-consistent estimators for the parameters of the approximating distribution. Unlike many existing tests, our test is adaptive to singularity or near singularity of the unknown covariance structure, which is commonly seen in high dimensions and has a great impact on the shape of the null distribution. Simulation studies and real data applications show that our test has much better size control than some existing tests, while the powers are comparable when the sizes are comparable.
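
The abstract does not state the test statistic, so the following is only a sketch of the usual moment-matching argument behind a Welch-Satterthwaite approximation. It assumes two Gaussian samples of sizes n_1 and n_2 with a common covariance matrix \Sigma whose eigenvalues are \lambda_1, ..., \lambda_p. The symbols T, \beta and d are introduced here for illustration.

```latex
% L2-norm statistic, with n = n_1 + n_2
T = \frac{n_1 n_2}{n}\,\lVert \bar{x} - \bar{y} \rVert^2 .

% Under the null hypothesis, T is a weighted sum of independent chi-squares:
T \overset{d}{=} \sum_{r=1}^{p} \lambda_r A_r ,
\qquad A_r \overset{\text{i.i.d.}}{\sim} \chi^2_1 .

% Welch--Satterthwaite: approximate T by \beta \chi^2_d, matching mean and variance
\beta d = \operatorname{tr}(\Sigma), \qquad
2 \beta^2 d = 2 \operatorname{tr}(\Sigma^2)
\;\Longrightarrow\;
\beta = \frac{\operatorname{tr}(\Sigma^2)}{\operatorname{tr}(\Sigma)}, \qquad
d = \frac{\operatorname{tr}^2(\Sigma)}{\operatorname{tr}(\Sigma^2)} .
```

In this sketch d lies between 1 and p. It is small when a few eigenvalues dominate (near-singular \Sigma), where the null distribution is strongly skewed, and large when the eigenvalues are balanced, where the distribution is close to normal. In practice the traces are unknown and must be estimated; the ratio-consistent estimators mentioned in the abstract are not reproduced here.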

12:00-12:40 Stephen Wu
C. elegans neural network analysis using multi-domain clustering

Abstracts: Whole-brain imaging of C. elegans allows neuroscientists to access the full neural network activity of a single worm under different stimulations. However, the noisy nature of the images makes it difficult to extract meaningful activity patterns. A typical statistical solution is to increase the number of worm samples in order to suppress the influence of the noise. In this presentation, we formulate the neural network analysis of multiple worms as a multi-domain clustering problem, where we construct an undirected graph for each worm to represent the correlation of the neural activities between neurons. The robustness of our multi-domain clustering method leads to interesting biological discoveries that may guide future experiments in C. elegans research.

12:40-14:30 Lunch (and Tour)
14:30-15:10 Lizhi Liao
Interior point continuous trajectories for optimization: motivations, convergence, and computation

Abstracts: In this talk, we will give an overview of the interior point continuous trajectory approach for convex optimization. Our discussion will start with the motivations and fundamental ideas behind the interior point continuous trajectory approach. Building on the many existing results on interior point methods for convex optimization, some convergence results associated with the interior point continuous trajectory approach will be presented. Finally, some solution schemes and computational issues will be addressed.

15:10-15:50 Daisuke Murakami
Spatially varying modeling for large datasets: a mixed effects approach

Abstracts: This study develops a spatially varying coefficient (SVC) model by extending a Moran’s eigenvector-based mixed effects model. The developed model has the following properties: its SVCs are interpretable in terms of the Moran coefficient, which is a spatial dependence diagnostic statistic; each of the SVCs can have a different degree of spatial smoothness; and it yields an approximation of a Bayesian SVC model. Moreover, while computational burden is a major difficulty in SVC modeling, our approach estimates its coefficients within a reasonable time even for large samples; the computational cost of our approach is independent of the sample size after a small preprocessing step. Results of a Monte Carlo simulation reveal that our approach outperforms conventional SVC models, including geographically weighted regression (GWR) models and their variants, in terms of the accuracy of the SVC estimates and computational time. We empirically apply our model to a land price analysis of flood hazards in Japan.
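
For readers unfamiliar with the Moran coefficient mentioned in this abstract, here is a minimal sketch of the statistic itself. It does not implement the SVC model. The four-site path graph and the function name `moran_coefficient` are hypothetical choices for illustration.

```python
def moran_coefficient(values, weights):
    """Moran coefficient of `values` under a spatial weight matrix `weights`.

    weights[i][j] is the (non-negative) connectivity between sites i and j.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Four sites on a line; neighbouring sites are connected with weight 1.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = moran_coefficient([1.0, 2.0, 3.0, 4.0], path_graph)
alternating = moran_coefficient([1.0, -1.0, 1.0, -1.0], path_graph)
```

A spatially smooth pattern gives a positive coefficient, and an alternating pattern gives a negative one, which is the sense in which the statistic diagnoses spatial dependence.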

15:50-16:00 Break
16:00-16:40 Jun Fan
Robustness and kernel based modal regression

Abstracts: Modal regression estimates the conditional mode of a response variable given a set of covariates, serving as an alternative to mean regression and quantile regression under heavy-tailed noise. In this talk, we will study kernel based modal regression within the framework of statistical learning, and discuss its relation to kernel density estimation and the role of the involved scaling parameter in the algorithms.
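
To make the idea concrete, the sketch below fits a straight line by maximizing a Gaussian-kernel objective, sum_i exp(-(y_i - a - b * x_i)^2 / (2 * bandwidth^2)), with an iteratively reweighted least-squares fixed-point scheme. This is one common way to compute a kernel based modal regression estimate and is an assumption here, not necessarily the algorithm discussed in the talk. The function names and the toy data are hypothetical, and `bandwidth` is presumably what the abstract calls the scaling parameter.

```python
import math

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def modal_linear_regression(xs, ys, bandwidth, n_iter=100):
    """Fixed-point iteration for the Gaussian-kernel modal regression line."""
    intercept, slope = weighted_line_fit(xs, ys, [1.0] * len(xs))  # OLS start
    for _ in range(n_iter):
        # Points with small residuals get large weights; outliers fade out.
        ws = [math.exp(-0.5 * ((y - intercept - slope * x) / bandwidth) ** 2)
              for x, y in zip(xs, ys)]
        intercept, slope = weighted_line_fit(xs, ys, ws)
    return intercept, slope

# Toy data: an exact line y = 1 + 2x with three gross outliers.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
for i in (3, 9, 15):
    ys[i] += 30.0

ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
modal = modal_linear_regression(xs, ys, bandwidth=2.0)
```

On this toy data the least-squares intercept is pulled upward by the outliers, while the modal fit recovers the line followed by the bulk of the points. A very large bandwidth makes all weights nearly equal and brings the estimate back toward least squares.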

16:40-17:20 Weiyang Ding
Fast computation of stationary joint probability distribution of sparse Markov chains

Abstracts: In this talk, we study a fast algorithm for finding stationary joint probability distributions of sparse Markov chains or multilinear PageRank vectors, which arise from data mining applications. In these applications, the main computational problem is to calculate and store solutions with many unknowns in joint probability distributions of sparse Markov chains. Our idea is to approximate large-scale solutions of such sparse Markov chains by two components: the sparsity component and the rank-one component. Here the non-zero locations in the sparsity component refer to important associations in the joint probability distribution and the rank-one component refers to a background value of the solution. We propose to determine solutions by formulating and solving sparse and rank-one optimization problems via closed form solutions. The convergence of the truncated power method is established. Numerical examples of multilinear PageRank vector calculation and second-order web-linkage analysis are presented to show the efficiency of the proposed method. It is shown that both computation and storage are significantly reduced compared with the traditional power method.
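
For reference, the traditional power method that serves as the baseline in this abstract can be sketched as follows for an ordinary (first-order) sparse Markov chain. This is not the sparse plus rank-one method of the talk. The dictionary-of-dictionaries storage, the function name, and the three-state chain are assumptions made for illustration.

```python
def stationary_distribution(transitions, n_states, tol=1e-12, max_iter=10000):
    """Power method for the stationary distribution of a sparse Markov chain.

    transitions[i] maps each reachable next state j to P(j | i); only
    non-zero probabilities are stored, and each row sums to one.
    """
    x = [1.0 / n_states] * n_states  # start from the uniform distribution
    for _ in range(max_iter):
        new_x = [0.0] * n_states
        for i, row in transitions.items():
            for j, p in row.items():
                new_x[j] += x[i] * p  # one step of x <- x P
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x

# A small irreducible, aperiodic chain stored sparsely.
chain = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.25, 2: 0.75},
    2: {0: 1.0},
}
pi = stationary_distribution(chain, n_states=3)
```

Each iteration touches only the stored non-zero transitions, but the iterate `x` itself is dense. For joint distributions of higher-order chains the number of unknowns grows quickly, which is the storage problem the abstract addresses.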

17:20-17:30 Closing (and Discussion)
18:30-20:30 Dinner