Workshop in Semi/Nonparametrically Statistical Learning

By: School of Statistics, Southwestern University of Finance and Economics. Published: 2018-07-02


Conference Schedule







Yuhong Yang
University of Minnesota

Treatment Allocations Based on Multi-Armed Bandit Strategies


Runze Li
The Pennsylvania State University

Testing of Significance for High-Dimensional Longitudinal Data


Break and group photo


Jianguo (Tony) Sun
University of Missouri

Joint Analysis of Interval-Censored Failure Time Data and Panel Count Data


Lunch (second-floor hall, Liuyuan Restaurant, on campus)


Heping Zhang
Yale University

Modeling Hybrid Dependent Responses


Wei Pan
University of Minnesota

An empirical comparison of deep learning and other methods for prediction of protein subcellular localization with microscopy images


Break


Fang Liu
The University of Notre Dame

Noise Injection Regularization in Large Models with Applications to Neural Networks and Graphical Models 


Gang Li
University of California at Los Angeles

Prediction Accuracy Measures for a Nonlinear Model and for Right-Censored Time-to-Event Data



Location: Conference Room 408, Hongyuan Building, SWUFE (Southwestern University of Finance and Economics)






Time: Friday, July 6, 2018, 9:00-15:20

Venue: Conference Room 408, Hongyuan Building


Topic 1: Treatment Allocations Based on Multi-Armed Bandit Strategies




Yuhong Yang received his Ph.D. in statistics from Yale University in 1996. He then joined the Department of Statistics at Iowa State University and moved to the University of Minnesota in 2004, where he has been a full professor since 2007. His research interests include model selection, multi-armed bandit problems, forecasting, high-dimensional data analysis, and machine learning. He has published in journals in several fields, including Annals of Statistics, IEEE Transactions on Information Theory, Journal of Econometrics, Journal of Approximation Theory, Journal of Machine Learning Research, and International Journal of Forecasting. He is a Fellow of the Institute of Mathematical Statistics.



In the practice of medicine, multiple treatments are often available for an individual patient. Identifying the best treatment for a specific patient is very challenging because patients are heterogeneous. The multi-armed bandit with covariates provides a framework for designing effective treatment allocation rules that integrate learning from experimentation with maximizing the benefit to patients along the way.

In this talk, we present new strategies to achieve asymptotically efficient or minimax optimal treatment allocations. Many nonparametric and parametric methods from supervised learning can be used to estimate the mean treatment outcome functions (in terms of the covariates), but guidance on how to choose among them is generally unavailable. We therefore propose a model-combining allocation strategy for adaptive performance and show its strong consistency. When the mean treatment outcome functions are smooth, rates of convergence can be studied to quantify the effectiveness of a treatment allocation rule in terms of the overall benefit the patients have received. A multi-stage randomized allocation algorithm with arm elimination is proposed to combine flexibility in modeling the treatment outcome functions with a theoretical guarantee on the overall treatment benefit. Numerical results are given to demonstrate the performance of the new strategies.

The talk is based on joint work with Wei Qian.
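As a rough illustration of the bandit-with-covariates setting described in the abstract, the following is a minimal sketch of a much simpler rule than the strategies in the talk: epsilon-greedy allocation with histogram (binned) estimates of each arm's mean outcome. The function name `simulate_allocation`, the single covariate on [0, 1], the bin count, the exploration rate, and the Gaussian noise level are all assumptions made for this example.

```python
import numpy as np

def simulate_allocation(mean_fns, n_patients=2000, n_bins=10, eps=0.1, seed=0):
    """Epsilon-greedy allocation with per-bin running means (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_arms = len(mean_fns)
    sums = np.zeros((n_bins, n_arms))    # total observed outcome per (bin, arm)
    counts = np.zeros((n_bins, n_arms))  # number of allocations per (bin, arm)
    chosen = np.empty(n_patients, dtype=int)
    xs = rng.uniform(0.0, 1.0, size=n_patients)  # one covariate per patient
    for t, x in enumerate(xs):
        b = min(int(x * n_bins), n_bins - 1)     # histogram bin of the covariate
        untried = np.nonzero(counts[b] == 0)[0]
        if untried.size > 0:
            arm = int(untried[0])                # try every arm once in each bin
        elif rng.random() < eps:
            arm = int(rng.integers(n_arms))      # randomized exploration
        else:
            arm = int(np.argmax(sums[b] / counts[b]))  # best estimated arm
        outcome = mean_fns[arm](x) + rng.normal(scale=0.1)  # assumed noise model
        sums[b, arm] += outcome
        counts[b, arm] += 1
        chosen[t] = arm
    return xs, chosen

# Two hypothetical treatments: arm 0 is better when x > 0.5, arm 1 otherwise.
xs, chosen = simulate_allocation([lambda x: x, lambda x: 1.0 - x])
best = np.where(xs > 0.5, 0, 1)
print("share of later patients on the better arm:", np.mean(chosen[1000:] == best[1000:]))
```

The sketch only shows the trade-off between learning and patient benefit. It has no arm elimination, no model combining, and none of the theoretical guarantees discussed in the talk.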


Topic 2: Testing of Significance for High-Dimensional Longitudinal Data




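Runze Li holds a named chair professorship in the Department of Statistics at The Pennsylvania State University. His research areas include variable selection and feature screening for high-dimensional data, as well as modeling and statistical inference for nonparametric and semiparametric models. He has also carried out a body of work on applications of statistics. He has served as an associate editor and as editor of the Annals of Statistics, and he is currently an associate editor of JASA. He is a fellow of the IMS, ASA, and AAAS.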



This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low dimensional parameter of interest. The major challenge is how to construct an optimal test statistic in the presence of high dimensional nuisance parameters and complex dependence among measurements. To address this challenge, we propose a novel quadratic decorrelated inference function approach, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. We prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal test statistic for the parameter of interest. We then study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying Storey's (2002) procedure to the proposed test statistics for each regression parameter controls the FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results indicate that the newly proposed procedure can control both the Type I error for testing a low dimensional parameter of interest and the FDR in the multiple testing problem. Finally, we apply the proposed procedure to a real data example.
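For readers unfamiliar with the multiple-testing step, the sketch below shows one common step-up form of a Storey-type procedure applied to a plain vector of p-values: the null proportion is estimated from the p-values above a tuning value, and the Benjamini-Hochberg thresholds are loosened accordingly. It is a simplified illustration, not the paper's procedure; the function name `storey_reject` and the defaults `alpha=0.05` and `lam=0.5` are assumptions, and the decorrelated test statistics from the abstract are not constructed here.

```python
import numpy as np

def storey_reject(pvalues, alpha=0.05, lam=0.5):
    """Boolean mask of rejected hypotheses (Storey-type adaptive step-up sketch)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    # Estimate the proportion of true nulls from p-values above lam,
    # which are assumed to come mostly from (roughly uniform) nulls.
    pi0 = min(1.0, np.sum(p > lam) / (m * (1.0 - lam)))
    pi0 = max(pi0, 1.0 / m)  # guard against a zero estimate
    order = np.argsort(p)
    # Benjamini-Hochberg thresholds, scaled by the estimated null proportion.
    thresholds = alpha * np.arange(1, m + 1) / (m * pi0)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest sorted index under its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values: six strong signals, one borderline, three likely nulls.
pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.06, 0.3, 0.6, 0.9]
print(storey_reject(pvals).sum(), "hypotheses rejected")
```

With an estimated null proportion below one, the thresholds are more generous than the unadjusted Benjamini-Hochberg ones, which is the source of the extra power.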


Topic 3: Joint Analysis of Interval-Censored Failure Time Data and Panel Count Data




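Jianguo Sun is a professor in the Department of Statistics at the University of Missouri. He received his Ph.D. from the University of Waterloo in 1992. His research interests include biostatistics, survival analysis, longitudinal data analysis, and chemometrics. He is a Fellow of the Institute of Mathematical Statistics, a member of the International Statistical Institute, and a Fellow of the ASA. In 2018 he received an MU graduate mentor award.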



Interval-censored failure time data and panel count data are two types of incomplete data that commonly occur in event history studies, and many methods have been developed for analyzing each separately (Sun, 2006; Sun and Zhao, 2013). Sometimes one may be interested in, or need to conduct, a joint analysis of the two, for example in clinical trials with composite endpoints, and there does not seem to be an established approach for this in the literature. This talk will discuss this problem and present a sieve maximum likelihood approach. Some simulation results and an application will also be provided.


Topic 4: Modeling Hybrid Dependent Responses




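Dr. Heping Zhang is the Susan Dwight Bliss Professor of Biostatistics, Professor of Statistics and Data Science, and Professor in the Child Study Center at Yale University. He founded and directs the Collaborative Center for Statistics in Science at Yale. He is also an honorary professor at the University of Hong Kong, a scholar of China's national Thousand Talents Program, a Chang Jiang chair professor, and president-elect of the International Chinese Statistical Association. He received his bachelor's degree in mathematics from Jiangxi Normal University in 1982 and his Ph.D. in statistics from Stanford University in 1991, with additional study in computer science.

He is the founding editor-in-chief of Statistics and Its Interface. He currently serves on the editorial boards of the Journal of the American Statistical Association (JASA), Genetic Epidemiology, and a journal on reproduction and infertility research. In 2019 he takes up the editorship of JASA (ACS).

Professor Zhang was named the 2008 Myrto Lefkopoulou Distinguished Lecturer by the Harvard School of Public Health and delivered an IMS Medallion Lecture in 2011. He received the Royan International Award for research in reproductive health in 2011, a scientific paper prize from the American Society for Reproductive Medicine in 2013, the March of Dimes award for best research in prematurity in 2014, and an outstanding paper award from the American Journal of Obstetrics and Gynecology in 2017.

His research interests include nonparametric methods, longitudinal data, statistical genetics and bioinformatics, clinical trials, statistical modeling of epidemiological data, brain imaging analysis, statistical computing, and statistical methods for behavioral science. He is an author of the book "Recursive Partitioning and Its Applications", published by Springer, and has published more than 280 papers in high-impact journals in statistics, genetics, epidemiology, and psychiatry, including Annals of Statistics, Annals of Applied Statistics, Biometrika, JASA, JRSSB, American Journal of Human Genetics, American Journal of Psychiatry, PNAS, Science, JAMA, and the New England Journal of Medicine.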



I will present a novel multivariate model for analyzing hybrid traits and identifying genetic factors for comorbid conditions. Comorbidity is a common phenomenon in mental health in which an individual suffers from multiple disorders simultaneously. For example, in the Study of Addiction: Genetics and Environment (SAGE), alcohol and nicotine addiction were recorded through multiple assessments that we refer to as hybrid traits. Statistical inference for studying the genetic basis of hybrid traits is not well developed. Recent rank-based methods have been used for association analyses of hybrid traits, but they do not convey the strength or direction of effects. To overcome this limitation, a parametric modeling framework is imperative. Although such parametric frameworks have been proposed in theory, they are neither well developed nor extensively used in practice because they rely on complicated likelihood functions with high computational complexity. Many existing parametric frameworks instead use pseudo-likelihoods to reduce the computational burden. Here, we develop a model fitting algorithm for the full likelihood. Our extensive simulation studies demonstrate that inference based on the full likelihood can control the type I error rate, and that it gains power and improves effect size estimation compared with several existing methods for hybrid models. These advantages remain even if the distribution of the latent variables is misspecified. After analyzing the SAGE data, we identify three genetic variants (rs7672861, rs958331, rs879330) that are significantly associated with the comorbidity of alcohol and nicotine addiction at the chromosome-wide level. Moreover, our approach has greater power in this analysis than several existing methods for hybrid traits. Although the analysis of the SAGE data motivated the development of the model, it can be applied broadly to any hybrid responses.


Topic 5: An empirical comparison of deep learning and other methods for prediction of protein subcellular localization with microscopy images







We compare the performance of deep learning methods and more traditional machine learning methods for predicting protein subcellular localization based on a large dataset of single-cell microscopy images. Specifically, we show that various VGG-type convolutional neural networks (CNNs) and residual CNNs (ResNets) perform better than random forests and gradient boosting. We also demonstrate the use of CNNs for transfer learning and feature extraction.


Topic 6: Noise Injection Regularization in Large Models with Applications to Neural Networks and Graphical Models

Speaker 6: Associate Professor Fang Liu, University of Notre Dame



Prof. Fang Liu is currently an Associate Professor and the Director of Graduate Studies in the Department of Applied and Computational Mathematics and Statistics at the University of Notre Dame. She obtained her Ph.D. degree in Biostatistics from the University of Michigan, Ann Arbor in 2003 and worked as a biostatistician at Merck Research Labs from 2003 to 2011. Prof. Liu's research interests include the development of statistical methods for protecting data privacy, missing data analysis, Bayesian methods and modelling, statistical learning and regularization of complex models, and the application of statistics to biological and social science data. Prof. Liu has published 40+ peer-reviewed journal articles and is the sole PI on two NSF grants on data privacy. She is also the lead biostatistician on several large multinational studies on malaria prevention.



The noise injection regularization technique (NIRT) is an approach to mitigating over-fitting in large models. In this talk, I will demonstrate applications of the NIRT in two scenarios of learning large models: neural networks (NNs) and graphical models (GMs). For NNs, we develop a NIRT called whiteout that injects adaptive Gaussian noise during the training of NNs. We show that the optimization objective function associated with whiteout in generalized linear models has a closed-form penalty term that has connections with a wide range of regularizations and includes the bridge, lasso, ridge, and elastic net penalties as special cases; it can also be extended to offer regularizations similar to the adaptive lasso and group lasso. For GMs, we develop an AdaPtive Noisy Data Augmentation regularization (PANDA) approach to promote sparsity in estimating individual graphical models and similarity among multiple graphs through the training of generalized linear models. At the algorithmic level, PANDA can be implemented in a straightforward manner by iteratively solving for MLEs without constrained optimization. For both whiteout and PANDA, we use simulated and real-life data to demonstrate their applications and show that they are superior or comparable to existing methods.
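The connection between injected noise and a closed-form penalty can be seen in the simplest textbook case, which is not whiteout itself: linear least squares with additive, non-adaptive Gaussian noise on the inputs. The notation below (responses $y_i$, inputs $x_i$, coefficients $\beta$, noise variance $\sigma^2$) is assumed for this illustration only. Writing $r_i = y_i - x_i^\top \beta$ and averaging over the injected noise gives a ridge penalty:

```latex
\begin{aligned}
\mathbb{E}_{e}\sum_{i=1}^{n}\bigl(y_i-(x_i+e_i)^{\top}\beta\bigr)^{2}
  &= \sum_{i=1}^{n}\Bigl(r_i^{2}
     - 2\,r_i\,\mathbb{E}\bigl[e_i^{\top}\beta\bigr]
     + \mathbb{E}\bigl[(e_i^{\top}\beta)^{2}\bigr]\Bigr),
  \qquad e_i \sim N(0,\sigma^{2}I) \text{ independent of the data},\\
  &= \sum_{i=1}^{n}\bigl(y_i-x_i^{\top}\beta\bigr)^{2}
     + n\,\sigma^{2}\,\lVert\beta\rVert_{2}^{2}.
\end{aligned}
```

The cross term vanishes because the noise has mean zero, and the last term follows from $\mathbb{E}[e_i e_i^\top] = \sigma^2 I$. The abstract's claim is that making the noise variance adaptive yields a broader family of penalties in generalized linear models.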


Topic 7: A New Joint Screening Method for Right-Censored Time-to-Event Data with Ultrahigh Dimensional Covariates




Dr. Gang Li obtained his Ph.D. degree in Statistics from Florida State University in 1992. He is Professor of Biostatistics and Biomathematics at the University of California at Los Angeles (UCLA) and Director of UCLA's Jonsson Comprehensive Cancer Center Biostatistics Shared Resource. He has published over 110 peer-reviewed articles in statistical research and applied work in the areas of survival analysis, longitudinal data analysis, high dimensional data analysis, clinical trials, and evaluation of biomarkers. He has co-authored or co-edited three statistical research monographs, including a recent CRC Chapman & Hall book entitled "Joint Modeling of Longitudinal and Time-to-Event Data". Dr. Li is an Elected Fellow of the Institute of Mathematical Statistics, an Elected Fellow of the American Statistical Association, an Elected Member of the International Statistical Institute, and an Elected Fellow of the Royal Statistical Society. He has served on the editorial boards of multiple statistics journals. Dr. Li has been active in collaborating with researchers in basic science, translational research, and clinical trials, and has been a statistics principal investigator for multiple large cancer studies.



In an ultrahigh dimensional setting with a huge number of covariates, variable screening is useful for dimension reduction before a more refined variable selection and parameter estimation method is applied. This paper proposes a new sure joint screening procedure for right-censored time-to-event data based on a sparsity-restricted semiparametric accelerated failure time model. Our method, referred to as Buckley-James assisted sure screening (BJASS), consists of an initial screening step using a sparsity-restricted least-squares estimate based on a synthetic time variable and a refinement screening step using a sparsity-restricted least-squares estimate with the Buckley-James imputed event times. The refinement step may be repeated several times to obtain more stable results. We show that with any fixed number of refinement steps, the BJASS procedure retains all important variables with probability tending to 1. Simulation results are presented to illustrate its performance in comparison with some marginal screening methods. A real data example is provided using a diffuse large-B-cell lymphoma (DLBCL) data set. We have implemented the BJASS method in MATLAB and R; the code is available to readers upon request.

(This talk is based on joint work with Yi Liu and Xiaolin Chen.)
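To give a feel for the "sparsity-restricted least-squares estimate" that both BJASS steps rely on, here is a minimal sketch for uncensored responses using iterative hard thresholding. This is an assumed stand-in, not the authors' implementation: it omits the synthetic time variable and the Buckley-James imputation entirely, and the function name `sparse_least_squares`, the step size, and the iteration count are choices made for this example.

```python
import numpy as np

def sparse_least_squares(X, y, k, n_iter=200):
    """Approximate least squares restricted to at most k nonzero coefficients,
    computed by iterative hard thresholding (illustrative sketch)."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest singular value)^2
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient step on the squared-error loss.
        beta = beta + step * (X.T @ (y - X @ beta))
        # Hard threshold: keep the k largest coefficients in absolute value.
        keep = np.argsort(np.abs(beta))[-k:]
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0
    return beta

# Hypothetical data: 50 covariates, of which only the first three matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
beta_true = np.zeros(50)
beta_true[[0, 1, 2]] = [3.0, -2.0, 2.5]
y = X @ beta_true + 0.1 * rng.normal(size=200)
print("screened covariates:", np.nonzero(sparse_least_squares(X, y, k=3))[0])
```

Because all covariates enter the fit jointly, this kind of screening can account for correlation among covariates, which is the contrast with the marginal screening methods mentioned in the abstract.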