[Statistical Research Center Lecture Series, No. 3]: New Advances in Big Data Analysis Methods

([SWUFE News] Published: 2017-04-14)

Guanghua Lecture Forum: Forum of Public Figures and Entrepreneurs, Session 4444


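Topic: New Advances in Big Data Analysis Methods

Host: Professor Lin Huazhen

Time: Tuesday, April 18, 2017, 1:00-5:00 PM

Venue: Academic Conference Room B212, Tongbo Building

Organizers: Statistical Research Center; School of Statistics; Office of Scientific Research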


Speaker 1: Professor Wenyang Zhang, University of York, UK (1:00-2:00 PM)

Speaker bio:

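Professor Wenyang Zhang's research covers big data analysis, financial data analysis, high-dimensional data analysis, nonparametric modelling, time series analysis, spatial data analysis, multilevel modelling, survival analysis, and structural equation models. He has taught at the London School of Economics and Political Science, the University of Kent, the University of Bath, and the University of York, and is currently Chair Professor of Statistics at the University of York. He was a member of the Research Committee of the Royal Statistical Society (only three people of Chinese descent have ever served on that committee) and is currently an Associate Editor of the Journal of the American Statistical Association, one of the four top international journals in statistics. He has published dozens of papers in leading international journals, including the Journal of the American Statistical Association, The Annals of Statistics, and the Journal of the Royal Statistical Society, Series B.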

Topic: Factor Models for Asset Returns Based on Transformed Factors

Abstract:

The Fama-French three factor models are commonly used in the description of asset returns in finance. Statistically speaking, the Fama-French three factor models imply that the return of an asset can be accounted for directly by the Fama-French three factors, i.e. market, size and value factor, through a linear function. A natural question is: would some kind of transformed Fama-French three factors work better than the three factors? If so, what kind of transformation should be imposed on each factor in order to make the transformed three factors better account for asset returns? In this talk, I am going to address these questions through nonparametric modelling. I will show a data driven approach to construct the transformation for each factor concerned and a generalised maximum likelihood ratio based hypothesis test to test whether transformations on the Fama-French three factors are needed for a given data set. I will also show some asymptotic properties to justify the proposed methods. Intensive simulation study results will be presented to show how the proposed methods work when sample size is finite. Finally, I will apply the proposed methods to a real data set and show some interesting findings.
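For readers unfamiliar with the setup, the contrast described in the abstract can be written out roughly as follows. The first equation is the usual linear three-factor regression. The second is only an assumed illustration of what "transformed factors" might look like: the functions g_1, g_2, g_3 and the exact model form are not given in the abstract and may differ from the speaker's formulation.

```latex
% Linear three-factor form: r_t is the asset's excess return at time t;
% MKT, SMB and HML denote the market, size and value factors.
\[
  r_t = \alpha + \beta_1\,\mathrm{MKT}_t + \beta_2\,\mathrm{SMB}_t
        + \beta_3\,\mathrm{HML}_t + \varepsilon_t
\]

% Assumed illustrative form with an unknown transformation g_j on each factor
% (the g_j are hypothetical notation, to be estimated nonparametrically).
\[
  r_t = \alpha + \beta_1\,g_1(\mathrm{MKT}_t) + \beta_2\,g_2(\mathrm{SMB}_t)
        + \beta_3\,g_3(\mathrm{HML}_t) + \varepsilon_t
\]

% Under this notation, "no transformation is needed" corresponds to
\[
  H_0:\; g_j(x) = x \quad \text{for } j = 1, 2, 3 .
\]
```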

 

Speaker 2: Haishan Wu, Senior Data Scientist, Big Data Lab, Baidu Research (2:00-3:00 PM)

Speaker bio:

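Haishan Wu is a Senior Data Scientist at the Big Data Lab of Baidu Research, where he leads research on spatio-temporal big data. He received his Ph.D. from Fudan University in 2011 and joined IBM Research - China after graduating. At the end of 2012 he went to Princeton University in the United States for postdoctoral research. In September 2014 he joined the Big Data Lab of Baidu Research as head of Baidu's spatio-temporal big data research. He has led several projects, including Baidu economic measurement, the Baidu crowd early-warning system, and the Baidu commercial real estate site-selection system. His research has been widely covered by well-known media in China and abroad (including The Wall Street Journal, Bloomberg, The Economist, Forbes, CNBC, CNN Money, MIT Technology Review, New Scientist, NPR, The Washington Post, and China Daily), and the Baidu Economic Index he developed is updated on the Bloomberg Terminal on the 5th of every month. His study of China's ghost cities, based on spatio-temporal data mining, was named one of MIT Technology Review's Best of 2015.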

Date: April 18

Title: Mobimetrics: Economic Measurement and Investment Decision Applications Based on Baidu Spatio-Temporal Big Data

Abstract:

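We now enjoy the convenience of location-based services almost all the time: through various mobile apps we can navigate, check nearby traffic conditions, find nearby hotels, order takeout from nearby restaurants, and so on. Baidu currently responds to tens of billions of positioning requests every day, providing mobile location services to hundreds of millions of users and generating massive amounts of spatio-temporal data. Compared with other data, spatio-temporal data reflects users' economic activity more directly, and so offers a new perspective for measuring a complex economy. This talk introduces the Mobimetrics research carried out at the Big Data Lab of Baidu Research, that is, how to use user mobility data to measure China's macro- and micro-level economic activity and thereby give economists and financial investors a basis for decisions. We will answer questions such as: How do we mine these massive spatio-temporal data? How is the Baidu Economic Index constructed? What are the trends in employment and offline consumption across industries? How can spatio-temporal data be used to forecast and analyze a company's operations? How do we use spatio-temporal predictive models to detect box-office fraud? Can urban labor-flow trends predict the future of China's ghost cities?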

 

 

 

Speaker 3: Professor Yinglei Lai, The George Washington University (3:00-4:00 PM)

Speaker bio:

Yinglei Lai, Ph.D. Dr. Lai is Professor of Statistics and the Deputy Chair of the Department of Statistics at The George Washington University. His research interest is the development of statistical and computational methods in bioinformatics, computational biology and biostatistics. He received his B.S. in Information & Computation Sciences and Business Administration from the University of Science and Technology of China in 1999. Dr. Lai received his Ph.D. in Applied Mathematics (Computational Biology) from the University of Southern California in 2003. After his postdoctoral training at Yale University School of Medicine, he joined the Department of Statistics at The George Washington University as a faculty member in 2004.

 

Topic: Exploration of Concordant Changes among Multiple Data Sets

Abstract:

In practice, multiple data sets with a large number of variables/features can be collected for the same or similar study purposes.  In some situations, each data set may be collected under certain conditions.  Then, it can be difficult to combine multiple data sets and different data sets need to be analyzed separately.  One interesting data exploration for multiple sets is to identify variables/features showing statistically significant changes that are concordant among multiple data sets.  For example, we observe a variable/feature showing a clearly positive change from one group to the other group, and we observe this for the same variable (and the same groups) among multiple data sets.  Furthermore, variable/feature sets can be defined and data exploration can be performed at the variable/feature set level.  With a given collection (a large number) of variable/feature sets, it is also interesting to identify variable/feature sets showing statistically significant coordinate changes that are concordant among multiple data sets.  For example, in a given variable/feature set, we observe many variables/features showing clearly positive changes from one group to the other group, and we observe this for almost the same variables/features (and the same groups) among multiple data sets.  We have developed a mixture model based framework for exploring these concordant changes.  The statistical significance can be evaluated based on the mixture model based false discovery rate (FDR).  Furthermore, as the number of data sets increases, it is necessary to reduce the number of parameters in the model.  Motivated by the well-known generalized estimating equations (GEEs) for longitudinal data analysis, we have also developed the related model reduction approaches so that efficient analysis results can be achieved.  The advantage and usefulness of our methods are illustrated on the large-scale experimental gene expression data sets for cancer studies.
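As a rough illustration of the "mixture model based false discovery rate" mentioned in the abstract, the sketch below uses a common general recipe: once a mixture model has assigned each feature a posterior probability of being null, the estimated FDR of a selected list is the average of those probabilities over the list. The function name `select_by_mixture_fdr` and its inputs are hypothetical, and this is not the speaker's actual framework, which additionally models concordance across data sets.

```python
from typing import List, Sequence


def select_by_mixture_fdr(null_posteriors: Sequence[float],
                          level: float = 0.05) -> List[int]:
    """Return indices of the largest feature list whose estimated FDR <= level.

    null_posteriors[i] is assumed to be the posterior probability, taken from
    some already-fitted mixture model, that feature i shows NO concordant change.
    The estimated FDR of a list is the mean of these probabilities over the list.
    """
    # Rank features from most to least likely to be a true discovery.
    order = sorted(range(len(null_posteriors)), key=lambda i: null_posteriors[i])

    running_sum = 0.0
    best_size = 0
    for size, idx in enumerate(order, start=1):
        running_sum += null_posteriors[idx]
        # running_sum / size is the estimated FDR of the top-`size` list.
        if running_sum / size <= level:
            best_size = size

    return sorted(order[:best_size])


if __name__ == "__main__":
    # Made-up posterior null probabilities for five features.
    print(select_by_mixture_fdr([0.01, 0.9, 0.02, 0.5, 0.03], level=0.05))
```

Because the features are ranked by increasing null probability, the running average never decreases, so the selected list is always a top-ranked prefix.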


 

Speaker 4: Professor Shurong Zheng, School of Mathematics and Statistics, Northeast Normal University (4:00-5:00 PM)

Speaker bio:

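Shurong Zheng is a professor at the School of Mathematics and Statistics, Northeast Normal University. Her research focuses on high-dimensional data analysis, large-dimensional random matrices, and asymmetry coefficients. She has completed one General Program project and one Young Scientists Fund project of the National Natural Science Foundation of China, as well as one project under the Ministry of Education's "Program for New Century Excellent Talents", and currently leads one project funded by the National Natural Science Foundation of China's Excellent Young Scientists Fund. She has 28 papers published or accepted in journals including the Journal of the American Statistical Association, Annals of Statistics, Biometrika, and Bioinformatics, 26 of which are indexed by SCI. She has published two book chapters, and has co-authored one Chinese-language book with Higher Education Press and one English-language book with Cambridge University Press. Her doctoral dissertation, "The EM Algorithm under Linear Inequality Constraints", received the "2006 Jilin Province Outstanding Doctoral Dissertation" award and a "2006 National Outstanding Doctoral Dissertation" nomination.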

Title: Global Testing for High-Dimensional Correlation Matrices

Abstract: Testing the correlation matrices is important in multivariate statistical analysis. The aim of this paper is to develop a set of global test statistics to test correlation structures for the one-, two-, and multiple-sample testing problems under the high-dimensional setting. Our global test statistics are designed to deal with both the dense and sparse alternatives. Specifically, they are the sum of two terms, one for the dense alternative and the other for the sparse alternative. Simulations are done to evaluate the performance of the proposed global tests. As an illustration, the ROI volumes and demographic information baseline data from the NIH Alzheimer's Disease Neuroimaging Initiative (ADNI) study are analyzed by the proposed tests.
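To make the "dense plus sparse" idea concrete, here is a minimal sketch for the one-sample case of testing whether variables are uncorrelated. It computes two raw ingredients from the sample correlation matrix: a sum of squared off-diagonal correlations (sensitive to many small departures) and the largest absolute off-diagonal correlation (sensitive to a few large ones). These choices, and the function names, are assumptions for illustration. The abstract does not give the actual statistics, their scaling, or their null distributions, so the two terms are returned separately, not combined.

```python
import math
from typing import List, Sequence, Tuple


def sample_correlation(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix of the columns of `data` (rows = observations).

    Assumes every column has non-zero variance.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(row[j] ** 2 for row in centered)) for j in range(p)]
    corr = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            dot = sum(row[j] * row[k] for row in centered)
            corr[j][k] = dot / (norms[j] * norms[k])
    return corr


def dense_and_sparse_terms(corr: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (sum of squared off-diagonal r, max absolute off-diagonal r)."""
    p = len(corr)
    off_diagonal = [corr[j][k] for j in range(p) for k in range(j + 1, p)]
    dense_term = sum(r * r for r in off_diagonal)    # illustrative dense ingredient
    sparse_term = max(abs(r) for r in off_diagonal)  # illustrative sparse ingredient
    return dense_term, sparse_term


if __name__ == "__main__":
    # Tiny made-up data set: 4 observations of 3 variables.
    toy = [[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8, 0]]
    print(dense_and_sparse_terms(sample_correlation(toy)))
```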

 
