Faculty of Economics and Management

Profile

Department: School of Statistics
Gender: Male
Title: Professor
Alma mater: University of Wisconsin, USA
Degree: PhD

Email: ffang@sfs.ecnu.edu.cn

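Work Experience

Aug 2007 - Dec 2009, GE Capital, Senior Analyst

Jan 2010 - Jul 2013, Shanghai Pudong Development Bank, Strategic Development Department, Strategy Analyst

Aug 2013 - Dec 2019, East China Normal University, School of Statistics, Associate Professor

Jan 2020 - present, East China Normal University, School of Statistics, Professor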

Education

Sep 1998 - Jul 2002, Peking University, Department of Mathematics, Bachelor's degree

Aug 2002 - Jul 2007, University of Wisconsin - Madison, Department of Statistics, PhD

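Biography

Fang Fang is a Professor and doctoral supervisor at the School of Statistics, East China Normal University (ECNU). He was selected for the Top-Notch Project of the Shanghai Oriental Talent Program and previously served as Deputy Director of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (Ministry of Education). He received his bachelor's degree from the Department of Mathematics at Peking University and his PhD from the Department of Statistics at the University of Wisconsin. Before joining ECNU in 2013, he worked for several years at GE Capital and Shanghai Pudong Development Bank. His main research interests are missing data, model averaging, fragmentary data analysis, and KS (Kolmogorov-Smirnov) learning. He has published more than 30 papers in leading international statistics and econometrics journals, including The Annals of Statistics (AOS), Journal of Econometrics (JOE), Biometrika, and Journal of Business & Economic Statistics (JBES). He has led or participated in 13 national and provincial/ministerial-level projects and currently leads the National Natural Science Foundation of China (NSFC) Key Program "Statistical Analysis Methods, Theory and Applications for Incomplete Data in the Context of Big Data". He holds 6 granted patents and has received a Second Prize of the Shanghai Natural Science Award. He is a Standing Council Member of the National Industrial Statistics Teaching Research Association and Vice Chair of its Digital Economy and Blockchain Technology Branch, a member of the IMS China Committee, and an Associate Editor of the SCI journal Journal of Nonparametric Statistics. On the application side, he has long worked on credit scoring and on big-data analysis of civil aviation QAR (Quick Access Recorder) data. He has published the popular-science statistics novel Adventures in the Kingdom of Statistics (《统计王国奇遇记》) and the monograph Statistical Analysis and Modeling of Multi-Source Data (《多源数据的统计分析与建模》).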

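Professional Service

Member, IMS China Committee, 2023.07-2025.06

Standing Council Member, National Industrial Statistics Teaching Research Association, 2022-2026

Standing Council Member, Machine Learning Branch, Chinese Association for Applied Statistics, 2021-2025

Vice Chair, Digital Economy and Blockchain Technology Branch, National Industrial Statistics Teaching Research Association, 2020-2024

Supervisor, Data Science Branch, Chinese Society of Optimization, Overall Planning and Economic Mathematics, 2019-2023

Council Member, China Young Statisticians Association, National Industrial Statistics Teaching Research Association, 2019-2023

Council Member, Economic and Financial Statistics Branch, Chinese Association for Applied Statistics, 2017-2021

Council Member, High-Dimensional Data Statistics Branch, Chinese Association for Applied Statistics, 2015-2019

Associate Editor, Journal of Nonparametric Statistics, 2016-2018, 2019-2021, 2022-2024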

Research Interests

missing data, model averaging, fragmentary data analysis, KS learning

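Student Recruitment and Supervision

Courses Taught

Undergraduate: Introduction to Deep Learning, Machine Learning, Probability Theory and Mathematical Statistics, College Statistics, Rational Thinking in the Data Era

Professional master's: Credit Scoring Models (Practical Business Data Analysis, Practical Case Analysis), Advanced Statistics

Academic master's/PhD: Theory of Linear Models, Statistical Ideas and Statistical Thinking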

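Teaching Projects:

[1] Course development team for Statistical Learning Theory, Methods and Applications, funded by the Faculty of Economics and Management, East China Normal University, Oct 2018 - Oct 2021.

[2] 2020 General Education Classic Reading and Core Course Development Project: Rational Thinking in the Data Era, funded by East China Normal University, May 2020 - May 2021.

[3] East China Normal University Academic Degree Graduate Course Development Project, "Human Thought and History of Disciplines" course series: Statistical Ideas and Statistical Thinking, funded by East China Normal University, 2021.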

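Teaching Awards:

[1] Building and implementing an undergraduate statistics curriculum centered on data literacy, Second Prize, Shanghai Higher Education Teaching Achievement Award, 6/10.

[2] Building and implementing an undergraduate statistics curriculum centered on data literacy, First Prize, East China Normal University Teaching Achievement Award, 6/10.

[3] Exploration and practice in training talent for digital and intelligent management, Second Prize, East China Normal University Teaching Achievement Award, 7/7.

[4] Fragmentary data imputation and prediction based on generative adversarial networks, East China Normal University Outstanding Master's Thesis 2022, supervisor.

[5] KS optimization and variable selection based on surrogate loss functions, East China Normal University Outstanding Master's Thesis 2023, supervisor.

[6] Multiple adaptive calibration models for advertising click-through rates, East China Normal University Outstanding Master's Thesis 2023, supervisor.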

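Research Projects

As Principal Investigator:

[6] 2023 Shanghai Oriental Talent Program, Top-Notch Project, Mar 2024 - Mar 2027.

[5] NSFC Key Program, "Statistical Analysis Methods, Theory and Applications for Incomplete Data in the Context of Big Data", Grant No. 72331005, Jan 2024 - Dec 2028.

[4] NSFC General Program, "Model Averaging Methods and Theory for 'Fragmentary Data'", Grant No. 12071143, Jan 2021 - Dec 2024.

[3] NSFC Key Program, "Fusion, Feature Extraction and Analysis Methods for Multi-Source Heterogeneous Data", Grant No. 11831008, Jan 2019 - Dec 2023. Subproject leader.

[2] NSFC Young Scientists Fund, "Instrumental Variable Methods for Nonignorable Missing Data", Grant No. 11601156, Jan 2017 - Dec 2019.

[1] Shanghai Natural Science Foundation, "Handling Nonignorable Missing Data in Generalized Linear Models", Project No. 15ZR1410300, Jan 2015 - Dec 2017.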

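As Participant:

[7] Science and Technology Commission of Shanghai Municipality project, "Building a Theoretical Framework of Statistical Learning Methods for Non-Euclidean Structured Big Data", Project No. TQ20220105, Dec 2022 - Nov 2027.

[6] Shanghai "Science and Technology Innovation Action Plan" Key Project in Basic Research (Applied Data), "Key Mathematical Problems in Aviation Safety Management in the Context of Big Data", Project No. 22JC1400800. Senior researcher. Jul 2022 - Jun 2025.

[5] National Key R&D Program of China, Key Special Project "Big Data Analysis Theory, Algorithms and Applications for Safe Operation and Maintenance of Oil and Gas Pipeline Networks" (2021YFA1000100), topic "Data Mining and Machine Learning Theory and Methods Based on the Operating Mechanisms of Oil and Gas Pipeline Networks". Subproject leader. Dec 2021 - Nov 2026.

[4] National Key R&D Program of China, Key Special Project "Mathematical Theory and Algorithms for Key Technologies of Smart City Transportation Systems" (2021YFA1000300), topic "Evolution Mechanisms and Optimal Control of Complex Traffic Flow". Subproject leader. Dec 2021 - Nov 2026.

[3] NSFC General Program, "Research on Several Problems in Functional Data Analysis", Grant No. 11771146, Jan 2018 - Dec 2021.

[2] Science and Technology Commission of Shanghai Municipality project, "Statistical Modeling Methods and Theory for the Intrinsic Correlation of High-Dimensional Big Data", Project No. 16QA1401700, Apr 2016 - Mar 2019.

[1] Science and Technology Commission of Shanghai Municipality project, "Frontier Theory and Methods in Statistics and Their Applications", Project No. 14XD1401600, Jul 2014 - Jun 2017.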

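Industry-Sponsored Projects (as Principal Investigator):

[2] Development of a "brand soil" model based on big data from the retail department store industry. Shanghai Xianshi Information Technology Co., Ltd. Sep 2017 - Mar 2018.

[1] Development of a UBI (usage-based insurance) model based on connected-vehicle big data. Hangzhou Haohao Kaiche Technology Co., Ltd. Jun 2016 - Dec 2017.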

Publications

Under Review and in Revision

[1] Geometric model averaging.

[2] High-dimensional factor augmented quantile regression: estimation, inference and simultaneous testing.

[3] Optimal linear combination of biomarkers by weighted Youden index maximization.

[4] Weighted stochastic gradient descent for linear models with large-scale fragmentary data.

[5] Integrated generalized moment method with adaptive moment selection from external heterogeneous populations.

[6] Kolmogorov-Smirnov learning by neural networks with a nonconvex surrogate loss.

Conference Papers

[1] Fang, Fang, Zhang, Riquan, and Zhao, Xinbin*. An aggregated evaluation and multi-dimensional comparison method of flight safety based on QAR data. IEEE - ICCASIT 2020.

Journal Articles

[35] Zhong, Yan*#, Liu, Tong#, Fang, Fang#, Ge, Jia, Xu, Bohao, Zhao, Xinbin. Hard landing pattern recognition and precaution with QAR data by functional data analysis. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(4), 5101-5113.

[34] Lin, Xiefang and Fang, Fang*. Variable selection of Kolmogorov-Smirnov maximization with a penalized surrogate loss. Computational Statistics & Data Analysis, 2024, 195, Article 107944.

[33] Fang, Fang* and Bao, Shenliao. FragmGAN: Generative adversarial nets for fragmentary data imputation and prediction. Statistical Theory and Related Fields, 2024, 8(1), 15-28. An invited paper for the special issue on causal inference, missing data and data integration.

[32] Yuan, Chaoxia, Fang, Fang*, and Li, Jialiang. Model averaging for generalized linear models in diverging model spaces with effective model size. Econometric Reviews, 2024, 43(1), 71-96.

[31] Fang, Fang*, Yuan, Chaoxia, and Tian, Wenling. An asymptotic theory for least squares model averaging with nested models. Econometric Theory, 2023, 39(2), 412-441.

[30] Yuan, Chaoxia*, Wu, Yang, and Fang, Fang. Model averaging for generalized linear models in fragmentary data prediction. Statistical Theory and Related Fields, 2022, 6(4), 344-352.

[29] Fang, Fang*, Yang, Qiwei, and Tian, Wenling. Cross-validation for selecting the penalty factor in least squares model averaging. Economics Letters, 2022, 217, Article 110683.

[28] Fang, Fang*, Li, Jialiang, and Xia, Xiaochao. Semiparametric model averaging prediction for dichotomous response. Journal of Econometrics, 2022, 229, 219-245.

[27] Yuan, Chaoxia, Fang, Fang, and Ni, Lyu*. Mallows model averaging with effective model size in fragmentary data prediction. Computational Statistics & Data Analysis, 2022, 173, Article 107497.

[26] Fang, Fang, Zhao, Jiwei, Ahmed, Ejaz, and Qu, Annie*. A weak-signal-assisted procedure for variable selection and statistical inference with an informative subsample. Biometrics, 2021, 77, 996-1010.

[25] Chen, Ji, Shao, Jun, and Fang, Fang*. Instrument search in pseudo likelihood approach for nonignorable nonresponse. Annals of the Institute of Statistical Mathematics, 2021, 73, 519-533.

[24] Wang, Lei, Shao, Jun, and Fang, Fang*. Propensity model selection with nonignorable nonresponse and instrument variable. Statistica Sinica, 2021, 31, 647-672.

[23] Fang, Fang* and Liu, Minhan. Limit of the optimal weight in least squares model averaging with non-nested models. Economics Letters, 2020, 196, Article 109586.

[22] Fang, Fang and Yu, Zhou*. Model averaging assisted sufficient dimension reduction. Computational Statistics & Data Analysis, 2020, 152, Article 106993.

[21] Ni, Lyu, Fang, Fang*, and Shao, Jun. Feature screening for ultrahigh dimensional categorical data with covariates missing at random. Computational Statistics & Data Analysis, 2020, 142, Article 106824.

[20] Fang, Fang*, Lan, Wei, Tong, Jingjing, and Shao, Jun. Model averaging for prediction with fragmentary data. Journal of Business & Economic Statistics, 2019, 37, 517-527.

[19] Fang, Fang, Li, Jialiang* and Wang, Jingli. Optimal model averaging estimation for correlation structure in generalized estimating equations. Communications in Statistics - Simulation and Computation, 2019, 48, 1574-1593.

[18] Chen, Ji and Fang, Fang*. Semiparametric likelihood for estimating equations with nonignorable nonresponse by nonresponse instrument. Journal of Nonparametric Statistics, 2019, 31, 420-434.

[17] Fang, Fang* and Chen, Yuanyuan. A new approach for credit scoring by directly maximizing the Kolmogorov-Smirnov statistic. Computational Statistics & Data Analysis, 2019, 133, 180-194.

[16] Fang, Fang*, Yin, Xiangju, and Zhang, Qiang. Divide and conquer algorithms for model averaging with massive data. Journal of Systems Science and Mathematical Sciences, Chinese Series, 2018, 38, 764-776. An invited paper for the special issue on model averaging.

[15] Fang, Fang*, and Ni, Lyu. Variable screening with missing covariates: A discussion of Statistical inference for nonignorable missing data problems: A selective review by Niansheng Tang and Yuanyuan Ju. Statistical Theory and Related Fields, 2018, 2, 134-136.

[14] Fang, Fang, Zhao, Jiwei, and Shao, Jun*. Imputation-based adjusted score equations in generalized linear models with nonignorable missing covariate values. Statistica Sinica, 2018, 28, 1677-1701.

[13] Chen, Ji, Fang, Fang and Xiao, Zhiguo*. Semiparametric inference for estimating equations with nonignorable missing covariates. Journal of Nonparametric Statistics, 2018, 30, 796-812.

[12] Ni, Lyu, Fang, Fang*, and Wan, Fangjiao. Adjusted Pearson Chi-Square feature screening for multi-classification with ultrahigh dimensional data. Metrika, 2017, 80, 805-828.

[11] Fang, Fang, and Shao, Jun*. Model selection with nonignorable nonresponse. Biometrika, 2016, 103, 861-874.

[10] Fang, Fang*, Fan, Xiaoyin, and Zhang, Ying. Estimation of response from longitudinal binary data with nonignorable missing values in migraine trials. Contemporary Clinical Trials Communications, 2016, 4, 90-98.

[9] Fang, Fang, and Shao, Jun*. Iterated imputation estimation for generalized linear models with missing response and covariate values. Computational Statistics & Data Analysis, 2016, 103, 111-123.

[8] Ni, Lyu, and Fang, Fang*. Entropy based model free feature screening for ultrahigh dimensional multiclass classification. Journal of Nonparametric Statistics, 2016, 28, 515-530.

[7] Fang, Fang*. Regression analysis with nonignorably missing covariates using surrogate data. Statistics and Its Interface, 2016, 9, 123-130.

[6] Fang, Fang, Hong, Quan, and Shao, Jun*. Empirical likelihood estimation for samples with nonignorable nonresponse. Statistica Sinica, 2010, 20, 263-280.

[5] Fang, Fang, Hong, Quan, and Shao, Jun*. A pseudo empirical likelihood approach for stratified samples with nonresponse. The Annals of Statistics, 2009, 37, 371-393.

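[4] Fang, Fang. Capital account liberalization and development opportunities for banking business. Shanghai Finance (上海金融), 2013, No. 1, 98-101.

[3] Li, Lin, Jiang, Bo, and Fang, Fang. Boundaries, returns and risks of integrated operation of commercial banks. Financial Theory and Practice (金融理论与实践), 2012, 396(7), 13-18.

[2] Li, Lin, Fang, Fang, and Li, Xiaowei. Transformation of regional commercial banks under interest rate liberalization. China Finance (中国金融), 2012, No. 19, 70-72.

[1] Fang, Fang. Strategies for commercial banks in response to the "big data" trend. New Finance (新金融), 2012, 286(12), 25-28.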

Other Articles:

[1] Fang, Fang and Lou, Zhilan. A Conversation with Jun Shao. ICSA Bulletin, 2015, 27, 69-77.

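Books:

[3] Statistical Analysis and Modeling of Multi-Source Data (《多源数据的统计分析与建模》), by Fang Fang, Lyu Ni and Jun Shao. Shanghai Jiao Tong University Press, 2024.

[2] Adventures in the Kingdom of Statistics (《统计王国奇遇记》), by Fang Fang. East China Normal University Press, 2020.

[1] Research on the Overseas Development Strategy of China's Banking Industry (《中国银行业海外发展战略研究》), compiled by the Development Research Committee of the China Banking Association. Contributing author.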

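Granted Patents:

[6] A method for predicting the distribution range of widespread species, Patent No. ZL 2020 1 0688315.5, granted Oct 24, 2023. 5/12.

[5] An aviation data matching method accounting for recording deviations, Patent No. ZL 202211505325.6, granted Jul 21, 2023. 3/5.

[4] A method for estimating the detection rate, occupancy rate and density of amphibians, Patent No. ZL 2020 1 0747266.8, granted Jul 4, 2023. 9/9.

[3] An aviation risk evaluation method, apparatus and computer device, Patent No. ZL 202010399853.2, granted Jun 13, 2023. 1/6.

[2] An operational risk quantification method, operational risk evaluation method and apparatus, Patent No. ZL 201811238049.5, granted May 23, 2023. 8/12.

[1] A bird density estimation method, Patent No. ZL202010671298.4, granted Mar 22, 2022. 8/9.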

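Honors and Awards

1. Top-Notch Project, Shanghai Oriental Talent Program, Nov 2023.

2. "Statistical Analysis Methods for Several Types of Complex Data", Second Prize, Shanghai Natural Science Award, Apr 2020, third contributor.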

------------------------------------------------------------

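NSFC Key Program, "Fusion, Feature Extraction and Analysis Methods for Multi-Source Heterogeneous Data", Grant No. 11831008. Principal Investigator: Jun Shao.

Project Background:

With the continuing development of data collection technology and computer storage capacity, big data keeps emerging from application areas such as public administration, e-commerce, financial services and healthcare, and society has entered an era of a digital economy driven by big data. In this wave of big data, the data we need to process and analyze have long since shifted from a single source to multiple sources. The growing number of data sources creates unprecedented opportunities to study and predict the behavior of individuals and groups, bringing substantial social and economic benefits. At the same time, these additional data sources pose new challenges for analysis and modeling. The diversity of multi-source data and the complexity of modeling it leave traditional statistical modeling methods struggling, and new theory and methods are urgently needed.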

Research Results:

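The project team mainly studies statistical analysis and modeling methods for two broad classes of multi-source data. The first is the modeling and prediction of multi-source fragmentary data, where the covariates come from different sources. In this setting few sample units have data from all sources, so the final modeling sample is "fragmentary". Because the missing proportion is high and the missing patterns are complex, traditional missing-data methods struggle to handle fragmentary data. The second is statistical inference that makes effective use of multi-source external data. Here the "internal data" of primary interest are small, so inference based on them alone is inefficient, but "external data" from many other sources are also available and can be used to make inference on the internal-data parameters more efficient. However, incomplete observation of the external data, the possible unavailability of individual-level data, and data heterogeneity make the use of external data challenging. For these two classes of problems, we have proposed a series of tools and methods for multi-source data built on model averaging, generative adversarial networks, generalized estimating equations, empirical likelihood and related techniques.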
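To make the fragmentary-data setting more concrete, here is a minimal sketch, not the project's published estimator. Assumptions: covariates are numeric with missing values coded as NaN; each distinct missingness pattern in the training data defines one candidate linear submodel that uses only the covariates observed in that pattern; each submodel is fitted by least squares on every row that observes all of its covariates; and the applicable submodels are combined with equal weights. The function names (`fit_ols`, `response_patterns`, `fragmentary_average_predict`) are hypothetical. Model-averaging methods for fragmentary data typically choose the weights from the data (for example by cross-validation); equal weights are used here purely for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def response_patterns(X):
    """Distinct missingness patterns in X (True = covariate observed)."""
    return {tuple(bool(v) for v in row) for row in ~np.isnan(X)}


def fragmentary_average_predict(X, y, x_new):
    """Equal-weight average of submodel predictions for one new case.

    Each observed pattern defines a submodel using only the covariates
    observed in that pattern. The submodel is fitted on all training rows
    that observe those covariates, and it is applied to x_new only if
    x_new also observes all of them.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    observed = ~np.isnan(X)
    new_observed = ~np.isnan(x_new)

    preds = []
    for pattern in response_patterns(X):
        cols = np.array(pattern, dtype=bool)
        if np.any(cols & ~new_observed):
            continue  # submodel needs a covariate that x_new lacks
        rows = np.all(observed[:, cols], axis=1)  # rows usable for this submodel
        coef = fit_ols(X[rows][:, cols], y[rows])
        preds.append(coef[0] + x_new[cols] @ coef[1:])
    if not preds:
        raise ValueError("no candidate submodel applies to x_new")
    # Hypothetical simplification: equal weights instead of data-driven weights.
    return float(np.mean(preds))
```

For example, with training rows in which the second covariate is missing for some units, a new case that observes only the first covariate is predicted by the first-covariate submodel alone, while a case observing both covariates averages the predictions of both submodels. Fitting each submodel on all rows that observe its covariates, rather than only on rows with exactly that pattern, lets the smaller submodels borrow data from the more complete rows.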

FragmGAN: Generative adversarial nets for fragmentary data imputation and prediction.

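We consider data of the following type, which is very common in credit scoring with multi-source big data.

Main method: imputation and prediction of the data based on generative adversarial networks.

Implementation algorithm: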
