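Research on Measuring Population Mobility Based on Mobile Communication Big Data
Topics: floating population size + mobile communication user data; Source: Shanxi University of Finance and Economics (山西财经大学), 2017 master's thesis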
[Abstract]: Population mobility is an important indicator of economic and social development. It measures the relatively long-term free migration and residence away from home that arises when people move in pursuit of economic and social goals. Under the government's statistical definition, the floating population consists of people who, under China's household registration (hukou) system, have left their place of registration to live elsewhere; however, there is still no clear, precise, and uniform definition. By the end of 2016, China's floating population totaled about 245 million. Economic growth is an important driver of population mobility. Because the floating population is heterogeneous in composition, its movement cycles are uncertain, and its movement trajectories change frequently, China's existing population mobility statistics suffer from many problems: statistical definitions are inconsistent and data quality is uneven, so the statistics cannot meet the needs of government and society, and both the statistical methods and the related systems for the floating population urgently need improvement.
Drawing on real-time call record data from mobile communication operators and grounded in the behavioral characteristics of the population, this thesis judges and measures population mobility from the user behavior reflected in mobile communication big data. After further refining the definition of the floating population, it designs a method for estimating the size of the floating population that combines two components: a floating population identification model built with machine learning algorithms, and a population mobility measurement model built on capture-recapture sampling. For the identification model, feature variables are constructed by analyzing how the communication behavior of floating-population users differs from that of local users, and the AUC-RF method is used for feature selection. On this basis, five algorithms (decision tree, Bagging, random forest, support vector machine, and artificial neural network) are used to build candidate models, which are evaluated and compared under multiple evaluation criteria. The random forest model, which shows the best classification performance and generalization ability, is chosen as the final identification model and used to classify the unlabeled samples in the sample set. For the capture-recapture model of floating population size, the empirical results show that the estimation method can estimate a region's floating population size fairly accurately and reliably.
The thesis therefore concludes that the proposed measurement method can be used in parallel with traditional floating population surveys, with each complementing and corroborating the other. Building on mobile communication big data, it seeks to explore a big-data-based statistical method and system for improving China's floating population surveys: relying on mobile communication records from the same period and applying sound statistical inference to estimate and extrapolate the size and characteristics of the floating population, and thereby obtaining more accurate and complete demographic data. The empirical tests show that the method is low-cost, fast, and fairly accurate, which makes it well suited to improving and extending China's current statistical system.
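The abstract does not say which capture-recapture estimator the thesis uses or how the two sampling occasions are defined. As a rough illustration only, the sketch below applies the classic two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant to two hypothetical sets of user IDs that an identification model has flagged as floating population in two observation periods. The function names, the set-based representation, and the toy numbers are all assumptions made for this example, not details taken from the thesis.

```python
from typing import Set, Tuple


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-sample estimate N = n1 * n2 / m.

    n1: individuals seen on the first occasion
    n2: individuals seen on the second occasion
    m:  individuals seen on both occasions
    """
    if m <= 0:
        raise ValueError("at least one individual must appear in both samples")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; also defined when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def estimate_from_samples(first: Set[int], second: Set[int]) -> Tuple[float, float]:
    """Estimate population size from two sets of (hypothetical) user IDs."""
    m = len(first & second)  # users observed on both occasions
    return (
        lincoln_petersen(len(first), len(second), m),
        chapman(len(first), len(second), m),
    )


if __name__ == "__main__":
    # Toy data: 1200 users flagged in period 1, 1000 in period 2, 300 in both.
    period_1 = set(range(0, 1200))
    period_2 = set(range(900, 1900))
    lp, ch = estimate_from_samples(period_1, period_2)
    print(f"Lincoln-Petersen: {lp:.1f}")
    print(f"Chapman:          {ch:.1f}")
```

Both estimators assume a closed population and equal capture probability across individuals. With call-record data these assumptions are easily violated (for example, heavy callers are more likely to be observed), so the numbers should be read as illustrative rather than as a reproduction of the thesis's results.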
[Degree-granting institution]: Shanxi University of Finance and Economics
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: C924.2
Article ID: 2079347
Article link: http://sikaile.net/jingjilunwen/jiliangjingjilunwen/2079347.html