Research on Automatic Mining of Online Experts' Economic Viewpoints

Published: 2018-02-15 04:04

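Keywords: unstructured data; text data; natural language processing techniques; vector space model　Source: Capital University of Economics and Business, 2017 master's thesis　Type: degree thesis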

[Abstract]: In the pre-Internet era, data acquisition was constrained by time and space. Networks were underdeveloped, so data was hard to obtain, slow to arrive, and available from few sources. Data processing was likewise limited mostly to structured data, and unstructured data could not yet be handled well. Today, with the rapid development of the Internet, information can be obtained more broadly and more quickly, free of the constraints of time and space. As unstructured data accounts for a growing share of all data, traditional structured data alone can no longer meet analytical needs. The bottleneck in data acquisition has therefore shifted to information integration: faced with massive volumes of data, especially unstructured data such as text, audio, and images, how to integrate the useful information it contains has become an important problem. Information integration currently relies mostly on manual effort, so automating it is a challenge. Taking economic-situation data as an example, this thesis acquires data with a Python urllib crawler; filters the data with an expert-opinion screening formula; uses descriptive statistics to gain an overall picture of the data; applies natural language processing techniques to process the text data; clusters the texts using the vector space model; and integrates the results with a self-built automatic article-writing system. Together these steps form an automated workflow for processing text data, which is intended to support decision-making, improve efficiency, and make everyday life more convenient.
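[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1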
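
The abstract names the vector space model as the basis for text clustering but does not give the thesis's actual implementation. The following is only a minimal sketch of that general technique, under stated assumptions: TF-IDF weighting, cosine similarity, and a single-pass threshold clustering step are choices made here for illustration, not details confirmed by the source. The function names, the threshold value, and the sample tokens are all hypothetical, and the English tokens stand in for the output of a Chinese word-segmentation step.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps weights positive even for ubiquitous terms.
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold=0.3):
    """Assign each vector to the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


# Hypothetical, pre-tokenized expert opinions (illustrative data only).
opinions = [
    ["growth", "slows", "investment", "weak"],
    ["investment", "weak", "growth", "pressure"],
    ["inflation", "rises", "prices", "food"],
    ["food", "prices", "push", "inflation"],
]
clusters = single_pass_cluster(tf_idf_vectors(opinions))
print(clusters)
```

With this toy input, the two growth-related opinions fall into one cluster and the two inflation-related opinions into another, because documents with no shared terms have a cosine similarity of zero. A real pipeline would likely need a proper segmenter, stop-word removal, and a more robust clustering algorithm.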



Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1512381.html
