Analysis and Design of a Negative-Information Monitoring System for Listed Companies

Published: 2018-03-13 21:35

Topic: search engines; Focus: web crawlers; Source: Fudan University, 2013 master's thesis; Type: degree thesis

[Abstract]: As the Internet has grown, people have come to realize how difficult it is to find useful information in large volumes of data. Against this background, data mining has developed rapidly since the 1990s. The field draws on techniques from several disciplines, including machine learning and statistical analysis, and helps people extract and study useful information from vast amounts of data so that they can make decisions scientifically and objectively. The system described here uses data mining techniques and can be applied to many kinds of websites to collect relevant negative information. It is designed specifically for the Eastmoney Guba (東方財富股吧) stock forum and collects negative information about a given listed company from that forum. The system implements the full pipeline of web page collection, preprocessing, word segmentation, text sentiment-orientation analysis, and indexed retrieval. Its main functions are:

1. Web page collection: download pages from the Eastmoney Guba forum and save them to a local folder.
2. Web page preprocessing: remove the useless tags from each page and extract the body text.
3. Chinese word segmentation: as a prerequisite for data mining, segment the extracted body text into words before negative information is judged.
4. Negative information judgment: use text classification to identify negative information in the text, and save the texts that contain it.
5. User retrieval: the user enters a listed company's stock code and obtains the negative posts about that company from the Eastmoney Guba forum.

Beyond the system design and the complete implementation of these functions, the thesis also analyzes and compares several text classification algorithms and adopts the one with the higher accuracy for the negative information judgment function. It closes by summarizing the research results and outlining the related technologies and further research work.
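The abstract does not say which segmentation or classification algorithms the thesis finally adopts, so the following is only a minimal sketch of functions 2 to 5 under stated assumptions: forward maximum matching stands in for the segmenter, a small multinomial naive Bayes classifier stands in for the text classifier, and a dictionary keyed by stock code stands in for the index. Page downloading (function 1) is omitted, and the word list, training samples, stock codes, and forum posts are all made up for illustration.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Function 2: keep visible text, drop tags and script/style bodies."""

    def __init__(self):
        super().__init__()
        self.parts, self.skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def segment(text, vocab, max_len=4):
    """Function 3: forward maximum matching against a word list (assumed method)."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # a single character always matches
                if not piece.isspace():
                    words.append(piece)
                i += size
                break
    return words


class NaiveBayes:
    """Function 4: multinomial naive Bayes with add-one smoothing (assumed method)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, words, label):
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, words):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def build_index(posts, classifier, vocab):
    """Function 5: map each stock code to the texts judged negative."""
    index = defaultdict(list)
    for stock_code, html in posts:
        text = extract_text(html)
        if classifier.predict(segment(text, vocab)) == "negative":
            index[stock_code].append(text)
    return index


# Made-up word list, training samples, and forum posts (all hypothetical).
vocab = {"公司", "業績", "虧損", "下跌", "增長", "利好"}
clf = NaiveBayes()
for sample, label in [("公司業績虧損", "negative"), ("股價下跌虧損", "negative"),
                      ("公司業績增長", "positive"), ("利好增長", "positive")]:
    clf.train(segment(sample, vocab), label)

posts = [("999998", "<script>var x = 1;</script><p>業績下跌虧損</p>"),
         ("999998", "<p>業績增長利好</p>"),
         ("999999", "<div>公司<b>虧損</b></div>")]
index = build_index(posts, clf, vocab)
print(index["999998"])  # negative posts retrieved by stock code
```

Keeping each stage as a separate function mirrors the way the abstract lists the functions, so any one stage (for example the classifier) could be swapped for a more accurate method without touching the others.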
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.1



Article ID: 1608225


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1608225.html
