Research and Implementation of a Bond Knowledge Service Based on Heterogeneous Information
Published: 2018-03-01 00:13
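Keywords: heterogeneous information; retrieval result evaluation method; ontology rule adaptation; imbalanced classification. Source: Harbin Institute of Technology, master's thesis, 2013. Document type: degree thesis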
[Abstract]: With the rapid development of the financial industry, online knowledge service platforms for financial products are gaining acceptance among investors. Taking bonds as an example, the large volume of heterogeneous bond information available online provides a data source for building an automated bond knowledge service platform. This thesis therefore studies how to acquire heterogeneous information about financial products, how to process it, how to classify and fuse it, and how to apply the integrated result in a bond knowledge service platform. The main contents are as follows.

Acquisition of heterogeneous bond information: this covers acquiring and preprocessing both structured bond data and unstructured web page data. The data come from two sources, fixed financial websites and search engines. For the search engine source, the thesis proposes RDMDRR, a search-engine-based model for evaluating retrieval results in a specific domain, which improves the accuracy and coverage of bond announcement retrieval.

Extraction of heterogeneous bond information: the WHISK algorithm is first used to build an ontology rule base of bond features. An ontology rule adaptation method then prunes the learned rules, and the refined rule base is applied to extracting bond entity information, which supplies the data for the bond knowledge service.

Classification and fusion of bond information: depending on the bond category, either rule-based or machine learning methods are used for classification. Because the classes are imbalanced, the thesis proposes a new feature weighting method that improves on the original TFIDF and applies it to imbalanced classification, raising the recognition rate of minority classes. The accurately classified bond information is then fused with other bond information to form a more complete bond knowledge base.

After these three stages of processing and fusion, the heterogeneous information becomes complete bond knowledge, which is integrated into the bond knowledge service platform. Experiments show that the platform changes the knowledge expansion mode of traditional knowledge service platforms, that the accuracy and recall of knowledge acquisition improve at each processing stage, and that the platform is well received by bond investors.
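The abstract does not state the improved TFIDF formula, so the following is only a rough sketch of the general idea of class-aware term weighting for imbalanced data. It assumes a hypothetical scaling factor, `1 + spread`, where `spread` is the gap between the highest and lowest per-class document rates of a term. The function name `class_aware_tfidf`, the factor, and the toy documents are all illustrative assumptions, not the method from the thesis.

```python
import math
from collections import Counter, defaultdict


def class_aware_tfidf(docs, labels):
    """Return one {term: weight} dict per tokenised document.

    weight = tf * idf * (1 + spread). The spread factor is an
    illustrative assumption, not the formula used in the thesis.
    Per-class rates are divided by class size, so a term that is
    typical of a small class is not drowned out by the majority class.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()                    # term -> number of docs containing it
    class_doc_freq = defaultdict(Counter)   # term -> class -> number of docs
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[term][label] += 1

    def spread(term):
        # Fraction of each class's documents that contain the term.
        rates = [class_doc_freq[term][c] / class_sizes[c] for c in class_sizes]
        return max(rates) - min(rates)

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)
        weighted.append({
            term: (count / len(tokens))
            * math.log(1 + n_docs / doc_freq[term])
            * (1 + spread(term))
            for term, count in tf.items()
        })
    return weighted


# Toy corpus: three majority-class documents and one minority-class document.
docs = [
    ["bond", "coupon", "rate"],
    ["bond", "coupon", "maturity"],
    ["bond", "coupon", "issuer"],
    ["bond", "convertible", "shares"],
]
labels = ["ordinary", "ordinary", "ordinary", "convertible"]
weights = class_aware_tfidf(docs, labels)
```

In this toy corpus, "bond" appears in every document of both classes, so its spread is zero and it keeps its plain TF-IDF weight, while "convertible" appears only in the minority class and is boosted. A real system would feed such weights into a classifier and evaluate minority-class recall separately.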
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.3
Article ID: 1549567
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1549567.html