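Design and Implementation of a Financial Data Classification System Based on Logistic Regression
Published: 2018-04-05 01:23
Topic: text classification; Approach: logistic regression; Source: Shandong University, master's thesis, 2017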
[Abstract]: Over the past half century, as society has entered the big data era, computer and multimedia technology has developed rapidly. At the same time, the amount of valuable information stored in image and text databases keeps growing. Financial news, announcements, and bulletins are an important reference for people who follow market changes or invest actively. However, financial data come from many sources and in many varieties, which makes it hard for readers to go straight to the information they care about most. Managing and retrieving massive financial data efficiently, accurately, and quickly has therefore become a major challenge, and classifying financial data is the core problem in meeting it. A financial data classification platform collects financial data from the Internet in real time and quickly assigns each item to the correct category, so it can process massive amounts of text efficiently. The classification system is a core component of such a platform: it processes sample data, trains a classifier on it, and then applies the classifier to real-time data, which provides the platform's classification function. Against this background, the problem addressed in this thesis is the fast, efficient automatic classification of massive financial text data. A survey of market demand for financial data categories fixed 18 commonly used categories as the classification targets. The main work of the thesis is the design and implementation of the financial data classification system. Because financial data contain many proper nouns and have relatively distinctive features, the thesis uses the logistic regression algorithm to construct the classifier. The sample data are preprocessed and split into training samples and test samples. The training samples go through feature extraction, feature weighting, and feature vectorization and are then passed to the logistic regression model to train the classifier. The test samples are used to measure the classifier's performance, the classifier is further optimized according to the evaluation results, and it is finally applied to the classification of real financial data. The system can be deployed on financial web portals and data classification platforms to classify financial news, announcements, bulletins, and other text data automatically. It can also help companies or individuals manage large collections of financial text by letting users quickly locate the content they most want. The resulting classification can serve as the basis for financial information retrieval and data mining, further improving the utilization of information. The system proposed in this thesis has been tested and shows good classification performance and promising application prospects.
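The training flow described in the abstract (feature extraction, feature weighting, vectorization, logistic regression training, then testing) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say which weighting scheme or multiclass strategy is used, so TF-IDF weighting and softmax (multinomial) logistic regression are assumed here. The whitespace tokenizer, the three toy categories, the sample sentences, and all function names are hypothetical; the thesis works with 18 categories, and Chinese text would need a word segmenter.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokens stand in for real preprocessing/segmentation.
    return text.lower().split()

def build_vocab_idf(docs):
    # Feature extraction: every distinct training token becomes a feature.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    n = len(docs)
    # Feature weighting: smoothed inverse document frequency (assumed scheme).
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in vocab}
    return vocab, idf

def vectorize(doc, vocab, idf):
    # Feature vectorization: L2-normalized TF-IDF vector; unknown tokens are dropped.
    counts = Counter(t for t in tokenize(doc) if t in vocab)
    vec = [0.0] * len(vocab)
    for term, c in counts.items():
        vec[vocab[term]] = c * idf[term]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(x, W, b):
    return [sum(w_j * x_j for w_j, x_j in zip(W[k], x)) + b[k] for k in range(len(W))]

def train(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    # Stochastic gradient descent on the L2-regularized cross-entropy loss.
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(class_scores(x, W, b))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, x_j in enumerate(x):
                    W[k][j] -= lr * (err * x_j + l2 * W[k][j])
                b[k] -= lr * err
    return W, b

def predict(x, W, b):
    scores = class_scores(x, W, b)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy data: 0 = stock market, 1 = monetary policy, 2 = company announcement.
train_docs = [
    ("shares rally as stock index hits record", 0),
    ("stock market shares fall on earnings", 0),
    ("central bank raises interest rate", 1),
    ("interest rate cut by central bank", 1),
    ("company announces dividend payout", 2),
    ("board announces dividend and buyback", 2),
]
test_docs = [
    ("stock shares rise", 0),
    ("bank holds interest rate", 1),
    ("company dividend announced", 2),
]

vocab, idf = build_vocab_idf([text for text, _ in train_docs])
X_train = [vectorize(text, vocab, idf) for text, _ in train_docs]
y_train = [label for _, label in train_docs]
W, b = train(X_train, y_train, n_classes=3)

predictions = [predict(vectorize(text, vocab, idf), W, b) for text, _ in test_docs]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test_docs)) / len(test_docs)
print(predictions, accuracy)
```

On a real corpus, the held-out accuracy (or per-category precision and recall) computed in the last step is what would guide the optimization round the abstract mentions, for example adjusting the regularization strength or the feature set.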
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.52
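[Similar literature]
Related journal articles (top 1)
1 郭通; 黃焱; 張白愚. Research on a DVB-S2 Data Classification System Based on ARM and FPGA [J]. 電子技術應用, 2009, No. 08
Related master's theses (top 2)
1 王蕾. Design and Implementation of a Data Classification System [D]. Huazhong University of Science and Technology, 2014
2 劉展. Design and Implementation of a Financial Data Classification System Based on Logistic Regression [D]. Shandong University, 2017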
Article ID: 1712596
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1712596.html