
Research on a Classifier for an Agricultural Information Search Engine

Published: 2018-03-23 12:02

Topic: naive Bayes. Focus: text information classification. Source: Northeast Agricultural University, master's thesis, 2015.


[Abstract]: With the rapid development of the Internet, society has entered an era of information explosion. The volume of agricultural knowledge available online has surged, which makes it easier for agricultural practitioners to look up information. Knowledge means wealth, and practitioners draw valuable information from these sources. A massive body of agricultural information, however, does not mean that the information a user needs can be found quickly and effectively, so fast localization and category-based retrieval of fine-grained agricultural information are necessary. This thesis takes the classifier of an agricultural information search engine as its research object. It surveys the current state of text classifiers and their development in China and abroad. Building on existing work on feature extraction, training samples, and classification algorithms, it starts from the way classification features are extracted from agricultural texts and proposes a feature extraction method tailored to them. On the basis of these features it builds an agricultural text training corpus. Because classification algorithms differ in effectiveness, it classifies agricultural information with an improved and optimized naive Bayes classifier, and it designs and implements a classifier system for an agricultural information search engine. No two leaves in the world are exactly alike: every object is unique, and each text likewise has its own distinguishing features by which it can be recognized and classified.
The thesis discusses, implements, and experimentally compares four text feature extraction methods: information gain, mutual information, the chi-square statistic, and document frequency. It proposes a feature extraction method for agricultural texts that is based on document frequency and incorporates TF-IDF weighting, the vector space model, and cosine similarity. On this basis, following the principles of agricultural information classification and according to how distinguishable the texts are, it selects texts for each agricultural category and builds the agricultural text training corpus. No classification algorithm is superior in every case; each has its own classification bias, and a classifier performs differently on different texts. The thesis experimentally compares four algorithms on agricultural texts: decision trees, K-nearest neighbors, support vector machines, and naive Bayes. It then adopts and improves the naive Bayes classifier in two respects. First, the formula is changed: the binary (two-valued) model is replaced by a multinomial model, the multinomial formula is established, and the experimental results are compared. Second, the deployment is changed: the classifier is distributed across multiple computers, results are ranked with a Top-N algorithm, and the experimental results are again compared. Drawing on several sets of comparative classification experiments and on software design theory, the thesis combines the improved naive Bayes algorithm with the training corpus to design and implement the classifier system, and it tests the system on agricultural text classification to obtain result data.
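As an illustration of the document-frequency selection, TF-IDF weighting, vector space model, and cosine similarity named above, the following is a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact TF-IDF variant, the document-frequency threshold, or the tokenization, so the function names, the `min_df` parameter, the unsmoothed `log(N / df)` weighting, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many tokenized documents contain it."""
    return Counter(term for doc in docs for term in set(doc))


def select_by_document_frequency(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (assumed threshold)."""
    return {term for term, n in document_frequency(docs).items() if n >= min_df}


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    n_docs = len(docs)
    df = document_frequency(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times unsmoothed inverse document frequency.
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, already tokenized documents (hypothetical data).
docs = [
    ["rice", "pest", "control"],
    ["rice", "yield", "fertilizer"],
    ["pig", "feed", "disease"],
]
vecs = tfidf_vectors(docs)
print(select_by_document_frequency(docs))  # terms found in two or more documents
print(cosine(vecs[0], vecs[1]))            # the two crop texts share "rice"
print(cosine(vecs[0], vecs[2]))            # no shared terms
```

In this sketch, texts that share vocabulary receive a positive similarity and texts with no common terms receive zero, which is the property a vector space model relies on when grouping texts by category.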
The experimental results show that the improved naive Bayes classifier classifies more accurately and faster, and that the resulting system is a practical and reliable classifier for an agricultural information search engine. In summary, working from agricultural texts crawled by an agricultural information search engine, the thesis studies the text classifier from three angles: feature extraction, training on agricultural texts, and the classification algorithm. Through experimental comparison it proposes a feature extraction method for agricultural information classification, builds an agricultural text training corpus, improves the naive Bayes classifier algorithmically, and deploys the classifier system in a distributed fashion, thereby improving and optimizing the agricultural text classifier. The thesis provides a theoretical basis and a basic experimental platform for agricultural text classification, and its results can also be carried over into practical applications.
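The multinomial form of naive Bayes that the abstract adopts in place of the binary model can be sketched as follows. This is a generic textbook formulation, not the thesis's own formula: the Laplace smoothing constant `alpha`, the class name `MultinomialNaiveBayes`, the category labels, and the toy training texts are assumptions introduced for illustration. A multinomial model scores a text by how often each term occurs, whereas a binary model only records whether a term is present.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over term counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed; 1.0 is Laplace smoothing)

    def fit(self, docs, labels):
        self.n_docs = len(labels)
        self.class_docs = Counter(labels)        # documents per class
        self.term_counts = defaultdict(Counter)  # term occurrences per class
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_scores(self, doc):
        """Return log P(class) + sum of log P(term | class) for each class."""
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_class_docs in self.class_docs.items():
            counts = self.term_counts[label]
            total = sum(counts.values())
            score = math.log(n_class_docs / self.n_docs)
            for term in doc:  # repeated terms contribute once per occurrence
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                score += math.log(
                    (counts[term] + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy, already tokenized training texts (hypothetical data and labels).
train_docs = [
    ["rice", "pest", "rice"],
    ["rice", "fertilizer", "yield"],
    ["pig", "feed", "disease"],
    ["cattle", "feed", "pig"],
]
train_labels = ["crop", "crop", "livestock", "livestock"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["rice", "rice", "yield"]))
print(model.predict(["pig", "feed"]))
```

Working in log space avoids numeric underflow on long texts, and the smoothing term keeps a class from being ruled out by a single term it never contained during training.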

[Degree-granting institution]: Northeast Agricultural University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP391.1


Article ID: 1653376


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1653376.html

