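Named Entity Recognition for Crops, Pests and Diseases, and Pesticides Based on Conditional Random Fields
Posted: 2018-05-12 13:01
Topics: pests and diseases + pesticides. Source: Transactions of the Chinese Society for Agricultural Machinery, 2017, Issue S1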
[Abstract]: Online agricultural technology Q&A platforms currently rely entirely on human experts to answer questions. As a result, responses are slow and answer quality is hard to guarantee. Answering agricultural questions automatically, and building an agricultural knowledge base to support this, requires extracting "crop-pest-pesticide" named-entity triples from existing Q&A data. Research on Chinese named entity recognition in agriculture is scarce, and reported accuracy is low. Based on the characteristics of crop, pest and disease, and pesticide named entities in agricultural Q&A data, this paper proposes a recognition method based on conditional random fields (CRF). The dataset is first normalized into a consistent format and automatically segmented into words. Each segmented word is then automatically annotated with the following features: whether it contains a specific defining word, whether it contains a specific radical, whether it is a numeral or quantifier, whether it is a specific left or right boundary word, and its part of speech. A CRF model trained on the annotated data classifies each word. It decides whether the word belongs to one of the three entity classes (crop, pest and disease, pesticide) and identifies the word's position within a compound named entity. This recognizes all three entity classes, from which the associated triples can be constructed automatically. Choosing the feature combination and tuning the context window size experimentally improved recognition accuracy and reduced model training time. Recognition accuracy reached 97.72% for crops, 87.63% for pests and diseases, and 98.05% for pesticides, a significant improvement over existing methods.
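The pipeline described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices here are hypothetical:

- the keyword lists, radical character sets and boundary words;
- the jieba/ICTCLAS-style POS tags (m = numeral, q = measure word);
- the BIO position tags (B-/I-/O);
- the co-occurrence rule for building triples.

The sketch shows three things. First, it turns each segmented word into a feature dict. Second, it copies in neighbours' features over a ±`window` context. Third, it merges predicted position tags back into entities and crop-pest-pesticide triples. It does not train a CRF. The lists of feature dicts it produces match the input format that CRFsuite-based toolkits such as sklearn-crfsuite accept.

```python
# Sketch of CRF feature extraction and post-processing for crop / pest / pesticide NER.
# All lexicons below are HYPOTHETICAL examples, not the word lists used in the paper.
PEST_KEYWORDS = ("病", "虫", "螟", "蚜", "螨", "蛾")          # hypothetical pest/disease defining words
PESTICIDE_KEYWORDS = ("剂", "酯", "唑", "酮", "灵", "菌素")   # hypothetical pesticide defining words
INSECT_RADICAL_CHARS = set("虫蚜螟螨蛾蚁蝇蛀")                # hand-picked characters with the 虫 radical
PLANT_RADICAL_CHARS = set("稻秧秸菜茄葱蒜莓萝树桃梨橘")       # hand-picked characters with 禾/艹/木 radicals
LEFT_BOUNDARY_WORDS = {"喷施", "防治", "使用"}                # hypothetical words that tend to precede an entity
RIGHT_BOUNDARY_WORDS = {"怎么办", "怎么治", "危害"}           # hypothetical words that tend to follow an entity


def token_features(word, pos):
    """Features of one segmented word; `pos` is assumed to use a jieba/ICTCLAS-style tag set."""
    return {
        "word": word,
        "pos": pos,
        "pest_keyword": any(k in word for k in PEST_KEYWORDS),
        "pesticide_keyword": any(k in word for k in PESTICIDE_KEYWORDS),
        "insect_radical": any(ch in INSECT_RADICAL_CHARS for ch in word),
        "plant_radical": any(ch in PLANT_RADICAL_CHARS for ch in word),
        "is_quantity": pos in ("m", "q") or word.isdigit(),  # m = numeral, q = measure word
        "left_boundary": word in LEFT_BOUNDARY_WORDS,
        "right_boundary": word in RIGHT_BOUNDARY_WORDS,
    }


def sentence_features(sent, window=2):
    """Return one feature dict per (word, pos) pair, including neighbours within +/-window.

    Keys are prefixed with the relative offset, e.g. "-1:pos" or "+0:word";
    offsets that fall outside the sentence are marked with "<offset>:pad".
    """
    base = [token_features(w, p) for w, p in sent]
    out = []
    for i in range(len(sent)):
        feats = {"bias": 1.0}
        for off in range(-window, window + 1):
            j = i + off
            if 0 <= j < len(sent):
                for name, value in base[j].items():
                    feats[f"{off:+d}:{name}"] = value
            else:
                feats[f"{off:+d}:pad"] = True
        out.append(feats)
    return out


def bio_to_entities(words, tags):
    """Merge BIO position tags such as B-PEST / I-PEST into (type, text) entities."""
    entities, cur_type, cur_words = [], None, []
    for word, tag in zip(words, tags):
        # An I- tag whose type differs from the open entity also starts a new entity.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type)
        if cur_type and (tag == "O" or starts_new):
            entities.append((cur_type, "".join(cur_words)))  # close the open entity
            cur_type, cur_words = None, []
        if starts_new:
            cur_type, cur_words = tag[2:], [word]
        elif tag.startswith("I-"):
            cur_words.append(word)
    if cur_type:
        entities.append((cur_type, "".join(cur_words)))
    return entities


def build_triples(entities):
    """Hypothetical rule: pair every crop, pest and pesticide that co-occur in one Q&A record."""
    found = {"CROP": [], "PEST": [], "PESTICIDE": []}
    for etype, text in entities:
        if etype in found and text not in found[etype]:
            found[etype].append(text)
    return [(c, p, d) for c in found["CROP"] for p in found["PEST"] for d in found["PESTICIDE"]]


# Hypothetical segmented, POS-tagged Q&A record with hand-written tags
# standing in for the tags a trained CRF would predict.
SAMPLE = [("水稻", "n"), ("得了", "v"), ("稻瘟病", "n"), ("，", "w"), ("可以", "v"),
          ("喷施", "v"), ("75%", "m"), ("三环唑", "n"), ("可湿性", "b"), ("粉剂", "n")]
SAMPLE_TAGS = ["B-CROP", "O", "B-PEST", "O", "O",
               "O", "O", "B-PESTICIDE", "I-PESTICIDE", "I-PESTICIDE"]

if __name__ == "__main__":
    X = sentence_features(SAMPLE, window=2)
    print(X[7]["+0:pesticide_keyword"], X[7]["-1:is_quantity"], X[7]["-2:left_boundary"])
    entities = bio_to_entities([w for w, _ in SAMPLE], SAMPLE_TAGS)
    print(entities)
    print(build_triples(entities))
```

A larger `window` multiplies the number of features per token. This is one reason tuning the window size can affect training time as well as accuracy.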
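[Author affiliations]: College of Information and Electrical Engineering, China Agricultural University; Shandong Laodao Network Technology Co., Ltd.
[Funding]: National Natural Science Foundation of China (61502500); Beijing Natural Science Foundation (4164090); Fundamental Research Funds for the Central Universities (2017QC077)
[CLC number]: TP391.1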
Article No.: 1878716
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1878716.html