Named Entity Recognition for Crops, Diseases and Pests, and Pesticides Based on Conditional Random Fields
Published: 2018-05-12 13:01
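Topics: diseases and pests + pesticides. Source: Transactions of the Chinese Society for Agricultural Machinery (《農業機械學報》), 2017, Issue S1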
[Abstract]: Internet-based agricultural technology Q&A platforms currently rely entirely on human experts to answer questions, so responses are slow and answer quality is hard to guarantee. Answering agricultural questions automatically and building an agricultural knowledge base both require extracting "crop, disease/pest, pesticide" named-entity triples from existing Q&A data. Research on Chinese named entity recognition in agriculture is scarce, and reported accuracy is low. Based on the characteristics of crop, disease/pest, and pesticide named entities, this paper proposes a recognition method for these three entity types, built on conditional random fields (CRF) and targeted at agricultural Q&A data. The dataset is first normalized in format and segmented into words automatically. Each word in the segmented corpus is then labeled automatically with the following features: whether it contains specific defining words, whether it contains specific radicals, whether it is a quantifier, whether it is a specific left or right boundary word, and its part of speech. A CRF model trained on the labeled data classifies each word: it decides whether the word belongs to a crop, disease/pest, or pesticide named entity and identifies the word's position within a compound named entity. This yields recognition of all three entity types, from which the associated triples can be constructed automatically. Selecting the feature combination and tuning the context window size experimentally improved recognition accuracy and reduced model training time. Recognition accuracy for crop, disease/pest, and pesticide named entities reached 97.72%, 87.63%, and 98.05% respectively, a significant improvement over existing methods.
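The pipeline described in the abstract (per-word features, a context window, and position tags inside compound entities) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicons, the feature names, the `B-`/`I-`/`O` tag scheme, and the functions `token_features`, `sentence_features`, and `decode_entities` are all assumptions, since the abstract does not give the actual word lists, templates, or tag set. The CRF training step itself is left out; the feature rows are plain dictionaries that would be handed to whichever CRF toolkit is used.

```python
from typing import Dict, List, Tuple

# All lexicons below are illustrative placeholders, not the paper's lists.
DEFINING_WORDS = {"病", "虫", "剂", "灵"}    # assumed entity-defining characters
RADICAL_CHARS = set("菜茄蚜螨疫瘟")           # stand-in for a real radical lookup
QUANTIFIERS = {"克", "毫升", "倍", "亩"}      # assumed measure words
LEFT_BOUNDARY = {"得了", "防治", "用"}        # assumed words that precede an entity
RIGHT_BOUNDARY = {"怎么办", "防治", "喷雾"}    # assumed words that follow an entity


def token_features(word: str, pos: str) -> Dict[str, str]:
    """Features of one segmented word: the five feature types named in the abstract."""
    return {
        "word": word,
        "pos": pos,
        "has_defining": str(any(c in DEFINING_WORDS for c in word)),
        "has_radical": str(any(c in RADICAL_CHARS for c in word)),
        "is_quantifier": str(word in QUANTIFIERS),
        "is_left_boundary": str(word in LEFT_BOUNDARY),
        "is_right_boundary": str(word in RIGHT_BOUNDARY),
    }


def sentence_features(tokens: List[Tuple[str, str]], window: int = 1) -> List[Dict[str, str]]:
    """For each word, collect its own features plus those of neighbours within +/- window."""
    base = [token_features(w, p) for w, p in tokens]
    rows = []
    for i in range(len(tokens)):
        row = {}
        for off in range(-window, window + 1):
            j = i + off
            if 0 <= j < len(tokens):
                for name, value in base[j].items():
                    # Keys are prefixed with the signed offset, e.g. "-1:pos", "+0:word".
                    row[f"{off:+d}:{name}"] = value
        rows.append(row)
    return rows


def decode_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged words into (entity_text, entity_type) pairs."""
    entities, current, current_type = [], [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [word], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(word)
        else:
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


# Hypothetical segmented, POS-tagged question and hand-written tags for illustration.
tokens = [("黄瓜", "n"), ("霜霉", "n"), ("病", "n"), ("用", "p"), ("多菌灵", "n")]
rows = sentence_features(tokens, window=1)
tags = ["B-CROP", "B-PEST", "I-PEST", "O", "B-PESTICIDE"]
entities = decode_entities([w for w, _ in tokens], tags)
```

Widening `window` gives the model more context per word at the cost of more features, which is the trade-off between accuracy and training time that the abstract reports tuning. The decoded entities of one question (one crop, one disease or pest, one pesticide) are the raw material for a triple.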
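[Author affiliations]: College of Information and Electrical Engineering, China Agricultural University; Shandong Laodao Network Technology Co., Ltd. (山東老刀網絡科技有限公司)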
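[Funding]: National Natural Science Foundation of China (61502500); Beijing Natural Science Foundation (4164090); Fundamental Research Funds for the Central Universities (2017QC077)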
[Classification number]: TP391.1
Article ID: 1878716
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1878716.html