Mining Hierarchical Relations in Chinese Text Based on Hybrid Cosine Similarity
Published: 2018-09-08 13:14
[Abstract]: Hierarchical relations are among the most important relations between concepts in Chinese text, and identifying them correctly is fundamental to information-processing tasks such as automatic domain ontology construction and text data mining. The method first enumerates the candidate hierarchical relations that may hold between concepts and builds a kernel-function classifier whose kernel mixes two cosine similarities, one over the semantics of part-of-speech sequences and one over relation words, thereby turning hierarchical relation mining into a classification problem. The classifier is then trained on text data annotated with templates. Finally, preprocessed Chinese text is fed in and the kernel classifier judges each candidate hierarchical relation. Experiments on Chinese text from the air force weapons and equipment domain show that the method is simple and reliable and achieves good accuracy and recall.
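[Affiliation]: School of Computer Science, Northwestern Polytechnical University
[Funding]: National Ministry Fund, Intelligent Information Processing Support Technology Project (513150703); Natural Science Foundation of Shaanxi Province (2015JM6290)
[Classification No.]: TP391.1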
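
The abstract gives only the outline of the method, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the hybrid kernel is a weighted sum of two bag-of-tokens cosine similarities (a non-negative weighted sum of two valid kernels is itself a valid kernel), and it swaps in a simple kernel perceptron as the classifier. The mixing weight `alpha`, the feature layout (`pos`, `rel`), the function names, and the toy samples are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two token lists, treated as bags of tokens."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def hybrid_kernel(x, y, alpha=0.5):
    """Weighted mix of POS-sequence similarity and relation-word similarity.

    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    return alpha * cosine(x["pos"], y["pos"]) + (1 - alpha) * cosine(x["rel"], y["rel"])


def decision_score(x, samples, labels, coef):
    """Dual-form score: sum of coef_i * label_i * K(sample_i, x)."""
    return sum(c * l * hybrid_kernel(s, x) for c, l, s in zip(coef, labels, samples))


def train_kernel_perceptron(samples, labels, epochs=10):
    """Return the dual coefficients of a kernel perceptron (labels are +1 / -1)."""
    coef = [0] * len(samples)
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(samples, labels)):
            if y * decision_score(x, samples, labels, coef) <= 0:
                coef[i] += 1  # misclassified: give this sample more weight
    return coef


def predict(x, samples, labels, coef):
    """+1 means 'hierarchical relation', -1 means 'not hierarchical'."""
    return 1 if decision_score(x, samples, labels, coef) > 0 else -1


# Toy candidate relations (invented for illustration): the POS tags of the
# context between two concepts, plus the relation word that links them.
samples = [
    {"pos": ["n", "v", "n"], "rel": ["包括"]},        # "includes"   -> hierarchical
    {"pos": ["n", "v", "n"], "rel": ["属于"]},        # "belongs to" -> hierarchical
    {"pos": ["n", "p", "n", "v"], "rel": ["攻击"]},   # "attacks"    -> not hierarchical
    {"pos": ["n", "d", "v", "n"], "rel": ["装备"]},   # "equips"     -> not hierarchical
]
labels = [1, 1, -1, -1]

coef = train_kernel_perceptron(samples, labels)
query = {"pos": ["n", "v", "n"], "rel": ["包括"]}
print(predict(query, samples, labels, coef))
```

In practice the same kernel could be handed to any kernel-based classifier, for example as a precomputed Gram matrix; the perceptron is used here only to keep the sketch free of dependencies.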
Article ID: 2230624
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2230624.html