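Automatic Paraphrasing of Chinese Noun-Noun Compounds for Semantic Search
Published: 2018-05-15 19:36
Topics: noun-noun compounds + automatic paraphrasing; Source: Peking University (北京大學), master's thesis, 2012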
[Abstract]: This thesis studies the semantic relations within noun-noun compounds in Modern Chinese, particularly those that appear in web search queries. The semantic relations inside a noun-noun compound are complex and often involve an implicit predicate. The main purpose of paraphrasing a noun-noun compound is to discover the predicate implied between the two nouns and thereby reveal the semantic relation between them. Guided by Generative Lexicon theory and related theories, the thesis proposes a method that combines top-down and bottom-up approaches, and designs and implements a program that automatically generates paraphrase phrases for compounds made up of two nouns.

The thesis first collects and analyzes Google hot-list terms and Baidu News hot search terms, and finds that noun-noun compounds occupy an important place among web search terms, so that research on their automatic paraphrasing benefits natural language processing applications such as information retrieval. Next, drawing on Generative Lexicon theory and the Modern Chinese Semantic Dictionary (《現代漢語語義詞典》), it generalizes over 850 noun-noun compounds taken from Baidu News hot search terms, earlier literature, and various novels and essays, and arrives at 356 semantic-class combination patterns with their corresponding paraphrase templates. On this basis it builds the noun-noun collocation database Noun_Noun. It then uses HowNet (《知網》) resources to build the noun knowledge base Noun_Verb. Finally, on top of Noun_Noun and Noun_Verb, it develops an automatic paraphrasing program for Chinese noun-noun compounds.

The program that generates paraphrase phrases for noun-noun compounds has five main steps:
(1) segment the input compound and tag parts of speech, obtaining the word string N1+N2 and confirming that it is a noun-noun compound;
(2) look up the semantic classes S1 and S2 of N1 and N2 in the database Noun_Verb;
(3) look up the paraphrase template for the semantic-class combination pattern S1+S2 in the database Noun_Noun;
(4) as required by the paraphrase template, look up the agentive role or the telic (functional) role of the relevant noun in Noun_Verb; this role is a verb, and it serves as the predicate that expresses the semantic relation between N1 and N2;
(5) insert the verb, N1, and N2 into the paraphrase template to generate the paraphrase phrase.

Once the program was built, its effectiveness was tested on noun-noun compounds drawn from Baidu News hot search terms from May to September 2011. Based on the research and the program tests, the thesis also offers suggestions for improving the Modern Chinese Semantic Dictionary and HowNet. The thesis hopes to bring about a productive interaction between language resources and application systems; developing the paraphrasing program also made clear how necessary and important it is to build basic language resources.

In China, the most representative study of automatic paraphrasing of Chinese noun-noun compounds is Wang Meng (王萌), Huang Chu-Ren (黃居仁), Yu Shiwen (俞士汶), and Li Bin (李斌) (2010). Compared with Wang Meng et al. (2010), this thesis has three distinguishing features: (1) its paraphrase templates are richer; (2) its paraphrase phrases are more natural; (3) it integrates several methods.

Compared with Wang Meng et al. (2010), this thesis has the following shortcomings: (1) its results depend heavily on manually constructed paraphrase templates and the associated knowledge bases, and the procedure involves many steps, so it is less intelligent than the system of Wang Meng et al. (2010); (2) the paraphrase templates and the agentive and telic roles of nouns compiled here are still incomplete and need to be expanded and refined through use.

The thesis also puts forward some ideas for further improving the automatic paraphrasing program. The hope is that, once further refined, the program can better serve natural language processing tasks such as search engines and machine translation.
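The five steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's actual program: the dictionaries `NOUN_VERB` and `NOUN_NOUN` are tiny hypothetical stand-ins for the databases Noun_Verb and Noun_Noun, every entry, semantic-class label, and template string in them is invented for the example, and step (1) is replaced by a naive lexicon-based split, since the abstract does not say which segmenter or tagger was used.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for Noun_Verb: noun -> semantic class plus
# agentive-role and telic-role verbs. All entries are invented.
NOUN_VERB = {
    "鋼琴": {"sem_class": "instrument", "agentive": ["製造"], "telic": ["彈"]},
    "教師": {"sem_class": "person", "agentive": [], "telic": ["教"]},
    "木頭": {"sem_class": "material", "agentive": [], "telic": []},
    "桌子": {"sem_class": "artifact", "agentive": ["做"], "telic": ["放"]},
}

# Hypothetical stand-in for Noun_Noun: (S1, S2) -> which noun supplies the
# verb, which role to read from it, and the paraphrase template.
NOUN_NOUN = {
    ("instrument", "person"): ("n2", "telic", "{verb}{n1}的{n2}"),
    ("material", "artifact"): ("n2", "agentive", "用{n1}{verb}的{n2}"),
}


def split_compound(text: str) -> Optional[Tuple[str, str]]:
    """Step 1 (stubbed): find a split point where both halves are known nouns."""
    for i in range(1, len(text)):
        n1, n2 = text[:i], text[i:]
        if n1 in NOUN_VERB and n2 in NOUN_VERB:
            return n1, n2
    return None


def paraphrase(text: str) -> Optional[str]:
    """Return a paraphrase phrase for a noun-noun compound, or None if any lookup fails."""
    pair = split_compound(text)
    if pair is None:
        return None
    n1, n2 = pair

    # Step 2: semantic classes S1 and S2 from the noun knowledge base.
    s1 = NOUN_VERB[n1]["sem_class"]
    s2 = NOUN_VERB[n2]["sem_class"]

    # Step 3: paraphrase template for the pattern S1+S2.
    entry = NOUN_NOUN.get((s1, s2))
    if entry is None:
        return None
    source, role, template = entry

    # Step 4: the agentive or telic verb of the noun named by the template.
    verbs = NOUN_VERB[n1 if source == "n1" else n2][role]
    if not verbs:
        return None

    # Step 5: fill the verb, N1, and N2 into the template.
    return template.format(verb=verbs[0], n1=n1, n2=n2)


print(paraphrase("鋼琴教師"))  # 教鋼琴的教師
print(paraphrase("木頭桌子"))  # 用木頭做的桌子
```

In this sketch a compound whose class pair has no template (for example the reversed order 桌子木頭) simply yields `None`; how the actual program handles such cases is not stated in the abstract.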
[Degree-granting institution]: Peking University (北京大學)
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: H146
Article ID: 1893638
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1893638.html