Research on Key Technologies of Document Fusion
[Abstract]: Document fusion is a key technology for organizing text and integrating information, and an important foundation for natural language generation. Its purpose is to integrate the important information spread across multiple documents into a concise, fluent summary. Unlike traditional summarization, the resulting summary covers not only the information common to a collection of documents on one subject but also the important information that differs between them. It therefore involves both extracting key content and integrating related content. Two questions are central research topics: how to obtain the topic concepts in a document collection, together with the topic developments that extend from them, and how to arrange the key information of the whole collection in a logical, organized order by clustering and organizing texts or sentences according to their topic content. This paper explores the key technologies involved in the document fusion task from three aspects, as follows.
1. The document fusion task integrates the relevant information about the same event or object. Taking news as an example, different reports describe the same event from different perspectives, so the information they present differs. As the event develops and new relevant information emerges, follow-up reports appear as well. To remove redundant information effectively and obtain topics and their related information, this paper proposes an object merging framework based on fuzzy multiset theory. A merging function combines the multiset corresponding to the document collection with the fuzzy multisets corresponding to the concepts in each single document. The merging function is then evaluated and optimized with an effectiveness evaluation function in order to obtain the key concepts and their related words.
2. Document fusion requires a logical arrangement of content. Taking the sentence as the processing granularity, sentences containing key concepts and development clues are extracted from the document collection and then ordered with a ranking fusion technique, forming a new text structure that is logically fluent and highly readable. This paper combines topic sentence clustering with a graph model to model the sentences to be ordered, and transforms the sentence ordering problem into a path optimization problem for a continuous Hopfield neural network. A shortest path is found among the nodes of the graph corresponding to the topic clusters, and the output sequence of that path is taken as the optimal ordering.
3. Document fusion must solve the basic problem of partitioning content by topic. Because domain background knowledge is lacking, topic clustering for specific events or specific domains remains difficult, and relevant features are hard to extract effectively for this kind of clustering problem. This paper proposes acquiring domain knowledge from a domain ontology to guide feature selection. The selected features are represented in a vector space model, and a fuzzy equivalence relation matrix is obtained by matrix transformation to realize the clustering. The method is unsupervised: it needs neither manually labeled data nor a training process, so it is highly flexible and can organize documents in specialized domains automatically.
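The fuzzy-equivalence clustering described at the end of the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the exact matrix transformation, or any threshold, so everything below is an assumption: cosine similarity over non-negative term-weight vectors, the standard max-min transitive closure to turn the fuzzy similarity relation into a fuzzy equivalence relation, and a hypothetical cut level `lam`. The function names are illustrative and are not taken from the thesis.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Reflexive, symmetric fuzzy relation; assumes non-negative feature weights."""
    n = len(vectors)
    return [[1.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation repeatedly until it stops changing.

    For a reflexive, symmetric relation the fixed point is a fuzzy
    equivalence relation (reflexive, symmetric, max-min transitive).
    """
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(closure, lam):
    """Group items whose closure membership is at least lam (assumed threshold)."""
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters


if __name__ == "__main__":
    # Three toy documents over a three-term vocabulary (made-up weights).
    docs = [[1.0, 1.0, 0.0], [1.0, 0.8, 0.0], [0.0, 0.1, 1.0]]
    closure = transitive_closure(similarity_matrix(docs))
    print(lambda_cut_clusters(closure, lam=0.5))  # prints [[0, 1], [2]]
```

Raising `lam` splits the collection into more, tighter clusters and lowering it merges them, so no training data is needed, in line with the unsupervised setting the abstract describes.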
[Degree-granting institution]: Jilin University
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP391.1
[Citation]: Yue Lin (岳琳). Research on Key Technologies of Document Fusion [D]. Jilin University, 2016.
Article number: 2431505
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2431505.html