Research on Key Technologies for Chinese-Vietnamese Bilingual Topic Evolution in Internet News
[Abstract]: Vietnam and China have close ties. Analyzing how topics evolve over time across the large volume of Chinese and Vietnamese news is significant for strengthening cultural exchange between the two peoples. Topic evolution analysis aims to present the topics that users care about in a concise, ordered form, so that they can follow the full course of a topic. A Chinese-Vietnamese topic text set describes the same content in two languages. Whichever the language, the texts contain the same or similar event elements, such as object, time, place, and event trigger word. Exploiting this commonality, Chinese-Vietnamese topic element pairs can be constructed to link the two languages. Working with an existing Chinese-Vietnamese topic text set, this paper adopts an evolution analysis method based on subtopic association and makes the following two contributions: 1. A hypergraph-based method for extracting topic elements from Chinese-Vietnamese bilingual news is proposed. First, event elements are extracted from the news with a trigger-word-driven method. A topic hypergraph model is then built on them, with the Chinese and Vietnamese event elements as nodes and the sentences of the text set as hyperedges. The initial weights of nodes and hyperedges are computed with a probability evaluation function, and a PageRank random walk scores the event elements to yield the Chinese-Vietnamese topic elements. The experimental results show that this method is significantly more effective than extracting event elements from single texts alone.
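The hypergraph scoring step can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the abstract does not give the probability evaluation function, so the sketch assumes hyperedge weights are supplied by the caller (default 1.0), a uniform teleport distribution, and a damping factor of 0.85. The function name `hypergraph_pagerank` and the sample sentences are hypothetical.

```python
from collections import defaultdict


def hypergraph_pagerank(hyperedges, edge_weights=None, damping=0.85,
                        iters=200, tol=1e-12):
    """Score nodes by a PageRank-style random walk on a hypergraph.

    hyperedges   : list of sets; each set holds the event elements (nodes)
                   that occur in one sentence (hyperedge).
    edge_weights : optional list of positive floats, one per hyperedge
                   (assumed input; stands in for the unspecified
                   probability evaluation function).
    """
    nodes = sorted({v for e in hyperedges for v in e})
    if edge_weights is None:
        edge_weights = [1.0] * len(hyperedges)

    # incident[v] lists the indices of the hyperedges that contain v.
    incident = defaultdict(list)
    for i, e in enumerate(hyperedges):
        for v in e:
            incident[v].append(i)

    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleport term: uniform restart over all nodes.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            total_w = sum(edge_weights[i] for i in incident[v])
            for i in incident[v]:
                # Step 1: leave v through hyperedge i, chosen by weight.
                p_edge = edge_weights[i] / total_w
                # Step 2: land uniformly on a node inside that hyperedge.
                share = damping * score[v] * p_edge / len(hyperedges[i])
                for u in hyperedges[i]:
                    nxt[u] += share
        delta = sum(abs(nxt[v] - score[v]) for v in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Hypothetical sentences, each reduced to its event elements.
sentences = [
    {"南海", "Biển Đông", "中国"},
    {"Biển Đông", "Việt Nam"},
    {"南海", "Biển Đông"},
]
scores = hypergraph_pagerank(sentences)
top_elements = sorted(scores, key=scores.get, reverse=True)
```

Elements that co-occur in many sentences accumulate more of the walk's probability mass, so the head of `top_elements` serves as the candidate topic elements. A non-uniform teleport vector could carry initial node weights, but the abstract does not say how the thesis uses them.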
2. A Chinese-Vietnamese bilingual topic evolution analysis method based on subtopic correlation is proposed. First, an initial subtopic set is obtained with the k-means algorithm and used as labeled samples; the subtopic set in each time slice is then obtained by single-pass clustering based on the kNN algorithm. Next, the similarity between subtopics in different time windows is computed with a formula that combines cosine similarity and KL distance. Finally, the relationships between subtopics across time slices are derived with the topic evolution analysis steps proposed in this paper. Compared with using only KL distance or only the cosine formula, the proposed method is more effective.
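A rough Python sketch of the mixed similarity is given below. The abstract does not state the mixing formula, so the sketch makes several assumptions: subtopics are represented as non-negative term-weight vectors over a shared vocabulary, the KL distance is symmetrized and mapped to a similarity with `exp(-d)`, and the two parts are combined linearly with a weight `lam`. The names `mixed_similarity` and `link_subtopics`, the threshold, and the sample vectors are all hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def smooth(p, eps=1e-9):
    """Add a small constant and renormalize so that KL is always defined."""
    total = sum(p) + eps * len(p)
    return [(a + eps) / total for a in p]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive inputs."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def mixed_similarity(p, q, lam=0.5):
    """Linear mix of cosine similarity and a KL-based similarity (assumed form)."""
    ps, qs = smooth(p), smooth(q)
    sym_kl = 0.5 * (kl(ps, qs) + kl(qs, ps))  # symmetrized KL distance
    kl_sim = math.exp(-sym_kl)                # map distance into (0, 1]
    return lam * cosine(p, q) + (1.0 - lam) * kl_sim


def link_subtopics(prev_slice, next_slice, threshold=0.6, lam=0.5):
    """Link subtopics of adjacent time slices whose similarity passes a threshold."""
    links = []
    for i, p in enumerate(prev_slice):
        for j, q in enumerate(next_slice):
            s = mixed_similarity(p, q, lam)
            if s >= threshold:
                links.append((i, j, s))
    return links


# Hypothetical subtopic vectors over a four-term vocabulary.
slice_t1 = [[0.5, 0.3, 0.2, 0.0]]
slice_t2 = [[0.4, 0.4, 0.2, 0.0], [0.0, 0.0, 0.1, 0.9]]
links = link_subtopics(slice_t1, slice_t2)
```

With these sample vectors, only the first subtopic of the later slice is linked to the earlier subtopic. Setting `lam` to 1.0 or 0.0 reduces the measure to the cosine-only or KL-only baseline that the abstract compares against.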
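[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1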