Research on English-Chinese Cross-Language Topic Detection and Tracking Technology

Published: 2018-04-22 02:20

  Topics: cross-language topic detection + cross-language topic tracking; source: Minzu University of China, doctoral dissertation, 2013


【摘要】:當今世界已經逐步邁入信息化和數字化時代。根據CNNIC第30次調查報告①顯示,截止2012年6月底我國網絡用戶數量已達到5.38億,網站數達到250萬,網絡新聞的用戶規模達到3.92億,網民對網絡新聞的使用率高達73.0%。由于網絡新聞發布簡便快捷等特點,互聯網已成為新聞傳播的“第四媒體”。普通民眾希望從海量網絡資源中獲取自己感興趣的新聞話題,同時也希望了解其他國家的新聞話題。因此,對網絡新聞話題進行跨語言的檢測與跟蹤,已經逐漸成為當今國內外學者研究的興趣之所在。
目前的跨語言話題檢測與跟蹤研究中存在著多個具有挑戰性的難題。首先,網絡新聞報道文本描述手段匱乏,涉及多語言環境的新聞報道話題描述難度更大;其次,跨語言話題檢測與跟蹤需要實現多語言環境下的新聞報道處理,怎樣跨越語言鴻溝,是首先需要攻克的技術難題之一。再次,如何更好地發展現有技術,并將其應用到話題檢測與跟蹤研究中,這一問題值得進一步探討。針對上述問題,希望本文對英、漢跨語言話題檢測與跟蹤技術的研究能為語言處理相關技術的發展做出微薄貢獻,并能為我國多民族語言文本處理提供一定的借鑒。
本文的研究主要包括跨語言新聞報道文本分析、跨語言話題模型構建方法、語料庫構建方法、跨語言話題檢測和跨語言話題跟蹤等五個部分。
首先,筆者從新聞報道的本質因素研究入手,從新聞的認知理解和本身特性這兩個角度來分析新聞報道的核心要素。通過分析,筆者認為詞匯處理是對文本進行描述的有效途徑之一;新聞要素也可作為對報道文本加以區分的手段。
其次,本文從“報道-話題-事件”的相互關系出發,闡述了CLTDT研究中新聞報道模型構建的基本思路;分析了當前常用文本表示模型的特點與不足;認為早期文本表示模型缺乏對“報道-話題-事件”之間關系的深入描寫和刻畫。為了揭示新聞文本中潛藏的話題,本文選取了LSI模型和LDA模型進行文本建模實驗,并通過實驗對比和分析了兩種模型對新聞報道文本的描述能力。
在以上理論分析和實驗驗證的基礎上,我們提出在英、漢可比語料庫的基礎上進行跨語言話題檢測與跟蹤研究的思路。通過語料采集、元數據處理、新聞事件分類、語料分詞處理和標注、命名實體標注等流程和步驟,本文嘗試建立“英、漢跨語言新聞報道可比語料庫”。我們將以語料庫中所包含的英、漢新聞報道文本語料為基礎,對跨語言環境中的新聞話題進行檢測與跟蹤研究。
在綜合當前跨語言處理技術和LDA模型研究的基礎上,結合本文研究目的,筆者提出跨語言聯合LDA (CLU-LDA)模型。這一模型既可以對英、漢新聞報道進行事件回顧檢測,又可以對新事件進行發現。在跨語言話題跟蹤中,通過使用先驗的話題模型對新聞報道樣本話題進行推斷,借助已有先驗知識和可比語料庫,我們不僅可以在時間序列上描繪出新聞事件的話題發展狀況,還可以對特定新聞報道進行有效跟蹤。
[Abstract]: The world has gradually entered the age of information and digitization. According to CNNIC's 30th survey report, by the end of June 2012 China had 538 million Internet users and 2.5 million websites; online news had 392 million users, a usage rate of 73.0% among Internet users. Because online news is simple and fast to publish, the Internet has become the "fourth medium" of news communication. Ordinary readers want to find the news topics that interest them among massive online resources, and they also want to follow news topics in other countries. Cross-language detection and tracking of online news topics has therefore become a growing research interest for scholars in China and abroad.
Current research on cross-language topic detection and tracking faces several challenging problems. First, the means of describing online news texts are limited, and describing the topics of news reports in a multilingual setting is harder still. Second, cross-language topic detection and tracking must process news reports in multiple languages, and bridging the language gap is one of the first technical problems to solve. Third, how to further develop existing techniques and apply them to topic detection and tracking deserves more study. In view of these problems, this study of English-Chinese cross-language topic detection and tracking aims to make a modest contribution to language processing technology and to offer a reference for processing texts in China's many ethnic languages.
The research comprises five parts: cross-language news text analysis, methods for building cross-language topic models, corpus construction methods, cross-language topic detection, and cross-language topic tracking.
First, the author examines the essential factors of news reports and analyzes their core elements from two perspectives: the cognitive understanding of news and the intrinsic characteristics of news. The analysis suggests that lexical processing is one effective way to describe a text, and that news elements can also serve to distinguish news texts from one another.
Second, starting from the relationships among report, topic, and event, the thesis sets out the basic approach to modeling news reports in CLTDT research, analyzes the characteristics and shortcomings of commonly used text representation models, and argues that early text representation models lack an in-depth description of the report-topic-event relationships. To reveal the latent topics in news texts, the thesis runs text modeling experiments with the LSI and LDA models and compares how well each describes news texts.
Building on this theoretical analysis and experimental verification, we propose conducting cross-language topic detection and tracking on an English-Chinese comparable corpus. Through corpus collection, metadata processing, news event classification, word segmentation and annotation, and named-entity annotation, the thesis attempts to build an "English-Chinese Cross-Language News Report Comparable Corpus". The English and Chinese news texts in this corpus form the basis for detecting and tracking news topics in a cross-language setting.
Drawing on current cross-language processing techniques and research on the LDA model, and in line with the aims of this study, the author proposes a cross-language joint LDA (CLU-LDA) model. The model supports both retrospective event detection over English and Chinese news reports and the discovery of new events. In cross-language topic tracking, a prior topic model is used to infer the topics of sample news reports; with existing prior knowledge and the comparable corpus, we can both trace how the topics of a news event develop over time and track specific news reports effectively.
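The tracking step described in the abstract (inferring a story's topics under a previously trained topic model, then following one topic over time) can be illustrated with a minimal sketch. It is not the thesis's CLU-LDA implementation: the topic-word table `TOPIC_WORD`, the toy stories, the 0.5 threshold, and both function names are assumptions made for illustration, and the inference is a simple PLSA-style EM fold-in used as a stand-in for full LDA inference. The sketch also assumes that each topic already assigns probability to both English and Chinese words.

```python
from collections import Counter

# Hypothetical prior topic model: each topic maps words (English and Chinese)
# to probabilities. In practice this would be learned from a comparable corpus.
TOPIC_WORD = [
    {"earthquake": 0.4, "rescue": 0.3, "地震": 0.2, "救援": 0.1},  # topic 0
    {"election": 0.5, "vote": 0.3, "投票": 0.2},                    # topic 1
]

# Hypothetical tokenised stories, each paired with its publication date.
STORIES = [
    ("2012-06-01", ["earthquake", "rescue", "rescue"]),
    ("2012-06-02", ["投票", "vote"]),
    ("2012-06-03", ["地震", "救援", "election"]),
]


def infer_topic_mixture(tokens, topic_word, n_iter=50, floor=1e-9):
    """Estimate a story's topic mixture with the topic-word table held fixed.

    Uses EM fold-in (PLSA-style), a simplification of LDA inference.
    Words unknown to a topic receive the small probability `floor`.
    """
    k = len(topic_word)
    counts = Counter(tokens)
    if not counts:
        return [1.0 / k] * k  # no evidence: fall back to a uniform mixture
    theta = [1.0 / k] * k
    for _ in range(n_iter):
        new_theta = [0.0] * k
        for word, n in counts.items():
            # E-step: share of this word attributed to each topic
            weights = [theta[t] * topic_word[t].get(word, floor) for t in range(k)]
            total = sum(weights)
            for t in range(k):
                new_theta[t] += n * weights[t] / total
        # M-step: renormalise the expected counts into a distribution
        norm = sum(new_theta)
        theta = [v / norm for v in new_theta]
    return theta


def track(stories, topic_word, target_topic, threshold=0.5):
    """Return (date, weight) pairs, sorted by date, for on-topic stories."""
    hits = []
    for date, tokens in stories:
        weight = infer_topic_mixture(tokens, topic_word)[target_topic]
        if weight >= threshold:
            hits.append((date, round(weight, 3)))
    return sorted(hits)


if __name__ == "__main__":
    for date, weight in track(STORIES, TOPIC_WORD, target_topic=0):
        print(date, weight)
```

Sorting the hits by date gives a simple timeline of how strongly each story belongs to the tracked topic, which is one way to read "describing topic development on a time series". The threshold trades missed stories against false alarms and would need tuning on annotated data.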

[Degree-granting institution]: Minzu University of China
[Degree level]: Doctorate
[Year degree conferred]: 2013
[Classification number]: H15; H315; H087

Article ID: 1785169



Article link: http://sikaile.net/wenyilunwen/yuyanxuelw/1785169.html

