Entity Importance Ranking for Documents under Events or Topics
Published: 2018-05-17 23:42
Topics: event detection + entity ranking; Source: East China Normal University, 2017 master's thesis
[Abstract]: In the Internet era, the rapid growth of new online media has made it easy to share massive amounts of information. These media have accumulated large volumes of text that record major public-opinion events and hot discussion topics arising in the course of social development. By monitoring online public opinion, the government, the public, and relevant departments can understand current social conditions and discover social problems in time; such monitoring can also help government departments manage and make decisions on a sound basis. Detecting events or topics in massive online text data is therefore an important and practically meaningful research problem. For the texts under an event or topic, the important entities summarize the main subjects that the texts describe. Based on large-scale online news data, this thesis detects hot events and hot topics and extracts key entities from the text to summarize the main elements of each event. The main contributions are as follows:
· The thesis redefines how similarity between news texts is computed by means of metric learning and, for massive, unordered, and redundant online news text, proposes a topic-based event detection method, ToED. The method applies a topic model to learn the topic distribution of documents; for the set of documents under any given topic, a density-based event clustering method, ESACN, is proposed to detect hot events.
· For the problem of selecting the important entities of a document, the thesis proposes LA-FSAM, an important-entity ranking model based on the forward stagewise algorithm. It considers not only the features of an entity within the document but also external entity features drawn from Wikipedia and Google Word2Vec. The model constructs its loss function from an improved AUC criterion and learns its parameters from annotated training data using stochastic gradient descent. Experimental comparison of LA-FSAM with baseline methods demonstrates the effectiveness of the proposed method.
· The thesis designs and implements a social hot-topic public-opinion analysis and display system (KSPOS), which provides retrieval by event or topic. To present comprehensive and broad search results, the system selects important entities and mines the semantic relations among them to build a semantic network of public-opinion events. It also extracts keywords from the document set to summarize what the event or topic is about, and generates an event timeline to show how the event develops.
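The abstract says the entity ranking model builds its loss from an AUC criterion and learns its parameters by stochastic gradient descent, but it does not give the formulation. The following is only a minimal sketch of that general idea, not LA-FSAM itself: it assumes a plain linear scorer (not a forward stagewise model), swaps in the standard pairwise logistic surrogate for AUC in place of the thesis's improved criterion, and uses made-up two-dimensional feature vectors (a hypothetical in-document feature and a hypothetical external-similarity feature).

```python
import math
import random


def score(weights, features):
    """Linear importance score for one entity's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_auc(weights, positives, negatives):
    """Fraction of (important, unimportant) pairs ranked correctly; ties count 0.5."""
    correct = 0.0
    for p in positives:
        for n in negatives:
            diff = score(weights, p) - score(weights, n)
            correct += 1.0 if diff > 0 else 0.5 if diff == 0 else 0.0
    return correct / (len(positives) * len(negatives))


def train_sgd(positives, negatives, epochs=200, lr=0.1, seed=0):
    """Minimise the surrogate log(1 + exp(-(s_pos - s_neg))) one pair at a time."""
    rng = random.Random(seed)
    dim = len(positives[0])
    weights = [0.0] * dim
    pairs = [(p, n) for p in positives for n in negatives]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for p, n in pairs:
            diff = score(weights, p) - score(weights, n)
            # Derivative of log(1 + exp(-diff)) w.r.t. diff is -1 / (1 + exp(diff));
            # diff is clamped only to keep exp() from overflowing.
            grad_scale = -1.0 / (1.0 + math.exp(min(diff, 50.0)))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (p[i] - n[i])
    return weights


# Hypothetical toy data: [in-document feature, external-similarity feature].
positives = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6]]  # entities labelled important
negatives = [[0.2, 0.3], [0.4, 0.1], [0.3, 0.5]]  # entities labelled unimportant

weights = train_sgd(positives, negatives)
print(pairwise_auc(weights, positives, negatives))
```

The surrogate is used because AUC itself is a count of correctly ordered pairs and has no useful gradient; replacing the 0/1 indicator on each pair with a smooth loss makes per-pair SGD updates possible.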
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1