
Research on Microblog Retrieval and Microblog Filtering Based on Temporal Characteristics

Published: 2019-01-10 19:16
[Abstract]: With the rapid development of social media and the mobile Internet, techniques for processing short-text streams, of which microblogs are the typical example, have become increasingly important. Given the massive volume of microblog posts and the diverse information needs of many users, microblog retrieval and microblog filtering have become indispensable components of microblog services. In recent years, the temporal characteristics of microblogs have attracted researchers' attention. Studies show that these characteristics offer a new route to improving microblog retrieval performance, and time-based retrieval has gradually become a research focus in this area. This thesis uses temporal characteristics to improve the performance of microblog retrieval and microblog filtering, covering query modeling, document modeling, query-document relevance computation, and filtering models. It aims to use the temporal characteristics of microblogs to ease the difficulties that short text causes for content-based microblog retrieval, and to use the ranking information and temporal characteristics of historical microblogs to improve filtering performance. The specific contributions are as follows.

(1) To address the shortness of microblog queries, a query model based on the temporal distribution of terms is proposed. The thesis first analyzes how expansion terms and query terms are distributed over time. After defining the temporal distribution of a term and giving a method for estimating it, the thesis gives a measure of the similarity between the temporal distributions of query terms and expansion terms, and uses this similarity as their relevance to select expansion terms and re-estimate the query model. Because the method expands the query using temporal information rather than content, it avoids the weakness of content-based query expansion, which cannot estimate expansion terms accurately when microblog content is short.

(2) To address the shortness of microblog content, a time-based microblog document model is proposed. The model attempts to estimate the weights of expansion terms from the distribution of a term over microblogs within its burst period and over temporally neighboring microblogs, and a machine-learning-based method for selecting expansion terms is proposed. These are used to build a document expansion model, which in turn is used to estimate a more accurate document model. To reduce the time complexity of the time-based document model, two optimized temporal document models are proposed, which lower the system overhead of document expansion.

(3) To address the effect of short text on computing the relevance between microblogs and queries, temporal characteristics are introduced into microblog retrieval. Beyond content relevance, retrieval then also considers several kinds of temporal relevance between a microblog and a query, so that the ranking better matches the temporal characteristics of relevant microblogs. Specifically, within the classical language-model retrieval framework, three methods that use temporal relations to optimize retrieval results are given; within the learning-to-rank framework, a time-sensitive learning-to-rank algorithm is proposed and a time-sensitive loss function is designed, improving microblog retrieval performance.

(4) To address the poor performance of traditional classification models in real-time microblog filtering, a real-time filtering model based on historical microblog information is proposed, which effectively combines the retrieval model and the classification model. Specifically, the thesis proposes a framework for real-time microblog filtering based on historical microblogs, in which the ranking information and temporal-neighbor information of historical microblogs are applied in the retrieval model to build prior knowledge, and this prior knowledge is used to dynamically adjust the decision boundary of the classification model. Further, taking the language model and the logistic regression model as an example, an instance of the framework is implemented, and methods for estimating its parameters are given.
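
As a rough illustration of contribution (1), the sketch below ranks candidate expansion terms by how closely their temporal distribution matches that of a query term. The abstract does not state the estimator or the similarity measure the thesis uses, so everything here is an assumption made for illustration: a normalized histogram over fixed time bins stands in for the temporal distribution, cosine similarity stands in for the similarity measure, and the function names and toy posts are hypothetical.

```python
from collections import Counter
from math import sqrt


def temporal_distribution(posts, term, num_bins):
    """Estimate P(bin | term) as a normalized histogram (assumed estimator).

    posts is a list of (bin_id, set_of_tokens) pairs, where bin_id is an
    integer time bin (for example, a day index) in range(num_bins).
    """
    counts = Counter(bin_id for bin_id, tokens in posts if term in tokens)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_bins
    return [counts[b] / total for b in range(num_bins)]


def cosine(p, q):
    """Cosine similarity between two distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_expansion_terms(posts, query_term, candidates, num_bins):
    """Sort candidates by temporal similarity to the query term, best first."""
    q_dist = temporal_distribution(posts, query_term, num_bins)
    scored = [
        (c, cosine(q_dist, temporal_distribution(posts, c, num_bins)))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy stream: "tsunami" bursts in the same bins as "earthquake",
# while "weather" appears only outside the burst.
posts = [
    (0, {"weather", "rain"}),
    (1, {"earthquake", "tsunami"}),
    (1, {"earthquake", "rescue"}),
    (1, {"tsunami", "warning"}),
    (2, {"earthquake", "tsunami"}),
    (3, {"weather", "sunny"}),
]
ranked = rank_expansion_terms(posts, "earthquake", ["tsunami", "weather"], num_bins=4)
```

In this toy stream the candidate that bursts together with the query term ranks first even though the score never looks at word co-occurrence within a post, which is the point of expanding on temporal rather than content evidence.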
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP391.3; TP393.092


Article ID: 2406695



Link to this article: http://sikaile.net/shoufeilunwen/xxkjbs/2406695.html

