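Research on Algorithms for Identifying the Strength of Users' Video Retrieval Intent
Published: 2018-05-15 01:22
Topics: short text classification + information retrieval; Source: Zhejiang University, master's thesis, 2015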
[Abstract]: With the explosive growth of data, users find it increasingly difficult to choose among the information available to them. Search engines are an important means of obtaining information, and as smart devices become widespread, mobile search is playing an increasingly important role. The limited display space of mobile devices means that the information shown to users must be as precise and useful as possible, so users' retrieval intent needs to be identified more accurately in order to provide more precise services and a better user experience. However, in today's highly developed Internet era, people typically express their information needs as short queries, generally consisting of 3-4 words. Such descriptions are relatively vague and highly ambiguous, so users' actual needs are not identified accurately enough. This thesis uses the rich data resources of a search engine, together with users' interaction results, to analyze and address the problem of identifying the strength of users' video retrieval intent. The technique applies to general search and video retrieval systems: by analyzing a user's query, it identifies how strong the video intent is, so that more precise results can be presented to the user in a friendly way. First, the thesis expands the user's query using the titles of the results displayed by the search engine and of the results the user clicked, and, because the text of the classes in this task overlaps heavily, proposes a new text feature selection method based on entropy and term frequency. Second, it designs and extracts five groups of features, namely text features, video domain statistics, the types of results returned by the search engine, semantic information from a deep language model, and session statistics, and runs experiments on these features and their combinations, which verify the effectiveness of the approach. Inspired by the deep learning language model word2vec, it proposes Host2vec, a word-vector representation of site domain names, which brings deep language models into the problem of identifying retrieval intent strength. Finally, it analyzes and mines how the strength of users' video retrieval intent changes over time.
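The abstract names a feature selection method based on entropy and term frequency but does not give its formula. The block below is a minimal sketch of one plausible scoring of that kind, not the thesis's actual method. It assumes a term is more useful when it occurs often and when its occurrences concentrate in one class, so it multiplies a log-scaled term frequency by one minus the normalized class entropy. The function names `score_terms` and `select_features`, the labels "strong" and "weak", and the toy queries are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def score_terms(docs, labels):
    """Score each term as log(1 + frequency) * (1 - normalized class entropy).

    This weighting is an illustrative assumption, not the thesis's formula.
    """
    classes = set(labels)
    # term -> Counter mapping each class to the term's occurrence count
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for token in tokens:
            per_class[token][label] += 1

    # Entropy of a uniform distribution over the classes (upper bound).
    max_entropy = math.log2(len(classes)) if len(classes) > 1 else 1.0

    scores = {}
    for term, counts in per_class.items():
        total = sum(counts.values())
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        purity = 1.0 - entropy / max_entropy  # 1 = one class only, 0 = uniform
        scores[term] = math.log1p(total) * purity
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = score_terms(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy expanded queries (hypothetical), labeled by video intent strength.
docs = [
    ["watch", "movie", "online"],
    ["watch", "episode", "online"],
    ["movie", "review"],
    ["weather", "today"],
]
labels = ["strong", "strong", "weak", "weak"]

print(select_features(docs, labels, 2))  # ['online', 'watch']
```

In this toy data "movie" appears equally in both classes, so its entropy is maximal and its score is zero. This is the kind of term that a frequency-only criterion would keep when the classes share much of their vocabulary.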
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.41