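Research on Several Key Problems and Techniques in Personalized Information Recommendation

Published: 2018-06-19 19:18

Topics: personalized information recommendation + rating prediction; Source: National University of Defense Technology, 2014 doctoral dissertation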

[Abstract]: The rapid development of Internet technology and the spread of networked information have caused the volume of information on the Internet to expand rapidly, and people now have difficulty obtaining information because of information overload. How to help Internet users obtain the information they want more effectively has become a research hotspot at the intersection of information science, computer science, and network science. Thanks to the sustained efforts of many researchers, there are now several reasonably efficient ways to obtain information of interest, chiefly information retrieval and information filtering. The former, typified by search engines, obtains the user's description of the target information through interaction with the user and searches the network using the descriptive keywords. The latter, with information recommendation as its main method, collects users' behavior data and other attribute information, analyzes their latent interests, and filters out information they may be interested in. Search requires users to supply keywords that describe their needs as precisely as possible, and a limited set of keywords cannot distinguish users with different habits, so every user receives the same results. Recommendation instead infers a user's preferences and tendencies from information about the user and from the interests reflected in past behavior. It does not presuppose that users describe their own needs, so users can obtain more accurate information with less effort. Recommendation is therefore well suited to cases where the user has no explicit need. Recommendation technology has been developing for nearly twenty years and has been applied fairly successfully in many fields. On the theoretical side it has attracted a large number of researchers: interest in classic methods such as collaborative filtering remains high, and new methods, such as those based on bipartite networks, continue to be proposed, further enriching research on recommendation.
As research deepens and application environments continue to change, recommendation technology faces a number of problems and challenges, the most important being data sparsity and large-scale data processing. Data sparsity refers to the situation in collaborative-filtering-based recommendation where the numbers of users and items are large but users' ratings of items are relatively few, so the rating data in the user-item matrix are very sparse, which harms the accuracy of the recommendation computation. Large-scale data processing refers to the growing real-time pressure on recommendation algorithms as the volume of data they must handle in practice keeps increasing. This calls for more efficient methods, or other ways of improving execution efficiency, to raise the capacity and speed with which recommendation algorithms process data. To address these main challenges, this thesis studies the following problems.

First, data sparsity in rating prediction based on collaborative filtering. Rating prediction is a major research topic in personalized information recommendation: it predicts the ratings of unrated items by analyzing users' past ratings. Data sparsity affects collaborative filtering mainly in two stages, user similarity computation and rating prediction generation. Sparse data leave users with fewer co-rated items, which lowers the reliability of the similarity computed between users. Under sparsity the completeness of the neighbors' ratings also cannot be guaranteed, so predictions made from an incomplete reference rating set cannot be guaranteed to be accurate. The thesis therefore proposes selecting reference users (items) by an absolute similarity measure and improving the completeness of the reference rating set by a cross-dimension filling method. Experimental results confirm that the proposed algorithm reduces the effect of data sparsity and improves recommendation accuracy.
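To make the two stages named above concrete, here is a minimal sketch of textbook user-based collaborative filtering: Pearson similarity over co-rated items, followed by a mean-centred weighted prediction. It is not the thesis's absolute similarity measure or its cross-dimension filling method, whose details the abstract does not give. The `ratings` data and the function names are hypothetical.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. Users, items, and values are illustrative.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}


def mean(user):
    """Average of all ratings given by one user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)


def pearson(u, v):
    """Pearson correlation over co-rated items; 0.0 if fewer than two exist."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings[u][i] for i in common) / len(common)
    mu_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(u, item):
    """Mean-centred weighted average over the users who rated the item."""
    neighbours = [
        (pearson(u, v), v) for v in ratings if v != u and item in ratings[v]
    ]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return mean(u)  # no usable neighbour: fall back to the user's own mean
    offset = sum(s * (ratings[v][item] - mean(v)) for s, v in neighbours)
    return mean(u) + offset / norm


print(predict("u1", "d"))
```

In this toy data `u1` and `u3` share only two rated items, so their correlation is forced to an extreme value. That is the reliability problem the paragraph attributes to sparse data.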
Second, data sparsity in top-N recommendation based on bipartite networks. Top-N recommendation is another basic problem in personalized information recommendation; its goal is to provide each user with a recommendation list of N items. Bipartite-network recommendation is a relatively new approach that adapts better to sparse data and can achieve higher recommendation accuracy. When user interests are divided according to ratings, considering only the items a user likes leaves much of the data unused, and the items a user dislikes are underexploited. Moreover, the differences in interest reflected by ratings should appear not only in whether interest is present but also in the strength of interest and in the interest-resource transfer process. The thesis proposes a new bipartite-network method that builds a negative-interest-aware user interest model by analyzing the information revealed by the items a user dislikes, and uses rating-sensitive resource initialization and resource transfer to reflect differing degrees of interest. Experiments show that the new method clearly improves recommendation quality.

Third, a rating prediction algorithm based on bipartite networks. For data with an unbalanced node degree distribution, the thesis proposes an algorithm on the bipartite network that combines unbiased temperature-difference conduction with biased temperature constancy to handle rating prediction. Because it requires neither similarity computation nor the selection of a fixed number of users (items) as neighbors, the bipartite-network method can better mitigate the effect of sparse data. The proposed algorithm is based on the heat conduction process: it uses the temperature difference between users as the quantity that is conducted and compared, and sets the temperature difference a node receives to the mean of the temperature differences conducted from all of its connected nodes, thereby balancing the influence of all nodes. It then uses the temperature-constancy process to compute the predicted temperature of item nodes, which gives the user's predicted rating for each item. The experiments show that on specific types of data sets the proposed algorithm outperforms collaborative-filtering-based methods, and that it is more computationally efficient than the classic heat conduction method.

Fourth, large-scale data processing for rating prediction and top-N recommendation algorithms based on MapReduce. The volume of data that personalized recommendation must handle in practice keeps growing, which places higher demands on algorithm execution efficiency. Some studies streamline the computation itself, for example by matrix dimensionality reduction, but such methods are limited by the algorithm: they cannot guarantee that the streamlining will meet requirements, nor can streamlining be continued indefinitely to improve scalability. The thesis examines the proposed recommendation algorithms and gives parallel designs and implementations of the bipartite-network-based top-N recommendation algorithm and rating prediction algorithm, using MapReduce to distribute the computation across multiple compute nodes that run concurrently, which improves execution efficiency and reduces the time taken to process large-scale data. The advantage of this approach is that, as the data volume keeps growing and provided the algorithm remains applicable, scalability can keep increasing as long as enough compute nodes are supplied to share the load.
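The idea of splitting a bipartite-network recommender into map and reduce phases can be sketched on a single machine. The example below uses the standard two-step mass-diffusion resource transfer for top-N recommendation, not the negative-interest-aware, rating-sensitive variant the thesis proposes, and it only imitates the MapReduce pattern in plain Python rather than running on a cluster. The edge list and every function name are hypothetical.

```python
from collections import defaultdict

# User-item edges of a toy bipartite network (illustrative data only).
edges = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"), ("u3", "d"),
]


def reduce_sum(pairs):
    """Reduce phase: sum the values emitted for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def diffuse(resource, edges, src, dst):
    """One transfer step: every source node splits its resource equally
    among its edges. The map phase emits one (target, share) pair per edge."""
    degree = reduce_sum((e[src], 1.0) for e in edges)
    return reduce_sum(
        (e[dst], resource.get(e[src], 0.0) / degree[e[src]]) for e in edges
    )


def top_n(user, edges, n):
    """Rank items the user has not collected by the resource they receive."""
    owned = {i for u, i in edges if u == user}
    item_res = {i: 1.0 for i in owned}                   # one unit per owned item
    user_res = diffuse(item_res, edges, src=1, dst=0)    # items -> users
    final = diffuse(user_res, edges, src=0, dst=1)       # users -> items
    candidates = [(r, i) for i, r in final.items() if i not in owned]
    return [i for r, i in sorted(candidates, key=lambda t: (-t[0], t[1]))[:n]]


print(top_n("u1", edges, 2))
```

Because each step is a per-edge emit followed by a per-key sum, the edge list can in principle be partitioned across workers, which is the property the fourth point relies on for scalability.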
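[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree awarded]: 2014
[Classification number]: TP391.3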

Article ID: 2040987

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2040987.html
