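Research on Several Key Problems and Techniques in Personalized Information Recommendation
Published: 2018-06-19 19:18
Topics: personalized information recommendation + rating prediction. Reference: doctoral dissertation, National University of Defense Technology, 2014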
[Abstract]: The rapid development of Internet technology and the spread of networked information have caused the volume of information on the Internet to grow explosively, and people now struggle to find what they need amid this information overload. Helping Internet users obtain the information they want more effectively has become a research focus at the intersection of information science, computer science, and network science. Thanks to the sustained efforts of many researchers, several reasonably efficient ways to obtain information of interest now exist, chief among them information retrieval and information filtering. The former, typified by search engines, obtains a description of the target information through interaction with the user and looks up the descriptive keywords on the network. The latter, whose main method is information recommendation, collects users' behavioral data and other attribute information, analyzes their latent interests, and filters out information they may find interesting. Search requires users to describe their needs with keywords that are as explicit as possible, and a limited set of keywords cannot distinguish between users with different habits, so every user gets the same results. Recommendation instead infers preferences and tendencies from information about the user and from the interests reflected in past behavior; it does not depend on users describing their own needs, so users can obtain more precise information with less effort. Recommendation therefore serves users well when they have no clearly defined need. Recommendation technology has been developing for nearly twenty years and has been applied fairly successfully in many fields. On the theoretical side it has attracted many researchers: interest in classic methods such as collaborative filtering remains strong, and new methods, such as those based on bipartite networks, continue to be proposed, further enriching the field.
As research deepens and application environments keep changing, recommendation technology faces a number of problems and challenges, the most important being data sparsity and large-scale data processing. Data sparsity refers to the situation in collaborative-filtering-based recommendation where the numbers of users and items are large but users have rated relatively few items, so the user-item rating matrix is very sparse and the accuracy of the recommendation method suffers. Large-scale data processing refers to the growing real-time pressure on recommendation algorithms as the amount of data to be processed in practice keeps increasing; this calls for more efficient methods, or other ways to improve execution efficiency, so that recommendation algorithms can handle more data faster. To address these challenges, this dissertation studies the following problems.
First, data sparsity in rating prediction based on collaborative filtering. Rating prediction is a major topic in personalized information recommendation: it predicts the ratings of unrated items by analyzing a user's past ratings. Data sparsity affects collaborative filtering mainly in two stages, user similarity computation and rating prediction generation. Sparse data leaves users with less data in common, which lowers the reliability of the computed similarities; sparsity also means the neighbors' ratings cannot be guaranteed to be complete, and predictions derived from an incomplete reference rating set cannot be guaranteed to be accurate. The dissertation therefore proposes selecting reference users (items) with an absolute similarity measure and improving the completeness of the reference rating set with a cross-dimension imputation method. Experimental results confirm that the proposed algorithm reduces the impact of data sparsity and improves recommendation accuracy.
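The two stages named under the first problem, similarity computation and prediction generation, can be illustrated with a textbook user-based collaborative filtering baseline. This is only a minimal sketch under stated assumptions: it uses Pearson correlation and a mean-centered weighted average, not the dissertation's absolute similarity measure or cross-dimension imputation (the abstract does not specify either), and the names `pearson`, `predict`, and the toy `ratings` data are hypothetical.

```python
from math import sqrt

def pearson(ra, rb):
    """Pearson correlation over co-rated items; returns (similarity, overlap size)."""
    common = sorted(set(ra) & set(rb))
    n = len(common)
    if n < 2:
        # With sparse data the overlap is often this small, so no similarity can be trusted.
        return 0.0, n
    mean_a = sum(ra[i] for i in common) / n
    mean_b = sum(rb[i] for i in common) / n
    num = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    den = sqrt(sum((ra[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((rb[i] - mean_b) ** 2 for i in common))
    return (num / den if den else 0.0), n

def predict(ratings, user, item, k=2):
    """Predict user's rating of item from the k most similar users who rated it."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim, _ = pearson(ratings[user], other_ratings)
        if sim > 0:
            neighbours.append((sim, other))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return user_mean  # fall back to the user's own mean when no neighbour qualifies
    num = 0.0
    for sim, other in top:
        other_mean = sum(ratings[other].values()) / len(ratings[other])
        num += sim * (ratings[other][item] - other_mean)
    return user_mean + num / sum(sim for sim, _ in top)

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 3, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}
print(predict(ratings, "alice", "d"))
```

In this toy data alice and carol share only two rated items, which shows how little evidence a similarity value may rest on when the rating matrix is sparse.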
Second, data sparsity in top-N recommendation based on bipartite networks. Top-N recommendation is another basic problem in personalized information recommendation; its goal is to provide each user with a recommendation list of N items. Bipartite-network recommendation is a relatively new class of methods that adapts better to sparse data and can achieve higher recommendation accuracy. When user interest is partitioned according to ratings, considering only the items a user likes leaves data utilization low, and the items the user dislikes are underused. Moreover, the differences in interest that ratings reflect should show not only in whether an interest exists but also in the strength of the interest and in the interest-resource transfer process. The dissertation proposes a new bipartite-network method that builds a negative-interest-aware user interest model by analyzing the information revealed by items the user dislikes, and uses rating-sensitive methods for initializing and transferring user interest resources to reflect differences in the degree of interest. Experiments show that the new method clearly improves recommendation performance.
Third, rating prediction based on bipartite networks. For data with unbalanced node degree distributions, the dissertation proposes an algorithm on bipartite networks that combines unbiased temperature-difference conduction with biased temperature constancy to solve the rating prediction problem. Because it needs no similarity computation and no selection of a fixed number of users (items) as neighbors, the bipartite-network approach mitigates the effect of sparse data better. The proposed algorithm is based on the heat conduction process: it uses the temperature difference between users as the quantity that is conducted and compared, and sets the temperature difference a node receives to the mean of the temperature differences conducted from all connected nodes, which balances the influence of all nodes. In addition, it uses a constant-temperature process to compute the predicted temperature of item nodes, which gives the predicted rating of a user for an item. The experiments show that on specific types of datasets the proposed algorithm outperforms collaborative-filtering-based methods, and that it is more computationally efficient than the classic heat conduction method.
Fourth, large-scale data processing for rating prediction and top-N recommendation algorithms based on MapReduce. Personalized information recommendation must process ever larger amounts of data in practice, which places higher demands on the execution efficiency of the algorithms. Some studies streamline the computation itself, for example through matrix dimensionality reduction, but such approaches are limited by the algorithm: they cannot guarantee that the streamlining will be enough, nor can an algorithm be streamlined indefinitely to improve its scalability. The dissertation examines the recommendation algorithms it proposes, and designs and implements parallel versions of the bipartite-network top-N recommendation and rating prediction algorithms, using MapReduce's parallel computing capability to distribute the computation across multiple compute nodes that run concurrently. This improves execution efficiency and reduces the time needed to process large-scale data. The advantage of this approach is that, as the data volume grows and provided the algorithm remains applicable, scalability can keep increasing as long as enough compute nodes are supplied to share the load.
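For the bipartite-network top-N recommendation discussed under the second problem, the underlying idea of transferring "interest resource" between item and user nodes can be sketched with the standard unweighted mass-diffusion procedure. This is an assumed baseline for illustration only: it is not the negative-interest-aware, rating-sensitive method the dissertation proposes (the abstract gives no formulas for it), and the function name `mass_diffusion_top_n` and the toy `likes` data are hypothetical.

```python
from collections import defaultdict

def mass_diffusion_top_n(likes, target, n=2):
    """Rank items the target user has not collected by two-step resource diffusion.

    likes: dict mapping user -> set of items the user likes (edges of the bipartite graph).
    """
    # Build the item -> users side of the bipartite network.
    item_users = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            item_users[item].add(user)

    # Step 1: each item liked by the target starts with one unit of resource
    # and splits it equally among the users connected to it.
    user_resource = defaultdict(float)
    for item in likes[target]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] += share

    # Step 2: each user splits the received resource equally among their items.
    item_resource = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(likes[user])
        for item in likes[user]:
            item_resource[item] += share

    # Recommend the uncollected items holding the most resource (ties broken by name).
    candidates = [(score, item) for item, score in item_resource.items()
                  if item not in likes[target]]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:n]]

# Hypothetical toy network: user -> liked items
likes = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "d"},
    "u4": {"c", "d"},
}
print(mass_diffusion_top_n(likes, "u1", n=2))
```

Each target user's diffusion is independent of every other user's, which is one reason this family of algorithms lends itself to the kind of per-node work distribution that the fourth problem describes.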
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TP391.3
Article ID: 2040987
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2040987.html