Research on a Push Algorithm Based on Personalized Prediction

Published: 2018-10-08 21:56
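[Abstract]: Efficiently and accurately filtering reliable key information that users are interested in from massive amounts of information and data is one of the main research priorities of the information service industry. Search-engine-based pull services and information push services are currently the two main channels for obtaining information. Rural areas of China lag behind in economic development and farmers generally have a low level of education, so relying on search engines to obtain information is not realistic there; information push services are better suited to rural areas.

"Personalization" is the fundamental starting point of a push model: by selecting the K nearest neighbor samples and building a prediction model on them, specific information can be pushed to a target user. The key difficulties in K-nearest-neighbor selection are measuring sample similarity and determining the value of K. This study addresses and improves both aspects, with the following results.

To build the push model, a neighbor set must first be selected for the target user, consisting of the K user samples with the highest similarity. Commonly used similarity measures include the Pearson correlation coefficient, cosine similarity, mean squared difference (MSD) similarity and Euclidean distance, but these measures cannot capture complex nonlinear relationships between two users, which makes the neighbor set inaccurate. This thesis introduces the maximal information coefficient (MIC) as the similarity measure between users. Compared with conventional mutual information, MIC partitions the variables into grids and searches stepwise for the optimal cut points of each variable so as to maximize the mutual information between the two variables. It applies to nonlinear functional relationships of any form, and even to superpositions of functions, so it can effectively capture complex nonlinear relationships between two users, making the neighbor set more accurate and improving the prediction accuracy of the push model.

The other key component of the push model is predicting, from the neighbor set, the target user's scores for items that user has not yet rated (the item rating prediction model). The predicted score directly determines whether an item is pushed to the target user, so an incorrect prediction leads to the wrong information being pushed. Selecting suitable training samples is the key to building a highly accurate item rating prediction model. The neighbor set is obtained from similarities computed over all rated items, but when predicting a particular item for a particular user, differences in time, region, culture and so on mean that using every neighbor as a training sample does not necessarily give the best prediction. Selecting the k best samples from the full neighbor set is a k-nearest-neighbor selection problem, and the choice of k is its core. This study introduces geostatistics to analyze the structure of the neighbor set of each item to be predicted, derives a common range a, and for each user selects from the full neighbor set the k training samples whose distance is less than a, thereby achieving personalized prediction for each user.

Building on these two improvements in neighbor selection and training-sample selection, an item rating prediction model based on support vector machines was constructed using the MovieLens rating dataset as the example data, and it substantially improved the accuracy of item rating prediction.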
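The neighbor-set construction described above can be sketched in Python. This is a minimal illustration, not the thesis's code: it assumes ratings are stored as a dictionary `user -> {item: rating}` on a 1-5 scale, and every function name is hypothetical. The MSD normalization by the squared rating range and the `1 / (1 + d)` conversion of Euclidean distance are common conventions assumed here. The demo pair shows the limitation the abstract points to: `v` is a deterministic function of `u`, yet their Pearson correlation is exactly zero.

```python
import heapq
import math


def co_rated(ru, rv):
    """Align two users' ratings (dicts item -> rating) on the items both have rated."""
    common = sorted(ru.keys() & rv.keys())
    return [ru[i] for i in common], [rv[i] for i in common]


def pearson(u, v):
    """Pearson correlation coefficient of two equally long rating vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def cosine(u, v):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def msd_similarity(u, v, r_min=1.0, r_max=5.0):
    """1 - mean squared difference, normalised by the squared rating range (assumed 1-5)."""
    msd = sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
    return 1.0 - msd / (r_max - r_min) ** 2


def euclidean_similarity(u, v):
    """Euclidean distance turned into a similarity via the assumed convention 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(u, v))


def top_k_neighbours(target, ratings, sim, k, min_common=2):
    """Return the k users most similar to `target`; ratings maps user -> {item: rating}."""
    scored = []
    for user, r in ratings.items():
        if user == target:
            continue
        u, v = co_rated(ratings[target], r)
        if len(u) >= min_common:  # too few co-rated items gives no usable similarity
            scored.append((sim(u, v), user))
    return [user for _, user in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    u, v = [1, 2, 3, 4, 5], [5, 2, 1, 2, 5]  # v = (u - 3) ** 2 + 1, a deterministic function of u
    print(pearson(u, v))  # 0.0: Pearson misses this nonlinear dependence entirely
```

Any of the four similarity functions can be passed as `sim`, which keeps the neighbor selection independent of the chosen measure.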
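In its original definition (Reshef et al., 2011), MIC takes, over all grids of kx by ky cells with kx·ky below a budget B(n) = n^0.6, the largest mutual information achievable by placing the cut points, normalized by log min(kx, ky). The Python sketch below is a deliberately simplified stand-in, neither that paper's ApproxMaxMI search nor the thesis's implementation: it fixes the cut points at empirical quantiles instead of optimizing them, so it can underestimate the true statistic. Function names are hypothetical.

```python
import math
from collections import Counter


def quantile_bins(values, k):
    """Label each value with one of at most k bins cut at empirical quantiles.

    Equal values always share a bin, so heavy ties can leave fewer than k bins.
    """
    s = sorted(values)
    n = len(s)
    cuts = [s[j * n // k] for j in range(1, k)]
    return [sum(v >= c for c in cuts) for v in values]


def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two sequences of discrete labels."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[i] * cb[j])) for (i, j), c in cab.items())


def mic_approx(x, y, alpha=0.6):
    """Simplified MIC: max of I / log(min(kx, ky)) over quantile grids with kx * ky < n ** alpha."""
    n = len(x)
    budget = n ** alpha
    best = 0.0
    for kx in range(2, int(budget) + 1):
        for ky in range(2, int(budget) + 1):
            if kx * ky >= budget:
                continue  # grid exceeds the resolution budget B(n)
            mi = mutual_information(quantile_bins(x, kx), quantile_bins(y, ky))
            best = max(best, mi / math.log(min(kx, ky)))
    return best


if __name__ == "__main__":
    xs = list(range(-50, 50))
    parabola = [v * v for v in xs]  # nonlinear, non-monotonic dependence
    print(round(mic_approx(xs, parabola), 3))  # well above 0.9 although Pearson r is near 0
```

With 10 or fewer co-rated items, n^0.6 is below 4, no 2 × 2 grid fits the budget and the function returns 0; in practice MIC is therefore only informative for user pairs that share a reasonable number of rated items.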
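The training-sample selection by geostatistics can be sketched in the same way. The abstract does not specify the distance, the variable whose structure is analyzed or the variogram model, so this Python sketch assumes: a user-to-user distance supplied as a function (for example 1 − MIC), each neighbor's rating of the item being predicted as the analyzed variable, and a spherical variogram without nugget whose range a is chosen by grid search, with the sill fitted in closed form by least squares for each candidate a. Neighbors closer than a to the target user are then kept as training samples. All names are hypothetical.

```python
from itertools import combinations


def empirical_semivariogram(points, values, dist, lag_width, n_lags):
    """gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over the N(h) pairs in each distance bin.

    Returns (bin centre, gamma) pairs for non-empty bins only.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(points)), 2):
        b = int(dist(points[i], points[j]) // lag_width)
        if b < n_lags:
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [((b + 0.5) * lag_width, sums[b] / (2 * counts[b]))
            for b in range(n_lags) if counts[b]]


def spherical(h, a):
    """Unit-sill spherical variogram model with range a and no nugget."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3


def fit_range(variogram, candidate_ranges):
    """Pick the range a minimising squared error; the sill c is solved in closed form per a."""
    best = None
    for a in candidate_ranges:
        f = [spherical(h, a) for h, _ in variogram]
        g = [gamma for _, gamma in variogram]
        c = sum(fi * gi for fi, gi in zip(f, g)) / sum(fi * fi for fi in f)
        sse = sum((c * fi - gi) ** 2 for fi, gi in zip(f, g))
        if best is None or sse < best[0]:
            best = (sse, a, c)
    return best[1], best[2]


def select_within_range(target, neighbours, dist, a):
    """Keep only the neighbours closer to the target than the range a (so k varies per user)."""
    return [nb for nb in neighbours if dist(target, nb) < a]
```

Because a is shared while each user's distances to the neighbors differ, the number k of retained samples differs from user to user, which matches the abstract's description of a common range with a per-user k.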
[Degree-granting institution]: Hunan Agricultural University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number (CLC)]: TP391.3
