Research on the Question Recommendation Model of a Machine Learning-Based Question Answering Recommendation System
[Abstract]: The question recommendation model described in this paper is part of a personalized recommendation system developed for an interactive Chinese question answering platform. The platform holds a large number of unanswered questions. Using each user's registration information and their login, browsing, and answering behavior on the platform, the personalized recommendation system recommends relevant questions to that user, in order to reduce the cost of finding questions to answer, increase the number of answers, and promote knowledge sharing. The recommendation model uses a content-based recommendation algorithm built on machine learning techniques. Borrowing an idea from precision targeted advertising systems, it takes the click-through rate (CTR) of recommended questions as the system's optimization objective. Combining Chinese word segmentation [76, 77, 7, 78, 79], keyword extraction, and named entity recognition (NER) [81, 82, 83, 84], a CTR prediction model is built to match users with questions. The CTR prediction model computes the conditional probability P(click = true | user = uid, question = qid), that is, the probability that the user will click a new question, and this probability serves as the measure of how well the user and the new question match. A maximum entropy (MaxEnt) model is used to fit this conditional probability. The original version of the question recommendation model has two shortcomings. First, it uses only a very small number of features, and the insufficient feature dimensionality leads to underfitting. Second, the static recommendation model cannot adapt to changes in the data distribution.
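As an illustration of the CTR prediction step described in the abstract, the following is a minimal sketch, not the thesis implementation. For a binary outcome (click or no click), a maximum entropy model is equivalent to logistic regression, so the sketch fits P(click = true | user, question) with L2-regularized logistic regression trained by stochastic gradient ascent. The feature map (a bias term plus one binary feature per keyword shared by user and question), the function names `features`, `predict`, and `train`, and the toy click log are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(user_keywords, question_keywords):
    """Hypothetical feature map: a bias term plus one binary feature
    for every keyword shared by the user profile and the question."""
    feats = {"bias": 1.0}
    for kw in set(user_keywords) & set(question_keywords):
        feats["match:" + kw] = 1.0
    return feats


def predict(weights, feats):
    """Estimate P(click = true | user, question) for one feature vector."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return sigmoid(z)


def train(examples, epochs=200, lr=0.5, l2=1e-3, seed=0):
    """Stochastic gradient ascent on the L2-regularized log-likelihood."""
    rng = random.Random(seed)
    examples = list(examples)
    weights = {}
    for _ in range(epochs):
        rng.shuffle(examples)
        for feats, clicked in examples:
            error = clicked - predict(weights, feats)
            for name, value in feats.items():
                w = weights.get(name, 0.0)
                weights[name] = w + lr * (error * value - l2 * w)
    return weights


# Toy click log (assumed data): (user keywords, question keywords, clicked).
click_log = [
    (["python", "ml"], ["python", "pandas"], 1),
    (["python", "ml"], ["cooking", "rice"], 0),
    (["cooking"], ["cooking", "rice"], 1),
    (["cooking"], ["python", "pandas"], 0),
]
examples = [(features(u, q), y) for u, q, y in click_log]
weights = train(examples)

p_match = predict(weights, features(["python", "ml"], ["python", "pandas"]))
p_miss = predict(weights, features(["python", "ml"], ["cooking", "rice"]))
print(p_match, p_miss)
```

In this sketch a candidate question would be ranked for a user by its predicted click probability. The `l2` term plays the role of the regularization mentioned in the abstract, and the `bias` feature corresponds to a bias term.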
This paper improves on the original version of the question recommendation model. The work covers the following two aspects:
1. Semantic features, combination features, and bias terms are introduced into the question recommendation model, and model selection and regularization techniques are applied, to improve the accuracy of the recommendation model. The improved model uses probabilistic latent semantic analysis (pLSA) to extract semantic features from the question text. Processing text at the semantic level achieves better results than processing it at the lexical level. On the benchmark data set, the accuracy of the original recommendation model is 88%, and that of the improved model is 95%.
2. An offline training system for the question recommendation model is designed and implemented. The system automatically performs basic data download, feature extraction, model training, and model selection, and supports offline training and periodic updating of the question recommendation model. The purpose of the offline training system is to produce a new recommendation model on a regular basis. The experimental results show that the data distribution seen by the question recommendation model changes over time, and that a static model cannot adapt to these changes.
The improved question recommendation model and the offline training system have been deployed online, providing a more accurate personalized question recommendation service for users of the interactive Chinese question answering system.
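The pLSA step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it models P(w | d) as the sum over topics z of P(z | d) P(w | z) and fits both distributions with the standard EM updates. The function name `plsa`, the small smoothing constant, the number of topics, and the toy question corpus are hypothetical. The resulting per-document topic distribution P(z | d) is the kind of vector that could serve as a semantic feature for a question.

```python
import random
from collections import Counter


def normalize(values):
    # Scale a list of positive numbers so that it sums to 1.
    total = sum(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM.

    counts: one dict per document mapping word -> count.
    Returns (p_z_d, p_w_z): p_z_d[d][z] = P(z | d), p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})

    # Random positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in counts]
    p_w_z = [dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts, with a tiny constant so nothing becomes exactly 0.
        word_mass = [dict.fromkeys(vocab, 1e-12) for _ in range(n_topics)]
        new_p_z_d = []
        for d, doc in enumerate(counts):
            topic_mass = [1e-12] * n_topics
            for w, n in doc.items():
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                for z in range(n_topics):
                    word_mass[z][w] += n * post[z]
                    topic_mass[z] += n * post[z]
            # M-step for P(z | d).
            new_p_z_d.append(normalize(topic_mass))
        p_z_d = new_p_z_d
        # M-step for P(w | z).
        p_w_z = []
        for z in range(n_topics):
            total = sum(word_mass[z].values())
            p_w_z.append({w: v / total for w, v in word_mass[z].items()})
    return p_z_d, p_w_z


# Toy corpus (assumed data): each question is already segmented into words.
questions = [
    ["python", "python", "list"],
    ["python", "list", "list"],
    ["rice", "rice", "noodle"],
    ["rice", "noodle", "noodle"],
]
counts = [Counter(q) for q in questions]
p_z_d, p_w_z = plsa(counts, n_topics=2)

for q, topic_dist in zip(questions, p_z_d):
    print(q, [round(p, 3) for p in topic_dist])
```

EM only guarantees convergence to a local optimum, so the topics found depend on the random initialization. In practice the number of topics would be chosen by model selection on held-out data, in line with the model selection step the abstract describes.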
[Degree-granting institution]: Sun Yat-sen University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP181; TP391.3
Article ID: 2238205
Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2238205.html