
Research on Content-Based E-book and Author Recommendation Methods

Published: 2018-06-20 06:55

  Topic: content recommendation + e-book recommendation; Source: Harbin Institute of Technology, master's thesis, 2016


[Abstract]: With the rapid spread of the Internet, e-book and author resources have become increasingly abundant, yet readers find it ever harder to discover what genuinely interests them in this mass of material. Recommender systems can quickly help readers select interesting and valuable information. In practice, readers tend to read books on the same or similar subject matter, or books by authors whose writing style and content resemble those of authors they already like. Personalized e-book recommendation and author recommendation have therefore become a focus for online bookstores, and research on content-based e-book and author recommendation methods has significant practical value. The main research content is as follows.

For content-based e-book recommendation, traditional text processing models have mostly been studied on short texts and rarely on long ones. Compared with short texts (such as news), long texts (such as e-books) involve higher-dimensional and more complex preprocessing, and their semantic relationships are harder to measure. This work crawls full e-book texts from authoritative e-book websites to build an experimental long-text dataset. To handle the high dimensionality and processing complexity of long e-book texts, it adopts a divide-and-conquer approach, splitting each long text into several parts, and proposes a multi-dimensional latent semantic algorithm model that expresses the semantic relatedness of a text by constructing a word semantic relationship graph matrix. To handle the difficulty of measuring semantic relationships in long texts, it measures the content similarity of e-books with a similarity distance that fuses global and local semantics, and it runs a series of experiments on the parameters involved. The experimental results show that the multi-dimensional latent semantic algorithm model outperforms other traditional text processing models on five quantitative evaluation metrics.

For content-based e-book author recommendation, existing research concentrates mostly on expert recommendation and uses a narrow range of features. To address this, the work uses a crawler to collect three heterogeneous features related to each author from e-commerce websites: the author's biography, the abstracts of the author's books, and reader reviews. Using these three heterogeneous features, it proposes a tree-structured representation of authors and applies a multi-layer self-organizing map algorithm model to recommend e-book authors. Two groups of experiments are designed according to whether the features of the author node in the author tree are fused with the other two kinds of feature information, and the parameters involved are studied. The experimental results show that the multi-layer self-organizing map model based on the author tree outperforms traditional text processing models on five quantitative metrics.
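The idea of splitting a long text into parts and fusing global with local similarity can be illustrated with a short sketch. This is not the thesis's multi-dimensional latent semantic model, whose details the abstract does not give: plain term-frequency cosine similarity is swapped in for the latent semantic representation, and the function names, the whitespace tokenization, the best-match rule for segments, and the parameters `n_segments` and `alpha` are all assumptions made for illustration. Chinese text would first need a word segmenter in place of `split()`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_segments(tokens, n_segments):
    """Divide-and-conquer step: cut a token list into contiguous parts."""
    if not tokens:
        return []
    size = max(1, math.ceil(len(tokens) / n_segments))
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def directed_local(segs_x, segs_y):
    """Average, over segments of x, of the best-matching segment of y."""
    return sum(max(cosine(sx, sy) for sy in segs_y) for sx in segs_x) / len(segs_x)


def fused_similarity(text_a, text_b, n_segments=4, alpha=0.5):
    """Weighted mix of whole-text (global) and segment-level (local) similarity.

    alpha is a hypothetical mixing weight; the thesis's actual fusion
    rule is not stated in the abstract.
    """
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    global_sim = cosine(Counter(tokens_a), Counter(tokens_b))

    segs_a = [Counter(s) for s in split_segments(tokens_a, n_segments)]
    segs_b = [Counter(s) for s in split_segments(tokens_b, n_segments)]
    if not segs_a or not segs_b:
        return alpha * global_sim

    # Symmetrize the local score so that sim(a, b) == sim(b, a).
    local_sim = 0.5 * (directed_local(segs_a, segs_b) + directed_local(segs_b, segs_a))
    return alpha * global_sim + (1 - alpha) * local_sim


if __name__ == "__main__":
    fantasy_1 = "dragon knight castle magic sword quest dragon king"
    fantasy_2 = "dragon magic castle knight quest sword"
    finance = "stock market bond yield inflation rate"
    print(fused_similarity(fantasy_1, fantasy_2))
    print(fused_similarity(fantasy_1, finance))
```

In a recommender built on such a score, the e-books most similar to those a reader has already read would be ranked first; the segment-level term rewards books that share whole passages of related vocabulary and not just overall word counts.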
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1

Article ID: 2043387


Link to this article: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2043387.html

