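Research on an Information Consulting Engine System for Financial Investors and Institutions
Published: 2018-03-26 19:14
Topic: search engine. Entry point: enterprise search engine framework. Source: Harbin Institute of Technology, master's thesis, 2017.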
[Abstract]: The information consulting engine system is, by category, a vertical search engine. Following a defined search strategy and using specific programming languages, it integrates data on financial institutions, listed companies, and local government bonds from many countries, and then presents the integrated results to a specific user group according to their search keywords. Its users are mainly institutional investors and individual investors. How to provide these customers with better personalized retrieval is therefore a key problem that the production system has to solve. For this reason, the ranking algorithm based on a user personalization model and the improved page weight value ranking algorithms studied in this thesis are of practical significance.
The main research content is as follows. First, the search platform of the system is built on the enterprise search engine framework Solr, and personalized ranking techniques for search engines are studied. The behavioral features of users browsing web pages are then analyzed and extracted, and weights are computed for the keywords most relevant to each user. The resulting user feature vectors are used to build a user personalization model, and search results are re-ranked on the basis of this model to achieve personalized ranking. Experimental results show that this re-ranking algorithm meets users' search needs better, but it also lowers the retrieval efficiency of the search engine.
To address this, the thesis studies two improved page weight value algorithms that aim to raise the efficiency of personalized ranking. The first is a page weight value algorithm based on personalized page weight calculation, which mines and analyzes user logs so that each page's weight value carries user-specific features. The second is a personalized page weight value algorithm based on a transaction clustering model, which collects each user's keyword access sequence, derives the set of keywords the user is interested in, and uses that set to adjust page weight values so that they reflect user-specific features. Building on this, a personalized page weight value algorithm based on a topic-based transaction clustering model is proposed, which groups users' search keywords and page topics so that page weight values reflect each user's preferences.
To verify the effectiveness of the proposed algorithms, an information consulting engine system for financial investors and institutions was developed. The system has been successfully deployed on a company's production business search platform. Experiments on the QA test platform show that the Solr-based search engine performs slightly better than the Endeca-based one; that results from the user personalized ranking algorithm match users' search needs more closely; that the improved page weight value algorithms are clearly more efficient at retrieval than the ranking algorithm based on the user personalization model; and that the algorithm based on the topic-based transaction clustering model is in turn clearly more efficient at retrieval than both the personalized page weight calculation algorithm and the transaction clustering algorithm.
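The re-ranking step described in the abstract (keyword weights from browsing behavior, a user feature vector, then re-sorting of results) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose details the abstract does not give. The function names, the frequency-based keyword weighting, the cosine similarity measure, the mixing weight `alpha`, and the assumption that engine scores are already scaled to [0, 1] are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def build_profile(browsed_docs):
    """Build a user profile as normalized keyword weights.

    browsed_docs: list of keyword lists, one per page the user viewed.
    Weighting by relative frequency is an assumption of this sketch.
    """
    counts = Counter(kw for doc in browsed_docs for kw in doc)
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(results, profile, alpha=0.5):
    """Re-sort results by a blend of engine score and profile similarity.

    results: list of (doc_id, engine_score, keyword_weights) tuples,
             with engine_score assumed to lie in [0, 1].
    alpha:   hypothetical mixing weight; 1.0 keeps the engine's order.
    """
    def blended(item):
        _, score, keywords = item
        return alpha * score + (1 - alpha) * cosine(profile, keywords)

    return sorted(results, key=blended, reverse=True)


# Toy data: a user who mostly reads about bonds.
profile = build_profile([["bond", "yield"], ["bond", "municipal"]])
results = [
    ("equity-report", 0.9, {"stock": 1.0}),
    ("muni-bond-note", 0.8, {"bond": 0.7, "municipal": 0.3}),
]
reranked = rerank(results, profile)
```

With this toy data the bond document moves ahead of the higher-scoring equity report, because its keywords overlap with the profile. Scoring every result against the profile at query time is also the kind of extra per-query work that is consistent with the efficiency cost the abstract reports for re-ranking.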
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3
Article ID: 1669218
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1669218.html