Research on a Time-Series-Based Search Engine Evaluation Model and Algorithm
Published: 2018-05-13 14:24
Topics: time series + entropy weight; Source: Northeast Normal University, master's thesis, 2015
[Abstract]: With the arrival of the big-data era, the search engine has become the primary tool people use to obtain information, and its quality directly determines the accuracy, speed, and cost of obtaining that information. Analyzing user satisfaction with search engines has therefore long been an important research direction in information retrieval. Building on existing evaluation work in modern search technology, such as satisfaction and relevance evaluation, this thesis constructs an entropy weight model and a time series model to analyze the main factors that cause search engine satisfaction to change. The research is divided into three parts. First, focusing on the two most important components of current web search evaluation, relevance evaluation and Session satisfaction analysis, the thesis reviews the current state and open problems of both, and discusses in depth the information entropy theory and time series models relevant to this work. Next, it identifies the data source needed for satisfaction analysis: user behavior logs. The logs undergo fusion, object reconstruction, indicator selection, dimension flattening, expansion, and multi-dimensional splitting, and the processed data is stored in an Infobright data warehouse as the data basis for subsequent analysis. On this basis, the judgment matrix of the entropy weight model is constructed, the entropy weights are solved in reverse from known conclusions, and experiments confirm the feasibility and reasonableness of the approach. After the shortcomings of the entropy weight model are analyzed and pinpointed, a time series model suited to the characteristics of the data is designed, the final satisfaction change analysis model is constructed, and experiments analyze the contribution of different indicators to changes in Session satisfaction.
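As a point of reference for the entropy weight model mentioned above, the following is a minimal sketch of the standard textbook entropy weight method. It is not the thesis's own procedure: the thesis solves the weights in reverse from known conclusions, and that step is not described in the abstract. The function name `entropy_weights`, the row and column layout, and the requirement that all values be strictly positive are assumptions made for illustration.

```python
import math


def entropy_weights(rows):
    """Standard entropy weight method (illustrative sketch, not the thesis's code).

    rows: list of samples (for example, one row per day); each row holds one
    strictly positive value per indicator.
    Returns one weight per indicator; the weights sum to 1.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    m = len(rows[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in rows]
        if any(v <= 0 for v in column):
            raise ValueError("this sketch assumes strictly positive values")
        total = sum(column)
        # Share of each sample within the indicator column.
        shares = [v / total for v in column]
        # Normalised Shannon entropy in [0, 1]; 1 means the indicator is constant.
        entropy = -sum(p * math.log(p) for p in shares) / math.log(n)
        # Degree of diversification: lower entropy means more information.
        diversities.append(max(0.0, 1.0 - entropy))
    total_div = sum(diversities)
    if total_div == 0:
        raise ValueError("all indicators are constant; weights are undefined")
    return [d / total_div for d in diversities]
```

In this formulation, an indicator that never varies across samples carries no information and receives a weight close to zero, while an indicator with a more uneven distribution receives a larger weight.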
Finally, a time-series-based satisfaction change analysis system is designed and developed, comprising five modules: log processing, entropy weight calculation, satisfaction contribution calculation, time series prediction, and conclusion storage. In practical analysis, the system's conclusions offer directional suggestions for improving search engine features and services. On the theoretical side, the thesis proposes a simple, easy-to-implement algorithm, well suited to cluster computing, for analyzing the factors behind changes in satisfaction.
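To make the interplay of the time series prediction and satisfaction contribution modules more concrete, here is a rough sketch under stated assumptions. The abstract does not say which time series model the thesis uses or how contribution is defined, so simple exponential smoothing stands in as the forecaster, and contribution is taken to be the weighted deviation of the observed value from the forecast, normalised by the total absolute deviation. The function names, the indicator names in the usage example, and the assumption that indicators are on comparable scales are all hypothetical.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Stand-in forecaster; the thesis's actual time series model is not
    specified in the abstract.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def satisfaction_contributions(history, observed, weights, alpha=0.5):
    """Attribute a satisfaction change to indicators (hypothetical definition).

    history:  dict mapping indicator name -> list of past values
    observed: dict mapping indicator name -> latest observed value
    weights:  dict mapping indicator name -> weight (e.g. an entropy weight)
    Returns signed shares whose absolute values sum to 1, or all zeros when
    no indicator deviates from its forecast.
    """
    weighted = {}
    for name, past in history.items():
        expected = exponential_smoothing_forecast(past, alpha)
        # Weighted deviation of the observation from the forecast.
        weighted[name] = weights[name] * (observed[name] - expected)
    total = sum(abs(v) for v in weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: v / total for name, v in weighted.items()}
```

For example, with hypothetical indicators `click_rate` and `dwell`, if only `click_rate` falls below its forecast, the whole negative change is attributed to `click_rate` and `dwell` contributes nothing.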
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP391.3
Article ID: 1883575
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1883575.html