Establishing a Library Reader Interest Analysis and Guidance System Model Based on the Hadoop Platform
Topic: big data + Hadoop. Source: Changchun University of Technology, master's thesis, 2017
[Abstract]: In recent years, and especially since the 18th National Congress of the Communist Party of China, "Internet Plus" university information management has become a widely discussed topic, guided by the national innovation-driven development strategy and set against the backdrop of mass entrepreneurship and innovation. The library is one of the most data-intensive departments in a university. As mobile networks, big data, cloud computing, and the Internet of Things have matured and universities have paid increasing attention to library informatization, more and more library resources have come under digital management, with an increasingly diverse range of functions. Teachers and students generate very large volumes of data when they search and consult library resources, and these data are the basis for in-depth analysis of readers' reading and retrieval interests.

The diversity of reader searches and the demand for intelligent interest analysis and book recommendation also pose major challenges for libraries. First, smart-library hardware built on a single server plus a disk array can no longer meet the need to retrieve, analyze, and store large volumes of data, and such hardware is costly and poorly suited to the big data era. Second, current analyses of university library circulation data still examine individual indicators in isolation and are therefore not comprehensive. Third, existing reader interest analysis stops at a statistical result and does not turn that result into data that can directly guide other library work, such as acquisitions.

To address these problems, the author analyzed data from, and carried out field research at, the libraries of a university in Changchun and several nearby provincial universities. Drawing on big data theory, the supervisor's research direction, and related industry-sponsored projects, and building on an in-depth study of Hadoop big data technology, the thesis establishes an interest analysis and guidance model that can serve as a reference for research on university library informatization.

The thesis covers four areas of work. First, it applies Hadoop big data technology and C#-based data analysis to reader interest analysis and guidance, and, to reduce the high cost of big data storage and computation, it chooses the lower-cost Microsoft Azure as the server cluster on which to build the Hadoop data platform. Second, it uses a NoSQL distributed database and the HBase database to analyze readers' borrowing and retrieval logs (the access sources of electronic book resources); this log analysis makes it possible to monitor and optimize the use of the library's electronic resources. Third, it mines readers' historical borrowing data to construct a literature recommendation model framework. Fourth, it converts the book recommendation lists produced by that framework into purchase lists for the acquisitions and cataloging department.

The thesis makes three contributions. First, it brings the currently popular Hadoop big data analysis technology into the relatively traditional work of reader interest analysis and guidance in university libraries, building a borrowing-analysis cluster platform on inexpensive computers and making full use of Microsoft's cloud platform, which resolves the problem of distributed storage for massive volumes of borrowing records. Second, on the data analysis side, it introduces the C# programming language into the Hadoop framework, providing an efficient and intuitive data analysis scheme for building the interest analysis and guidance model. Readers' borrowing histories are analyzed and compared at scale to produce personalized recommendations, which in turn give strong decision support for collection development and for adjusting the structure of the library's book resources. Third, it implements personalized book recommendation, giving the library's purchasing department a reference for book purchases.
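The third and fourth research points (mining borrowing history into a recommendation list, then converting that list into a purchase list) can be illustrated with a small sketch. The abstract does not say which algorithm the thesis uses, and the thesis itself works in C# on Hadoop, so everything here is an assumption made for illustration: the co-occurrence scoring, the function names `build_cooccurrence`, `recommend`, and `purchase_list`, the sample data, and the rule that a title goes on the purchase list when it is recommended to more readers than there are copies held.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical borrowing history: reader id -> set of borrowed book ids.
loans = {
    "r1": {"A", "B"},
    "r2": {"A", "B", "C"},
    "r3": {"A", "C"},
    "r4": {"B"},
}
# Hypothetical holdings: book id -> number of copies the library owns.
holdings = {"A": 1, "B": 1, "C": 1}


def build_cooccurrence(loans):
    """Count how often each pair of books was borrowed by the same reader."""
    co = defaultdict(Counter)
    for books in loans.values():
        for a, b in combinations(sorted(books), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(reader, loans, co, top_n=3):
    """Rank unread books by summed co-occurrence with the reader's loans."""
    seen = loans[reader]
    scores = Counter()
    for book in seen:
        for other, count in co[book].items():
            if other not in seen:
                scores[other] += count
    # Sort by score (descending), then by book id so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]


def purchase_list(loans, co, holdings, top_n=3):
    """Assumed rule: list titles recommended to more readers than copies held."""
    demand = Counter()
    for reader in loans:
        for book in recommend(reader, loans, co, top_n):
            demand[book] += 1
    shortfall = [
        (book, wanted - holdings.get(book, 0))
        for book, wanted in demand.items()
        if wanted > holdings.get(book, 0)
    ]
    return sorted(shortfall, key=lambda kv: (-kv[1], kv[0]))


co = build_cooccurrence(loans)
print(recommend("r4", loans, co))          # per-reader recommendation list
print(purchase_list(loans, co, holdings))  # (book id, copies short) pairs
```

On a Hadoop cluster the pair counting in `build_cooccurrence` would naturally split into a map step (emit book pairs per reader) and a reduce step (sum the counts per pair); the single-process version above only shows the data flow from borrowing records to a purchase list.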
[Degree-granting institution]: Changchun University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13; TP311.52