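Research on a Hadoop-Based Evaluation System for Traditional Chinese Medicine Web Information Resources
Published: 2018-03-06 09:30
Topic: traditional Chinese medicine  Focus: Web  Source: Shandong University of Traditional Chinese Medicine, 2016 doctoral dissertation  Document type: degree thesis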
[Abstract]: With the development of computer and communication technology, the Internet has gradually penetrated every area of production and daily life and has become an important source of knowledge. People constantly obtain information online to guide their work and lives, and modern society can no longer do without the Internet. The Web refers to the HTML-related part of the Internet, that is, information resource pages based on HTML. Traditional Chinese medicine (TCM) information resources on the Web grow every day, and existing resources keep changing and being updated. The rapid development of information technology has made TCM-related data on the Web grow explosively, but the quality of this information is uneven, and at present there is no reasonably complete method for evaluating the quality of TCM information resources objectively or for guiding people to correct and useful information among the large volume available. A method is therefore needed to evaluate objectively the TCM information resources that currently exist on the Web. Starting from the characteristics of Web TCM information resources and using Hadoop distributed computing technology, the thesis proposes a data-assisted Delphi method combined with AHP (Analytic Hierarchy Process) to build an evaluation index system for TCM Web information resources, and carries out an empirical study on TCM health service websites. The main research results are as follows.
(1) Design of a TCM topic crawler (Chapter 3). Web TCM information resources grow quickly, are widely distributed, and change easily. Analyzing and evaluating them presupposes a cheap, fast, and high-quality way of obtaining the information, so an automated acquisition method is used: a web crawler that collects TCM Web information automatically. Unlike the crawler of a general-purpose search engine, this crawler only visits websites whose subject is TCM, which avoids wasted crawling time and improves the accuracy with which target pages are captured. Based on these requirements, the thesis sets distributed, scalable, high-performance, and high-quality crawling as the design goals of the TCM topic crawler, formulates the corresponding crawling strategy, and implements the crawler.
(2) Construction of a Hadoop platform for TCM information resources (Chapters 3 and 6). The crawled TCM topic pages cover a wide range and must be refreshed regularly, and running page analysis and data mining on a single machine places very high demands on that machine. Storage in a single-machine relational database therefore cannot meet the requirements of high-performance computing. Once the crawler has fetched the pages, they are stored in Hadoop HDFS, which keeps performance high and system overhead low during later text mining and statistical analysis of the page content.
(3) Construction of the evaluation index system for TCM Web information resources (Chapters 4 and 5). Starting from the characteristics of TCM Web information resources, the thesis discusses the principles for evaluating them and constructs the evaluation index system. The system is divided into four major parts: information content evaluation, website design evaluation, usability evaluation, and other evaluation. Each part is subdivided into specific second-level indicators, 24 in total, and the meaning and role of each of the 24 indicators is explained in detail. The thesis then analyzes AHP-based evaluation of TCM information resources: it builds the judgment matrices, determines the weight of each indicator in the system, and performs consistency checks. Comparing the weights establishes the relative importance of each indicator in the evaluation of TCM Web information resources.
(4) Implementation of data-driven evaluation of TCM Web information resources (Chapter 6). Using the evaluation of concrete TCM websites as an example, the thesis starts from setting up the analysis environment and describes in detail the software and hardware configuration requirements, the system architecture, and the Hadoop cluster setup. It explains the design and implementation of the relevant MapReduce algorithms, describes the concrete process of classifying and scoring the websites, and points out the improvements the websites should make based on the evaluation.
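The abstract mentions building an AHP judgment matrix, deriving indicator weights, and running a consistency check. The following is a minimal sketch of that calculation for the four top-level categories. It assumes the row geometric-mean approximation of the priority vector and Saaty's commonly cited random index table; the pairwise judgments and all function names are invented for illustration and are not taken from the thesis.

```python
import math

# Saaty's random consistency index (RI) for matrix orders 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical pairwise judgments (NOT from the thesis): entry [i][j] says how
# much more important criterion i is than criterion j on Saaty's 1-9 scale.
CRITERIA = ["information content", "website design", "usability", "other"]
JUDGMENTS = [
    [1,     3,     2,     5],
    [1 / 3, 1,     1 / 2, 2],
    [1 / 2, 2,     1,     3],
    [1 / 5, 1 / 2, 1 / 3, 1],
]

if __name__ == "__main__":
    weights = ahp_weights(JUDGMENTS)
    cr = consistency_ratio(JUDGMENTS, weights)
    for name, w in zip(CRITERIA, weights):
        print(f"{name}: {w:.3f}")
    print(f"CR = {cr:.3f}")
```

By convention a judgment matrix is accepted when CR is below 0.10; otherwise the pairwise judgments are revised. The same two functions could be applied to each group of second-level indicators, with the final indicator weight being the product of its group weight and its within-group weight.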
[Degree-granting institution]: Shandong University of Traditional Chinese Medicine
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: TP393.09;R2-03
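[Similar literature]
Related doctoral dissertations (1 entry)
1 Li Xuebo; Research on a Hadoop-Based Evaluation System for Traditional Chinese Medicine Web Information Resources [D]; Shandong University of Traditional Chinese Medicine; 2016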
Article ID: 1574269
Article link: http://sikaile.net/zhongyixuelunwen/1574269.html