Semiring Provenance Computation for Data Fusion
Published: 2018-01-01 08:35
Source: Journal of Computer Research and Development, 2016, No. 02. Paper type: journal article
Keywords: data fusion; semiring provenance; polynomial system; derivation tree; recursive query
[Abstract]: Data fusion is a prerequisite both for ensuring the quality of integrated data and for analysis and mining. However, data fusion as a whole is a black-box process for users, so current data fusion lacks interpretability and debuggability. To support effective conflict detection and debugging during data fusion, data provenance techniques are needed to make the fusion process traceable. Data provenance describes the whole process by which data is produced and evolves over time. The semiring provenance model, a classical representation of data provenance, expresses not only which source data a result is derived from but also how that data was combined to derive it. This paper studies the problem of computing semiring provenance for data fusion. Semiring provenance computation for data fusion follows a pay-as-you-go model, and computing the provenance information of data is very time-consuming. First, an approximate iterative method based on the Kleene sequence is proposed, and its relationship to the derivation-tree definition of semiring provenance is proved, which establishes the correctness of the method. Then a Newton-like sequence is proposed, which has better convergence than the Kleene sequence. Because recursion may prevent these two iterative algorithms from terminating, the paper analyzes the characteristics of the semiring polynomial provenance of result tuples and proves that both approximation algorithms terminate after n iterations in the worst case. Finally, experiments show that the proposed methods are feasible and effective.
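As a rough illustration of the Kleene-sequence idea named in the abstract, the sketch below iterates a small polynomial system X = f(X) over a commutative semiring, starting from the semiring zero, until the values stop changing. It is not the paper's algorithm: the names `Semiring`, `evaluate`, `kleene`, the variables `P_ac` and `P_bc`, and the edge costs are all hypothetical, and the tropical (min, +) semiring is assumed only because it makes the iteration terminate on a cyclic example. The system encodes a recursive reachability query on a graph with edges a->b (cost 1), b->a (cost 1), and b->c (cost 2).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring given by its constants and operations."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


# A polynomial is a list of monomials; each monomial is
# (coefficient, [names of the variables it multiplies]).
Monomial = Tuple[Any, List[str]]
System = Dict[str, List[Monomial]]


def evaluate(poly: List[Monomial], env: Dict[str, Any], sr: Semiring) -> Any:
    """Evaluate one polynomial under the current variable assignment."""
    total = sr.zero
    for coeff, variables in poly:
        term = coeff
        for name in variables:
            term = sr.times(term, env[name])
        total = sr.plus(total, term)
    return total


def kleene(system: System, sr: Semiring, max_iters: int):
    """Kleene iteration: start at zero and apply the system repeatedly.

    Returns the last assignment and the number of iterations that
    changed it. Stops early once a fixed point is reached.
    """
    env = {name: sr.zero for name in system}
    for i in range(1, max_iters + 1):
        new_env = {name: evaluate(poly, env, sr) for name, poly in system.items()}
        if new_env == env:
            return env, i - 1
        env = new_env
    return env, max_iters


# Assumed example semiring: tropical (min, +), zero = infinity, one = 0.
tropical = Semiring(
    zero=float("inf"),
    one=0.0,
    plus=min,
    times=lambda a, b: a + b,
)

# P_ac = e_ab * P_bc ;  P_bc = e_bc + e_ba * P_ac   (the a<->b cycle makes it recursive)
system: System = {
    "P_ac": [(1.0, ["P_bc"])],
    "P_bc": [(2.0, []), (1.0, ["P_ac"])],
}

solution, iterations = kleene(system, tropical, max_iters=10)
print(solution, iterations)  # {'P_ac': 3.0, 'P_bc': 2.0} 2
```

Swapping in a different `Semiring` instance changes what the same iteration computes; whether it reaches a fixed point in finitely many steps depends on the semiring, which is the termination question the abstract raises for recursive queries.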
[Author affiliation]: College of Information Science and Engineering, Northeastern University
[Funding]: National Natural Science Foundation of China (61472070); National Basic Research Program of China (973 Program) (2012CB316201)
[Classification number]: TP202
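[Body excerpt]: With the rapid development of the Internet, Web technology has swept the world thanks to its breadth, interactivity, speed, and openness, and it has penetrated every area of society; the number of websites and web pages is growing exponentially. How to integrate massive amounts of high-value Web information accurately and effectively is especially important for analytical applications such as market intelligence analysis, public opinion analysis, and business intelligence, and has very important,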
Article ID: 1363747
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1363747.html