Research on Ontology-Based Feature Extraction from Money Laundering Cases
Published: 2018-11-09 10:39
[Abstract]: The feature values of money laundering cases are an important reference for identifying money laundering activity in the financial sector. In monitoring and screening based on case-based reasoning (CBR), the first task is to enter the feature values of case reports into the case base. Because money laundering case reports conceal information and are unstructured, this work is still done manually and struggles to meet requirements for efficiency and accuracy. To address this, the thesis proposes an ontology-based feature extraction method, and designs and implements the automatic acquisition of knowledge from text.
An ontology, as an explicit specification of a conceptualization, describes concepts that exist objectively and the relationships among them. Ontologies are usually built under the guidance of domain experts. In this work, a large number of money laundering case reports were analyzed and abstracted into a conceptual model, and the keywords that characterize money laundering were extracted as the classes of the ontology. The same principle was then used to define the subclasses of each class and the property relationships between subclasses and parent classes. Finally, instances were defined and constraints were added.
Feature extraction combines pattern matching with grammar definitions. Pattern matching determines where an index keyword occurs in the text vector; the grammar definition specifies the form in which the data to be extracted appears; and the data definition provides the reference standard for normalizing the data. The thesis also studies pattern matching algorithms in depth, analyzes the strengths, weaknesses, and complexity of each algorithm, and improves on the existing algorithms.
Finally, a prototype system was designed. The system is developed in Java and runs in B/S (browser/server) mode. It uses the open source tools Protégé 3.1 for ontology editing and Jena 2.4 for ontology parsing. The experimental input comes from officially provided samples of money laundering case reports, and the output is structured data that can be stored in a relational database.
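The extraction step described in the abstract (locate an index keyword, apply a grammar to the text that follows it, then normalize the value) can be illustrated with a minimal sketch. This is not the thesis's algorithm or its Java implementation: Python is used here only for brevity, and the keywords, regular-expression grammars, `SCALE` table, and function names are all hypothetical choices made for the illustration.

```python
import re

# Hypothetical feature table: index keyword -> (feature name, grammar).
# Each grammar is a regex describing how the value may appear after the keyword.
FEATURES = {
    "amount": ("amount", re.compile(r"\s*:?\s*(\d[\d,]*(?:\.\d+)?)\s*(thousand|million)?")),
    "date": ("date", re.compile(r"\s*:?\s*(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})")),
}

# Hypothetical data definition used to normalize amounts to a single unit.
SCALE = {None: 1, "thousand": 1_000, "million": 1_000_000}


def normalize(feature, match):
    """Convert a grammar match into a normalized value for the given feature."""
    if feature == "amount":
        return float(match.group(1).replace(",", "")) * SCALE[match.group(2)]
    if feature == "date":
        year, month, day = (int(g) for g in match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    raise ValueError(f"unknown feature: {feature}")


def extract_features(text):
    """Return a dict of normalized features found in an unstructured report."""
    record = {}
    lowered = text.lower()
    for keyword, (feature, grammar) in FEATURES.items():
        # Pattern matching: find where the index keyword occurs in the text.
        pos = lowered.find(keyword)
        while pos != -1 and feature not in record:
            # Grammar: the value must appear directly after the keyword.
            m = grammar.match(lowered, pos + len(keyword))
            if m:
                record[feature] = normalize(feature, m)
            pos = lowered.find(keyword, pos + 1)
    return record


if __name__ == "__main__":
    print(extract_features("Amount: 1,200 thousand. Date: 2006/3/5."))
```

The resulting dictionary has a fixed set of keys, so it maps directly onto a row in a relational table, which matches the structured output the abstract describes. A real system would take its keywords from the ontology classes rather than from a hard-coded table.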
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2007
[Classification number]: TP399-C1;D917
Article ID: 2320162
Article link: http://sikaile.net/shekelunwen/gongan/2320162.html