Design and Implementation of the CAPTCHA Recognition Module in the "Tianyancha" Distributed Crawler System

Published: 2018-10-24 09:55
[Abstract]: "Tianyancha" is a tool platform that provides comprehensive enterprise information lookup and professional enterprise relationship mining. It can query an enterprise's business registration information, litigation, trademarks and patents, outbound investments, bidding and tendering, dishonesty records, abnormal operations, annual reports, recruitment, and news. It covers more than 80 million enterprises nationwide and is updated in sync with the website of the Administration for Industry and Commerce. By crawling publicly available information on the Internet, the "Tianyancha" platform presents the relationships between entities visually, provides users with comprehensive and reliable enterprise data analysis, and helps them uncover hidden commercial relationships. It suits professionals in finance, investment, law, journalism, and business who need to keep up with a company's operating status and gain insight into its business information. However, crawling public information on the Internet runs into many types of CAPTCHA, such as filling in idioms, Chinese pinyin, arithmetic problems, and English letters and digits. Manual recognition and traditional recognition techniques cannot keep up with large-scale data crawling. An efficient CAPTCHA recognition system is therefore needed to speed up information acquisition and to safeguard future data mining. The thesis topic comes from a real application project at the company. Based on an analysis of the CAPTCHA recognition requirements of the "Tianyancha" product, the thesis designs and implements a deep-learning-based CAPTCHA recognition system.
The specific work completed in the thesis includes: requirements analysis of the CAPTCHA recognition system; design of the technical architecture; decomposition of the system's functions into three relatively independent parts, namely a deep-learning-based CAPTCHA training subsystem, a CAPTCHA recognition service subsystem, and a crawler application subsystem, each taken through outline design, detailed design, and implementation; a matching architecture upgrade design for the existing Spring and Redis technical architecture; and system functional testing. The results have been successfully applied in the production environment of the "Tianyancha" platform, where the CAPTCHA recognition rate is high and crawling efficiency has improved greatly. The software produced in the thesis has also been granted a software copyright. The successful application of this work shows that machine learning, and deep learning in particular, has great application prospects in CAPTCHA recognition and merits further study.
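The three-part decomposition described in the abstract might be pictured with the minimal Python sketch below. The abstract gives no interfaces, so every name here (`Captcha`, `RecognitionService`, `train_arithmetic_recognizer`, `crawl`) is a hypothetical stand-in, and the "trained model" is a plain text-parsing stub rather than a deep network. The only point it illustrates is how a crawler could delegate each CAPTCHA to a separate recognition service that routes by CAPTCHA type.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict

# All names below are illustrative assumptions, not the thesis's actual design.

@dataclass
class Captcha:
    kind: str     # e.g. "arithmetic", "alphanumeric", "idiom", "pinyin"
    payload: str  # stand-in for image bytes; a real system would pass an image


Recognizer = Callable[[str], str]


def train_arithmetic_recognizer() -> Recognizer:
    """Stand-in for the training subsystem.

    A real training subsystem would fit a deep network on labeled CAPTCHA
    images. This stub only evaluates text such as "3+4".
    """
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def recognize(payload: str) -> str:
        for symbol, fn in ops.items():
            if symbol in payload:
                left, right = payload.split(symbol, 1)
                return str(fn(int(left), int(right)))
        raise ValueError(f"unsupported arithmetic payload: {payload!r}")

    return recognize


class RecognitionService:
    """Stand-in for the recognition service subsystem: routes by CAPTCHA kind."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}

    def register(self, kind: str, recognizer: Recognizer) -> None:
        self._recognizers[kind] = recognizer

    def recognize(self, captcha: Captcha) -> str:
        if captcha.kind not in self._recognizers:
            raise KeyError(f"no recognizer registered for kind {captcha.kind!r}")
        return self._recognizers[captcha.kind](captcha.payload)


def crawl(pages: Dict[str, Captcha], service: RecognitionService) -> Dict[str, str]:
    """Stand-in for the crawler subsystem: solve each page's CAPTCHA via the service."""
    return {page: service.recognize(captcha) for page, captcha in pages.items()}


service = RecognitionService()
service.register("arithmetic", train_arithmetic_recognizer())
answers = crawl(
    {"page-1": Captcha("arithmetic", "3+4"), "page-2": Captcha("arithmetic", "9-2")},
    service,
)
```

Keeping the recognizers behind a registry is one way to let new CAPTCHA types be added without touching the crawler, which is in the spirit of the "relatively independent" subsystems the abstract mentions.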
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.52

Article ID: 2291052

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2291052.html