Design and Implementation of the CAPTCHA Recognition Module in the "天眼查" (Tianyancha) Distributed Crawler System
[Abstract]: Tianyancha ("Sky Eye Check") is a platform for comprehensive enterprise information lookup and enterprise relationship mining. Users can query a company's business registration information, legal proceedings, trademarks and patents, outbound investments, bidding records, breach-of-trust records, abnormal operation records, annual reports, recruitment postings, and news. The platform covers more than 80 million enterprises nationwide and is updated in step with the website of the Bureau of Industry and Commerce. By crawling public information on the Internet, it presents the relationships between entities visually, provides users with comprehensive and reliable enterprise data analysis, and helps them discover hidden commercial interests. It is aimed at people in finance and investment, lawyers, journalists, and business people who need a timely view of a company's operating status. Crawling public information, however, runs into many types of CAPTCHA, such as fill-in-the-idiom, Pinyin, arithmetic problems, and English letters and digits. Neither manual recognition nor traditional recognition techniques can keep up with large-scale data crawling. An efficient CAPTCHA recognition system is therefore needed to speed up information acquisition and to support later data mining. This thesis is based on a production project at the company. After analyzing the CAPTCHA recognition requirements of the Tianyancha product, it designs and implements a CAPTCHA recognition system based on deep learning.
The work completed in this thesis is as follows. The requirements of the CAPTCHA recognition system were analyzed and its technical framework was designed. The system's functions were decomposed into three relatively independent parts: a CAPTCHA training subsystem based on deep learning, a CAPTCHA service subsystem, and a crawler application subsystem. The outline design, detailed design, and implementation of all three parts were completed, together with an upgrade of the original architecture built on Spring and Redis, and the system passed functional testing. The results have been deployed in production on the Tianyancha platform, where the CAPTCHA recognition rate is high and crawling efficiency has improved greatly. A software copyright application for the software described in the thesis was also successful. This deployment shows that machine learning, and deep learning in particular, has strong prospects in CAPTCHA recognition and merits further exploration.
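The split between the CAPTCHA service subsystem and the crawler application subsystem can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the names `Page`, `CaptchaService`, `crawl`, `fetch`, and `submit` are hypothetical, and the recognizer functions stand in for whatever models the training subsystem produces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical type: a recognizer maps raw image bytes to the recognized answer.
Recognizer = Callable[[bytes], str]


@dataclass
class Page:
    """Hypothetical crawl result; captcha_type is None when no challenge was shown."""
    body: str
    captcha_type: Optional[str] = None
    captcha_image: bytes = b""


@dataclass
class CaptchaService:
    """Stand-in for the CAPTCHA service subsystem: routes an image to a recognizer by type."""
    recognizers: Dict[str, Recognizer]

    def solve(self, captcha_type: str, image: bytes) -> str:
        if captcha_type not in self.recognizers:
            raise KeyError(f"no recognizer registered for {captcha_type!r}")
        return self.recognizers[captcha_type](image)


def crawl(url: str,
          fetch: Callable[[str], Page],
          submit: Callable[[str, str], Page],
          service: CaptchaService,
          max_attempts: int = 3) -> Page:
    """Stand-in for the crawler side: fetch a page, answering CAPTCHAs as they appear."""
    page = fetch(url)
    for _ in range(max_attempts):
        if page.captcha_type is None:
            return page  # no challenge, the content is usable
        answer = service.solve(page.captcha_type, page.captcha_image)
        page = submit(url, answer)  # assumed to return the next page or a new challenge
    if page.captcha_type is None:
        return page
    raise RuntimeError(f"still challenged after {max_attempts} attempts: {url}")
```

Keeping recognition behind a single `solve` call means the crawler does not need to know which model handles which CAPTCHA type, which is one plausible reason for separating the two subsystems.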
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.52
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2291052.html