Design and Implementation of a Machine-Learning-Based User Feedback Data Center
Published: 2018-06-05 16:37
Topics: user feedback + text classification. Source: Beijing Jiaotong University, master's thesis, 2017
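[Abstract]: This project comes from a real project in Baidu's Duer (度秘) product line and belongs to the field of Internet artificial intelligence. Duer is an outstanding representative of the new generation of intelligent operating systems: built on NLP (Natural Language Processing) technology, it identifies what users need and provides the corresponding services. The product line receives user reviews and feedback on the order of a hundred thousand items per day, a very large volume of data. Classifying and filtering this feedback surfaces users' problems with, and suggestions about, the current product experience. This directly shows the defects in the current release and the parts most in need of optimization, which guides iteration requirements and gives quality assurance staff a basis for tracking production issues. The key problem is therefore classifying and filtering this large volume of feedback text, yet the current approach is for staff to manually export part of the data from the online database and manually sort it to pick out useful feedback. This thesis applies machine learning to design and implement a user feedback data center platform. Once feedback is imported into the platform, the large volume of feedback text is classified and filtered efficiently and accurately, then displayed and counted by category, so that relevant staff can review it and follow up on root-cause analysis and resolution of the reported problems. The platform has three parts: feedback data pulling, feedback classification and filtering, and the user feedback data center. Data pulling uses polling APIs (Application Programming Interfaces) written in Python to pull all of this product line's feedback from the company's unified user feedback platform, reorganize the data format as needed, and store it in HBase. Classification and filtering use the genetic algorithm and related machine learning algorithms to extract feature words, optimize the classification, and classify and filter the data according to those feature words. The data center, built on PHP and MySQL, provides categorized display, conditional queries, and feedback issue tracking. The thesis covers the requirements analysis, overall design, detailed design, and testing and verification of the platform. The author took part in designing and developing the feedback data pulling, the machine-learning-based feedback classification and filtering, and related functions of the data platform. The platform is now in production, with a data classification qualified rate above 91%. The user feedback data center has greatly improved the efficiency of feedback processing, freed up manpower previously spent on handling data, and was well received by department leaders and colleagues.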
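The abstract says only that a Python polling API pulls this product line's feedback from the company platform, reorganizes the format, and stores it in HBase. The platform's API, record fields, and HBase schema are not described. The following is therefore a minimal sketch under explicit assumptions, not the author's implementation. The sketch assumes the following, all of which are hypothetical:

- A callable `fetch_page(since_id)` returns the next page of records whose `id` exceeds `since_id`.
- Records carry the fields `product`, `created_at`, `text`, and `channel`.
- A callable `put(row_key, columns)` writes one row.
- Rows use a `product|created_at|id` row key with `raw:` and `meta:` column families.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical raw record shape returned by the feedback platform's API.
Record = Dict[str, object]
# Assumed contract: fetch_page(since_id) -> records with id > since_id (one page).
FetchPage = Callable[[int], List[Record]]
# Assumed contract: put(row_key, {"family:qualifier": value}) stores one row.
Put = Callable[[str, Dict[str, str]], None]


def to_hbase_row(rec: Record) -> Tuple[str, Dict[str, str]]:
    """Reorganize one raw record into an HBase-style (row key, column map) pair.
    The row-key layout and column families are illustrative assumptions."""
    row_key = f"{rec['product']}|{int(rec['created_at']):010d}|{rec['id']}"
    columns = {
        "raw:text": str(rec.get("text", "")),
        "raw:channel": str(rec.get("channel", "")),
        "meta:created_at": str(rec["created_at"]),
    }
    return row_key, columns


def poll_once(fetch_page: FetchPage, put: Put, product_line: str, since_id: int) -> int:
    """Drain every page newer than `since_id` and return the new high-water mark."""
    while True:
        records = fetch_page(since_id)
        if not records:
            return since_id
        for rec in records:
            if rec.get("product") == product_line:  # keep only this product line
                put(*to_hbase_row(rec))
        newest = max(int(r["id"]) for r in records)
        if newest <= since_id:  # guard against an API that does not advance
            raise ValueError("fetch_page returned no records newer than since_id")
        since_id = newest  # advance past everything seen, other products included


def run_poller(fetch_page: FetchPage, put: Put, product_line: str, since_id: int = 0,
               interval_s: float = 60.0, rounds: Optional[int] = None) -> int:
    """Poll forever (or `rounds` times), sleeping `interval_s` seconds between rounds."""
    done = 0
    while rounds is None or done < rounds:
        since_id = poll_once(fetch_page, put, product_line, since_id)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
    return since_id
```

Keeping `fetch_page` and `put` as parameters isolates the two unknowns: the platform's HTTP API and the HBase client. In production, `put` could be a thin wrapper around a client such as happybase's `Table.put`. Returning the high-water mark lets the poller resume where it stopped without re-pulling old feedback.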
[Abstract]: This project derives from a real project in Baidu's Duer product line and belongs to the field of Internet artificial intelligence. Duer is an outstanding representative of the new generation of intelligent operating systems. Based on NLP (Natural Language Processing) technology, it identifies users' needs and provides the corresponding services. This product line receives on the order of 100,000 comments and feedback items every day, so the data volume is very large. By classifying and filtering this feedback, we obtain users' problems and suggestions regarding the current product experience. These directly reflect the problems in the current version and the parts that need optimization, and so guide iteration requirements. They also give quality assurance staff a basis for tracking production issues. Classifying and filtering this large volume of feedback text is the key problem, but the current solution is to manually export part of the data from the online database and manually sort it to pick out useful feedback. In this paper, a user feedback data center platform is designed and implemented using machine learning. Feedback data imported into the platform is classified and filtered efficiently and accurately despite its volume, then presented and counted by category, making it easy for relevant staff to consult it and follow up on the causes and resolution of reported problems. The platform can be divided into three parts: pulling user feedback data, classifying and filtering feedback data, and the user feedback data center.
Of these, the data-pulling part uses polling APIs (Application Programming Interfaces) written in Python to pull all of the product line's feedback data from the company's unified user feedback platform, reorganize the data format as needed, and store it in HBase. The classification and filtering part uses the genetic algorithm and related machine learning algorithms to extract feature words, optimize the classification, and classify and filter the data according to those feature words. The data center, based on PHP and MySQL, implements categorized data display, conditional queries, feedback issue tracking, and other functions. This paper completes the requirements analysis, overall design, detailed design, and testing and verification of the user feedback data center platform. I participated in the design and development of user feedback data pulling, machine-learning-based feedback classification and filtering, and related functions of the data platform. The platform is now in production, and its data classification qualified rate exceeds 91%. The user feedback data center has greatly improved the efficiency of feedback processing, freed up manpower previously spent on handling data, and been well received by department leaders and colleagues.
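The abstract names a genetic algorithm for extracting feature words and then classifying feedback by those words. It does not give the encoding, fitness function, or classifier. The following is a minimal sketch of GA-based feature-word selection under stated assumptions, not the thesis's actual method. The sketch assumes the following, all of which are hypothetical, as are all function names and parameter values:

- Feedback is already word-segmented, as Chinese text would need.
- The task is a binary useful/not-useful filter.
- A chromosome is a 0/1 mask over candidate words.
- Fitness is the training accuracy of the rule "flag as useful if any selected word occurs", minus a tiny per-word penalty.

```python
import random
from typing import List, Sequence, Set, Tuple

# One labeled training example: (word-segmented feedback tokens, is_useful).
Sample = Tuple[Sequence[str], bool]


def classify(tokens: Sequence[str], feature_words: Set[str]) -> bool:
    """Flag feedback as useful if it contains at least one selected feature word."""
    return any(t in feature_words for t in tokens)


def decode(mask: Sequence[int], vocab: Sequence[str]) -> Set[str]:
    """Turn a 0/1 chromosome into the set of feature words it selects."""
    return {w for w, bit in zip(vocab, mask) if bit}


def fitness(mask: Sequence[int], vocab: Sequence[str], samples: Sequence[Sample]) -> float:
    """Training accuracy of the keyword rule, minus a tiny penalty per selected word
    so that, among equally accurate word lists, the shorter one wins."""
    words = decode(mask, vocab)
    correct = sum(classify(tokens, words) == label for tokens, label in samples)
    return correct / len(samples) - 0.001 * len(words)


def select_feature_words(vocab: Sequence[str], samples: Sequence[Sample],
                         pop_size: int = 30, generations: int = 40,
                         mutation_rate: float = 0.05, tournament: int = 3,
                         seed: int = 0) -> List[str]:
    """Genetic search over bit masks of `vocab`; returns the best word list found."""
    rng = random.Random(seed)
    n = len(vocab)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask: Sequence[int]) -> float:
        return fitness(mask, vocab, samples)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = [ranked[0], ranked[1]]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: each parent is the best of `tournament` random picks.
            p1 = max(rng.sample(ranked, tournament), key=score)
            p2 = max(rng.sample(ranked, tournament), key=score)
            # Single-point crossover, then independent per-bit mutation.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in p1[:cut] + p2[cut:]]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=score)
    return [w for w, bit in zip(vocab, best) if bit]


if __name__ == "__main__":
    # Toy, made-up training data purely for illustration.
    vocab = ["crash", "freeze", "suggest", "hello", "thanks", "weather"]
    samples = [
        (["app", "crash", "again"], True), (["voice", "freeze", "often"], True),
        (["suggest", "dark", "mode"], True), (["crash", "on", "startup"], True),
        (["hello", "there"], False), (["thanks", "a", "lot"], False),
        (["weather", "today"], False), (["hello", "thanks"], False),
    ]
    words = select_feature_words(vocab, samples)
    print("selected feature words:", words)
    print("new feedback useful?", classify(["music", "freeze"], set(words)))
```

Elitism keeps the best fitness from decreasing across generations. The size penalty is far smaller than one sample's worth of accuracy, so it only breaks ties between equally accurate word lists. A multi-category version would evolve one word list per category, or feed the selected words into a conventional text classifier.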
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP311.52; TP181
Article ID: 1982651
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1982651.html