Tag Extraction and Integration Based on APP Data
Published: 2018-01-20 10:46
Keywords: APP data; tag extraction; tag integration; tag system. Source: Zhejiang University, 2017 master's thesis. Document type: degree thesis
[Abstract]: In recent years, with the rapid growth of the mobile Internet, the number of mobile applications (APPs) has exploded. Helping users find suitable APPs among this mass is an urgent problem for download platforms. The traditional popularity-ranking approach has obvious drawbacks: popular APPs compete for a limited number of exposure slots, while a large number of long-tail APPs never get shown, which harms the construction of the APP ecosystem. In the Web 2.0 era, tag systems have been an effective way to address the long tail of resource objects and to help users manage and retrieve them. Tag systems have rarely been studied in the APP domain, and representative foreign download platforms such as the App Store and Google Play do not yet offer a tag feature. As the number of APPs keeps growing, however, using a tag system to solve the APP long-tail problem is becoming imperative. This thesis explores this new area and studies how to build an APP-domain tag system efficiently, automatically, and accurately. The work consists of four parts:
· Data preprocessing: new word discovery and stop word list construction were carried out specifically for APP data. Experiments on the 360 APP dataset show that preprocessing significantly improves tag quality.
· Tag extraction: common keyword extraction and tag recommendation algorithms are surveyed. Because APP data contains text along multiple dimensions, three improved tag extraction algorithms are proposed (SemanticRank, RankScore1, and RankScore2). They are better suited to the characteristics of APP datasets and produce better tags.
· Tag integration: drawing on ideas from knowledge graphs, multiple external data sources are used to build synonym relations and partial-order relations for integrating tags. This substantially improves APP coverage and recall and resolves the problem of messy, non-standard tags. In addition, an APP tag tree is proposed for managing APP tags, which effectively addresses the multi-dimensional nature of APP tags.
· Tag management system: an APP tag management system is implemented. It incorporates the results of the preprocessing, tag extraction, and tag integration work, and provides a friendly interactive interface and a visualization interface so that tag administrators can maintain and manage tags.
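The tag integration step can be illustrated with a minimal sketch. It is not the thesis's implementation: the synonym table, the parent (partial-order) table, and the function names `canonical`, `with_ancestors`, and `integrate` are all hypothetical, and the tables are hand-written here rather than built from external data sources. It only shows why mapping variants to one canonical tag and propagating each tag to its broader tags increases the number of APPs reachable from a tag.

```python
from collections import defaultdict

# Hypothetical synonym table: variant tag -> canonical tag.
SYNONYMS = {"pics": "photo", "photos": "photo", "img editor": "photo editing"}

# Hypothetical partial order: tag -> its broader (parent) tag.
PARENT = {"photo editing": "photo", "photo": "media"}


def canonical(tag):
    """Normalize a raw tag and map it to its canonical form."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)


def with_ancestors(tag):
    """Return the tag followed by every broader tag above it."""
    chain = [tag]
    seen = {tag}
    while chain[-1] in PARENT:
        parent = PARENT[chain[-1]]
        if parent in seen:  # guard against an accidental cycle
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def integrate(app_tags):
    """Turn {app: [raw tags]} into {canonical tag: set of apps}."""
    index = defaultdict(set)
    for app, tags in app_tags.items():
        for raw in tags:
            for tag in with_ancestors(canonical(raw)):
                index[tag].add(app)
    return dict(index)
```

For example, `integrate({"AppA": ["Pics"], "AppB": ["img editor"]})` files both APPs under `photo` and `media`, although neither raw tag used those words.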
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1;TP311.56
Article ID: 1447908
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1447908.html