Research on a Commodity Semantic Annotation Method Based on an E-commerce Domain Classification Tree and Crowdsourcing
Topic: e-commerce domain classification tree + semantic annotation. Reference: East China Normal University, master's thesis, 2017
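【Abstract】: With the rapid growth of the e-commerce industry and Internet technology, T2O, a new business model that combines video with e-commerce, has emerged. Commodity images that flash by in a video can be matched accurately against the product images in a commodity resource library by an image-matching algorithm, so that users can be given purchase links. If more semantic tags are attached to commodity resources when the library is built, users spend less time browsing product details, and products can be recommended according to their tag information. This thesis studies semantic annotation of commodity text resources. In existing work on semantic annotation of text, the annotated resources (such as documents and web pages) are mostly structured or long texts, and the methods rely on knowledge organization systems such as domain ontologies or knowledge bases. In e-commerce, however, there is no shared, general-purpose domain ontology, and commodity description texts are "fragmented" and lack contextual semantic information. To address this, the thesis uses an e-commerce domain classification tree as the knowledge organization system and proposes a word-vector-based commodity semantic annotation method that attaches semantic tags such as category and attributes to commodities. The main research content is as follows. First, commodity concepts, concept relations, and concept attributes are extracted from the product catalog of an online commodity resource library and from the attribute descriptions of large-scale commodity resources, and a commodity classification tree for the e-commerce domain is built. Second, Word2vec word vectors are trained on e-commerce data and used to extract semantic features from commodity description text. Third, the commodity concepts in the classification tree are treated as a known set of class labels and a word-vector-based commodity classifier is trained; commodities to be annotated are treated as data to be classified, and the classifier maps each one to a commodity concept in the tree, which gives its category. Based on the concept mapping, the concept attributes of the commodity are read from the tree; the similarity between the attribute in each attribute-value pair of the description text and the concept attributes is measured both lexically and semantically, and the attribute values are annotated accordingly. Finally, the classifier is trained iteratively by combining crowdsourcing with active learning, which raises classification accuracy and improves annotation quality. The main contributions are: 1. A commodity semantic annotation method based on an e-commerce domain classification tree and word vectors. The classification tree, used as the knowledge organization system, expresses the hierarchical relations of domain knowledge about as well as a domain ontology does, while being simpler to build and easier to understand. Word2vec word vectors generate semantic features for commodity descriptions, giving the descriptions explicit semantic information. Together, they make it possible to attach semantic tags such as category, attribute, and attribute value to commodity resources when a library is built. The method applies to building different commodity resource libraries and addresses the heterogeneity of commodity sources. 2. A method for improving annotation quality that combines crowdsourcing with active learning. It pairs the high accuracy of crowdsourced labeling with the speed of machine classification: an active-learning sampling strategy selects low-confidence machine classifications and hands them to crowd workers for labeling. Using a small amount of commodity data with known class labels and a large amount without, it iteratively trains a fairly accurate commodity classifier, improving classification quality while saving labeling cost.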
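The attribute-annotation step described above compares each attribute name found in a description with the concept attributes taken from the tree, both lexically and semantically. The following is a minimal sketch of that idea under stated assumptions: the toy vectors stand in for trained Word2vec embeddings, and the `difflib` ratio, the equal weighting `alpha`, the `threshold`, and every function name are illustrative choices, not the measures the thesis actually uses.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]

# Invented toy vectors standing in for trained Word2vec embeddings.
TOY_VECTORS: Dict[str, Vector] = {
    "color": [0.90, 0.10, 0.00],
    "colour": [0.88, 0.12, 0.02],
    "material": [0.10, 0.90, 0.10],
    "fabric": [0.15, 0.85, 0.20],
    "size": [0.00, 0.10, 0.95],
}


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed lexical measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_similarity(a: str, b: str, vectors: Dict[str, Vector]) -> float:
    """Cosine of the two word vectors; 0.0 when either word is unknown."""
    if a not in vectors or b not in vectors:
        return 0.0
    return cosine(vectors[a], vectors[b])


def match_attribute(raw: str, concept_attrs: List[str], vectors: Dict[str, Vector],
                    alpha: float = 0.5, threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return the best-matching concept attribute, or None below the threshold."""
    best, best_score = None, 0.0
    for candidate in concept_attrs:
        score = (alpha * lexical_similarity(raw, candidate)
                 + (1 - alpha) * semantic_similarity(raw, candidate, vectors))
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


def annotate(pairs: List[Tuple[str, str]], concept_attrs: List[str],
             vectors: Dict[str, Vector]) -> Dict[str, str]:
    """Map raw attribute-value pairs onto the concept's attributes."""
    result: Dict[str, str] = {}
    for raw_attr, value in pairs:
        matched, _ = match_attribute(raw_attr, concept_attrs, vectors)
        if matched is not None:
            result[matched] = value
    return result


pairs = [("fabric", "cotton"), ("colour", "red"), ("warranty", "1 year")]
print(annotate(pairs, ["color", "material", "size"], TOY_VECTORS))
```

In this toy run "fabric" matches "material" mainly through the semantic term, "colour" matches "color" through both terms, and "warranty" is left unannotated because no concept attribute scores above the threshold.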
[Abstract]: With the rapid growth of the e-commerce industry and Internet technology, T2O, a new business model that combines video with e-commerce, has emerged. Commodity images that flash by in a video can be matched accurately against the product images in a commodity resource library by an image-matching algorithm, so that users can be offered a purchase link. If more semantic tags are added to commodity resources when the resource library is built, users spend less time browsing product details, and products can be recommended according to their tags. This thesis focuses on semantic annotation of commodity text resources. In existing research on semantic annotation of text, the annotated resources (such as documents and web pages) are mostly structured or long texts, and the methods rely on knowledge organization systems such as domain ontologies or knowledge bases. In e-commerce, however, there is no shared, general-purpose domain ontology, and commodity description texts are "fragmented" and lack contextual semantic information. This thesis therefore takes an e-commerce domain classification tree as the knowledge organization system and proposes a word-vector-based semantic annotation method that adds category, attribute, and other semantic tags to commodities.
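A classification tree used as a knowledge organization system can be pictured as a hierarchy of concept nodes, each carrying its own attributes. The sketch below is only an illustration: the `Concept` class, the sample categories, and the rule that a concept inherits its ancestors' attributes are assumptions, not details given in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Concept:
    """One commodity concept, i.e. a node of the classification tree."""
    name: str
    attributes: List[str] = field(default_factory=list)
    # repr=False avoids infinite recursion through parent/child links.
    parent: Optional["Concept"] = field(default=None, repr=False)
    children: Dict[str, "Concept"] = field(default_factory=dict, repr=False)

    def add_child(self, name: str, attributes: Optional[List[str]] = None) -> "Concept":
        """Create a sub-concept and link it in both directions."""
        child = Concept(name, list(attributes or []), parent=self)
        self.children[name] = child
        return child

    def lineage(self) -> List["Concept"]:
        """Nodes from the root down to this concept."""
        nodes, node = [], self
        while node is not None:
            nodes.append(node)
            node = node.parent
        return nodes[::-1]

    def path(self) -> str:
        """Human-readable category path."""
        return " > ".join(n.name for n in self.lineage())

    def all_attributes(self) -> List[str]:
        """Own attributes plus inherited ones (inheritance is an assumption)."""
        merged: List[str] = []
        for node in self.lineage():
            for attr in node.attributes:
                if attr not in merged:
                    merged.append(attr)
        return merged


# Hypothetical sample hierarchy.
root = Concept("Goods", ["brand"])
clothing = root.add_child("Clothing", ["material", "size"])
dress = clothing.add_child("Dress", ["sleeve length"])

print(dress.path())
print(dress.all_attributes())
```

Once a classifier has mapped a commodity to a node such as `dress`, the node's path supplies the category tag and `all_attributes()` supplies the candidate attribute names to look for in the description.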
The main research content is as follows. First, commodity concepts, concept relations, and concept attributes are extracted from the product catalog of an online commodity resource library and from the attribute descriptions of large-scale commodity resources, and a commodity classification tree for the e-commerce domain is built. Second, Word2vec word vectors are trained on e-commerce data and used to extract the semantic features of commodity description text. Third, the commodity concepts in the classification tree are treated as a known set of class labels, and a word-vector-based commodity classifier is trained. Commodities to be annotated are treated as data to be classified, and the classifier maps each one to a commodity concept in the tree, which gives its category. Based on the concept mapping, the concept attributes of the commodity are read from the tree. The similarity between the attribute in each attribute-value pair of the description text and the concept attributes is measured both lexically and semantically, and the attribute values of the commodity are annotated accordingly. Finally, the commodity classifier is trained iteratively by combining crowdsourcing with active learning, which raises classification accuracy and improves the quality of the semantic annotation. The main contributions of this thesis are as follows: 1. A commodity semantic annotation method based on an e-commerce domain classification tree and word vectors. The classification tree, used as the knowledge organization system, expresses the hierarchical relations of domain knowledge about as well as a domain ontology does, while being simpler to build and easier to understand. Word2vec word vectors generate the semantic features of commodity descriptions, giving the descriptions explicit semantic information.
Combining the two makes it possible to add semantic tags such as category, attribute, and attribute value to commodity resources when a resource library is built. The method is suitable for building different commodity resource libraries and addresses the heterogeneity of commodity sources. 2. A method for improving the quality of commodity semantic annotation that combines crowdsourcing with active learning. It pairs the high accuracy of crowdsourced labeling with the speed of machine classification: an active-learning sampling strategy selects the low-confidence results of machine classification and hands them to crowd workers for labeling. Using a small amount of commodity data with known class labels and a large amount with unknown labels, it iteratively trains a fairly accurate commodity classifier, improving classification quality while saving labeling cost.
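The crowdsourcing and active-learning loop described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the nearest-centroid model stands in for the thesis's word-vector classifier, least-confidence sampling stands in for its unspecified sampling strategy, majority voting is one common way to aggregate crowd answers, and `simulated_crowd` replaces real crowd workers. Each item is a feature vector, for example an averaged word vector of a commodity description.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Nearest-centroid classifier over description vectors (stand-in model)."""

    def fit(self, X: List[Vector], y: List[str]) -> "CentroidClassifier":
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        # One mean vector per class label.
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict(self, x: Vector, temperature: float = 0.1) -> Tuple[str, float]:
        """Return (label, confidence); confidence is a softmax over cosines."""
        scores = {label: math.exp(cosine(x, c) / temperature)
                  for label, c in self.centroids.items()}
        best = max(scores, key=scores.get)
        return best, scores[best] / sum(scores.values())


def majority_vote(answers: List[str]) -> str:
    """Aggregate several crowd answers into one label."""
    return Counter(answers).most_common(1)[0][0]


def active_learning(seed_X: List[Vector], seed_y: List[str], pool: List[Vector],
                    ask_crowd: Callable[[Vector], List[str]],
                    rounds: int = 2, batch: int = 2):
    """Iteratively send the least-confident pool items to the crowd and retrain."""
    X, y, pool = list(seed_X), list(seed_y), list(pool)
    clf = CentroidClassifier().fit(X, y)
    for _ in range(rounds):
        if not pool:
            break
        # Least-confidence sampling: lowest classifier confidence first.
        ranked = sorted(range(len(pool)), key=lambda i: clf.predict(pool[i])[1])
        chosen = ranked[:batch]
        for i in chosen:
            X.append(pool[i])
            y.append(majority_vote(ask_crowd(pool[i])))
        pool = [x for i, x in enumerate(pool) if i not in set(chosen)]
        clf = CentroidClassifier().fit(X, y)
    return clf, pool


def simulated_crowd(x: Vector) -> List[str]:
    """Three hypothetical workers; the third always answers 'phone'."""
    truth = "dress" if x[0] > x[1] else "phone"
    return [truth, truth, "phone"]


seed_X, seed_y = [[1.0, 0.1], [0.1, 1.0]], ["dress", "phone"]
pool = [[0.6, 0.5], [0.45, 0.55], [0.9, 0.2], [0.2, 0.9], [0.55, 0.45], [0.3, 0.7]]
clf, remaining = active_learning(seed_X, seed_y, pool, simulated_crowd)
print(clf.predict([0.9, 0.2])[0], len(remaining))
```

Only the items the classifier is least sure about reach the crowd, so the labeling budget is `rounds * batch` items rather than the whole pool, which is the cost saving the abstract refers to.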
【Degree-granting institution】: East China Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP391.1; F724.6