
Research on Product Semantic Annotation Based on an E-commerce Domain Classification Tree and Crowdsourcing

Published: 2018-06-28 01:20

Topic: e-commerce domain classification tree + semantic annotation; Source: master's thesis, East China Normal University, 2017


[Abstract]: With the rapid growth of the e-commerce industry and Internet technology, a new business model, T2O, which combines video with e-commerce, has emerged. Product shots that flash by in a video can be matched accurately against product images in a product resource library by an image-matching algorithm, so that users can be offered a purchase link. If more semantic tags are added to product resources when the product resource library is built, users spend less time browsing product details, and products can be recommended to them according to their tag information. This thesis studies semantic annotation of product text resources. In existing research on semantic annotation of text resources, the annotated resources (such as documents and web pages) are mostly structured or long texts, and annotation depends on knowledge organization systems such as domain ontologies or knowledge bases. In e-commerce, however, no shared general-purpose domain ontology exists, and product description texts are fragmented and lack contextual semantic information. To address this, the thesis uses an e-commerce domain classification tree as the knowledge organization system and proposes a word-vector-based product semantic annotation method that adds semantic labels such as categories and attributes to products.
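A minimal sketch of how an e-commerce classification tree of the kind described above might be held in memory. The class names `ConceptNode` and `ClassificationTree`, building the tree from catalog paths, and the rule that a concept inherits its ancestors' attributes are illustrative assumptions, not details taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass(eq=False)
class ConceptNode:
    """A product concept (category) in the hypothetical classification tree."""
    name: str
    parent: Optional["ConceptNode"] = field(default=None, repr=False)
    children: Dict[str, "ConceptNode"] = field(default_factory=dict, repr=False)
    attributes: Set[str] = field(default_factory=set)  # concept attributes, e.g. "brand"


class ClassificationTree:
    """Hypothetical in-memory classification tree.

    Assumes concept names are unique, so they double as lookup keys.
    """

    def __init__(self) -> None:
        self.root = ConceptNode("ROOT")
        self.index: Dict[str, ConceptNode] = {}

    def add_path(self, path: List[str]) -> ConceptNode:
        """Insert a catalog path such as ["Clothing", "Women", "Dresses"];
        each consecutive pair becomes a parent-child concept relation."""
        node = self.root
        for name in path:
            if name not in node.children:
                child = ConceptNode(name, parent=node)
                node.children[name] = child
                self.index[name] = child
            node = node.children[name]
        return node

    def add_attributes(self, concept: str, attrs: Iterable[str]) -> None:
        """Attach attribute names harvested from product attribute descriptions."""
        self.index[concept].attributes.update(attrs)

    def concept_attributes(self, concept: str) -> Set[str]:
        """Attributes of a concept plus those inherited from its ancestors (assumed rule)."""
        attrs: Set[str] = set()
        node = self.index.get(concept)
        while node is not None:
            attrs |= node.attributes
            node = node.parent
        return attrs


tree = ClassificationTree()
tree.add_path(["Clothing", "Women", "Dresses"])
tree.add_attributes("Clothing", {"brand", "material"})
tree.add_attributes("Dresses", {"skirt length"})
print(sorted(tree.concept_attributes("Dresses")))  # ['brand', 'material', 'skirt length']
```

Walking up the parent chain means an attribute declared once on a broad concept such as "Clothing" is available to every leaf beneath it, which keeps the attribute lists on leaf concepts short.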
The main research contents are as follows. First, a product classification tree for the e-commerce domain is built by extracting product concepts, concept relations, and concept attributes from the product catalogs of online product resource libraries and from the attribute descriptions of large numbers of products. Second, Word2vec word vectors are trained on e-commerce text and used to extract semantic features from product description texts. Third, the product concepts in the classification tree are treated as a known set of class labels and a word-vector-based product classifier is trained; each product to be annotated is treated as an instance to be classified, and the classifier maps it to a product concept in the tree, which annotates its category. Based on this concept mapping, the product's concept attributes are retrieved from the classification tree, and the similarity between the attribute in each attribute-value pair of the product description and the concept attributes is measured in terms of both word form and semantics, which annotates the product's attribute values. Finally, the product classifier is trained iteratively by combining crowdsourcing with active learning, which raises classification accuracy and improves the quality of the semantic annotation.

The main contributions of this thesis are as follows:

1. A product semantic annotation method based on an e-commerce domain classification tree and word vectors. Using the domain classification tree as the knowledge organization system expresses the hierarchical relationships of domain knowledge as well as a domain ontology does, while being simpler to build and easier to understand than an ontology. Word2vec word vectors generate semantic features for product descriptions, giving the descriptions explicit semantic information. Combining the two makes it possible to add semantic labels such as categories, attributes, and attribute values to product resources while a product resource library is being built. The method applies to building different product resource libraries and addresses the heterogeneity of product sources.

2. A method for improving the quality of product semantic annotation that combines crowdsourcing with active learning. It pairs the high accuracy of crowdsourced annotation with the speed of machine classification: an active-learning sampling strategy selects the machine classification results with low confidence and sends them to crowd workers for annotation. Using a small amount of product data with known class labels and a large amount of unlabeled product data, it iteratively trains a more accurate product classifier, improving classification quality while reducing annotation cost.
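The word-vector features and the concept-mapping classifier could be sketched as follows. The abstract says only that a word-vector-based classifier maps each product to a concept in the tree; the mean pooling of word vectors and the nearest-centroid classifier below are stand-ins chosen for brevity, and `TOY_VECTORS` is made-up data. In practice the vectors would come from a Word2vec model trained on e-commerce text (gensim's `Word2Vec` class is one way to train one), and Chinese product titles would first need word segmentation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Made-up 3-d vectors standing in for a Word2vec model trained on e-commerce text.
TOY_VECTORS: Dict[str, Vector] = {
    "silk": [0.1, 0.9, 0.1], "dress": [0.0, 1.0, 0.2], "skirt": [0.1, 0.8, 0.0],
    "phone": [0.9, 0.0, 0.1], "charger": [1.0, 0.1, 0.0], "case": [0.7, 0.2, 0.3],
}


def mean_vector(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidConceptClassifier:
    """Nearest-centroid classifier with one centroid per tree concept (a stand-in)."""

    def __init__(self, vectors: Dict[str, Vector]) -> None:
        self.vectors = vectors
        self.centroids: Dict[str, Vector] = {}

    def fit(self, docs: List[Sequence[str]], concepts: List[str]) -> "CentroidConceptClassifier":
        groups: Dict[str, List[Vector]] = defaultdict(list)
        for tokens, concept in zip(docs, concepts):
            groups[concept].append(mean_vector(tokens, self.vectors))
        self.centroids = {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in groups.items()}
        return self

    def predict(self, tokens: Sequence[str]) -> Tuple[str, float]:
        """Return (best concept, gap between the top two cosine scores)."""
        x = mean_vector(tokens, self.vectors)
        ranked = sorted(((cosine(x, c), name) for name, c in self.centroids.items()), reverse=True)
        best_score, best = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        return best, best_score - runner_up


clf = CentroidConceptClassifier(TOY_VECTORS).fit(
    [["silk", "dress"], ["skirt"], ["phone", "case"], ["charger"]],
    ["Dresses", "Dresses", "Phone accessories", "Phone accessories"],
)
print(clf.predict(["red", "silk", "skirt"])[0])  # Dresses ("red" is out of vocabulary)
```

Returning the gap between the two best scores alongside the label gives a confidence signal, which an active-learning step can use to decide which mappings to send to crowd workers.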
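For attribute-value annotation, the abstract says the similarity between a raw attribute name and the concept attributes is measured on both word form and semantics. One hedged way to combine the two is a weighted sum of difflib's `SequenceMatcher.ratio()` (word form) and word-vector cosine similarity (semantics). The weight `alpha=0.5`, the threshold `0.6`, the helper names, and the toy vectors are assumptions for illustration only.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Optional, Tuple

Vector = List[float]

# Hypothetical word vectors; "fabric" and "material" are deliberately close.
TOY_VECTORS: Dict[str, Vector] = {
    "colour": [0.9, 0.1, 0.0],
    "color": [0.88, 0.12, 0.0],
    "fabric": [0.1, 0.9, 0.2],
    "material": [0.15, 0.85, 0.25],
    "brand": [0.0, 0.2, 0.95],
}


def form_similarity(a: str, b: str) -> float:
    """Word-form similarity in [0, 1] from difflib's matching-block ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(a: str, b: str, vectors: Dict[str, Vector]) -> float:
    """Cosine similarity of word vectors; 0.0 if either word is out of vocabulary."""
    va, vb = vectors.get(a.lower()), vectors.get(b.lower())
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def match_attribute(raw_attr: str, concept_attrs: Iterable[str], vectors: Dict[str, Vector],
                    alpha: float = 0.5, threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Map a raw attribute name to the best-scoring concept attribute, or None."""
    best, best_score = None, 0.0
    for attr in sorted(concept_attrs):
        score = (alpha * form_similarity(raw_attr, attr)
                 + (1 - alpha) * semantic_similarity(raw_attr, attr, vectors))
        if score > best_score:
            best, best_score = attr, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


def annotate_values(pairs: Dict[str, str], concept_attrs: Iterable[str],
                    vectors: Dict[str, Vector]) -> Dict[str, str]:
    """Label each attribute value with the matched concept attribute; drop unmatched pairs."""
    attrs = list(concept_attrs)
    labels: Dict[str, str] = {}
    for raw_attr, value in pairs.items():
        attr, _ = match_attribute(raw_attr, attrs, vectors)
        if attr is not None:
            labels[attr] = value
    return labels


pairs = {"Colour": "red", "Fabric": "silk", "Warranty": "1 year"}
print(annotate_values(pairs, {"color", "material", "brand"}, TOY_VECTORS))
# {'color': 'red', 'material': 'silk'}
```

The two signals cover each other's gaps: "Colour" matches "color" mostly on spelling, while "Fabric" matches "material" only through the vectors.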
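The loop that combines crowdsourcing with active learning could look like the following pool-based sketch. Margin-based uncertainty sampling and majority-vote aggregation of crowd answers are common choices assumed here; the abstract does not specify the exact sampling strategy or how crowd answers are aggregated. `toy_train`, `toy_crowd`, and the one-dimensional data are hypothetical stand-ins for the real classifier and crowdsourcing platform.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Label = str
Scores = Dict[Label, float]
Predictor = Callable[[Hashable], Scores]


def margin(scores: Scores) -> float:
    """Gap between the two highest class scores; a small gap means low confidence."""
    top = sorted(scores.values(), reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def majority_vote(votes: Sequence[Label]) -> Label:
    """Aggregate several workers' answers; on a tie, the label seen first wins."""
    return Counter(votes).most_common(1)[0][0]


def active_learning(
    labeled: Dict[Hashable, Label],
    unlabeled: List[Hashable],
    train: Callable[[Dict[Hashable, Label]], Predictor],
    ask_crowd: Callable[[Hashable], List[Label]],
    batch_size: int = 2,
    rounds: int = 3,
) -> Tuple[Predictor, Dict[Hashable, Label]]:
    """Pool-based active learning with margin (uncertainty) sampling.

    Each round trains on the labeled set, scores the unlabeled pool, sends the
    batch_size least-confident items to the crowd, and adds their
    majority-vote labels to the labeled set.
    """
    labeled = dict(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        predict = train(labeled)
        pool.sort(key=lambda item: margin(predict(item)))
        query, pool = pool[:batch_size], pool[batch_size:]
        for item in query:
            labeled[item] = majority_vote(ask_crowd(item))
    return train(labeled), labeled


# --- Hypothetical toy setup: items are numbers, the true class boundary is 0.5 ---
def toy_train(labeled: Dict[Hashable, Label]) -> Predictor:
    """Threshold classifier: cut halfway between the largest 'low' and smallest 'high'."""
    lows = [x for x, y in labeled.items() if y == "low"]
    highs = [x for x, y in labeled.items() if y == "high"]
    cut = (max(lows) + min(highs)) / 2
    return lambda x: {"low": cut - x, "high": x - cut}


def toy_crowd(x: float) -> List[Label]:
    """Simulated crowd: two workers answer correctly, one answers wrongly."""
    truth = "high" if x >= 0.5 else "low"
    wrong = "low" if truth == "high" else "high"
    return [truth, wrong, truth]


model, labels = active_learning(
    {0.0: "low", 1.0: "high"}, [0.1, 0.45, 0.55, 0.9], toy_train, toy_crowd, batch_size=2, rounds=1
)
print(sorted(labels.items()))  # [(0.0, 'low'), (0.45, 'low'), (0.55, 'high'), (1.0, 'high')]
```

Only the two items closest to the current decision boundary (0.45 and 0.55) are sent to the crowd, which is how this kind of loop saves annotation cost: confident predictions such as 0.1 and 0.9 are never paid for.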
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year of degree]: 2017
[CLC classification numbers]: TP391.1; F724.6



Article number: 2076078


Article link: http://sikaile.net/jingjilunwen/guojimaoyilunwen/2076078.html

