Research and Implementation of Product Attribute and Sentiment Orientation Recognition Techniques for E-commerce and Microblog Reviews
Published: 2018-07-17 00:13
[Abstract]: Now that e-commerce and social networks have spread worldwide, users who shop online also post their opinions about the products they buy. These opinions are user-centered and reflect the user's experience, including views on a product's features, functions, and performance. However, as the number of online reviews grows rapidly and their content becomes more complex, it is difficult to extract useful information from them by hand. There is therefore an urgent need for technology that automatically collects user reviews, analyzes which product attributes users evaluate, and identifies their opinions. Sentiment mining and analysis of product reviews emerged and developed rapidly against this background.

Mining product features, mining users' main opinions about those features, and determining the sentiment orientation of those opinions are the three core problems of sentiment mining for product reviews, and this thesis studies all three in depth. It also takes the nature of online reviews into account: users often ignore grammatical rules when posting, sentence structure is incomplete, and the subject is frequently omitted. The thesis therefore focuses on implicit subject extraction, that is, identifying and extracting the real subject of a sentence that has no explicit one.

The work of this thesis covers three aspects:
(1) Identifying evaluation objects and opinion words. The POSEM algorithm is used to extract (evaluation object, opinion word) pairs. Because the grammar of online product reviews is loose and many sentences lack a complete subject-verb-object structure, an implicit subject extraction method is proposed, which improves the recall and accuracy of evaluation object and opinion word extraction.
(2) Determining the reviewer's attitude, that is, finding the opinion polarity that relates to a product attribute in the user's review. The opinion words are located first, and their polarity is then judged by combining adjective and adverb sentiment lexicons with a domain lexicon. This is necessary because the polarity of a word often depends on its context and on the specific domain: the same word may carry opposite sentiment polarity in different domains.
(3) Designing and implementing a product review analysis tool. Its functions include automatically extracting review text under specific tags in a web page's DOM tree, data preprocessing, subjective sentence extraction, evaluation object and opinion word extraction, and opinion word polarity classification.
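The lexicon-based polarity judgment described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the lexicon entries, the domain names, and the function `word_polarity` are hypothetical, and the rule that a domain entry overrides the general entry is an assumption made for this example.

```python
# Hypothetical general sentiment lexicon: opinion word -> polarity (+1 or -1).
GENERAL_LEXICON = {"good": 1, "poor": -1, "quiet": 1}

# Hypothetical domain lexicons. Assumption: an entry here overrides
# the general lexicon for reviews in that domain.
DOMAIN_LEXICONS = {
    "hotel": {"quiet": 1},     # a quiet room is usually praise
    "speaker": {"quiet": -1},  # a quiet loudspeaker is usually a complaint
}


def word_polarity(word, domain=None):
    """Return +1, -1, or 0 (unknown) for an opinion word.

    The domain lexicon is consulted first; the general lexicon is the
    fallback. Words found in neither lexicon get 0.
    """
    key = word.lower()
    domain_lexicon = DOMAIN_LEXICONS.get(domain, {})
    if key in domain_lexicon:
        return domain_lexicon[key]
    return GENERAL_LEXICON.get(key, 0)
```

For instance, `word_polarity("quiet", "speaker")` and `word_polarity("quiet", "hotel")` return opposite signs, which mirrors the point that one word can carry opposite polarity in different domains.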
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.1
Article ID: 2128177
Article link: http://sikaile.net/guanlilunwen/ydhl/2128177.html