
Research on Topic and Feature Extraction from Online Reviews Based on Word Vectors

Published: 2018-04-29 06:18

  Topics: online reviews + feature extraction; Source: master's thesis, University of Electronic Science and Technology of China, 2016


[Abstract]: Information technology and the Internet have transformed how people exchange information and have given rise to a new mode of transaction, e-commerce. As e-commerce has matured, people have become increasingly keen to buy goods and services online, and researchers studying consumer behavior have likewise shifted their attention from offline to online settings. E-commerce has been an active research area in recent years. The convenient, fast interaction brought by Web 2.0 lets users easily leave behavioral traces and publish their views and opinions online, and the rapid growth of the online shopping population has left e-commerce websites with large volumes of shopping data, including a great deal of unstructured review text. For consumers, these reviews support better purchasing decisions; for manufacturers, they reflect market feedback on their products and services. Compared with conventional survey methods such as questionnaires and interviews, online product review data is larger in volume and more direct.

Online reviews are written spontaneously and casually by consumers, so they tend to be loosely structured and short, and this sparsity makes the text hard to analyze. In addition, e-commerce websites carry a huge number of products, and the volume of reviews for them far exceeds what humans can read and judge. Scale and sparsity together make this research difficult.

Earlier work on online product reviews mostly analyzed the text at the document level, using features such as sentence structure, grammatical characteristics, and word frequency, or used probabilistic models to study topic features at the latent semantic level. These studies achieved some results, but in processing the text they ignored the semantic information of the sentence as a whole. With today's increased computing power, neural network language models account for text generation and semantic expression at the semantic level. This thesis uses a neural network to map online review text from the traditional document space into a high-dimensional semantic vector space and clusters the feature seed words mined from the reviews, which yields better results for topic and feature extraction from online reviews. In addition, to address the lack of ground truth for large datasets, the thesis uses an improved perplexity metric, grounded in the maximum entropy principle, to demonstrate the reliability of the proposed method. The improved perplexity metric can also be extended into a unified evaluation metric for clustering problems in big-data settings, which contributes to big-data research by offering a reasonable way to compare algorithms when ground truth is unavailable.
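
The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea of grouping feature seed words by their word vectors. It assumes a k-means-style loop with cosine similarity, which may differ from the thesis's actual algorithm. The function name `cluster_seed_words`, the toy words, and the 3-dimensional vectors are all hypothetical; real word vectors would come from a trained neural language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cluster_seed_words(embeddings, k, n_iter=20):
    """Group words into at most k clusters by cosine similarity (assumed method)."""
    words = list(embeddings)
    # Farthest-point initialisation keeps the sketch deterministic:
    # each new center is the word least similar to the centers chosen so far.
    centers = [embeddings[words[0]]]
    while len(centers) < k:
        farthest = min(
            words,
            key=lambda w: max(cosine(embeddings[w], c) for c in centers),
        )
        centers.append(embeddings[farthest])

    assignment = {}
    for _ in range(n_iter):
        # Assign every word to its most similar center.
        new_assignment = {
            w: max(range(k), key=lambda j: cosine(embeddings[w], centers[j]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Move each center to the mean of its members.
        for j in range(k):
            members = [embeddings[w] for w in words if assignment[w] == j]
            if members:
                centers[j] = mean_vector(members)

    clusters = [[w for w in words if assignment[w] == j] for j in range(k)]
    return [c for c in clusters if c]

# Hypothetical toy vectors standing in for learned word vectors.
TOY_EMBEDDINGS = {
    "battery": [0.9, 0.1, 0.0],
    "charge": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.0],
    "display": [0.2, 0.8, 0.1],
    "delivery": [0.0, 0.1, 0.9],
    "shipping": [0.1, 0.0, 0.8],
}

if __name__ == "__main__":
    for topic in cluster_seed_words(TOY_EMBEDDINGS, k=3):
        print(topic)
```

Each resulting cluster can be read as one candidate review topic, with its member words as that topic's features. Cosine similarity is used here because word-vector direction usually carries more meaning than vector length, but that choice is an assumption of this sketch.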
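[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1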



