Research on a Polysemy Discovery Method for Folksonomies Based on Community Analysis
Posted: 2019-06-14 20:29
【Abstract】: Folksonomies formed by social tagging systems are widely used in personalized recommendation and information retrieval. Social tagging systems owe much of their success to the freedom users have to annotate resources with any tags they like. This same lack of constraints, however, has long exposed social tagging systems and their folksonomies to semantic ambiguity, which hinders their further development. This thesis addresses one such ambiguity problem: polysemous tags in folksonomies.
Most existing studies focus on tags, resources, and the relations between them, and often ignore information that characterizes users. Yet users are the principal actors in a social tagging system, and how they understand a tag directly shapes the meaning the tag carries. Tag semantics should therefore be mined not only at the level of the user population as a whole but also at the level of individual users. Accordingly, this thesis partitions the folksonomy according to users' interests, analyzes how the context of the same tag differs across user communities, and identifies polysemous tags by comparing these differences.
The work has two parts. First, a relationship network is built from user interests, and a community detection algorithm is run on this network to discover user communities. Second, two metrics are proposed: semantic aggregation, which measures how semantically similar the tags within a context are, and semantic dispersion, which measures how much a tag's contexts differ across communities. With these two metrics, the differences in a tag's context between user communities can be compared quantitatively to decide whether the tag is polysemous. Experiments on the Delicious and MovieLens datasets compare the proposed method with a polysemy discovery algorithm based on overlapping clustering. The results show that the proposed method outperforms this baseline, with the advantage most pronounced on datasets containing many users with diverse interests.
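The abstract names the pipeline (user-interest network, community detection, semantic aggregation, semantic dispersion) but gives no formulas and does not name the community detection algorithm. The Python sketch below is one possible reading under explicitly hypothetical assumptions, not the thesis's method: users are linked when the cosine similarity of their tag-usage profiles reaches a threshold; deterministic label propagation stands in for the unspecified community detection algorithm; a tag's context in a community is the multiset of tags that community's members co-assign with it; semantic aggregation is the mean pairwise cosine similarity of the context's tags (each tag represented by its tag-resource counts); semantic dispersion is one minus the mean cosine similarity between the tag's contexts in different communities. All function names and thresholds (`threshold`, `d_min`, `a_min`) are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical data model: one post = (user, resource, set of tags assigned).
Post = tuple[str, str, frozenset[str]]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def interest_graph(posts: list[Post], threshold: float = 0.5) -> dict[str, set[str]]:
    """Link two users when their tag-usage profiles have cosine >= threshold (assumed rule)."""
    profiles: dict[str, Counter] = defaultdict(Counter)
    for user, _res, tags in posts:
        profiles[user].update(tags)
    graph = {u: set() for u in profiles}
    for u, v in combinations(sorted(profiles), 2):
        if cosine(profiles[u], profiles[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def label_propagation(graph: dict[str, set[str]], max_iter: int = 20) -> list[set[str]]:
    """Stand-in community detection: each node adopts its neighbours' most
    frequent label (ties -> smallest label), visiting nodes in sorted order."""
    labels = {u: u for u in graph}
    for _ in range(max_iter):
        changed = False
        for u in sorted(graph):
            if not graph[u]:
                continue  # isolated users stay in singleton communities
            counts = Counter(labels[v] for v in graph[u])
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[u]:
                labels[u], changed = new, True
        if not changed:
            break
    groups: dict[str, set[str]] = defaultdict(set)
    for u, lab in labels.items():
        groups[lab].add(u)
    return list(groups.values())

def tag_vectors(posts: list[Post]) -> dict[str, Counter]:
    """Represent each tag by how often it was assigned to each resource."""
    vecs: dict[str, Counter] = defaultdict(Counter)
    for _user, res, tags in posts:
        for t in tags:
            vecs[t][res] += 1
    return vecs

def tag_context(posts: list[Post], tag: str, members: set[str]) -> Counter:
    """Context of `tag` in one community: tags co-assigned with it by members."""
    ctx = Counter()
    for user, _res, tags in posts:
        if user in members and tag in tags:
            ctx.update(tags - {tag})
    return ctx

def semantic_aggregation(ctx: Counter, vecs: dict[str, Counter]) -> float:
    """Mean pairwise similarity of the tags in one context (1.0 if < 2 tags, by convention)."""
    tags = sorted(ctx)
    if len(tags) < 2:
        return 1.0
    sims = [cosine(vecs[a], vecs[b]) for a, b in combinations(tags, 2)]
    return sum(sims) / len(sims)

def semantic_dispersion(contexts: list[Counter]) -> float:
    """1 - mean cosine similarity between a tag's contexts in different communities."""
    pairs = list(combinations(contexts, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)

def is_polysemous(posts: list[Post], tag: str, communities: list[set[str]],
                  d_min: float = 0.5, a_min: float = 0.2) -> bool:
    """Flag `tag` when its per-community contexts are each coherent but mutually different."""
    vecs = tag_vectors(posts)
    contexts = [c for c in (tag_context(posts, tag, m) for m in communities) if c]
    if len(contexts) < 2:
        return False  # the tag appears in at most one community
    agg = sum(semantic_aggregation(c, vecs) for c in contexts) / len(contexts)
    return semantic_dispersion(contexts) >= d_min and agg >= a_min
```

On a toy log where one pair of users tags resources with python/programming/code and another pair with python/snake/reptile, the two pairs fall into separate communities, `python` gets two coherent but disjoint contexts and is flagged, while `code`, used in only one community, is not. Requiring both high dispersion and high aggregation is meant to separate genuine polysemy from tags whose contexts are merely noisy.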
【Degree-granting institution】: Dalian University of Technology
【Degree】: Master's
【Year awarded】: 2016
【CLC number】: TP391.1
【Document ID】: 2499669
【Author】: Cui Yi (崔一)