
Research on Chinese Product Query Classification Based on User Behavior and Semantic Expansion

Published: 2018-07-17 03:43
[Abstract]: Web query classification assigns a query to one or more predefined categories. Web queries are usually very short, so they rarely express the user's intent fully. Manually labeling query categories is too costly, which leaves training data scarce and makes Web query classification harder still. Current research on query classification generally takes two approaches: on the one hand, acquiring more training data automatically to improve classifier accuracy; on the other hand, expanding the query itself to enrich the feature information of the query text. Web query classification is an effective way to identify user query intent. It can be applied not only to Web search, where it improves search accuracy, but also to vertical search, product recommendation, advertising recommendation, and many other areas. This thesis studies Chinese product query classification, a special case of Web query intent classification. Product query classification was chosen as the research topic for two reasons. First, product queries are very important: as more and more people shop online, accurate product query classification makes things easier for customers, improves the user experience, and can bring great benefit to merchants. Second, ample data on product queries is available. The method presented here addresses product query classification and can also be applied to other query classification domains.

Using two methods, user click behavior and query similarity expansion, this thesis automatically acquires a large amount of training and test data from product search logs, which addresses the usual shortage of training data for Web query classification. To deal with product queries being too short, two different methods are used to expand them: one based on a search engine and one based on Chinese Wikipedia. The expansion method based on information returned by a search engine gives the better classification results, but it has to fetch and process search engine results online, so it is less efficient. Weighing these strengths and weaknesses, the thesis proposes a hybrid product query classification method. The original product query is first fed to an already trained classifier. If the classification confidence is above a threshold, that result is used directly; otherwise the query is expanded with the search engine method, and the expanded query is fed to the classifier to produce the final result. The confidence threshold is determined experimentally, and experiments show that this method obtains product query classification results accurately and efficiently. A combination of two classifiers is also used to further improve classification accuracy and efficiency. Finally, a hierarchical classification algorithm for product queries is implemented, and the hybrid classification algorithm is applied to the hierarchical classification, with good classification results.
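
The hybrid procedure described in the abstract (classify the original query, accept the result if the confidence is above a threshold, otherwise expand the query and classify again) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the keyword-count classifier, the `FAKE_SNIPPETS` table standing in for search engine results, the category names, and the threshold value 0.8 are all assumptions made up for the example.

```python
from collections import Counter
from typing import Callable, Dict, Set, Tuple

# Hypothetical keyword lists standing in for a trained classifier's features.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "phones": {"phone", "smartphone", "android", "sim"},
    "books": {"book", "novel", "paperback", "author"},
}

# Hypothetical snippets standing in for text returned by a live search engine.
FAKE_SNIPPETS: Dict[str, str] = {
    "galaxy s3": "samsung galaxy s3 android smartphone with sim slot",
}


def toy_classify(text: str) -> Tuple[str, float]:
    """Return (label, confidence), where confidence is the share of
    keyword hits that belong to the winning label."""
    tokens = text.lower().split()
    hits: Counter = Counter()
    for label, keywords in CATEGORY_KEYWORDS.items():
        hits[label] = sum(1 for token in tokens if token in keywords)
    total = sum(hits.values())
    if total == 0:
        return "unknown", 0.0
    label, count = hits.most_common(1)[0]
    return label, count / total


def toy_expand(query: str) -> str:
    """Append the (fake) search engine snippet for the query, if any."""
    return (query + " " + FAKE_SNIPPETS.get(query.lower(), "")).strip()


def hybrid_classify(
    query: str,
    classify: Callable[[str], Tuple[str, float]],
    expand: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; the thesis sets it by experiment
) -> Tuple[str, float, bool]:
    """Return (label, confidence, expanded), where `expanded` tells
    whether the costly expansion step was needed."""
    label, confidence = classify(query)
    if confidence > threshold:
        # Confident enough: skip the expansion step entirely.
        return label, confidence, False
    # Low confidence: enrich the short query, then classify again.
    label, confidence = classify(expand(query))
    return label, confidence, True


if __name__ == "__main__":
    for q in ("android phone", "galaxy s3"):
        print(q, "->", hybrid_classify(q, toy_classify, toy_expand))
```

The point of the design is that the expansion step, which in the thesis requires fetching search engine results online, runs only for queries the classifier is unsure about, so confident queries keep the speed of direct classification.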
[Degree-granting institution]: Sun Yat-sen University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3



