Temporal Keyword Query on Social Media Data
[Abstract]: Social media services have become some of the most frequently used Internet services in people's daily lives. They record original posts together with users' reposts and comments. As these data accumulate, the resulting long-span records become highly valuable for studying users' collective behavior and for building an overall understanding of people or events. To track an event, users frequently resubmit the same query to obtain the latest news about it. To understand an object thoroughly, analysts need to collect data from different periods. However, existing social media search services and research focus mainly on real-time search, and the publication time recorded with each post is used only to measure how fresh the data are. This thesis uses a social media data stream model to represent original posts, reposts, and comments, and defines a reference time series for each social object. On top of this model, a temporal keyword query uses keywords as the filter condition. It feeds each matching object's time series data within the query time range into a scoring function and returns the K social objects with the highest scores. Time is thereby promoted from a timeliness measure to a constraint of the query itself. The thesis explores two application scenarios, ad hoc query analysis and real-time tracking analysis, which correspond to an offline index and an online index respectively. It proposes index techniques and query algorithms for this query from two perspectives: the characteristics of social media data for the offline index, and index update efficiency for the online index. Finally, the thesis uses time series data to analyze how information propagation changed behind the rise and fall of Sina Weibo. It also builds an online microblog analysis platform on a real-time social media data stream. Together, these serve as application examples of temporal keyword query.
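The query definition in the abstract can be illustrated with a brute-force baseline. This is a minimal sketch, not the thesis's algorithm. The abstract does not fix the scoring function, so the sketch assumes the score is the total number of references an object receives inside the query window. It also assumes an object matches only if it contains every query keyword. The names `SocialObject`, `window_sum`, and `temporal_keyword_query` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class SocialObject:
    """A social object (e.g. a post) with its keywords and reference series.

    series[t] is the (assumed) number of reposts/comments received in time unit t.
    """
    oid: str
    keywords: set[str]
    series: dict[int, int] = field(default_factory=dict)


def window_sum(obj: SocialObject, t_start: int, t_end: int) -> int:
    """Assumed scoring function: total references inside [t_start, t_end]."""
    return sum(c for t, c in obj.series.items() if t_start <= t <= t_end)


def temporal_keyword_query(objects, keywords, t_start, t_end, k, score=window_sum):
    """Brute-force top-k: keep objects containing all keywords, score each
    one over the query window, and return the k best (score, id) pairs."""
    candidates = (
        (score(o, t_start, t_end), o.oid)
        for o in objects
        if keywords <= o.keywords  # keyword filter: subset test
    )
    return heapq.nlargest(k, candidates)
```

Consider three objects. Object "a" has references at units 1, 2, and 9, "b" only at unit 2, and "c" only at unit 3. The same keyword query then returns different winners for the window [1, 3] and for [1, 9]. An index-based algorithm avoids scoring every matching object, so a baseline like this is mainly useful as a correctness reference.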
The thesis is organized around the temporal keyword query problem. Its main contributions fall into three areas.

1. A dual index structure based on the characteristics of social media data, together with maximum approximation summaries. On the one hand, the reference trees of social objects follow a long-tail distribution in both size and lifespan. On the other hand, a social object is often popular during certain periods and receives little attention for the rest of a long time span. Based on these two characteristics, the thesis designs a dual inverted list structure that manages hot objects and ordinary objects with different index structures. Both structures support filtering data along the time dimension. Both return data in descending order of each social object's final reference tree size. The thesis derives an upper bound on the amount of data that the query algorithm must access when using this index. Statistical analysis on real datasets shows that in most cases this upper bound grows sublinearly with K. The thesis further proposes a piecewise maximum approximation summary, which predicts a tighter upper bound on each object's reference tree size within the query window. This avoids the disk accesses otherwise needed to read the actual values of hot objects during their non-hot periods.

2. A log-structured octree index for real-time temporal keyword queries. Another characteristic of social media data is the high speed at which user data are generated, a phenomenon that is especially pronounced during hot events. In an online indexing scenario, it is therefore important to index new data quickly and reflect them in query results. This improves the experience of ordinary users and provides timely data support for rapid decision-making.
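The pruning idea behind the piecewise maximum approximation summary in the first contribution can be sketched as follows, under stated assumptions. Each reference time series is cut into fixed-length segments, and each segment stores only its largest per-unit count. The bound for a query window is then each overlapping segment's maximum times the number of overlapping time units, which is valid because reference counts are non-negative. Objects are visited in decreasing bound order, and the exact (expensive, e.g. on-disk) score is fetched only while a bound can still beat the current K-th best score. The segmentation scheme, the function names, and the `exact` callback are hypothetical. This sketch does not reproduce the thesis's actual summary construction or its dual inverted lists.

```python
import heapq
from typing import Callable


def build_max_summary(series: dict[int, int], seg_len: int) -> dict[int, int]:
    """Piecewise maximum summary (assumed layout): segment id -> largest
    per-unit reference count observed in that fixed-length segment."""
    summary: dict[int, int] = {}
    for t, count in series.items():
        seg = t // seg_len
        summary[seg] = max(summary.get(seg, 0), count)
    return summary


def upper_bound(summary: dict[int, int], seg_len: int, t_start: int, t_end: int) -> int:
    """Bound the references inside [t_start, t_end]: each overlapping segment
    contributes its stored maximum times the number of overlapping units."""
    bound = 0
    for seg, seg_max in summary.items():
        lo = max(seg * seg_len, t_start)
        hi = min((seg + 1) * seg_len - 1, t_end)
        if lo <= hi:
            bound += seg_max * (hi - lo + 1)
    return bound


def top_k_with_bounds(bounds: dict[str, int], exact: Callable[[str], int], k: int):
    """Visit objects by decreasing upper bound and stop once no remaining
    bound can beat the current k-th best exact score. `exact` stands in for
    the expensive lookup (e.g. a disk read). Returns the top-k (score, id)
    pairs and the number of exact lookups performed."""
    if k <= 0:
        return [], 0
    best: list[tuple[int, str]] = []  # min-heap holding the best k so far
    lookups = 0
    for oid, ub in sorted(bounds.items(), key=lambda kv: kv[1], reverse=True):
        if len(best) == k and ub <= best[0][0]:
            break  # all later bounds are <= ub, so none can improve the result
        lookups += 1
        score = exact(oid)
        if len(best) < k:
            heapq.heappush(best, (score, oid))
        elif score > best[0][0]:
            heapq.heapreplace(best, (score, oid))
    return sorted(best, reverse=True), lookups
```

A looser but simpler bound is any value at least as large as the true window score. The tighter the bound, the earlier the loop can stop, which is the motivation the abstract gives for a piecewise rather than a single global maximum.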
To this end, the thesis applies a piecewise approximation to each social object's reference time series and maps each approximated segment to a point in three-dimensional space. It then uses an octree to preserve locality along the importance and time dimensions of social objects within the index. The encoding of octree nodes lets the index filter data along the time dimension. It also guarantees that data are returned in the order required by the temporal threshold algorithm. The octree is combined with a log-structured merge tree, which exploits fast memory access and efficient sequential disk reads and writes, so that social media data can be indexed rapidly.

3. Analysis of changes in information propagation behind the rise and fall of Sina Weibo, based on the full Weibo behavior dataset. In this analysis, temporal keyword queries improve the accuracy of the data extraction rules and help cover the relevant data more comprehensively. The thesis proposes a logarithmic Gaussian model for the repost time series of a single microblog post. By fitting the model parameters over a group of posts, it identifies a statistic related to the speed of information propagation. It further defines characteristics of user behavior on the Sina Weibo platform, together with external characteristics that reflect the attitude of Internet users as a whole toward social platforms. It analyzes the trends of these characteristics and explores their relationship with the statistics that reflect information dissemination.

Finally, the thesis integrates the techniques developed throughout into an online analysis platform for the real-time Sina Weibo data stream. The platform clusters the results of temporal keyword search into topics and displays preliminary statistics for each topic along several dimensions.
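The second contribution combines an octree with a log-structured merge tree. That combination can be illustrated with Morton (Z-order) codes, a standard way of linearizing an octree. Interleaving the bits of three quantized coordinates makes every octree node correspond to a contiguous range of codes. Runs sorted by code therefore keep points from the same node adjacent, and such runs can be written and merged sequentially. This is a toy sketch under several assumptions. Coordinates are quantized to 10-bit integers, and their meaning (e.g. segment time and importance) is left to the caller. `LogStructuredOctree`, `buffer_limit`, and `cell_items` are hypothetical names. The sketch does not reproduce the thesis's node encoding, which additionally guarantees the return order required by the temporal threshold algorithm.

```python
import bisect
import heapq

BITS = 10  # assumed quantization: each coordinate is an integer in [0, 1024)


def morton3(x: int, y: int, z: int, bits: int = BITS) -> int:
    """Interleave coordinate bits as ... z1 y1 x1 z0 y0 x0 (Z-order code)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def octree_cell(code: int, level: int, bits: int = BITS) -> int:
    """Id of the octree node at `level` (root = 0) that contains the point."""
    return code >> (3 * (bits - level))


class LogStructuredOctree:
    """Toy LSM-style store: an in-memory buffer flushed as code-sorted runs."""

    def __init__(self, buffer_limit: int = 4):
        self.buffer_limit = buffer_limit
        self.buffer: list[tuple[int, str]] = []  # unsorted recent inserts
        self.runs: list[list[tuple[int, str]]] = []  # immutable sorted runs

    def insert(self, x: int, y: int, z: int, payload: str) -> None:
        self.buffer.append((morton3(x, y, z), payload))
        if len(self.buffer) >= self.buffer_limit:
            self.runs.append(sorted(self.buffer))  # flushed as one sequential run
            self.buffer = []

    def compact(self) -> None:
        """Merge every run and the buffer into a single sorted run."""
        merged = list(heapq.merge(*self.runs, sorted(self.buffer)))
        self.runs, self.buffer = [merged], []

    def cell_items(self, cell: int, level: int) -> list[str]:
        """Payloads of all points inside octree node `cell` at `level`."""
        shift = 3 * (BITS - level)
        lo, hi = cell << shift, (cell + 1) << shift  # contiguous code range
        out: list[str] = []
        for run in self.runs + [sorted(self.buffer)]:
            i = bisect.bisect_left(run, (lo,))
            j = bisect.bisect_left(run, (hi,))
            out.extend(payload for _, payload in run[i:j])
        return out
```

Because a node's points occupy one contiguous code range in every sorted run, a node lookup is two binary searches per run rather than a full scan. Compaction reduces the number of runs a query has to consult.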
In summary, this thesis extends keyword search on social media data by proposing temporal keyword query. It studies index organization and query algorithms from two perspectives: the characteristics of social media data and index update efficiency. The two analysis applications built on this query adapt flexibly to a variety of application scenarios. They help users mine important information from social media data and provide a data foundation for more complex analysis tasks. The open-access system presented at the end of the thesis implements the indexing and analysis techniques described in the text, so that researchers and analysts in various fields can benefit from massive real-time social media data.
[Degree-Granting Institution]: East China Normal University
[Degree Level]: Doctorate
[Year Conferred]: 2016
[Classification Number]: TP391.3; TP393.09