Stochastic Scheduling for Non-standard Multi-armed Bandits

Published: 2018-03-18 09:21

  Topic: optimal stopping time. Angle: allowable stopping time. Source: East China Normal University, doctoral dissertation, 2016. Document type: degree thesis.


[Abstract]: The main aim of this dissertation is to extend the multi-armed bandit (MAB) stochastic scheduling model with index policies so that it better fits complex real-world settings: (1) the arms are subject to different switching restrictions; (2) the arms have different discount rates; (3) random machine breakdowns give rise to incomplete information. To that end, a second aim is to study optimal stopping problems with restrictions and nonparametric Bayesian methods, so that both apply to these non-standard MABs.

The optimal stopping problem is treated at the level of families of random variables, over a restricted class of stopping times, and general conclusions are obtained with classical probability theory. The theory covers the classical results for the discrete-time, continuous-time, and semi-Markov settings. It proceeds in roughly three stages. The first stage works with families of random variables with a single index: it introduces the notion of a class of allowable stopping times, sets up the restricted optimal stopping model, and discusses the properties of two kinds of value families and of optimal stopping times; it then establishes sufficient conditions for an optimal stopping time to exist and goes on to study the local properties and regularity of the value family. The second stage extends the problem to classes of admissible random variables with two indices and studies the properties of optimal double stopping times; the results extend naturally to the multi-index case. The third stage examines the accessible set from the first stage and proves that it can be decomposed into countably many stopping times.

In the continuous-time stochastic MAB model, mutually independent arms each have their own allowable stopping range, switching is possible only within that range, and the goal is to maximize the expected total discounted reward over an infinite horizon. First, the notion of a random set of allowable stopping points is introduced, and a process version of the general theory of optimal stopping with stopping restrictions is established. Next, following the ideas of El Karoui and Karatzas (1994), this theory is used to settle the relationship between the reward process of a single arm and its Gittins index process. Finally, the excursion method of Kaspi and Mandelbaum (1998) is used to prove the optimality of the Gittins index policy, with an argument that is also more concise than earlier proofs.

Still in the continuous-time stochastic MAB model, the switching requirements of the arms and varying discount rates are then considered together. Two forms of expected total discounted reward are adopted in turn. The restricted optimal stopping theory yields the corresponding index definitions, and the excursion method shows that one of the two indices gives an optimal policy while the other does not.

A Bayesian approach is used to turn the scheduling problem with random breakdowns into a scheduling problem with incomplete information. With expected discounted reward as the objective, the features of optimal index policies are discussed under static and dynamic policies, in particular the one-step reward rate under dynamic policies, in order to understand how different Bayesian frameworks affect the scheduling policy. Under static policies, the general framework and the parametric framework lead to broadly similar conclusions. For dynamic policies, two examples are analyzed to relate the one-step reward rate to the Bayesian framework and thereby illustrate how different Bayesian structures affect scheduling.
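
The abstract relies on the Gittins index of a single arm. As a rough illustration only, and not the dissertation's continuous-time construction with stopping restrictions, the sketch below computes the classical Gittins indices of one discrete-time, finite-state Markov arm using the largest-index-first recursion. The function names, the transition matrix `P`, the reward vector `r`, and the discount factor `beta` are all assumptions introduced here for illustration.

```python
def discounted_sums(P, r, beta, cont, tol=1e-12):
    """Expected discounted reward (num) and discounted time (den), per start
    state, when the arm is played once and then for as long as its state
    stays inside the continuation set `cont`. Assumes 0 <= beta < 1."""
    n = len(r)
    num, den = [float(x) for x in r], [1.0] * n
    while True:
        # Fixed-point iteration of num = r + beta * Q num, den = 1 + beta * Q den,
        # where Q keeps only the transitions that land in `cont`.
        new_num = [r[i] + beta * sum(P[i][j] * num[j] for j in cont) for i in range(n)]
        new_den = [1.0 + beta * sum(P[i][j] * den[j] for j in cont) for i in range(n)]
        gap = max(abs(a - b) for a, b in zip(new_num + new_den, num + den))
        num, den = new_num, new_den
        if gap < tol:
            return num, den


def gittins_indices(P, r, beta):
    """Hypothetical helper: Gittins index of every state of one Markov arm.
    States are ranked from highest index to lowest; each newly ranked state
    joins the continuation set used to rank the remaining ones."""
    n = len(r)
    index = [None] * n
    cont = []  # states already ranked, in decreasing index order
    for _ in range(n):
        num, den = discounted_sums(P, r, beta, cont)
        rest = [i for i in range(n) if i not in cont]
        best = max(rest, key=lambda i: num[i] / den[i])
        index[best] = num[best] / den[best]
        cont.append(best)
    return index


if __name__ == "__main__":
    # Toy arm (made up): state 0 pays 0 and moves to state 1, which pays 1 forever.
    print(gittins_indices([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], 0.5))
```

For the toy arm, state 1 should come out with an index close to 1 and state 0 close to 0.5, because the best plan from state 0 is never to stop. An index policy then always plays the arm whose current state has the largest index. The restricted setting of the dissertation would additionally confine the stopping rule to an allowable class, which this sketch does not model.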
[Degree-granting institution]: East China Normal University
[Degree level]: Doctorate
[Year conferred]: 2016
[Classification number]: O212.8; F224

Related doctoral dissertations (top 1)

1 Bao Wenqing (包文清); Stochastic Scheduling for Non-standard Multi-armed Bandits [D]; East China Normal University; 2016

Article ID: 1628980

Link to this article: http://sikaile.net/shoufeilunwen/jjglbs/1628980.html

