Reinforcement Learning Study of Pigeon Visual-Behavioral Decision-Making
Published: 2019-04-08 15:28
[Abstract]: Behavioral decision-making (cognitive execution) is a skill that humans, animals, and other agents need in order to survive under natural selection: an agent judges external information and uses that judgment to guide its choice of behavior. Vision is the main source of this external information, accounting for more than 80% of all perceptual information. In nature, most of the visual-behavioral decision-making that agents depend on for survival is acquired through learning after birth (reinforcement learning). Because of their powerful visual perception and a behavioral decision-making ability not inferior to that of mammals, pigeons have become a typical model animal in the field of visual cognition. Studying reinforcement learning in pigeon visual-behavioral decision-making is therefore important for revealing the cognitive mechanisms of agents during decision-making; it helps in understanding the brain mechanisms of intelligent decision-making behavior and deepens our understanding of how the brain makes cognitive decisions. Although research on pigeon visual-behavioral decision-making has made some progress, most of it focuses on reinforcement learning under static rules. The experimental paradigms are oversimplified and mostly use a fixed learning rate or a single reward matrix, so they cannot truly simulate an agent's decision-making mechanism under dynamic environmental rules. In addition, the role that neurons in the nidopallium caudolaterale (NCL) play during reinforcement learning remains unclear. This thesis therefore takes pigeons as experimental subjects, designs visual-behavioral decision-making paradigms with dynamic reinforcement rules, carries out behavioral training, and synchronously records the electrical signals of NCL neurons. From the perspectives of behavior and neuronal response, it analyzes the pigeons' decision-making characteristics and the response characteristics of NCL neurons during dynamic reinforcement learning.
The main work of this thesis is as follows. (1) Two visual-behavioral decision-making training paradigms under dynamic rules were designed: random reinforcement and reversal reinforcement. A hardware and software platform for behavioral training was built according to the planned experimental procedure, enabling automated training of pigeons based on specific reward and punishment information. The electrical signals of NCL neurons were recorded synchronously during reinforcement learning training and then preprocessed. (2) A new dynamic reinforcement learning model was proposed by modifying the learning rate and the reward matrix of the classical Q-Learning model. The model was used to analyze the pigeons' behavioral feedback data from the two training procedures and was compared with the classical Q-Learning model. The error in predicting behavior with the dynamic model was 46.98% and 30.55% lower, respectively, and the model's learning rate was found to reflect the pigeons' internal learning state at different training stages. (3) The response features of NCL neurons at different training stages were extracted and analyzed statistically. Valid trials were selected, a suitable response time window was chosen, and the firing rate within that window was computed as the neuronal response feature. The Mann-Whitney test was used to assess whether these response features differed significantly over the course of reinforcement learning. Some neurons (10/60) had response features that reflected the reward and punishment information in training, and some (21/60) had response features that carried information about the pigeons' learning state.
These results indicate that neurons in the NCL play different roles during reinforcement learning.
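The contrast drawn in contribution (2), between a fixed and a changing learning rate, can be illustrated with a small simulation. The sketch below is not the thesis's model: the abstract does not give the update rule, so a Pearce-Hall-style rule (the learning rate tracks the size of recent prediction errors) is swapped in as an assumption, and the reward-matrix modification is not modelled at all. The two-key task, the softmax policy, and the parameters `alpha0`, `eta`, and `beta` are likewise hypothetical choices made for illustration. Because each trial ends after one choice, the Q-learning target reduces to the immediate reward.

```python
import math
import random


def softmax(q_values, beta):
    """Choice probabilities under a softmax policy (beta = inverse temperature)."""
    peak = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - peak)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_update(q, reward, alpha):
    """Q-learning update for a terminal trial: Q <- Q + alpha * (r - Q)."""
    return q + alpha * (reward - q)


def simulate(n_trials=200, reversal_at=100, dynamic=True,
             alpha0=0.5, eta=0.3, beta=5.0, seed=0):
    """Simulate a two-key choice task whose rewarded key flips at `reversal_at`.

    dynamic=False keeps alpha fixed (classical Q-learning).
    dynamic=True applies the ASSUMED rule (not taken from the thesis):
        alpha <- (1 - eta) * alpha + eta * |r - Q|
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]          # one value per key
    alpha = alpha0
    rewarded_key = 0
    history = []            # (action, reward, alpha after the trial)
    for trial in range(n_trials):
        if trial == reversal_at:
            rewarded_key = 1 - rewarded_key  # reversal of the reinforcement rule
        p_key0 = softmax(q, beta)[0]
        action = 0 if rng.random() < p_key0 else 1
        reward = 1.0 if action == rewarded_key else 0.0
        error = reward - q[action]           # prediction error before the update
        q[action] = q_update(q[action], reward, alpha)
        if dynamic:
            alpha = (1 - eta) * alpha + eta * abs(error)
        history.append((action, reward, alpha))
    return q, history


if __name__ == "__main__":
    final_q, hist = simulate()
    print("final Q-values:", final_q)
    print("alpha just before and after reversal:", hist[99][2], hist[101][2])
```

Under this assumed rule the learning rate shrinks while outcomes are predictable and rises again once the rule reverses, which is one way a learning-rate trace could track an internal learning state.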
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q42
Article ID: 2454693
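[Similar literature]
Related master's theses (top 5)
1 陶夢妍; Reinforcement Learning Study of Pigeon Visual-Behavioral Decision-Making [D]; Zhengzhou University; 2017
2 陳雪美; Identification of Place Cells in the Pigeon Hippocampus and Analysis of Place-Field Distribution Characteristics [D]; Zhengzhou University; 2017
3 李珊; Construction of Spike Functional Networks and Decoding of Pigeon Turning Behavior [D]; Zhengzhou University; 2017
4 楊松領; Design and Implementation of a Pigeon Maze Training System [D]; Zhengzhou University; 2017
5 陳艷; Synchronization-Likelihood-Based Construction of Gamma Sub-band Functional Networks and Decoding of Pigeon Turning Behavior [D]; Zhengzhou University; 2017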
Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/2454693.html