A QAGKRL-Based Reinforcement-Learning Method for Online Neural Decoding
發(fā)布時間:2018-03-11 18:44
Topic: brain-computer interface  Focus: neural decoding  Source: Zhejiang University, 2017 master's thesis  Document type: degree thesis
【Abstract】: A brain-computer interface (BCI) decodes the brain's neural activity into control signals for external devices, enabling direct interaction between the brain and the outside environment and offering a new rehabilitation route for patients with motor impairments. Current research on neural decoding falls into two broad categories: supervised learning and reinforcement learning. Compared with supervised decoding models, which require training data, reinforcement-learning decoders have three advantages: (1) they need no actual limb-movement data from the user; (2) they let the user operate an external device dynamically by trial and error; (3) they adapt to changes in neuronal firing patterns. The brain is highly plastic, and changes in the environment inevitably alter neuronal firing patterns, so this adaptivity of reinforcement-learning models plays an important role in the decoding stability of BCIs.

Using two monkeys (B04 and B10) and the classical center-out reaching paradigm, this thesis explores the adaptive properties of reinforcement learning and compares it with SVM, a classical supervised method. In the center-out paradigm, the monkey moves a cursor ball with a joystick to hit a target ball and earn a reward. The B04 neural data used for offline analysis were recorded from bilateral primary motor cortex (M1); the B10 neural data used for online experiments were recorded from bilateral dorsal premotor cortex (PMd). On the algorithm side, we first implemented attention-gated reinforcement learning (AGREL), a reinforcement-learning method built on an error-backpropagation (BP) artificial neural network, and then quantized attention-gated kernel reinforcement learning (QAGKRL), built on a radial basis function (RBF) network. Whereas AGREL can become trapped in local minima, QAGKRL attains a globally optimal solution for nonlinear neural decoding, and it additionally compresses the network topology by quantization to reduce computational complexity.

In the offline analysis, we compared the methods on ten days of data. Overall, on pure classification SVM outperformed QAGKRL, and QAGKRL outperformed AGREL; however, QAGKRL and AGREL reached classification accuracy close to the supervised method without pretraining and without movement data. Moreover, when a model fitted on sample 1 was tested on sample 2 (the two samples being neural data sets from two different days), the accuracy of QAGKRL and AGREL dropped but quickly recovered to the sample-1 level, whereas SVM dropped to chance level and could not recover. The online brain-control experiments adopted the shared-control method from online BCI research, introducing a shared-control parameter to help the monkey adapt during the transition from manual control to brain control. We found that, through mutual adaptation with the external environment, the reinforcement-learning methods achieved higher online decoding accuracy than SVM, with QAGKRL again better than AGREL; as a comparison, after we severed this mutual adaptation, the online accuracy of the reinforcement-learning methods fell below their average level and below that of SVM.

In summary, against the background of BCI research, this thesis uses existing resources to build an online experimental platform, implements the decoding module on it, and extends three decoding algorithms (SVM, AGREL, QAGKRL). Offline analysis first validates the algorithms and the platform; paradigm training and online experiments then realize brain control of the cursor ball by the monkey.
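The AGREL update described in the abstract can be sketched as follows. This is a minimal illustration only: a one-hidden-layer BP network whose weight updates are gated by a scalar reward-prediction error on the chosen output unit. The layer sizes, learning rate, and the omission of AGREL's expansive gating function are assumptions, not the thesis's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class AgrelNet:
    """Minimal AGREL-style decoder: a BP network whose weight updates are
    gated by a scalar reward-prediction error instead of a target vector."""

    def __init__(self, n_in, n_hid, n_out, lr=0.1):
        self.W1 = rng.normal(0.0, 0.1, (n_hid, n_in))
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hid))
        self.lr = lr

    def forward(self, x):
        self.x = np.asarray(x, dtype=float)
        self.h = np.tanh(self.W1 @ self.x)
        self.p = softmax(self.W2 @ self.h)
        return self.p

    def act(self, x):
        # Stochastic action selection enables trial-and-error exploration.
        p = self.forward(x)
        return int(rng.choice(len(p), p=p))

    def update(self, action, reward):
        # Global error signal: positive if the outcome beat expectation.
        delta = reward - self.p[action]
        # Attention gate: only the selected output unit propagates error.
        g = np.zeros_like(self.p)
        g[action] = delta
        self.W2 += self.lr * np.outer(g, self.h)
        # Backpropagate the gated error through the tanh hidden layer.
        eh = (self.W2.T @ g) * (1.0 - self.h ** 2)
        self.W1 += self.lr * np.outer(eh, self.x)
```

In a center-out-like toy task (one firing pattern per target direction, binary reward), repeatedly calling `act` and `update` raises hit rate well above chance without any movement data, which is the property the offline comparison highlights.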
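The quantization idea in QAGKRL, compressing the network topology to lower computational cost, can be illustrated with an RBF hidden layer that grows a new kernel only when an input lies farther than a quantization threshold from every stored prototype. The threshold, kernel width, and growth rule below are illustrative assumptions, not the thesis's exact algorithm.

```python
import numpy as np

class QuantizedRbfLayer:
    """RBF hidden layer with quantized prototype allocation: inputs close
    to an existing prototype reuse it, so the topology stays compact."""

    def __init__(self, threshold=1.0, width=1.0):
        self.threshold = threshold
        self.width = width
        self.prototypes = []

    def assign(self, x):
        # Return the index of the nearest prototype; add a new one only
        # if every existing prototype is farther than the threshold.
        x = np.asarray(x, dtype=float)
        if self.prototypes:
            d = [np.linalg.norm(x - c) for c in self.prototypes]
            i = int(np.argmin(d))
            if d[i] <= self.threshold:
                return i
        self.prototypes.append(x)
        return len(self.prototypes) - 1

    def activations(self, x):
        # Gaussian kernel response of every stored prototype.
        x = np.asarray(x, dtype=float)
        return np.array([np.exp(-np.linalg.norm(x - c) ** 2
                                / (2.0 * self.width ** 2))
                         for c in self.prototypes])
```

Because similar neural firing patterns collapse onto one prototype, the number of kernels, and hence the per-trial computation, grows with the diversity of the inputs rather than with the number of trials.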
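The shared-control transition from manual to brain control can be pictured as a weighted blend of the decoded command and the joystick command. The linear blend and the parameter name `alpha` are assumptions for illustration; the thesis only states that a shared-control parameter eases the transition.

```python
import numpy as np

def shared_control(decoded_vel, hand_vel, alpha):
    """Blend the decoder's output with the joystick input.

    alpha = 0.0 -> pure hand control; alpha = 1.0 -> pure brain control.
    Raising alpha across sessions gradually hands control to the decoder.
    """
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return (alpha * np.asarray(decoded_vel, dtype=float)
            + (1.0 - alpha) * np.asarray(hand_vel, dtype=float))
```

Early sessions would run with a small `alpha`, so the cursor still mostly follows the joystick while the decoder's influence, and the monkey's adaptation to it, ramps up.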
【Degree-granting institution】: Zhejiang University
【Degree level】: Master's
【Year awarded】: 2017
【Classification number】: TN911.7; R318
Article ID: 1599434
Link: http://sikaile.net/yixuelunwen/swyx/1599434.html