Differentiable neural computer

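A differentiable neural computer being trained to store and recall dense binary numbers. Performance on a reference task during training is shown. Upper left: the input (red) and target (blue), as 5-bit words and a 1-bit interrupt signal. Upper right: the model's output.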

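In artificial intelligence, a differentiable neural computer (DNC) is a memory-augmented neural network (MANN) architecture, which is typically (but not by definition) recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.^[1]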

Applications

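The DNC takes indirect inspiration from the Von Neumann architecture, which makes it likely to outperform conventional architectures on tasks that are fundamentally algorithmic and cannot be learned by finding a decision boundary.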

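So far, DNCs have been demonstrated to handle only relatively simple tasks, which can also be solved using conventional programming. However, DNCs do not need to be programmed for each problem; they can instead be trained. This attention span allows the user to feed in complex data structures such as graphs sequentially and recall them for later use. Furthermore, DNCs can learn aspects of symbolic reasoning and apply them to working memory. The researchers who published the method see promise that DNCs can be trained to perform complex, structured tasks^[1]^[2] and to address big-data applications that require some sort of reasoning, such as generating video commentaries or semantic text analysis.^[3]^[4]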

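A DNC can be trained to navigate a rapid transit system and then apply what it has learned to a different system. A neural network without memory would typically have to learn each transit system from scratch. On graph traversal and sequence-processing tasks with supervised learning, DNCs performed better than alternatives such as long short-term memory or a neural Turing machine.^[5] With a reinforcement learning approach to a block puzzle problem inspired by SHRDLU, a DNC was trained via curriculum learning and learned to make a plan. It performed better than a traditional recurrent neural network.^[5]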

Architecture

DNC system diagram

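DNC networks were introduced as an extension of the Neural Turing Machine (NTM), with the addition of memory attention mechanisms that control where memory is stored and temporal attention that records the order of events. This structure allows DNCs to be more robust and abstract than an NTM, and still perform tasks that have longer-term dependencies than some predecessors such as Long Short Term Memory (LSTM). The memory, which is simply a matrix, can be allocated dynamically and accessed indefinitely. The DNC is differentiable end-to-end (each subcomponent of the model is differentiable, therefore so is the whole model). This makes it possible to optimize it efficiently using gradient descent.^[3]^[6]^[7]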

The DNC model is similar to the Von Neumann architecture, and because of the resizability of its memory, it is Turing complete.^[8]

Traditional DNC

DNC, as originally published^[1]

Independent variables
$\mathbf{x}_t$	Input vector
$\mathbf{z}_t$	Target vector
Controller
$\boldsymbol{\chi}_t = [\mathbf{x}_t; \mathbf{r}_{t-1}^{1}; \cdots; \mathbf{r}_{t-1}^{R}]$	Controller input matrix

Deep (layered) LSTM	$\forall\; 1 \leq l \leq L$
$\mathbf{i}_t^l = \sigma(W_i^l [\boldsymbol{\chi}_t; \mathbf{h}_{t-1}^l; \mathbf{h}_t^{l-1}] + \mathbf{b}_i^l)$	Input gate vector
$\mathbf{o}_t^l = \sigma(W_o^l [\boldsymbol{\chi}_t; \mathbf{h}_{t-1}^l; \mathbf{h}_t^{l-1}] + \mathbf{b}_o^l)$	Output gate vector
$\mathbf{f}_t^l = \sigma(W_f^l [\boldsymbol{\chi}_t; \mathbf{h}_{t-1}^l; \mathbf{h}_t^{l-1}] + \mathbf{b}_f^l)$	Forget gate vector
$\mathbf{s}_t^l = \mathbf{f}_t^l \mathbf{s}_{t-1}^l + \mathbf{i}_t^l \tanh(W_s^l [\boldsymbol{\chi}_t; \mathbf{h}_{t-1}^l; \mathbf{h}_t^{l-1}] + \mathbf{b}_s^l)$	State gate vector, $\mathbf{s}_0 = 0$
$\mathbf{h}_t^l = \mathbf{o}_t^l \tanh(\mathbf{s}_t^l)$	Hidden gate vector, $\mathbf{h}_0 = 0$; $\mathbf{h}_t^0 = 0\;\forall\; t$

$\mathbf{y}_t = W_y [\mathbf{h}_t^1; \cdots; \mathbf{h}_t^L] + W_r [\mathbf{r}_t^1; \cdots; \mathbf{r}_t^R]$	DNC output vector
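The layered LSTM equations above can be read as a single function applied once per layer and time step. The following is a minimal sketch in plain Python, not the published implementation: the function names, the `params` dictionary layout, and the use of nested lists as matrices are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, vec):
    """Return weights @ vec + bias for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_layer_step(params, chi, h_prev, s_prev, h_below):
    """One time step of controller layer l (hypothetical helper).

    chi     -- controller input: x_t joined with the previous read vectors
    h_prev  -- this layer's hidden vector from step t-1
    s_prev  -- this layer's state vector from step t-1
    h_below -- hidden vector of layer l-1 at step t (zeros for the first layer)
    """
    z = chi + h_prev + h_below  # list concatenation: [chi; h_{t-1}^l; h_t^{l-1}]
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]
    cand = [math.tanh(a) for a in affine(params["W_s"], params["b_s"], z)]
    # s_t = f * s_{t-1} + i * tanh(...), then h_t = o * tanh(s_t), element-wise.
    s = [fk * sk + ik * ck for fk, sk, ik, ck in zip(f, s_prev, i, cand)]
    h = [ok * math.tanh(sk) for ok, sk in zip(o, s)]
    return h, s

# Example with all-zero (hypothetical) parameters, so every gate equals 0.5.
hidden, n_in = 2, 3
zeros = {name: [[0.0] * (n_in + 2 * hidden) for _ in range(hidden)]
         for name in ("W_i", "W_f", "W_o", "W_s")}
zeros.update({name: [0.0] * hidden for name in ("b_i", "b_f", "b_o", "b_s")})
h, s = lstm_layer_step(zeros, [1.0, 2.0, 3.0], [0.0, 0.0], [2.0, -2.0], [0.0, 0.0])
print(h, s)
```

With zero parameters the forget gate halves the previous state and the candidate term vanishes, which makes the example easy to check by hand. A full controller would call this function for each layer in turn, feeding each layer's hidden vector to the layer above.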
Read and write heads
$\xi_t = W_\xi [\mathbf{h}_t^1; \cdots; \mathbf{h}_t^L]$	Interface parameters
$= [\mathbf{k}_t^{r,1}; \cdots; \mathbf{k}_t^{r,R}; \hat{\beta}_t^{r,1}; \cdots; \hat{\beta}_t^{r,R}; \mathbf{k}_t^w; \hat{\beta}_t^w; \hat{\mathbf{e}}_t; \mathbf{v}_t; \hat{f}_t^1; \cdots; \hat{f}_t^R; \hat{g}_t^a; \hat{g}_t^w; \hat{\boldsymbol{\pi}}_t^1; \cdots; \hat{\boldsymbol{\pi}}_t^R]$

Read heads	$\forall\; 1 \leq i \leq R$
$\mathbf{k}_t^{r,i}$	Read keys
$\beta_t^{r,i} = \text{oneplus}(\hat{\beta}_t^{r,i})$	Read strengths
$f_t^i = \sigma(\hat{f}_t^i)$	Free gates
$\boldsymbol{\pi}_t^i = \text{softmax}(\hat{\boldsymbol{\pi}}_t^i)$	Read modes, $\boldsymbol{\pi}_t^i \in \mathbb{R}^3$

Write head
$\mathbf{k}_t^w$	Write key
$\beta_t^w = \text{oneplus}(\hat{\beta}_t^w)$	Write strength
$\mathbf{e}_t = \sigma(\hat{\mathbf{e}}_t)$	Erase vector
$\mathbf{v}_t$	Write vector
$g_t^a = \sigma(\hat{g}_t^a)$	Allocation gate
$g_t^w = \sigma(\hat{g}_t^w)$	Write gate
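To make the layout of the interface parameters concrete, the sketch below splits a flat interface vector into the read-head and write-head quantities listed above and applies the stated squashing functions. It is an illustration under assumptions: the function name `split_interface`, the dictionary keys, and the component ordering are chosen here, and the write strength is assumed to pass through oneplus in the same way as the read strengths. For memory word size W and R read heads, the components add up to W·R + 3W + 5R + 3 values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def oneplus(x):
    return 1.0 + math.log(1.0 + math.exp(x))

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def split_interface(xi, W, R):
    """Split a flat interface vector into named, squashed components."""
    expected = W * R + 3 * W + 5 * R + 3
    if len(xi) != expected:
        raise ValueError(f"expected {expected} values, got {len(xi)}")
    pos = 0

    def take(n):
        nonlocal pos
        chunk = list(xi[pos:pos + n])
        pos += n
        return chunk

    parts = {}
    parts["read_keys"] = [take(W) for _ in range(R)]
    parts["read_strengths"] = [oneplus(b) for b in take(R)]
    parts["write_key"] = take(W)
    parts["write_strength"] = oneplus(take(1)[0])
    parts["erase"] = [sigmoid(e) for e in take(W)]
    parts["write_vector"] = take(W)
    parts["free_gates"] = [sigmoid(f) for f in take(R)]
    parts["allocation_gate"] = sigmoid(take(1)[0])
    parts["write_gate"] = sigmoid(take(1)[0])
    parts["read_modes"] = [softmax(take(3)) for _ in range(R)]
    return parts

# Example: W = 4, R = 2 gives 4*2 + 3*4 + 5*2 + 3 = 33 interface values.
parts = split_interface([0.0] * 33, W=4, R=2)
print(parts["read_strengths"], parts["write_gate"], parts["read_modes"][0])
```

Keys and the write vector are left unsquashed, gates and the erase vector land in (0, 1), strengths are at least 1, and each read mode is a distribution over the three addressing options.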
Memory
$M_t = M_{t-1} \circ (E - \mathbf{w}_t^w \mathbf{e}_t^\intercal) + \mathbf{w}_t^w \mathbf{v}_t^\intercal$	Memory matrix, where $E \in \mathbb{R}^{N \times W}$ is a matrix of ones
$\mathbf{u}_t = (\mathbf{u}_{t-1} + \mathbf{w}_{t-1}^w - \mathbf{u}_{t-1} \circ \mathbf{w}_{t-1}^w) \circ \boldsymbol{\psi}_t$	Usage vector
$\mathbf{p}_t = \left(1 - \sum_i \mathbf{w}_t^w[i]\right) \mathbf{p}_{t-1} + \mathbf{w}_t^w$	Precedence weighting, $\mathbf{p}_0 = \mathbf{0}$
$L_t = (\mathbf{1} - \mathbf{I}) \circ \left[(1 - \mathbf{w}_t^w[i] - \mathbf{w}_t^w[j]) L_{t-1}[i,j] + \mathbf{w}_t^w[i]\, \mathbf{p}_{t-1}[j]\right]$	Temporal link matrix, $L_0 = \mathbf{0}$
$\mathbf{w}_t^w = g_t^w [g_t^a \mathbf{a}_t + (1 - g_t^a) \mathbf{c}_t^w]$	Write weighting
$\mathbf{w}_t^{r,i} = \boldsymbol{\pi}_t^i[1] \mathbf{b}_t^i + \boldsymbol{\pi}_t^i[2] \mathbf{c}_t^{r,i} + \boldsymbol{\pi}_t^i[3] \mathbf{f}_t^i$	Read weighting
$\mathbf{r}_t^i = M_t^\intercal \mathbf{w}_t^{r,i}$	Read vectors
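The memory update, precedence weighting, temporal link matrix and read vector above can be traced on a tiny memory. The sketch below is an illustration under assumptions (plain nested lists as matrices, helper names chosen here, hand-picked one-hot write weightings rather than learned ones). It writes to location 0 and then to location 1, and shows that multiplying the link matrix by a read weighting focused on location 0 moves the focus to the location written next.

```python
def write_memory(M_prev, w_write, erase, write_vec):
    """M_t = M_{t-1} * (1 - w e^T) + w v^T, computed element by element."""
    return [[m * (1.0 - w * e) + w * v
             for m, e, v in zip(row, erase, write_vec)]
            for row, w in zip(M_prev, w_write)]

def update_precedence(p_prev, w_write):
    """p_t = (1 - sum_i w[i]) p_{t-1} + w."""
    remaining = 1.0 - sum(w_write)
    return [remaining * p + w for p, w in zip(p_prev, w_write)]

def update_links(L_prev, w_write, p_prev):
    """Temporal link update with the diagonal forced to zero."""
    n = len(w_write)
    return [[0.0 if i == j else
             (1.0 - w_write[i] - w_write[j]) * L_prev[i][j]
             + w_write[i] * p_prev[j]
             for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def read_memory(M, w_read):
    """r = M^T w: a weighted sum of memory rows."""
    width = len(M[0])
    return [sum(row[j] * w for row, w in zip(M, w_read)) for j in range(width)]

# Example: N = 3 locations, word size W = 2, two consecutive writes.
M = [[0.0, 0.0] for _ in range(3)]
L = [[0.0] * 3 for _ in range(3)]
p = [0.0, 0.0, 0.0]
writes = [([1.0, 0.0, 0.0], [0.5, -0.5]),   # (write weighting, write vector)
          ([0.0, 1.0, 0.0], [0.25, 0.75])]
for w, v in writes:
    M = write_memory(M, w, [1.0, 1.0], v)   # erase vector of ones
    L = update_links(L, w, p)               # uses p_{t-1}, so update L first
    p = update_precedence(p, w)

forward = matvec(L, [1.0, 0.0, 0.0])        # previous read was at location 0
r = read_memory(M, forward)
print(forward, r)
```

Here `L[i][j]` measures how strongly location `i` was written directly after location `j`, so the product with a read weighting on location 0 lands on location 1, and the read vector returns the second word written.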

$\mathcal{C}(M, \mathbf{k}, \beta)[i] = \frac{\exp\{\mathcal{D}(\mathbf{k}, M[i,\cdot])\beta\}}{\sum_j \exp\{\mathcal{D}(\mathbf{k}, M[j,\cdot])\beta\}}$	Content-based addressing, lookup key $\mathbf{k}$, key strength $\beta$
$\phi_t$	Indices of $\mathbf{u}_t$, sorted in ascending order of usage
$\mathbf{a}_t[\phi_t[j]] = (1 - \mathbf{u}_t[\phi_t[j]]) \prod_{i=1}^{j-1} \mathbf{u}_t[\phi_t[i]]$	Allocation weighting
$\mathbf{c}_t^w = \mathcal{C}(M_{t-1}, \mathbf{k}_t^w, \beta_t^w)$	Write content weighting
$\mathbf{c}_t^{r,i} = \mathcal{C}(M_t, \mathbf{k}_t^{r,i}, \beta_t^{r,i})$	Read content weighting
$\mathbf{f}_t^i = L_t \mathbf{w}_{t-1}^{r,i}$	Forward weighting
$\mathbf{b}_t^i = L_t^\intercal \mathbf{w}_{t-1}^{r,i}$	Backward weighting
$\boldsymbol{\psi}_t = \prod_{i=1}^{R} \left(\mathbf{1} - f_t^i \mathbf{w}_{t-1}^{r,i}\right)$	Memory retention vector
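The addressing quantities above fit together as follows: content-based addressing scores each memory row against a key, while the retention vector, usage vector and allocation weighting decide which location is free to be overwritten. The sketch below is illustrative only. The helper names are chosen here, the small `eps` added to the cosine denominator is an assumption to avoid division by zero on empty rows, and the usage vector is assumed to have already been updated to the current step.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps (an assumption) guards against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def content_weighting(M, key, beta):
    """Softmax over memory rows of beta * cosine(key, row)."""
    scores = [beta * cosine(key, row) for row in M]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retention(free_gates, prev_read_weights, n):
    """psi = product over read heads of (1 - f^i * w^{r,i}_{t-1})."""
    psi = [1.0] * n
    for f, w in zip(free_gates, prev_read_weights):
        psi = [p * (1.0 - f * wi) for p, wi in zip(psi, w)]
    return psi

def update_usage(u_prev, w_write_prev, psi):
    """u_t = (u + w - u * w) * psi, element-wise."""
    return [(u + w - u * w) * p for u, w, p in zip(u_prev, w_write_prev, psi)]

def allocation_weighting(u):
    """Visit locations from least to most used (the ordering phi_t)."""
    order = sorted(range(len(u)), key=lambda i: u[i])
    a = [0.0] * len(u)
    running = 1.0  # product of the usages of all less-used locations
    for idx in order:
        a[idx] = (1.0 - u[idx]) * running
        running *= u[idx]
    return a

def write_weighting(g_w, g_a, a, c_w):
    """w^w = g^w * (g^a * a + (1 - g^a) * c^w)."""
    return [g_w * (g_a * ai + (1.0 - g_a) * ci) for ai, ci in zip(a, c_w)]

# Example: a read head that just read location 1 frees it completely.
M = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
c = content_weighting(M, key=[1.0, 0.0], beta=10.0)
psi = retention([1.0], [[0.0, 1.0, 0.0]], n=3)
u = update_usage([1.0, 1.0, 0.5], [0.0, 0.0, 0.0], psi)
a = allocation_weighting(u)
w_write = write_weighting(g_w=1.0, g_a=1.0, a=a, c_w=c)
print(c, psi, u, a, w_write)
```

In this example the freed location ends up with zero usage, so the allocation weighting puts all of its mass there, and with the allocation gate fully open the write goes to that location regardless of the content match.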
Definitions
$\mathbf{W}, \mathbf{b}$	Weight matrix, bias vector
$\mathbf{0}, \mathbf{1}, \mathbf{I}$	Zeros matrix, ones matrix, identity matrix
$\circ$	Element-wise multiplication
$\mathcal{D}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$	Cosine similarity
$\sigma(x) = 1/(1 + e^{-x})$	Sigmoid function
$\text{oneplus}(x) = 1 + \log(1 + e^{x})$	Oneplus function
$\text{softmax}(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}}$ for $j = 1, \ldots, K$	Softmax function

Extensions

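Refinements include sparse memory addressing, which reduces time and space complexity by thousands of times. This can be achieved by using an approximate nearest neighbor algorithm, such as locality-sensitive hashing, or a random k-d tree such as the Fast Library for Approximate Nearest Neighbors from UBC.^[9] Adding Adaptive Computation Time (ACT) separates computation time from data time, which exploits the fact that problem length and problem difficulty are not always the same.^[10] Training using synthetic gradients performs considerably better than backpropagation through time (BPTT).^[11] Robustness can be improved with the use of layer normalization and Bypass Dropout as regularization.^[12]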
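The core idea of sparse addressing can be illustrated by keeping only the K largest entries of a read weighting and renormalizing them, so that a read touches K memory rows instead of all N. The sketch below is a simplification under stated assumptions: the function name `sparse_read_weights` is hypothetical, and the top-K entries are found by an exact scan with `heapq.nlargest`, whereas the approach described above relies on an approximate nearest neighbor index to avoid scanning the whole memory.

```python
import heapq

def sparse_read_weights(weights, k):
    """Keep the k largest weights, zero the rest, and renormalize to sum to 1."""
    top = heapq.nlargest(k, range(len(weights)), key=lambda i: weights[i])
    total = sum(weights[i] for i in top)
    sparse = [0.0] * len(weights)
    if total > 0.0:
        for i in top:
            sparse[i] = weights[i] / total
    return sparse

# Example: a dense weighting over 4 locations reduced to its 2 largest entries.
dense = [0.1, 0.6, 0.05, 0.25]
sparse = sparse_read_weights(dense, k=2)
nonzero = [i for i, w in enumerate(sparse) if w > 0.0]
print(sparse, nonzero)
```

Only the rows listed in `nonzero` need to be fetched to form the read vector, which is where the savings in time and memory traffic would come from.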

See also

Differentiable programming

References

[1] Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago (2016-10-12). "Hybrid computing using a neural network with dynamic external memory". Nature. 538 (7626): 471–476. Bibcode:2016Natur.538..471G. doi:10.1038/nature20101. ISSN 1476-4687. PMID 27732574. S2CID 205251479.
[2] "Differentiable neural computers DeepMind". DeepMind. Retrieved 2016-10-19.
[3] Burgess, Matt. "DeepMind's AI learned to ride the London Underground using human-like reason and memory". WIRED UK. Retrieved 2016-10-19.
[4] Jaeger, Herbert (2016-10-12). "Artificial intelligence: Deep neural reasoning". Nature. 538 (7626): 467–468. Bibcode:2016Natur.538..467J. doi:10.1038/nature19477. ISSN 1476-4687. PMID 27732576.
[5] James, Mike. "DeepMind's Differentiable Neural Network Thinks Deeply". www.i-programmer.info. Retrieved 2016-10-20.
[6] "DeepMind AI 'Learns' to Navigate London Tube". PCMAG. Retrieved 2016-10-19.
[7] Mannes, John. "DeepMind's differentiable neural computer helps you navigate the subway with its memory". TechCrunch. Retrieved 2016-10-19.
[8] "RNN Symposium 2016: Alex Graves - Differentiable Neural Computer". YouTube.
[9] Jack W Rae; Jonathan J Hunt; Harley, Tim; Danihelka, Ivo; Senior, Andrew; Wayne, Greg; Graves, Alex; Timothy P Lillicrap (2016). "Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes". arXiv:1610.09027 [cs.LG].
[10] Graves, Alex (2016). "Adaptive Computation Time for Recurrent Neural Networks". arXiv:1603.08983 [cs.NE].
[11] Jaderberg, Max; Wojciech Marian Czarnecki; Osindero, Simon; Vinyals, Oriol; Graves, Alex; Silver, David; Kavukcuoglu, Koray (2016). "Decoupled Neural Interfaces using Synthetic Gradients". arXiv:1608.05343 [cs.LG].
[12] Franke, Jörg; Niehues, Jan; Waibel, Alex (2018). "Robust and Scalable Differentiable Neural Computer for Question Answering". arXiv:1807.02658 [cs.CL].
