08 lstm
- 机器之心-LSTM入门必读:从入门基础到工作方式详解 (An LSTM primer: from the basics to how it works)
- medium-Understanding LSTM and its diagrams
- githubio-Understanding LSTM Networks
Long Short-Term Memory
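First, each LSTM layer takes as input and produces as output two states: long-term memory ($ltm_t$) and working memory ($wm_t$). These can be read as information retained over a longer span versus information currently in use, or, by analogy with a computer, data parked on the hard disk versus data in RAM that is being processed right now.

Second, four variables are introduced (although they are usually counted as three):
- Remember vector / forget gate: $remember_t$, $f_t$
- Save vector / input gate: $save_t$, $i_t$
- Focus vector / output gate: $focus_t$, $o_t$
- Candidate memory: $ltm'_t$, an intermediate variable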
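In the mathematical formulation, $\sigma$ denotes the activation function, usually the sigmoid, and $\phi$ usually denotes tanh. A small set of expressions updates the two variables from the previous time step ($ltm_{t-1}$, $wm_{t-1}$) into the two variables for the current time step ($ltm_t$, $wm_t$).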
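One common way to write these expressions, in the notation of these notes, is sketched below. It assumes the standard LSTM formulation; the weight matrices $W_*$, $U_*$, the biases $b_*$ and the input $x_t$ are illustrative names that do not appear in the original notes.

$$
\begin{aligned}
remember_t &= \sigma(W_r x_t + U_r\, wm_{t-1} + b_r) \\
save_t &= \sigma(W_s x_t + U_s\, wm_{t-1} + b_s) \\
focus_t &= \sigma(W_f x_t + U_f\, wm_{t-1} + b_f) \\
ltm'_t &= \phi(W_l x_t + U_l\, wm_{t-1} + b_l) \\
ltm_t &= remember_t \odot ltm_{t-1} + save_t \odot ltm'_t \\
wm_t &= focus_t \odot \phi(ltm_t)
\end{aligned}
$$

The first four lines compute the three gates and the candidate memory from the same inputs; the last two lines are the actual state update.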
```latex{cmd hide}
% By J. Leon, Beerware licence is acceptable...
\documentclass[tikz,border=10pt]{standalone}
\usepackage{tikz}
\usetikzlibrary{positioning, fit, arrows.meta, shapes}

% used to avoid putting the same thing several times...
% Command \empt{var1}{var2}
\newcommand{\empt}[2]{$#1^{\langle #2 \rangle}$}

\begin{document}
\begin{tikzpicture}[
    % GLOBAL CFG
    font=\sf \scriptsize,
    >=LaTeX,
    % Styles
    cell/.style={% For the main box
        rectangle,
        rounded corners=5mm,
        draw,
        very thick,
        },
    operator/.style={% For operators like + and x
        circle,
        draw,
        inner sep=-0.5pt,
        minimum height =.2cm,
        },
    function/.style={% For functions
        ellipse,
        draw,
        inner sep=1pt
        },
    ct/.style={% For external inputs and outputs
        circle,
        draw,
        line width = .75pt,
        minimum width=1cm,
        inner sep=1pt,
        },
    gt/.style={% For internal inputs
        rectangle,
        draw,
        minimum width=4mm,
        minimum height=3mm,
        inner sep=1pt
        },
    mylabel/.style={% something new that I have learned
        font=\scriptsize\sffamily
        },
    ArrowC1/.style={% Arrows with rounded corners
        rounded corners=.25cm,
        thick,
        },
    ArrowC2/.style={% Arrows with big rounded corners
        rounded corners=.5cm,
        thick,
        },
    ]
%Start drawing the thing...
% Draw the cell:
\node [cell, minimum height =4cm, minimum width=6cm] at (0,0){} ;
% Draw inputs named ibox#
\node [gt] (ibox1) at (-2,-0.75) {$\sigma$};
\node [gt] (ibox2) at (-1.5,-0.75) {$\sigma$};
\node [gt, minimum width=1cm] (ibox3) at (-0.5,-0.75) {Tanh};
\node [gt] (ibox4) at (0.5,-0.75) {$\sigma$};
% Draw operators named mux# , add# and func#
\node [operator] (mux1) at (-2,1.5) {$\times$};
\node [operator] (add1) at (-0.5,1.5) {+};
\node [operator] (mux2) at (-0.5,0) {$\times$};
\node [operator] (mux3) at (1.5,0) {$\times$};
\node [function] (func1) at (1.5,0.75) {Tanh};
% Draw External inputs? named as basis c,h,x
\node[ct, label={[mylabel]Cell}] (c) at (-4,1.5) {\empt{c}{t-1}};
\node[ct, label={[mylabel]Hidden}] (h) at (-4,-1.5) {\empt{h}{t-1}};
\node[ct, label={[mylabel]left:Input}] (x) at (-2.5,-3) {\empt{x}{t}};
% Draw External outputs? named as basis c2,h2,x2
\node[ct, label={[mylabel]Label1}] (c2) at (4,1.5) {\empt{c}{t}};
\node[ct, label={[mylabel]Label2}] (h2) at (4,-1.5) {\empt{h}{t}};
\node[ct, label={[mylabel]left:Label3}] (x2) at (2.5,3) {\empt{h}{t}};
% Start connecting all.
%Intersections and displacements are used.
% Drawing arrows
\draw [ArrowC1] (c) -- (mux1) -- (add1) -- (c2);
% Inputs
\draw [ArrowC2] (h) -| (ibox4);
\draw [ArrowC1] (h -| ibox1)++(-0.5,0) -| (ibox1);
\draw [ArrowC1] (h -| ibox2)++(-0.5,0) -| (ibox2);
\draw [ArrowC1] (h -| ibox3)++(-0.5,0) -| (ibox3);
\draw [ArrowC1] (x) -- (x |- h)-| (ibox3);
% Internal
\draw [->, ArrowC2] (ibox1) -- (mux1);
\draw [->, ArrowC2] (ibox2) |- (mux2);
\draw [->, ArrowC2] (ibox3) -- (mux2);
\draw [->, ArrowC2] (ibox4) |- (mux3);
\draw [->, ArrowC2] (mux2) -- (add1);
\draw [->, ArrowC1] (add1 -| func1)++(-0.5,0) -| (func1);
\draw [->, ArrowC2] (func1) -- (mux3);
\draw [-, ArrowC2] (mux3) |- (h2);
\draw (c2 -| x2) ++(0,-0.1) coordinate (i1);
\draw [-, ArrowC2] (h2 -| x2)++(-0.5,0) -| (i1);
\draw [-, ArrowC2] (i1)++(0,0.2) -- (x2);
\end{tikzpicture}
\end{document}
```
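The data flow in the diagram can also be traced in code. The following is a minimal sketch of a single LSTM time step in plain Python, assuming the standard formulation; it is not taken from the original notes. Here `wm` corresponds to $h$ and `ltm` to $c$ in the diagram. The parameter layout (a dict of `(W, U, b)` tuples keyed by `"remember"`, `"save"`, `"focus"`, `"candidate"`) and the helper names `affine`, `hadamard`, and `zero_params` are hypothetical choices made for illustration.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    """Element-wise tanh of a list of floats."""
    return [math.tanh(a) for a in v]


def affine(W, x, U, h, b):
    """Compute W x + U h + b with list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [ai * bi for ai, bi in zip(a, b)]


def lstm_step(x_t, wm_prev, ltm_prev, params):
    """One LSTM time step; returns (wm_t, ltm_t).

    params maps each hypothetical name in
    {"remember", "save", "focus", "candidate"} to a tuple (W, U, b).
    """
    def pre(name):
        W, U, b = params[name]
        return affine(W, x_t, U, wm_prev, b)

    remember_t = sigmoid(pre("remember"))  # forget gate f_t
    save_t = sigmoid(pre("save"))          # input gate i_t
    focus_t = sigmoid(pre("focus"))        # output gate o_t
    ltm_cand = tanh(pre("candidate"))      # candidate memory ltm'_t

    # Keep part of the old long-term memory, add part of the candidate.
    kept = hadamard(remember_t, ltm_prev)
    added = hadamard(save_t, ltm_cand)
    ltm_t = [k + a for k, a in zip(kept, added)]

    # Working memory is a gated, squashed view of the long-term memory.
    wm_t = hadamard(focus_t, tanh(ltm_t))
    return wm_t, ltm_t


def zero_params(n_in, n_hidden):
    """All-zero parameters, handy for checking the arithmetic by hand."""
    def block():
        return (
            [[0.0] * n_in for _ in range(n_hidden)],
            [[0.0] * n_hidden for _ in range(n_hidden)],
            [0.0] * n_hidden,
        )
    names = ("remember", "save", "focus", "candidate")
    return {name: block() for name in names}
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so `lstm_step` simply halves the long-term memory: calling it with `ltm_prev = [2.0, -4.0]` gives `ltm_t = [1.0, -2.0]`, which is easy to verify against the equations by hand.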
Gated Recurrent Units
- Reset gate, $r_t$
- Update gate, $z_t$
$\odot$, $\circ$: Hadamard product, i.e. element-wise multiplication of corresponding entries
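To show how the two gates and the Hadamard product fit together, here is a minimal sketch of one GRU time step in plain Python. It assumes a common formulation in which the new state is $(1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; some references swap the roles of $z_t$ and $1 - z_t$. The parameter layout (a dict of `(W, U, b)` tuples keyed by `"update"`, `"reset"`, `"candidate"`) and the helper names are hypothetical and not part of the original notes.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, x, U, h, b):
    """Compute W x + U h + b with list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [ai * bi for ai, bi in zip(a, b)]


def gru_step(x_t, h_prev, params):
    """One GRU time step; returns the new hidden state h_t."""
    W_z, U_z, b_z = params["update"]
    W_r, U_r, b_r = params["reset"]
    W_h, U_h, b_h = params["candidate"]

    z_t = sigmoid(affine(W_z, x_t, U_z, h_prev, b_z))  # update gate
    r_t = sigmoid(affine(W_r, x_t, U_r, h_prev, b_r))  # reset gate

    # The reset gate scales the previous state before it feeds the candidate.
    gated_prev = hadamard(r_t, h_prev)
    h_cand = [math.tanh(a) for a in affine(W_h, x_t, U_h, gated_prev, b_h)]

    # Assumed convention: z_t near 1 favours the candidate, near 0 keeps h_prev.
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]
```

Unlike the LSTM, the GRU carries a single state vector, so `gru_step` returns only `h_t`. Under the assumed convention, a strongly negative update-gate bias drives $z_t$ toward 0 and the step leaves the previous state essentially unchanged.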