I have a question about this paper from reinforcement learning, but I figured I'd ask it here because it mostly involves calculus. I am a bit confused by some of the derivations.
(1) In the following, they define the gradient of a matrix. Is this gradient then a matrix itself?
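My own guess, which is purely an assumption on my part, is that the gradient is meant entrywise with respect to a scalar parameter $\theta$,
$$(\nabla P)_{ij} = \frac{\partial P_{ij}(\theta)}{\partial \theta},$$
so that $\nabla P$ would again be a matrix of the same shape as $P$ (and a three-index array if $\theta$ is a vector). Is that the right reading?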
(2) Now in the following, I don't understand why $\nabla\pi'\,e = \nabla(\pi'e)$. In the next step I assume something like this is happening: since $\nabla\pi'\,e = 0$, the term $\nabla\pi'\,e\pi'$ can be added for free, giving $\nabla\pi'(I-P) + \nabla\pi'\,e\pi' = \nabla\pi'(I-P+e\pi') = \pi'\nabla P$, although I am not sure why the paper writes the outer product $e\pi'$ rather than $\pi'e$, i.e., why $\pi'$ and $e$ are swapped.
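To check that I at least copied the identity correctly, I verified it numerically on a made-up 2-state chain (the sigmoid parameterization below is just my own toy example, not from the paper):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def P(theta):
    # toy 2-state chain whose first row depends on a scalar parameter theta
    a = sigmoid(theta)
    return np.array([[1.0 - a, a],
                     [0.3,     0.7]])

def stationary(M):
    # left eigenvector of M for eigenvalue 1, normalized so that pi' e = 1
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

theta, h = 0.5, 1e-6
pi = stationary(P(theta))
dP = (P(theta + h) - P(theta - h)) / (2 * h)              # finite-difference grad of P
dpi = (stationary(P(theta + h)) - stationary(P(theta - h))) / (2 * h)

e = np.ones(2)
lhs = dpi @ (np.eye(2) - P(theta) + np.outer(e, pi))      # grad(pi') (I - P + e pi')
rhs = pi @ dP                                             # pi' grad(P)
print(np.allclose(lhs, rhs, atol=1e-5))                   # prints True
```

Both sides agree numerically, and dimensionally only the outer product $e\pi'$ (an $n \times n$ matrix) can be added to $I - P$, since $\pi'e$ is the scalar $1$; but I would like confirmation that this is the intended reading.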
(3) Here I don't understand why we can assume $\lim_{t \to \infty} A^{t} = 0$, and why the identity $(I-A)^{-1}=\sum_{t=0}^{\infty}A^{t}$ shows that the inverse exists.
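I am assuming here that $A$ stands for the $P - e\pi'$ appearing in the inverse above; under that assumption, both claims also check out numerically on a toy ergodic chain (again my own example, not the paper's):

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.3, 0.7]])                             # toy ergodic 2-state chain
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()                                         # stationary distribution pi
A = P - np.outer(np.ones(2), pi)                       # assumption: A = P - e pi'

# First claim: A^t -> 0. Here A^t = P^t - e pi', and P^t -> e pi' for an
# ergodic chain, so the powers vanish.
print(np.linalg.norm(np.linalg.matrix_power(A, 50)))   # ~1e-50, effectively 0

# Second claim: partial sums of the Neumann series converge to (I - A)^{-1}.
S = sum(np.linalg.matrix_power(A, t) for t in range(200))
print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))    # prints True
```

My current guess for the second part is the telescoping identity $(I-A)\sum_{t=0}^{T}A^{t} = I - A^{T+1}$: if $A^{T+1} \to 0$, the right-hand side tends to $I$, so the limit of the partial sums is a (two-sided) inverse of $I-A$. Is that the intended argument?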