Matrix-Rematrix

A tensor turns into a matrix

The work of a neural network is based on manipulating matrices. Training relies on a variety of methods, many of which grew out of gradient descent, where you need to be able to handle matrices and compute gradients (derivatives with respect to matrices). If you look under the hood of a neural network, you will see chains of matrices that often look intimidating. Put simply, “the matrix awaits us all”. It is time to get better acquainted.





To do that, we will take the following steps:





  • consider manipulations with matrices: transposition, multiplication, gradient;





  • ;





  • .





NumPy . , , , , . , , , - , , , . , - : , .





-

- , , , . , , , Google TensorFlow.





A vector (a rank-1 tensor) is an ordered set of numbers a_{i}, i = 0, 1, 2, ..., n-1, where n is the length of the vector.





import numpy as np  # import the numpy library
a = np.array([1, 2, 5])
a.ndim      # number of dimensions (tensor rank) = 1
a.shape     # shape of the array: (3,)
a.shape[0]  # length of the vector = 3
      
      



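The scalar (dot) product of two vectors is written a_{i} \cdot b_{i} = a_{0} \cdot b_{0} + a_{1} \cdot b_{1} + a_{2} \cdot b_{2}. Summation over the repeated index is implied, and here i runs from 0 to 2.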





b = np.array([3, 4, 7])
np.dot(a, b)   # scalar product = 46
a*b            # elementwise product: array([ 3,  8, 35])
np.sum(a*b)    # sum of the elementwise products = 46
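The implied summation over a repeated index can also be spelled out directly. The sketch below is only an illustration (the names `explicit` and `contracted` are not from the article): it compares an explicit loop over i with `np.einsum`, whose subscript string `'i,i->'` mirrors the index notation a_{i} b_{i}.

```python
import numpy as np

a = np.array([1, 2, 5])
b = np.array([3, 4, 7])

# explicit sum over the repeated index i
explicit = sum(a[i] * b[i] for i in range(a.shape[0]))

# the same contraction written in index notation
contracted = np.einsum('i,i->', a, b)

print(explicit, contracted, np.dot(a, b))
```

All three values agree, which is the point of the notation: a repeated index means "sum over it".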
      
      



A matrix (a rank-2 tensor) A has elements A_{i,j}. For example, A_{0,2} is the element in row 0 and column 2.





A = np.array([[1, 2, 3],
              [2, 4, 6]])
A        # array([[1, 2, 3],
         #        [2, 4, 6]])
A[0, 2]  # element in row 0, column 2 = 3
A.shape  # (2, 3): the matrix has 2 rows and 3 columns
      
      



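The product of the matrices A and B is the matrix C = AB with elements C_{i,k} = A_{i,j} B_{j,k} (summation over j is implied). It is defined only when the number of columns of A equals the number of rows of B.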





B = np.array([[7, 8, 1, 3],
              [5, 4, 2, 7],
              [3, 6, 9, 4]])
A.shape[1] == B.shape[0]  # True
A.shape[1], B.shape[0]    # (3, 3)
A.shape, B.shape          # ((2, 3), (3, 4))
C = np.dot(A, B)
C  # array([[26, 34, 32, 29],
   #        [52, 68, 64, 58]])
   # for example, C[0,1] = A[0,0]B[0,1] + A[0,1]B[1,1] + A[0,2]B[2,1] = 1*8 + 2*4 + 3*6 = 34
C.shape  # (2, 4): rows of A by columns of B
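To see what `np.dot` does element by element, the index formula C_{i,k} = A_{i,j} B_{j,k} can be written out with explicit loops. This is a minimal sketch for illustration only; `matmul_by_index` is a hypothetical helper name, and in practice `np.dot` or the `@` operator should be used.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])
B = np.array([[7, 8, 1, 3],
              [5, 4, 2, 7],
              [3, 6, 9, 4]])

def matmul_by_index(A, B):
    """Compute C[i, k] = sum over j of A[i, j] * B[j, k] with explicit loops."""
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    if n_inner != n_inner_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((n_rows, n_cols), dtype=A.dtype)
    for i in range(n_rows):
        for k in range(n_cols):
            for j in range(n_inner):
                C[i, k] += A[i, j] * B[j, k]
    return C

C_loops = matmul_by_index(A, B)
C_einsum = np.einsum('ij,jk->ik', A, B)  # same contraction in index notation
```

The loop over j is exactly the implied summation over the repeated index.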
      
      



The product BA, on the other hand, is not defined:





np.dot(B, A) # ValueError: shapes (3,4) and (2,3) not aligned: 4 (dim 1) != 2 (dim 0)
      
      



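The number of columns of B (4) differs from the number of rows of A (2), so the matrices cannot be multiplied in this order.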





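A column vector can be treated as a matrix with a single column, so two such vectors have elements a_{i,0} and b_{j,0}. Their outer product is the matrix D_{i,j} = a_{i,0} b_{j,0}. Since b_{j,0} = (b.T)_{0,j}, where b.T is the transpose of b (the .T attribute in NumPy), this is the matrix product D = a \cdot b.T. Transposing a product reverses the order of the factors: D.T = (a \cdot b.T).T = (b.T).T \cdot a.T = b \cdot a.T.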





a = np.reshape(a, (3, 1))  # make a column vector: a.shape changes from (3,) to (3, 1)
b = np.reshape(b, (3, 1))  # the same for b
D = np.dot(a, b.T)         # outer product, shape (3, 3)
D  # array([[ 3,  4,  7],
   #        [ 6,  8, 14],
   #        [15, 20, 35]])
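The transposition rule for the outer product can be checked numerically. The sketch below reuses the values of a and b from the text; `D_outer` and `D_T` are illustrative names, and `np.outer` is NumPy's built-in outer product, which flattens its inputs before multiplying.

```python
import numpy as np

a = np.array([1, 2, 5]).reshape(3, 1)
b = np.array([3, 4, 7]).reshape(3, 1)

D = np.dot(a, b.T)        # D[i, j] = a[i, 0] * b[j, 0]
D_outer = np.outer(a, b)  # built-in outer product, same result
D_T = np.dot(b, a.T)      # b · a.T, expected to equal D.T

print(np.array_equal(D, D_outer), np.array_equal(D.T, D_T))
```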
      
      



, . , .





, , . (cost function). , . . , (learning rate), , (epoch). , . (), . . , , , .





Time of the first

, ( , ).





- (samples) . . , (), ( ) - (samples), - (features).





, ( ). (, …) , , . , .





!

, , . , . , , . , , . , , , .





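Suppose there are 10 samples with 3 features each, so the input matrix X has shape (10, 3). For experiments it can be filled with random numbers in several ways: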





  • random integers, for example from 0 to 50 (np.random.randint excludes the upper bound);





X=np.random.randint(0, 50, (10, 3))
      
      



  • random real numbers uniformly distributed between 0 and 1;





X=np.random.rand(10, 3)
      
      



  • normally distributed numbers with mean \mu = 2 and variance \sigma^2 = 16, that is, drawn from N(\mu, \sigma^2);





X=4*np.random.randn(10, 3) + 2
      
      



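np.random.randn draws from the standard normal distribution with \mu = 0 and \sigma = 1, which is why the sample is multiplied by 4 and shifted by 2.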
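The three ways of filling X can also be written with NumPy's newer `Generator` interface, which makes the results reproducible through a seed. This is a sketch under assumptions: the seed 0 and the variable names are arbitrary, and the large sample at the end is only there to show that scaling by 4 and shifting by 2 gives a mean near 2 and a variance near 16.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

X_int = rng.integers(0, 50, size=(10, 3))            # integers in [0, 50)
X_uniform = rng.random((10, 3))                      # floats in [0, 1)
X_normal = rng.normal(loc=2, scale=4, size=(10, 3))  # N(mu=2, sigma^2=16)

# with many samples the empirical mean and variance approach mu and sigma^2
big = 4 * rng.standard_normal(100_000) + 2
print(big.mean(), big.var())
```

Note that `scale` is the standard deviation \sigma = 4, not the variance.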





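The input matrix X of shape (10, 3) is multiplied by the weight matrix W^{(1)} of the first layer. Its number of rows must equal the number of features, and its number of columns is the number of neurons in the hidden layer; with 4 hidden neurons, W^{(1)} has shape (3, 4). By the rule for matrix multiplication, (10, 3)(3, 4) \Rightarrow (10, 4), so X \cdot W^{(1)} has shape (10, 4). An activation function is then applied to this matrix element by element: if A is a matrix of shape (m, n) with elements a_{i,j}, then f(A) is a matrix of the same shape with elements f(a_{i,j}); for example, a_{1,2} \Rightarrow f(a_{1,2}). The second weight matrix W^{(2)} has shape (4, 1), so that (10, 3)(3, 4)(4, 1) \Rightarrow (10, 1). The result is the output \hat{Y}, one value for each of the 10 samples. Without activation functions: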





\hat{Y}=X\cdot W^{(1)}\cdot W^{(2)}, \quad\quad \hat{Y}_{i,0}=X_{i,j} W_{j,k}^{(1)} W_{k,0}^{(2)}.

For simplicity, the bias term is omitted.





. : , , , .





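X = np.random.randint(0, 50, (10, 3))
w1 = 2*np.random.rand(3, 4) - 1   # random weights uniformly distributed between -1 and +1
w2 = 2*np.random.rand(4, 1) - 1
Y = np.dot(np.dot(X, w1), w2)     # forward pass through the network
Y.shape    # (10, 1)
Y.T.shape  # (1, 10)
(np.dot(Y.T, Y)).shape  # (1, 1), in effect a scalar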
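The way shapes propagate through the two matrix products can be traced step by step. The following sketch is illustrative (seed and variable names such as `hidden` are assumptions); it also writes the whole forward pass in index form with `np.einsum`, matching \hat{Y}_{i,0}=X_{i,j} W_{j,k}^{(1)} W_{k,0}^{(2)}.

```python
import numpy as np

rng = np.random.default_rng(1)       # arbitrary seed
X = rng.integers(0, 50, size=(10, 3))
w1 = 2 * rng.random((3, 4)) - 1      # weights in [-1, 1)
w2 = 2 * rng.random((4, 1)) - 1

hidden = X @ w1       # (10, 3)(3, 4) -> (10, 4)
Y_hat = hidden @ w2   # (10, 4)(4, 1) -> (10, 1)

# the same computation in index form: Y_hat[i, l] = X[i, j] w1[j, k] w2[k, l]
Y_hat_index = np.einsum('ij,jk,kl->il', X, w1, w2)

print(hidden.shape, Y_hat.shape)
```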
      
      



​. -1 +1, “” ( ).





. f_1 “ ”, - .





\hat{Y}_{i,0}=f_2(f_1(X_{i,j} W_{j,k}^{(1)})W_{k,0}^{(2)}), \quad\quad \hat{Y}=f_2(f_1(X \cdot W^{(1)})\cdot W^{(2)}).

, .





\triangle=\sum_i(Y_{i,0}-\hat{Y}_{i,0})^2=\sum_i\widetilde{Y}_{i,0}^2=(\widetilde{Y}.T)_{0,i}\widetilde{Y}_{i,0}=(\widetilde{Y}.T)\cdot\widetilde{Y},

where (X, Y) is the training data and \widetilde{Y}_{i,0}=Y_{i,0}-\hat{Y}_{i,0} is the residual. The last step uses the definition of transposition, (\widetilde{Y}.T)_{0,i}=\widetilde{Y}_{i,0}.
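The equality between the sum of squared residuals and the matrix product \widetilde{Y}.T \cdot \widetilde{Y} is easy to confirm numerically. In this sketch the targets and the network outputs are random placeholders, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
Y = rng.random((10, 1))          # hypothetical targets
Y_hat = rng.random((10, 1))      # hypothetical network outputs

residual = Y - Y_hat
loss_sum = np.sum(residual ** 2)            # sum over i of residual[i, 0]^2
loss_matrix = np.dot(residual.T, residual)  # (1, 10)(10, 1) -> (1, 1)

print(loss_sum, loss_matrix[0, 0])
```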





, . .





. - . , . , .





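Gradient descent builds on a fact from calculus: at an extremum x_0 of a function f(x), the derivative vanishes, f^{'}(x_0)=0. Away from the minimum, the sign of the derivative shows which way to move. If f^{'}(W)<0, the function decreases as W grows, so W should be increased; if f^{'}(W)>0, W should be decreased. In both cases W is shifted against the sign of the derivative.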





W\Rightarrow W+\mu\cdot\delta W=W-\mu\cdot\frac{\partial \triangle}{\partial W},





W_{i,j}\Rightarrow W_{i,j}+\mu\cdot\delta W_{i,j}=W_{i,j}-\mu\cdot\frac{\partial \triangle}{\partial W_{i,j}},

where \mu is the learning rate.
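The update rule is easiest to watch on a function of a single weight. The sketch below is a toy illustration, not the network from the article: f(W) = (W - 3)^2 is an assumed loss with its minimum at W = 3, and the learning rate 0.1 and the 50 epochs are arbitrary choices.

```python
def f(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def df(w):
    """Derivative of the toy loss."""
    return 2.0 * (w - 3.0)

mu = 0.1   # learning rate
w = 0.0    # starting point, to the left of the minimum where df(w) < 0
history = [f(w)]
for epoch in range(50):
    w = w - mu * df(w)   # W -> W - mu * dΔ/dW
    history.append(f(w))

print(w, history[-1])
```

Because the derivative is negative to the left of the minimum, each step increases W, and the loss shrinks at every epoch.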





.





\frac{\partial a_{m, n}}{\partial a_{i,j}}=\delta_{m,i}\delta_{n,j},

where \delta_{i,j} is the Kronecker delta, equal to 1 when i=j and 0 otherwise. For example, \delta_{1,1}=1 and \delta_{2,1}=0.
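In NumPy the Kronecker delta is simply the identity matrix, `np.eye`. The sketch below (with illustrative names and an arbitrary test matrix) shows the two properties used in the derivations that follow: contracting with a delta only renames an index, and the derivative of a matrix element with respect to another element is a product of two deltas.

```python
import numpy as np

delta = np.eye(4)                    # delta[k, m] = 1 if k == m else 0
W1 = np.arange(12.0).reshape(3, 4)   # arbitrary (3, 4) matrix

# contracting with delta renames the index: W1[j, k] delta[k, m] = W1[j, m]
contracted = np.einsum('jk,km->jm', W1, delta)

# d a[m, n] / d a[i, j] for a (2, 3) matrix, stored as a 4-index array
d2, d3 = np.eye(2), np.eye(3)
jacobian = np.einsum('mi,nj->mnij', d2, d3)

print(jacobian[1, 2, 1, 2], jacobian[1, 2, 0, 2])
```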









\frac{\partial \triangle}{\partial W_{m,n}}=-2\sum_i(Y_{i,0}-\hat{Y}_{i,0})\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}}=-2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}},

where, as before, \widetilde{Y}_{i,0}=Y_{i,0}-\hat{Y}_{i,0}, and summation over the repeated index i is implied.





. . , , .





Start with the case without activation functions, \hat{Y}_{i,0}=X_{i,j} W_{j,k}^{(1)} W_{k,0}^{(2)}:





\frac{\partial \triangle}{\partial W_{m,0}^{(2)}}=-2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,0}^{(2)}}=-2\widetilde{Y}_{i,0}X_{i,j} W_{j,k}^{(1)}\delta_{k,m}=-2\widetilde{Y}_{i,0}X_{i,j} W_{j,m}^{(1)}=-2\widetilde{Y}_{i,0}(X\cdot W^{(1)})_{i,m}

Using A_{i,m}=(A.T)_{m,i}, this can be rewritten as a matrix product:





\delta  W_{m,0}^{(2)}=-\frac{\partial \triangle}{\partial W_{m,0}^{(2)}}=2((X\cdot W^{(1)}).T)_{m,i}\widetilde{Y}_{i,0},





\delta  W^{(2)}=2((X\cdot W^{(1)}).T)\cdot \widetilde{Y}.

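As a check, the shape of \delta W^{(2)} must match the shape of W^{(2)}. X\cdot W^{(1)} has shape (10,3)(3,4)=(10,4), so its transpose has shape (4,10). \widetilde{Y}, like \hat{Y}, has shape (10,1). Hence \delta W^{(2)} has shape (4,10)(10,1)=(4,1), as required.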





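# Y is the network output computed earlier; it has the same shape as the
# residual Y - Y_hat and stands in for it here, so only shapes are checked
deltaW2 = 2*np.dot(np.dot(X, w1).T, Y)
deltaW2.shape  # (4, 1)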
      
      



Now the same for W^{(1)}.





\frac{\partial \triangle}{\partial W_{m,n}^{(1)}}=-2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}^{(1)}}=-2\widetilde{Y}_{i,0}X_{i,j} \delta_{j,m}\delta_{k,n}W_{k,0}^{(2)}=-2\widetilde{Y}_{i,0}X_{i,m} W_{n,0}^{(2)}=-2(X.T)_{m,i}\widetilde{Y}_{i,0}(W^{(2)}.T)_{0,n}, \delta  W^{(1)}=2(X.T)\cdot \widetilde{Y}\cdot (W^{(2)}.T).

, “ ”, “ ” - m n​. , , . : “” ( ), , .





Shape check for \delta W^{(1)}: (3,10)(10,1)(1,4)=(3,4).
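The two formulas for the network without activation functions can be verified against numerical derivatives. This is a sketch under assumptions: the targets Y are random placeholders, the seed is arbitrary, and `numerical_grad` is a hypothetical helper that uses central finite differences. Since \delta W is defined as minus the gradient of \triangle, the analytic result is compared with the negated numerical gradient.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
X = rng.random((10, 3))
Y = rng.random((10, 1))          # hypothetical targets
w1 = 2 * rng.random((3, 4)) - 1
w2 = 2 * rng.random((4, 1)) - 1

def loss(w1, w2):
    """Sum of squared residuals for the network without activations."""
    residual = Y - X @ w1 @ w2
    return np.sum(residual ** 2)

residual = Y - X @ w1 @ w2
deltaW2 = 2 * (X @ w1).T @ residual   # (4, 10)(10, 1) -> (4, 1)
deltaW1 = 2 * X.T @ residual @ w2.T   # (3, 10)(10, 1)(1, 4) -> (3, 4)

def numerical_grad(fun, w, eps=1e-6):
    """Central finite differences, one element of w at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        step = np.zeros_like(w)
        step[idx] = eps
        grad[idx] = (fun(w + step) - fun(w - step)) / (2 * eps)
    return grad

num_grad_w1 = numerical_grad(lambda w: loss(w, w2), w1)
num_grad_w2 = numerical_grad(lambda w: loss(w1, w), w2)

print(np.allclose(deltaW1, -num_grad_w1, atol=1e-5),
      np.allclose(deltaW2, -num_grad_w2, atol=1e-5))
```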





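Now add the activation functions. The main tool is the chain rule for the derivative of a composite function: if z=f(y(x)), then the derivative of z with respect to x is z_x^{'}=f_y^{'}y_x^{'}.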
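The chain rule can be checked numerically on a scalar example. The functions below are illustrative choices, not taken from the article: y(x) = x^2 as the inner function and f(y) = sin(y) as the outer one, so the rule predicts z'_x = cos(x^2) * 2x.

```python
import numpy as np

def y(x):
    """Inner function (illustrative choice)."""
    return x ** 2

def f(y_val):
    """Outer function (illustrative choice)."""
    return np.sin(y_val)

def dz_dx(x):
    """Chain rule: z'_x = f'_y * y'_x = cos(y(x)) * 2x."""
    return np.cos(y(x)) * 2 * x

x0, eps = 0.7, 1e-6
numeric = (f(y(x0 + eps)) - f(y(x0 - eps))) / (2 * eps)  # central difference
analytic = dz_dx(x0)

print(numeric, analytic)
```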





Write the network output as

\hat{Y}_{i,0}=f_2(f_1(X_{i,j} W_{j,k}^{(1)})W_{k,0}^{(2)})\quad\Rightarrow\quad  \hat{Y}_{i,0}=f_2(C_{i,0}),

where:





C_{i,0}=B_{i,k}W_{k,0}^{(2)}, \quad\quad B_{i,k}=f_1(A_{i,k}), \quad\quad A_{i,k}=X_{i,j} W_{j,k}^{(1)}.

For W^{(2)} the chain is short:





\delta  W_{m,0}^{(2)}=2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,0}^{(2)}}=2\widetilde{Y}_{i,0}\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,0}}\frac{\partial C_{\mu,0}}{\partial W_{m,0}^{(2)}}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})\delta_{i,\mu}B_{\mu,k}\delta_{k,m}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})B_{i,m}.

where we used





\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,0}}=f_2^{'}(C_{i,0})\delta_{i,\mu}, \quad\quad \frac{\partial C_{\mu,0}}{\partial W_{m,0}^{(2)}}=B_{\mu,k}\frac{\partial W_{k,0}^{(2)}}{\partial W_{m,0}^{(2)}}=B_{\mu,k}\delta_{k,m}.

To write this as a matrix product, move the free index m to the first position: B_{i,m}=(B.T)_{m,i}, that is, f_1(A_{i,m})=(f_1(A).T)_{m,i}. Then





\delta  W_{m,0}^{(2)}=2(B.T)_{m,i}\widetilde{Y}_{i,0}f_2^{'}(C_{i,0}) \Rightarrow \delta  W^{(2)}=2(B.T)\cdot(\widetilde{Y}*f_2^{'}(C))

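Here “*” denotes elementwise multiplication. If a and b are matrices of the same shape, then a*b is a matrix of that shape whose elements are the products of the corresponding elements, for example a_{1,2}b_{1,2}.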





As an example, take f_1(x)=x^2 and f_2(x)=x^3, chosen here only because their derivatives are simple. In NumPy this looks as follows.





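def f1(x):       # activation function of the first layer
    return np.power(x, 2)
def gradf1(x):   # its derivative
    return 2*x
def f2(x):       # activation function of the second layer
    return np.power(x, 3)
def gradf2(x):   # its derivative
    return 3*np.power(x, 2)

A = np.dot(X, w1)   # input to the first activation function
B = f1(A)           # output of the hidden layer
C = np.dot(B, w2)   # input to the second activation function
Y = f2(C)           # network output
# Y stands in for the residual here (same shape), so only the shape is checked
deltaW2 = 2*np.dot(B.T, Y*gradf2(C))
deltaW2.shape  # (4, 1)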
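With an actual residual in place of the stand-in, the formula \delta W^{(2)}=2(B.T)\cdot(\widetilde{Y}*f_2^{'}(C)) can be compared with a numerical derivative of the loss. This is a sketch under assumptions: the targets Y are random placeholders, the seed and tolerances are arbitrary, and the finite-difference loop is only a check, not part of the method.

```python
import numpy as np

rng = np.random.default_rng(4)   # arbitrary seed
X = rng.random((10, 3))
Y = rng.random((10, 1))          # hypothetical targets
w1 = 2 * rng.random((3, 4)) - 1
w2 = 2 * rng.random((4, 1)) - 1

def f1(x):
    return x ** 2

def f2(x):
    return x ** 3

def gradf2(x):
    return 3 * x ** 2

def loss(w2):
    """Sum of squared residuals as a function of the second-layer weights."""
    return np.sum((Y - f2(f1(X @ w1) @ w2)) ** 2)

B = f1(X @ w1)
C = B @ w2
residual = Y - f2(C)
deltaW2 = 2 * B.T @ (residual * gradf2(C))   # (4, 10)(10, 1) -> (4, 1)

eps = 1e-6
numeric = np.zeros_like(w2)
for m in range(w2.shape[0]):
    step = np.zeros_like(w2)
    step[m, 0] = eps
    numeric[m, 0] = (loss(w2 + step) - loss(w2 - step)) / (2 * eps)

print(np.allclose(deltaW2, -numeric, rtol=1e-4, atol=1e-6))
```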
      
      



For W^{(1)} the chain is longer.





\delta  W_{m,n}^{(1)}=2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}^{(1)}}=2\widetilde{Y}_{i,0}\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,\nu}}\frac{\partial C_{\mu,\nu}}{\partial B_{l,s}}\frac{\partial B_{l,s}}{\partial W_{m,n}^{(1)}},

where C_{\mu,\nu}=B_{\mu,k}W_{k,\nu}^{(2)}. The individual factors are:





\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,\nu}}=f_2^{'}(C_{i,0})\delta_{i,\mu}\delta_{0,\nu},\quad\quad \frac{\partial C_{\mu,\nu}}{\partial B_{l,s}}=\delta_{\mu,l}\delta_{k,s}W_{k,\nu}^{(2)},\quad\quad \frac{\partial B_{l,s}}{\partial W_{m,n}^{(1)}}=\frac{\partial B_{l,s}}{\partial A_{r,e}}\frac{\partial A_{r,e}}{\partial W_{m,n}^{(1)}}=f_1^{'}(A_{l,s})\delta_{l,r}\delta_{s,e}\delta_{j,m}\delta_{e,n}X_{r,j}=f_1^{'}(A_{l,s})\delta_{l,r}\delta_{s,n}X_{r,m}.

Substituting,





\delta W_{m,n}^{(1)}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})\delta_{i,\mu}\delta_{0,\nu}\delta_{\mu,l}\delta_{k,s}W_{k,\nu}^{(2)}f_1^{'}(A_{l,s})\delta_{s,n}\delta_{l,r}X_{r,m}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})W_{n,0}^{(2)}f_1^{'}(A_{i,n})X_{i,m},

since, after summing over \mu,

\delta_{i,\mu}\delta_{0,\nu}\delta_{\mu,l}\delta_{k,s}\delta_{s,n}\delta_{l,r}=\delta_{0,\nu}\delta_{i,l}\delta_{i,r}\delta_{k,n}\delta_{s,n}.

Here \delta_{0,\nu}W_{k,\nu}^{(2)}=W_{k,0}^{(2)}; m and n are free indices, while the “dummy” indices l, r, k, s are summed over.





In matrix form,

\delta W_{m,n}^{(1)}=2(X.T)_{m,i}\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})(W^{(2)}.T)_{0,n}f_1^{'}(A_{i,n}), \quad\quad \delta W^{(1)}=2(X.T)\cdot[[(\widetilde{Y}*f_2^{'}(C))\cdot(W^{(2)}.T)]*f_1^{'}(A)].

Here D_{i,0}=\widetilde{Y}_{i,0}f_2^{'}(C_{i,0}) \Rightarrow D=\widetilde{Y}*f_2^{'}(C), then F_{i,n}=D_{i,0}(W^{(2)}.T)_{0,n} \Rightarrow F=D\cdot(W^{(2)}.T), and finally F_{i,n}f_1^{'}(A_{i,n}) \Rightarrow F*f_1^{'}(A).





.





# Y again stands in for the residual; gradf1 is the derivative of f1
deltaW1 = 2*np.dot(X.T, np.dot(Y*gradf2(C), w2.T)*gradf1(A))
deltaW1.shape  # (3, 4)
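The expression for \delta W^{(1)} is the one most prone to index mistakes, so a numerical check is worthwhile. The sketch below makes the same assumptions as the earlier checks: random placeholder targets Y, an arbitrary seed, f_1(x)=x^2 and f_2(x)=x^3 from the text, and central finite differences as the reference.

```python
import numpy as np

rng = np.random.default_rng(5)   # arbitrary seed
X = rng.random((10, 3))
Y = rng.random((10, 1))          # hypothetical targets
w1 = 2 * rng.random((3, 4)) - 1
w2 = 2 * rng.random((4, 1)) - 1

def f1(x):
    return x ** 2

def gradf1(x):
    return 2 * x

def f2(x):
    return x ** 3

def gradf2(x):
    return 3 * x ** 2

def loss(w1):
    """Sum of squared residuals as a function of the first-layer weights."""
    return np.sum((Y - f2(f1(X @ w1) @ w2)) ** 2)

A = X @ w1
C = f1(A) @ w2
residual = Y - f2(C)
# 2 X.T · [((residual * f2'(C)) · w2.T) * f1'(A)]: (3, 10)(10, 4) -> (3, 4)
deltaW1 = 2 * X.T @ (((residual * gradf2(C)) @ w2.T) * gradf1(A))

eps = 1e-6
numeric = np.zeros_like(w1)
for idx in np.ndindex(*w1.shape):
    step = np.zeros_like(w1)
    step[idx] = eps
    numeric[idx] = (loss(w1 + step) - loss(w1 - step)) / (2 * eps)

print(np.allclose(deltaW1, -numeric, rtol=1e-4, atol=1e-6))
```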
      
      



. .





“, - . -!” ? , , , . , . - , , . ! , , - . , , .





, . James Loy - , , , , , . . , , , . “-”, , , . , TensorFlow Keras. , the original source (there is a translation into Russian).





Write code, dive into the formulas, read books, ask yourself questions.





As for tools, these are Jupyter Notebook (Anaconda rules!), Colab ...







