[1]

Matrix Multiplication Technique

Difficulty of Matrix Multiplication

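Simply put, a matrix is a rectangular array of numbers. We say a matrix $\mathbf{A}$ has $m$ rows and $n$ columns if it is of the form

$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$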

Now suppose we have two square matrices $\mathbf{A}, \mathbf{B}$ both of size $n \times n$ . What we want to calculate is the product,

$\mathbf{C} = \mathbf{AB}$

which is another $n \times n$ matrix. We can calculate $\mathbf{C}$ directly from the definition of matrix multiplication. If we denote the entries of $\mathbf{A, B, C}$ by $a_{ij}, b_{ij}, c_{ij}$ , then $c_{ij}$ is the product of the $i$-th row of $\mathbf{A}$ with the $j$-th column of $\mathbf{B}$ ,

which can be expressed as the formula

$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$
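The formula translates directly into three nested loops. The following MATLAB sketch only illustrates the definition; the sample matrices and variable names are made up for this example, and in practice one would simply write `A * B`.

```matlab
% Sample 3-by-3 inputs, chosen only for illustration
A = [1 2 3; 4 5 6; 7 8 9];
B = [9 8 7; 6 5 4; 3 2 1];
n = size(A, 1);

C = zeros(n);
for i = 1:n
    for j = 1:n
        % c_ij = sum over k of a_ik * b_kj
        for k = 1:n
            C(i, j) = C(i, j) + A(i, k) * B(k, j);
        end
    end
end

disp(C)
```

Since all entries are integers, `isequal(C, A * B)` should hold exactly here.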

Most of us are familiar with the four basic arithmetic operations +, -, x, and /, so it is reasonable to treat these four as unit operations : fundamental, indecomposable operations that each count as one step. This brings up a question,

How many unit operations are needed in computing square matrix multiplication?

First, for an arbitrary entry $c_{ij}$ , the sum

$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$

takes $n$ multiplications and $n-1$ additions, so we need $(n-1) + n = 2n-1$ unit operations. Because an $n \times n$ matrix has $n^2$ entries, in total we need

$U(n) = n^2(2n-1) = 2n^3 - n^2$

unit operations to multiply two $n \times n$ square matrices.

Some say this is reasonable, but the dominating term $2n^3$ grows quickly: doubling $n$ multiplies the work by about 8, so very large matrices become expensive even on fast hardware.
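The count $2n^3 - n^2$ can be checked by instrumenting the direct method. In the sketch below, the counter `ops` and the size `n = 5` are assumptions made for illustration; each entry starts from one product and then adds $n-1$ more products.

```matlab
n = 5;                       % illustrative size
A = rand(n);
B = rand(n);
C = zeros(n);
ops = 0;                     % number of unit operations performed

for i = 1:n
    for j = 1:n
        s = A(i, 1) * B(1, j);
        ops = ops + 1;                      % first product
        for k = 2:n
            s = s + A(i, k) * B(k, j);
            ops = ops + 2;                  % one multiplication, one addition
        end
        C(i, j) = s;
    end
end

fprintf('counted %d, formula gives %d\n', ops, 2*n^3 - n^2);
```

For `n = 5` both numbers are 225, since each of the 25 entries costs $2 \cdot 5 - 1 = 9$ operations.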

Obvious Lower Bound

We have just shown that direct multiplication gives the obvious upper bound $n^2(2n-1)$ . Now what is the obvious lower bound for matrix multiplication? If there is no additional information about $\mathbf{A,B}$ , we need to examine all the entries (at least!). Each entry of $\mathbf{C}$ needs at least a constant number of unit operations, so this gives an obvious lower bound of the form

$L(n) = K n^2$

where $K$ is a constant.

Between Upper and Lower bound

So what we seek is a matrix multiplication algorithm whose total number of unit operations lies between $L(n)$ and $U(n)$ . At first, you might think that any matrix multiplication algorithm must use at least as many unit operations as the obvious upper bound. But in 1969, German mathematician Volker Strassen published a remarkable algorithm for matrix multiplication that runs in

$\Theta(n^{\log_2 7})$

time, or in equivalent form, the number of unit operations $T(n)$ satisfies

$K_1 n^{\log_2 7} \leq T(n) \leq K_2 n^{\log_2 7}$

for large enough $n$ , where $K_1, K_2$ are constants. It is clear that

$K_2 n^{\log_2 7} \ll U(n)=2n^3-n^2$
for large enough $n$ (because $\log_2 7 < \log_2 8 = 3$ ), so Strassen's remarkable algorithm definitely needs fewer unit operations than the usual multiplication for large matrices.
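To get a feeling for the gap, the growth term $n^{\log_2 7}$ can be tabulated against $U(n) = 2n^3 - n^2$ . This is a rough sketch only: the constants $K_1, K_2$ are ignored, so it shows the trend and says nothing about where a real implementation starts to win.

```matlab
k = 1:10;
n = 2 .^ k;                        % sizes 2, 4, ..., 1024
strassenTerm = n .^ log2(7);       % growth term, constant factor ignored
naiveCount   = 2 * n.^3 - n.^2;    % U(n) for the direct method
ratio = strassenTerm ./ naiveCount;

for i = 1:numel(k)
    fprintf('n = %5d   n^log2(7) / U(n) = %.4f\n', n(i), ratio(i));
end
```

The ratio keeps shrinking as $n$ doubles, which is what $\log_2 7 < 3$ predicts.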

Strassen's algorithm

To keep things simple, let $n$ be a power of 2. If it is not, pad $\mathbf{A, B}$ with $2^{k+1} - n$ rows and columns of zeros so that both become $2^{k+1} \times 2^{k+1}$ matrices, where $k$ is the unique integer satisfying

$2^k \leq n < 2^{k+1}$
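A minimal MATLAB sketch of the padding idea, assuming sample $3 \times 3$ inputs (the names `Ap`, `Bp`, `Cp` are made up for this example). `nextpow2` returns the smallest exponent `p` with `2^p >= n`, so the matrices are placed in the top-left corner of a zero matrix of that size.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];
B = [9 8 7; 6 5 4; 3 2 1];
n = size(A, 1);

N = 2^nextpow2(n);                 % smallest power of 2 that is >= n
Ap = zeros(N);  Ap(1:n, 1:n) = A;  % A in the top-left corner, zeros elsewhere
Bp = zeros(N);  Bp(1:n, 1:n) = B;

Cp = Ap * Bp;                      % product of the padded matrices
C  = Cp(1:n, 1:n);                 % top-left block is the product we wanted

disp(isequal(C, A * B))
```

The extra rows and columns contribute only zeros, so cutting the top-left $n \times n$ block out of the padded product recovers $\mathbf{AB}$ .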

First Step

We divide $\mathbf{A}, \mathbf{B}$ each into four submatrices, as follows.

$\mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21}& \mathbf{A}_{22} \end{pmatrix},\ \mathbf{B} = \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21}& \mathbf{B}_{22} \end{pmatrix}$

Now, the result would be

$\begin{align*}\mathbf{AB}&=\begin{pmatrix}\mathbf{A}_{11}&\mathbf{A}_{12}\\\mathbf{A}_{21}&\mathbf{A}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{B}_{11}&\mathbf{B}_{12}\\\mathbf{B}_{21}&\mathbf{B}_{22}\end{pmatrix}\\&=\begin{pmatrix}\mathbf{A}_{11}\mathbf{B}_{11}+\mathbf{A}_{12}\mathbf{B}_{21}&\mathbf{A}_{11}\mathbf{B}_{12}+\mathbf{A}_{12}\mathbf{B}_{22}\\\mathbf{A}_{21}\mathbf{B}_{11}+\mathbf{A}_{22}\mathbf{B}_{21}&\mathbf{A}_{21}\mathbf{B}_{12}+\mathbf{A}_{22}\mathbf{B}_{22}\end{pmatrix} \\ &= \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12}\\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}\end{align*}$
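The block formula can be checked numerically before moving on. The sketch below uses two arbitrary $4 \times 4$ integer matrices (an assumption for illustration) and the ordinary `*` operator for the eight block products.

```matlab
A = magic(4);                      % sample 4-by-4 integer matrices
B = reshape(1:16, 4, 4);
h = size(A, 1) / 2;                % size of each block

A11 = A(1:h, 1:h);      A12 = A(1:h, h+1:end);
A21 = A(h+1:end, 1:h);  A22 = A(h+1:end, h+1:end);
B11 = B(1:h, 1:h);      B12 = B(1:h, h+1:end);
B21 = B(h+1:end, 1:h);  B22 = B(h+1:end, h+1:end);

% Eight block products, combined exactly as in the formula
C = [A11*B11 + A12*B21, A11*B12 + A12*B22;
     A21*B11 + A22*B21, A21*B12 + A22*B22];

disp(isequal(C, A * B))
```

This direct block scheme needs 8 multiplications of half-size matrices; the point of the next step is to get by with 7.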

Second Step

This is the clever part, although each formula is straightforward to evaluate. Create 7 matrices

$\begin{align*} \mathbf{P}_1 &= (\mathbf{A}_{11} + \mathbf{A}_{22})(\mathbf{B}_{11} + \mathbf{B}_{22}) \\ \mathbf{P}_2 &= (\mathbf{A}_{21} + \mathbf{A}_{22})\mathbf{B}_{11} \\ \mathbf{P}_3 &= \mathbf{A}_{11} (\mathbf{B}_{12} - \mathbf{B}_{22}) \\ \mathbf{P}_4 &= \mathbf{A}_{22} (\mathbf{B}_{21} - \mathbf{B}_{11}) \\ \mathbf{P}_5 &= (\mathbf{A}_{11} + \mathbf{A}_{12}) \mathbf{B}_{22} \\ \mathbf{P}_6 &= (\mathbf{A}_{21} - \mathbf{A}_{11})(\mathbf{B}_{11} + \mathbf{B}_{12}) \\ \mathbf{P}_7 &= (\mathbf{A}_{12} - \mathbf{A}_{22})(\mathbf{B}_{21} + \mathbf{B}_{22}) \end{align*}$

Note that all submatrices have size $n/2 \times n/2$ so that additions, subtractions, and multiplications are indeed well defined.

Creating Output

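From these 7 matrices, we need to obtain the submatrices $\mathbf{C}_{11},\ \mathbf{C}_{12},\ \mathbf{C}_{21},\ \mathbf{C}_{22}$ . Expand each product and watch the terms cancel.

$\begin{align*} \mathbf{P}_1 + \mathbf{P}_4 - \mathbf{P}_5 + \mathbf{P}_7 &= (\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{11}\mathbf{B}_{22} + \mathbf{A}_{22}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{22}) + (\mathbf{A}_{22}\mathbf{B}_{21} - \mathbf{A}_{22}\mathbf{B}_{11}) \\ &\quad - (\mathbf{A}_{11}\mathbf{B}_{22} + \mathbf{A}_{12}\mathbf{B}_{22}) + (\mathbf{A}_{12}\mathbf{B}_{21} + \mathbf{A}_{12}\mathbf{B}_{22} - \mathbf{A}_{22}\mathbf{B}_{21} - \mathbf{A}_{22}\mathbf{B}_{22}) \\ &= \mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} \end{align*}$

so that $\mathbf{C}_{11} = \mathbf{P}_1 + \mathbf{P}_4 - \mathbf{P}_5 + \mathbf{P}_7$ .

$\begin{align*} \mathbf{P}_3 + \mathbf{P}_5 &= (\mathbf{A}_{11}\mathbf{B}_{12} - \mathbf{A}_{11}\mathbf{B}_{22}) + (\mathbf{A}_{11}\mathbf{B}_{22} + \mathbf{A}_{12}\mathbf{B}_{22}) \\ &= \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \end{align*}$

so that $\mathbf{C}_{12} = \mathbf{P}_3 + \mathbf{P}_5$ .

$\begin{align*} \mathbf{P}_2 + \mathbf{P}_4 &= (\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{11}) + (\mathbf{A}_{22}\mathbf{B}_{21} - \mathbf{A}_{22}\mathbf{B}_{11}) \\ &= \mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} \end{align*}$

so that $\mathbf{C}_{21} = \mathbf{P}_2+\mathbf{P}_4$ . Finally,

$\begin{align*} \mathbf{P}_1 - \mathbf{P}_2 + \mathbf{P}_3 + \mathbf{P}_6 &= (\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{11}\mathbf{B}_{22} + \mathbf{A}_{22}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{22}) - (\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{11}) \\ &\quad + (\mathbf{A}_{11}\mathbf{B}_{12} - \mathbf{A}_{11}\mathbf{B}_{22}) + (\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{21}\mathbf{B}_{12} - \mathbf{A}_{11}\mathbf{B}_{11} - \mathbf{A}_{11}\mathbf{B}_{12}) \\ &= \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22} \end{align*}$

so that $\mathbf{C}_{22} = \mathbf{P}_1 - \mathbf{P}_2+\mathbf{P}_3 + \mathbf{P}_6$ . Thus we've created all the entries of $\mathbf{C}=\mathbf{AB}$ using the 7 matrices $\mathbf{P}_1, \ldots, \mathbf{P}_7$ .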

Algorithm Analysis

Let's count how many unit operations are used in each step. Let $T(n)$ denote the total number of unit operations needed to multiply $\mathbf{A,B}$ .

First Step Revisited

Dividing matrices $\mathbf{A, B}$ does not require any unit operations (although it does need memory, of course).

Second Step Revisited

$\mathbf{P}_1$ is obtained by performing two additions of $n/2 \times n/2$ matrices and multiplying the two results. Addition of two $n/2 \times n/2$ matrices needs

$\frac{n}{2} \times \frac{n}{2} = \frac{n^2}{4}$

+ operations, and we also need one recursive call to Strassen's algorithm for the multiplication, which costs $T(n/2)$ .

There are 3 matrices of this form, $\mathbf{P}_1, \mathbf{P}_6, \mathbf{P}_7$ .

$\mathbf{P}_2$ is obtained by single addition and multiplication.

There are 4 matrices of this form, $\mathbf{P}_2, \mathbf{P}_3, \mathbf{P}_4, \mathbf{P}_5$ . In total,

$3 \left( \frac{n^2}{2} + T(n/2) \right ) + 4\left( \frac{n^2}{4} + T(n/2) \right ) = 7T(n/2) + \frac{5n^2}{2}$

number of operations are needed in second step.

Final Creation Step Revisited

Combining four matrices takes three additions, and combining two matrices takes one:

$\begin{align*} \mathbf{C}_{11} &: \text{3 additions (including subtractions)} \\ \mathbf{C}_{12} &: \text{1 addition} \\ \mathbf{C}_{21} &: \text{1 addition} \\ \mathbf{C}_{22} &: \text{3 additions (including subtractions)} \\ \end{align*}$

Therefore, in this step, we need

$(3 + 1 + 1 + 3) \times \frac{n^2}{4} = 2n^2$

number of unit operations.

Total sum

Adding up,

$T(n) = 7 T(n/2) + \frac{9n^2}{2}\ (n \geq 2)$

How do we solve this?

Let $n = 2^k$ for some positive integer $k$ . Each level of recursion multiplies the number of subproblems by 7 and divides the $n^2$ term by 4, so

$\begin{align*} T(n) &= T(2^k) \\ &= 7T(2^{k-1})+ \frac{9}{2}n^2 \\ &= 7^2T(2^{k-2}) + \frac{9n^2}{2}\left(1 + \frac{7}{4} \right ) \\ &= 7^3 T(2^{k-3}) + \frac{9n^2}{2}\left(1 + \frac{7}{4} + \frac{7^2}{4^2} \right ) \\ &= ... \\ &= 7^k T(1) + \frac{9n^2}{2}\left(1 + \frac{7}{4} + \frac{7^2}{4^2}+...+\frac{7^{k-1}}{4^{k-1}} \right ) \\ \end{align*}$

solving the last equation gives

$\begin{align*} T(n) &= 7^k T(1) + \frac{9n^2}{2} \left( \frac{(7/4)^k - 1}{7/4-1} \right ) \\ &= 7^{\log_2 n} T(1) + 6n^2 \left( (7/4)^{\log_2 n} - 1 \right) \\ &= (T(1) + 6)\, n^{\log_2 7} - 6n^2 \end{align*}$

using geometric series formula. $n^{\log_2 7}$ grows faster than $n^{2}$ , therefore

$T(n) = \Theta(n^{\log_2 7})\approx \Theta(n^{2.807})$

as desired! Padding with zeros does not affect this bound: the padded size is less than $2n$ , so padding increases the number of operations by at most a constant factor (less than $2^{\log_2 7} = 7$ ).
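A recurrence of the form $T(n) = 7T(n/2) + cn^2$ can also be iterated numerically to watch the $\Theta(n^{\log_2 7})$ behaviour appear. In this sketch the constants `c` and the base cost `T1 = 1` are hypothetical values chosen for illustration; the point is that the ratio $T(n)/n^{\log_2 7}$ settles to a constant whatever `c` is.

```matlab
T1 = 1;                  % assumed cost of multiplying two 1-by-1 matrices
K  = 20;                 % iterate up to n = 2^20

for c = [1 4 10]         % hypothetical constants in front of n^2
    T = T1;
    ratio = zeros(1, K);
    for k = 1:K
        n = 2^k;
        T = 7 * T + c * n^2;      % T(n) = 7 T(n/2) + c n^2
        ratio(k) = T / 7^k;       % 7^k equals n^(log2 7) when n = 2^k
    end
    fprintf('c = %2d:  T(n) / n^log2(7) = %.4f at n = 2^%d\n', c, ratio(K), K);
end
```

Because the ratio levels off instead of growing, the $n^2$ part of the recurrence only changes the constant in front of $n^{\log_2 7}$ , not the exponent.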

Example

For those who do not believe the result, let's look at a particular example,

$\mathbf{A} = \begin{pmatrix} 5 & 2 &6 &1 \\ 0& 6 & 2 & 0\\ 3& 8 & 1 & 4\\ 1 & 8 &5 & 6 \end{pmatrix},\ \mathbf{B}=\begin{pmatrix} 7 & 5 & 8 &0 \\ 1 & 8 & 2 & 6\\ 9 & 4 & 3 &8 \\ 5 & 3 & 7 & 9 \end{pmatrix}$

The seven products are

$\begin{align*} \mathbf{P}_1 = \begin{pmatrix} 108 & 180\\ 146 & 269 \end{pmatrix} \end{align*}$ , $\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}$ , $\begin{align*} \mathbf{P}_3 = \begin{pmatrix} 15 & -46\\ -30 & -18 \end{pmatrix} \end{align*}$ ,
$\begin{align*} \mathbf{P}_4 = \begin{pmatrix} 18 & -21\\ 34 & -35 \end{pmatrix} \end{align*}$ , $\begin{align*} \mathbf{P}_5 = \begin{pmatrix} 54 & 115\\ 48 & 70 \end{pmatrix} \end{align*}$ , $\begin{align*} \mathbf{P}_6 = \begin{pmatrix} -12 & 74\\ 21 & 33 \end{pmatrix} \end{align*}$ ,
$\begin{align*} \mathbf{P}_7 = \begin{pmatrix} 24 & 24\\ -108 & -108 \end{pmatrix} \end{align*}$

from which we get

$\begin{align*} \mathbf{C}_{11} &= \begin{pmatrix} 96& 68\\ 24& 56 \end{pmatrix},\mathbf{C}_{12} = \begin{pmatrix} 69& 69\\ 18 & 52 \end{pmatrix},\\ \mathbf{C}_{21} &= \begin{pmatrix} 58 &95 \\ 90 & 107 \end{pmatrix},\mathbf{C}_{22} = \begin{pmatrix} 71 & 92\\ 81 & 142 \end{pmatrix} \end{align*}$

which matches the direct computation

$\mathbf{C} = \begin{pmatrix} 96 &68 &69 &69 \\ 24&56 &18 &52 \\ 58&95 &71 &92 \\ 90&107 &81 &142 \end{pmatrix}$
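The example can be replayed in MATLAB. The sketch below performs a single level of the recursion on the $4 \times 4$ matrices above, using the ordinary `*` operator for the seven half-size products (a simplification made for this check; the full algorithm would recurse instead).

```matlab
A = [5 2 6 1; 0 6 2 0; 3 8 1 4; 1 8 5 6];
B = [7 5 8 0; 1 8 2 6; 9 4 3 8; 5 3 7 9];
h = 2;                                   % block size n/2

A11 = A(1:h, 1:h);      A12 = A(1:h, h+1:end);
A21 = A(h+1:end, 1:h);  A22 = A(h+1:end, h+1:end);
B11 = B(1:h, 1:h);      B12 = B(1:h, h+1:end);
B21 = B(h+1:end, 1:h);  B22 = B(h+1:end, h+1:end);

% The seven products, each a single 2-by-2 multiplication
P1 = (A11 + A22) * (B11 + B22);
P2 = (A21 + A22) * B11;
P3 = A11 * (B12 - B22);
P4 = A22 * (B21 - B11);
P5 = (A11 + A12) * B22;
P6 = (A21 - A11) * (B11 + B12);
P7 = (A12 - A22) * (B21 + B22);

% Reassemble C from the seven products
C = [P1 + P4 - P5 + P7, P3 + P5;
     P2 + P4,           P1 - P2 + P3 + P6];

disp(C)
disp(isequal(C, A * B))
```

Displaying any of `P1` to `P7` is a quick way to compare against the matrices listed in the example.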

Limitations

For practical reasons, Strassen's algorithm is often not the method of choice for matrix multiplication.

It uses a lot of memory. Each level of recursion creates 7 intermediate product matrices, plus the sums and differences that feed into them, and all of these must be stored.
It is numerically less stable than the standard method. If all the entries are integers (and nothing overflows), the multiplication has no error. With floating-point entries, however, the limited precision of computer arithmetic lets rounding errors accumulate through the extra additions and subtractions, so the errors are larger than with the standard method.

Conclusion

Strassen's algorithm is theoretically faster, but often not practical enough. Still, it was a great breakthrough in matrix multiplication algorithms!

Citations

[1] Image source Link Creative Commons Attribution-Share Alike 3.0 (commercial usage allowed)

[2] Introduction to Algorithms, 3rd edition, Chapter 4, Section 4.2

[3] All other images are self made

Lastly, here is my implementation of Strassen's algorithm in MATLAB. It assumes that A and B are square matrices of the same size and that the size is a power of 2.

function C = strassen( A, B )
    % Multiply square matrices A and B of equal size n, where n is a power of 2
    n = size(A, 1);
    % Base case
    if n == 1
        C = A(1,1) * B(1,1);
    else
        % Recursive Case
        n = n / 2;
        A11 = A(1:n, 1:n);
        A12 = A(1:n, (n+1):end);
        A21 = A((n+1):end, 1:n);
        A22 = A((n+1):end, (n+1):end);
        B11 = B(1:n, 1:n);
        B12 = B(1:n, (n+1):end);
        B21 = B((n+1):end, 1:n);
        B22 = B((n+1):end, (n+1):end);
        
        % Compute P1 to P7
        P1 = strassen(A11 + A22, B11 + B22);
        P2 = strassen(A21 + A22, B11);
        P3 = strassen(A11, B12 - B22);
        P4 = strassen(A22, B21 - B11);
        P5 = strassen(A11 + A12, B22);
        P6 = strassen(A21 - A11, B11 + B12);
        P7 = strassen(A12 - A22, B21 + B22);
        
        % Compute submatrices of C
        C11 = P1 + P4 - P5 + P7;
        C12 = P3 + P5;
        C21 = P2 + P4;
        C22 = P1 - P2 + P3  + P6;
        C = [ C11, C12; C21, C22 ];        
    end    
end

[CS and Math #21] Matrix Multiplication - Strassen's Algorithm