
Many encounter matrix multiplication as a set of arbitrary, often confusing, arithmetic rules. The notion that order matters, for instance, runs counter to all our experience with regular numbers. This article seeks to demystify these properties by reframing the core concept: a matrix is not a static grid of numbers, but a dynamic action—a transformation. Understanding matrix multiplication, therefore, is about understanding how these actions combine and compose. We will move beyond rote memorization to explore the deep logic that governs the algebra of transformations.
In the sections that follow, we will first dissect the fundamental "Principles and Mechanisms" of matrix multiplication. We will explore why associativity is the bedrock of sequential processes, how linearity enables the powerful principle of superposition, and why the famous non-commutative property is an intuitive reflection of real-world actions. Following this, the "Applications and Interdisciplinary Connections" section will demonstrate how these abstract rules are the essential grammar for describing systems across science and engineering, from the dynamics of satellites and the structure of error-correcting codes to the symmetries of quantum mechanics.
To truly understand the properties of matrix multiplication, we must first abandon a comfortable notion: that matrices are just static grids of numbers. This is like describing a car as just a collection of metal and plastic parts. It misses the entire point! A matrix is an action. It is a machine that takes a vector (which you can think of as a point in space, or a signal, or a set of inputs) and transforms it into another vector. The "multiplication" of two matrices, then, is not just some arbitrary arithmetic procedure; it is the act of composing these transformations, of running our input through one machine, and then feeding its output directly into the next.
Imagine you have a series of signal processing filters. The first, $A$, takes your 2D input signal and maps it into a 3D space. The second, $B$, takes that 3D signal and brings it back to 2D. The third, $C$, processes that final 2D signal. The total transformation is the composition $C(B(A\mathbf{x}))$. In the language of matrices, the single matrix that represents this entire chain is the product $CBA$.
This leads us to the first, and perhaps most comfortable, property: associativity. When calculating this product, does it matter if we first find the combined effect $BA$ and then apply $C$? Or if we first combine $CB$ and then apply it to the result of $A$? Of course not. The final output is identical regardless of how we group the operations. In algebra, this is written as $(CB)A = C(BA)$. This property gives us tremendous freedom and is the bedrock of algebraic manipulation. It assures us that a chain of transformations has a single, unambiguous meaning.
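This grouping freedom is easy to check numerically. Here is a minimal NumPy sketch; the matrix shapes mirror the hypothetical 2D→3D→2D filter chain above, and the random entries are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# A maps 2D -> 3D, B maps 3D -> 2D, C maps 2D -> 2D,
# matching the filter chain described in the text.
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 3))
C = rng.standard_normal((2, 2))

# Group the triple product either way: the composite matrix is identical.
left_grouping = (C @ B) @ A
right_grouping = C @ (B @ A)

assert np.allclose(left_grouping, right_grouping)
```

Whichever grouping the computer evaluates first, the resulting 2×2 composite transformation is the same.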
The next property is the very soul of "linear" algebra: linearity. It consists of two simple, but powerful, rules.
First, imagine you have two solutions, $\mathbf{x}_1$ and $\mathbf{x}_2$, to two different problems, $A\mathbf{x}_1 = \mathbf{b}_1$ and $A\mathbf{x}_2 = \mathbf{b}_2$. What happens if we add the solutions together? The linearity of matrix multiplication tells us that $A(\mathbf{x}_1 + \mathbf{x}_2) = A\mathbf{x}_1 + A\mathbf{x}_2 = \mathbf{b}_1 + \mathbf{b}_2$. This is a "principle of superposition." The transformation of a sum of inputs is simply the sum of their individual transformations. The system handles combined inputs gracefully, without them interfering with one another in unpredictable ways.
Second, consider a scenario where a specific recipe of stock solutions, represented by vector $\mathbf{x}$, yields a desired nutrient medium, described by vector $\mathbf{b}$, through the equation $A\mathbf{x} = \mathbf{b}$. Now, what if you need to produce a batch that is 15 times larger, requiring a final composition of $15\mathbf{b}$? Linearity provides the wonderfully simple answer: you just need to scale up your recipe by the same factor. The new solution is $15\mathbf{x}$, because $A(15\mathbf{x}) = 15(A\mathbf{x}) = 15\mathbf{b}$. This direct scalability is a hallmark of linear systems and a cornerstone of their utility in science and engineering.
Here we arrive at the most famous, and initially most perplexing, property of matrix multiplication. For the numbers we use every day, $ab$ is always the same as $ba$. We take this commutativity for granted. For matrices, this is catastrophically wrong. In general, for two matrices $A$ and $B$, $AB \neq BA$.
Why? Because matrices are actions, and the order of actions matters. Putting on your socks and then your shoes is not the same as putting on your shoes and then your socks. Rotating an object and then shearing it is not the same as shearing it and then rotating it.
This has profound consequences for algebra. Consider the familiar expansion $(a+b)^2 = a^2 + 2ab + b^2$. Let's try this with matrices. The correct expansion is $(A+B)^2 = A^2 + AB + BA + B^2$. We can only combine the middle terms into $2AB$ if $AB = BA$. When they are not equal, we are stuck with four separate terms.
A fascinating example comes from studying nilpotent matrices—matrices that become the zero matrix when squared. Let's take two matrices $A$ and $B$ such that $A^2 = 0$ and $B^2 = 0$. Is their sum, $A + B$, also nilpotent? Our intuition, spoiled by commutative algebra, might say yes. But look at the expansion: $(A+B)^2 = A^2 + AB + BA + B^2 = AB + BA$. This is not necessarily zero! In one classic construction, the square of the sum actually turns out to be the identity matrix, the complete opposite of the zero matrix. This is a dramatic demonstration that we must always be vigilant about the order of multiplication.
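Here is one such construction, verified in NumPy. This particular pair of matrices is a standard textbook choice, offered as an illustration rather than as the specific example the original problem used:

```python
import numpy as np

# Two nilpotent matrices: each squares to the zero matrix.
A = np.array([[0, 1],
              [0, 0]])
B = np.array([[0, 0],
              [1, 0]])

assert np.array_equal(A @ A, np.zeros((2, 2)))
assert np.array_equal(B @ B, np.zeros((2, 2)))

# (A + B)^2 = A^2 + AB + BA + B^2 = AB + BA, which here is the identity.
S = A + B
assert np.array_equal(S @ S, np.eye(2))
```

Even though each summand annihilates itself, the cross terms $AB + BA$ survive and, in this case, add up to $I$.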
While non-commutativity is the general rule, it's not a universal law. There are beautiful "islands of calm" where order does not matter. The most intuitive example is the set of rotation matrices in a 2D plane, $R(\theta)$. A matrix $R(\theta)$ rotates a vector by an angle $\theta$. If we perform a rotation by $\alpha$ and then by $\beta$, the result is a total rotation by $\alpha + \beta$. It makes no difference to our intuition or the final outcome if we had rotated by $\beta$ first and then $\alpha$. The algebra confirms this perfectly: $R(\beta)R(\alpha) = R(\alpha + \beta) = R(\alpha)R(\beta)$. Sets of matrices that commute with each other are algebraically special and form structures known as abelian groups.
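A short NumPy check of this commutativity, with arbitrary illustrative angles:

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

alpha, beta = 0.7, 1.9

# Rotations commute, and either order equals one rotation by alpha + beta.
assert np.allclose(rotation(alpha) @ rotation(beta),
                   rotation(beta) @ rotation(alpha))
assert np.allclose(rotation(alpha) @ rotation(beta),
                   rotation(alpha + beta))
```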
Even when matrices don't commute, they can hide surprising symmetries. Consider the products $AB$ and $BA$. These two matrices can look completely different. Yet, if you calculate the sum of the elements on their main diagonals—a quantity called the trace, denoted $\operatorname{tr}$—you will find something remarkable: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
This is always true. It's a kind of "conservation law." No matter which order you perform the transformations in, this specific numerical characteristic of the resulting composite transformation remains invariant. It's a clue that even though $AB$ and $BA$ are different matrices, they share a deep underlying connection (in fact, they have the same set of eigenvalues).
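A quick numerical illustration with random matrices (a sketch; the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# AB and BA are generally different matrices...
assert not np.allclose(A @ B, B @ A)

# ...yet their traces agree, and so do their determinants,
# consistent with AB and BA sharing the same eigenvalues.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A))
```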
If a matrix represents an action, it's natural to ask if there's an action that undoes it. This is the role of the inverse matrix, denoted $A^{-1}$. The "do-nothing" action, which leaves any vector unchanged, is the identity matrix, $I$ (a matrix with 1s on the diagonal and 0s everywhere else). The fundamental definition of an inverse is a matrix that satisfies the undoing property from both sides: $AA^{-1} = A^{-1}A = I$.
This seemingly simple definition has immediate and powerful consequences. First, it tells us that only square matrices can have such an inverse. Why? Consider a non-square matrix $A$ of size $m \times n$, with $m \neq n$. For the products to be defined, a hypothetical inverse $B$ must have size $n \times m$. But then the product $AB$ is an $m \times m$ matrix, while the product $BA$ is an $n \times n$ matrix. The definition demands that both equal the same identity matrix, but they can't! One would have to be $I_m$ and the other $I_n$, which is a contradiction since $m \neq n$.
For square matrices, there is another beautiful subtlety. Do we always need to check both conditions, $AB = I$ and $BA = I$? For general algebraic structures, you must. But the world of square matrices is more rigid. It's a theorem that if you have two square matrices $A$ and $B$, and you've verified that $AB = I$, then it is guaranteed that $BA = I$ will also hold.
Armed with inverses, we can solve matrix equations. To solve $AXB = C$ for $X$, we cannot simply "divide." We must carefully "peel away" the matrices on either side, respecting the non-commutative order. To eliminate $A$ from the left, we must multiply by $A^{-1}$ from the left: $A^{-1}AXB = A^{-1}C$, which simplifies to $XB = A^{-1}C$. Then, to eliminate $B$ from the right, we multiply by $B^{-1}$ from the right: $XBB^{-1} = A^{-1}CB^{-1}$. This isolates our unknown: $X = A^{-1}CB^{-1}$. Each step is dictated by the fundamental properties of matrix algebra.
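The peeling-away procedure can be verified numerically. In this sketch we build $C$ from a known $X$ and recover it (random invertible matrices, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
X_true = rng.standard_normal((3, 3))

# Construct the right-hand side from a known X.
C = A @ X_true @ B

# Peel A away from the left and B away from the right, in that order of sides.
X = np.linalg.inv(A) @ C @ np.linalg.inv(B)

assert np.allclose(X, X_true)
```

In production numerical code one would prefer `np.linalg.solve` over forming explicit inverses, but the explicit form mirrors the algebra in the text.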
The properties of matrix multiplication are not just a sterile set of rules; they are threads that connect different branches of mathematics. Consider the connection to geometry. The dot product of two vectors, written as $\mathbf{u}^T\mathbf{v}$, tells us about the angle between them and their lengths. What happens to this geometric relationship after both vectors are transformed by a matrix $A$? The new dot product is $(A\mathbf{u})^T(A\mathbf{v})$. Using the transpose property $(A\mathbf{u})^T = \mathbf{u}^TA^T$, this expression becomes $\mathbf{u}^T(A^TA)\mathbf{v}$. This is a beautiful result. It shows that all the information about how the transformation stretches, shrinks, and rotates the space is encapsulated in the single symmetric matrix $A^TA$.
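A numerical sanity check of this identity (random vectors and matrix, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(3)

# The dot product of the transformed vectors equals u^T (A^T A) v.
lhs = (A @ u) @ (A @ v)
rhs = u @ (A.T @ A) @ v
assert np.isclose(lhs, rhs)

# And A^T A is symmetric, as claimed.
assert np.allclose(A.T @ A, (A.T @ A).T)
```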
Finally, as a testament to the profound and often surprising unity of mathematics, consider the Cayley-Hamilton theorem. Every square matrix satisfies its own characteristic equation. This sounds abstract, but it's pure magic. The characteristic equation is what we solve to find a matrix's eigenvalues; for a $2 \times 2$ matrix, for instance, it reads $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$. The theorem states that if we replace the variable $\lambda$ with the matrix $A$ itself (and the constant term $\det(A)$ with $\det(A)I$), the equation still holds true: $A^2 - \operatorname{tr}(A)A + \det(A)I = 0$. This is like finding out that a person's life story is governed by the same equation that describes their fundamental character traits. We can even exploit this! From $A^2 - \operatorname{tr}(A)A + \det(A)I = 0$, we can multiply by $A^{-1}$ to get $A - \operatorname{tr}(A)I + \det(A)A^{-1} = 0$. Rearranging gives us the inverse: $A^{-1} = \frac{1}{\det(A)}\left(\operatorname{tr}(A)I - A\right)$. We have found the inverse of a matrix not by brute force computation, but by using a deep, intrinsic property that ties the matrix to its own defining equation. It is in these unexpected connections that we see the true beauty and power of the mathematical world.
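Both the theorem and the inverse formula can be checked on a concrete $2 \times 2$ example (the specific matrix here is my own illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
tr, det = np.trace(A), np.linalg.det(A)

# Cayley-Hamilton for 2x2: A^2 - tr(A) A + det(A) I = 0.
assert np.allclose(A @ A - tr * A + det * np.eye(2), np.zeros((2, 2)))

# Rearranged inverse: A^{-1} = (tr(A) I - A) / det(A).
A_inv = (tr * np.eye(2) - A) / det
assert np.allclose(A_inv, np.linalg.inv(A))
```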
Having explored the fundamental rules of matrix multiplication—associativity, distributivity, and the curious lack of commutativity—one might be tempted to see them as just that: rules for an abstract mathematical game. But nothing could be further from the truth. These properties are not arbitrary conventions; they are the very grammar of change, the logic that underpins the structure and evolution of systems throughout science and engineering. They dictate how we can describe the world, what is possible within it, and how we can harness its principles. Let's embark on a journey to see how these simple algebraic laws blossom into a rich tapestry of applications, from the bits and bytes of our digital world to the deepest symmetries of the cosmos.
At its heart, matrix multiplication is about transformation. A vector represents a state, and multiplying it by a matrix represents a step in its evolution. The property of associativity, the idea that $(AB)C = A(BC)$, might seem like a dry technicality. In reality, it is a profound statement about the nature of sequential processes. It tells us that if you have a series of transformations, it doesn't matter how you group them; the final outcome is the same. This principle is the bedrock upon which we build our understanding of how systems evolve over time.
Imagine you are tracking a satellite, a stock price, or a population of cells. The state of the system at a given time can be represented by a vector $\mathbf{x}_k$, and its state at the next time step is given by a linear rule, $\mathbf{x}_{k+1} = A\mathbf{x}_k$. How do you predict the state 100 steps into the future? You would need to compute $A^{100}\mathbf{x}_0$. One way is to laboriously multiply the vector by the matrix $A$, one hundred times. But that's the brute-force way. The elegance of linear algebra, powered by associativity, gives us a far more insightful method. If the matrix can be "diagonalized"—written as $A = PDP^{-1}$—then calculating its power becomes astonishingly simple. The associative property guarantees that $A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}$. By induction, this extends to any power: $A^k = PD^kP^{-1}$. Since $D$ is a diagonal matrix, calculating $D^k$ is trivial; we just raise its diagonal entries to the $k$-th power. What have we done here? We've performed a clever change of coordinates (using $P^{-1}$), let the system evolve in its "natural" basis where the dynamics are simple (multiplying by $D$), and then changed back to our original coordinates (using $P$). This trick, which is central to solving linear dynamical systems, is entirely underwritten by the associative property of matrix multiplication. It transforms a complex iterative problem into a simple, direct calculation, revealing the underlying "modes" of the system's behavior encoded in the eigenvalues.
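The diagonalization shortcut is easy to demonstrate. In this sketch, a small (hypothetical, diagonalizable) transition matrix is raised to the 100th power via its eigen-decomposition and checked against direct repeated multiplication:

```python
import numpy as np

# A small diagonalizable transition matrix (illustrative values).
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])

eigvals, P = np.linalg.eig(A)  # A = P D P^{-1}, D = diag(eigvals)

# A^100 via the eigen-decomposition: P D^100 P^{-1}.
# Powering D just powers its diagonal entries.
A_pow = P @ np.diag(eigvals ** 100) @ np.linalg.inv(P)

# Compare against brute-force repeated multiplication.
assert np.allclose(A_pow, np.linalg.matrix_power(A, 100))
```

One decomposition plus two matrix products replaces a hundred multiplications, and the eigenvalues on the diagonal expose the system's long-term modes directly.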
This idea of changing coordinates to simplify a problem leads to an even deeper insight about the distinction between a physical system and our description of it. In control theory, we model systems using a set of matrices $(A, B, C)$. But is this description unique? What if we choose a different set of internal state variables? This amounts to a "similarity transformation," where the new matrices are related to the old ones by an invertible matrix $T$: $\tilde{A} = T^{-1}AT$, $\tilde{B} = T^{-1}B$, and $\tilde{C} = CT$. From the inside, the system looks completely different; the matrices are all jumbled up. And yet, the external, physical behavior—the way the system responds to inputs—remains absolutely unchanged. Why? Consider a key measure of this behavior, the sequence of Markov parameters, $CA^kB$. For the transformed system, this becomes $\tilde{C}\tilde{A}^k\tilde{B}$. Let's substitute the new matrices and watch the magic of associativity: $\tilde{C}\tilde{A}^k\tilde{B} = (CT)(T^{-1}AT)^k(T^{-1}B)$. As we've seen, $(T^{-1}AT)^k$ simplifies to $T^{-1}A^kT$. So, $\tilde{C}\tilde{A}^k\tilde{B} = (CT)(T^{-1}A^kT)(T^{-1}B) = CA^kB$. The transformation matrices $T$ and $T^{-1}$ meet in the middle and annihilate each other! Associativity reveals that the physical input-output map is invariant under a change of our internal description. This is a beautiful and powerful concept: matrix properties help us distinguish what is fundamental about reality from what is merely an artifact of our chosen perspective.
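This invariance can be confirmed numerically. The sketch below builds a random single-input, single-output system, applies a random similarity transformation, and checks the first few Markov parameters (all specific values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
T = rng.standard_normal((n, n))  # invertible with probability 1

# Similarity-transformed realization of the same physical system.
T_inv = np.linalg.inv(T)
A2, B2, C2 = T_inv @ A @ T, T_inv @ B, C @ T

# Markov parameters C A^k B are invariant under the change of coordinates.
for k in range(5):
    lhs = C @ np.linalg.matrix_power(A, k) @ B
    rhs = C2 @ np.linalg.matrix_power(A2, k) @ B2
    assert np.allclose(lhs, rhs)
```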
Beyond describing change, matrix properties define the very structure of information and geometry. They provide the framework for everything from ensuring our data transmits correctly to rendering realistic 3D worlds on a 2D screen.
Consider the miracle of modern communication. Data flies across the globe, through noisy channels, and arrives remarkably intact. Part of this magic is due to error-correcting codes. Many of these are "linear codes," which possess a wonderfully simple structure: if you add any two valid codewords together, you get another valid codeword. Where does this crucial property come from? It's a direct consequence of the distributive property of matrix multiplication. In a linear code, a message vector $\mathbf{m}$ is encoded into a codeword $\mathbf{c}$ by multiplying it with a "generator" matrix $G$, so that $\mathbf{c} = \mathbf{m}G$. If we have two messages, $\mathbf{m}_1$ and $\mathbf{m}_2$, they produce codewords $\mathbf{c}_1 = \mathbf{m}_1G$ and $\mathbf{c}_2 = \mathbf{m}_2G$. Their sum is $\mathbf{c}_1 + \mathbf{c}_2 = \mathbf{m}_1G + \mathbf{m}_2G$. Here, distributivity allows us to factor out the matrix $G$: $\mathbf{m}_1G + \mathbf{m}_2G = (\mathbf{m}_1 + \mathbf{m}_2)G$. Since $\mathbf{m}_1 + \mathbf{m}_2$ is just another valid message vector, its product with $G$ is, by definition, a valid codeword. This closure property, which stems directly from distributivity, is what gives linear codes their elegant algebraic structure, a structure that we exploit to detect and correct errors with remarkable efficiency.
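A concrete check, sketched with the classic (7,4) Hamming code (this particular generator matrix is my example; the source does not specify a code, and all arithmetic is mod 2):

```python
import numpy as np

# A generator matrix for the (7,4) Hamming code: G = [I_4 | P],
# with arithmetic over GF(2), i.e. mod 2.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

m1 = np.array([1, 0, 1, 1])
m2 = np.array([0, 1, 1, 0])

c1 = (m1 @ G) % 2
c2 = (m2 @ G) % 2

# Distributivity: the sum of two codewords is the codeword
# of the summed message, so the code is closed under addition.
assert np.array_equal((c1 + c2) % 2, (((m1 + m2) % 2) @ G) % 2)
```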
The properties of matrix multiplication also capture the essence of geometric operations. What does it mean, intuitively, to "project" a 3D object onto a 2D surface? It means we map it onto the surface, and if we try to project it again, it's already there, so nothing changes. This simple intuition is perfectly captured by the algebraic property of idempotency: for a projection matrix $P$, we have $P^2 = P$. From this single, simple equation, we can deduce something profound about the geometry of projection. If $\mathbf{v}$ is an eigenvector of $P$ with eigenvalue $\lambda$, then $P\mathbf{v} = \lambda\mathbf{v}$. Applying $P$ again gives $P^2\mathbf{v} = \lambda^2\mathbf{v}$. Since $P^2 = P$, we must have $\lambda^2\mathbf{v} = \lambda\mathbf{v}$. For a non-zero vector $\mathbf{v}$, this forces $\lambda^2 = \lambda$, which means the only possible eigenvalues are $0$ or $1$. This isn't just a mathematical curiosity; it's a deep truth about projection. Any vector is either annihilated (projected to the zero vector, $\lambda = 0$) or left untouched by the projection (if it's already in the target space, $\lambda = 1$). This simple algebraic property underpins computer graphics, statistics, and even quantum mechanics, where measuring a system is often described as a projection.
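A sketch of this in NumPy, using the orthogonal projection onto the line spanned by a vector $\mathbf{a}$ (the standard construction $P = \mathbf{a}\mathbf{a}^T / \mathbf{a}^T\mathbf{a}$; the particular $\mathbf{a}$ is arbitrary):

```python
import numpy as np

# Orthogonal projection onto the span of a: P = a a^T / (a^T a).
a = np.array([[1.0], [2.0], [2.0]])
P = (a @ a.T) / (a.T @ a)

# Idempotency: projecting twice is the same as projecting once.
assert np.allclose(P @ P, P)

# Its eigenvalues are exactly 0 and 1 (here: 0, 0, 1).
eigvals = np.linalg.eigvalsh(P)  # ascending order for symmetric P
assert np.allclose(eigvals, [0.0, 0.0, 1.0])
```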
On a more practical level, matrix properties are the indispensable tools of the computational scientist. When optimizing a process, fitting a model to data, or finding a system's minimum energy configuration, we often end up with complex expressions involving matrices. To solve these problems on a computer, they must be manipulated into a standard, manageable form. A crucial tool in this process is the rule for the transpose of a product: $(AB)^T = B^TA^T$. Notice the reversal of order! This rule, along with associativity and distributivity, allows us to take a seemingly impenetrable objective function, such as one you might find in a parameter estimation algorithm, and methodically expand and rearrange it into a clean quadratic form that a computer can minimize. These properties are the workhorses, the trusty wrenches in the toolkit of anyone who uses mathematics to solve real-world problems.
Pushing our inquiry to a more abstract level, we find that matrix properties provide the language for some of the most profound concepts in modern science: symmetry and the nature of composite systems.
Physicists love symmetries. A symmetry is a transformation that leaves a system looking the same, and Emmy Noether taught us that for every continuous symmetry in nature, there is a corresponding conserved quantity (like energy, momentum, or charge). Many of these symmetry transformations—rotations, boosts, and more abstract internal symmetries—can be represented by matrices. The set of all symmetry transformations of a given type often forms a "group." This means the set is closed under multiplication (one symmetry operation followed by another is still a symmetry), it contains an identity (doing nothing), and every operation is reversible (it has an inverse). Matrix multiplication is automatically associative, which is also a requirement for a group. For example, the set of all matrices with determinant equal to 1 forms the group $SL(n)$. This can be verified by checking that if $\det(A) = 1$ and $\det(B) = 1$, then $\det(AB) = \det(A)\det(B) = 1$, and that $\det(A^{-1}) = 1/\det(A) = 1$. Most importantly, these matrix groups are generally non-commutative ($AB \neq BA$). Think about rotating a book: a 90-degree turn around the vertical axis followed by a 90-degree turn around the horizontal axis leaves it in a different orientation than if you had performed the rotations in the opposite order. This non-commutativity isn't a mathematical quirk; it's a fundamental feature of our 3D world, and its generalization in physics gives rise to the structure of elementary particles and the forces between them.
The rules of matrix multiplication also scale up to describe how separate systems combine. In quantum mechanics, if we have two systems—say, two particles—that are described individually, how do we describe the combined entity? The answer lies in the Kronecker product (or tensor product), denoted by $\otimes$. If system 1 evolves according to matrix $A$ and system 2 according to matrix $B$, the combined evolution involves expressions like $A \otimes B$. To work with these much larger matrices, we need to know how they multiply. The essential rule is the "mixed-product property": $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$. Notice how this beautiful rule keeps the two "worlds" separate on the right-hand side: the first matrices in each product combine, and the second matrices combine. This allows us to extend the binomial theorem to these objects. For instance, since $A \otimes I$ and $I \otimes B$ always commute, we find that $(A \otimes I + I \otimes B)^n$ expands just like an ordinary binomial, yielding $\sum_{k=0}^{n}\binom{n}{k}A^k \otimes B^{n-k}$. This mathematical machinery is what allows us to handle multiple quantum particles and is the foundation for understanding one of quantum theory's most famous phenomena: entanglement.
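Both the mixed-product property and the commutation of $A \otimes I$ with $I \otimes B$ are easy to verify with NumPy's `kron` (random matrices, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
A, C = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
B, D = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

# Mixed-product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)

# A ⊗ I and I ⊗ B always commute (each acts on its own "world"),
# which is what enables the binomial expansion in the text.
X, Y = np.kron(A, np.eye(3)), np.kron(np.eye(2), B)
assert np.allclose(X @ Y, Y @ X)
```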
Finally, what about systems that are not just complicated, but chaotic or random, like the weather or turbulence in a fluid? We can model such a system as a product of a different random matrix at each time step: $M_n = A_nA_{n-1}\cdots A_1$. It seems hopeless to predict the long-term behavior of such a product. Yet, Oseledec's multiplicative ergodic theorem, a landmark result in mathematics, tells us that for almost any sequence of random matrices, the long-term exponential growth rate of $\|M_n\mathbf{v}\|$ converges to a set of well-defined numbers called Lyapunov exponents. These exponents tell us whether the system is stable or chaotic. The entire mathematical structure of this theory is built on the associative property of the matrix product, expressed in what is known as the cocycle identity. This identity essentially states that the evolution from time $0$ to $n+m$ can be seen as the product of the evolution from $0$ to $n$ and the evolution from $n$ to $n+m$ in the world that has already evolved for $n$ steps. It's just associativity, applied over and over, allowing us to find deep structural order hidden within apparent randomness.
From the humble act of multiplying two arrays of numbers, a universe of structure unfolds. The properties of this operation are the threads that weave together the principles of dynamics, information, geometry, and symmetry. They provide a unified language for describing our world, demonstrating with beautiful clarity how a few simple rules can give rise to the extraordinary complexity and richness we see all around us.