Inverse matrices play a key role in linear algebra, and particularly in computations. However, only square matrices can possibly be invertible.
This leads us to introduce the Moore-Penrose inverse of a potentially non-square real- or complex-valued matrix, which satisfies some but not necessarily all of the properties of an inverse matrix.
Definition.
Let

be an
m-by-
n matrix over a field

and

be an
n-by-
m matrix over

, where

is either

, the real numbers, or

, the complex numbers. Recall that

refers to the conjugate transpose of

. Then the following four criteria are called the
Moore–Penrose conditions for 
:
,
,
,
.
We will see below that given a matrix
, there exists a unique matrix
that satisfies all four of the Moore–Penrose conditions. They generalise the properties of the usual inverse.
Remark.
If

is an invertible square matrix, then the ordinary inverse

satisfies the Moore-Penrose conditions for

. Observe also that if

satisfies the Moore-Penrose conditions for

, then

satisfies the Moore-Penrose conditions for

.
We assemble some basic properties of the conjugate transpose for later use. In the following lemmas,
is a matrix with complex elements and n columns,
is a matrix with complex elements and n rows.
Lemma (1).
For any

-matrix

,

Proof. The assumption says that all elements of A*A are zero. Therefore,

Therefore, all
equal 0 i.e.
.
Lemma (2).
For any

-matrix

,

Proof. :
Lemma (3).
For any

-matrix

,

Proof. This is proved in a manner similar to the argument of Lemma 2 (or by simply taking the Hermitian conjugate).
We establish existence and uniqueness of the Moore-Penrose inverse for every matrix.
Theorem.
If

is a

-matrix and

and

satisfy the Moore-Penrose conditions for

, then

.
Proof. Let
be a matrix over
or
. Suppose that
and
are Moore–Penrose inverses of
. Observe then that

Analogously we conclude that
. The proof is completed by observing that then

Theorem.
For every

-matrix

there is a matrix

satisfying the Moore-Penrose conditions for

Proof. The proof proceeds in stages.
is a 1-by-1 matrix
For any
, we define:

It is easy to see that
is a pseudoinverse of
(interpreted as a 1-by-1 matrix).
is a square diagonal matrix
Let
be an n-by-n matrix over
with zeros off the diagonal. We define
as an n-by-n matrix over
with
as defined above. We write simply
for
.
Notice that
is also a matrix with zeros off the diagonal.
We now show that
is a pseudoinverse of
:




is a general diagonal matrix
Let
be an m-by-n matrix over
with zeros off the main diagonal, where m and n are unequal. That is,
for some
when
and
otherwise.
Consider the case where
. Then we can rewrite
by stacking where
is a square diagonal m-by-m matrix, and
is the m-by-(n−m) zero matrix. We define
as an n-by-m matrix over
, with
the pseudoinverse of
defined above, and
the (n−m)-by-m zero matrix. We now show that
is a pseudoinverse of
:
- By multiplication of block matrices,
so by property 1 for square diagonal matrices
proven in the previous section,
.
- Similarly,
, so 
- By 1 and property 3 for square diagonal matrices,
.
- By 2 and property 4 for square diagonal matrices,

Existence for
such that
follows by swapping the roles of
and
in the
case and using the fact that
.
is an arbitrary matrix
The singular value decomposition theorem states that there exists a factorization of the form

where:
is an m-by-m unitary matrix over
.
is an m-by-n matrix over
with nonnegative real numbers on the diagonal and zeros off the diagonal.
is an n-by-n unitary matrix over
.[1]
Define
as
.
We now show that
is a pseudoinverse of
:




This leads us to the natural definition:
Definition (Moore-Penrose inverse).
Let

be a

-matrix. Then the unique

-matrix satisfying the Moore-Penrose conditions for

is called the
Moore-Penrose inverse 
of

.
We have already seen above that the Moore-Penrose inverse generalises the classical inverse to potentially non-square matrices.
We will now list some basic properties of its interaction with the Hermitian conjugate, leaving most of the proofs as exercises to the reader.
Exercise.
For any

-matrix

,

The following identities hold:
- A+ = A+ A+* A*
- A+ = A* A+* A+
- A = A+* A* A
- A = A A* A+*
- A* = A* A A+
- A* = A+ A A*
Proof of the first one:
and
imply that
. □
The remaining identities are left as exercises.
The results of this section show that the computation of the pseudoinverse is reducible to its construction in the
Hermitian case. It suffices to show that the putative constructions satisfy the defining criteria.
Proposition.
For every

-matrix

,

Proof. Observe that

Similarly,
implies that
i.e.
.
Additionally,
so
.
Finally,
implies that
.
Therefore,
.
Exercise.
For every

-matrix

,

We now turn to calculating the Moore-Penrose inverse for a product of two matrices,
Proposition.
If

has orthonormal columns i.e.

, then for any

-matrix

of the right dimensions,

.
Proof. Since
,
. Write
and
. We show that
satisfies the Moore–Penrose criteria for
.
![{\displaystyle {\begin{aligned}CDC&=ABB^{+}A^{*}AB=ABB^{+}B=AB=C,\\[4pt]DCD&=B^{+}A^{*}ABB^{+}A^{*}=B^{+}BB^{+}A^{*}=B^{+}A^{*}=D,\\[4pt](CD)^{*}&=D^{*}B^{*}A^{*}=A\left(B^{+}\right)^{*}B^{*}A^{*}=A\left(BB^{+}\right)^{*}A^{*}=ABB^{+}A^{*}=CD,\\[4pt](DC)^{*}&=B^{*}A^{*}D^{*}=B^{*}A^{*}A\left(B^{+}\right)^{*}=\left(B^{+}B\right)^{*}=B^{+}B=B^{+}A^{*}AB=DC.\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d47101e6fd996def4bbaab662b4cba4e89730adb)
Therefore,
.
Exercise.
If

has orthonormal rows, then for any

-matrix

of the right dimensions,

.
Another important special case which approximates closely that of invertible matrices is when
has full column rank and
has full row rank.
Proposition.
If

has full column rank and

has full row rank, then

.
Proof. Since
has full column rank,
is invertible so
. Similarly, since
has full row rank,
is invertible so
.
Write
(using reduction to the Hermitian case). We show that
satisfies the Moore–Penrose criteria.
![{\displaystyle {\begin{aligned}CDC&=ABB^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}AB=AB=C,\\[4pt]DCD&=B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}ABB^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}=B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}=D,\\[4pt]CD&=ABB^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}=A\left(A^{*}A\right)^{-1}A^{*}=\left(A\left(A^{*}A\right)^{-1}A^{*}\right)^{*},\\\Rightarrow (CD)^{*}&=CD,\\[4pt]DC&=B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}AB=B^{*}\left(BB^{*}\right)^{-1}B=\left(B^{*}\left(BB^{*}\right)^{-1}B\right)^{*},\\\Rightarrow (DC)^{*}&=DC.\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/08bdb378e22cd4b25999bba6b945be51c9da492b)
Therefore,
.
We finally derive a formula for calculating the Moore-Penrose inverse of
.
Proposition.
If

, then

.
Proof. Here,
, and thus
and
. We show that indeed
satisfies the four Moore–Penrose criteria.
![{\displaystyle {\begin{aligned}CDC&=AA^{*}A^{+*}A^{+}AA^{*}=A\left(A^{+}A\right)^{*}A^{+}AA^{*}=AA^{+}AA^{+}AA^{*}=AA^{+}AA^{*}=AA^{*}=C\\[4pt]DCD&=A^{+*}A^{+}AA^{*}A^{+*}A^{+}=A^{+*}A^{+}A\left(A^{+}A\right)^{*}A^{+}=A^{+*}A^{+}AA^{+}AA^{+}=A^{+*}A^{+}AA^{+}=A^{+*}A^{+}=D\\[4pt](CD)^{*}&=\left(AA^{*}A^{+*}A^{+}\right)^{*}=A^{+*}A^{+}AA^{*}=A^{+*}\left(A^{+}A\right)^{*}A^{*}=A^{+*}A^{*}A^{+*}A^{*}\\&=\left(AA^{+}\right)^{*}\left(AA^{+}\right)^{*}=AA^{+}AA^{+}=A\left(A^{+}A\right)^{*}A^{+}=AA^{*}A^{+*}A^{+}=CD\\[4pt](DC)^{*}&=\left(A^{+*}A^{+}AA^{*}\right)^{*}=AA^{*}A^{+*}A^{+}=A\left(A^{+}A\right)^{*}A^{+}=AA^{+}AA^{+}\\&=\left(AA^{+}\right)^{*}\left(AA^{+}\right)^{*}=A^{+*}A^{*}A^{+*}A^{*}=A^{+*}\left(A^{+}A\right)^{*}A^{*}=A^{+*}A^{+}AA^{*}=DC\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/8297222023e6c7515b29e06a7e06a22b9c66bfdd)
Therefore,
. In other words:

and, since

The defining feature of classical inverses is that
What can we say about
and
?
We can derive some properties easily from the more basic properties above:
Exercise.
Let

be a

-matrix. Then

and

We can conclude that
and
are orthogonal projections.
Proposition.
Let

be a

-matrix. Then

and

are orthogonal projections
Proof. Indeed, consider the operator
: any vector decomposes as

and for all vectors
and
satisfying
and
, we have
.
It follows that
and
. Similarly,
and
. The orthogonal components are now readily identified.
We finish our analysis by determining image and kernel of the mappings encoded by the Moore-Penrose inverse.
Proposition.
Let

be a

-matrix. Then

and

.
Proof. If
belongs to the range of
then for some
,
and
. Conversely, if
then
so that
belongs to the range of
. It follows that
is the orthogonal projector onto the range of
.
is then the orthogonal projector onto the orthogonal complement of the range of
, which equals the kernel of
.
A similar argument using the relation
establishes that
is
the orthogonal projector onto the range of
and
is the orthogonal projector onto the kernel of
.
Using the relations
and
it follows that the range of P equals the range of
, which in turn implies that the range of
equals the kernel of
. Similarly
implies that the range of
equals the range of
. Therefore, we find,

We present two applications of the Moore-Penrose inverse in solving linear systems of equations.
Moore-Penrose inverses can be used for least-squares minimisation of a system of equations that might not necessarily have an exact solution.
Proposition.
For any

matrix

,

where

.
Proof. We first note that (stating the complex case), using the fact that
satisfies
and
, we have

so that (
stands for the Hermitian conjugate of the previous term in the following)

as claimed.
Remark.
This lower bound need not be zero as the system

may not have a solution (e.g. when the matrix
A does not have full rank or the system is overdetermined).
If

is injective i.e. one-to-one (which implies

), then the bound is attained uniquely at

.
The proof above also shows that if the system
is satisfiable i.e. has a solution, then necessarily
is a solution (not necessarily unique). We can say more:
Proposition.
If the system

is satisfiable, then

is the unique solution with smallest Euclidean norm.
Proof. Note first, with
, that
and that
. Therefore, assuming that
, we have

Thus

with equality if and only if
, as was to be shown.
An immediate consequence of this result is that
is also the uniquely smallest solution to the least-squares minimization problem for all
, including when
is neither injective nor surjective. It can be shown that the least-squares approximation
is unique. Thus it is necessary and sufficient for all
that solve the least-squares minimization to satisfy
. This system always has a solution (not necessarily unique) as
lies in the column space of
. From the above result the smallest
which solves this system is
.
- ↑ Some authors use slightly different dimensions for the factors. The two definitions are equivalent.