# Wald GR, Chapter 2 Notes

## 2.1: Manifolds

• A manifold is, roughly, a set in which the vicinity of every point “looks like” $\mathbb{R}^n$.

• An open set is defined as a set which can be expressed as a union of open balls.

• An $n$-dimensional $C^\infty$ manifold is a set $M$ and a collection of subsets $\{O_\alpha\}$ of $M$ satisfying

1. Each $p\in M$ belongs to some $O_\alpha$, i.e. $\{O_\alpha\}$ covers $M$.
2. For each $\alpha$, there is a bijection $\psi_\alpha:O_\alpha\to U_\alpha$ with $U_\alpha\subset\mathbb{R}^n$ open.
3. If $O_\alpha\cap O_\beta\ne\emptyset$ then the images $\psi_\alpha[O_\alpha\cap O_\beta]$ and $\psi_\beta[O_\alpha\cap O_\beta]$ are open in $\mathbb{R}^n$, and the map $\psi_\beta\circ\psi_\alpha^{-1}:\psi_\alpha[O_\alpha\cap O_\beta]\to\psi_\beta[O_\alpha\cap O_\beta]$ is $C^{\infty}$. I think this is essentially the statement that, in an area of overlap, you can smoothly transition from one set of coordinates ($\psi_\alpha$) to another set of coordinates ($\psi_\beta$).

• The maps $\{\psi_\alpha\}$ are called charts or coordinate systems.

• We can introduce a topology on $M$ by requiring that the $\{\psi_\alpha\}$ be homeomorphisms.

• The manifolds considered in this book will all be Hausdorff and paracompact.

• The sphere $S^2$ in $\mathbb{R}^3$ is a two-dimensional manifold, taking $O^{\pm}_i=\{(x^1,x^2,x^3)\in S^2\mid \pm x^i>0\}$ as our cover and the projection maps $\psi^{\pm}_1(x^1,x^2,x^3)=(x^2,x^3)$, etc., as charts.

• If $M$ and $M'$ are manifolds, of dimension $n$ and $n'$, with coordinate systems $\{\psi_\alpha\}$ and $\{\psi'_\beta\}$, then $f:M\to M'$ is $C^\infty$ if, for all $\alpha,\beta$, the map $\psi'_\beta\circ f\circ\psi_\alpha^{-1}$, which goes from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^{n'}$, is $C^\infty$.

• If $f:M\to M'$ is $C^\infty$, bijective, and has a $C^\infty$ inverse, then it is a diffeomorphism. Diffeomorphic manifolds have identical manifold structure.

## 2.2: Vectors

• In considering curved spacetime geometries, the natural notion of vectors (“arrows” that can be moved around in space) is lost. For example, there is not a natural way to “add” two points on a sphere and generate a third point.

• It is natural to define a tangent vector to a manifold by referencing an embedding in a higher-dimensional space. In GR it becomes important to define a tangent vector without reference to an embedding.

• Tangent vectors can be defined, without reference to embedding, as directional derivatives. In $\mathbb{R}^n$ there is a one-to-one mapping between vectors and directional derivatives: $v\leftrightarrow v\cdot\nabla$.

• How do we generalize the notion of “directional derivative”? Claim: the defining characteristics are linearity and Leibniz product rule.

• Define a tangent vector in this way: let $\mathcal{F}$ be the set of all $C^\infty$ functions $M\to\mathbb{R}$ (note $\mathbb{R}^1$). Then a tangent vector $v$ at point $p\in M$ is a map $\mathcal{F}\to\mathbb{R}$ which has these properties:

1. $v(af+bg)=av(f)+bv(g)$ for all $f,g\in\mathcal{F}$ and $a,b\in\mathbb{R}$.
2. $v(fg)=v(f)g(p)+f(p)v(g)$ for all $f,g\in\mathcal{F}$.

So a tangent vector acts on functions. I guess it does something like taking a function $f:M\to\mathbb{R}$ and giving back a directional derivative.

• The collection $V_p$ of tangent vectors at $p$ forms a vector space over $\mathbb{R}$: e.g. if $v_1,v_2\in V_p$ then $$(v_1+v_2)(af+bg)=v_1(af+bg)+v_2(af+bg)=a(v_1+v_2)(f)+b(v_1+v_2)(g)$$ so $v_1+v_2\in V_p$.

• Theorem 2.2.1: if $M$ is an $n$-dimensional manifold and $p\in M$, then the tangent space $V_p$ has ${\rm dim}\,V_p=n$. The proof is by construction of a basis. There is a chart $\psi_p:O\to U\subset\mathbb{R}^n$ defined on a neighborhood $O$ of $p$. Let $v_\mu:\mathcal{F}\to\mathbb{R}$ be given by $$v_\mu(f) = \left.\frac{\partial}{\partial x^\mu}(f\circ\psi_p^{-1})\right|_{\psi_p(p)}.$$ This is often abbreviated $\frac{\partial f}{\partial x^\mu}(p)$, with the chart understood.

$f\circ\psi_p^{-1}$ is a map $\mathbb{R}^n\to\mathbb{R}$. The $x^\mu$ are cartesian coordinates in $\mathbb{R}^n$. This is a tangent vector: e.g.

\begin{multline} v_\mu(fg)=\left.\frac{\partial}{\partial x^\mu}\left[(f\circ\psi_p^{-1})(g\circ\psi_p^{-1})\right]\right|_{\psi_p(p)} \\
=f(p)\left.\frac{\partial}{\partial x^\mu}(g\circ\psi_p^{-1})\right|_{\psi_p(p)}+\left.\frac{\partial}{\partial x^\mu}(f\circ\psi_p^{-1})\right|_{\psi_p(p)}g(p) \\
=f(p)v_\mu(g)+v_\mu(f)g(p). \end{multline}

If $a^1 v_1+\cdots+a^n v_n=0$, then it is zero on all input functions. Let $f^\nu:M\to\mathbb{R}$ be given by $f^\nu=P^\nu\circ\psi_p$ where $P^\nu:\mathbb{R}^n\to\mathbb{R}$ projects out the $\nu$-th coordinate (i.e. $P^\nu:(x^1,\ldots,x^n)\mapsto x^\nu$). Then $$v_\mu(f^\nu)=\frac{\partial}{\partial x^\mu}(P^\nu)={\delta^\nu}_\mu.$$ Therefore, plugging in $f^\nu$, we find that the coefficient $a^\nu$ must be zero, for each $\nu$. Hence the $v_\mu$ are linearly independent.

The fact that the $v_\mu$ also span $V_p$ is explained through an argument which I don’t really follow at this point. I will take this for granted for now.

• The basis $v_\mu$ of $V_p$ is called a coordinate basis and is sometimes denoted $\partial/\partial x^\mu$.

• A different chart $\psi'$ gives rise to a different coordinate basis $v'_\mu$. They are related through the chain rule: $$v'_\mu=\frac{\partial x^\nu}{\partial x'^\mu}v_\nu \qquad {\rm and} \qquad v_\mu=\frac{\partial x'^\nu}{\partial x^\mu}v'_\nu$$ where $x'^\nu$ is the $\nu$-th component of the map $\psi'\circ\psi^{-1}$.

• Then an arbitrary vector $w=a^\mu v_\mu$ with components $a^\mu$ in the $v_\mu$ basis will look like $$w=\left(a^\mu\frac{\partial x'^\nu}{\partial x^\mu}\right)v'_\nu$$ in the $v'_\nu$ basis; we can read off the transformation law for its components, $a'^\nu=a^\mu\,\partial x'^\nu/\partial x^\mu$. Of course, the actual vector $w$ is unchanged; only its components change in reaction to the change of basis.

• A smooth curve on $M$ is a $C^\infty$ map $C:\mathbb{R}\to M$. Define the tangent vector to $C$ at $p=C(t_0)$, call it $T\in V_p$, by $T(f)=\left.\frac{d}{dt}(f\circ C)\right|_{t=t_0}$. I still don’t have great intuition for this, besides that it’s a “natural-looking” construction, matching up domains and codomains, etc.

• For $p,q\in M$, there is not a natural relationship between the tangent spaces $V_q$ and $V_p$, given just the manifold structure on $M$. That structure will be provided by the “connection” and “parallel transport” in chapter 3.

• A tangent field or vector field $v$ on $M$ is the obvious thing: at each $p\in M$, it assigns a vector $v(p)\in V_p$. Notationally this is a little tricky. I guess there isn’t a symbol $V$ yet defined for which we could say a tangent field is a map $v:M\to V$. Maybe $V$ is what turns out to be called a “tangent bundle”. $v$ is smooth (meaning $C^\infty$) if it’s smooth on all smooth functions. That is, at a point $p\in M$, $v(p)\in V_p$ is a tangent vector mapping $\mathcal{F}\to\mathbb{R}$, where $\mathcal{F}$ is the set of all smooth functions $M\to\mathbb{R}$. Given $f\in\mathcal{F}$, letting $p$ vary produces a function $v(f):M\to\mathbb{R}$, $p\mapsto v(p)(f)$. If $v(f)$ is smooth whenever $f$ is, then $v$ is a smooth tangent field. What does it mean for a function $M\to\mathbb{R}$ to be smooth, though? Well, $\mathcal{F}$ is the set of such maps, so the question is better posed up there where it’s defined. Use the charts: $f$ is smooth if $f\circ\psi_\alpha^{-1}$ is $C^\infty$ for every chart $\psi_\alpha$.

• To check if a vector field $v$ is smooth, it is sufficient to check that its components in the $\partial/\partial x^\mu$ basis are all smooth.

• I don’t yet understand the import of this: a one-parameter group of diffeomorphisms $\phi_t$ is a smooth map $\mathbb{R}\times M\to M$ such that

1. $\phi_t:M\to M$ is a diffeomorphism for each $t\in\mathbb{R}$.
2. $\phi_t\circ\phi_s=\phi_{t+s}$. Thus $\phi_0={\rm id}$.

Hence this is an abelian group of diffeomorphisms in its parameter $t$.

• For fixed $p\in M$, we have $\phi_t(p):\mathbb{R}\to M$, a function of $t$, which is an orbit of $\phi_t$ ($\phi_t$ has an orbit for each $p\in M$). We can create a vector field $v$ from $\phi_t$ by taking $v(p)$ to be the tangent vector to $\phi_t(p)$ at $t=0$. This “can be thought of as the infinitesimal generator of these transformations” — of what transformations?

• On the other hand: “given a smooth vector field $v$ on $M$”, we can come up with integral curves of $v$, meaning “a family of curves in $M$ having the property that one and only one curve passes through each $p\in M$ and the tangent to this curve at $p$ is $v(p)$”.

• I guess the point is that a vector field is kind of equivalent to a set of streamlines on the manifold.

• Given vector fields $v$ and $w$, define a new vector field, their commutator $[v,w]$, by $$[v,w](f) = v[w(f)]-w[v(f)].$$ My question was: how does this definition make sense, input/output wise? Fix a point $p\in M$; then, because $v$ is a vector field, $v_p\in V_p$ is a tangent vector, which means it takes in a function $f\in\mathcal{F}$ and outputs a number in $\mathbb{R}$. The resolution is that $v(f)$ signifies $v_p(f)$ as a function of $p$, i.e. $v(f):M\to\mathbb{R}$, and this function is smooth when $v$ and $f$ are. So $v(f)\in\mathcal{F}$, and in an expression like $w[v(f)]$, $w$ is handed exactly the kind of input it wants: a smooth function on $M$.

## 2.3: Tensors; the Metric Tensor

• Measuring a magnetic field: orient a probe in some direction and get a number (field strength) in that direction. Conceivably one would have to measure all infinitely many orientations to know the field. However, the magnetic field turns out to be linear in these probe orientation vectors. Thus, it suffices to measure in three linearly independent orientations. In this sense the magnetic field is a “dual vector”, i.e. in the dual space of $\mathbb{R}^3$. This makes sense, but is it the same thing as a “pseudo-vector”? If so, can’t the electric field be described in the same way, but it’s not a pseudo-vector.

• Stress in a body: a plane with normal vector $\bf n$ passing through point $\bf p$; force per unit area $F$ in the $\bf\ell$ direction. Supposedly $F$ is linear in both $\bf n$ and $\bf\ell$. This isn’t obvious to me. This linearity leads us to define the stress tensor at $\bf p$, which is a multilinear map from [two+] vectors to a number?

• $V$ is a finite-dimensional vector space over $\Bbb R$. The dual vector space to $V$ is $V^*=\{f:V\to{\Bbb R}\mid f\text{ linear}\}$, which is a vector space, as advertised.

• Given a basis $\{v_i\}\subset V$, we can define a dual basis $\{v^{i*}\}\subset V^*$ by $v^{\mu*}(v_\nu)={\delta^{\mu}}_\nu$. This suffices as a definition because of the linearity of the maps in $V^*$ and the fact that the $\{v_i\}$ are a basis of $V$. Then ${\rm dim}\ V^*={\rm dim}\ V$.

• The double-dual $V^{**}$ is naturally isomorphic (in a coordinate-free way) to $V$. Define $\phi:V\to V^{**}$ by $\phi(v)=\left(w^*\mapsto w^*(v)\right)$. That is, $\phi(v)\in V^{**}$ is a function taking $w^*\in V^*$ and outputting a number. That number is the evaluation of $w^*$ on $v$. Now $$\phi(v_1+v_2)=\left(w^*\mapsto w^*(v_1+v_2)\right)=\left(w^*\mapsto w^*(v_1)\right)+\left(w^*\mapsto w^*(v_2)\right)=\phi(v_1)+\phi(v_2)$$ and scalar multiplication works the same way. Hence $\phi$ is linear. It’s also injective (if $w^*(v)=0$ for every $w^*$, then $v=0$), and since ${\rm dim}\ V^{**}={\rm dim}\ V$, it is a bijection.

• Let $V$ be a finite dimensional vector space over $\Bbb R$ and $V^*$ its dual. A tensor of type $(k,l)$ over $V$ is a multilinear map $$T:\underbrace{V^*\times\cdots\times V^*}_k\times\underbrace{V\times\cdots\times V}_l\to{\Bbb R}.$$ As I recall, $f$ being multilinear means it is linear in each argument separately, which leads to this sort of thing: $$f(a+b,c+d)=f(a,c)+f(a,d)+f(b,c)+f(b,d),$$ together with $f(\lambda a,c)=\lambda f(a,c)=f(a,\lambda c)$. Note that no particular dimension is being specified in this definition.
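Multilinearity is easy to check numerically. The sketch below is only an illustration: the matrix `A`, the map `f`, and the test vectors are hypothetical choices, not anything from the text. It treats $f(u,w)=u^T A w$ as a bilinear map on $\mathbb{R}^2$ and compares $f(a+b,c+d)$ with the four-term expansion.

```python
# Hypothetical example: a bilinear map f(u, w) = u^T A w on R^2.
A = [[1.0, 2.0], [3.0, 4.0]]  # arbitrary illustrative matrix


def f(u, w):
    """Bilinear form f(u, w) = sum_ij u_i A_ij w_j."""
    return sum(u[i] * A[i][j] * w[j] for i in range(2) for j in range(2))


def add(u, w):
    """Componentwise sum of two vectors in R^2."""
    return [u[0] + w[0], u[1] + w[1]]


a, b = [1.0, 0.0], [2.0, -1.0]
c, d = [0.5, 3.0], [-2.0, 1.0]

# Linearity in each slot separately gives four terms, each appearing once.
lhs = f(add(a, b), add(c, d))
rhs = f(a, c) + f(a, d) + f(b, c) + f(b, d)

# Scalars pull out of either slot.
scaled = f([3.0 * x for x in a], c)
```

Comparing `lhs` with `rhs`, and `scaled` with `3 * f(a, c)`, confirms both halves of the definition for this particular map.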

• This definition encompasses normal vectors: a $(1,0)$ tensor is a map $V^*\to{\Bbb R}$, i.e. an element of the double-dual. It also encompasses dual vectors: a $(0,1)$ tensor is a map $V\to{\Bbb R}$. $(1,1)$ is like a matrix (this is explained in some detail in the text), but why not $(2,0)$ or $(0,2)$? Those also have $n^2$ components, but only a $(1,1)$ tensor can be viewed as a linear map $V\to V$: fill the vector slot and what remains is an element of $V^{**}=V$. Also, $k$ and $l$ correspond, respectively, to the number of raised and lowered indices written on the tensor.

• The set $\mathcal{T}(k,l)$ of all tensors of type $(k,l)$ is a vector space of dimension $n^{k+l}$ where $n={\rm dim}\ V$.

• Contraction with respect to the $i$-th dual vector and $j$-th vector slots is a map $C:\mathcal{T}(k,l)\to\mathcal{T}(k-1,l-1)$. The action of the map is $$C(T)=\sum_{\sigma=1}^n T(\ldots,v^{\sigma*},\ldots;\ldots,v_\sigma,\ldots)$$ with $\{v_\sigma\}\subset V$ a basis and $\{v^{\sigma*}\}\subset V^*$ its dual basis. I guess it’s multiplying like $T^\mu x_\mu$, but does it have to be symmetric, killing off both an upper and a lower index?

• Outer product of tensors $T$ (type $(k,l)$) and $T'$ (type $(k',l')$). I guess the two tensors just multiply and the number of indices adds up directly. Yes, $T\otimes T'$ is the tensor of type $(k+k',l+l')$ given by $$(T\otimes T')(\{v^*\},\{v'^*\};\{v\},\{v'\})=T(\{v^*\};\{v\})\,T'(\{v'^*\};\{v'\}).$$

• A simple tensor is an outer product of vectors and dual vectors. It turns out that, given a basis and its dual basis, a basis for any $\mathcal{T}(k,l)$ may be constructed from the simple tensors made out of those bases.

• $V_p$ is a tangent space at point $p$ on a manifold. $V_p^*$ is the cotangent space and its members are cotangent vectors or covariant vectors. Tangent vectors (in $V_p$) are called contravariant vectors. Given coordinates, there is the basis $\{\partial/\partial x^i\}\subset V_p$. Its dual basis is $\{dx^i\}\subset V_p^*$. This means: $dx^\mu$ is the linear map defined by $dx^\mu(\partial/\partial x^\nu)={\delta^{\mu}}_\nu$.

• A covariant vector $\omega\in V_p^*$ is expressed in the basis $dx^\mu$, so its components have low indices. The components transform as $$\omega'_{\mu'}=\omega_\mu \frac{\partial x^\mu}{\partial x'^{\mu'}}.$$ In general, tensors transform as $${T'^{\mu'_1\cdots\mu'_k}}_{\nu'_1\cdots\nu'_l}={T^{\mu_1\cdots\mu_k}}_{\nu_1\cdots\nu_l}\frac{\partial x'^{\mu'_1}}{\partial x^{\mu_1}}\cdots\frac{\partial x^{\nu_l}}{\partial x'^{\nu'_l}}.$$

• Tensor field, smoothness of tensor are defined in (what seem to be) the obvious ways.

• “Intuitively, a metric is supposed to tell us the “infinitesimal squared distance” associated with an “infinitesimal displacement”.” An “infinitesimal displacement” is a tangent vector. I still don’t really grasp that connection.

• Squared distance $\rightarrow$ quadratic in displacements $\rightarrow$ metric should be a map $V_p\times V_p\to{\Bbb R}$, a tensor of type $(0,2)$ on $V_p$. The metric is also required to be symmetric ($g(v_1,v_2)=g(v_2,v_1)$) and non-degenerate (if $g(v,v')=0$ for all $v'$, then $v=0$). The metric is an inner product on $V_p$, though not necessarily a positive definite one.

• The metric $g$ can be written in terms of its components as $$g=g_{\mu\nu}\ dx^\mu\otimes dx^\nu.$$ Define the signature of the metric by taking an orthonormal basis (allowing norm to be $\pm 1$ rather than strictly $1$) and counting the number of $+1$ norms and $-1$ norms. Riemannian metrics are positive definite (signature $++\cdots+$); Lorentzian metrics have one minus and the rest plus.

## 2.4: The Abstract Index Notation

• The idea behind abstract index notation is to (1) have a compact way of writing tensor equations, while (2) avoiding reference to any particular coordinate basis. The notation looks like it’s referring to a coordinate basis, but uses Latin letters ($a,b,c$) to indicate that it’s meant to hold in any set of coordinates. ${T^{\mu\nu\lambda}}_{\sigma\rho}$ will refer to a basis component of the tensor ${T^{abc}}_{de}$.

• Contraction looks like ${T^{abc}}_{be}$, the result being a tensor of type $(2,1)$. Ah, so contraction is an operation on a tensor itself, not a result of multiplying by another entity (vector, etc.) and contracting over the multiplication index.

• The inverse of a metric tensor $g_{ab}$ exists (“because of the non-degeneracy of $g_{ab}$”: I guess non-degenerate means non-singular/invertible). Apparently it’s a tensor of type $(2,0)$; it can be denoted $(g^{-1})^{ab}$. How is this an inverse? $g_{ab}:V_p\times V_p\to{\Bbb R}$. I would expect its inverse to be a map $g^{-1}:{\Bbb R}\to V_p\times V_p$, not $g^{-1}:V^*_p\times V^*_p\to{\Bbb R}$. See comments below.

$g^{ab}g_{bc}={\delta^a}_c$ with ${\delta^a}_c:V_p\to V_p$ the identity map on $V_p$.

• This section ends with a discussion of totally symmetric and anti-symmetric tensors, which can be formed like $$T^{(ab)}=\frac{1}{2}(T^{ab}+T^{ba}).$$ The swapped indices refer to the order the tensor operates on its arguments. This is all pretty confusing to me at this point, so I will leave it alone for now. A totally anti-symmetric tensor of type $(0,l)$ is called a differential $l$-form.

• After some more thought about the metric tensor, I now understand the passages towards the end of chapter 2. First of all, I don’t know how I missed the statement that a metric is an inner product on $V_p$, but I did, and it is helpful. Second, fixing $v\in V_p$, the quantity $g(\cdot,v)$ is a map taking a vector $v'\in V_p$ and mapping it to a number; hence $g(\cdot,v)$ is effectively a dual vector (in $V_p^*$). This justifies the raising/lowering notation, $g_{ab}v^b=v_a$ with $v_a\in V_p^*$. But, then again, this could be said of any tensor of type $(0,2)$; I guess the metric tensor is the tensor chosen specially to make the connection between the tangent space and its dual.

What’s meant by the “inverse” $g^{ab}$? In what sense is it an inverse? It’s a tensor of type $(2,0)$, and fixing $\omega\in V_p^*$, the quantity $g^{ab}(\cdot,\omega)$ is a map $V_p^*\to{\Bbb R}$, i.e. an element of $V_p^{**}=V_p$. Then I guess that’s the sense of “inverse”: if $r:V_p\to V_p^*$ is the map $v\mapsto g_{ab}(\cdot,v)$ and $s:V_p^*\to V_p$ is the map $\omega\mapsto g^{ab}(\cdot,\omega)$, then $(s\circ r)(v)=v$, or $s\circ r$ is the identity map on $V_p$. We have $g^{ab}g_{bc}={\delta^a}_c$.

What is a dual vector, intuitively? In baby linear algebra, if a “normal” vector is a column vector, then a dual vector is actually a row vector. Note that $(a,b) (c,d)^T=ac+bd$, a number. Also, in the notation of quantum mechanics, a bra $\langle\psi|$ is a dual vector. The transpose, or Hermitian conjugate, is the (an?) isomorphism between normal and dual vectors. Does transpose have a coordinate-free meaning? The usual sort of quadratic form $x^T Ax\in{\Bbb R}$ features a tensor $A$ of type $(1,1)$.

## Problem 1: $S^2$ as a manifold

1. Show the overlap functions $f^\pm_i\circ(f^\pm_j)^{-1}$ are $C^\infty$, proving that $S^2$ is a manifold.

These functions are defined in section 2.1, where the charts $f^\pm_i$ are called $\psi^\pm_i$. An example suffices: consider $f^+_1\circ(f^-_2)^{-1}$. We have $f^-_2$ mapping the hemisphere $y<0$ down to a unit disk in ${\Bbb R}^2$ and $f^+_1$ also mapping the hemisphere $x>0$ down to a unit disk (squashing the $y$ and $x$ coordinate, respectively). The composed map only makes sense on $\psi^-_2(O^+_1\cap O^-_2)$, where $\psi^-_2$ is the chart from $O^-_2$ to the open unit disk $D$. On that set, we have $$(f^-_2)^{-1}(a,b)=(a,-\sqrt{1-a^2-b^2},b)\in O^+_1\cap O^-_2$$ and so $$\left(f^+_1\circ\left(f^-_2\right)^{-1}\right)(a,b)=(-\sqrt{1-a^2-b^2},b).$$ This map is $C^\infty$, because $a^2+b^2<1$ on the open disk keeps the argument of the square root strictly positive. Any other choice of overlap functions with $i\ne j$ results in a similar form. When $i=j$, either the signs agree and the overlap map is the identity, or the signs differ and the two hemispheres are disjoint, so there is nothing to check.

2. Show that two coordinate systems suffice to cover $S^2$.

My first thought is to use spherical coordinates, which map the sphere onto the open set $(0,\pi)\times(0,2\pi)$ in the $\theta\varphi$-plane. There is a seam missing from the sphere, half of a great circle from the north pole down to the south pole. I think this can be covered with some other open set and this can be made to work.

The more traditional answer here is two stereographic projections, (1) one from the north pole, which maps everything but the north pole onto a plane, and (2) one from the south pole, which maps everything but the south pole onto a plane.
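The stereographic answer can be made concrete with a small numerical sketch. The function names below are hypothetical, and the formulas are the standard stereographic projections of the unit sphere onto the plane $z=0$, which are an assumption here since the notes do not write them out. The point is that the transition map between the two charts is $(u,v)\mapsto(u,v)/(u^2+v^2)$, which is smooth away from the origin.

```python
import math


def north_chart(p):
    """Stereographic projection from the north pole (0, 0, 1); undefined there."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))


def south_chart(p):
    """Stereographic projection from the south pole (0, 0, -1); undefined there."""
    x, y, z = p
    return (x / (1.0 + z), y / (1.0 + z))


def north_chart_inverse(q):
    """Map plane coordinates (u, v) back to the point of S^2 they came from."""
    u, v = q
    s = u * u + v * v
    return (2.0 * u / (s + 1.0), 2.0 * v / (s + 1.0), (s - 1.0) / (s + 1.0))


def transition(q):
    """south_chart o north_chart^{-1}, defined for (u, v) != (0, 0)."""
    return south_chart(north_chart_inverse(q))


q = (0.3, -1.2)                      # arbitrary point in the north chart
point = north_chart_inverse(q)       # lies on the unit sphere
s = q[0] ** 2 + q[1] ** 2
expected = (q[0] / s, q[1] / s)      # claimed closed form of the transition map
result = transition(q)
```

Here `result` agrees with `expected`, and `north_chart(point)` recovers `q`, so the two charts overlap smoothly everywhere except at the two poles, each of which is covered by the other chart.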

## Problem 3: Commutator vector field

1. Given vector fields $v$ and $w$, verify that the commutator $[v,w]$ satisfies the linearity and Leibniz properties, so that it defines a vector field.

First, one has to make sense of this symbol $$[v,w](f)=v[w(f)]-w[v(f)],$$ which at first appears to make no sense. Fixing $p\in M$, the vector field $w$ at $p$ is a tangent vector, a map ${\cal F}\to{\Bbb R}$, so $w(f)$ evaluated at $p$ is just a number, and $v$ cannot act on a number. Thus, a pointwise interpretation of the symbols is meaningless.

Apparently, the key is the stuff about orbits and streamlines. If $v$ is a vector field and $f\in{\cal F}$, then we can define a function $v_f:M\to\mathbb{R}$ by $$v_f(p)=\left.\frac{d}{dt}(f\circ\gamma)(t)\right|_{t=0}$$ with $\gamma(t)$ any curve with $\gamma(0)=p$ and $\gamma'(0)=v(p)$. This is just $v(p)$ applied to $f$, viewed as a function of $p$. See also derivations and Lie brackets.

Then $[v,w](f)$ should be interpreted as $v(w_f)-w(v_f)$; at $p\in M$, this defines a vector $[v,w]$ in $V_p$. Note that, doing the straightforward expansion of the product rule, we have $$v_{fg}=\left.\frac{d}{dt}((fg)\circ\gamma)(t)\right|_{t=0}=v_f g+fv_g$$ so that \begin{multline} [v,w](fg)=v(w_f g+fw_g)-w(v_f g+fv_g) \\
=v(w_f)g+w_f v_g+v_f w_g+fv(w_g)-w(v_f)g-v_f w_g-w_f v_g-fw(v_g) \\
=v(w_f)g+fv(w_g)-w(v_f)g-fw(v_g) \\
=f[v,w](g)+[v,w](f)g. \end{multline} Here, we’ve used the linearity and Leibniz property of the vector fields $v$ and $w$, and successfully shown that $[v,w]$ has the Leibniz property. Linearity of $[v,w]$ is easy.

2. Show that the Jacobi identity holds: $[[X,Y],Z]+[[Y,Z],X]+[[Z,X],Y]=0.$

There’s nothing tricky here.

3. Let $\{Y_1,\ldots,Y_n\}$ be a smooth vector field basis on manifold $M$. Define ${C^\gamma}_{\alpha\beta}$ by $$[Y_\alpha,Y_\beta]={C^\gamma}_{\alpha\beta}Y_\gamma.$$ Use the Jacobi identity to derive an equation satisfied by the $C$ values.

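Write \begin{multline} 0=[[Y_\alpha,Y_\beta],Y_\gamma]+[[Y_\beta,Y_\gamma],Y_\alpha]+[[Y_\gamma,Y_\alpha],Y_\beta] \\
=[{C^\delta}_{\alpha\beta}Y_\delta,Y_\gamma]+[{C^\delta}_{\beta\gamma}Y_\delta,Y_\alpha]+[{C^\delta}_{\gamma\alpha}Y_\delta,Y_\beta]. \end{multline} If the $C$ are constants, they can be pulled out of the commutators, and we have further $$0=({C^\delta}_{\alpha\beta}{C^\epsilon}_{\delta\gamma}+{C^\delta}_{\beta\gamma}{C^\epsilon}_{\delta\alpha}+{C^\delta}_{\gamma\alpha}{C^\epsilon}_{\delta\beta})Y_\epsilon.$$ Because the $\{Y_\epsilon\}$ are a basis, and hence linearly independent, we have that $${C^\delta}_{\alpha\beta}{C^\epsilon}_{\delta\gamma}+{C^\delta}_{\beta\gamma}{C^\epsilon}_{\delta\alpha}+{C^\delta}_{\gamma\alpha}{C^\epsilon}_{\delta\beta}=0.$$ In general the $C$ are functions on $M$, and for a function $h$ the Leibniz rule gives $[hX,Y]=h[X,Y]-Y(h)X$. Each commutator then picks up a derivative term, and the identity becomes $${C^\delta}_{\alpha\beta}{C^\epsilon}_{\delta\gamma}+{C^\delta}_{\beta\gamma}{C^\epsilon}_{\delta\alpha}+{C^\delta}_{\gamma\alpha}{C^\epsilon}_{\delta\beta}-Y_\gamma({C^\epsilon}_{\alpha\beta})-Y_\alpha({C^\epsilon}_{\beta\gamma})-Y_\beta({C^\epsilon}_{\gamma\alpha})=0.$$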

## Problem 4: More on commutator vector fields

1. Compute the components of $[v,w]$ in a coordinate basis $\{\partial_\mu\}$.

In the coordinate basis, we have $v=v^\mu\partial_\mu$ and $w=w^\mu\partial_\mu$. Then, acting on a test function $f$, \begin{multline} [v,w](f)=v(w(f))-w(v(f))=v^\nu\partial_\nu(w^\mu\partial_\mu f)-w^\nu\partial_\nu(v^\mu\partial_\mu f) \\
=v^\nu(\partial_\nu w^\mu)\partial_\mu f+v^\nu w^\mu\partial_\nu\partial_\mu f-w^\nu v^\mu\partial_\nu\partial_\mu f-w^\nu(\partial_\nu v^\mu)\partial_\mu f. \end{multline}

Because partial derivatives commute, $\partial_\mu\partial_\nu=\partial_\nu\partial_\mu$, the two middle terms cancel. This leaves us with $$[v,w]=\left(v^\nu\partial_\nu w^\mu-w^\nu\partial_\nu v^\mu\right)\partial_\mu$$ and therefore $$[v,w]^\mu=v^\nu\partial_\nu w^\mu-w^\nu\partial_\nu v^\mu.$$
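The component formula can be sanity-checked numerically. The sketch below is a minimal illustration under stated assumptions: the helper names `partial` and `commutator`, the three sample fields on $\mathbb{R}^2$, and the use of central finite differences in place of exact derivatives are all choices made here, not part of the problem.

```python
def partial(func, point, axis, h=1e-5):
    """Central finite-difference estimate of d(func)/d(x^axis) at point."""
    up = list(point)
    dn = list(point)
    up[axis] += h
    dn[axis] -= h
    return (func(up) - func(dn)) / (2.0 * h)


def commutator(v, w, point):
    """Components [v,w]^mu = v^nu d_nu w^mu - w^nu d_nu v^mu at point.

    v and w are lists of component functions, each taking a coordinate list.
    """
    n = len(point)
    return [
        sum(
            v[nu](point) * partial(w[mu], point, nu)
            - w[nu](point) * partial(v[mu], point, nu)
            for nu in range(n)
        )
        for mu in range(n)
    ]


# Illustrative fields on R^2 (hypothetical choices, not from the text):
rotation = [lambda x: -x[1], lambda x: x[0]]  # -y d_x + x d_y
dilation = [lambda x: x[0], lambda x: x[1]]   #  x d_x + y d_y
d_x = [lambda x: 1.0, lambda x: 0.0]          #  d_x

p = [0.7, -0.4]
rot_dil = commutator(rotation, dilation, p)  # rotations commute with dilations
dx_rot = commutator(d_x, rotation, p)        # [d_x, -y d_x + x d_y] = d_y
```

With these fields, `rot_dil` comes out as approximately $(0,0)$ and `dx_rot` as approximately $(0,1)$, matching what the formula gives by hand.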

## Problem 6: Dual basis

1. Let $\{v_1,\ldots,v_n\}$ be a basis of vector space $V$. Define dual vectors $f_i\in V^*$ by $f_i(v_j)=\delta_{ij}$. Show that the $\{f_i\}$ constitute a basis of $V^*$.

First of all, are we free to define such $\{f_i\}$? Yes, there is a lot of freedom in defining dual vectors; they must just be linear functions $V\to{\Bbb R}$. I think about $f_i$ as a row vector with all zeros except a $1$ at the $i$-th position, while $v_i$ would be the transpose of $f_i$.

First, consider a linear combination $\sum_i c_i f_i=0$. If we evaluate this dual vector at $v_j$, then we find $$0=\sum_i c_i f_i(v_j)=c_j.$$ Hence the coefficients of such a combination must all be zero, and so the $\{f_i\}$ are linearly independent.

To show that the $\{f_i\}$ span $V^*$, consider an arbitrary $g\in V^*$. Observe that the dual vector $$g(v_1)f_1+\cdots+g(v_n)f_n$$ agrees with $g$ on each of the inputs $v_i$. Furthermore, because $g\in V^*$, it is linear and hence completely determined by its action on a basis of $V$. Therefore we have succeeded in expressing an arbitrary element of $V^*$ as a linear combination of the $\{f_i\}$, and so the $\{f_i\}$ are a basis of $V^*$.

2. With $\{v_i\}$ a basis of $V$ and $\{f_i\}$ its dual basis of $V^*$, compute the components of an arbitrary vector in each basis.

As shown above, arbitrary $g\in V^*$ may be expressed as $$g=\sum_i g(v_i)f_i.$$ Similarly, the coefficients of $w\in V$ in the $\{v_i\}$ basis are given by the dual basis acting on $w$, and we have $$w=\sum_i f_i(w)v_i.$$
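A concrete two-dimensional instance may help. In the sketch below the basis `v`, the vector `w`, and the helper `pair` are hypothetical choices; the dual basis is represented as row vectors, taken to be the rows of the inverse of the matrix whose columns are $v_1,v_2$ (for a basis that is not orthonormal these are not simply the transposes of the $v_i$).

```python
v = [(1.0, 1.0), (0.0, 2.0)]  # illustrative basis v_1, v_2 of R^2

# Rows of the inverse of the matrix whose columns are v_1 and v_2.
det = v[0][0] * v[1][1] - v[1][0] * v[0][1]
f = [
    (v[1][1] / det, -v[1][0] / det),   # f_1 as a row vector
    (-v[0][1] / det, v[0][0] / det),   # f_2 as a row vector
]


def pair(row, column):
    """Evaluate a dual vector (row) on a vector (column)."""
    return row[0] * column[0] + row[1] * column[1]


# f_i(v_j) should be the Kronecker delta.
delta = [[pair(f[i], v[j]) for j in range(2)] for i in range(2)]

# Components of an arbitrary w in the v basis are f_i(w).
w = (3.0, -1.0)
coeffs = [pair(f[i], w) for i in range(2)]
rebuilt = tuple(sum(coeffs[i] * v[i][k] for i in range(2)) for k in range(2))
```

Here `delta` is the identity matrix and `rebuilt` reproduces `w`, which is the statement $w=\sum_i f_i(w)v_i$ for this example.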

3. Show that tensor contraction is independent of basis.

I didn’t really understand what was going on in tensor contraction when I took notes on it above. I have a better understanding now. The idea is that a contraction of tensor $T$ looks like $${T^{\alpha\beta\gamma}}_{\beta\delta}={U^{\alpha\gamma}}_\delta,$$ producing a new tensor $U$ with rank smaller by one in both covariant and contravariant indices.

With a coordinate basis of $V$ (on which the tensor is defined), a tensor $T$ is written in components as $$T={T^{\mu_1\cdots\mu_k}}_{\nu_1\cdots\nu_l}\ \frac{\partial}{\partial x^{\mu_1}}\otimes\cdots\otimes\frac{\partial}{\partial x^{\mu_k}}\otimes dx^{\nu_1}\otimes\cdots\otimes dx^{\nu_l}$$ where the big tensor product is a basis vector in the tensor space, an outer product of the various basis vectors of $V$ and dual basis vectors of $V^*$. This makes it fairly obvious what happens under a change of basis. If the basis for the $i$-th copy of $V$ changes, $T$ needs to swap out a $\partial/\partial x^{\mu_i}$, in its basis expansion, for a $\partial/\partial x'^{\mu'_i}$. Since $\partial/\partial x^{\mu_i}=(\partial x'^{\mu'_i}/\partial x^{\mu_i})\,\partial/\partial x'^{\mu'_i}$, that introduces a chain rule factor of $\partial x'^{\mu'_i}/\partial x^{\mu_i}$ (and a sum over $\mu_i$) in the new components.

Now, a contraction of $T$, denoted $CT$, looks like (Eq. 2.3.4) $${(CT)^{\mu_1\cdots\mu_{k-1}}}_{\nu_1\cdots\nu_{l-1}}=\sum_{\sigma=1}^n {T^{\mu_1\cdots\sigma\cdots\mu_{k-1}}}_{\nu_1\cdots\sigma\cdots\nu_{l-1}}.$$ If the basis changes in the $\sigma$ spot (both basis and dual basis), then the old components of $T$ are related to the new components $T'$ by one chain rule factor for each of the two slots, and the contraction becomes $${(CT)^{\mu_1\cdots\mu_{k-1}}}_{\nu_1\cdots\nu_{l-1}}=\sum_{\sigma,\sigma',\sigma''} {T'^{\mu_1\cdots\sigma'\cdots\mu_{k-1}}}_{\nu_1\cdots\sigma''\cdots\nu_{l-1}}\ \frac{\partial x^\sigma}{\partial x'^{\sigma'}}\frac{\partial x'^{\sigma''}}{\partial x^\sigma}.$$ By the chain rule, $$\sum_\sigma\frac{\partial x'^{\sigma''}}{\partial x^\sigma}\frac{\partial x^\sigma}{\partial x'^{\sigma'}}=\frac{\partial x'^{\sigma''}}{\partial x'^{\sigma'}}={\delta^{\sigma''}}_{\sigma'}.$$ Hence $${(CT)^{\mu_1\cdots\mu_{k-1}}}_{\nu_1\cdots\nu_{l-1}}=\sum_{\sigma'=1}^n {T'^{\mu_1\cdots\sigma'\cdots\mu_{k-1}}}_{\nu_1\cdots\sigma'\cdots\nu_{l-1}}={(CT)'^{\mu_1\cdots\mu_{k-1}}}_{\nu_1\cdots\nu_{l-1}}$$ and $(CT)'=CT$: the contraction is independent of choice of basis.
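For a tensor of type $(1,1)$ the statement reduces to the familiar fact that the trace of a matrix is unchanged by a similarity transformation. The following sketch checks this numerically; the component matrix `T`, the Jacobian `J`, and the helper names are hypothetical values chosen only for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def contract(t):
    """Contraction of a (1,1) tensor: sum over the repeated index."""
    return sum(t[s][s] for s in range(len(t)))


T = [[2.0, -1.0], [0.5, 3.0]]  # components T^mu_nu in the old basis
J = [[1.0, 2.0], [-1.0, 1.0]]  # J[mu'][mu] = dx'^{mu'}/dx^mu, assumed constant

# T'^{mu'}_{nu'} = J^{mu'}_mu T^mu_nu (J^{-1})^nu_{nu'}
T_new = matmul(matmul(J, T), inverse_2x2(J))

old_trace = contract(T)
new_trace = contract(T_new)
```

The individual entries of `T_new` differ from those of `T`, but `old_trace` and `new_trace` agree, as the chain rule argument predicts.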

## Problem 7: Orthonormal bases

1. Let $V$ be an $n$-dimensional vector space with metric $g$. Show that it’s always possible to find an orthonormal basis of $V$ (with vectors of norm $\pm 1$).

Let $\{v_1,\ldots,v_n\}$ be a basis of $V$. These vectors are not necessarily normalized or mutually orthogonal. Define $$u_1=\frac{v_1}{\sqrt{|g(v_1,v_1)|}}$$ so that $g(u_1,u_1)=\pm 1$ (this needs $g(v_1,v_1)\ne 0$; see the caveat at the end). Next, we want to take $v_2$ and produce another normalized vector, $u_2$, which is orthogonal to $u_1$. We can do this by considering $$v'_2=v_2-\alpha u_1.$$ In order to have $v'_2$ be orthogonal to $u_1$, we must have $g(v'_2,u_1)=0$. Then applying $g(\cdot,u_1)$, we find that $$\alpha=\frac{g(v_2,u_1)}{g(u_1,u_1)}.$$

Now, define $u_2=v'_2/\sqrt{|g(v'_2,v'_2)|}$ and we have a set $\{u_1,u_2\}$ of mutually orthogonal, normalized vectors.

Assuming that we have generated a set $\{u_1,\ldots,u_{k-1}\}\subset V$ of mutually orthogonal, normalized vectors, we augment our set by considering $$v'_k=v_k-\alpha_{k-1}u_{k-1}-\cdots-\alpha_1 u_1$$ and then determining the coefficients $\alpha_i=g(v_k,u_i)/g(u_i,u_i)$ by applying $g(\cdot,u_i)$. Then $u_k=v'_k/\sqrt{|g(v'_k,v'_k)|}$ is another normalized vector, mutually orthogonal to all of the earlier ones. In this way, we construct an orthonormal basis of $V$.

One caveat: for an indefinite metric some $v'_k$ may be null, $g(v'_k,v'_k)=0$, and the normalization fails. Non-degeneracy rescues the argument. The metric restricted to the orthogonal complement of $\{u_1,\ldots,u_{k-1}\}$ is still non-degenerate, and a non-degenerate symmetric form cannot vanish on every vector, since by polarization $g(v,w)=\frac{1}{2}[g(v+w,v+w)-g(v,v)-g(w,w)]$. So a non-null vector in that complement can always be chosen in place of $v'_k$.
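The procedure above can be sketched in a few lines. This is a minimal illustration, assuming no null vector shows up along the way (the code raises an error if one does); the function names, the two-dimensional Lorentzian form `minkowski`, and the starting basis are hypothetical choices.

```python
import math


def gram_schmidt(basis, g):
    """Orthonormalize `basis` with respect to the symmetric bilinear form g(u, w).

    Assumes no intermediate vector is null (g(v', v') != 0); raises otherwise.
    """
    ortho = []
    for vec in basis:
        w = list(vec)
        for u in ortho:
            alpha = g(vec, u) / g(u, u)  # g(u, u) is +1 or -1
            w = [wi - alpha * ui for wi, ui in zip(w, u)]
        norm2 = g(w, w)
        if abs(norm2) < 1e-12:
            raise ValueError("null vector encountered; choose a different basis")
        ortho.append([wi / math.sqrt(abs(norm2)) for wi in w])
    return ortho


def minkowski(u, w):
    """Two-dimensional Lorentzian form with signature (-, +)."""
    return -u[0] * w[0] + u[1] * w[1]


basis = [[2.0, 1.0], [1.0, 3.0]]  # v_1 is timelike, so u_1 has norm -1
u1, u2 = gram_schmidt(basis, minkowski)
```

For this basis the result has $g(u_1,u_1)=-1$, $g(u_2,u_2)=+1$, and $g(u_1,u_2)=0$, so the signature can be read off directly from the norms.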

## Problem 8: Specific metrics

1. Flat ${\Bbb R}^3$ has a metric $g=dx^2+dy^2+dz^2$. Compute the components of the metric in spherical coordinates.

The metric is $$g=g_{\mu\nu}\, dx^\mu dx^\nu=dx^2+dy^2+dz^2.$$ In order to express it in spherical coordinates, which I’ll write $y^\mu=(r,\theta,\phi)$ for clarity of notation, we have to use the chain rule to do something like $$g=(\cdots)_{\mu\nu}\ dy^\mu dy^\nu=\left(g_{\alpha\beta}\frac{\partial x^\alpha}{\partial y^\mu}\frac{\partial x^\beta}{\partial y^\nu}\right)\ dy^\mu dy^\nu=h_{\mu\nu}\ dy^\mu dy^\nu.$$ Now the spherical metric components are $h_{\mu\nu}$ as defined above, and they can be computed easily (making reference to the parametrization $x=r\sin\theta\cos\phi$, $y=r\sin\theta\sin\phi$, $z=r\cos\theta$): $$h_{rr}=h_{11}=g_{\alpha\beta}\frac{\partial x^\alpha}{\partial r}\frac{\partial x^\beta}{\partial r}=\left(\frac{\partial x}{\partial r}\right)^2+\left(\frac{\partial y}{\partial r}\right)^2+\left(\frac{\partial z}{\partial r}\right)^2=1,$$ $$h_{\theta\theta}=h_{22}=\left(\frac{\partial x}{\partial\theta}\right)^2+\left(\frac{\partial y}{\partial\theta}\right)^2+\left(\frac{\partial z}{\partial\theta}\right)^2=r^2,$$ $$h_{\phi\phi}=h_{33}=\left(\frac{\partial x}{\partial\phi}\right)^2+\left(\frac{\partial y}{\partial\phi}\right)^2+\left(\frac{\partial z}{\partial\phi}\right)^2=r^2\sin^2\theta.$$ The off-diagonal terms of the metric all magically vanish, as, for example \begin{multline} h_{12}=\frac{\partial x}{\partial r}\frac{\partial x}{\partial\theta}+\frac{\partial y}{\partial r}\frac{\partial y}{\partial\theta}+\frac{\partial z}{\partial r}\frac{\partial z}{\partial\theta} \\
=r\cos^2\phi\sin\theta\cos\theta+r\sin^2\phi\sin\theta\cos\theta-r\sin\theta\cos\theta=0. \end{multline} Hence the metric in spherical coordinates may be written $$g=h_{\mu\nu}dy^\mu dy^\nu=dr^2+r^2 d\theta^2+r^2\sin^2\theta\ d\phi^2.$$ The problem stated in the book gives the inverse parametrization: $r(x,y,z)$, etc. Can that be used in a direct way to compute the metric? I don’t see how.
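The component formula $h_{\mu\nu}=g_{\alpha\beta}\,\partial x^\alpha/\partial y^\mu\,\partial x^\beta/\partial y^\nu$ is easy to check numerically at a sample point. The sketch below is an illustration only: the function names, the sample point, and the use of central finite differences for the Jacobian are assumptions made here.

```python
import math


def cartesian(r, theta, phi):
    """Parametrization x = r sin(theta) cos(phi), y = r sin(theta) sin(phi), z = r cos(theta)."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def jacobian(y, h=1e-6):
    """Return d[m][a] = dx^a/dy^m, estimated with central differences."""
    rows = []
    for m in range(3):
        up = list(y)
        dn = list(y)
        up[m] += h
        dn[m] -= h
        xu, xd = cartesian(*up), cartesian(*dn)
        rows.append([(xu[a] - xd[a]) / (2.0 * h) for a in range(3)])
    return rows


def spherical_metric(y):
    """h_{mn} = sum_a (dx^a/dy^m)(dx^a/dy^n), using the flat metric delta_ab."""
    d = jacobian(y)
    return [[sum(d[m][a] * d[n][a] for a in range(3)) for n in range(3)] for m in range(3)]


r, theta, phi = 2.0, 0.9, 0.4  # arbitrary sample point
h_metric = spherical_metric((r, theta, phi))
```

At this point the diagonal of `h_metric` is approximately $(1,\ r^2,\ r^2\sin^2\theta)$ and the off-diagonal entries vanish to within the finite-difference error.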

2. The metric of spacetime in special relativity is $g=-dt^2+dx^2+dy^2+dz^2$. With the coordinates \begin{eqnarray} t'&=&t \\
x'&=&\sqrt{x^2+y^2}\cos(\phi-\omega t) \\
y'&=&\sqrt{x^2+y^2}\sin(\phi-\omega t) \\
z'&=&z \end{eqnarray} and $\tan\phi=y/x$, compute the metric components $g_{\mu\nu}$ and $g^{\mu\nu}$.

First of all, it’s natural to switch the non-rotating frame to polar coordinates $x=r\cos\phi$ and $y=r\sin\phi$, in which case the metric becomes $$g=g_{\mu\nu}dx^\mu dx^\nu=-dt^2+dr^2+r^2 d\phi^2+dz^2$$ where $x^\mu=(t,r,\phi,z)$. Then the rotating coordinates are simply $x'=r\cos(\phi-\omega t)$ and $y'=r\sin(\phi-\omega t)$. The lower indices of the metric transform as $$h_{\mu\nu}=g_{\alpha\beta}\frac{\partial x^\alpha}{\partial y^\mu}\frac{\partial x^\beta}{\partial y^\nu}$$ with $y^\mu=(t',x',y',z')$, but this is an awkward thing to compute: we don’t have $x^\mu$ explicitly in terms of $y^\mu$; we have $y^\mu$ explicitly in terms of $x^\mu$.

There are at least two ways to proceed from here: (1) we can compute the derivative matrix $\frac{\partial y^\mu}{\partial x^\nu}$ and invert it to get the values $\frac{\partial x^\mu}{\partial y^\nu}$; (2) we can compute the upper components $h^{\mu\nu}$, which involve the derivatives we have, and then invert that matrix to get $h_{\mu\nu}$.

The second way is easier, so I’ll do it first. The transformation of the upper components can be seen from: $$g=g^{\mu\nu}\frac{\partial}{\partial x^\mu}\frac{\partial}{\partial x^\nu}=g^{\alpha\beta}\frac{\partial y^\mu}{\partial x^\alpha}\frac{\partial y^\nu}{\partial x^\beta}\frac{\partial}{\partial y^\mu}\frac{\partial}{\partial y^\nu}$$ so that $$h^{\mu\nu}=g^{\alpha\beta}\frac{\partial y^\mu}{\partial x^\alpha}\frac{\partial y^\nu}{\partial x^\beta}.$$ The metric components $g^{\mu\nu}$ are the inverse of $g_{\mu\nu}$, which is easy to compute because $g_{\mu\nu}$ is diagonal. We have $$g=g^{\mu\nu}\frac{\partial}{\partial x^\mu}\frac{\partial}{\partial x^\nu}=-\left(\frac{\partial}{\partial t}\right)^2+\left(\frac{\partial}{\partial r}\right)^2+\frac{1}{r^2}\left(\frac{\partial}{\partial\phi}\right)^2+\left(\frac{\partial}{\partial z}\right)^2.$$ Now, for example, \begin{multline} h^{x'x'}=-\left(\frac{\partial x'}{\partial t}\right)^2+\left(\frac{\partial x'}{\partial r}\right)^2+\frac{1}{r^2}\left(\frac{\partial x'}{\partial\phi}\right)^2+\left(\frac{\partial x'}{\partial z}\right)^2 \\
=\frac{1}{r^2}(x'^2+y'^2)-\omega^2 y'^2=1-\omega^2 y'^2. \end{multline}

There are off-diagonal terms in the new coordinates, because time is mixed in with the spatial coordinates. In the order $(t',x',y',z')$, the components are $$h^{\mu\nu}= \begin{pmatrix} -1 & -\omega y' & \omega x' & 0 \\ -\omega y' & 1-\omega^2 y'^2 & \omega^2 x' y' & 0 \\ \omega x' & \omega^2 x' y' & 1-\omega^2 x'^2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
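The remaining step, inverting $h^{\mu\nu}$ to get $h_{\mu\nu}$, can at least be checked numerically. The sketch below takes the upper components tabulated above and multiplies them by a candidate for the lower components. That candidate is an assumption of this sketch, not something computed in these notes: it corresponds to the line element $-\left(1-\omega^2(x'^2+y'^2)\right)dt'^2-2\omega y'\,dt'dx'+2\omega x'\,dt'dy'+dx'^2+dy'^2+dz'^2$, which is what one gets by substituting the inverse rotation into the flat metric. The function names and sample values are hypothetical.

```python
def upper(omega, x, y):
    """h^{mu nu} in the order (t', x', y', z'), as tabulated above."""
    return [
        [-1.0, -omega * y, omega * x, 0.0],
        [-omega * y, 1.0 - omega**2 * y**2, omega**2 * x * y, 0.0],
        [omega * x, omega**2 * x * y, 1.0 - omega**2 * x**2, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def lower(omega, x, y):
    """Candidate h_{mu nu} (an assumption of this sketch, not taken from the notes)."""
    return [
        [-1.0 + omega**2 * (x * x + y * y), -omega * y, omega * x, 0.0],
        [-omega * y, 1.0, 0.0, 0.0],
        [omega * x, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


omega, xp, yp = 0.3, 1.5, -0.8  # arbitrary sample values for omega, x', y'
product = matmul(upper(omega, xp, yp), lower(omega, xp, yp))
```

If the candidate is right, `product` is the $4\times 4$ identity matrix, i.e. $h^{\mu\nu}h_{\nu\rho}={\delta^\mu}_\rho$ at the sample point.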