This is a reference for physics at the level of AP Physics or the college physics typically taught during the first year of university. It is mostly taken from the widely used textbook “University Physics with Modern Physics”.
(Citation: Young, H., Freedman, R. & Ford, A. (2011). University Physics with Modern Physics, 13th edition. Addison-Wesley, Reading, MA.)
Understanding physics matters because much of our technology and everyday experience is built upon it.
Classical mechanics and some other fields are omitted for now. They may be added later.
A mechanical wave is a disturbance that travels through some medium. A transverse wave is a mechanical wave for which the displacements of the medium are perpendicular to the travel direction. A longitudinal wave is a mechanical wave for which the displacements of the medium are in the same direction as the travel direction. A sinusoidal wave is a periodic wave that undergoes simple harmonic motion with amplitude $A$, period $T$, frequency $f=1/T$, wavelength $\lambda$ and constant speed $v$. We have the relation
$$
v=\lambda/T=\lambda f,
$$
namely speed equals wavelength times frequency.
Wave equation. Suppose $y(0, t)=A\cos\omega t$. The motion of point $x$ at time $t$ is the same as the motion of point $x=0$ at earlier time $t- x/v$. So we have
$$
y(x, t) = A\cos\left[\omega\left(t-\frac{x}{v}\right)\right] = A\cos\left[2\pi f\left(\frac{x}{v}-t\right)\right] = A\cos\left[2\pi\left(\frac{x}{\lambda}-\frac{t}{T}\right)\right]
$$
Calling $k=2\pi/\lambda$ the wave number, we have
$$
\begin{cases}
\lambda = 2\pi/k\newline
f = \omega/2\pi\newline
v=\lambda f
\end{cases}\quad\Longrightarrow\quad \omega = vk
$$
We can then re-write the wave function as
$$
y(x, t) = A\cos(kx - \omega t).
$$
From another perspective, the wave speed is the speed at which a point of constant phase moves, i.e. at which $kx-\omega t$ stays constant. Taking the derivative we get $v=dx/dt = \omega/k$.
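As a quick numerical check of these relations, here is a short Python sketch. The wave speed and frequency ($v = 340$ m/s, $f = 170$ Hz) are hypothetical values chosen for illustration:

```python
import math

# Hypothetical wave: speed v = 340 m/s, frequency f = 170 Hz (assumed values).
v = 340.0   # wave speed, m/s
f = 170.0   # frequency, Hz

wavelength = v / f                 # lambda = v / f
k = 2 * math.pi / wavelength       # wave number, rad/m
omega = 2 * math.pi * f            # angular frequency, rad/s

# The relations above require omega = v * k, i.e. v = omega / k.
assert math.isclose(omega, v * k)
print(wavelength)   # 2.0 (m)
```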
Taking the second partial derivative of the wave function w.r.t. $t$ and $x$ respectively we get
$$
y_{t} = \omega A\sin(kx - \omega t)\quad\Rightarrow\quad y_{tt} = -\omega^2 A\cos(kx - \omega t) = -\omega^2y(x, t)
$$
$$
y_{x} = -kA\sin(kx - \omega t)\quad\Rightarrow\quad y_{xx} = -k^2A\cos(kx - \omega t) = -k^2y(x, t)
$$
so we have
$$
\boxed{y_{tt}=v^2y_{xx}}
$$
This is the wave equation.
The speed of a transverse wave on a string increases with the tension $F$ of the string and decreases with the mass per unit length $\mu$. We now derive the wave speed, and also the wave equation, using Newton’s second law $F=ma$. Consider a segment of length $\Delta x$. Suppose the force at the left end is $F_1=(F, F_{1y})$, and the force at the right end is $F_2=(F, F_{2y})$. We write $F$ for both horizontal components because there is no horizontal motion, so the two are equal. The ratios of the vertical component to the horizontal component equal, up to sign, the slopes of the string at points $x$ and $x+\Delta x$:
$$
\frac{F_{1y}}{F} = -\left(\frac{\partial y}{\partial x}\right)_x,\qquad \frac{F_{2y}}{F}= \left(\frac{\partial y}{\partial x}\right)_{x+\Delta x}.
$$
So from
$$
F_y = F_{1y} + F_{2y} = \mu\Delta x\cdot\frac{\partial^2y}{\partial t^2}
$$
we obtain
$$
\frac{\left(\frac{\partial y}{\partial x}\right)_{x+\Delta x}-\left(\frac{\partial y}{\partial x}\right)_x}{\Delta x} = \frac{\mu}{F}\frac{\partial^2y}{\partial t^2}
$$
Letting $\Delta x\to0$ we get
$$
\frac{\partial^2 y}{\partial x^2} =\frac{\mu}{F}\frac{\partial^2y}{\partial t^2}
$$
which is the wave equation. We have
$$
v = \sqrt{\frac{F}{\mu}}.
$$
Standing Waves. Note that the wave equation is linear: if $y_1(x,t)$ and $y_2(x,t)$ are solutions to the wave equation, then so is $y_1(x,t)+y_2(x,t)$. This is the principle of superposition: if two waves propagate toward each other from opposite sides, the resulting wave function is the sum of the two. In particular, if we fix the two ends of a string, like a guitar string, and pluck it, the string oscillates up and down without appearing to move in either direction; this is called a standing wave. To see this, let
$$
\begin{cases}
y_1(x, t) = -A\cos(kx + \omega t)\qquad\text{(wave traveling to the left)}\newline
y_2(x, t) = A\cos(kx - \omega t)\qquad\text{(wave traveling to the right)}\newline
\end{cases}
$$
then
$$
y(x,t)=y_1(x,t)+y_2(x,t)=A[-\cos(kx + \omega t) + \cos(kx - \omega t)]
$$
Using the trigonometric identity $\cos(a\pm b)=\cos a\cos b \mp\sin a\sin b$, we have
$$
y(x,t)=2A(\sin kx)(\sin\omega t).
$$
Positions at which $y(x,t)$ is always $0$ are called nodes; they are the $x$’s such that $\sin kx=0$:
$$
\begin{aligned}
x&=0,\frac{\pi}{k},\frac{2\pi}{k},\frac{3\pi}{k},\ldots\newline
&= 0, \frac{\lambda}{2},\frac{2\lambda}{2},\frac{3\lambda}{2},\ldots
\end{aligned}
$$
An oscillating string can have 2, 3, 4 or more nodes, depending on how it is set vibrating. Suppose the string has length $L$ and is fixed at both ends. Because each end is a node, $L$ must be a multiple of $\lambda/2$, so $L=n\cdot(\lambda/2)$. The possible wavelengths are $\lambda_n=2L/n$ for $n=1,2,3,\ldots$. With $f_n=v/\lambda_n$, we have
$$
f_1=\frac{v}{2L},\qquad f_n=nf_1\quad (n=1,2,3,\ldots)
$$
$f_1$ is called the fundamental frequency, the $f_n$ are called harmonics, and the series is called a harmonic series.
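The harmonic series can be sketched numerically. The string length and wave speed below ($L = 0.65$ m, $v = 260$ m/s) are assumed values, chosen so the fundamental comes out to a round number:

```python
# Hypothetical string: length L = 0.65 m, wave speed v = 260 m/s (assumed).
L = 0.65
v = 260.0

f1 = v / (2 * L)                                # fundamental f1 = v / 2L
harmonics = [n * f1 for n in range(1, 5)]       # f_n = n * f1
wavelengths = [2 * L / n for n in range(1, 5)]  # lambda_n = 2L / n

print(harmonics)    # ≈ [200, 400, 600, 800] Hz
print(wavelengths)  # ≈ [1.3, 0.65, 0.433, 0.325] m
```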
A normal mode is a motion in which all particles of the system move sinusoidally with the same frequency; for a string fixed at both ends, these are exactly the modes above, with wavelengths $\lambda_n=2L/n$. When a string in a musical instrument is struck or plucked, it does not produce a single normal mode but a superposition of many normal modes, dominated by the fundamental frequency $f_1$. This vibration displaces the surrounding air with the same frequencies, giving a rich, complex tone.
Sound waves. Sound waves are longitudinal waves. If the wave is sinusoidal propagating in the $x$ direction, the displacement function is $y(x, t)=A\cos(kx - \omega t)$. But we can also express the wave in terms of pressure fluctuation. Let $p(x, t)$ be the pressure difference at $x$ from atmospheric pressure $p_a$. Under Hooke’s law, stress force (bulk stress) is proportional to strain or deformation (bulk strain) that it causes. The ratio is called the bulk modulus
$$
B = \frac{\text{Bulk stress}}{\text{Bulk strain}} = -\frac{\Delta p}{\Delta V/V}.
$$
Large $B$ means a less compressible medium. Consider a cylinder with cross-sectional area $S$ and length $\Delta x$. The volume displacement is
$$
\Delta V = S(y_2 - y_1) = S[y(x+\Delta x, t) - y(x, t)]
$$
In the limit as $\Delta x \to 0$, the fractional volume change is $dV/V = \partial y/\partial x$, and the pressure fluctuation is proportional to it:
$$
p(x, t) = -B\frac{\partial y(x, t)}{\partial x}.
$$
At any time, the displacement is greatest (the partial derivative is $0$) where the pressure fluctuation is zero, and vice versa. Evaluating the derivative for the sinusoidal wave, we get $p(x, t)= BkA\sin(kx - \omega t)$. The constant $p_{\max}=BkA$ is called the pressure amplitude.
It can be derived that the speed of a sound wave is $v = \sqrt{B/\rho}$, where $\rho$ is the density of the medium (mass per unit volume). With this formula, we can calculate the speed of sound in various materials. The speed of sound is around 344 m/s in air (20℃), 1482 m/s in water (20℃), and 5941 m/s in steel.
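A sketch of this calculation for water, using a representative bulk modulus ($B \approx 2.2\times10^9$ Pa is an assumed reference value, not taken from this text):

```python
import math

# Speed of sound v = sqrt(B / rho). Assumed representative values for water:
# bulk modulus B ≈ 2.2e9 Pa, density rho = 1000 kg/m^3.
B = 2.2e9
rho = 1000.0

v = math.sqrt(B / rho)
print(round(v))   # ≈ 1483 m/s, close to the 1482 m/s quoted above for 20 C
```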
Sound waves carry energy. We can calculate the power per unit area as pressure (force per unit area) times particle velocity:
$$
p(x,t)v(x,t) = B\omega k A^2\sin^2(kx - \omega t).
$$
The intensity $I$ is defined as the time average of $p(x,t)v(x,t)$. The average value of $\sin^2$ is $1/2$, so $I=(1/2)B\omega k A^2$. With $\omega = vk$ and $v^2=B/\rho$, we can also express intensity as
$$
I = \frac{1}{2}\sqrt{\rho B}\omega^2A^2 = \frac{p_{\max}^2}{2\rho v} = \frac{p_{\max}^2}{2\sqrt{\rho B}}.
$$
The sound intensity level $\beta$ is defined on a logarithmic scale as
$$
\beta = (10\,\mathrm{dB})\log\frac{I}{I_0}
$$
where $I_0$ is a reference intensity chosen to be $10^{-12}\,\mathrm{W/m^2}$, approximately the threshold of human hearing at 1000 Hz. Intensity $10^{-11}\,\mathrm{W/m^2}$ corresponds to $10$ dB, intensity $1\,\mathrm{W/m^2}$ corresponds to $120$ dB, intensity $10^2\,\mathrm{W/m^2}$ corresponds to $140$ dB, and so on.
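The decibel formula is easy to check numerically; this sketch reproduces the correspondences just listed:

```python
import math

I0 = 1e-12   # reference intensity, W/m^2

def intensity_level_db(I):
    """Sound intensity level: beta = (10 dB) * log10(I / I0)."""
    return 10 * math.log10(I / I0)

print(intensity_level_db(1e-11))  # ≈ 10 dB
print(intensity_level_db(1.0))    # ≈ 120 dB
print(intensity_level_db(1e2))    # ≈ 140 dB
```

Because the scale is logarithmic, every factor of 10 in intensity adds 10 dB.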
Kelvin scale. The Celsius scale defines the freezing temperature of pure water as 0℃ and the boiling temperature as 100℃. However, even if two thermometers agree at 0℃ and 100℃, they may not agree on intermediate values. We would like a temperature scale that doesn’t depend on the properties of a particular material. It turns out that when we use gas thermometers to measure temperature via pressure, the plots for different gases all extrapolate to a single point, $-273.15$℃, at which the absolute pressure of the gas would become zero. The Kelvin scale uses this point as zero temperature; it is the Celsius scale shifted by 273.15:
$$
T_K = T_C + 273.15.
$$
To complete the definition, we need another fixed point, chosen to be the triple point of water, at which the solid, liquid and vapor states of water coexist; it is defined to have the value $T_{\text{triple}}=273.16\,\mathrm{K}$. Using the relation $T_2/T_1 = p_2/p_1$, $T$ is given by
$$
T = T_{\text{triple}}\cdot(p/p_{\text{triple}}) = (273.16\mathrm{K})\cdot(p/p_{\text{triple}}).
$$
Thermal expansion. For a rod with length $L_0$ at temperature $T_0$, experiments show that if $\Delta T$ is not too large, the fractional change in length is directly proportional to $\Delta T$:
$$
\Delta L / L_0 = \alpha\Delta T
$$
The constant $\alpha$ is called coefficient of linear expansion, which is typically very small, on the magnitude of $10^{-5}$. Similarly, volumes of materials can also expand or contract when temperature changes, and we also have coefficient of volume expansion for different materials.
$$
\Delta V / V = \beta\Delta T
$$
We have the relation $\beta = 3\alpha$. There are two practical phenomena to note. First, when temperature drops from 4℃ to 0℃, water increases in volume. This makes lakes freeze from top to bottom, instead of the other way around. Second, under temperature change, thermal stress can develop. Engineers must account for thermal stress when designing structures. That’s why concrete bridges usually have interlocking teeth, or long pipes have expansion joints.
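A quick worked example of linear expansion. The numbers (steel with $\alpha \approx 1.2\times10^{-5}\,/\mathrm{K}$, a 100 m span, a 30 K temperature swing) are assumed illustrative values:

```python
# Thermal expansion of a bridge span (assumed values):
# steel alpha ≈ 1.2e-5 /K, initial length L0 = 100 m, swing dT = 30 K.
alpha = 1.2e-5
L0 = 100.0
dT = 30.0

dL = alpha * L0 * dT   # Delta L = alpha * L0 * dT
print(dL)              # ≈ 0.036 m, i.e. about 3.6 cm — hence expansion joints
```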
Heat. In physics, heat is defined as energy transfer solely due to temperature difference. We can define a unit of heat based on the temperature change of a specific material. The calorie is defined as the amount of heat required to raise the temperature of 1 g of water from 14.5℃ to 15.5℃. We have the relation $1\text{ cal} = 4.186\,\mathrm{J}$. Experiments show that
$$
\Delta Q = m\cdot c\cdot \Delta T
$$
where $\Delta Q$ is heat, $m$ is mass, and the constant $c$ is called the specific heat of the material. It is calculated as
$$
c = \frac{1}{m}\frac{dQ}{dT}.
$$
If we express mass in terms of molar mass, $m=nM$, then the constant $Mc$ is called the molar heat capacity. For liquid water the value is $75.4\,\mathrm{J/mol\cdot K}$.
We know that materials undergo phase transitions when temperature changes. The heat required per unit mass to change a material from solid to liquid is called the heat of fusion. For water it is 79.6 cal/g. The process is reversible. Similarly, we have the heat of vaporization; for water it is 539 cal/g. Chemical reactions such as combustion are analogous to phase changes in that they involve definite quantities of heat. We can similarly define the heat of combustion as the heat produced by complete combustion of 1 g of material.
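Combining specific heat with the phase-change heats above gives a classic worked example: taking 1 g of ice at 0℃ all the way to steam at 100℃.

```python
# Heat to turn 1 g of ice at 0 C into steam at 100 C, using the values above:
# heat of fusion 79.6 cal/g, specific heat of water 1 cal/(g*K),
# heat of vaporization 539 cal/g.
m = 1.0            # g
L_fusion = 79.6    # cal/g
c_water = 1.0      # cal/(g*K)
L_vapor = 539.0    # cal/g

Q = m * L_fusion + m * c_water * 100.0 + m * L_vapor
print(Q)           # ≈ 718.6 cal
print(Q * 4.186)   # ≈ 3008 J
```

Note that the two phase changes dominate: vaporization alone takes more than five times the heat needed to warm the water from 0℃ to 100℃.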
Heat can be transferred by conduction, convection and radiation. For conduction, experiments show that
$$
H=\frac{dQ}{dt} = -kA\frac{dT}{dx}
$$
where $A$ is the cross-sectional area and $k$ is called the thermal conductivity; the minus sign indicates that heat flows in the direction of decreasing temperature. The thermal conductivity of dead air is very small (about $0.02\,\mathrm{W/m\cdot K}$), which is why fur keeps animals warm. By contrast, metals have large thermal conductivities (on the order of $100\,\mathrm{W/m\cdot K}$) because their free electrons carry heat, which is why metals feel colder than many other materials.
Convection is the transfer of heat by mass motion of fluids from one region to another, for example the flow of blood in the body. Radiation is transfer of heat by electromagnetic waves. Radiation requires no medium, and can still occur in vacuum. The formula for radiation is
$$
H = Ae\sigma T^4
$$
where $A$ is the surface area, $e$ is the emissivity, and $\sigma$ is the Stefan–Boltzmann constant.
We aim to derive relations among the properties of matter $p$, $V$ and $T$. Here is a simple one for a solid material, combining thermal expansion with a compressibility term $k(p-p_0)$:
$$
V = V_0[1 + \beta(T - T_0) - k(p - p_0)].
$$
An ideal gas is a model with the following assumptions: (1) no intermolecular forces; (2) elastic collisions; (3) negligible molecular volume; (4) random motion. Experiments show that
volume $V$ is proportional to the number of moles $n$;
volume $V$ varies inversely with pressure $p$ when temperature is held constant; in other words, $pV$ is constant;
pressure $p$ is proportional to absolute temperature $T$.
We then have the ideal-gas equation
$$
pV = nRT
$$
where $R=8.314\mathrm{J/mol\cdot K}$ is a constant.
From $m=nM$, where $M$ is the mass per mole (molar mass), we can calculate the gas density as $\rho=m/V=pM/RT$. When the amount of gas is held constant, we have the relation $p_1V_1/T_1 = p_2V_2/T_2 = \text{constant}$. We can, for example, use this relation to calculate the temperature in an automobile engine combustion chamber before ignition. The van der Waals equation adds two corrections to the ideal-gas equation:
$$
\left(p + \frac{an^2}{V^2}\right)(V - nb) = nRT
$$
where $a$ represents attractive intermolecular forces and $b$ represents the volume of a mole of molecules.
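A sketch comparing the two equations of state. The van der Waals constants for CO2 ($a = 0.364\,\mathrm{J\cdot m^3/mol^2}$, $b = 4.27\times10^{-5}\,\mathrm{m^3/mol}$) are standard reference values quoted here as assumptions:

```python
# Pressure of 1 mol of CO2 in a 22.4 L container at 273 K:
# ideal gas vs van der Waals (a, b for CO2 assumed as noted above).
R = 8.314    # J/(mol*K)
n = 1.0      # mol
T = 273.0    # K
V = 0.0224   # m^3
a = 0.364    # J*m^3/mol^2
b = 4.27e-5  # m^3/mol

p_ideal = n * R * T / V
p_vdw = n * R * T / (V - n * b) - a * n**2 / V**2

print(round(p_ideal))   # ≈ 1.013e5 Pa, about 1 atm
print(round(p_vdw))     # slightly lower, ≈ 1.008e5 Pa
```

At this low density the correction is only a fraction of a percent; it grows as the gas is compressed.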
For a constant amount of ideal gas, we can plot its $pV$-diagram of isotherms, or constant-temperature curves. If $T_2>T_1$, the curve for $T_2$ lies above the curve for $T_1$. The plot resembles a plot of indifference curves $u(x_1, x_2)$ for two goods in economics.
We now develop a simple kinetic-molecular model. The goal is to derive molecular speed in terms of thermal properties. Assume all molecules have the same magnitude of $x$-velocity $|v_x|$. When a molecule collides with the container wall, its $x$-momentum changes from $-m|v_x|$ to $m|v_x|$, for a total change of $2m|v_x|$. Consider an area $A$ on the wall and a time interval $dt$. The molecules that can reach $A$ within $dt$ lie in a cylinder of volume $A|v_x|dt$. Assuming the number of molecules per unit volume $N/V$ is uniform, the number of molecules in the cylinder is $(N/V)(A|v_x|dt)$. On average, half of these molecules are moving toward the wall and half are moving away from it, so the number of collisions with $A$ during $dt$ is half that number. The total momentum change $dP_x$ during $dt$ is the number of collisions times $2m|v_x|$, which is $NAmv_x^2dt/V$, so $F=dP_x/dt=NAmv_x^2/V$. The pressure is then the force per unit area, $p=F/A$. We assume that on average the three components of the velocity $(v_x, v_y, v_z)$ are equal in magnitude, so $v_x^2=\frac{1}{3}v^2$. We now have
$$
pV = \frac{1}{3}Nmv^2 = \frac{2}{3}N\left[\frac{1}{2}mv^2\right].
$$
The term in the bracket is the average translational kinetic energy of a single molecule. So its product with $N$ is the total translational kinetic energy of all molecules. Denote this quantity by $K_{\mathrm{tr}}$, we have $pV = \frac{2}{3}K_{\mathrm{tr}}$. From the ideal gas equation
$$
pV = nRT
$$
we get $K_{\mathrm{tr}} = \frac{3}{2}nRT$. The kinetic energy is directly proportional to temperature $T$. On a per molecule basis, we have
$$
\frac{1}{2}mv^2 = \frac{3}{2}kT
$$
where $k=R/N_A = 1.381\cdot10^{-23}\,\mathrm{J/K}$ is the Boltzmann constant and $N_A$ is Avogadro’s number, with $N=nN_A$. With $M= N_Am$, we get
$$
\frac{1}{2}Mv^2 = \frac{3}{2}RT.
$$
From these equations, we can get the formula for molecular speed, called root-mean-square speed
$$
v_{\mathrm{rms}} = \sqrt{v^2} = \sqrt{\frac{3kT}{m}} = \sqrt{\frac{3RT}{M}}.
$$
For a given temperature $T$, the smaller the mass, the greater the speed. Hydrogen has the smallest mass per mole, $M=2\,\mathrm{g/mol}$, so hydrogen molecules are the fastest; enough of them reach earth’s escape speed (about $1.12\cdot10^4\,\mathrm{m/s}$) that over time hydrogen has largely escaped, which is why there is little hydrogen in the earth’s atmosphere.
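The rms-speed formula is easy to evaluate; here is a sketch at an assumed room temperature of 300 K:

```python
import math

R = 8.314   # J/(mol*K)
T = 300.0   # K, roughly room temperature (assumed)

def v_rms(M):
    """Root-mean-square speed v_rms = sqrt(3RT/M), with M in kg/mol."""
    return math.sqrt(3 * R * T / M)

print(round(v_rms(0.002)))   # H2 (2 g/mol):  ≈ 1934 m/s
print(round(v_rms(0.032)))   # O2 (32 g/mol): ≈ 484 m/s
```

The factor-of-four speed difference comes from the factor-of-sixteen mass ratio under the square root.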
Heat capacity. In the ideal gas model, $K_{\mathrm{tr}}$ is assumed to represent the total molecular energy, so $dQ=dK_{\mathrm{tr}}$. We can use this relationship to derive heat capacity:
$$
dQ = nCdT = \frac{3}{2}nRdT \quad\Rightarrow\quad C=\frac{3}{2}R = 12.47\mathrm{J/mol\cdot K}.
$$
The prediction is accurate for monatomic gases like He and Ar, but diatomic gases like $\mathrm{H_2}$ and $\mathrm{O_2}$ and polyatomic gases like $\mathrm{CO_2}$ and $\mathrm{SO_2}$ have larger heat capacities, because in addition to translational motion, these molecules also undergo rotational and vibrational motions. The principle of equipartition of energy says that each velocity component (either linear or angular) has, on average, an associated kinetic energy per molecule of $\frac{1}{2}kT$. Adding two rotational components, we have $C=\frac{5}{2}R=20.79\,\mathrm{J/mol\cdot K}$ for diatomic gases. For solid materials, with a 3d spring model, each atom has an average kinetic energy of $\frac{3}{2}kT$ and an average potential energy of $\frac{3}{2}kT$, for a total of $3kT$ per atom. The heat capacity is predicted to be $C=3R\approx25\,\mathrm{J/mol\cdot K}$. This is called the rule of Dulong and Petit. Experimental data show that the heat capacities of many materials do approach this value as $T$ increases.
The Maxwell–Boltzmann Distribution. The uniform speed assumption in our simple model can be relaxed. Using techniques in statistical mechanics, the distribution of molecular speeds can be derived as
$$
f(v) = 4\pi\left(\frac{m}{2\pi kT}\right)^{3/2}v^2e^{-mv^2/2kT}.
$$
The average speed is then
$$
v_{\mathrm{av}} = \int_0^\infty vf(v)dv = \sqrt{\frac{8kT}{\pi m}}
$$
and the average of $v^2$ is
$$
v^2_{\mathrm{av}} = \int_0^\infty v^2f(v)dv = \frac{3kT}{m}.
$$
This result agrees with our earlier value of $v_{\mathrm{rms}}$.
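These averages can be checked by direct numerical integration of $f(v)$. The temperature and molecular mass below (300 K, $m \approx 4.65\times10^{-26}$ kg, roughly one N2 molecule) are assumed values:

```python
import math

# Numerically check the Maxwell–Boltzmann averages (assumed T and m as noted).
kB = 1.381e-23   # Boltzmann constant, J/K
T = 300.0        # K
m = 4.65e-26     # kg, roughly one N2 molecule

def f(v):
    a = m / (2 * math.pi * kB * T)
    return 4 * math.pi * a**1.5 * v**2 * math.exp(-m * v**2 / (2 * kB * T))

# Simple Riemann-sum integration; f is negligible beyond 5000 m/s at this T.
dv = 0.5
vs = [i * dv for i in range(1, int(5000 / dv))]
norm = sum(f(v) * dv for v in vs)
v_av_num = sum(v * f(v) * dv for v in vs)
v2_av_num = sum(v * v * f(v) * dv for v in vs)

print(round(norm, 4))                                                 # ≈ 1.0: f is normalized
print(round(v_av_num), round(math.sqrt(8 * kB * T / (math.pi * m))))  # numeric vs closed form
print(round(math.sqrt(v2_av_num)), round(math.sqrt(3 * kB * T / m)))  # v_rms both ways
```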
Consider a cylinder with a piston. When the piston moves out a small distance $dx$, the work done $dW$ by the force $F=pA$ is
$$
dW = F\,dx = pA\,dx = p\,dV
$$
and so
$$
W = \int_{V_1}^{V_2}p\,dV.
$$
In other words, to find the work done, we integrate the pressure curve $p$ as a function of $V$ in the $pV$-diagram. However, from an initial state $(p_1, V_1, T_1)$ to a final state $(p_2, V_2, T_2)$, there can be many possible paths, and each path may have a different area under the curve. So the work done by the system depends not only on the initial and final states, but also on the path.
The internal energy $U$ of a system is the sum of the kinetic and potential energies of all its individual particles. The first law of thermodynamics states that the change in internal energy $\Delta U$ equals the heat $Q$ added to the system minus the work $W$ done by the system on its surroundings
$$
U_2 - U_1 = \Delta U = Q - W.
$$
Although $W$ depends on the path, experiments show that $\Delta U$ is independent of path. In differential form, the equation is $dU = dQ - dW$.
Special cases of the equation:
an adiabatic process is one with no heat transfer in and out of a system: $Q=0$. In this case $\Delta U = -W$.
an isochoric process is a constant-volume process. When the volume is constant, the system does no work on its surroundings, so $W=0$ and $\Delta U = Q$.
an isobaric process is a constant-pressure process. In this case the work done is $W=p(V_2 - V_1)$.
an isothermal process is a constant-temperature process. In general, none of the quantities $\Delta U$, $Q$ or $W$ is zero in an isothermal process.
For an ideal gas, the internal energy depends only on temperature $T$, not on volume or pressure. We will now derive the work done by an ideal gas in an adiabatic process.
If volume is fixed, then $dW=0$ so $dU = dQ - 0 = dQ = nC_VdT$ where $C_V$ is molar heat capacity at constant volume. The equation holds also when volume is not constant, because $U$ only depends on $T$ for an ideal gas.
If pressure is fixed, then $dQ=nC_pdT$ and $dW = p\,dV = nR\,dT$, where $C_p$ is the molar heat capacity at constant pressure. From $dQ=dU+dW$ we have
$$
nC_p\,dT = nC_V\,dT + nR\,dT\newline
\Rightarrow C_p = C_V + R
$$
We define $\gamma=C_p/C_V$ as the ratio of heat capacities. For an adiabatic process, $dU=-dW$, and
$$
nC_VdT = -p\,dV
$$
Substituting the ideal gas equation $p=nRT/V$, we have
$$
nC_VdT = -\frac{nRT}{V}\,dV
$$
$$
\frac{dT}{T} + \frac{R}{C_V}\frac{dV}{V} = 0
$$
And from above we know
$$
\frac{R}{C_V} = \frac{C_p - C_V}{C_V} = \frac{C_p}{C_V} - 1 = \gamma - 1.
$$
So the equation becomes
$$
\frac{dT}{T} + (\gamma-1)\frac{dV}{V} = 0.
$$
$(\gamma-1)$ is always positive for an ideal gas, so $dT$ and $dV$ always have opposite signs: an adiabatic expansion always comes with a drop in temperature, and an adiabatic compression with a rise in temperature.
Integrating the equation, we have
$$
\begin{aligned}
\ln T + (\gamma-1)\ln V &= \text{constant}\newline
\ln T + \ln V^{\gamma-1} &= \text{constant}\newline
\ln\left(TV^{\gamma-1}\right) &= \text{constant}\newline
TV^{\gamma-1} &= \text{constant}
\end{aligned}
$$
So for an initial state $(T_1, V_1)$ and final state $(T_2,V_2)$, we have the relation
$$
T_1V_1^{\gamma-1} = T_2V_2^{\gamma-1},
$$
which allows us to derive quantities of interest in practical problems. Similarly, substituting the ideal-gas relation $T=pV/nR$ gives $pV^\gamma=\text{constant}$, which implies
$$
p_1V_1^\gamma=p_2V_2^\gamma
$$
for initial state $(p_1,V_1)$ and final state $(p_2,V_2)$.
Again using the ideal gas equation, the work done by an ideal gas during an adiabatic process is
$$
W = -\Delta U = nC_V(T_1-T_2) = \frac{C_V}{R}(p_1V_1-p_2V_2)=\frac{1}{\gamma-1}(p_1V_1-p_2V_2).
$$
Application of the equations
The compression ratio of a diesel engine is 15 to 1; air is compressed to 1/15 of its initial volume. If the initial temperature is 27℃ (300 K) and the initial pressure is $1.01\times10^5$ Pa, find the final pressure and temperature, and the work done during compression. The volume of the cylinder is 1 L, and for air $C_V=20.8\,\mathrm{J/mol\cdot K}$ and $\gamma=1.4$.
Solution.
$$
\begin{aligned}
T_2 &= T_1\left(\frac{V_1}{V_2}\right)^{\gamma-1} = (300\mathrm{K})\cdot 15^{0.4}=886\mathrm{K}=613℃.\newline
p_2 &= p_1\left(\frac{V_1}{V_2}\right)^{\gamma} = 44.8\times10^5\text{ Pa}=44\text{ atm.}
\end{aligned}
$$
Similarly, the work done can be calculated as $W=-494\text{ J}$.
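The whole worked example can be reproduced in a few lines, using the adiabatic relations and $pV = nRT$:

```python
# Reproduce the diesel-compression example above.
R = 8.314
gamma = 1.4
C_V = 20.8            # J/(mol*K)
T1, p1 = 300.0, 1.01e5
V1 = 1.0e-3           # m^3 (1 L)
r = 15.0              # compression ratio V1/V2

T2 = T1 * r**(gamma - 1)   # from T1*V1^(gamma-1) = T2*V2^(gamma-1)
p2 = p1 * r**gamma         # from p1*V1^gamma = p2*V2^gamma
n = p1 * V1 / (R * T1)     # moles of air, from pV = nRT
W = n * C_V * (T1 - T2)    # work done BY the gas (negative: compression)

print(round(T2))           # ≈ 886 K
print(round(p2 / 1e5, 1))  # ≈ 44.8 (x 1e5 Pa)
print(round(W))            # ≈ -494 J
```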
Reversible and irreversible processes. A reversible process is one whose direction can be reversed by an infinitesimal change in the conditions of the process, and in which the system is always in or very close to thermal equilibrium. All other thermodynamic processes are irreversible.
Heat engine. Any device that transforms heat partly into work or mechanical energy is called a heat engine. Examples include car engines, jet engines, and steam turbines. The medium that carries the heat transfer in the engine is called the working substance. Let $\mathrm{H}$ denote the hot reservoir and $\mathrm{C}$ the cold reservoir. The thermal efficiency $e$ is defined as the work produced divided by the heat taken from the hot reservoir. With $W=Q_\mathrm{H}+Q_\mathrm{C}=|Q_\mathrm{H}|-|Q_\mathrm{C}|$,
$$
e = \frac{W}{Q_\mathrm{H}} = 1 + \frac{Q_\mathrm{C}}{Q_\mathrm{H}}= 1 - \left|\frac{Q_\mathrm{C}}{Q_\mathrm{H}}\right|.
$$
For internal combustion engines, in an idealized model called the Otto cycle, the thermal efficiency can be calculated as
$$
e = 1 - \frac{1}{r^{\gamma-1}}
$$
where $r$ is the compression ratio and $\gamma$ is the ratio of heat capacities of the working substance. With $r=8$ and $\gamma=1.4$, $e=0.56$. Real gasoline engines typically have $e\approx0.35$, or 35%.
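The quoted efficiency follows directly from the formula:

```python
# Otto-cycle efficiency e = 1 - 1/r^(gamma-1), with r = 8 and gamma = 1.4.
gamma = 1.4
r = 8.0

e = 1 - 1 / r**(gamma - 1)
print(round(e, 2))   # 0.56
```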
Refrigerators. While heat engines convert heat into mechanical work, a refrigerator works in the opposite direction, using mechanical work to transfer heat from a cold place to a hot place. The working substance (coolant), such as Freon, absorbs heat $Q_\mathrm{C}$ at the cold place with work input $|W|$, and discards heat $|Q_\mathrm{H}|$ at the hot place. We have the relation $|Q_\mathrm{H}|=Q_\mathrm{C}+|W|$. The coefficient of performance $K$ of a refrigerator is the heat removed divided by the work input
$$
K = \frac{|Q_\mathrm{C}|}{|W|}.
$$
A refrigerator works by compressing the substance into a thin tube (the condenser) using a compressor, raising its pressure and temperature (so that the tube gives off heat to the surroundings), then letting it pass through an expansion valve into the evaporator, where the fluid’s pressure and temperature drop considerably and it absorbs heat from inside the refrigerator. The fluid then enters the compressor to begin another cycle. An air conditioner operates on the same principle. A heat pump, essentially a refrigerator turned inside out, heats buildings by cooling the outside air.
The second law. The first law is a statement of conservation of energy, but it says nothing about the direction of thermodynamic processes. The second law says the direction is always toward greater disorder. There are many equivalent statements of the second law:
heat always flows spontaneously from hotter to colder regions.
heat cannot be completely converted into work by any cyclic process.
heat cannot be transferred from a colder place to a hotter place without work.
the entropy of an isolated system is non-decreasing.
Entropy. To define entropy, consider an infinitesimal isothermal expansion of an ideal gas. Because $T$ is constant, we have $dU=dQ-dW=0$ and so
$$
dQ = dW = p\,dV = \frac{nRT}{V}\,dV\quad\Rightarrow\quad \frac{dV}{V}=\frac{1}{nR}\frac{dQ}{T}
$$
The fractional volume change $dV/V$ can be a measure of disorder, so we define entropy $S$ as
$$
dS = \frac{dQ}{T}\quad\text{(infinitesimal reversible process)}
$$
We can generalize the definition to any reversible process to be
$$
\Delta S = \int_1^2\frac{dQ}{T}
$$
where $1$ and $2$ refer to the initial and final states.
We can also define entropy in terms of microscopic states. For any system, the most probable macroscopic state, which is also the one with the greatest disorder and the greatest entropy, is the one with the greatest number of corresponding microscopic states, denoted by $w$. Entropy is defined as
$$
S = k\ln w
$$
where $k=R/N_A$ is the Boltzmann constant.
Coulomb’s law. Experiments show that the electric force between two point charges is proportional to the product of the charges and inversely proportional to the square of the distance between them.
$$
F = k\frac{|q_1q_2|}{r^2}
$$
where $k\approx 9\cdot10^9\,\mathrm{N\cdot m^2/C^2}$. It is convenient to express the constant as
$$
F = \frac{1}{4\pi\epsilon_0}\frac{|q_1q_2|}{r^2}
$$
where $\epsilon_0\approx8.854\cdot10^{-12}\,\mathrm{C^2/(N\cdot m^2)}$.
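A sketch of Coulomb’s law with hypothetical charges (two 1 μC charges 0.10 m apart), also checking the relation between $k$ and $\epsilon_0$:

```python
import math

# Coulomb force between two hypothetical 1 microcoulomb charges 0.10 m apart.
eps0 = 8.854e-12               # C^2/(N*m^2)
k = 1 / (4 * math.pi * eps0)   # ≈ 8.99e9 N*m^2/C^2
q1 = q2 = 1e-6                 # C
r = 0.10                       # m

F = k * abs(q1 * q2) / r**2
print(round(F, 3))             # ≈ 0.899 N
```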
Electric Field. Electric field $\bm{E}$ at a point is defined as the electric force experienced by a test charge $q_0$, divided by $q_0$: $\bm{E} = \bm{F}/q_0$. Electric fields obey linearity: the total electric field at point $x$ is the sum of all electric fields at $x$: $\bm{E}=\bm{E_1}+\cdots+\bm{E_n}$.
A special configuration is the electric dipole, a pair of point charges $q$ and $-q$ separated by distance $d$. An example is the water molecule: there is a net negative charge on the oxygen end and a net positive charge on the hydrogen end. In a field $\bm{E}$, a dipole experiences a torque of magnitude $\tau=(qE)(d\sin\phi)$, where $\phi$ is the angle between the dipole axis and the direction of the external field $\bm{E}$. Defining $p=qd$ as the electric dipole moment, we have $\bm{\tau}=\bm{p}\times\bm{E}$. Integrating the relation $dW=\tau\,d\phi$, with the torque $-pE\sin\phi$ tending to decrease $\phi$, we have
$$
W = \int_{\phi_1}^{\phi_2}(-pE\sin\phi)d\phi = pE\cos\phi_2 - pE\cos\phi_1
$$
We can define the potential energy of the dipole as $U(\phi)=-pE\cos\phi = -\bm{p}\cdot\bm{E}$ so that $W=U_1-U_2$.
To motivate Gauss’s law, suppose we want to know how much electric charge is enclosed in a box. Although we are not allowed to look inside the box, we can move a test charge around the box to probe the electric field. If there is a single positive charge inside, the test charge experiences a field $\bm{E}$ pointing outward everywhere on the surface; if the charge is negative, the field points inward. If there is no charge inside, the outward and inward contributions cancel as we sweep over the whole surface of the box.
The electric flux of field $\bm{E}$ over a surface $A$ is defined as
$$
\Phi_E = \int E\cos\phi\, dA = \int E_{\perp}\, dA = \int \bm{E}\cdot d\bm{A}
$$
where $\bm{A}=A\bm{n}$ is the normal vector of the surface multiplied by area.
Gauss’s law says that the total electric flux through a closed surface is equal to the total (net) electric charge inside the surface, divided by $\epsilon_0$.
$$
\Phi_E = \oint\bm{E}\cdot d\bm{A} = \frac{Q}{\epsilon_0}.
$$
Gauss’s law is an equivalent form of Coulomb’s law. We can use Gauss’s law to calculate electric field produced by various configurations of interest. For example, consider a point charge inside a sphere. The electric field is $E(r)=(1/4\pi\epsilon_0)q/r^2$. A longer radius $r$ will have a greater surface area $A(r)=4\pi r^2$, but the electric field will also be weaker by a factor of $r^2$. The net flux is $\Phi_E=E(r)A(r)=q/\epsilon_0$, which is independent of $r$.
Here is a list of electric fields of some basic configurations, calculated from Gauss’s law.
single point charge $q$: $\displaystyle E(r) = \frac{1}{4\pi\epsilon_0}\frac{q}{r^2}$.
infinite wire, charge per unit length $\lambda$: $\displaystyle E(r) = \frac{1}{2\pi\epsilon_0}\frac{\lambda}{r}$.
infinite sheet of charge with charge per unit area $\sigma$: $\displaystyle E\equiv\frac{\sigma}{2\epsilon_0}$.
Two oppositely charged conducting plates with surface densities $+\sigma$ and $-\sigma$: $\displaystyle E\equiv\sigma/\epsilon_0$.
just outside the surface of a charged conductor: $\displaystyle E = \sigma/\epsilon_0$, where $\sigma$ is the local surface charge density.
Electric potential energy is the integral of electric force over distance; it represents the energy or work stored in a configuration of charges. Consider a charge $q$. When a test charge $q_0$ moves from point $a$ to point $b$, the work done on $q_0$ is
$$
W_{a\to b} = \int_{r_a}^{r_b}Fdr = \int_{r_a}^{r_b}\frac{1}{4\pi\epsilon_0}\frac{qq_0}{r^2}dr = \frac{qq_0}{4\pi\epsilon_0}\left(\frac{1}{r_a} - \frac{1}{r_b}\right)
$$
So we define the electric potential energy of two point charges $q$ and $q_0$ as
$$
U = \frac{1}{4\pi\epsilon_0}\frac{qq_0}{r}
$$
and $W_{a\to b} = U_a - U_b$. Note when $q$ and $q_0$ have the same sign, the potential energy $U(r)$ is decreasing in $r$. $U\to0$ as $r\to\infty$ and $U$ increases to infinity as the distance $r$ shrinks to zero. If they have opposite signs, then $U(r)$ is negative and increasing in $r$. It increases to $0$ as $r\to\infty$ and decreases to $-\infty$ as $r\to0$.
Electric potential energy is also linear. For configuration $\{q_1,\ldots,q_n\}$, we have
$$
U = \frac{q_0}{4\pi\epsilon_0}\left(\frac{q_1}{r_1}+\cdots+\frac{q_n}{r_n}\right).
$$
For a continuous distribution of $q$, we replace the sum by integral $\int dq/r$.
Potential is potential energy per unit charge.
$$
V = \frac{U}{q_0}.
$$
The unit of potential is volt.
$$
1\,\mathrm{V} = 1\text{ volt} = 1\,\mathrm{J/C} = 1\text{ joule/coulomb}.
$$
1 electron volt (eV) is defined as the amount of energy an electron gains or loses when it moves through an electric potential difference of 1 volt. The value is
$$
1\,\mathrm{eV}=U_a - U_b = q\cdot V_{ab} = (1.602\cdot10^{-19}\,\mathrm{C})\cdot(1\,\mathrm{V})=1.602\cdot10^{-19}\,\mathrm{J}
$$
The potential difference between two points is often called voltage. Since the electric field is force per unit charge, the potential difference is the line integral of the electric field. Conversely, the electric field is the negative gradient of the potential:
$$
V_a - V_b = \int_a^b\bm{E}\cdot d\bm{l} \quad\Leftrightarrow\quad \bm{E} = -\nabla V.
$$
Capacitors. Electric force acts without a medium, so if we place two metal plates near each other and push electrons onto the left plate, those electrons will repel electrons on the right plate, leaving it with a net positive charge and forming an electric field. The capacitance $C$ is defined as the ratio of charge to potential difference
$$
C = \frac{Q}{V}
$$
The unit of capacitance is farad (F), with 1 F = 1 farad = 1 $\mathrm{C/V}$ = 1 coulomb/volt. For two plates in vacuum with area $A$ and distance $d$ apart, the electric field is $E=\sigma/\epsilon_0=Q/\epsilon_0A$, where $\sigma=Q/A$, while $V_{ab}=Ed=(1/\epsilon_0)(Qd/A)$, so in this case $C=Q/V_{ab}=\epsilon_0\cdot A/d$.
From the definition, we can derive that, for two or more capacitors $\{C_1,\ldots,C_n\}$,
if they are connected in series, each capacitor will obtain the same charge $Q$, and their total capacitance $C_{\mathrm{eq}}$ follows
$$
\frac{1}{C_{\mathrm{eq}}} = \frac{1}{C_1}+\frac{1}{C_2}+\cdots+\frac{1}{C_n}.
$$
If they are connected in parallel, each capacitor shares the same voltage $V$, and their equivalent capacitance $C_{\mathrm{eq}}$ follows
$$
C_{\mathrm{eq}} = C_1 + C_2 + \cdots + C_n.
$$
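As a numerical illustration of the two combination rules, here is a short sketch; the capacitance values are made up for the example.

```python
# Hypothetical capacitances in farads; any positive values work.
caps = [1e-6, 2e-6, 3e-6]

# Series: each capacitor carries the same charge Q, so reciprocals add.
c_series = 1.0 / sum(1.0 / c for c in caps)

# Parallel: each capacitor sees the same voltage V, so capacitances add.
c_parallel = sum(caps)
```

Note that the series combination is always smaller than the smallest capacitor, while the parallel combination is larger than the largest.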
We can calculate energy stored in capacitors. The infinitesimal energy required to transfer charge $dq$ is
$$
dW = Vdq = \frac{qdq}{C}
$$
So the total work $W$ required to increase the capacitor charge $q$ from zero to $Q$ is
$$
W = \frac{1}{C}\int_0^Q qdq = \frac{Q^2}{2C}.
$$
So we define the potential energy stored in a capacitor as
$$
U = \frac{Q^2}{2C} = \frac{1}{2}CV^2 = \frac{1}{2}QV.
$$
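As a sanity check that the three expressions for $U$ agree, here is a small numerical sketch for a parallel-plate capacitor; the plate dimensions and charge are made-up values.

```python
epsilon0 = 8.854e-12  # vacuum permittivity, F/m
A, d = 1e-2, 1e-3     # plate area (m^2) and separation (m), assumed values
C = epsilon0 * A / d  # parallel-plate capacitance C = eps0*A/d from the text
Q = 1e-8              # charge on the plates (C), assumed
V = Q / C             # potential difference from C = Q/V

# The three expressions for the stored energy agree:
U1 = Q**2 / (2 * C)
U2 = 0.5 * C * V**2
U3 = 0.5 * Q * V
```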
We define the energy density $u$ as energy per unit volume
$$
u = \frac{1}{2}CV^2/(Ad) = \frac{1}{2}\epsilon_0E^2.
$$
In practice, capacitors are made by inserting a nonconducting material called a dielectric between the two plates. The molecules in the material polarize during charging, which reduces the field and hence the potential difference for a given charge, resulting in a greater capacitance.
Current and Resistance. Current is motion of charge from one region to another. Electrons in conductors undergo random motions with speeds on the order of $10^6$ m/s. When we apply an electric field, the electrons slowly drift in the direction of the electric force on them (opposite to the field, since their charge is negative) in addition to their random motions. The drift speed $v_d$ is actually very small, only on the order of $10^{-4}$ m/s. Much of the work done by the electric field goes into heating the conductor, not into making the moving charges move ever faster.
We define the current through a cross-sectional area $A$ to be the net charge flowing through the area per unit time
$$
I = \frac{dQ}{dt}.
$$
The SI unit of current is ampere $\mathrm{A}$ ($1\,\mathrm{A}= 1\,\mathrm{C/s}$).
Let $n$ be the number of moving charges per unit volume. It can be shown that the current density $\bm{J}=nq\bm{v}_d$ has magnitude $J=I/A = nqv_d$. For certain materials like metals, at a given temperature, $\bm{J}$ is nearly directly proportional to $\bm{E}$. This is Ohm’s law. We define the resistivity $\rho$ of a material as the ratio of the magnitudes of electric field and current density:
$$
\rho = \frac{E}{J}
$$
Its unit is $\Omega\cdot\mathrm{m}$. The reciprocal of resistivity is conductivity. There is a correlation between electrical and thermal conductivity, because the electrons that carry electrical conduction also provide the principal mechanism for heat conduction.
For metals, resistivity almost always increases with increasing temperature $T$, with approximate equation $\rho(T)=\rho_0[1+\alpha(T-T_0)]$ where $\alpha>0$. For semiconductors, resistivity decreases with increasing temperature. Some metallic alloys exhibit superconductivity below a critical temperature $T_c$: once a current has been established in a superconducting ring, it continues indefinitely without the presence of any driving field. The highest critical temperature discovered so far is around 133 K ($-140\,$℃).
For a conductor with cross-sectional area $A$ and length $L$, the current is $I=JA$ and the potential difference is $V=EL$, so we have
$$
V = \rho J\cdot L = \rho\frac{I}{A}\cdot L = \frac{\rho L}{A} I.
$$
We define $R=\rho L/A$ as the resistance of the conductor, with unit ohm ($\Omega$). The relation
$$
V = IR \quad\text{or}\quad R = \frac{V}{I}
$$
is also called Ohm’s law. In the water-pipe analogy, Ohm’s law says the flow rate is proportional to the pressure difference between the ends: higher voltage results in larger current.
An important device that does not obey Ohm’s law is the semiconductor diode. It is not bidirectional: current flows readily under a positive (forward) voltage but hardly at all under a negative (reverse) one.
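A minimal numerical sketch of $R=\rho L/A$ and $V=IR$, using the handbook resistivity of copper at room temperature and made-up wire dimensions:

```python
rho_copper = 1.72e-8  # resistivity of copper at 20 C, ohm*m
L = 10.0              # wire length in meters (assumed)
A = 1.0e-6            # cross-sectional area in m^2 (a 1 mm^2 wire)

R = rho_copper * L / A  # resistance R = rho*L/A
V = 1.5                 # applied voltage in volts (assumed)
I = V / R               # Ohm's law: I = V/R
```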
For resistors $\{R_1,\ldots,R_n\}$ connected in series, the same current $I$ flows through all of them, and the total voltage $V$ across the chain is the sum of the individual voltages. We have $V_i=IR_i$ for $i=1,\ldots,n$, so
$$
IR_{\mathrm{eq}}=V=V_1+\cdots+V_n=I(R_1+\cdots+R_n).
$$
The equivalent resistance is the sum of all resistances.
$$
R_{\mathrm{eq}} = R_1+\cdots+R_n.
$$
For resistors connected in parallel, they share the same voltage while having different currents, so $I_i=V/R_i$ for $i=1,\ldots,n$ and
$$
\frac{V}{R_\mathrm{eq}}=I = I_1+\cdots+I_n = V\left(\frac{1}{R_1}+\cdots+\frac{1}{R_n}\right)
$$
In this case we have the relation
$$
\frac{1}{R_\mathrm{eq}} = \left(\frac{1}{R_1}+\cdots+\frac{1}{R_n}\right).
$$
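The same kind of numerical check works for resistor combinations; the resistor values below are made up.

```python
resistors = [100.0, 220.0, 330.0]  # ohms, assumed values

# Series: the same current flows through all, so resistances add.
r_series = sum(resistors)

# Parallel: all share the same voltage, so reciprocals add.
r_parallel = 1.0 / sum(1.0 / r for r in resistors)
```

Observe that the rules are the opposite of those for capacitors: resistances add in series, reciprocals add in parallel.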
emf. In a complete circuit, for current to flow, there must be a source of energy that continuously pushes electrons uphill from lower potential to higher potential. In physics, this source is called electromotive force (emf, denoted $\mathcal{E}$), though the name is a misnomer because it is really an energy-per-unit-charge quantity.
Ideally, $V=\mathcal{E}=IR$, but there may be internal resistance $r$ in the emf source. Taking that into account, we have the relation $V=\mathcal{E}-Ir$, where $V$ is called the terminal voltage.
Power. Power is the time rate of energy transfer. The potential energy change for $dQ$ passing through an element is $VdQ = VIdt$, so power is voltage times current:
$$
P = VI = I^2R = \frac{V^2}{R}
$$
with unit $(1\,\mathrm{J/C})(1\,\mathrm{C/s})=1\,\mathrm{J/s}=1\,\mathrm{W}$. Utility companies quote electricity prices by the energy unit kW·h. By reading the power rating of an electronic device, and knowing the price per kW·h, you can calculate the cost of using the appliance over time.
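For instance, the cost calculation just described can be sketched as follows, with an assumed appliance power and an assumed price per kW·h:

```python
P_watts = 1500.0      # appliance power rating, W (assumed)
hours = 3.0           # usage time, h (assumed)
price_per_kwh = 0.15  # price per kW*h in some currency (assumed)

energy_kwh = (P_watts / 1000.0) * hours  # energy used, kW*h
cost = energy_kwh * price_per_kwh
```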
Kirchhoff’s Rules. Kirchhoff’s rules can be used to analyze complex circuits. Quantities of interest, like current or voltage of some particular elements in the circuit, can be derived by solving a set of linear equations. A junction in a circuit is a point where three or more conductors meet. A loop is any closed conducting path.
(Kirchhoff’s junction rule) The algebraic sum of the currents into any junction is zero.
$$
\sum I = 0.
$$
(Kirchhoff’s loop rule) The algebraic sum of the potential differences in any loop must equal zero.
$$
\sum V = 0.
$$
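To see the two rules in action, the sketch below solves a small made-up circuit: an emf $\mathcal{E}$ in series with $R_1$, feeding $R_2$ and $R_3$ in parallel. The junction rule and two loop rules give three linear equations in the three branch currents, solved here by plain Gaussian elimination.

```python
def solve3(a, b):
    # Gaussian elimination with partial pivoting for a 3x3 system a*x = b
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Made-up circuit values: emf in series with R1, then R2 parallel to R3.
emf, R1, R2, R3 = 12.0, 2.0, 4.0, 4.0
# Unknown branch currents: I1 (through R1), I2 (through R2), I3 (through R3).
# Junction rule:            I1 - I2 - I3 = 0
# Loop rule (emf, R1, R2):  R1*I1 + R2*I2 = emf
# Loop rule (R2, R3):       R2*I2 - R3*I3 = 0
A = [[1.0, -1.0, -1.0],
     [R1,   R2,   0.0],
     [0.0,  R2,  -R3]]
b = [0.0, emf, 0.0]
I1, I2, I3 = solve3(A, b)
```

With these values $R_2\parallel R_3 = 2\,\Omega$, so the total resistance is $4\,\Omega$ and $I_1 = 3$ A, splitting equally into $I_2 = I_3 = 1.5$ A.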
R-C circuit. An R-C circuit consists of a battery $\mathcal{E}$, a switch, a resistor $R$ and a capacitor $C$ connected in series. It is the simplest circuit in which currents, voltages and powers change with time. We would like to derive the charge of the capacitor as well as the current in the circuit as functions of time.
When capacitor is charging. By Kirchhoff’s loop rule
$$
\mathcal{E} - i(t)R - \frac{q(t)}{C} = 0
$$
so the current over time is
$$
i(t) = \frac{\mathcal{E}}{R} - \frac{q(t)}{RC}.
$$
When the capacitor is fully charged with total charge $Q_f$, the current becomes zero, so we have
$$
\frac{\mathcal{E}}{R} = \frac{Q_f}{RC}\quad\Rightarrow\quad Q_f = C\mathcal{E}
$$
Now $i(t) = q'(t)$, so we actually have a differential equation. The derivative of $q$ is equal to a negative constant times $q$ plus a constant, so the solution is a decaying exponential plus a constant. It can be worked out as
$$
q(t) = C\mathcal{E}(1 - e^{-t/RC}) = Q_f(1 - e^{-t/RC})
$$
and the current as a function of time is
$$
i(t) = q'(t) = \frac{\mathcal{E}}{R}e^{-t/RC} = I_0e^{-t/RC}.
$$
The current decreases exponentially with time, while the charge on the capacitor approaches its final value $Q_f$ exponentially. The constant $\tau=RC$ is called the time constant, or relaxation time, of the R-C circuit. When $\tau$ is small, the capacitor charges quickly; when it is large, the charging takes more time.
When the capacitor is discharging. In this case $\mathcal{E}$ is simply $0$, and we have the differential equation
$$
i(t) = q'(t) = -\frac{1}{RC}q(t).
$$
The solution is
$$
q(t) = Q_0e^{-t/RC}\quad\text{and}\quad i(t) = q'(t) = -\frac{Q_0}{RC}e^{-t/RC} = I_0e^{-t/RC},\quad I_0 = -\frac{Q_0}{RC}.
$$
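The charging solution can be checked numerically; the component values below are made up.

```python
import math

R, C, emf = 1.0e3, 1.0e-6, 5.0  # ohms, farads, volts (assumed values)
tau = R * C                      # time constant of the R-C circuit
Qf = C * emf                     # final charge when fully charged

def q_charging(t):
    # charge while charging: q(t) = Qf*(1 - e^(-t/RC))
    return Qf * (1.0 - math.exp(-t / tau))

def i_charging(t):
    # current while charging: i(t) = (emf/R)*e^(-t/RC)
    return (emf / R) * math.exp(-t / tau)

# After one time constant the capacitor holds about 63% of Qf.
frac_at_tau = q_charging(tau) / Qf
```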
Experiments show that moving charges create a magnetic field $\bm{B}$ that in turn acts on moving charges. Even the attraction and repulsion of two magnets is due to interactions between moving electrons in the atoms of the bodies. We define the direction of $\bm{B}$ at any point in space as the direction of the north pole of a compass at that point. The larger the velocity $\bm{v}$, the larger the observed force $\bm{F}$. The direction of the force is perpendicular to the plane containing $\bm{v}$ and $\bm{B}$. We observe in experiment the following relation
$$
\bm{F} = q\bm{v}\times \bm{B}
$$
$\bm{F}$ has magnitude $|q|vB\sin\phi$, where $\phi$ is the angle between $\bm{v}$ and $\bm{B}$. In general, charges in a magnetic field tend to move in circular or helical paths around the magnetic field lines. From the equation, the unit of $B$ is $1\,\mathrm{N\cdot s/(C\cdot m)}=1\,\mathrm{N/(A\cdot m)} = 1\,\mathrm{T}$, called a tesla (T). Earth’s magnetic field has magnitude of about $10^{-4}\,$T; fields in the interior of atoms are of order 10 T. The largest magnetic field that can be produced in a lab is about 45 T.
To date, no magnetic monopole has been discovered. North poles and south poles always come in pairs, and magnetic field lines always form loops. Thus the magnetic flux through a closed surface is always zero. This is called Gauss’s law for magnetism.
$$
\Phi_B = \oint\bm{B}\cdot d\bm{A} = 0.
$$
Note that the surface normal vector $d\bm{A}$ always points outward on the surface. The unit of magnetic flux is the unit of magnetic field (T) times the unit of area ($\mathrm{m^2}$), which is called a weber (Wb). $1\,\mathrm{Wb}=1\,\mathrm{T\cdot m^2}=1\,\mathrm{N\cdot m/A}$.
Application: Measuring particle masses
We can use an electric field and a magnetic field together to build a velocity selector. We create a left-pointing $\bm{E}$, then add a perpendicular $\bm{B}$ so that the magnetic force pushes charges to the right. Particles move in a straight line only when $-qE + qvB=0$, i.e.
$$
v=E/B.
$$
On the other hand, if we shoot electrons using potential difference $V$, then equating kinetic energy with potential energy we have
$$
\frac{1}{2}mv^2 = eV \quad\Rightarrow\quad v = \sqrt{\frac{2eV}{m}}
$$
Equating the two equations for speed, we get
$$
E/B = \sqrt{\frac{2eV}{m}} \quad\Rightarrow\quad e/m=\frac{E^2}{2VB^2}.
$$
This way, we can determine the ratio of charge to mass for electrons to be $e/m=1.76\cdot10^{11}\,\mathrm{C/kg}$. The oil-drop experiment measured the charge of a single electron. This enables us to derive the mass of an electron to be
$$
m = 9.1\cdot10^{-31}\mathrm{kg}
$$
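The chain of formulas above can be replayed numerically. The field strengths and accelerating voltage below are made-up illustration values, chosen so that the result lands near the accepted $e/m$.

```python
E = 1.0e5    # selector electric field, V/m (assumed)
B = 1.0e-2   # selector magnetic field, T (assumed)
V = 284.1    # accelerating potential, volts (chosen for illustration)

v = E / B                           # velocity selector: qE = qvB
e_over_m = E**2 / (2.0 * V * B**2)  # from equating the two speed formulas
e = 1.602e-19                       # electron charge, from the oil-drop experiment
m = e / e_over_m                    # inferred electron mass
```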
Magnetic field on loops. When charges pass through a uniform magnetic field, the field tends to rotate the charges in circles. This means if we place a current loop in the field, although the net force is zero, the field imposes a restoring torque on the loop whenever the normal direction of the loop is not aligned with the field direction.
Let’s work out the torque on a loop with side lengths $a$ and $b$ and area $A=ab$. The magnetic force on a line segment $d\bm{l}$ is $d\bm{F}=Id\bm{l}\times\bm{B}$. If the two sides with length $a$ are perpendicular to $\bm{B}$, then the forces on them are equal and opposite with magnitude $F=IaB$. The torque has magnitude $\tau = 2F(b/2)\sin\phi = IBA\sin\phi$, where $\phi$ is the angle between the normal vector of the area $A$ and the direction of $\bm{B}$. Calling the vector along the loop normal with magnitude $IA$ the magnetic (dipole) moment $\bm{\mu}$, we get the torque formula
$$
\bm{\tau} = \bm{\mu}\times\bm{B}
$$
A current loop, or any other body that experiences a magnetic torque, is called a magnetic dipole. Analogous to electric dipoles, the potential energy for magnetic dipole is $U = -\bm{\mu}\cdot\bm{B}$.
The torque tends to align the magnetic moment $\bm{\mu}$ with the direction of the magnetic field $\bm{B}$. This explains why a bar magnet tends to align with field lines when placed in a magnetic field: unlike most other atoms, iron atoms have nonzero magnetic moments, and in a bar magnet they align with each other to form a net magnetic moment $\bm{\mu}$ pointing from the south pole to the north pole. Finally, we mention that this property of magnetic fields acting on current loops can be used to build electric motors.
Magnetic field sources. For a positive charge moving at a constant velocity $\bm{v}$ along a straight line, experiments show that it produces a magnetic field. The magnitude of the field at distance $r$ is proportional to $|q|$ and $1/r^2$, and the field circles around the path clockwise when viewed from behind. Mathematically,
$$
\bm{B} = \frac{\mu_0}{4\pi}\frac{q\bm{v}\times\hat{\bm{r}}}{r^2}
$$
where $\hat{\bm{r}}$ is the unit vector pointing from the charge to the field point and $\mu_0=4\pi\times10^{-7}\,\mathrm{T\cdot m/A}$. For a current segment $d\bm{l}$, the magnetic field it produces is the sum of the magnetic fields produced by all of its moving charges.
$$
d\bm{B} = \frac{\mu_0}{4\pi}\frac{Id\bm{l}\times\hat{\bm{r}}}{r^2}
$$
This equation is called the law of Biot and Savart. To find the magnetic field at any point in space produced by a complete circuit, we integrate over all line segments.
$$
\bm{B} = \frac{\mu_0}{4\pi}\int\frac{Id\bm{l}\times\hat{\bm{r}}}{r^2}
$$
For an infinitely long, straight current-carrying conductor, through integration by trigonometric substitution it can be worked out that the magnitude of the generated magnetic field is
$$
B = \frac{\mu_0I}{2\pi r}
$$
Two parallel conductors carrying currents in the same direction attract each other; if the current directions are opposite, they repel each other. This force is used to define the ampere: one ampere is that unvarying current that, if present in each of two parallel conductors of infinite length, one meter apart in empty space, causes each conductor to experience a force of exactly $2\times10^{-7}\,\mathrm{N}$ per meter of length. One coulomb is then the amount of charge transferred in one second by a current of one ampere.
The magnetic field of a current line circles around it. If we bend the line to form a loop, the magnetic field points along the loop axis, and the loop behaves like a magnet with north and south poles. To derive the magnitude formula, consider a loop with radius $a$ and a point at horizontal distance $x$ from the center. For a line segment $d\bm{l}$,
$$
dB = \frac{\mu_0I}{4\pi}\frac{dl}{x^2+a^2}
$$
The $x$-component and $y$-component of $d\bm{B}$ are
$$
\begin{aligned}
dB_x &= dB\cos\theta = \frac{\mu_0I}{4\pi}\frac{dl}{x^2+a^2}\frac{a}{(x^2+a^2)^{1/2}}\newline
dB_y &= dB\sin\theta = \frac{\mu_0I}{4\pi}\frac{dl}{x^2+a^2}\frac{x}{(x^2+a^2)^{1/2}}
\end{aligned}
$$
The $y$-components of two symmetric line segments cancel out, and the field will only have an $x$-component of
$$
B_x = \frac{\mu_0Ia}{4\pi(x^2+a^2)^{3/2}}\int dl = \frac{\mu_0Ia^2}{2(x^2+a^2)^{3/2}}.
$$
If we stack $N$ loops to form a coil, the total magnetic field is $N\cdot B_x$. Its maximum occurs at $x=0$ (the center of the loop or coil) with
$$
(B_x)_{\max}=\frac{\mu_0NI}{2a}.
$$
Using the magnetic moment $\mu=N\cdot IA = NI\pi a^2$, we can also write $B_x$ as
$$
B_x = \frac{\mu_0\mu}{2\pi(x^2+a^2)^{3/2}}.
$$
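A short numerical sketch of the on-axis field formula; the coil parameters are made up.

```python
import math

mu0 = 4.0 * math.pi * 1e-7  # permeability of free space, T*m/A
N, I, a = 100, 2.0, 0.05    # turns, current (A), loop radius (m); assumed

def B_axis(x):
    # on-axis field of an N-turn coil: B = mu0*N*I*a^2 / (2*(x^2+a^2)^(3/2))
    return mu0 * N * I * a**2 / (2.0 * (x**2 + a**2) ** 1.5)

B_center = B_axis(0.0)           # field at the center of the coil
B_max = mu0 * N * I / (2.0 * a)  # the text's (B_x)_max, should equal B_center
```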
Ampere’s law. If we integrate the generated field $\bm{B}$ over a circle of radius $r$ around a current line, we get
$$
\oint\bm{B}\cdot d\bm{l} =B\oint dl = \frac{\mu_0I}{2\pi r}(2\pi r) = \mu_0I.
$$
The integral does not depend on the radius or shape of the path around the current, as long as the path is closed. If the closed path does not enclose the wire, the line integral is zero. More generally, Ampere’s law states that the line integral of a magnetic field over a closed path equals $\mu_0$ times the total current enclosed by the path.
$$
\oint \bm{B}\cdot d\bm{l} = \mu_0I_{\text{encl}} \quad\text{(Ampere’s law)}
$$
Magnetic materials. We can calculate the magnetic moment of an electron if we picture electrons as moving in a circular orbit with radius $r$ and speed $v$. Now $T=2\pi r/v$, so the current is $I=e/T=ev/2\pi r$, and the magnetic moment is $\mu = \frac{ev}{2\pi r}(\pi r^2) = evr/2$. The angular momentum is $L=mvr$, so we can write $\mu=\frac{e}{2m}L$. The angular momentum is quantized and is always an integer multiple of $h/2\pi$, where $h$ is Planck’s constant. The unit of magnetic moment
$$
\mu_B = \frac{e}{2m}\left(\frac{h}{2\pi}\right) = \frac{eh}{4\pi m} = 9.274\cdot10^{-24}\mathrm{A\cdot m^2}
$$
is called the Bohr magneton. In addition, electron spin also has an associated magnetic moment that is almost exactly one Bohr magneton.
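Plugging the accepted constants into $\mu_B = eh/4\pi m$ reproduces the quoted value:

```python
import math

e = 1.602176634e-19     # elementary charge, C
h = 6.62607015e-34      # Planck constant, J*s
m_e = 9.1093837015e-31  # electron mass, kg

mu_B = e * h / (4.0 * math.pi * m_e)  # Bohr magneton, A*m^2
```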
Atoms in paramagnetic materials have magnetic moments that are of the order of $\mu_B$. When placed in a magnetic field, the field exerts a torque on each magnetic moment, and the torques tend to align the moments with the field. The result is that the magnetic field at any point in such a material is greater by a factor $K_m$, called the relative permeability of the material. The permeability of the material is $\mu=K_m\mu_0$. For paramagnetic materials $K_m$ exceeds 1 only slightly, typically ranging from $1.00001$ to $1.003$. The difference $\chi_m=K_m-1$ is called the magnetic susceptibility.
A material is diamagnetic if it develops a weak magnetization opposing the ambient magnetic field; $K_m$ is then slightly less than 1, typically $0.9999$ to $0.99999$.
Iron, nickel and cobalt are ferromagnetic. Even when no external field is present, strong interactions between atomic magnetic moments cause them to align with each other in regions called magnetic domains. When a field is present, the domains tend to orient themselves parallel to the field. The permeability is much larger, typically of the order of $1,000$ to $100,000$.
By moving magnets through coils and by changing coil shapes in a magnetic field, Faraday’s experiments showed that a changing magnetic flux can induce a current. The induced emf in a closed loop is equal to the negative of the time derivative of the magnetic flux $\Phi_B$ through the loop.
$$
\mathcal{E} = - \frac{d\Phi_B}{dt} \quad\text{(Faraday’s law of induction)}
$$
The direction of the induced current is such that its magnetic field opposes the original field if the original field is increasing, and reinforces the original field if it is decreasing. Intuitively, the loop opposes the external field when it approaches, and attracts the field to not let it go when it leaves. This is called Lenz’s law.
If we drop a ring magnet into a copper tube, the magnet falls slower than in vacuum, in accordance with Lenz’s law: part of the kinetic energy is converted into charge motion in the conductor. If Lenz’s law were reversed, the magnet would come out of the tube faster than it entered. Chaining more tubes would then accelerate the magnet faster and faster, so a small push would yield enormous energy, created out of nowhere. This is not how our universe behaves, so Lenz’s law must be obeyed.
The induced emf, which is the work done when a unit charge goes once around the loop, is equal to the line integral of an electric field along the loop. Thus we can write the Faraday’s law as
$$
\oint\bm{E}\cdot d\bm{l} = - \frac{d\Phi_B}{dt} \quad\text{(Faraday’s law of induction)}
$$
This induced electric field is nonconservative, and it is called a nonelectrostatic field.
We can also induce an emf in a conductor not by varying an external magnetic field, but by moving the conductor with velocity $\bm{v}$ in a uniform magnetic field $\bm{B}$. The induced emf is
$$
\mathcal{E} = \oint(\bm{v}\times\bm{B})\cdot d\bm{l}.
$$
If $\bm{v}$ is perpendicular to $\bm{B}$, then the emf boils down to $\mathcal{E}=vBL$, where $L$ is the length of the conductor.
Maxwell generalized Ampere’s law to include the case for capacitors. When a capacitor is charging or discharging, there’s no current flow between the two plates, so Ampere’s law would give zero, but there’s changing electric field and flux at the gap when charges accumulate or dissipate on the plates, and that change can create a magnetic field that we can detect experimentally.
When a capacitor is charging, we have $q(t)=Cv(t)$ where $v(t)$ is the potential difference at time $t$. With $v(t)=E(t)d$ and $C=\epsilon_0 A/d$ we have
$$
q(t) = Cv(t) = \frac{\epsilon_0 A}{d}(E(t)d) = \epsilon_0 E(t)A = \epsilon_0\Phi_E.
$$
So there would be a hypothetical current between the two plates, called displacement current, that is equal to
$$
i_D = \frac{dq(t)}{dt} = \epsilon_0\frac{d\Phi_E}{dt}
$$
The expanded Ampere’s law is then
$$
\oint\bm{B}\cdot d\bm{l} = \mu_0\left(I + \epsilon_0\frac{d\Phi_E}{dt}\right)_{\text{encl}} \quad\text{(Ampere’s law)}
$$
Maxwell’s equations are a complete description of how electricity and magnetism work.
$$
\begin{dcases}
\oint\bm{E}\cdot d\bm{A} = \frac{Q}{\epsilon_0}\quad\text{(Gauss’s law)} \newline
\oint\bm{B}\cdot d\bm{A} = 0 \quad\text{(Gauss’s law)} \newline
\oint\bm{E}\cdot d\bm{l} = - \frac{d\Phi_B}{dt} \quad\text{(Faraday’s law) (how we generate electricity)} \newline
\oint\bm{B}\cdot d\bm{l} = \mu_0\left(I + \epsilon_0\frac{d\Phi_E}{dt}\right) \quad\text{(Ampere’s law) (how motors work)}
\end{dcases}
$$
This is the integral formulation. The differential formulation of Maxwell’s equations is
$$
\begin{dcases}
\nabla\cdot\bm{E} = \frac{\rho}{\epsilon_0} \quad\text{(Gauss’s law)}\newline\newline
\nabla\cdot\bm{B}=0 \quad\text{(Gauss’s law)}\newline\newline
\nabla\times\bm{E} = -\frac{\partial\bm{B}}{\partial t} \quad\text{(Faraday’s law) (how we generate electricity)}\newline\newline
\nabla\times\bm{B} = \mu_0\left(\bm{J}+\epsilon_0\frac{\partial\bm{E}}{\partial t}\right) \quad\text{(Ampere’s law) (how motors work)}
\end{dcases}
$$
where $\nabla\cdot$ is the divergence operator, $\nabla\times$ is the curl operator, $\rho$ is the charge per unit volume, and $\bm{J}$ is the current density. The equivalence can be established from the divergence theorem (Gauss’s theorem) and the curl theorem (Stokes’ theorem). We list the derivation below.
Definition of divergence and curl
The divergence operation converts a vector field $\bm{F}=(F_1,F_2,F_3):\mathbb{R}^3\to\mathbb{R}^3$ into a field of scalars $\mathbb{R}^3\to\mathbb{R}$, where each number represents the “outgoingness” at that point.
$$
\mathrm{div}\,\bm{F} = \nabla\cdot\bm{F} = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)\cdot(F_1,F_2,F_3) =\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}
$$
$\nabla\cdot\bm{F}>0$ at point $\bm{x}\in\mathbb{R}^3$ if that point acts like a source, and $\nabla\cdot\bm{F}<0$ if that point acts like a sink.
The curl of a field defines another field of vectors $\mathbb{R}^3\to\mathbb{R}^3$, where each vector describes how much the field rotates at that point around the $x$, $y$ and $z$ axes respectively. It is similar to torque. The direction of a curl vector points along the rotation axis (right-hand rule).
$$
\mathrm{curl}\,\bm{F}=\nabla\times\bm{F} = \left(\frac{\partial F_3}{\partial y}-\frac{\partial F_2}{\partial z}\right)\bm{\imath} + \left(\frac{\partial F_1}{\partial z}-\frac{\partial F_3}{\partial x}\right)\bm{\jmath} + \left(\frac{\partial F_2}{\partial x}-\frac{\partial F_1}{\partial y}\right)\bm{k}
$$
We mention that the dot product notation and the cross product notation here are both abuses of notation.
The divergence theorem (Gauss’s theorem)
For a volume $V$ enclosed by surface $A$, the flux of $\bm{F}$ over $A$ is equal to the volume integral of divergence of $\bm{F}$ over $V$.
$$
\int_A\bm{F}\cdot d\bm{A} = \int_V\nabla\cdot\bm{F}dV
$$
Intuitive proof. We divide the volume into many small cubes, and let’s consider a small cube $dV=dx\cdot dy\cdot dz$ at point $(x,y,z)\in\mathbb{R}^3$. The field is approx. constant on each side of the cube, so we can calculate flux as field value times area. The net flux in the $x$-direction over the two sides with area $d\bm{A}=dy\cdot dz$ is
$$
\begin{aligned}
\text{flux in $x$ direction} &=(F_1(x+dx,y,z)-F_1(x,y,z))dy\cdot dz \newline
&= \frac{F_1(x+dx,y,z)-F_1(x,y,z)}{dx}dx\cdot dy\cdot dz \newline
&= \frac{\partial F_1}{\partial x} dV
\end{aligned}
$$
The minus sign on the second term appears because the outward normal of that face points in the $-x$ direction. We can calculate the flux in the $y$-direction and the $z$-direction in similar ways. The flux over the cube is then the sum of the fluxes over the three directions
$$
\text{flux over cube} = \left(\frac{\partial F_1}{\partial x}+\frac{\partial F_2}{\partial y}+\frac{\partial F_3}{\partial z}\right)dV = \nabla\cdot\bm{F}dV
$$
When we sum the flux over all cubes, the contributions from adjacent faces cancel out, and we are only left with the outermost surface $A$. This proves the equation.
The divergence theorem also explains why divergence is a measure of “outgoingness”. Over a small volume $V$, $\nabla\cdot\bm{F}$ is approx. constant. We take it out of the integral:
$$
\int_V\nabla\cdot\bm{F}dV \approx (\nabla\cdot\bm{F})V
$$
So $\nabla\cdot\bm{F}$ is roughly the flux divided by $V$. The relation becomes exact when $V\to0$:
$$
\nabla\cdot\bm{F} = \lim_{V\to0}\frac{1}{V}\int_A\bm{F}\cdot d\bm{A}
$$
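This limit statement can be checked numerically. For the made-up field $\bm{F}=(x^2, y, 0)$, whose divergence is $2x+1$, the flux out of a small cube divided by its volume should approach the divergence at the center:

```python
def F(x, y, z):
    # example vector field; div F = dF1/dx + dF2/dy + dF3/dz = 2x + 1
    return (x * x, y, 0.0)

def flux_over_cube(cx, cy, cz, h):
    # flux of F out of a cube of side h centered at (cx, cy, cz),
    # approximating F as constant on each face (sampled at the face center)
    a = h * h  # area of one face
    return (F(cx + h / 2, cy, cz)[0] * a - F(cx - h / 2, cy, cz)[0] * a
            + F(cx, cy + h / 2, cz)[1] * a - F(cx, cy - h / 2, cz)[1] * a
            + F(cx, cy, cz + h / 2)[2] * a - F(cx, cy, cz - h / 2)[2] * a)

h = 1e-3
# at (1, 1, 1) the divergence is 2*1 + 1 = 3
div_estimate = flux_over_cube(1.0, 1.0, 1.0, h) / h**3
```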
Derivation of differential formulation of Gauss’s laws
By the divergence theorem,
$$
\oint\bm{E}\cdot d\bm{A} = \int_V\nabla\cdot\bm{E}\,dV
$$
On the other hand, we can write the right hand side of the Gauss’s law $Q/\epsilon_0$ as
$$
\int_V\frac{Q/V}{\epsilon_0}\,dV
$$
The two volume integrals are equal for arbitrary volume $V$. Thus the two integrands must be equal as well, and we get
$$
\nabla\cdot\bm{E} = \frac{\rho}{\epsilon_0}
$$
The differential formulation of Gauss’s law for magnetism, $\nabla\cdot\bm{B}=0$, can be derived in the same way.
The curl theorem (Stokes’ theorem)
The curl theorem is similar in spirit to the divergence theorem. It says that for a surface $A$ with boundary $\partial A$, the line integral of $\bm{F}$ over the loop $\partial A$ is equal to the surface integral of $\nabla\times\bm{F}$ over $A$.
$$
\oint_{\partial A}\bm{F}\cdot d\bm{l} = \int_A(\nabla\times\bm{F})\cdot d\bm{A}
$$
When $A$ is small, we can take $\nabla\times\bm{F}$ out of the integral, and see how the curl is the area density of the circulation of the field in each of the three dimensions.
$$
(\nabla\times\bm{F})(p)\cdot\bm{\hat{n}} = \lim_{A\to0}\frac{1}{|A|}\oint_{\partial A}\bm{F}\cdot d\bm{l}.
$$
Stokes’ theorem in 2D is called Green’s theorem. For a 2D vector field $\bm{F}=(F_1,F_2)$ and an area $A\subset\mathbb{R}^2$ with boundary $\partial A$, Green’s theorem is
$$
\oint_{\partial A}\bm{F}\cdot d\bm{l} = \oint_{\partial A}F_1dx+F_2dy = \int_A\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)dA
$$
Intuitive proof. The intuitive proof of Green’s theorem is entirely similar to the one we used in proving the divergence theorem. However, this time I want to run it backwards: I want to first start with the formula at the right side, then decompose it step by step to see what we get.
Consider a small area $dA=dx\cdot dy$ at point $(x,y)\in\mathbb{R}^2$. The formula leads us to
$$
\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)dxdy = \left(\frac{\partial F_2}{\partial x}dx\right)dy - \left(\frac{\partial F_1}{\partial y}dy\right)dx
$$
and according to the definition of the partial derivatives it is
$$
\bigg(F_2(x+dx, y) - F_2(x, y)\bigg)dy - \bigg(F_1(x,y+dy) - F_1(x,y)\bigg)dx
$$
We can start to see how this is the line integral of $F=(F_1,F_2)$ over the four sides of $dA$. Rearranging the equation, we get
$$
F_1(x,y)dx + F_2(x+dx,y)dy - F_1(x,y+dy)dx - F_2(x,y)dy.
$$
So we use the value of $F$ at $(x,y)$ for the bottom segment (along which $dy=0$, so only the $F_1$ term contributes), the value of $F$ at $(x+dx,y)$ for the right segment, the value of $F$ at $(x,y+dy)$ for the top segment, and the value of $F$ at $(x,y)$ for the left segment. The sum is the line integral of $F$ counterclockwise around the boundary of $dA$.
Integrate over all small areas $dA$ and we get the line integral of $F$ over the boundary of $A$, because adjacent lines run opposite in directions and cancel each other out, and only the boundary is left.
The 2D case tells us how the field rotates in the $xy$-plane, namely around the $z$-axis in 3D. Observing the formula, we see that the integrand in Green’s theorem is exactly the third component in the definition of curl for 3D vector fields. Thus, if we represent how the field rotates around the $x$-axis and the $y$-axis by adding two further components to the curl, and integrate the curl over an area $A$ in 3D, then we get the line integral over the boundary of the area just as in the 2D case. This completes our intuitive proof of Stokes’ theorem.
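Green’s theorem can also be checked numerically. For the made-up field $F=(-y,x)$, the integrand $\partial F_2/\partial x - \partial F_1/\partial y$ equals 2 everywhere, so its integral over the unit square is 2; a midpoint-rule line integral around the boundary should agree.

```python
def F(x, y):
    # example 2D field; dF2/dx - dF1/dy = 1 - (-1) = 2 everywhere
    return (-y, x)

def boundary_integral(n=1000):
    # line integral of F counterclockwise around the unit square,
    # using midpoint samples on each edge
    h, total = 1.0 / n, 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += F(t, 0.0)[0] * h   # bottom edge, dl along +x
        total += F(1.0, t)[1] * h   # right edge, dl along +y
        total -= F(t, 1.0)[0] * h   # top edge, dl along -x
        total -= F(0.0, t)[1] * h   # left edge, dl along -y
    return total
```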
Derivation of differential formulation of Faraday’s law and Ampere’s law
By Stokes’ theorem, we have
$$
\oint\bm{E}\cdot d\bm{l} = \int_A(\nabla\times\bm{E})\cdot d\bm{A}
$$
On the other hand,
$$
-\frac{d\Phi_B}{dt} = -\frac{d}{dt}\int_A\bm{B}\cdot d\bm{A} = -\int_A\frac{\partial\bm{B}}{\partial t}\cdot d\bm{A}
$$
So we conclude that
$$
\nabla\times\bm{E} = -\frac{\partial\bm{B}}{\partial t}
$$
This is Faraday’s law in differential formulation. Similarly, for Ampere’s law,
$$
\begin{aligned}
\int_A(\nabla\times\bm{B})\cdot d\bm{A} &= \mu_0\left(I + \epsilon_0\frac{d\Phi_E}{dt}\right) \newline\newline
&= \mu_0\left(\int_A\bm{J}\cdot d\bm{A} + \epsilon_0\frac{d}{dt}\int_A\bm{E}\cdot d\bm{A}\right) \newline\newline
&= \int_A\mu_0\left(\bm{J} + \epsilon_0\frac{\partial\bm{E}}{\partial t}\right)\cdot d\bm{A}
\end{aligned}
$$
so we conclude that
$$
\nabla\times\bm{B} = \mu_0\left(\bm{J} + \epsilon_0\frac{\partial\bm{E}}{\partial t}\right).
$$
Applications: inductors, ac and transformers
Inductors. A coil of $N$ turns of wire is called an inductor. When the current through the coil changes, by Lenz’s law the generated magnetic field tends to oppose the change. Thus, an inductor can be used in circuits to stabilize currents.
An inductor is sort of like a ship: it is hard to get it moving at first, but once it is moving, it has inertia and is difficult to stop.
By Faraday’s law, the induced emf in an inductor is proportional to time derivative of magnetic flux, which for a fixed area is proportional to time derivative of current. Thus we have the relation
$$
L = \frac{N\Phi_B}{I},\qquad \mathcal{E} = - L\frac{dI}{dt}
$$
where $L$ is called the inductance of the coil with unit henry ($\mathrm{H}$).
The energy $dU$ supplied to an inductor during time $dt$ is
$$
dU = Pdt = L\frac{dI}{dt}\cdot Idt = LIdI
$$
So the total energy $U$ supplied when the current increases from $0$ to $I$ is the integral of $dU$ over $0$ to $I$, which is $(1/2)LI^2$. When the current decreases from $I$ to $0$, the inductor acts as a source that supplies a total amount of $(1/2)LI^2$ to the circuit. It can be calculated that the magnetic energy density $u$ (energy per unit volume) is proportional to the square of the magnetic field magnitude.
$$
u = \frac{B^2}{2\mu_0}
$$
R-L circuit. In an R-L circuit, which consists of a DC emf source, a resistor $R$ and an inductor $L$, the growth and decay of current are exponential, with differential equation
$$
\frac{dI}{dt} = \frac{\mathcal{E}}{L} - \frac{R}{L}I
$$
and solution
$$
I(t) = \frac{\mathcal{E}}{R}(1 - e^{-(R/L)t})
$$
The time constant $\tau=L/R$ measures the time it takes for the current to rise to $(1-1/e)\approx63$ percent of its final value.
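A sketch of the current growth, with made-up component values:

```python
import math

emf, R, L = 12.0, 6.0, 0.3  # volts, ohms, henries (assumed values)
tau = L / R                  # time constant of the R-L circuit

def current(t):
    # growth of current: I(t) = (emf/R)*(1 - e^(-t*R/L))
    return (emf / R) * (1.0 - math.exp(-t / tau))

# after one time constant the current is about 63% of its final value
frac_at_tau = current(tau) / (emf / R)
```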
L-C circuit. A circuit that consists purely of an inductor $L$ and a capacitor $C$ undergoes oscillations of current and charge. By Kirchhoff’s loop rule
$$
-L\frac{dI}{dt} - \frac{q}{C} = 0
$$
with $dI/dt = d^2q/dt^2$ we get the differential equation
$$
\frac{d^2q}{dt^2} + \frac{1}{LC}q = 0
$$
The solution is $q=Q\cos(\omega t + \phi)$ with the angular frequency of the oscillation being
$$
\omega = \sqrt{\frac{1}{LC}}
$$
and $I$ is the derivative of $q$.
L-R-C circuit. An L-R-C circuit exhibits damped oscillation. From Kirchhoff’s loop rule
$$
-IR - L\frac{dI}{dt} - \frac{q}{C} = 0
$$
we get the differential equation for the circuit
$$
\frac{d^2q}{dt^2} + \frac{R}{L}\frac{dq}{dt} + \frac{1}{LC}q = 0
$$
The solution to this ODE is
$$
q = Ae^{-(R/2L)t}\cos\left(\omega t + \phi\right)
$$
where
$$
\omega = \sqrt{\frac{1}{LC} - \frac{R^2}{4L^2}}
$$
is the angular frequency. When $R$ becomes large enough that $R^2/4L^2 \ge 1/LC$, the circuit no longer oscillates.
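A quick numerical comparison of the undamped and damped angular frequencies, with made-up component values, including the critical resistance $\sqrt{4L/C}$ at which the oscillation stops:

```python
import math

L, C = 10e-3, 1e-6  # henries, farads (assumed values)
R = 50.0            # ohms (assumed)

omega_lc = 1.0 / math.sqrt(L * C)                       # undamped L-C frequency
omega = math.sqrt(1.0 / (L * C) - R**2 / (4.0 * L**2))  # damped L-R-C frequency
R_crit = math.sqrt(4.0 * L / C)  # at this resistance omega drops to zero
```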
AC current. For an AC current $i(t) = I\cos\omega t$ with voltage $v(t) = V\cos(\omega t + \phi)$, we typically measure current and voltage by their rectified average (rav) and root-mean-square (rms) values.
$$
I_{\text{rav}} = \frac{2}{\pi}I,\quad I_{\text{rms}}=\frac{I}{\sqrt{2}},\quad V_{\text{rms}} = \frac{V}{\sqrt{2}}
$$
For an AC current $i(t)=I\cos\omega t$, the voltage across a resistor $R$ is in phase with the current.
$$
v_R = iR = (IR)\cos\omega t = V_R\cos\omega t.
$$
However, the voltage across an inductor $L$ will lead the current by $\phi=90^\circ$.
$$
v_L = L\frac{di(t)}{dt} = L\frac{d}{dt}(I\cos\omega t) = -I\omega L\sin\omega t = I\omega L\cos(\omega t + 90^\circ)
$$
where the last equality follows from the identity $\cos(x+\pi/2)=-\sin x$. We define $X_L:=\omega L$ as the inductive reactance and we write the amplitude of the voltage across the inductor as $V_L=IX_L$. A high frequency ($\omega$) voltage gives only a small current, while a low frequency voltage gives rise to a larger current, so inductors act like low-pass filters.
The voltage across a capacitor in an AC circuit is given by
$$
i = \frac{dq}{dt}=I\cos\omega t\quad\Longrightarrow\quad q=\frac{I}{\omega}\sin\omega t\quad\Longrightarrow\quad v_C=\frac{q}{C}=\frac{I}{\omega C}\sin\omega t = \frac{I}{\omega C}\cos(\omega t - 90^\circ)
$$
which lags the current by $\phi=-90^\circ$. We define $X_C:=1/\omega C$ as capacitive reactance so that $V_C=IX_C$. We see that large $\omega$ gives large current while small $\omega$ gives small current, so capacitors act like high-pass filters.
In an L-R-C series AC circuit, it can be shown on the phasor diagram that the voltage across the source is a vector sum of $V_R$ and $V_L-V_C$, so its amplitude will be
$$
V = \sqrt{V_R^2+(V_L-V_C)^2} = I\sqrt{R^2 + (X_L - X_C)^2} =: IZ
$$
The quantity $Z(\omega)=\sqrt{R^2+[\omega L - (1/\omega C)]^2}$ is called the impedance of an AC circuit. The impedance will obtain its minimum $Z_{\min}=R$ (and $I$ at its maximum $I_{\max}$) at
$$
\omega_0 = \frac{1}{\sqrt{LC}}
$$
when the circuit will exhibit resonance. In this situation, $V=IR$, and the circuit behaves as if the inductor and the capacitor weren’t there at all. If we vary the inductance $L$ or the capacitance $C$, we can also vary the resonance frequency. This is how a radio is “tuned” to receive a particular station.
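A short numerical sweep (with hypothetical component values) confirms that the impedance bottoms out at $\omega_0$, where $Z$ reduces to $R$:

```python
import math

L, C, R = 10e-3, 1e-6, 50.0   # hypothetical series L-R-C values

def Z(omega):
    """Impedance of the series L-R-C circuit at angular frequency omega."""
    return math.sqrt(R**2 + (omega * L - 1 / (omega * C))**2)

omega0 = 1 / math.sqrt(L * C)                         # predicted resonance
omegas = [omega0 * k / 100 for k in range(50, 151)]   # sweep 0.5 to 1.5 omega0
best = min(omegas, key=Z)
print(best)      # ~1.0e4 rad/s: minimum impedance occurs at omega0
print(Z(best))   # ~50.0: at resonance Z reduces to R
```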
Transformers. A transformer consists of two coils wound around the two arms of an iron core for voltage conversion. One coil, called the primary, is connected to an AC source, while the other, called the secondary, is connected only to a load $R$. The iron core is used to contain the magnetic field almost completely within the core for optimal efficiency. From Faraday’s law of induction
$$
\mathcal{E}_1 = -N_1\frac{d\Phi_B}{dt}\quad\text{and}\quad\mathcal{E}_2=-N_2\frac{d\Phi_B}{dt}
$$
so we have the relation
$$
\frac{V_2}{V_1}=\frac{\mathcal{E}_2}{\mathcal{E}_1}=\frac{N_2}{N_1}
$$
To reduce the energy lost to eddy currents induced within the core, the core is built from thin laminated sheets; the vibration of these sheets under the alternating magnetic field is what produces a transformer’s humming noise.
By Ampere’s law, a changing electric field can induce magnetic field. By Faraday’s law, a changing magnetic field can induce an electric field. Thus we are led to the prediction of existence of electromagnetic waves, first discovered by German physicist Heinrich Hertz in 1887. In vacuum or free space with no charge ($\rho=0$) and no current ($\bm{J}=0$), Maxwell’s equations reduce to
$$
\begin{aligned}
&\nabla\cdot\bm{E} = 0, \qquad \nabla\times\bm{E} = -\frac{\partial\bm{B}}{\partial t} \newline
&\nabla\cdot\bm{B} = 0, \qquad \nabla\times\bm{B} = \mu_0\epsilon_0\frac{\partial\bm{E}}{\partial t}
\end{aligned}
$$
Applying the curl of curl identity $\nabla\times(\nabla\times\bm{F}) =\nabla(\nabla\cdot \bm{F})-\nabla^2\bm{F}$ to the two curl equations, where $\nabla$ is the gradient operator and $\nabla^2$ is the vector Laplacian operator, we get
$$
\begin{aligned}
\frac{\partial^2\bm{E}}{\partial t^2} &= \frac{1}{\mu_0\epsilon_0}\nabla^2\bm{E} \newline
\frac{\partial^2\bm{B}}{\partial t^2} &= \frac{1}{\mu_0\epsilon_0}\nabla^2\bm{B}
\end{aligned}
$$
This is the wave equation. Thus the electric field and magnetic field propagate as waves, with speed
$$
v = \frac{1}{\sqrt{\mu_0\epsilon_0}} \approx 3\times10^8\ \mathrm{m/s}
$$
which coincides with the speed of visible light, first measured by, for example, French physicist Hippolyte Fizeau in 1849. This suggests that visible light is an electromagnetic wave.
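We can verify this numerically using the standard values of the vacuum constants:

```python
import math

mu0 = 4 * math.pi * 1e-7    # vacuum permeability (T*m/A)
eps0 = 8.854187817e-12      # vacuum permittivity (F/m)
c = 1 / math.sqrt(mu0 * eps0)
print(c)                    # ~2.998e8 m/s, the measured speed of light
```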
visible light spectrum
speed of light is slow
The speed of light is $c=300{,}000$ km/s. By comparison, the earth’s diameter is $12{,}756$ km, so light covers a distance of about $23.5$ Earth diameters per second. This is not particularly fast considering how tiny the earth is in the vast universe.
According to the wave equation, the wave speed is constant, regardless of any motion of the source. This result from Maxwell’s equations led Einstein to postulate the constancy of speed of light in special relativity theory.
Alternatively, the wave equation can also be derived from the integral formulation of Maxwell’s equations by considering a plane wave that propagates in the $x$-direction, where $\bm{E}$ only has $y$ component $E_y$ and $\bm{B}$ only has $z$ component $B_z$. We consider a small distance $\Delta x$ and calculate the two line integrals in Faraday’s law and Ampere’s law over two small squares for $\bm{E}$ and $\bm{B}$ respectively, treating the value of $E$ or $B$ on each side of the areas as constant. We will get the same result.
$$
\begin{aligned}
\frac{\partial^2E_y(x,t)}{\partial t^2} = \frac{1}{\mu_0\epsilon_0}\frac{\partial^2E_y(x,t)}{\partial x^2} \newline
\frac{\partial^2B_z(x,t)}{\partial t^2} = \frac{1}{\mu_0\epsilon_0}\frac{\partial^2B_z(x,t)}{\partial x^2}
\end{aligned}
$$
One solution to the partial differential equations is the sinusoidal wave, given by
$$
E(x, t) = E_{\max}\cos(kx - \omega t)\newline
B(x, t) = B_{\max}\cos(kx - \omega t)
$$
where $E_{\max}$ and $B_{\max}$ are amplitudes, $\omega=2\pi f$ is the angular frequency, and $k=2\pi/\lambda$ is the wave number. By considering a small rectangle with sides $a$ and $c\,dt$, it can be shown using Faraday’s law that the amplitudes of the two fields must satisfy
$$
E_{\max}=cB_{\max}.
$$
We can already see from Maxwell’s equations why $\bm{E}$ and $\bm{B}$ vary in phase. For example, the curl of $\bm{B}$, which in this case measures the rotation of $B$ around the $y$-axis, is greatest where $B$ changes sign. At that point, by $\nabla\times\bm{B}=\mu_0\epsilon_0\,\partial\bm{E}/\partial t$, the change of $E$ must also be steepest. The curl of $\bm{B}$ is zero (no rotation) where $B$ is at its maximum; at that point $\partial\bm{E}/\partial{t}=0$ and so $E$ must be at its maximum as well.
In general, $\bm{E}$ and $\bm{B}$ can have $x$, $y$ and $z$ components and can rotate around the propagation direction as they travel through space.
Application: Starlink
Starlink employs 10,000+ satellites to provide internet connections from space, with a latency of around 20 ms. To reduce latency, the satellites are deployed at a low altitude of 550 km above the earth’s surface, which means each satellite can only cover a small surface area and many of them are needed in the sky. The user dish has an array of 1280 antennas. They can form a narrow beam pointing toward a satellite by constructive interference, specifically phased-array beam steering.
Starlink uses a modulation method called 64QAM to transmit data. Roughly speaking, in this method, all 6-bit symbols (e.g. 011101) are placed on a 2D grid, and the wave’s amplitude and phase are modulated as a vector’s norm and angle $(A, \phi)$ to represent each point on the grid.
Starlink uses ~12 GHz electromagnetic waves to send and receive data, i.e. one wave cycle per ~83 picoseconds (1 ps = $10^{-12}\,\mathrm{s}$). Each symbol lasts about 10 nanoseconds (1 ns = $10^{-9}\,\mathrm{s}$), so about 120 wave cycles are repeated before the next symbol is sent. We can also calculate that around 90 million symbols can be sent per second, which at 6 bits per symbol is $6\times90\ \text{million}=540$ Mbit/s. This is the network speed that Starlink provides.
From the equation $c=\lambda f$, we can also calculate its wavelength as
$$
\begin{aligned}
3\cdot10^8 \text{m/s} &= \lambda\cdot12\cdot10^9\text{Hz}\newline
\lambda &= 3 / 120 \text{ m} \newline
&= 0.025\text{ m} \newline
&= 2.5\text{ cm}
\end{aligned}
$$
so we can see that 550 km / 0.025 m = 22 million wavelengths fit between the dish and the satellite at any instant. Each wave emitted from the dish takes 550 km / 300,000 (km/s) ≈ 1.8 ms to reach the satellite.
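The arithmetic above can be reproduced in a few lines (numbers taken from the estimates in this section):

```python
c = 3e8        # speed of light (m/s)
f = 12e9       # carrier frequency (Hz)
d = 550e3      # satellite altitude (m)

wavelength = c / f
print(wavelength)        # 0.025 m = 2.5 cm
print(10e-9 * f)         # 120.0 wave cycles per 10 ns symbol
print(d / wavelength)    # 2.2e7: ~22 million wavelengths span dish to satellite
print(d / c * 1e3)       # ~1.83 ms one-way travel time
```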
Energy in electromagnetic waves. The energy density of an electromagnetic wave is the sum of the energy densities of $\bm{E}$ and $\bm{B}$:
$$
u = \frac{1}{2}\epsilon_0E^2 + \frac{B^2}{2\mu_0}
$$
Substituting the relation $B=E/c=\sqrt{\epsilon_0\mu_0}E$ into the equation, we get
$$
u = \epsilon_0E^2.
$$
This shows that in vacuum, the energy density of $\bm{E}$ is equal to the energy density of $\bm{B}$. Consider a volume $dV=A\cdot(c\,dt)$ swept out by the wave in time $dt$. The energy in this volume is the energy density times the volume, $dU=u\,dV = (\epsilon_0E^2)(Ac\,dt)$. The energy flow per unit time per unit area is then
$$
S = \frac{1}{A}\frac{dU}{dt} = \epsilon_0cE^2 = \frac{\epsilon_0}{\sqrt{\epsilon_0\mu_0}}E^2=\frac{EB}{\mu_0}
$$
with units of $\mathrm{W/m^2}$. The vector quantity
$$
\bm{S} = \frac{1}{\mu_0}\bm{E}\times\bm{B}
$$
is called the Poynting vector. Its direction is in the direction of propagation of the wave. The power (energy flow per unit time) over any closed surface is the integral of $\bm{S}$ over that surface
$$
P = \oint\bm{S}\cdot d\bm{A}.
$$
The average value of $\bm{S}$ at a point is called the intensity of the radiation at that point. For sinusoidal waves, substituting the function forms of $\bm{E}$ and $\bm{B}$ into $\bm{S}$, it can be worked out to be
$$
I = S_{\text{av}} = \frac{E_{\max}B_{\max}}{2\mu_0} = \frac{1}{2}\epsilon_0cE_{\max}^2.
$$
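As a worked example, the sinusoidal intensity formula can be inverted to find the field amplitudes for a given intensity, here using the average intensity of sunlight above the atmosphere (about $1361\ \mathrm{W/m^2}$):

```python
import math

eps0 = 8.854e-12     # vacuum permittivity (F/m)
c = 3e8              # speed of light (m/s)
I = 1361.0           # intensity of sunlight above the atmosphere (W/m^2)

# Invert I = (1/2) * eps0 * c * E_max^2 for the field amplitudes.
E_max = math.sqrt(2 * I / (eps0 * c))
B_max = E_max / c
print(E_max)   # ~1.0e3 V/m
print(B_max)   # ~3.4e-6 T
```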
Electromagnetic waves also carry momentum. Momentum density has magnitude
$$
\frac{dp}{dV} = \frac{EB}{\mu_0c^2}=\frac{S}{c^2}
$$
Using $dV=Ac\,dt$ the momentum flow rate per unit area is
$$
\frac{1}{A}\frac{dp}{dt}=\frac{S}{c}=\frac{EB}{\mu_0c}
$$
This momentum is responsible for radiation pressure. Although typically small, the radiation force can add up over time, for example perturbing the orbits of spacecraft and satellites, so radiation pressure must be taken into account when designing them. Radiation pressure also underlies technologies such as solar sails and optical tweezers.
In this section, we will omit the derivations to some of the optics formulas. They can be found in the textbook.
(Citation: Young, H., Freedman, R. & Ford, A. (2011). University Physics with Modern Physics, 13th edition. Addison-Wesley, Reading, MA.)
In geometric optics, the ray model is used to describe light.
Reflection and refraction. When light goes from one material to another, two things can happen: reflection and refraction. Reflection is the familiar phenomenon of light bouncing off a material, as with a mirror. Reflection at a very smooth surface is called specular reflection, while scattered reflection from a rough surface is called diffuse reflection. Refraction is the phenomenon in which light continues to travel through the material, but with an apparently different speed $v$ and angle. The index of refraction
$$
n=c/v
$$
is the ratio of the speed of light in vacuum to the speed $v$ in the material. A larger index of refraction means light travels more slowly in that material. Frequency does not change when light passes from one material to another, but wavelength can. From $v=\lambda f$ we have the relation $\lambda=\lambda_0/n$ where $\lambda_0$ is the wavelength of the same light in vacuum.
The 3 laws of reflection and refraction are:
the incident, reflected, and refracted rays and the normal to the surface all lie in the same plane.
the angle of reflection $\theta_r$ is equal to the angle of incidence $\theta_a$ for all wavelengths and for any pair of materials.
$$
\theta_r = \theta_a
$$
(Snell’s law) For monochromatic light and for a given pair of materials, $a$ and $b$,
$$
\frac{\sin\theta_a}{\sin\theta_b} = \frac{n_b}{n_a}
$$
This means that when a light ray enters a material with a larger index of refraction $n_b$, the angle will be smaller ($\theta_b<\theta_a$) and the ray will bend toward the normal. When it enters a material with a smaller $n_b$, so that it travels faster, the angle will be larger ($\theta_b>\theta_a$) and the ray bends away from the normal. This is why objects submerged in water appear shallower: rays traveling from the object up into the air bend away from the normal, and tracing those rays back in straight lines places the image above the object. Conversely, when viewed from underwater, things above the water surface appear farther away than they really are.
If we let light travel from a slower material to a faster one, e.g. shooting a laser from underwater, then by Snell’s law $\sin\theta_b=(n_a/n_b)\sin\theta_a$ there is some angle $\theta_a<90^\circ$ (namely $\theta^*=\arcsin(n_b/n_a)$) at which $\theta_b=90^\circ$, called the critical angle. Beyond the critical angle, no light is refracted into the second material, and we have total internal reflection: light is totally reflected without escaping the material. Applications of this phenomenon include: (1) prisms in binoculars to guide light into the eyepieces; (2) optical fibers and endoscopes for transmitting information with light; (3) diamonds cut so that total internal reflection occurs on their back surfaces to maximize brilliance.
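Snell’s law and the critical angle are easy to evaluate numerically; the indices below are typical values for water and air.

```python
import math

n_water, n_air = 1.333, 1.000      # typical indices of refraction

def refracted_angle(theta_a_deg):
    """Snell's law for light going water -> air; None means total
    internal reflection (incidence beyond the critical angle)."""
    s = (n_water / n_air) * math.sin(math.radians(theta_a_deg))
    if s > 1:
        return None
    return math.degrees(math.asin(s))

theta_crit = math.degrees(math.asin(n_air / n_water))
print(theta_crit)             # ~48.6 degrees
print(refracted_angle(30))    # ~41.8 degrees: bent away from the normal
print(refracted_angle(60))    # None: total internal reflection
```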
Dispersion is the dependence of wave speed and index of refraction on wavelength. Usually, the index of refraction increases with increasing frequency (i.e. decreasing wavelength). A familiar example is the rainbow.
Polarization is the restriction of the $\bm{E}$ field’s oscillation to a particular direction; a polarizing filter transmits only the component of $\bm{E}$ along its axis.
Plane surfaces. There are many terms in geometric optics.
⏺️ an image is what we see of an object through reflection or refraction by some material. If the light rays don’t actually pass through the image point, it is called a virtual image. If light rays do pass through the image point, it is called a real image.
⏺️ the object distance $s$ is the distance of the object from the mirror or lens. The image distance $s'$ is the distance of the image point from the mirror or lens. For a plane mirror, $s'=-s$.
⏺️ sign rules. (1) $s$ is positive if the object is on the same side of the surface as the incoming light, otherwise it is negative. (2) $s'$ is positive if the image is on the same side as the outgoing light, otherwise it is negative. (3) the radius of curvature $R$ is positive if the center of curvature $C$ is on the same side as the outgoing light, otherwise it is negative.
⏺️ the lateral magnification $m$ of an extended object is
$$
m = \frac{y'}{y}
$$
where $y$ is the object height and $y'$ is the image height. For a plane mirror $m$ is simply $1$. An image can be erect, inverted or reversed.
Reflection at spherical surfaces. Parallel light rays will concentrate at a single point when being reflected off a concave mirror. This point is called the focal point, and the distance of the focal point to vertex, denoted as $f$, is the focal length. $f$ is related to the radius of curvature $R$ by $f=R/2$. When the angles of the light rays with the optic axis are not too large, we can work out an approximate object-image relationship
$$
\frac{1}{s} + \frac{1}{s'} = \frac{2}{R} = \frac{1}{f}
$$
We can verify that when $s=\infty$, namely when light rays are parallel, the image will be formed at the focal point. Conversely, if we place a light source at the focal point, the reflected light rays will be parallel to each other.
The equation is only an approximation for spherical mirrors. When the angle is too large, a point object will not form a precise point image; this defect is called spherical aberration. On the other hand, parabolic mirrors can be made so that parallel rays are focused exactly at the focal point.
The lateral magnification for spherical mirrors is
$$
m = \frac{y'}{y} = -\frac{s'}{s}
$$
When an object is placed far from a concave mirror ($s>2f$), the image will be real, reduced and inverted. When an object is placed inside the focal length ($s<f$), the image will be virtual, magnified and erect.
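The object-image relation can be turned into a tiny solver; the focal length below is hypothetical.

```python
# Concave mirror with hypothetical focal length f = 10 cm.
f = 10.0

def image(s):
    """Solve 1/s + 1/s' = 1/f for s'; return (s', magnification m = -s'/s)."""
    s_prime = 1 / (1 / f - 1 / s)
    return s_prime, -s_prime / s

print(image(30))   # s' = 15 cm: real, shrunk (|m| = 0.5), inverted (m < 0)
print(image(5))    # s' = -10 cm: virtual, magnified (m = 2), erect (m > 0)
```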
For convex mirrors, the same relations hold. On the other hand, parallel rays will diverge from a virtual focal point $F$ behind the mirror, and rays aimed at the virtual focal point will be reflected to become parallel rays.
Refraction at spherical surfaces. When light travels from a material of index $n_a$ into a material of index $n_b$ across a spherical surface of radius $R$, the following relation holds
$$
\frac{n_a}{s} + \frac{n_b}{s'} = \frac{n_b - n_a}{R}.
$$
And the lateral magnification is
$$
m = \frac{y'}{y} = -\frac{n_as'}{n_bs}.
$$
A special case is a plane surface ($R=\infty$) and the two equations become
$$
\frac{n_a}{s} + \frac{n_b}{s'} = 0 \quad\text{and}\quad m=1
$$
So when light travels from water to air (i.e. $n_b < n_a$), we have $|s'|<|s|$ so an underwater object will appear to be shallower.
Thin lenses. There are two types of thin lenses: converging lenses and diverging lenses. A lens has two focal points $F_1$ and $F_2$ located on the left and right sides of the lens. Any lens that is thicker at its center than at its edges is a converging lens with positive $f$; any lens that is thicker at its edges than at its center is a diverging lens with negative $f$. The same object-image relation and lateral magnification equations hold for lenses:
$$
\frac{1}{s} + \frac{1}{s'} = \frac{1}{f}
$$
and
$$
m = -\frac{s'}{s}
$$
Using the principle that an image formed by one reflecting or refracting
surface can serve as the object for a second reflecting or refracting surface, we can derive the lensmaker’s equation that relates the radii of two surfaces of a lens with the focal length as
$$
\frac{1}{f} = (n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)
$$
where $n$ is the refractive index of the lens. This equation is only an approximation and more accurate but more complicated relation can also be deduced.
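As a quick example of the lensmaker’s equation, consider a symmetric biconvex lens with hypothetical values $n=1.52$ and $|R_1|=|R_2|=20$ cm (by the sign rules, $R_2$ is negative because its center of curvature is on the incoming side):

```python
# Hypothetical biconvex lens: n = 1.52, R1 = +20 cm, R2 = -20 cm.
n, R1, R2 = 1.52, 20.0, -20.0
f = 1 / ((n - 1) * (1 / R1 - 1 / R2))   # lensmaker's equation
print(f)   # ~19.2 cm, positive: a converging lens
```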
Application: Optical devices
Cameras, human eyes, magnifiers, microscopes and telescopes are all optical devices. Properties of mirrors and lenses discussed in this section can be applied to make such optical devices.
Converging lenses are used in cameras to map distant objects onto image sensors called CCD arrays. A lens of long focal length gives a small angle of view and a large image of a distant object. A lens of short focal length (“wide-angle lens”) gives a wide angle of view and a small image. The $f$-number of a lens is $f/D$, the focal length divided by the aperture diameter. The intensity of light reaching the sensor is proportional to $D^2/f^2$, so small $f$-numbers like $f/2$ and $f/2.8$ mean a large exposure and a brighter image, while large $f$-numbers like $f/16$ mean a smaller exposure and a dimmer image. Because converging lenses suffer from spherical aberration and chromatic aberration, high-end camera lenses use an array of lenses to correct such unwanted effects.
Human eyes consist of several parts, including the cornea, the crystalline lens, the aqueous humor, the iris, the pupil, and the vitreous humor. Converging lenses are used to correct hyperopia by creating a virtual image of a nearby object at or beyond the eye’s near point. Diverging lenses are used to correct myopia by creating a virtual image of a distant object that lies inside the eye’s far point.
Microscopes and telescopes use two lenses to form an image. In each device a primary lens called the objective forms a real image, and a second lens called the eyepiece is used as a magnifier to make an enlarged, virtual image. In microscopes, the objective forms a real, inverted image $I$ inside the focal point $F_2$ of the eyepiece, and the eyepiece uses the image $I$ as an object to create an enlarged, virtual image $I'$ (still inverted).
In a refracting telescope, the objective forms a real, inverted image $I$ of the distant object at its second focal point $F_1'$, which is also the first focal point $F_2$ of the eyepiece. The eyepiece uses image $I$ to form a magnified, virtual image $I'$ at infinity (still inverted). By contrast, reflecting telescopes use concave mirrors to collect light from distant objects.
A light source produces continuous waves of light. Light waves from two or more sources will interfere with each other. In this section we aim to derive precise formulas for location and intensity of interference.
We call two monochromatic sources coherent if they have the same frequency $f$ and a constant phase relationship. Consider two coherent sources $S_1$ and $S_2$ and a point $P$. Let $r_1$ and $r_2$ denote the distances from $P$ to $S_1$ and $S_2$ respectively. The locations of constructive interference are places where crests coincide, namely
$$
r_2 - r_1 = m\lambda, \quad m = 0, \pm 1, \pm 2, \cdots
$$
Destructive interference happens where crests from one source are 1/2 wavelength apart from troughs of the other source:
$$
r_2 - r_1 = \left(m + \frac{1}{2}\right)\lambda, \quad m = 0, \pm 1, \pm 2, \cdots
$$
In Thomas Young’s double-slit experiment, light coming from the two horizontal slits acts as two sources $S_1$ and $S_2$ a distance $d$ apart. For a point $P$ on the screen, denote its height above the center line by $y$, let $PS_1$ be $r_1$ and $PS_2$ be $r_2$, let the distance to the screen be $R$, and let the angle between $r_2$ and the horizontal line be $\theta$. Then $r_2-r_1 =d\sin\theta$, so constructive interference happens at angles $\theta$ where
$$
d\sin\theta = m\lambda, \quad m = 0, \pm 1, \pm 2, \cdots
$$
and destructive interference happens at angles $\theta$ where
$$
d\sin\theta = \left(m + \frac{1}{2}\right)\lambda, \quad m = 0, \pm 1, \pm 2, \cdots
$$
The distance of the screen $R$ is large compared to the height of the interference pattern, so $\theta$ is small and we can approximate $\sin\theta$ with $\tan\theta$, which equals $y/R$. So for an approximation, constructive interference happens at locations where
$$
y = R\frac{m\lambda}{d}
$$
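For a feel of the scale, here is the bright-fringe spacing for a hypothetical but typical lab setup:

```python
# Hypothetical double-slit setup: green light, typical lab dimensions.
lam = 550e-9      # wavelength (m)
d = 0.2e-3        # slit separation (m)
R = 1.0           # screen distance (m)

spacing = R * lam / d       # distance between adjacent bright fringes
print(spacing * 1e3)        # 2.75 mm
bright = [m * R * lam / d for m in range(-2, 3)]
print([round(y * 1e3, 2) for y in bright])   # fringe positions in mm
```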
We now calculate intensity in interference patterns. Suppose at location $P$ the two waves vary with time as
$$
\begin{cases}
E_1(t) = E\cos(\omega t + \phi)\newline
E_2(t) = E\cos\omega t
\end{cases}
$$
Then the amplitude of the two superimposed waves is
$$
E_P = 2E\left|\cos\frac{\phi}{2}\right|
$$
The intensity $I$ at point $P$ is proportional to $E_P^2$:
$$
I = S_{\text{av}} = \frac{E_P^2}{2\mu_0c} = \frac{1}{2}\epsilon_0cE_P^2 = 2\epsilon_0cE^2\cos^2\frac{\phi}{2}
$$
The maximum intensity
$$
I_0 = 2\epsilon_0cE^2
$$
is 4 times as great as the intensity $(1/2)\epsilon_0cE^2$ from each individual source. Substituting the expression for $I_0$ we get
$$
I = I_0\cos^2\frac{\phi}{2}
$$
We now express the phase difference $\phi$ in terms of the path difference $r_2-r_1$. When the path difference is one wavelength, the phase difference is one full cycle ($\phi=2\pi$). When the path difference is $\lambda/2$, $\phi=\pi$, and so on. In general, the phase difference is $2\pi$ times the path difference measured in units of $\lambda$:
$$
\phi = \frac{2\pi}{\lambda}(r_2 - r_1) = k(r_2 - r_1)
$$
Plugging in $r_2-r_1=d\sin\theta$ into the intensity formula we’ll get
$$
I = I_0\cos^2\left(\frac{\pi d}{\lambda}\sin\theta\right)
$$
In the case of the double slit experiment, we approximate $\sin\theta$ with $y/R$, to get
$$
I(y) = I_0\cos^2\left(\frac{\pi d\cdot y}{\lambda R}\right)
$$
So we have now derived a formula for the interference pattern on the screen, giving the intensity $I(y)$ at each location $y$. The function is non-negative, attains its maximum value $I_0$ at $y=m\lambda R/d$, and attains its minimum value $0$ midway between adjacent maxima.
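These properties of the intensity formula are easy to verify numerically (setup values hypothetical):

```python
import math

lam, d, R, I0 = 550e-9, 0.2e-3, 1.0, 1.0   # hypothetical setup

def I(y):
    """Double-slit intensity I(y) = I0 * cos^2(pi*d*y / (lam*R))."""
    return I0 * math.cos(math.pi * d * y / (lam * R)) ** 2

y1 = lam * R / d           # first bright fringe, m = 1
print(I(0.0), I(y1))       # both ~1.0: maxima at y = m*lam*R/d
print(I(y1 / 2))           # ~0.0: dark fringe midway between maxima
```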
Applications of interference
One application of the interference phenomenon is anti-reflective coating. Thin films can be applied to the surfaces of camera, telescope and eyeglass lenses, as well as photovoltaic cells, to reduce reflection and allow more light to pass through. This often gives the lenses distinctive colors such as red or green, which indicate the wavelength of visible light least affected by the anti-reflective properties of the coating.
A nonreflective coating should have an index of refraction intermediate between those of air and glass ($n_{\text{air}}<n_{\text{film}}<n_{\text{glass}}$), and a thickness that is a quarter of some particular wavelength. The two waves reflected from the top and bottom of the film then come out a half-cycle out of phase and cancel each other. Usually, the wavelength is chosen in the central yellow-green portion of the spectrum ($\lambda=550\,\text{nm}$), where the eye is most sensitive. There is then somewhat more reflection at both longer (red) and shorter (blue) wavelengths, giving the reflected light a purple hue.
By contrast, there are situations where reflection is desired, for example reducing heat accumulation or increasing visibility at night. In such situations a reflective coating is useful.
The term diffraction really means the same thing as interference, though interference usually refers to situations with two waves, while diffraction refers to interference of many waves. If the light source, obstacle and screen are close together, it is referred to as Fresnel diffraction. If the light source, obstacle and screen are far apart, so that we can consider all lines from the source to the obstacle to be parallel, and likewise all lines from the obstacle to a given point on the screen to be parallel, then it is referred to as Fraunhofer diffraction. In analyzing diffraction we typically assume Fraunhofer diffraction.
Single slit. A light beam will diffract when passing through a narrow slit with nonzero width $a$. By Huygens’s principle, each point in the slit acts as a wave source, and because the distances from a point $P$ on the screen to different points in the slit differ (slightly), waves arriving at $P$ differ in phase and thus interfere with each other, producing alternating bright and dark fringes. We now derive the formulas for the location and intensity of the pattern.
Let $\theta$ be the angle from the horizontal axis (with origin $O$ at the middle of the slit) to point $P$. Consider the topmost light ray from the slit to $P$. The path difference between this ray and $OP$ is $(a/2)\sin\theta$; if this difference is $\pm\lambda/2$, the two waves cancel each other at $P$:
$$
\frac{a}{2}\sin\theta = \pm\frac{\lambda}{2}
$$
In general, the location for dark fringes are angles $\theta$ where
$$
\sin\theta = \frac{m\lambda}{a} \quad m = \pm 1, \pm 2, \ldots
$$
In the situation of Fraunhofer diffraction, $\theta$ is small, so we can approximate $\sin\theta$ by $\theta$:
$$
\theta = \frac{m\lambda}{a}, \quad m = \pm 1, \pm 2, \ldots
$$
Also using the approximation $\tan\theta=y/x\approx \theta$, the location of the dark fringes in terms of $y$ is
$$
y = x\frac{m\lambda}{a}, \quad m = \pm 1, \pm 2, \ldots
$$
Taylor series of $\sin(x)$ and $\tan(x)$
The Taylor series of a differentiable function $f$ at point $a$ is
$$
f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots
$$
The Taylor series for $\sin(x)$ at $x=0$ is
$$
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots
$$
The Taylor series for $\tan(x)$ at $x=0$ is
$$
\tan(x) = x + \frac{1}{3}x^3 + \frac{2}{15}x^5 + \cdots
$$
So when $x$ is small we can approximate both $\sin(x)$ and $\tan(x)$ by $x$.
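A few lines of code show how good the small-angle approximation actually is (the relative errors grow roughly as $x^2$):

```python
import math

# Relative error of approximating sin(x) and tan(x) by x.
for deg in (1, 5, 10):
    x = math.radians(deg)
    err_sin = (x - math.sin(x)) / math.sin(x)
    err_tan = (math.tan(x) - x) / math.tan(x)
    print(deg, err_sin, err_tan)   # both stay around a percent or less to 10 deg
```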
Denote the amplitude at the center of the screen by $E_0$. It is the superposition of waves emitted from all infinitesimal points in the slit, which all arrive in phase. For another point $P$, the waves will not arrive in phase. We use a phasor diagram to calculate the amplitude of the superimposed waves at $P$. Denote the total phase difference, i.e. the phase difference between the top wave and the bottom wave, by $\beta$.
All the phasors form an arc with length $E_0$. According to the diagram, the norm of the vector sum $E_P$ is
$$
E_P = E_0\frac{\sin(\beta/2)}{\beta/2}
$$
L’Hôpital’s rule
The function $E_P$ is indeterminate at $\beta=0$, but from its graph it is easy to see that it converges to $E_0$ as $\beta\to 0$. We can also use L’Hôpital’s rule to evaluate the limit.
L’Hôpital’s rule. Let $I$ be an open interval containing $c$, and let $f$ and $g$ be differentiable on $I\setminus\{c\}$. If $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$ or $\pm\infty$, and $g'(x)\neq0$ for all $x\in I\setminus\{c\}$, then
$$
\lim_{x\to c}\frac{f(x)}{g(x)} = \lim_{x\to c}\frac{f'(x)}{g'(x)}
$$
provided that the right hand side exists.
Using L’Hôpital’s rule we see that the quotient in $E_P$ indeed converges to $1$ at $\beta=0$.
The intensity is proportional to the square of the amplitude:
$$
I = I_0\left[\frac{\sin(\beta/2)}{\beta/2}\right]^2
$$
where $I_0$ is the intensity at the center. According to our discussion in the previous section about interference, the phase difference $\beta$ is
$$
\beta = \frac{2\pi}{\lambda}a\sin\theta
$$
so we can express intensity as a function of angle $\theta$ and wavelength $\lambda$ as
$$
I = I_0\left\{\frac{\sin[\pi a(\sin\theta)/\lambda]}{\pi a(\sin\theta)/\lambda}\right\}^2
$$
The dark fringes are places where $I=0$, or $\theta$ for which the numerator of $I$ is zero:
$$
\sin\theta = \frac{m\lambda}{a}, \quad m=\pm 1, \pm 2, \ldots
$$
which agrees with our previous result. To get the locations of the maxima other than the central one, we differentiate $I$ and set its derivative to zero. Unfortunately this cannot be solved analytically, and numerical methods must be used. What we see is that intensity decreases drastically along the pattern: even the first side maximum has less than $5$% of the intensity of the central maximum.
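The sub-5% claim can be checked with a brute-force scan of $[\sin(u)/u]^2$ (with $u=\beta/2$) between its first two zeros:

```python
import math

# Scan u = beta/2 over [pi, 2*pi], between the first two zeros of sin(u)/u,
# to locate the first side maximum of the single-slit pattern.
us = [math.pi + k * math.pi / 10000 for k in range(10001)]
peak = max((math.sin(u) / u) ** 2 for u in us)
print(peak)   # ~0.047: under 5% of the central maximum's intensity
```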
Using $\sin\theta\approx\theta$, the first minimum location is
$$
\theta_1 = \frac{\lambda}{a}
$$
This characterizes the width of the central maximum, which is inversely proportional to slit width. When $a$ is very large, $\theta_1$ is so small that we can practically consider all light to be concentrated at the geometrical focus. When $a$ is less than $\lambda$, the central maximum spreads over $180^\circ$, and the fringe pattern is not produced at all.
Below is a plot of $I$ for different widths $a$. We see that when the slit width is equal to or narrower than the wavelength, the light spreads out. Otherwise there is an alternating pattern of dark and bright fringes, and the wider the slit, the narrower and sharper the central peak.
Multiple slits. Consider first a two-slit situation. In the previous section on interference, we derived the intensity function for two slits assuming the slit widths are negligible, a function of the form $\cos^2(x)$ alternating periodically between $I_0$ and $0$. In that case there is no diffraction at each slit, light from each slit spreads uniformly as the left plot above shows, and the interference pattern is a series of equally spaced, equally intense maxima. However, if we assume a finite slit width $a$, then the intensity will be modulated by the diffraction pattern of each slit:
$$
I = I_0\cos^2\frac{\phi}{2}\left[\frac{\sin(\beta/2)}{\beta/2}\right]^2
$$
where
$$
\phi = \frac{2\pi d}{\lambda}\sin\theta,\quad \beta = \frac{2\pi a}{\lambda}\sin\theta
$$
In general, the more slits there are, the taller and narrower the principal maxima, and the more minima there are in between. When there are $N$ slits, there are $(N-1)$ minima between each pair of principal maxima, located where $\phi$ is an integer multiple of $2\pi/N$ (except when $\phi$ is an integer multiple of $2\pi$, which gives a principal maximum). There are secondary intensity maxima between the minima, but they are small compared to the principal ones. The height of each principal maximum is $N^2I_0$, and the width of each maximum is proportional to $1/N$.
Diffraction grating. An array of a large number of parallel slits is called a diffraction grating. It can be, for example, a transparent square the size of a polarizing filter. When it is placed between a laser beam and a screen, its effect is to turn the single dot on the screen into several aligned, well-separated dots. The maxima positions are angles $\theta$ such that
$$
d\sin\theta = m\lambda, \quad m=0, \pm 1, \pm 2, \ldots
$$
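The grating equation can be turned into a small calculation of the visible orders. The specific numbers below (a 600-lines/mm grating and a 632.8 nm He-Ne laser) are example values of my choosing, not from the text.

```python
# Solve d*sin(theta) = m*lambda for every order m with |sin(theta)| <= 1.
import math

def grating_angles(d, wavelength):
    """Return {order m: diffraction angle in degrees} for all visible orders."""
    angles = {}
    m = 0
    while m * wavelength / d <= 1.0:
        angles[m] = math.degrees(math.asin(m * wavelength / d))
        m += 1
    return angles

d = 1e-3 / 600                      # slit spacing in meters (600 lines/mm)
angles = grating_angles(d, 632.8e-9)  # He-Ne laser wavelength
```

For these values only orders $m=0,1,2$ fit on the screen; higher orders would require $\sin\theta>1$.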
Because the maxima are so narrow, $\theta$ and hence $\lambda$ can be measured with high precision. Thus diffraction gratings are used to measure wavelengths of light, a practice called spectroscopy or spectrometry. For example, in astronomy, devices called grating spectrographs are used to determine the chemical composition of stars like our sun, by comparing their spectra with those of different atoms and ions obtained in laboratory experiments. The chromatic resolving power of a grating spectrograph, denoted $R$, is defined as $\lambda/\Delta\lambda$, where $\Delta\lambda$ is the minimum wavelength difference that can be distinguished.
Another application is spectrophotometry in biology. For example, DNA strongly absorbs UV light with a wavelength of 260 nm. We can use a diffraction grating to produce such monochromatic light and measure how much of it is absorbed by a solution of biological molecules. This lets us determine the concentration of DNA in the solution.
Reflection grating refers to the situation where light diffraction (or interference) is caused by reflections from many small, equally spaced ridges and grooves instead of slits. The rainbow-colored reflection from a DVD is such an example.
X-ray diffraction. X-ray diffraction is by far the most important tool for determining the structure of crystals and organic molecules. X-rays have wavelengths of ~0.1 nm, the same order as the diameters and spacings of atoms.
Resolution. The wave nature of light implies that every optical device has a limit of resolution. If two points are very close to each other, light coming from them will interfere significantly and we may not be able to distinguish them. Shorter wavelengths give finer resolution.
The entire theory of special relativity can be derived from two postulates:
Physics laws are the same in all inertial frames of reference.
There is only a single speed of light $c$, the same in all inertial frames of reference.
No matter where the source is and how it moves, when you measure the speed of light, you always get the same result of ~300,000 km/s. This postulate, although somewhat counterintuitive, comes from the observation that light doesn't need a medium to propagate and can travel through vacuum. This observation strongly suggests that light, or electromagnetic waves, are very independent. You can't add or subtract speed from $c$, because Maxwell's wave equations would then become invalid. Once an electromagnetic wave is generated, it propagates through space with constant speed $c$; you can't nudge electric and magnetic fields. From the particle perspective, if we view light as a stream of photons, it means you can't push massless particles like photons.
To maintain constancy of speed of light, time must slow down and length must contract for moving objects.
Suppose a spacecraft is moving toward the right at speed $u$, and suppose the person inside switches on a light bulb when the spacecraft is at point $O$ in space. To the outside observer, from the moment the first light wave is emitted it propagates through space in all directions at speed $c$, as if the motion of the spacecraft were absent. After time $t$ has passed, the spacecraft has traveled a distance $ut$ and the wave has traveled a distance $ct$, so he measures the height of the spacecraft as $\sqrt{c^2-u^2}\,t$. The person inside, however, is unaware of the motion of the spacecraft; he can't distinguish point $O$ from his current location. All he knows is that the light inside always takes time $t_0$ in his reference frame to travel the vertical distance of the spacecraft, so he measures the height of the spacecraft as $ct_0$.
With Newtonian mechanics, the velocity of the light would be the vector sum of its horizontal component and vertical component, namely $\vec{\bm{u}}+\vec{\bm{c}}$. Since the two components are perpendicular, the norm is
$$
\|\vec{\bm{u}}+\vec{\bm{c}}\|^2 = \|\bm{u}\|^2 + \|\bm{c}\|^2 \quad\Rightarrow\quad \|\vec{\bm{u}}+\vec{\bm{c}}\|=\sqrt{u^2+c^2}
$$
so the length of the hypotenuse would be $\sqrt{u^2+c^2}t_0$. However, in special relativity speed of light is always $c$, so the length is equal to $ct$. From elementary trigonometry we can already see that
$$
ct>ct_0 \quad\Rightarrow\quad t > t_0
$$
As mentioned earlier, for them to agree on spacecraft height, we have
$$
\begin{aligned}
&\sqrt{c^2 - u^2}\cdot t = ct_0 \newline\newline
&\Rightarrow t = \frac{c}{\sqrt{c^2 - u^2}}\cdot t_0 \newline\newline
&\Rightarrow t = \frac{1}{\sqrt{1-u^2/c^2}}\cdot t_0 = \gamma\cdot t_0
\end{aligned}
$$
where $\gamma =1/\sqrt{1-u^2/c^2} > 1$ is called the Lorentz factor.
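The time-dilation relation $t=\gamma t_0$ can be sketched as a tiny calculation; the choice of $u=0.6c$ below is my example, not the textbook's.

```python
# Lorentz factor and dilated time, with speed given as beta = u/c.
import math

def lorentz_gamma(beta):
    """Lorentz factor 1/sqrt(1 - beta^2) for beta = u/c."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)

def dilated_time(t0, beta):
    """Time elapsed for the outside observer when t0 elapses on board."""
    return lorentz_gamma(beta) * t0
```

At $u=0.6c$, $\gamma=1.25$: one second on board corresponds to 1.25 seconds for the outside observer.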
Time appears normal inside the spaceship, but to an outside observer, time in the spaceship slows down.
Suppose there's a ruler in the spacecraft with length $x_0$ as measured inside, and let the external measurement be $x$. Inside the spacecraft, the time it takes a light wave to go from the left end to the right end of the ruler is $t_0=x_0/c$. For an outside observer, however, because the spacecraft is moving at speed $u$, by the time the light reaches the right end it has traveled a total distance $x+ut$, which is also equal to the invariant speed of light $c$ times the time $t$ recorded by the outsider:
$$
ct = x + ut.
$$
Rearrange the equation and using the relation $t=\gamma t_0$, we get the expression of $x$ in terms of $x_0$:
$$
\begin{aligned}
t &= \frac{x}{c-u} = \frac{x\cdot c}{c^2-u^2} = \gamma t_0 = \gamma\frac{x_0}{c} \newline\newline
\Rightarrow x &= \gamma\cdot\frac{c^2-u^2}{c^2}x_0 = \gamma\cdot(1-u^2/c^2)x_0=(1/\gamma)x_0
\end{aligned}
$$
Because $1/\gamma < 1$, the length of the ruler would appear to be contracted for an outside observer.
Suppose a reference frame $S'$ is moving to the right at constant speed $u$ relative to $S$. We'd like to know what the coordinate of a point $P=(x,y,z)\in S$ is in $S'$ at time $t$.
The length $x'$ in $S'$ would appear to be $x'/\gamma$ in $S$, so we can represent the $x$-coordinate of $P$ in $S$ as
$$
x = ut + x'/\gamma
$$
So the $x$-coordinate transformation is
$$
x' = \gamma(x - ut).
$$
There’s no transformation in the $y$ and $z$ direction. For time transformation, we use the fact that transformation from $S'$ to $S$ should be identical in form to that from $S$ to $S'$, just the sign of $u$ is reversed. Using the relation $x' = -ut' + x/\gamma$ and the above equation to eliminate $x'$, it can be verified that $t'$ in terms of $t$ is
$$
t' = \gamma(t - ux/c^2)
$$
So the Lorentz transformation is
$$
\begin{dcases}
x' = \gamma(x - ut)\newline
y' = y \newline
z' = z \newline
t' = \gamma(t - ux/c^2)
\end{dcases}
$$
which is a linear transformation of the 4-dimensional spacetime. Taking differential of the $x$ and $t$ transformation,
$$
\begin{aligned}
dx' &= \gamma(dx - udt)\newline
dt' &= \gamma(dt - udx/c^2)
\end{aligned}
$$
we get the velocity transformation
$$
v_x'=\frac{dx'}{dt'} = \frac{dx/dt-u}{1-(u/c^2)(dx/dt)} = \frac{v_x - u}{1-uv_x/c^2}
$$
$v_x=c\Rightarrow v_x'=c$, so the function has a fixed point at $c$. Anything moving with speed $c$ in $S$ also has velocity $c$ measured in $S'$, consistent with constancy of speed of light postulate.
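The fixed point at $c$ is easy to verify numerically. This is a minimal sketch of the velocity transformation, in units where $c=1$ (my convention here, not the text's).

```python
# Relativistic velocity transformation from frame S to frame S',
# where S' moves at speed u relative to S; units with c = 1.
def velocity_in_s_prime(v_x, u, c=1.0):
    """Velocity measured in S' of something moving at v_x in S."""
    return (v_x - u) / (1.0 - u * v_x / c ** 2)
```

Anything moving at $c$ in $S$ still moves at $c$ in $S'$, and for small speeds the formula reduces to the Galilean $v_x-u$.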
The speed of light does not change. However, the frequency of light can change when a light source is moving toward or away from us. Suppose a light source with original frequency $f_0$ is moving toward us at speed $u$. During time $T$ (measured by us), the light waves travel a distance $cT$, but the source also advances a distance $uT$ before emitting the next crest, so to us the wavelength appears shortened to $\lambda = (c - u)T$. The frequency is
$$
f = \frac{c}{(c-u)T}.
$$
Substituting $T=\gamma T_0$ and $T_0 = 1/f_0$ into the equation, we get
$$
f = \sqrt{\frac{c+u}{c-u}}f_0
$$
so the frequency appears greater to us. The difference $f-f_0=\Delta f$ is called the Doppler frequency shift. When the source moves away from us, we substitute $-u$ for $u$ into the equation (so $u$ is always positive) and write
$$
f = \sqrt{\frac{c-u}{c+u}}f_0
$$
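The two Doppler formulas above can be sketched directly; $u=0.6c$ below is my example value.

```python
# Relativistic Doppler shift, with u and c in the same units.
import math

def doppler_approaching(f0, u, c=1.0):
    """Observed frequency when the source moves toward us at speed u."""
    return math.sqrt((c + u) / (c - u)) * f0

def doppler_receding(f0, u, c=1.0):
    """Observed frequency when the source moves away from us at speed u."""
    return math.sqrt((c - u) / (c + u)) * f0
```

At $u=0.6c$ an approaching source is blueshifted to twice its frequency and a receding one is redshifted to half, and the two formulas are reciprocals of each other.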
In Newtonian mechanics, speed is limitless, so the only way for a particle with mass $m$ to possess infinite momentum is to let $v\to\infty$. But in relativity, particle speed is limited by $c$, and we still want a particle's momentum to become infinite as $v\to c$. Otherwise, suppose we defined the momentum of a particle with mass $M$ and speed $c$ as $Mc$. By letting it collide with a smaller particle of mass $m$ at rest and transfer all of its momentum, conservation of momentum would give the smaller particle a speed greater than $c$, contradicting the speed limit. Thus, momentum must go to infinity as $v\to c$, and we must modify the definition of momentum $\bm{p}=m\bm{v}$.
The modified definition of momentum is
$$
\bm{p} = \frac{m\bm{v}}{\sqrt{1-v^2/c^2}} = \gamma m\bm{v}.
$$
Derivation of the exact formula is actually somewhat complicated and is beyond the scope of this article. With the modified momentum formula, the force (in 1d case) becomes
$$
F = \frac{d}{dt}\frac{mv}{\sqrt{1-v^2/c^2}} = \frac{ma}{(1-v^2/c^2)^{3/2}} = \gamma^3 ma
$$
The kinetic energy of a particle moving at speed $v$ is equal to the work done by a force accelerating the particle from $v=0$ at some location $x_1$ to speed $v$ at some location $x_2$:
$$
K = \int_{x_1}^{x_2}Fdx = \int_{x_1}^{x_2} \frac{ma}{(1-v^2/c^2)^{3/2}}dx
$$
Using $a\,dx = (dv/dt)(dx/dt)dt = v\,dv$, the integral becomes
$$
K = \int_0^v\frac{mu\,du}{(1-u^2/c^2)^{3/2}}
$$
We evaluate this integral by a change of variables. Let $y=1-u^2/c^2$ so $u=c(1-y)^{1/2}$ and $du = -\frac{c}{2}(1-y)^{-1/2}dy$. Then $u\,du$ becomes $(-c^2/2)dy$. The integral (omitting $m$) then becomes
$$
-\frac{c^2}{2}\int_1^{1-v^2/c^2}y^{-\frac{3}{2}}dy = \left[\frac{c^2}{\sqrt{y}}\right]_1^{1-v^2/c^2} = \frac{c^2}{\sqrt{1-v^2/c^2}} - c^2
$$
So we get the kinetic energy formula
$$
K = \frac{mc^2}{\sqrt{1-v^2/c^2}} - mc^2 = (\gamma-1)mc^2.
$$
The second term does not involve $v$. If we define the first term as the total energy $E$, the second term as the rest energy, then
$$
\boxed{E = K + mc^2 = \gamma mc^2}
$$
For a particle at rest ($K=0$), we have $E=mc^2$. If the total energy in an isolated system is conserved (the principle of conservation of mass and energy), then when particles in the system undergo interactions (e.g. decay or fission) such that their total rest mass decreases by $\Delta m$, an amount of energy equal to $\Delta mc^2$ is released. This is the working principle behind nuclear bombs.
With
$$
\left(\frac{E}{mc^2}\right)^2 = \frac{1}{1-v^2/c^2}\quad\text{and}\quad\left(\frac{p}{mc}\right)^2=\frac{v^2/c^2}{1-v^2/c^2}
$$
we can subtract the second equation from the first, and get the relation of $E$ with momentum $p$ as
$$
E^2 = (mc^2)^2 + (pc)^2
$$
when $m=0$ (photon) we get
$$
E = pc
$$
Photoelectric effect. Experiments show that light can knock electrons off materials, but the process depends on the frequency of the light. With low-frequency light such as red light, there may be no photoelectric effect no matter how intense the light is. The effect happens only with light above a certain threshold frequency. This phenomenon cannot be explained by treating light as a wave and associating its energy only with the magnitude of the electromagnetic field.
We can set up a circuit to measure the potential, called the stopping potential $V_0$, required to completely stop the current when we shine light onto some metal plate. Experiments show that $V_0$ increases linearly with $f$.
Einstein postulated that light is made of photons and proposed a simple linear relationship between the energy $E$ of an individual photon and the frequency $f$:
$$
E = hf
$$
where $h$ is Planck’s constant that is approximately $6.6\times10^{-34}\mathrm{J\cdot s}$. We then have the equation
$$
eV_0 = hf - \phi
$$
where $\phi$, called the work function, is the minimum energy needed to remove an electron from the material's surface. From the equation $E=pc$, we can also derive the momentum of a photon as
$$
p = \frac{E}{c} = \frac{hf}{c} = \frac{h}{\lambda}
$$
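The stopping-potential relation $eV_0 = hf - \phi$ can be sketched numerically. The 2.30 eV work function below (roughly that of sodium) and the light frequencies are assumed example values, not from the text.

```python
# Stopping potential from e*V0 = h*f - phi, working in electron-volts.
H_EV = 4.1357e-15  # Planck's constant in eV*s

def stopping_potential(frequency, work_function_ev):
    """Stopping potential in volts; 0 if the light is below threshold."""
    return max(0.0, H_EV * frequency - work_function_ev)
```

Green light at $5.5\times10^{14}$ Hz carries about 2.27 eV per photon, below a 2.30 eV work function, so no photoelectrons are ejected regardless of intensity, while higher-frequency light gives a positive stopping potential.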
X-ray production. X-rays were first produced in 1895 by German physicist Wilhelm Röntgen. The production process is called bremsstrahlung (“braking radiation”). Electrons are released by heating the cathode to a very high temperature, then accelerated toward the anode by a potential difference $V_{\mathrm{AC}}$ of a few thousand volts. The abrupt stopping of the electrons forces them to release x-rays with wavelengths ranging from 1 nm to 1 pm. We have the relation
$$
eV_{\mathrm{AC}} = hf_{\max} = \frac{hc}{\lambda_{\min}}
$$
We can use this relation to determine the wavelength of x-rays produced given the voltage $V_{\mathrm{AC}}$.
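A minimal sketch of that relation, using $hc \approx 1239.8$ eV·nm; the 10 kV accelerating voltage is an example value of mine.

```python
# Shortest bremsstrahlung wavelength from e*V_AC = h*c / lambda_min.
HC_EV_NM = 1239.8  # h*c in eV*nm

def lambda_min_nm(voltage):
    """Minimum x-ray wavelength (nm) for an accelerating voltage in volts."""
    return HC_EV_NM / voltage
```

A 10 kV tube produces x-rays down to about 0.124 nm; halving the voltage doubles the minimum wavelength.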
Materials with many electrons per atom tend to be better x-ray absorbers than materials with fewer electrons. Bones contain phosphorus and calcium, with 15 and 20 electrons per atom, while soft tissues are mostly made of hydrogen, carbon and oxygen, with 1, 6, and 8 electrons per atom. So x-rays can be used to image bones in the human body. However, x-rays can cause damage because their energy can break molecular bonds and create free radicals.
Compton Scattering. The Compton scattering experiment further supported the particle model of light. In 1922, American physicist Compton conducted an experiment in which he aimed a beam of x-rays at a solid target and measured the wavelength of the radiation scattered from the target. He discovered that the scattered radiation has smaller frequency (longer wavelength) than the incident radiation and the change in wavelength depends on the scattering angle. He found the following relation:
$$
\lambda' - \lambda = \frac{h}{mc}(1-\cos\phi)
$$
where $\lambda'$ is the scattered wavelength, $\lambda$ is the incident wavelength, $m$ is the electron rest mass, and $\phi$ is the angle of scattering.
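The quantity $h/mc$ is the Compton wavelength of the electron, about 2.43 pm. A minimal numeric sketch of the shift formula, with SI constants:

```python
# Compton shift Delta-lambda = (h / m*c) * (1 - cos(phi)).
import math

H = 6.62607e-34    # Planck's constant, J*s
M_E = 9.10938e-31  # electron rest mass, kg
C = 2.99792e8      # speed of light, m/s

COMPTON_WAVELENGTH = H / (M_E * C)  # about 2.43e-12 m

def compton_shift(phi):
    """Wavelength increase (m) for scattering angle phi in radians."""
    return COMPTON_WAVELENGTH * (1.0 - math.cos(phi))
```

The shift vanishes for forward scattering ($\phi=0$), equals one Compton wavelength at $\phi=90^\circ$, and peaks at two Compton wavelengths for backscattering ($\phi=180^\circ$).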
The relation can be derived using special relativity. Let the photon momentum be $\bm{p}$ with energy $pc$, the scattered photon momentum be $\bm{p'}$ with energy $p'c$, and electron momentum be $\bm{P}_e$ with initial rest energy $mc^2$ and final energy $E_e^2=(mc^2)^2 + (P_ec)^2$ after collision. By conservation of energy
$$
pc + mc^2 = p'c + E_e
$$
rearranging,
$$
(pc - p'c + mc^2)^2 = E_e^2 = (mc^2)^2 + (P_ec)^2
$$
By momentum conservation, $\bm{p} = \bm{p'} + \bm{P}_e$, and taking the square of $\bm{P}_e = \bm{p} - \bm{p'}$ we get
$$
P_e^2 = p^2 + p'^2 - 2pp'\cos\phi
$$
Then substitute this expression for $P_e^2$ into the above equation, we get
$$
\frac{mc}{p'} - \frac{mc}{p} = 1 - \cos\phi
$$
Finally, the result is obtained by substituting $p'=h/\lambda'$ and $p=h/\lambda$ into the equation.
Pair production. Experiments find that when a gamma ray photon is fired at a target, the photon may disappear and generate an electron and a positron. The minimum energy required is the rest energy $2mc^2$ of the electron and positron, $E_{\min} = 2mc^2=1.022\mathrm{MeV}$. The photon’s wavelength has to be shorter than
$$
\lambda_{\max} = \frac{hc}{E_{\min}} = 1.213\ \text{pm}
$$
Otherwise, there would be no pair production. This again shows light energy is proportional to frequency.
Uncertainty principle. Recall that in single-slit diffraction, the narrower the slit, the more the wave spreads out; the wider the slit, the more concentrated it is. The uncertainty principle restates this wave property in the particle model of light. It is as if a narrower slit pins down the position of the photons but makes their momentum (both direction and magnitude) more random, so they spread out in all directions; a wider slit increases the position uncertainty but decreases the momentum uncertainty, so we see a concentrated dot on the screen. Formally, the uncertainty principle is
$$
\Delta x\Delta p_x \geq\hslash/2
$$
where $\Delta x$ is the standard deviation of position, $\Delta p_x$ is the standard deviation of momentum, and $\hslash=h/2\pi$ is Planck's constant divided by $2\pi$. Similarly, there is an inequality for time and energy uncertainty:
$$
\Delta t\Delta E \geq\hslash/2
$$
Given the ubiquity of waves in the universe, it is not surprising that electrons also behave as waves. For an electron moving at non-relativistic speed $v$, the de Broglie wavelength is $\lambda = h/p = h/mv$ and its energy is $E=hf$. If we supply an energy of $eV_{ba}$ to an electron and equate its kinetic energy $K=(1/2)mv^2=p^2/2m$ to this energy supply, we get the wavelength $\lambda$ of the electron in terms of the supplied voltage $V_{ba}$:
$$
eV_{ba}=\frac{p^2}{2m}\quad\Rightarrow\quad p=\sqrt{2meV_{ba}} \quad\Rightarrow\quad \lambda=\frac{h}{p} = \frac{h}{\sqrt{2meV_{ba}}}
$$
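A minimal sketch of this wavelength formula with SI constants. The 54 V value below echoes the Davisson-Germer electron-diffraction experiment; that specific number is my assumption, not stated in the text.

```python
# De Broglie wavelength of an electron accelerated through a voltage:
# lambda = h / sqrt(2 * m * e * V).
import math

H = 6.62607e-34         # Planck's constant, J*s
M_E = 9.10938e-31       # electron mass, kg
E_CHARGE = 1.60218e-19  # elementary charge, C

def electron_wavelength(voltage):
    """De Broglie wavelength (m) for an electron accelerated through V volts."""
    return H / math.sqrt(2.0 * M_E * E_CHARGE * voltage)
```

At 54 V the wavelength comes out near 0.167 nm, comparable to atomic spacings in a crystal, which is why electron diffraction off crystals works; quadrupling the voltage halves the wavelength.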
With this, we can predict the angle of maximum reflection $\theta$ when shooting electrons at crystal surfaces, using the equation
$$
d\sin\theta=m\lambda,\quad m=1,2,3,\ldots
$$
where $d$ is the spacing between atoms in the crystal, which can be measured by x-ray diffraction. The experimental results match this theoretical prediction.
Because electron wavelengths are very small, electrons can be used to make microscopes with resolutions of 0.1 nm to 10 nm, far finer than the ~500 nm of optical microscopes.
Line spectra. Heated gases glow, and if we pass the emitted light through a diffraction grating to separate the different frequencies, we get discrete lines on the spectrum rather than a continuous one. Conversely, if we shine white light (a continuous spectrum) through a cooled gas, the gas absorbs the same frequencies it would emit, forming what's called an absorption line spectrum. In other words, emission line spectrum + absorption line spectrum = continuous spectrum. Line spectra are unique to each atom, which can be used to determine the chemical composition of matter.
Rutherford’s experiment. It was known that atom size is of the order of $10^{-10}$ m and most of an atom's mass is associated with the positive charge. Thomson envisioned a plum pudding model of the atom, in which electrons are uniformly embedded in a sphere of positive charge. If that were the case, then when we shoot particles at a thin metal foil, each particle would experience little net electric force and its trajectory would change little. Rutherford’s scattering experiment proved this false. When alpha particles are shot at a metal foil, most pass through, but some are deflected back at large angles. This suggests that the positive charge is concentrated in a tiny nucleus inside the atom, with vast empty space between the nucleus and the electrons.
What do electrons do? If electrons orbit the nucleus, then by classical physics they will emit electromagnetic waves, and as they give away energy they will spiral toward the nucleus under the electric force of attraction. So why don't negatively charged electrons fall into the positively charged nucleus? A new model is needed.
Bohr postulated that energy levels of atoms are finite, rather than continuous. An atom goes from a lower energy level to a higher energy level by absorbing a photon with frequency $f$ and energy $E=hf$ that matches the difference, and it goes from a higher energy level to a lower energy level by emitting a photon with corresponding frequency $f$. The model mixes arguments from classical physics; it is unrealistic, it cannot explain line spectra beyond hydrogen atoms, and it doesn’t tell why electrons stay at their energy levels and how transition from one energy level to another happens, but it is still worthwhile to look at the physical reasoning of the model.
In the Bohr model, electrons orbit the nucleus in circular motion, but only certain orbits are allowed, and we're going to derive the allowed radii $r_n$. The angular momentum $L_n$ is quantized to be an integer multiple of $h/2\pi$. From classical physics, the magnitude of the angular momentum for a circular orbit is $L_n=mv_nr_n$, and equating the two gives
$$
L_n = mv_nr_n = n\frac{h}{2\pi}
$$
One justification is to imagine electron orbits as oscillating standing waves. The circumference $2\pi r_n$ must be an integer multiple of wavelength $\lambda_n=h/mv_n$, so we have $2\pi r_n=nh/mv_n\Rightarrow mv_nr_n=n(h/2\pi)$. This is one equation for the two variables $r_n$ and $v_n$. Another equation comes from Newton’s law. The centripetal force has magnitude $mv_n^2/r_n$ and we equate this force with the Coulomb force to get the equation
$$
\frac{1}{4\pi\epsilon_0}\frac{e^2}{r_n^2} = \frac{mv_n^2}{r_n}
$$
We solve for $r_n$ and $v_n$ from the two equations and get
$$
\begin{dcases}
r_n &= \epsilon_0\frac{n^2h^2}{\pi m e^2}\newline\newline
v_n &= \frac{1}{\epsilon_0}\frac{e^2}{2nh}
\end{dcases}
$$
We denote $r_1$ by $a_0$. It is the smallest orbit radius, and the others are $r_n=n^2a_0$. Plugging in the constants for the hydrogen atom gives $a_0 = 5.29\times10^{-11}$ m, consistent with the atomic diameter of $10^{-10}$ m ($0.1$ nm) estimated by other methods.
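The two boxed formulas above can be evaluated with SI constants to recover $a_0$ and the orbital speeds; this is a minimal numeric sketch.

```python
# Bohr-model orbit radius and speed for hydrogen:
#   r_n = eps0 * n^2 * h^2 / (pi * m * e^2)
#   v_n = e^2 / (eps0 * 2 * n * h)
import math

EPS0 = 8.85419e-12  # vacuum permittivity, F/m
H = 6.62607e-34     # Planck's constant, J*s
M_E = 9.10938e-31   # electron mass, kg
E = 1.60218e-19     # elementary charge, C

def bohr_radius(n):
    """Radius (m) of the n-th allowed orbit."""
    return EPS0 * n ** 2 * H ** 2 / (math.pi * M_E * E ** 2)

def bohr_speed(n):
    """Electron speed (m/s) in the n-th orbit."""
    return E ** 2 / (EPS0 * 2 * n * H)
```

The ground-state radius comes out near $5.29\times10^{-11}$ m, the radii scale as $n^2$, and the ground-state speed is about $2.2\times10^6$ m/s, less than 1% of $c$, which is why the non-relativistic treatment is adequate.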
The energy level at $r_n$ is equal to the sum of kinetic energy and potential energy. Plug-in the formula for $r_n$ and $v_n$ into
$$
\begin{cases}
K_n &=(1/2)mv_n^2\newline
U_n &= (-1/4\pi\epsilon_0)(e^2/r_n)
\end{cases}
$$
we get
$$
E_n = K_n + U_n = -\frac{hcR}{n^2}, \quad\text{where}\quad R=\frac{me^4}{8\epsilon_0^2h^3c}
$$
This gives us the energy levels, which is inversely proportional to $n^2$. We equate the energy of emitted photon $E=hf=hc/\lambda$ to energy difference during transition from higher energy level $E_{n_2}$ to lower energy level $E_{n_1}$ to get the formula for emitted light wavelength:
$$
\begin{aligned}
\frac{hc}{\lambda}&=E_{n_2} - E_{n_1} = \left(-\frac{hcR}{n_2^2}\right) - \left(-\frac{hcR}{n_1^2}\right) = hcR\left(\frac{1}{n_1^2} - \frac{1}{n_2^2}\right)\newline
\frac{1}{\lambda} &= R\left(\frac{1}{n_1^2} - \frac{1}{n_2^2}\right)
\end{aligned}
$$
When we plug in $n_1=2$ and $n_2=3$ we get $\lambda=656.3$ nm, the red line called the $H_\alpha$ line. When we plug in $n_1=2$ and $n_2=4$ we get $\lambda=486.1$ nm, a cyan line called the $H_\beta$ line, and so on.
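The Rydberg formula is easy to evaluate directly; the numeric value of $R$ below is a standard constant I've supplied, not taken from the text.

```python
# Hydrogen emission wavelengths from 1/lambda = R * (1/n1^2 - 1/n2^2).
RYDBERG = 1.0974e7  # Rydberg constant, 1/m

def hydrogen_line_nm(n1, n2):
    """Wavelength (nm) of the photon emitted in the transition n2 -> n1."""
    inverse_lambda = RYDBERG * (1.0 / n1 ** 2 - 1.0 / n2 ** 2)
    return 1e9 / inverse_lambda
```

This reproduces the $H_\alpha$ and $H_\beta$ lines to within a fraction of a nanometer (a small residual difference remains because the constant above ignores the finite proton mass, the center-of-mass correction mentioned below).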
(Figure: the Balmer series; source: textbook)
These theoretical predictions match experimental data to within $0.1$ percent. The line spectrum emitted from $n>2$ to $n=2$ is called the Balmer series (visible light and ultraviolet); $n>1$ to $n=1$ is the Lyman series (ultraviolet), $n>3$ to $n=3$ is the Paschen series (infrared), $n>4$ to $n=4$ is the Brackett series (infrared), and $n>5$ to $n=5$ is the Pfund series (infrared). There is a large gap between energy levels 1 and 2, but smaller and smaller gaps as $n$ gets larger. That's why ultraviolet light is emitted when the atom drops from higher levels to the ground level, but only infrared light for transitions among higher levels. The prediction can be made more accurate by adding the correction that the electron and proton move around their common center of mass.
An ideal object that absorbs light of all wavelengths and emits light of all wavelengths is called a blackbody. Its emission line spectrum would be a continuous spectrum and its absorption line spectrum would be empty. The sun can be treated as a blackbody, and so can an enclosed box with a small aperture. We can heat the object to temperature $T$, collect the emitted light, use a prism and a lens to separate it into different wavelengths, and measure the intensity at each wavelength with photometers. When we do so, we get something like the following:
(Figure: measured blackbody intensity versus wavelength; source: Britannica)
Our goal is to derive a closed-form formula $I(\lambda)$ from physical principles that fits the data. English physicist Rayleigh's model was
$$
I(\lambda) = \frac{2\pi ckT}{\lambda^4}
$$
The derivation is as follows. Consider a blackbody cube with side length $L$. The detected energies come from standing electromagnetic waves in the box. Wave energies are assumed to be continuous and to follow the Boltzmann distribution: the probability density of finding wave energy $x$ is $e^{-x/k_BT}/S$, where $k_B$ is the Boltzmann constant and $S$ is the normalizing constant. The average energy per wave is the expectation of the energy with respect to this distribution, which evaluates to $k_BT$. The model counts the number of waves at each wavelength and multiplies by the average wave energy $k_BT$ to get the energy at each wavelength.
Now our goal is to count the number of standing waves for any wavelength $\lambda$ in the box in terms of the nodes of its $x$, $y$ and $z$ components.
(Figure: standing waves)
Consider the simple wave $y(x,t)=A\sin(kx-\omega t)$ propagating in the $x$-direction. The standing wave is formed by adding the wave in reverse:
$$
y(x,t) = A\sin(kx-\omega t) + A\sin(kx+\omega t) = 2A\sin(kx)\cos(\omega t).
$$
For a standing wave, its value at the boundary must be zero. We set $y(L,t)=0$ to get the wavelength in terms of the number of nodes:
$$
\sin(kL)=0\quad\Rightarrow\quad kL=n\pi \quad\Rightarrow\quad k=\frac{2\pi}{\lambda}=\frac{n\pi}{L} \quad\Rightarrow\quad \lambda = \frac{2L}{n}, n=1,2,3,\ldots
$$
This shows that the distance between two nodes is $\lambda/2$.
The figure above is an illustration of the situation in 2d. We can see that $\lambda/2 = (\lambda_x/2)\cos\alpha$, that is, $\lambda=\lambda_x\cos\alpha$. Similarly, $\lambda=\lambda_y\cos\beta$ and $\lambda=\lambda_z\cos\gamma$. So
$$
\begin{aligned}
\lambda_x &=\frac{2L}{n_x} \quad\Rightarrow\quad n_x = \frac{2L\cos\alpha}{\lambda} \newline
\lambda_y &=\frac{2L}{n_y} \quad\Rightarrow\quad n_y = \frac{2L\cos\beta}{\lambda} \newline
\lambda_z &=\frac{2L}{n_z} \quad\Rightarrow\quad n_z = \frac{2L\cos\gamma}{\lambda} \newline
\end{aligned}
$$
Taking the square and summing the three equations, and noting that $\cos^2\alpha+\cos^2\beta+\cos^2\gamma=1$, we get
$$
r:=\sqrt{n_x^2+n_y^2+n_z^2} = \frac{2L}{\lambda}
$$
The number of modes with wavelength at least $\lambda$ is the count of all $(n_x,n_y,n_z)$ with $\sqrt{n_x^2+n_y^2+n_z^2}\leq 2L/\lambda$. This number is approximated by the volume of the sphere segment in the positive octant (where all three $n_i$ are positive), which is
$$
\frac{1}{8}\cdot\frac{4}{3}\pi r^3
$$
Then we multiply this number by 2 to account for the two polarization directions of light. The count is now
$$
N = \frac{\pi}{3}r^3 = \frac{8\pi L^3}{3\lambda^3}
$$
Then take the derivative w.r.t. $-\lambda$ and divide out the volume $L^3$ to get the number of waves per wavelength per unit volume:
$$
-\frac{1}{V}\frac{dN}{d\lambda} = \frac{8\pi}{\lambda^4}
$$
Then multiply $k_BT$ to get the energy density formula (Rayleigh–Jeans law)
$$
I(\lambda) = \frac{8\pi k_BT}{\lambda^4}
$$
Note that some sources express the formula as spectral radiance (multiply the energy density by $c/4\pi$, giving $2ck_BT/\lambda^4$) or as radiant exitance (multiply by $c/4$, giving the $2\pi ck_BT/\lambda^4$ form quoted at the start of this derivation). Now the function fits the data when $\lambda$ is large, but as $\lambda\to0$, $I(\lambda)\to\infty$, which does not happen in reality. This is called the ultraviolet catastrophe.
Derivation of average energy $k_BT$
Let $k$ denote the Boltzmann constant. The mathematical expectation for energy $x$ w.r.t. the Boltzmann distribution is
$$
\int_0^\infty xe^{-x/kT}dx / S
$$
where the normalizing constant is
$$
S = \int_0^\infty e^{-x/kT}dx.
$$
First we calculate the normalizing constant:
$$
\begin{aligned}
S &= \int_0^\infty e^{-x/kT}dx \newline
&= (-kT)\int_0^\infty de^{-x/kT} \newline
&= (-kT)\left[e^{-x/kT}\right]_0^\infty \newline
&= kT
\end{aligned}
$$
Then the average energy is found by integration by part:
$$
\begin{aligned}
\int_0^\infty xe^{-x/kT}dx &= (-kT)\int_0^\infty xde^{-x/kT} \newline
&= (-kT)\left\{\left[xe^{-x/kT}\right]_0^\infty - \int_0^\infty e^{-x/kT}dx \right\} \newline
&= (-kT)(-kT) \newline
&= (kT)^2
\end{aligned}
$$
So dividing the integral by $S=kT$ we get the result $kT$.
Planck’s remedy is to treat all wave energies as
$$
0,\varepsilon,2\varepsilon,3\varepsilon,4\varepsilon,\ldots \quad\text{where }\varepsilon=hf=\frac{hc}{\lambda}
$$
and replace the integral in average energy calculation by an infinite sum.
$$
\text{average energy} = \frac{\displaystyle\sum_{n=0}^\infty n\varepsilon e^{-n\varepsilon/kT}}{\displaystyle\sum_{n=0}^\infty e^{-n\varepsilon/kT}}
$$
Denote the normalizing constant by $S$. By the series formula $1+r+r^2+\cdots=1/(1-r)$, it is
$$
S = (1-e^{-\varepsilon/kT})^{-1}
$$
On the other hand, the numerator can be written as
$$
-kT\varepsilon\sum_{n=0}^\infty \left(-\frac{n}{kT}\right)e^{-n\varepsilon/kT} = (-kT\varepsilon)\frac{dS}{d\varepsilon}
$$
So the average energy is calculated as
$$
\begin{aligned}
\text{average energy} &= \frac{(-kT\varepsilon)\cdot dS/d\varepsilon}{S} \newline
&= (1-e^{-\varepsilon/kT})^{-2}e^{-\varepsilon/kT}\varepsilon \bigg/ (1-e^{-\varepsilon/kT})^{-1} \newline
&= (1-e^{-\varepsilon/kT})^{-1}e^{-\varepsilon/kT}\varepsilon \newline
&= \frac{\varepsilon}{e^{\varepsilon/kT}-1}
\end{aligned}
$$
Multiplying the average energy by the number of waves per wavelength per unit volume, we get Planck's law
$$
\boxed{
I(\lambda) = \frac{8\pi hc}{\lambda^5}\frac{1}{e^{hc/\lambda k_BT}-1}
}
$$
Note again that some sources state the Planck’s law in terms of spectral radiance rather than energy density, which differs by a constant of $c/4\pi$ so the numerator will be $2hc^2$ instead of $8\pi hc$.
When the wavelength $\lambda$ is large, we can approximate the exponential term by $e^x\approx 1+x$, and in that case
$$
I(\lambda) \approx \frac{8\pi hc}{\lambda^5}\frac{\lambda k_BT}{hc} = \frac{8\pi k_BT}{\lambda^4}
$$
and we are back to the Rayleigh–Jeans law. As $\lambda\to0$, the exponential fraction goes to 0 faster than $1/\lambda^5$ goes to infinity, so $I(\lambda)\to0$ and the ultraviolet catastrophe is avoided. Planck's law fits the experimental data well. Compared to Rayleigh's model, Planck restricted each wave's energy to integer multiples of $\varepsilon$, which makes the probability of high energies fall off more rapidly. The net effect is the exponential dividing factor in the average energy, which makes the average energy go to $0$ as $\lambda\to0$.
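The two limits can be checked numerically. A minimal sketch in the energy-density form derived above, with SI constants; the temperature and wavelengths are my example values.

```python
# Planck's law vs the Rayleigh-Jeans law (spectral energy density form):
# they agree at long wavelengths and diverge at short ones.
import math

H = 6.62607e-34   # Planck's constant, J*s
C = 2.99792e8     # speed of light, m/s
KB = 1.38065e-23  # Boltzmann constant, J/K

def planck(lam, T):
    """Spectral energy density (J/m^4) from Planck's law."""
    return (8 * math.pi * H * C / lam ** 5) / (math.exp(H * C / (lam * KB * T)) - 1)

def rayleigh_jeans(lam, T):
    """Spectral energy density (J/m^4) from the Rayleigh-Jeans law."""
    return 8 * math.pi * KB * T / lam ** 4
```

At $T=5000$ K, the two agree to better than a percent at $\lambda=1$ mm, while at $\lambda=100$ nm Planck's value is a vanishing fraction of the divergent Rayleigh-Jeans value.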
By taking the derivative of $I(\lambda)$ (in spectral radiance form) and setting it to 0, we get Wien's displacement law, which states that the wavelength of maximum radiance shifts left (becomes smaller) as temperature $T$ increases:
$$
\lambda_{\max} = \frac{hc}{x\cdot k_BT}
$$
where $x$ is the solution to the equation $x=5(1-e^{-x})$ and is approximately $4.965$. On the other hand, by integrating $I(\lambda)$ over all $\lambda$ we get the total radiated energy per unit surface area per unit time, which is proportional to the fourth power of temperature. This is called the Stefan-Boltzmann law:
$$
I = \int_0^\infty I(\lambda)d\lambda = \sigma T^4
$$
where $\sigma=2\pi^5k_B^4/15c^2h^3 = 5.6704\times10^{-8}\mathrm{W/m^2\cdot K^4}$. This law can be used to determine the surface temperature of the sun and other stars.
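Both constants above can be recovered numerically: the Wien constant $x$ by iterating its fixed-point equation, and $\sigma$ by plugging SI constants into the closed form. This is a minimal sketch.

```python
# Wien's constant x from x = 5*(1 - e^(-x)) by fixed-point iteration,
# and the Stefan-Boltzmann constant sigma = 2*pi^5*k^4 / (15*c^2*h^3).
import math

H = 6.62607e-34   # Planck's constant, J*s
C = 2.99792e8     # speed of light, m/s
KB = 1.38065e-23  # Boltzmann constant, J/K

def wien_x(iterations=100):
    """Solve x = 5*(1 - exp(-x)) by repeated substitution, starting at 5."""
    x = 5.0
    for _ in range(iterations):
        x = 5.0 * (1.0 - math.exp(-x))
    return x

SIGMA = 2 * math.pi ** 5 * KB ** 4 / (15 * C ** 2 * H ** 3)
```

The iteration converges quickly because the map's slope at the root, $5e^{-x}\approx0.035$, is far below 1, and the computed $\sigma$ matches the quoted $5.6704\times10^{-8}\ \mathrm{W/m^2\cdot K^4}$.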
For classical waves such as string waves, sound waves and electromagnetic waves, we know what the wave function value represents: displacement, air pressure difference, or EM field magnitude. We then derive, from physical principles, the equations those wave functions must satisfy and try to solve for the functions. Quantum mechanics goes the other direction: it is assumed that a particle (electron, proton, etc.) can be represented by some wave function $\Psi(x, t)$ that satisfies a wave equation, but physicists don't know exactly what the function values represent, and their meaning is still subject to debate.
In quantum mechanics, we assume a particle behaves as a wave with frequency $f$ and wavelength $\lambda$, and its energy and momentum are proportional to frequency
$$
\begin{aligned}
E &= hf = \frac{h}{2\pi}2\pi f = \hslash\omega \newline
p &= \frac{h}{\lambda} = \frac{h}{2\pi}\frac{2\pi}{\lambda} = \hslash k
\end{aligned}
$$
For a free particle (subject to no force) with mass $m$, its energy is equal to its kinetic energy:
$$
E = \frac{1}{2}mv^2 = \frac{m^2v^2}{2m} = \frac{(mv)^2}{2m} = \frac{p^2}{2m}
$$
so the $\omega$ and $k$ must satisfy the relation
$$
\boxed{\hslash\omega = \frac{\hslash^2k^2}{2m}}
$$
We’ll assume the wave function $\Psi(x,t)$ has the form
$$
\Psi(x,t) = A\cos(kx-\omega t) + B\sin(kx - \omega t)
$$
and subject the function to the wave equation. Differentiating the function twice w.r.t. $x$ gives $-k^2$ times the original function. Referring to the relation above, if we multiply the second derivative by $-\hslash^2/2m$ we get
$$
-\frac{\hslash^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2} = \frac{\hslash^2k^2}{2m}\Psi(x,t)
$$
This suggests that the other side of the equation should be $\hslash\omega\Psi(x,t)$. Differentiating w.r.t. $t$ can bring out the factor $\omega$, so a tentative equation is
$$
-\frac{\hslash^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2} = C\hslash\frac{\partial\Psi(x,t)}{\partial t}
$$
Substituting the function form of $\Psi(x,t)$ into the equation, we end up with
$$
A\cos(kx-\omega t) + B\sin(kx-\omega t) = CA\sin(kx-\omega t) - CB\cos(kx-\omega t)
$$
Solving
$$
\begin{dcases}
A = -CB \newline
B = CA
\end{dcases}
$$
we end up with $C^2=-1\Rightarrow C=i$, the imaginary unit. The one-dimensional Schrödinger equation for a free particle is thus
$$
\boxed{-\frac{\hslash^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2} = i\hslash\frac{\partial\Psi(x,t)}{\partial t}}
$$
The solution to the Schrödinger equation is complex-valued. With $B=iA$ the wave function can now be written as
$$
\begin{aligned}
\Psi(x,t) &= A[\cos(kx-\omega t) + i\sin(kx-\omega t)]\newline
&= Ae^{i(kx-\omega t)} = Ae^{ikx}e^{-i\omega t}
\end{aligned}
$$
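The claim that $\Psi=Ae^{i(kx-\omega t)}$ with $\hslash\omega=\hslash^2k^2/2m$ solves the free-particle equation can also be checked numerically. The sketch below (my own illustration, using natural units $\hslash=m=1$) compares the two sides via finite differences:

```python
import cmath

# Natural units: hbar = m = 1, so the free-particle relation is omega = k**2/2.
k = 1.7
omega = k**2 / 2
A = 1.0

def psi(x, t):
    return A * cmath.exp(1j * (k * x - omega * t))

x0, t0, h = 0.3, 0.2, 1e-4

# Central finite differences for the second x-derivative and first t-derivative.
d2psi_dx2 = (psi(x0 + h, t0) - 2 * psi(x0, t0) + psi(x0 - h, t0)) / h**2
dpsi_dt = (psi(x0, t0 + h) - psi(x0, t0 - h)) / (2 * h)

lhs = -0.5 * d2psi_dx2   # -(hbar^2/2m) d^2 Psi / dx^2
rhs = 1j * dpsi_dt       # i hbar dPsi/dt
print(abs(lhs - rhs))    # ≈ 0: both sides agree
```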
What does $\Psi(x,t)$ mean? German physicist Max Born first interpreted $|\Psi|^2$ as the probability density of finding the particle at location $x$ at time $t$. This is analogous to the situation for EM waves, where the intensity $I$, proportional to $E^2$, can be interpreted as the probability distribution of photons at a point. This interpretation requires $\Psi(x,t)$ to be normalized:
$$
\int_{-\infty}^{\infty}|\Psi(x,t)|^2dx = 1
$$
Note that for a complex number $z\in\mathbb{C}$, $|z|^2=z^*z$ where $z^*$ is the complex conjugate. For $\Psi(x,t)$ above,
$$
|\Psi(x,t)|^2=A^*Ae^0=|A|^2
$$
This means the particle is equally likely to be found anywhere on the $x$-axis; in other words, we have no idea where the particle is at any given time. Further, the function can’t be normalized because the integral is infinite. So this simple wave function isn’t realistic. However, a localized wave function can be.
Wave packet. We note that if $\Psi_1=A_1e^{i(k_1x-\omega_1t)}$ and $\Psi_2=A_2e^{i(k_2x-\omega_2t)}$ are two solutions to the Schrödinger equation, then so is their sum. The simple sinusoidal wave has only a single frequency: it is concentrated at a single point in the frequency ($k$) domain, so it spreads out over all space. A wave composed of a range of frequencies, however, becomes concentrated in space. Such a wave is called a wave packet; it is built in analogy with the inverse Fourier transform of the frequency distribution:
$$
\Psi(x, t) = \int_{-\infty}^{\infty} A(k)e^{i(kx-\omega t)}dk
$$
uncertainty principle
A narrow range of $k$ means a narrow range of $p=\hslash k$ and thus small $\Delta p$; the result is a relatively large $\Delta x$, and vice versa. This is the uncertainty principle $\Delta p\Delta x\geq\hslash/2$.
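This tradeoff can be illustrated numerically. The sketch below (my own illustration, natural units $\hslash=1$, and a Gaussian $A(k)$ chosen for convenience) builds $\Psi(x,0)$ by superposing plane waves and checks that $\Delta x\,\Delta k\approx1/2$, i.e. $\Delta x\,\Delta p\approx\hslash/2$:

```python
import numpy as np

# Gaussian spectrum A(k) centered at k0 with width sigma_k (hbar = 1, t = 0).
k0, sigma_k = 5.0, 0.5
k = np.linspace(k0 - 8 * sigma_k, k0 + 8 * sigma_k, 801)
dk_grid = k[1] - k[0]
A = np.exp(-(k - k0)**2 / (2 * sigma_k**2))

# Superpose plane waves: Psi(x, 0) = integral of A(k) e^{ikx} dk (Riemann sum).
x = np.linspace(-15, 15, 2001)
dx_grid = x[1] - x[0]
Psi = (A[None, :] * np.exp(1j * np.outer(x, k))).sum(axis=1) * dk_grid

# Position spread from |Psi|^2, wave-number spread from |A|^2 (p = hbar*k).
P = np.abs(Psi)**2
P /= P.sum() * dx_grid
dx = np.sqrt((x**2 * P).sum() * dx_grid - ((x * P).sum() * dx_grid)**2)
W = np.abs(A)**2
W /= W.sum() * dk_grid
dk = np.sqrt((k**2 * W).sum() * dk_grid - ((k * W).sum() * dk_grid)**2)

print(dx * dk)  # ≈ 0.5: a Gaussian packet saturates the uncertainty bound
```

A narrower $A(k)$ (smaller `sigma_k`) makes `dk` smaller and `dx` correspondingly larger, and vice versa.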
wave speed
Note that the wave speed is $v=\lambda f = \omega/k = \hslash k/2m = h/(2m\lambda)$. It depends on the wavelength $\lambda$: the shorter the wavelength, the faster the speed. This is in contrast with the speed of light, which is constant and doesn’t depend on the wavelength.
Schrödinger equation with potential energy. Particles like electrons and protons are subject to forces, so it’s important to factor potential energy into the Schrödinger equation. In our derivation of the Schrödinger equation for free particles, the left-hand side of the equation equals the kinetic energy times $\Psi(x,t)$ and the right-hand side equals the total energy $E=\hslash\omega$ times $\Psi(x,t)$, so a reasonable guess is
$$
\boxed{-\frac{\hslash^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2} + U(x)\Psi(x,t) = i\hslash\frac{\partial\Psi(x,t)}{\partial t}}
$$
which is the statement $K+U=E$. The real reason we know this equation is correct is that it works: Predictions made with this equation agree with experimental results.
Stationary states. The simple sinusoidal wave function $\Psi(x,t)=Ae^{ikx}e^{-i\omega t}=Ae^{ikx}e^{-iEt/\hslash}$ has a definite energy $E=\hslash\omega$. We can write the wave function for a state of definite energy $E$ as
$$
\Psi(x,t) = \psi(x)e^{-iEt/\hslash}
$$
where $\psi(x)$ is time-independent. It is called a stationary state because the distribution of the particle in space
$$
|\Psi(x,t)|^2=\Psi^*(x,t)\Psi(x,t) = \psi^*(x)\psi(x) = |\psi(x)|^2
$$
does not vary with time. The Schrödinger equation for stationary states becomes
$$
\boxed{-\frac{\hslash^2}{2m}\frac{d^2\psi(x)}{dx^2} + U(x)\psi(x) = E\psi(x)}
$$
Given potential energy $U(x)$, the time-independent Schrödinger equation can be used to determine energy levels and the wave function $\psi(x)$ for various systems. One situation is “particle in a box”, where a particle is trapped in an infinitely deep well and moves along a straight line segment $0\leq x \leq L$. The potential energy function $U(x)$ is zero for $0\leq x \leq L$ and is $+\infty$ beyond this range. We require $\psi$ to be continuous ($\psi'$ is also continuous wherever $U$ is finite). For the Schrödinger equation to hold for $x\notin[0,L]$ where $U(x)=+\infty$, the boundary condition must be $\psi(0)=\psi(L)=0$. Inside the box, the Schrödinger equation becomes
$$
-\frac{\hslash^2}{2m}\frac{d^2\psi(x)}{dx^2} = E\psi(x)
$$
We expect the solution to be a standing wave, a superposition of a wave in one direction plus a wave in the opposite direction. We start with
$$
\psi(x) = A_1e^{ikx} + A_2e^{-ikx}
$$
Written in terms of sine and cosine functions it is $(A_1+A_2)\cos kx + i(A_1-A_2)\sin kx$, and the boundary condition $\psi(0)=A_1+A_2=0$ leads to $A_2=-A_1$, so the function becomes
$$
\psi(x) = 2iA_1\sin kx = C\sin kx
$$
where $C$ is to be determined later by normalization. The boundary condition $\psi(L)=0$ implies $kL=n\pi$, so the possible values of $k$ and $\lambda$ are
$$
k = \frac{n\pi}{L} \quad\text{and}\quad \lambda=\frac{2\pi}{k}=\frac{2L}{n}
$$
From this, we get the energy levels for a particle in a box:
$$
\begin{aligned}
p_n &= \frac{h}{\lambda_n} = \frac{nh}{2L} \newline
E_n &= \frac{p_n^2}{2m} = \frac{n^2h^2}{8mL^2} = \frac{n^2\pi^2\hslash^2}{2mL^2} \quad n=1,2,3,\ldots
\end{aligned}
$$
which is $n^2$ times the ground level. Each energy level corresponds to a wave function
$$
\psi_n(x) = C\sin\frac{n\pi x}{L}
$$
By the normalizing condition
$$
\int_{-\infty}^{\infty}|\psi(x)|^2dx = \int_0^LC^2\sin^2\frac{n\pi x}{L}dx=1
$$
$C$ is $\sqrt{2/L}$, so the normalized stationary-state wave functions for a particle in a box are
$$
\psi_n(x) = \sqrt{\frac{2}{L}}\sin\frac{n\pi x}{L}, \quad n=1,2,3,\ldots
$$
atom box
When applying to real situations, the box can correspond to an atom. For example, for an electron confined in $L=5\times10^{-10}$ m, about the diameter of an atom, its first two energy levels are
$$
\begin{aligned}
E_1 &= h^2/8mL^2 = 1.5\mathrm{eV}\newline
E_2 &= 4E_1 = 6.0\mathrm{eV}
\end{aligned}
$$
with a difference of $4.5\mathrm{eV}$. However, if $L$ is much smaller, e.g. $1.1\times10^{-14}$ m for a proton or neutron, the energy levels are a million times larger:
$$
\begin{aligned}
E_1 &= h^2/8mL^2 = 1.7\times10^6\mathrm{eV} = 1.7\mathrm{MeV}\newline
E_2 &= 4E_1 = 6.8\mathrm{MeV}
\end{aligned}
$$
with a difference of $5.1\mathrm{\ MeV}$. This suggests why nuclear reactions (which involve transitions between energy levels in nuclei) release much more energy than chemical reactions (which involve transitions between energy levels of electrons in atoms).
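Both numbers follow directly from $E_1=h^2/8mL^2$; a short Python check:

```python
# Ground-state energy E1 = h^2/(8 m L^2) for a particle in a box, in eV.
h = 6.626e-34                    # Planck constant, J*s
e = 1.602e-19                    # J per eV
m_e, m_p = 9.109e-31, 1.673e-27  # electron and proton masses, kg

def E1_eV(m, L):
    return h**2 / (8 * m * L**2) / e

print(E1_eV(m_e, 5e-10))    # ≈ 1.5 eV (electron in an atom-sized box)
print(E1_eV(m_p, 1.1e-14))  # ≈ 1.7e6 eV (proton in a nucleus-sized box)
```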
Finally, we can multiply $\psi_n(x)$ by $e^{-iE_nt/\hslash}$ to get the full expression for $\Psi_n(x, t)$:
$$
\Psi_n(x, t) = \sqrt{\frac{2}{L}}\sin\left(\frac{n\pi x}{L}\right)e^{-iE_nt/\hslash},\quad n=1,2,3,\ldots
$$
Potential well. A (finite) potential well is a situation with $U(x)=0$ for $x\in[0,L]$ and $U(x)=U_0$ outside this interval. This can serve as a simple model of an electron within a metallic sheet of thickness $L$. The electron moves freely inside the metal but has to climb a potential energy barrier of height $U_0$ to escape from the surface. When the total energy $E$ is less than $U_0$, the state is called a bound state. Let’s solve the Schrödinger equation for a bound state. When $U=0$, the equation becomes
$$
-\frac{\hslash^2}{2m}\psi''(x) = E\psi(x) \quad\text{or}\quad \psi''(x) = -\frac{2mE}{\hslash^2}\psi(x)
$$
The solution is
$$
\psi(x) = A\cos\left(\frac{\sqrt{2mE}}{\hslash}x\right) + B\sin\left(\frac{\sqrt{2mE}}{\hslash}x\right)
$$
as can be verified by taking the second derivative of $\psi(x)$. On the other hand, when $U=U_0$, the Schrödinger equation becomes
$$
-\frac{\hslash^2}{2m}\psi''(x) + U_0\psi(x) = E\psi(x) \quad\text{or}\quad \psi''(x) = \frac{2m(U_0-E)}{\hslash^2}\psi(x)
$$
Since $U_0-E>0$, the solutions are exponentials. Letting $\kappa=[2m(U_0-E)]^{1/2}/\hslash$, the solution is
$$
\psi(x) = Ce^{\kappa x} + De^{-\kappa x}
$$
Since $\psi(x)$ cannot approach infinity as $x\to\pm\infty$, we must have $D=0$ for $x<0$ and $C=0$ for $x>L$.
The bound state wave function in a finite well is sinusoidal inside the well and exponential outside it. Further, since $U-E$ is finite everywhere, $\psi''(x)$ must also be finite everywhere. This implies $\psi'(x)$ and $\psi(x)$ must be continuous at boundary points so the two parts must join smoothly at $x=0$ and $x=L$. It turns out that this is only possible for certain values of total energy $E$, so this condition determines the possible energy levels of the particle in the well. However, finding the levels is a fairly complicated math problem and numerical approximation must be used.
The exponential tails outside the well imply that there is some probability of finding the particle outside the well, in the classically forbidden region. This penetration underlies the phenomenon of quantum tunneling.
Because the wave function doesn’t go to zero at $x=0$ and $x=L$, the wavelength in the well is longer than it would be in an infinite well. Thus each energy level is lower for a finite well than for an infinite well. How many levels there are depends on $U_0$ in comparison with the ground-level energy of the infinite well, $E_1=\pi^2\hslash^2/2mL^2$. When $U_0$ is much smaller than $E_1$, there’s only one bound state. Finally, note that when $E>U_0$ (a free-particle state), the wave function is sinusoidal both inside and outside the well, any $E>U_0$ is possible, and the energy states form a continuum rather than a discrete set. The wavelength is shorter inside the well than outside, corresponding to greater kinetic energy inside the well than outside it.
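As an illustration of the numerical approach, the sketch below finds the lowest even-parity level of a finite well. It uses natural units $\hslash=m=1$ and the standard matching condition for a well centered at the origin (equivalent to $[0,L]$ by a shift), $z\tan z=\sqrt{z_0^2-z^2}$ with $z=kL/2$ and $z_0=(L/2)\sqrt{2mU_0}/\hslash$; this condition is a standard result, not derived above:

```python
import math

# Lowest even-parity bound state of a finite square well, natural units
# (hbar = m = 1).  Matching the sinusoidal interior to the exponential tails
# gives z*tan(z) = sqrt(z0^2 - z^2), with z = k*L/2 and z0 = (L/2)*sqrt(2*U0).
L, U0 = 2.0, 10.0
z0 = (L / 2) * math.sqrt(2 * U0)

def f(z):
    return z * math.tan(z) - math.sqrt(z0**2 - z**2)

# f goes from -z0 to +infinity on (0, pi/2), so bisection finds the first root.
lo, hi = 1e-9, math.pi / 2 - 1e-9
for _ in range(100):
    mid = (lo + hi) / 2
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
z = (lo + hi) / 2
E = (2 * z / L)**2 / 2  # E = hbar^2 k^2 / (2m) with k = 2z/L

print(E)                        # ground level, below U0
print(math.pi**2 / (2 * L**2))  # infinite-well ground level, for comparison
```

Consistent with the discussion above, the level found lies below the infinite-well ground energy $\pi^2\hslash^2/2mL^2$.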
Potential barrier. A potential barrier is the opposite of a potential well. The potential function is
$$
U(x)=\begin{cases}
U_0 \quad x\in[0, L]\newline
0 \quad x\notin[0, L]
\end{cases}
$$
In this case, the solution is exponential within the barrier and sinusoidal outside it. A particle located on the left of the barrier has some probability of tunneling through it and appearing on the right side.
Harmonic oscillator. Now we solve the Schrödinger equation with the potential function as $U(x)=(1/2)k'x^2$:
$$
-\frac{\hslash^2}{2m}\psi''(x) + \frac{1}{2}k'x^2\psi(x) = E\psi(x)
$$
or
$$
\psi''(x) = \frac{2m}{\hslash^2}\left(\frac{1}{2}k'x^2-E\right)\psi(x)
$$
with the boundary condition that $\psi(x)\to0$ as $|x|\to\infty$. This is the situation where a particle moves under a spring force $F=-k'x$. ($k'$ is used to avoid confusion with the wave number $k=2\pi/\lambda$.) This can model the oscillation of atoms in molecules and crystals. From classical physics, we would expect the solution to be harmonic oscillation with angular frequency $\omega=\sqrt{k'/m}$, with energy levels $\hslash\omega$ apart. As before, the boundary condition determines the possible energy levels the oscillator can take. It turns out that the levels are indeed $\hslash\omega$ apart, but with a minimum level of $E_0=(1/2)\hslash\omega>0$:
$$
E_n = \left(n+\frac{1}{2}\right)\hslash\sqrt{\frac{k'}{m}} = \left(n+\frac{1}{2}\right)\hslash\omega,\quad n=0,1,2,3,\ldots
$$
The solutions to the equation are Hermite functions; each such function is a product of a constant, a polynomial and an exponential $e^{-cx^2}$. The exponential ensures that $\psi(x)\to0$ as $|x|\to\infty$. When $n=0$ the solution is
$$
\psi(x) = Ce^{-\sqrt{mk'}x^2/2\hslash}
$$
where $C$ is chosen so that $\int_{-\infty}^{\infty}|\psi|^2dx=1$.
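We can verify numerically that this $\psi$ satisfies the oscillator equation with $E_0=(1/2)\hslash\omega$. The sketch below (my own check) uses natural units $\hslash=m=k'=1$, so $\omega=1$ and $E_0=1/2$:

```python
import math

# Ground state of the harmonic oscillator in natural units (hbar = m = k' = 1,
# so omega = 1): psi(x) = C*exp(-x^2/2) should satisfy H psi = (1/2) psi.
def psi(x):
    return math.exp(-x**2 / 2)

x0, h = 0.7, 1e-4

# H psi = -(1/2) psi'' + (1/2) x^2 psi, with psi'' from a central difference.
d2 = (psi(x0 + h) - 2 * psi(x0) + psi(x0 - h)) / h**2
H_psi = -0.5 * d2 + 0.5 * x0**2 * psi(x0)

print(H_psi / psi(x0))  # ≈ 0.5 = E0 = (1/2) hbar omega
```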
Hermite functions. source: wikipedia
In classical harmonic oscillation, the amplitude of the oscillation $A$ is given by $E=(1/2)k'A^2$, and the position can never go beyond the amplitude. In quantum mechanics, as we can see from the solution to the ODE, particles can penetrate into the region $|x|>A$.
quantum particles can go beyond classical amplitude $A$ in harmonic oscillation. source: textbook
As the state number $n$ gets larger, the probability distribution $|\psi(x)|^2$ approaches the Newtonian probability distribution.
probability distribution of position $x$ in harmonic oscillation, $|\psi(x)|^2$, along with corresponding probability distribution for Newtonian mechanics. source: textbook
3d Schrödinger equation. To realistically model atoms with Schrödinger equation, we need to solve the equation in 3d. For a particle in a 3d cubical box, the potential function is $U\equiv0$ for $(x,y,z)\in[0,L]^3$ and $+\infty$ otherwise. The time-independent equation is
$$
-\frac{\hslash^2}{2m}\left(\frac{\partial^2\psi}{\partial x^2}+\frac{\partial^2\psi}{\partial y^2}+\frac{\partial^2\psi}{\partial z^2}\right)=E\psi
$$
We solve this by separation of variables. We factor $\psi(x,y,z)$ as
$$
\psi(x,y,z)=X(x)Y(y)Z(z)
$$
and the equation becomes
$$
-\frac{\hslash^2}{2m}\left(\frac{X''(x)}{X(x)}+\frac{Y''(y)}{Y(y)}+\frac{Z''(z)}{Z(z)}\right) = E
$$
Since the right-hand side does not depend on $x,y,z$, neither can the left-hand side. Because each term depends on a different variable, each must individually be a constant: the first term equals a constant $E_X$, the second $E_Y$, and the third $E_Z$, with $E_X+E_Y+E_Z=E$. We are left with three independent ODEs:
$$
\begin{aligned}
-\frac{\hslash^2}{2m}X''(x) &= E_X\cdot X(x)\newline
-\frac{\hslash^2}{2m}Y''(y) &= E_Y\cdot Y(y)\newline
-\frac{\hslash^2}{2m}Z''(z) &= E_Z\cdot Z(z)
\end{aligned}
$$
and so the solutions are
$$
\begin{aligned}
X_{n_X}(x) &= C_X\sin\frac{n_X\pi x}{L},\quad n_X=1,2,3,\ldots \newline
Y_{n_Y}(y) &= C_Y\sin\frac{n_Y\pi y}{L},\quad n_Y=1,2,3,\ldots \newline
Z_{n_Z}(z) &= C_Z\sin\frac{n_Z\pi z}{L},\quad n_Z=1,2,3,\ldots
\end{aligned}
$$
with energy levels
$$
E_i = \frac{n_i^2\pi^2\hslash^2}{2mL^2},\quad i\in\{X,Y,Z\}, n_i=1,2,3,\ldots
$$
The wave function is then the product of the three,
$$
\psi(x,y,z)=C\sin\frac{n_X\pi x}{L}\sin\frac{n_Y\pi y}{L}\sin\frac{n_Z\pi z}{L}
$$
with $C=C_XC_YC_Z$ and energy levels
$$
E_{n_X,n_Y,n_Z} = \frac{(n_X^2+n_Y^2+n_Z^2)\pi^2\hslash^2}{2mL^2}
$$
Multiple sets of quantum numbers $(n_X,n_Y,n_Z)$ can correspond to the same energy; such an energy level is said to be degenerate.
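Degeneracies can be enumerated with a few lines of Python (an illustrative sketch, with energies measured in units of $\pi^2\hslash^2/2mL^2$):

```python
from collections import Counter

# Count degeneracies in a 3D cubical box: energy ∝ nX^2 + nY^2 + nZ^2.
counts = Counter(
    nx**2 + ny**2 + nz**2
    for nx in range(1, 11) for ny in range(1, 11) for nz in range(1, 11)
)

print(counts[3])   # (1,1,1): the ground state is non-degenerate
print(counts[6])   # (1,1,2), (1,2,1), (2,1,1): three-fold degenerate
print(counts[14])  # (1,2,3) and its permutations: six-fold degenerate
```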
Schrödinger equation for hydrogen atom. For the hydrogen atom, which consists of one electron and one proton, we model the electron with the Schrödinger equation, but in spherical coordinates $(r,\theta,\phi)$, since the potential energy in this case depends only on the distance between the electron and the proton:
$$
U(r) = -\frac{1}{4\pi\epsilon_0}\frac{e^2}{r}
$$
The Schrödinger equation is
$$
-\frac{\hslash^2}{2m_r}\nabla^2\psi - \frac{e^2}{4\pi\epsilon_0r}\psi = E\psi
$$
where $m_r$ is the reduced mass of the electron–proton system and $\nabla^2$ is the Laplace operator in spherical coordinates
$$
\nabla^2 = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}
$$
Again we use separation of variables to write the wave function $\psi$ as
$$
\psi(r,\theta,\phi) = R(r)\Theta(\theta)\Phi(\phi)
$$
and arrive at three differential equations, one for each of the three functions, involving the constants $(E_n, l, m_l)$ (equations omitted). The boundary conditions are $R(r)\to0$ as $r\to\infty$, and that $\Theta(\theta)$ and $\Phi(\phi)$ must be finite for all angle values.
The solutions turn out to be as follows: $R(r)$ is a polynomial in $r$ multiplied by an exponential term $e^{-cr}$, $\Theta(\theta)$ is a polynomial containing various powers of $\sin\theta$ and $\cos\theta$, and $\Phi(\phi)$ is a constant times $e^{im_l\phi}$.
The boundary conditions determine possible energy levels, which turn out to be
$$
E_n = -\frac{1}{(4\pi\epsilon_0)^2}\frac{m_re^4}{2n^2\hslash^2} = -\frac{13.60\mathrm{eV}}{n^2}
$$
which is the same as those from the Bohr model. $n$ is called the principal quantum number for energy $E_n$. Further, the boundary condition that $\Theta(\theta)<\infty$ at $\theta=0,\pi$ determines a set of possible angular momentum magnitudes for each $E_n$ as
$$
L = \sqrt{l(l+1)}\hslash, \quad l=0,1,2,\ldots,n-1
$$
$l$ is called the orbital quantum number. The requirement that $\Phi(\phi)=\Phi(\phi+2\pi)$ determines the permitted values of the component of $\bm{L}$ along a given direction, commonly chosen as the $z$-direction:
$$
L_z = m_l\hslash, \quad m_l=0,\pm 1, \pm 2, \ldots, \pm l
$$
$m_l$ is called the magnetic quantum number.
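A small sketch enumerating the allowed $(l, m_l)$ pairs confirms that each $n$ admits $n^2$ combinations (not yet counting spin):

```python
# Allowed quantum numbers for hydrogen: l = 0..n-1 and m_l = -l..l,
# giving sum over l of (2l+1) = n^2 states per principal quantum number n.
def count_states(n):
    return sum(2 * l + 1 for l in range(n))

for n in range(1, 5):
    print(n, count_states(n))  # 1, 4, 9, 16
```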
how states are labeled
Here’s the naming convention for various states. The energy levels $n$ are labeled as integers, and also as shells:

| value of $n$ | label |
| --- | --- |
| $n=1$ | $K$ shell |
| $n=2$ | $L$ shell |
| $n=3$ | $M$ shell |
| $n=4$ | $N$ shell |
The $l$ states are named as follows:

| value of $l$ | label |
| --- | --- |
| $l=0$ | $s$ states |
| $l=1$ | $p$ states |
| $l=2$ | $d$ states |
| $l=3$ | $f$ states |
| $l=4$ | $g$ states |
| $l=5$ | $h$ states |
So a $2p$ state means $n=2$ and $l=1$. Cases of $m_l$ (integers from $-l$ to $l$) are not labeled separately.
The Zeeman effect is the splitting of spectral lines when the atoms are placed in a magnetic field. This shows electrons have magnetic moments, and since magnetic moment is proportional to angular momentum, this implies electrons have (orbital) angular momenta. Further, the split lines are discrete, showing quantization of angular momentum.
magnetic moment and angular momentum
In the classical model, an electron orbiting with radius $r$ and speed $v$ takes $T=2\pi r/v$ for one revolution, so the current (charge per unit time) is $I=ev/2\pi r$, and the magnetic moment is
$$
\mu = IA = \frac{ev}{2\pi r}\pi r^2 = \frac{evr}{2}.
$$
On the other hand, its angular momentum is $L=mvr$, so we have the relation
$$
\mu = \frac{e}{2m}L
$$
In the Bohr model, $L=n\hslash$. For $n=1$ (the ground state), the quantity $e\hslash/2m$ is called the Bohr magneton and is denoted by $\mu_B$:
$$
\mu_B = \frac{e\hslash}{2m} = 5.788\times10^{-5}\mathrm{\ eV/T}=9.274\times10^{-24}\mathrm{\ J/T}
$$
The ratio $\mu/L$ is called the gyromagnetic ratio.
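A quick numerical check of the Bohr magneton from rounded values of the constants:

```python
# Bohr magneton mu_B = e*hbar/(2m) for the electron, in J/T and eV/T.
e = 1.602e-19      # elementary charge, C
hbar = 1.0546e-34  # reduced Planck constant, J*s
m = 9.109e-31      # electron mass, kg

mu_B = e * hbar / (2 * m)
print(mu_B)      # ≈ 9.274e-24 J/T
print(mu_B / e)  # ≈ 5.79e-5 eV/T
```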
It turns out that in the Schrödinger model $\mu/L$ has the same ratio as $e/2m$. If the external magnetic field $\bm{B}$ is directed along the $z$-axis, then we have
$$
\mu_z=-\frac{e}{2m}L_z = -\frac{e}{2m}\cdot (m_l\hslash), \quad m_l=0,\pm 1, \pm 2, \ldots, \pm l.
$$
The orbital magnetic interaction energy is then
$$
U=-\mu_zB = m_l\mu_BB.
$$
However, additional line splitting patterns are observed in the spectra. Another experiment that discovered additional angular momentum is the Stern–Gerlach experiment. In the Stern–Gerlach experiment, silver atoms, which have only 1 electron in their outermost shell and act like single electrons, were fired through a magnetic field. The screen detected a split into upper and lower parts. Because magnetic moment and angular momentum are related by $\mu=(e/2m)L$, this shows that electrons have an additional angular momentum (besides the orbital one) and that this angular momentum is quantized. If there were only orbital angular momentum, the deflections would split the beam into an odd number ($2l+1$) of components, rather than an even number.
This suggests that electrons behave in some way like planets, which have both rotation and revolution motions. The angular momentum that gives rise to this magnetic moment is called the spin angular momentum, denoted by $\bm{S}$. We have never directly observed electron spin; it is only through the manifestation of its magnetic moment that we know such a property exists.
When we measure it in a particular direction, we find
$$
S_z = \pm \frac{1}{2}\hslash
$$
We write $S_z=m_s\hslash$, where $m_s=\pm 1/2$ for electrons and is called the spin magnetic quantum number. The magnetic moment–angular momentum relation becomes
$$
\mu_z = - (2.00232)\frac{e}{2m}S_z
$$
The factor of about $2$ is explained by the relativistic generalization of the Schrödinger equation (the Dirac equation) and by QED, but that is beyond the scope of this post.
hyperfine structure
The various line splittings resulting from magnetic interactions are collectively called fine structure. There are additional, much smaller splittings associated with the fact that the nucleus of the atom has a magnetic dipole moment that interacts with the orbital & spin magnetic dipole moments of the electrons. These effects are called hyperfine structure.
For example, the ground level of hydrogen is split into two states, separated by only $5.9\times10^{-6}\mathrm{eV}$. The photon that is emitted in the transition between these states has a wavelength of $21$ cm. Radio astronomers use this wavelength to map clouds of interstellar hydrogen gas that are too cold to emit visible light.
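A quick check that this splitting energy indeed corresponds to a 21 cm photon:

```python
# Wavelength of the photon emitted in the hydrogen hyperfine transition.
h = 4.136e-15   # Planck constant, eV*s
c = 2.998e8     # speed of light, m/s
E = 5.9e-6      # hyperfine splitting, eV

lam = h * c / E
print(lam)  # ≈ 0.21 m, the 21 cm line of radio astronomy
```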
The four quantum numbers $(n, l, m_l, m_s)$ completely describe the state of an electron in a hydrogen atom. For many-electron atoms, the Pauli exclusion principle states that no two electrons can occupy the same quantum state. So if two electrons reside in the same orbital, with the same $n$, $l$ and $m_l$, their spins must differ: one has $m_s=1/2$ and the other $m_s=-1/2$.
The Pauli exclusion principle explains the observed chemical behavior of the elements. Below is a table that enumerates all possible quantum states for electrons in the first 4 shells. As an atom possesses more and more electrons, those electrons must occupy higher energy states, so they are more likely to be found farther from the nucleus.
| $n$ | $l$ | $m_l$ | spectroscopic notation | # of states | shell |
| --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 1s | 2 | K |
| 2 | 0 | 0 | 2s | 2 | L |
| 2 | 1 | $-1, 0, 1$ | 2p | 6 | L |
| 3 | 0 | 0 | 3s | 2 | M |
| 3 | 1 | $-1, 0, 1$ | 3p | 6 | M |
| 3 | 2 | $-2, -1, 0, 1, 2$ | 3d | 10 | M |
| 4 | 0 | 0 | 4s | 2 | N |
| 4 | 1 | $-1, 0, 1$ | 4p | 6 | N |
| 4 | 2 | $-2, -1, 0, 1, 2$ | 4d | 10 | N |
| 4 | 3 | $-3, -2, -1, 0, 1, 2, 3$ | 4f | 14 | N |
As we can see from the table, the first shell ($K$ shell) can contain a maximum of 2 electrons, the second shell ($L$ shell) can contain a maximum of 2+6=8 electrons, the third shell ($M$ shell) can contain a maximum of 2+6+10=18 electrons, and the fourth shell ($N$ shell) can contain a maximum of 2+6+10+14=32 electrons. From this, we can list elements in a periodic table, and we can understand elements on the basis of electron configurations. Elements in each column of the periodic table have similar properties due to their shared outer-electron configuration.
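A short sketch confirming the shell capacities, which work out to $2n^2$:

```python
# Maximum electrons per shell: each (n, l) subshell holds 2*(2l+1) electrons
# (the factor 2 comes from spin), and l runs from 0 to n-1, so a shell
# holds a total of 2*n^2 electrons.
def shell_capacity(n):
    return sum(2 * (2 * l + 1) for l in range(n))

print([shell_capacity(n) for n in range(1, 5)])  # [2, 8, 18, 32]
```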
X-ray spectra. The shell model of atoms can also explain x-ray emission spectra. When bombarding different substances with electrons accelerated with high voltages, x-rays are emitted and they produce characteristic sharp peaks in the intensity plot. Physicist Moseley found a linear relationship between atomic number $Z$ and the square root of emitted frequency, called the Moseley’s law:
$$
f = (2.48\times10^{15}\text{Hz})(Z-1)^2
$$
The explanation is as follows. First, though outer electrons of an atom are responsible for optical spectra, x-rays are emitted when electrons in the inner shells are knocked out and are filled by electrons falling from one of the outer shells, with energy equal to the energy decrease.
screening
For an atom with atomic number $Z$, we can replace $e$ with $Ze$ for the nuclear charge in the energy level formula, to get
$$
E_n = -\frac{Z^2}{n^2}(13.6\mathrm{eV})
$$
This calculation treats the electron under consideration as isolated, as if it experienced only the force from the nucleus, which is unrealistic. A more realistic approximation is that, for an outer electron, the effective charge acting on it is approximately the net charge of the nucleus and the electrons in the inner shells, which we denote $Z_{\text{eff}}e$. This effect is called screening (shielding). We can write the energy levels as
$$
E_n = -\frac{Z_{\text{eff}}^2}{n^2}(13.6\mathrm{ eV})
$$
A $K_\alpha$ x-ray photon is emitted when an electron in the $L$ shell ($n=2$) drops down to fill a hole in the $K$ shell ($n=1$). Counting the one electron remaining in the $K$ shell, the net effective charge seen by the falling electron is $Z_{\text{eff}}=Z-1$. The energies of the initial ($n=2$) and final ($n=1$) states are respectively
$$
\begin{aligned}
E_{\mathrm{i}} &\approx -\frac{(Z-1)^2}{2^2}(13.6\text{ eV}) = -(Z-1)^2(3.4\text{ eV})\newline
E_{\mathrm{f}} &\approx -\frac{(Z-1)^2}{1^2}(13.6\text{ eV}) = -(Z-1)^2(13.6\text{ eV})
\end{aligned}
$$
The energy of the $K_\alpha$ x-ray photon is the difference $E_{\mathrm{i}}-E_{\mathrm{f}}=(Z-1)^2(10.2\text{ eV})$, and from here we can calculate its frequency as
$$
f = \frac{E}{h} = \frac{(Z-1)^2(10.2\text{ eV})}{4.136\times10^{-15}\mathrm{eV\cdot s}} = (2.47\times10^{15}\text{ Hz})(Z-1)^2
$$
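A quick check of the coefficient, plus a sample prediction for copper ($Z=29$, chosen here just as an illustration):

```python
# The K_alpha photon energy (Z-1)^2 * 10.2 eV converted to frequency.
h = 4.136e-15        # Planck constant, eV*s
coef = 10.2 / h      # Hz, the constant in Moseley's law
f_copper = coef * (29 - 1)**2

print(coef)      # ≈ 2.47e15 Hz
print(f_copper)  # ≈ 1.9e18 Hz, an x-ray frequency
```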
The hole in the $K$ shell may also be filled by an electron falling from the $M$ shell or the $N$ shell, in addition to the $L$ shell, assuming they are occupied. If so, the spectrum shows a series of lines, called the $K$ series, denoted the $K_\alpha$, $K_\beta$ and $K_\gamma$ lines.
The concept of energy levels can help understand formation of molecules, condensed matter structures, and how semiconductors work. Atoms can bond together due to several bonding mechanisms: ionic bond, covalent bond, van der Waals bond, and hydrogen bond. Molecules can vibrate and rotate, giving off molecular spectra. When we draw the energy diagram for a solid, which consists of a large number of identical atoms, we get an energy band diagram.
For insulators, there’s a large gap between the (filled) valence band and the (empty) conduction band. Conductors have a partially filled conduction band. Semiconductors like silicon (Si) and germanium (Ge) have 4 electrons in the outer shell. The gap between the (filled) valence band and the (empty) conduction band is very small, so electrons can easily jump to the conduction band.
We can dope silicon or germanium with a small amount of impurities from the Group V elements (N, P, As, Sb, Bi), called donors, which have 5 electrons in the outer shell. Because of screening, the fifth electron is very loosely bound, and its energy level is very close to the conduction band of the material. The gap is only about $0.01\mathrm{\ eV}$, so the electron very easily jumps to the conduction band and conducts electricity. This is called an $n$-type semiconductor.
On the other hand, doping with Group III elements (B, Al, Ga, In, Tl), called acceptors, which have 3 valence electrons, creates holes in the material that act like positive charges. This is called a $p$-type semiconductor.
When we put an $n$-type semiconductor and a $p$-type semiconductor together, we get a $p$-$n$ junction. At the junction, electrons and holes combine, forming a depletion region. This leaves the $n$ side with a slightly positive charge and the $p$ side with a slightly negative charge, which creates an electric field directed from the $n$ side toward the $p$ side that prevents further diffusion of electrons.
When we connect the $n$-type end to the cathode ($-$) and the $p$-type end to the anode ($+$) of a battery whose voltage can overcome the barrier, the junction conducts electricity like a conductor. If we switch the ends, the current in the reverse direction is much smaller. Thus we say the junction conducts under forward bias.
When electrons on the $n$-type side fall into holes in the valence band of the $p$-type side, which have lower energy levels, they release energy as photons. If the energy released is large enough, visible light is emitted. This is how a light-emitting diode (LED) works.
The reverse effect, a material absorbing photons and creating electron-hole pairs, is called the photovoltaic effect. That is how solar cells work. The $n$ layer is at the top, thin and heavily doped; the $p$ layer is at the bottom, thick and lightly doped. This gives a much thicker depletion region. Light penetrates down to the depletion region and creates electron-hole pairs. The electric field in the depletion region then pushes the holes to the $p$ layer and the electrons to the $n$ layer, and a potential difference builds up. When connected to a circuit, electrons from the $n$ layer flow through the load and then combine with holes in the $p$ layer.
The one-way flow property is reminiscent of the ReLU activation function. This property can be used in power electronics, e.g. for AC-DC conversion (rectification).
Semiconductors are ideal for making switches, the most common being the MOSFET (metal-oxide-semiconductor field-effect transistor). In a MOSFET, two $n$-type regions called the source and the drain are placed at the top left and right of a lightly doped $p$-type silicon substrate. Applying a positive voltage to a gate on top attracts electrons in the $p$-type silicon, effectively opening a channel through which the source and drain conduct electricity.
source: textbook
MOSFETs can be used to make logic gates, e.g. NOT, AND, OR, XOR. An AND gate is two transistors connected in series, while an OR gate is two transistors connected in parallel. These are the building blocks for more complicated logic units like adders and memory.
MOSFETs can be switched on and off very fast. They are widely used in inverters in electric cars to convert the DC current supplied by the battery into the AC current needed by the AC motor. They are also used in pulse width modulation (PWM) to control the brightness of OLED displays by repeatedly switching them on and off with varying time intervals. This can cause significant eye strain and is one of the great unappreciated harms to human health in modern society.
LCD vs OLED displays
Previously, LCD displays were widely used as phone displays. In a liquid crystal display, layers of a crystal array, polarizers and color masks are placed on top of a constantly glowing white backlight. Applying a voltage to a crystal twists the molecules inside and changes the intensity of light passing through, giving a pixel its perceived color. Since the backlight is on all the time and most of it is filtered out, an LCD is not very efficient: it can make the screen hot and consume a lot of battery energy.
In an OLED display, a single layer containing an array of millions of tiny red, green and blue LED droplets is sandwiched between an anode and a cathode layer and is directly responsible for emitting light. This method is more efficient, resulting in lower power consumption and less heat. However, there’s a problem when it comes to adjusting brightness. While the brightness of an LCD can be reduced by simply dimming the backlight, if the voltage supplied to an OLED droplet is too low, the lighting becomes unstable, causing deterioration in color balance. For this reason, phone makers use pulse width modulation (PWM), constantly switching the display on and off in front of your eyes to simulate brightness levels. This is extremely unnatural, and harmful. In particular, iPhones made by Apple use a very low PWM frequency of only 480 Hz. Currently, most people are unaware of the adverse health effects.
A proton or neutron is called a nucleon. The total number of nucleons (protons+neutrons) in an atom is called the nucleon number $A$. It is also called the mass number because it is approximately the mass of the nucleus measured in atomic mass units $\mathrm{u}$.
$$
1 \mathrm{u} = 1.66\times 10^{-27} \mathrm{kg}
$$
The radius $R$ of a nucleus follows the relation
$$
R = R_0A^{1/3}
$$
where $R_0\approx 1.2\times10^{-15}\ \mathrm{m}$. From this, the volume of a nucleus viewed as a sphere is proportional to $A$. Since the mass is also approximately proportional to $A$, all nuclei have approximately the same density.
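As a quick numerical sketch of the constant-density claim, assuming the commonly quoted fitted value $R_0 = 1.2\times10^{-15}$ m:

```python
import math

# Sketch: nuclear radius R = R0 * A**(1/3) and the resulting density.
# R0 = 1.2e-15 m is the commonly quoted fitted constant (an assumption here).
R0 = 1.2e-15   # m
U = 1.66e-27   # atomic mass unit, kg

def nuclear_radius(A):
    """Radius (m) of a nucleus with mass number A."""
    return R0 * A ** (1 / 3)

def nuclear_density(A):
    """Density (kg/m^3): mass ~ A*u divided by the volume of a sphere."""
    volume = 4 / 3 * math.pi * nuclear_radius(A) ** 3
    return A * U / volume

# The A in the mass cancels the A in the volume, so the density is the
# same for every nuclide, about 2.3e17 kg/m^3.
print(nuclear_density(2), nuclear_density(238))
```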
The number of protons in a nucleus is called the atomic number $Z$. The number of neutrons is called the neutron number $N$. We have the relation
$$
A = Z + N
$$
A nucleus with a specific $Z$ and $N$ is called a nuclide. Atoms with the same $Z$ but different $N$ are isotopes.
How MRI works
Nuclear magnetic resonance imaging (MRI) is essentially a mapping of hydrogen atoms (water and fat) in the body. Like electrons, protons and neutrons are also spin-1/2 particles. By placing the body in a strong magnetic field $\bm{B}$, the spins of the hydrogen nuclei become aligned with a component parallel to $\bm{B}$. A brief radio signal causes the spins to flip orientations. Then as the protons realign with the $\bm{B}$ field, they emit radio waves that can be picked up by detectors.
The energy that must be added to separate the nucleons is called the binding energy $E_B$. We use $E_0$ to denote the total rest energy of the separated nucleons, so that the rest energy of the nucleus is $E_0-E_B$ and $E_0 = (E_0 - E_B) + E_B$. By mass-energy equivalence, the mass of the nucleus is smaller than the total mass of the nucleons by an amount $\Delta M=E_B/c^2$, called the mass defect.
The binding energy is defined as
$$
E_B = (ZM_{\mathrm{H}} + Nm_{\mathrm{n}} - \ce{^{A}_{Z}M})c^2
$$
where $M_\mathrm{H}$ is the mass of the hydrogen atom (1 proton + 1 electron), $m_\mathrm{n}$ is the mass of the neutron, and $\ce{^{A}_{Z}M}$ is the mass of the neutral atom containing the nucleus. Using neutral-atom masses on both sides makes the $Z$ electron masses cancel. Below are two examples.
Heavy water. The simplest nucleus is that of hydrogen, a single proton. Then there is the hydrogen isotope $\ce{^2_1 H}$, called deuterium, and its nucleus (1 proton + 1 neutron) is called the deuteron. The binding energy is calculated as
$$
E_B = (1.007825\mathrm{u} + 1.008665\mathrm{u} - 2.014102\mathrm{u})(931.5\text{ MeV/u}) = 2.224 \text{ MeV}
$$
The binding energy per nucleon is
$$
E_B/A = (2.224\text{ MeV}) / 2 = 1.112\text{ MeV}
$$
which is the lowest among all nuclides.
Nickel. By contrast, $\ce{^62_28 Ni}$ has the highest binding energy per nucleon of all nuclides. With $Z=28$, $M_\mathrm{H}=1.007825\mathrm{u}$, $N=A-Z=62-28=34$, $m_\mathrm{n}=1.008665\mathrm{u}$, and $\ce{^{A}_{Z}M}=61.928349\mathrm{u}$, we have $\Delta M=0.585361\mathrm{u}$,
$$
E_B = (0.585361\mathrm{u})(931.5\text{ MeV/u}) = 545.3\text{ MeV}
$$
and
$$
E_B / A = 545.3\text{ MeV} / 62 = 8.795\text{ MeV per nucleon.}
$$
Note that the total mass defect $\Delta M$ is more than half the mass of a single nucleon.
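The two worked examples above can be reproduced numerically; the atomic masses (in u) are the ones quoted in the text:

```python
# Binding energy E_B = (Z*M_H + N*m_n - M_atom) * c^2, with atomic masses in u
# so the Z electron masses cancel; 931.5 MeV/u converts mass defect to energy.
M_H = 1.007825   # mass of a neutral hydrogen atom, u
M_N = 1.008665   # mass of a neutron, u

def binding_energy_MeV(Z, A, M_atom):
    """Total binding energy (MeV) of a neutral atom with mass M_atom in u."""
    delta_M = Z * M_H + (A - Z) * M_N - M_atom
    return delta_M * 931.5

EB_d = binding_energy_MeV(1, 2, 2.014102)      # deuterium
EB_ni = binding_energy_MeV(28, 62, 61.928349)  # nickel-62
print(EB_d, EB_d / 2)     # ~2.224 MeV total, ~1.112 MeV per nucleon (lowest)
print(EB_ni, EB_ni / 62)  # ~545.3 MeV total, ~8.795 MeV per nucleon (highest)
```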
The nuclear force that overcomes the repulsive electric force and holds protons and neutrons together is still not well understood, but we know:
it does not depend on charge;
it has short range;
nearly constant density and nearly constant binding energy per nucleon suggest that a nucleon cannot interact with all other nucleons, but only those few in the vicinity;
nuclear force favors binding of pairs of protons or neutrons with opposite spins.
Figure: binding energy per nucleon as a function of mass number $A$ (source: textbook).
There are two methods to model the binding energy: the liquid-drop model and the shell model.
The liquid-drop model. This model treats nucleons like molecules of a liquid, held together by short-range interactions and surface-tension effects. The total binding energy is modeled as a sum of five terms:
the first term is simply proportional to the total number of nucleons $A$;
nucleons on the surface are less tightly bound than those in the interior, because they have no neighbors outside the surface. Thus a negative term proportional to the surface area $4\pi R^2$ is included; since $R\propto A^{1/3}$, the term is proportional to $-A^{2/3}$.
each of the $Z$ protons repels each of the other $Z-1$ protons. The total repulsive electric potential energy is proportional to $Z(Z-1)$ and inversely proportional to $R\propto A^{1/3}$. Thus a negative term proportional to $-Z(Z-1)/A^{1/3}$ is included.
to model balance between neutron and proton counts, a negative term $-(N-Z)^2/A = -(A-2Z)^2/A$ is included.
to model pairing of protons and neutrons, a term $\pm A^{-4/3}$ is added: positive if both $Z$ and $N$ are even, negative if both are odd, and zero otherwise.
The total binding energy is
$$
E_B = C_1 A - C_2A^{2/3} - C_3\frac{Z(Z-1)}{A^{1/3}} - C_4\frac{(A-2Z)^2}{A} \pm C_5A^{-4/3}
$$
where the coefficients are empirically fitted to observed data:
$$
C_1=15.75,\quad C_2=17.8,\quad C_3=0.71,\quad C_4=23.69,\quad C_5=39 \quad(\text{all in MeV})
$$
Given $E_B$ calculated from the model, one can estimate the mass of any neutral atom $\ce{^A_Z M}$.
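As a sketch, the five-term formula with the quoted coefficients can be evaluated directly; for $\ce{^62_28 Ni}$ it lands within about 1% of the measured 545.3 MeV:

```python
# Liquid-drop (semi-empirical) binding energy; coefficients in MeV from the text.
def liquid_drop_EB(Z, A):
    """Model binding energy (MeV) of a nuclide with Z protons, A nucleons."""
    C1, C2, C3, C4, C5 = 15.75, 17.8, 0.71, 23.69, 39.0
    N = A - Z
    EB = (C1 * A                              # volume term
          - C2 * A ** (2 / 3)                 # surface term
          - C3 * Z * (Z - 1) / A ** (1 / 3)   # Coulomb repulsion
          - C4 * (A - 2 * Z) ** 2 / A)        # neutron-proton imbalance
    # Pairing term: + if Z and N are both even, - if both odd, absent otherwise.
    if Z % 2 == 0 and N % 2 == 0:
        EB += C5 * A ** (-4 / 3)
    elif Z % 2 == 1 and N % 2 == 1:
        EB -= C5 * A ** (-4 / 3)
    return EB

print(liquid_drop_EB(28, 62))  # ~548 MeV vs. the measured 545.3 MeV
```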
The shell model. We could use some reasonable functions to represent the potential energies of the neutrons and protons coming from the nuclear force (as well as electric potential energies for protons), and solve the Schrödinger equation for a proton or neutron moving in such a potential. That is, we treat each nucleon as moving in a potential that represents the averaged-out effect of all other nucleons.
As for electrons, there are shells and subshells corresponding to stable arrangements. We find that when the number of neutrons or the number of protons is 2, 8, 20, 28, 50, 82, or 126, the resulting structure is unusually stable – that is, has an unusually high binding energy. These numbers are called magic numbers. There are also doubly magic nuclides for which both $Z$ and $N$ are magic:
$$
\ce{^4_2 He} \quad \ce{^16_8 O} \quad \ce{^40_20 Ca} \quad \ce{^48_20 Ca} \quad \ce{^208_82 Pb}
$$
The magic numbers correspond to filled-shell or -subshell configurations of nucleon energy levels with a relatively large jump in energy to the next allowed level.
Most of the 2500+ known nuclides are not stable, and they undergo nuclear decays. We can plot all nuclides on a $N$ vs $Z$ chart, called the Segrè chart. The stable nuclides trend upward above the 45-degree line because as the number of protons grows large, more neutrons are needed to counteract the electric repulsion of the protons.
There are 3 types of nuclear radiation: alpha radiation, beta radiation and gamma radiation.
Alpha decay. Alpha decay is emission of alpha particles, which are $\ce{^4_2 He}$ nuclei with 2 protons and 2 neutrons. Alpha particles can travel only a few centimeters in air, and barely penetrate solids.
Beta decay. Beta decay is the emission of electrons or positrons from the nucleus. There are 3 types of beta decay: beta-minus, beta-plus and electron capture. A beta-minus decay is the transformation of a neutron into a proton, an electron, and an antineutrino.
$$
\ce{n -> p + \beta^- + \bar{\nu}_e}
$$
Beta-minus decay usually occurs for nuclides with too many neutrons (i.e. $N/Z$ too large). In $\ce{\beta^-}$ decay, $N$ decreases by 1, $Z$ increases by 1, and $A$ doesn’t change.
A beta-plus decay is the transformation of a proton into a neutron, a positron, and a neutrino.
$$
\ce{p -> n + \beta^+ + \nu_e}
$$
It can occur when $N/Z$ is too small for stability.
An electron capture is the combination of a proton and an electron (usually in the innermost $K$ shell) into a neutron and a neutrino.
$$
\ce{p + \beta^- -> n + \nu_e}
$$
Gamma decay. Like the electrons around it, the nucleus also has different energy levels, including a ground state and excited states. When a nucleus is in an excited state, it can decay to the ground state by emitting one or more photons called gamma rays.
Half-lives. Let $N(t)$ be the number of nuclei in a sample at time $t$. The number of decays in $dt$ is $-dN(t)$, and the decay rate $-dN(t)/dt$ is called the activity of the specimen. The larger $N(t)$ is, the more decays during any time interval. So there is a proportionality between the decay rate and the nuclei count, and the decay is exponential:
$$
-\frac{dN(t)}{dt} = \lambda N(t) \quad\Rightarrow\quad N(t) = N_0e^{-\lambda t}
$$
$\lambda$ is called the decay constant. The half-life $T_{1/2}$ is the time required for the count to go from $N_0$ to $N_0/2$. It is
$$
\frac{1}{2}=e^{-\lambda T_{1/2}} \quad\Rightarrow\quad T_{1/2}=\frac{\ln 2}{\lambda}.
$$
The mean lifetime is the reciprocal of the decay constant:
$$
T_{\text{mean}} = \frac{1}{\lambda} = \frac{T_{1/2}}{\ln 2}
$$
Radioactivity can be used for dating. The carbon isotope $\ce{^14C}$ in plants beta decays to $\ce{^14N}$ with a half-life of 5730 years. $\ce{^40K}$ in some rocks decays to $\ce{^40Ar}$ with a half-life of $1.25\times 10^9$ years. If we know the half-life, then we know $\lambda$. If we know the initial activity and measure the activity now (how many decays per unit time per gram), we can calculate $t$ and get the age of the specimen:
$$
e^{-\lambda t} = \frac{N(t)}{N_0} = \frac{-dN/dt\big|_{t}}{-dN/dt\big|_{t=0}}
$$
$$
t = \frac{\ln(\text{activity now} / \text{activity at }t=0)}{-\lambda}
$$
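The dating formula above can be sketched in a few lines; the check that a 50% activity ratio gives exactly one half-life is a hypothetical sanity test, not measured data:

```python
import math

def age_from_activity(activity_now, activity_initial, half_life):
    """Age of a specimen, in the same time unit as half_life."""
    lam = math.log(2) / half_life             # decay constant lambda
    return math.log(activity_now / activity_initial) / (-lam)

# Sanity check: a C-14 sample at half its initial activity is one
# half-life old, i.e. 5730 years.
print(age_from_activity(0.5, 1.0, 5730))
```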
Units of nuclear radiations. One unit of activity is curie (Ci), which is defined as $3.7\times 10^{10}$ decays per second. This is approximately equal to the activity of 1g of radium. The SI unit is becquerel (Bq), which is simply one decay per second.
$$
\begin{aligned}
1 \text{ Ci} &= 3.7\times 10^{10}\text{ decays/s}\newline
1 \text{ Bq} &= 1\text{ decay/s}
\end{aligned}
$$
The most abundant radioactive nuclide found on earth is the uranium isotope $\ce{^238 U}$. It undergoes a series of 14 decays, including 8 alpha emissions and 6 beta emissions, ending in a stable isotope of lead, $\ce{^206 Pb}$, while $\ce{^235 U}$ ends with $\ce{^207 Pb}$. In the decay chain, a notable decay is radium $\ce{^226 Ra}$ to the inert, colorless and odorless gas radon $\ce{^222 Rn}$. This decay has a half-life of 1600 years, while $\ce{^222 Rn}$ itself has a half-life of 3.82 days. Radon can accumulate in houses and basements due to the presence of radium in soils and building materials. It is a serious health hazard: if inhaled, it emits damaging $\alpha$ particles in your lungs while continuing its decay down the $\ce{^238 U}$ decay series. The average activity per volume inside American homes is estimated to be $1.5\text{ pCi/L}$, which amounts to several thousand decays per second in an average-sized room (note $1\ \mathrm{L}=0.001\ \mathrm{m^3}$).
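The quoted room estimate can be checked directly; the 40 m³ room volume below is an assumption for illustration:

```python
# Decays per second from radon in a room at the quoted 1.5 pCi/L average,
# assuming a hypothetical 40 m^3 room (40,000 L).
CI_TO_BQ = 3.7e10                      # 1 Ci = 3.7e10 decays/s
activity_per_L = 1.5e-12 * CI_TO_BQ    # Bq per liter of air
room_volume_L = 40 * 1000              # 40 m^3 in liters
print(activity_per_L * room_volume_L)  # ~2200 decays per second
```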
Nuclear radiations are ionizing radiations: as they pass through matter, they lose energy, breaking molecular bonds and creating ions. Radiation dosimetry is the description of the effect of radiation on living tissue. The absorbed dose (辐射吸收剂量) of radiation is defined as the energy delivered to the tissue per unit mass. There are two units for absorbed dose, the gray (中文:戈瑞) and the rad:
$$\begin{aligned}
1\text{ Gy} &= 1\text{ J/kg} \newline
1\text{ rad} &= 0.01\text{ J/kg} = 0.01\text{ Gy}
\end{aligned}
$$
Absorbed dose may not be an adequate measure because equal energies of different kinds of radiation can cause different effects. Relative biological effectiveness (RBE) is defined as the ratio of the dose of a reference radiation (200 keV X-rays) to the dose of the test radiation required to produce the same biological effect. It obviously depends on the type of radiation and the biological endpoint being studied (e.g., cell death, DNA damage, or tumor induction).
Thus, RBE can be further decomposed as a product of radiation type factor called the quality factor $Q$ and a weighting factor $W$ for different organs. They are defined and published by the ICRP (International Commission on Radiological Protection) and ICRU (International Commission on Radiation Units and Measurements).
(quality factor) X-rays, gamma rays and electrons all have a factor of $1$. Neutrons have $5\sim20$ depending on energy, and alpha particles receive a factor of $20$.
(weighting factor) Here are some examples:
bone/lung/stomach/breast: 0.12
gonads: 0.08
bladder/liver: 0.04
skin: 0.01
human body: 1
plant: 2 ~ 0.02
fish: 0.75 ~ 0.03
bird: 0.6 ~ 0.15
The (biologically) equivalent dose is energy per kilogram corrected by RBE factors. The SI unit is sievert (Sv) (中文:希沃特/西弗,简称希). An older unit is rem (中文:雷姆):
$$
\begin{aligned}
1\text{ Sv} &= \mathrm{RBE}\times(1\text{ Gy})\newline
1\text{ rem} &= \mathrm{RBE}\times(1\text{ rad})
\end{aligned}
$$
Note that since $1\text{ Gy}=100\text{ rad}$, the units convert as $1\text{ Sv}=100\text{ rem}$.
Geiger counters measure nuclear radiation with the Geiger–Müller tube. The tube is filled with a noble gas, typically He or Ar. The inner wall is coated with a conducting material and serves as the cathode, while the anode is a wire in the center of the tube. A high voltage of around 400 V is applied. When ionizing radiation strikes the tube, it knocks electrons out of the gas molecules. The electrons are collected at the anode, while the positive ions are attracted toward the wall, so a pulse can be detected and registered as a count.
A problem we need to address is the dead time. As the positive ions grab electrons at the cathode and become neutral, they end up in a higher energy state. They return to the ground state by emitting photons, causing further ionization. If nothing were done to counteract this, ionization would be prolonged and could even escalate. The prolonged avalanche would increase the dead time during which new events cannot be detected, and could become continuous and damage the tube. Some form of quenching of the ionization is therefore needed to reduce the dead time. A common way of doing this is to add a quenching compound (usually containing Br or Cl) that absorbs energy and charge from the ionized gas; the ions then return to a neutral ground state by forming compounds instead of emitting photons.
Geiger counters can display either counts or dose rate. To convert counts to dose rate, manufacturers calibrate a conversion factor $G$, in counts per second per microsievert per hour (cps per μSv/h) or counts per minute per microsievert per hour (cpm per μSv/h), usually against a Cs-137 (cesium-137) reference source. The conversion is then $\text{cpm} = G \times (\text{dose rate in μSv/h})$.
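The conversion is just a division by the calibration factor; the value of G below is hypothetical, chosen only to illustrate the arithmetic:

```python
# Convert a Geiger counter's count rate to dose rate using a calibration
# factor G. G = 150.0 cpm per (uSv/h) is a hypothetical value; real tubes
# ship with a factor calibrated against a Cs-137 source.
G = 150.0   # cpm per uSv/h (hypothetical)

def dose_rate_uSv_per_h(cpm):
    """Dose rate in uSv/h from a count rate in counts per minute."""
    return cpm / G

print(dose_rate_uSv_per_h(300))  # 2.0 uSv/h
```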
If particles $A$ and $B$ interact to produce particles $C$ and $D$, the reaction energy is defined as
$$
Q = (M_A+M_B-M_C-M_D)c^2
$$
When $Q>0$, the reaction is called exoergic or exothermic. When $Q<0$, the reaction is called endoergic or endothermic. We calculate this quantity to estimate the energy released, or the minimum energy required for a reaction to occur.
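A minimal sketch of the Q-value computation; as a sanity check, forming a deuteron from a free proton (1.007276 u) and a neutron should release the deuteron binding energy of about 2.224 MeV:

```python
# Reaction energy Q = (sum of initial masses - sum of final masses) * c^2,
# with masses in u and the conversion 931.5 MeV/u.
def reaction_energy_MeV(masses_in, masses_out):
    return (sum(masses_in) - sum(masses_out)) * 931.5

# p + n -> d + gamma: Q equals the deuteron binding energy.
Q = reaction_energy_MeV([1.007276, 1.008665], [2.013553])
print(Q)   # ~2.224 MeV (exoergic)
```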
Neutron activation analysis is a sensitive technique for determining the elemental composition of a sample by bombarding it with neutrons. Stable nuclides absorb the neutrons, become unstable, and undergo $\beta^-$ and $\gamma$ decays, which can help identify the original stable nuclide. Quantities of elements far too small for conventional chemical analysis can be detected in this way.
Nuclear fission is a decay process in which an unstable nucleus splits into two fragments of comparable mass. The following are typical processes:
$$
\ce{^235_92 U + ^1_0n -> ^236_92 U^* -> ^144_56 Ba + ^89_36 Kr + 3 ^1_0n}
$$
$$
\ce{^235_92 U + ^1_0n -> ^236_92 U^* -> ^140_54 Xe + ^94_38 Sr + 2 ^1_0n}
$$
The total kinetic energy of the fission fragments is enormous, about 200 MeV, compared to typical alpha and beta energies of a few MeV. This can be explained by the relatively loose binding of nuclei with large $A$. Referring to the plot of binding energy as a function of $A$, the average binding energy per nucleon is about $7.6$ MeV at $A=240$ and $8.5$ MeV at $A=120$. The binding energy is thus expected to increase by about $8.5-7.6=0.9$ MeV per nucleon, or a total of $235\times0.9\text{ MeV}\approx 200\text{ MeV}$. The increase in binding energy corresponds to a decrease in rest energy, which is converted into the kinetic energy of the fission fragments.
Fission fragments always have too many neutrons to be stable. They usually respond to this surplus of neutrons by undergoing a series of $\beta^-$ decays, until a stable value of $N/Z$ is reached. A typical example is
$$
\ce{^140_54 Xe ->[\beta-] ^140_55 Cs ->[\beta-] ^140_56 Ba ->[\beta-] ^140_57 La ->[\beta-] ^140_58 Ce}
$$
This releases 15 MeV of additional kinetic energy. This poses a serious problem with respect to control and safety of nuclear reactors (discussed below), because heat generation from this process cannot be stopped even after full insertion of control rods. In the event of total loss of cooling water, this power is more than enough to cause a catastrophic meltdown of the reactor core and possible penetration of the containment vessel.
Nuclear reactors. Fission of a uranium nucleus can trigger a chain reaction. Nuclear reactors in nuclear power plants use a controlled nuclear chain reaction to generate energy, by boiling water. Californium-252 is used as a startup neutron source because it undergoes spontaneous fission, emitting many neutrons at a steady rate. If the neutrons move too fast, $\ce{^235 U}$ is unlikely to capture them, so water is used to slow the neutrons down; it is called the moderator. Additionally, control rods made of boron or cadmium are placed among the fuel rods to absorb neutrons and change or stop the reaction rate.
A typical nuclear power plant has an overall efficiency of about $1/3$ and an electric-generating capacity of $1000\text{ MW}$ ($10^9\text{ W}$).
Nuclear fusion. In a nuclear fusion reaction, two small light nuclei fuse together to form a larger nucleus. The following reactions form the proton-proton chain that powers the sun:
$$
\begin{aligned}
\ce{^1_1H + ^1_1H &-> ^2_1H + \beta^+ + \nu_e}\newline
\ce{^2_1H + ^1_1H &-> ^3_2He + \gamma}\newline
\ce{^3_2He + ^3_2He &-> ^4_2He + ^1_1H + ^1_1H}
\end{aligned}
$$
For fusion to occur, two nuclei must come within about $2\times10^{-15}\text{ m}$ of each other. To do so, they must overcome the electrical repulsion of their positive charges, corresponding to a potential energy of about $0.7\text{ MeV}=1.2\times10^{-13}\text{ J}$. Nuclei have this much thermal energy only at extremely high temperatures. Assuming each of the two colliding nuclei carries half the required energy, $E=0.6\times10^{-13}\text{ J}$, the temperature can be calculated as
$$
E = \frac{3}{2}kT \quad\Rightarrow\quad T=\frac{2E}{3k}=\frac{2(0.6\times10^{-13}\text{J})}{3(1.38\times10^{-23}\text{J/K})} = 3\times10^9\text{ K}.
$$
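The temperature estimate above is a one-liner:

```python
# Temperature at which the average thermal kinetic energy (3/2)*k*T reaches
# the ~0.6e-13 J that each of the two colliding nuclei must carry.
k = 1.38e-23    # Boltzmann constant, J/K
E = 0.6e-13     # required kinetic energy per nucleus, J
T = 2 * E / (3 * k)
print(T)        # ~2.9e9 K
```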
It is estimated that each gram of the sun contains about $4.5\times10^{23}$ protons. If all of these protons were fused into helium, it would take the sun $75\times10^9$ years to exhaust its supply of protons. But since the temperature is high enough only deep in the interior, the sun can sustain fusion for a total of only about $10\times10^9$ years; with its current age of $4.54\times10^9$ years, it is about halfway through its life.