What RAW Photographers Need to Understand About ‘Gamma’
It’s possible that a RAW photographer can get by just fine without knowing a thing about gamma, since all gamma-related functionality is taken care of under the hood by the RAW converter, the video card and the display. But it’s a little like exercising without understanding the underlying physiological mechanisms at play. One would realize some or even most of the benefits, but knowing more would help improve technique and maximize potential.
This is not a detailed technical overview of gamma. The main goal is to explain how gamma helps extract the maximum dynamic range out of RAW data. To keep things simple, the discussion is restricted to still photography that is viewed on a modern LCD display.
Cameras capture light in their sensors, measure that light and convert it to a digital electrical signal. This conversion is done with the help of a function called the Opto-Electronic Transfer Function (OETF). It sounds heavy, but really it’s a simple mapping from the particular tone of the captured light to a digital value. When the signal is ready to display, the reverse process happens: the digital signal is converted back to light via, you guessed it, an Electro-Optical Transfer Function (EOTF).
Camera sensors capture light linearly, such that the doubling of the input light causes a doubling of the digital signal. Modern displays are also linear in that a doubling of the digital signal produces a doubling of the output light. In an ideal world then, the OETF & EOTF would be straight lines with a slope of 1. A doubling of incident light on the sensor would end up resulting in a doubling of the display output.
The kind of cameras used by RAW photographers produce digital signals that are 12 to 14 bits wide. What that means is that the camera is capable of capturing 4096 to 16384 distinct tones of light.
1 bit can represent (2) 2 binary levels (0, 1)
2 bits can represent (2x2) 4 binary levels (00, 01, 10, 11)
3 bits can represent (2x2x2) 8 binary levels (000, 001, 010, 011, .., 111)
…
N bits can represent 2^N (2 to the power N) binary levels
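As a quick check of the arithmetic above, here is a minimal Python sketch. The function name tone_count is made up for this illustration and does not come from any camera or RAW software.

```python
def tone_count(bits: int) -> int:
    """Number of distinct levels an N-bit value can represent (2 to the power N)."""
    return 2 ** bits


# Bit depths mentioned in this article: 8-bit display, 12- and 14-bit RAW.
for bits in (1, 2, 3, 8, 12, 14):
    print(f"{bits:>2} bits -> {tone_count(bits):>5} levels")
```

The 12-bit and 14-bit rows give the 4096 and 16384 tones quoted for RAW files, and the 8-bit row gives the 256 tones of a typical display.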
For simplification, let’s assume we’re working with 12-bit RAW / 4096 tones. On most modern displays, this number of tones is more than adequate for maximizing the available dynamic range. In other words, if we have a gradient going from the darkest to the brightest luminance the display can produce, then having 4096 tones will produce a result that looks smooth to the eye, with no visible steps or banding.
The problem is that most modern displays can only process 8-bit data and can therefore only reproduce 256 tones. That means that we have to map the available range of tones (4096) to 256 tones. The simplest way to do this would be to divide the RAW tones into 256 equally sized bins. This would produce 16 RAW tones per display tone (4096 / 256 = 16). In this case, the first sixteen RAW tones (0000_0000_0000, 0000_0000_0001, 0000_0000_0010 .. 0000_0000_1111) would map to the display tone 0000_0000. The next sixteen tones would map to 0000_0001. And so on.
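The equal-sized binning can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular RAW converter works. The names RAW_BITS, DISPLAY_BITS and linear_bin are invented for the example, and the bin size is computed from the two bit depths rather than hard-coded.

```python
RAW_BITS = 12      # assumed RAW bit depth, as in the text
DISPLAY_BITS = 8   # assumed display bit depth, as in the text


def linear_bin(raw_tone: int) -> int:
    """Map a 12-bit RAW tone to an 8-bit display tone using equal-sized bins."""
    if not 0 <= raw_tone < 2 ** RAW_BITS:
        raise ValueError("raw_tone must be in the range 0..4095")
    tones_per_bin = 2 ** (RAW_BITS - DISPLAY_BITS)  # 4096 / 256
    return raw_tone // tones_per_bin


# Print a few RAW tones next to the display tone they land in.
for raw in (0, 15, 16, 31, 32, 4095):
    print(f"{raw:012b} -> {linear_bin(raw):08b}")
```

Every bin has the same width, so the darkest tones get no more display codes than the brightest ones.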
The problem with this mapping is that it would not produce a smooth gradient, and the reason for that has to do with the way the human eye processes light. The human eye does not respond to light in a linear way. Instead, it amplifies light in the darker tones and is therefore able to resolve more detail in this range as compared to bright light. What that means is that an increase in incident light that is far below double its original value is perceived by the eye as if it were doubled.
When we do the simple mapping of 4096 to 256 tones described above, we’re simply not collecting enough samples from the range in which the eye has maximum sensitivity. Therefore, when we display the results of that mapping, the eye sees a gradient that is far from smooth because crucial details are missing.
This is where gamma comes in. Instead of a linear OETF, we take the RAW data and apply a non-linear OETF (the gamma curve) that amplifies the tones much the same way the eye would. The result is that details in the image that originally had tones within the eye’s sensitivity range are now outside that range and occupy some of the brighter tonal ranges. Now, when we do our binning the same as before, details that had been missed with a linear OETF are picked up.
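To make the effect concrete, the sketch below counts how many distinct 8-bit codes the darkest sixteenth of the RAW range receives with and without a gamma curve. It assumes a pure power-law curve with an exponent of 2.2, which is a common approximation and not a value stated in this article. Real encodings such as sRGB use a slightly different piecewise curve. All names are hypothetical.

```python
RAW_MAX = 4095      # largest 12-bit RAW tone
DISPLAY_MAX = 255   # largest 8-bit display tone
GAMMA = 2.2         # assumed exponent, not taken from the article


def linear_encode(raw: int) -> int:
    """Equal-sized bins: 16 RAW tones per display tone."""
    return raw // 16


def gamma_encode(raw: int) -> int:
    """Brighten with a power-law curve, then quantize to 8 bits."""
    normalized = raw / RAW_MAX
    return round(normalized ** (1 / GAMMA) * DISPLAY_MAX)


# Darkest 1/16 of the RAW range (RAW tones 0..255).
shadows = range(0, 256)
linear_codes = {linear_encode(r) for r in shadows}
gamma_codes = {gamma_encode(r) for r in shadows}

print("display codes used for shadows, linear:", len(linear_codes))
print("display codes used for shadows, gamma: ", len(gamma_codes))
```

Under these assumptions the gamma curve spreads the shadow tones over several times as many display codes as equal-sized binning, which is the extra shadow detail described above.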
When the resultant image is presented to the display, inverse gamma (EOTF) is applied, which undoes the brightening introduced by the gamma encoding. The result is that the image returns to its original luminance level but with far more detail in the darks/shadows region.
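The round trip can be sketched as follows, again assuming a simple power-law gamma of 2.2 (an assumption for illustration, not a value given in this article). The function names oetf and eotf are placeholders and not calls from any real library.

```python
GAMMA = 2.2  # assumed exponent, not taken from the article


def oetf(linear_light: float) -> int:
    """Gamma-encode normalized linear light (0.0..1.0) into an 8-bit code."""
    return round(linear_light ** (1 / GAMMA) * 255)


def eotf(code: int) -> float:
    """Inverse gamma: turn an 8-bit code back into normalized linear light."""
    return (code / 255) ** GAMMA


# Scene luminance -> 8-bit code -> displayed luminance.
for light in (0.001, 0.01, 0.18, 0.5, 1.0):
    code = oetf(light)
    print(f"scene {light:.3f} -> code {code:3d} -> display {eotf(code):.4f}")
```

The displayed value lands close to the original scene value in each case. The small differences are the rounding error from squeezing the signal into 8 bits.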
The key takeaway is that when we’re shooting RAW, we’re capturing far more data than the display can process, and gamma is the mechanism by which we make the most judicious use of the subset of that data that will be used for display, by playing to the human eye’s biases. It follows that the final 8-bit version of an image processed/manipulated in RAW will present greater detail than a similarly processed 8-bit image (JPEG) because of the flexibility offered by the excess underlying information. Without gamma, we would not be able to pick and choose, and would therefore lose this flexibility.
The usefulness of shooting at a greater bit-depth than what a typical display can handle is not obvious until we understand gamma. I hope the information presented here can help clear up some of that confusion and motivate more photographers to shoot in RAW so as to maximize the quality and dynamic range of their final images.