Overcoming spatio-angular trade-off in light field acquisition using compressive sensing

ABSTRACT In contrast to conventional cameras which capture a 2D projection of a 3D scene by integrating the angular domain, light field cameras preserve the angular information of individual light rays by capturing a 4D light field of a scene. On the one hand, light field photography enables powerful post-capture capabilities such as refocusing, virtual aperture, depth sensing and perspective shift. On the other hand, it has several drawbacks, namely, high-dimensionality of the captured light fields and a fundamental trade-off between spatial and angular resolution in the camera design. In this paper, we propose a compressive sensing approach to light field acquisition from a sub-Nyquist number of samples. Using an off-the-shelf measurement setup consisting of a digital projector and a Lytro Illum light field camera, we demonstrate the efficiency of the compressive sensing approach by improving the spatial resolution of the acquired light field. This paper presents a proof of concept with a simplified 3D scene as the scene of interest. Results obtained by the proposed method show significant improvement in the spatial resolution of the light field as well as preserved post-capture capabilities.


Introduction
Since the invention of the digital camera in the mid-1970s, much effort has gone into improving digital photography. Most of the research in this field has focused on increasing the spatial resolution and reducing the physical size of imaging devices. Conventional cameras capture two-dimensional projections of the three-dimensional world with high spatial resolution. Even though the spatial information is well preserved, the information contained in the other dimensions of the light field is lost in the acquisition process.
The plenoptic function L(x, y, z, θ, φ, λ, t), introduced in [1], is a 7D function. It represents the radiance of the light rays emitted by a scene and received by an observer at every point (x, y, z) in 3D space, along any direction (θ, φ), at any time instance t and for all wavelengths λ. The plenoptic function therefore contains detailed information about the light field within the observed scene.
The idea of light field photography was introduced in the pioneering work of Lippmann [2]. The main goal of light field photography is to record not only the scalar intensity of the observed scene, but also the angular information of the light arriving at each pixel of the imaging sensor. Angular information enables novel post-capture capabilities such as refocusing, depth sensing, perspective shifts and synthetic apertures. Unfortunately, this comes with an inherent trade-off between the spatial and angular resolution of light field cameras, and as a result the spatial resolution of light field images falls short of the standards of modern digital photography.
In recent years, light field photography has become a popular research area, and multiple methods for light field acquisition and processing have been proposed. Some of the methods focus exclusively on light field acquisition, while others improve different aspects of the captured light fields using advanced post-processing. Light field acquisition using temporal multiplexing is a simple idea in which a static scene is observed sequentially from multiple viewpoints using a conventional camera. In [3], the authors used a camera gantry to acquire multiple two-dimensional images, from which a 4D light field is then reconstructed. In [4], a prototype of a programmable aperture light field camera was proposed; the programmable aperture enables multiplexed light field acquisition using a conventional camera. A compressive light field camera based on a programmable aperture was proposed in [5], where the authors introduced novel optical designs along with improved reconstruction algorithms in order to obtain a light field from a single camera exposure. Spatial multiplexing using camera arrays is another way to acquire light fields. In [6], the authors built an array of 10 × 10 cameras. This setup enabled light field capture with high spatial resolution and demonstrated the ability to see through partial occlusions. However, it needs careful calibration and is too bulky for practical use. Ng [7,8] prototyped a portable light field camera by inserting a micro-lens array (MLA) between the sensor and the main lens, so that each micro-lens captures information on both the intensity and the direction of light rays. This prototype later led to the first consumer-grade light field camera produced by Lytro, followed by an improved camera model named Lytro Illum. 
A major drawback of the MLA-based light field acquisition is the low spatial resolution of the captured images, which is due to the fact that a single imaging sensor is shared to capture both the spatial and the angular information.
There are two main approaches for improving the spatial and angular resolution of the light field. Hybrid light field imaging systems, such as the ones proposed in [9,10], consist of a conventional high-resolution camera and a light field camera. In such imaging systems, the resolution of the individual light field sub-apertures is enhanced using information from the conventional camera image. Recently, deep learning methods for light field super-resolution were proposed [11][12][13]. These methods learn the underlying statistical distribution of a training dataset consisting of different light field examples. After training, such methods are able to 'hallucinate' higher-resolution details in the angular and spatial domains of the light fields. Unlike the previously mentioned hybrid methods, deep learning methods for light field super-resolution are not based on physical measurements.
In this paper, we present a compressive sensing (CS) approach to the light field acquisition. We design and implement an off-the-shelf setup consisting of a digital projector and a Lytro Illum light field camera. The proposed setup enables improvement of the spatial resolution of the captured light field and overcomes the aforementioned spatio-angular trade-off. In contrast with previously mentioned hybrid imaging systems, our measurement system acquires high-resolution images of the observed scene from a sub-Nyquist number of samples. The results presented in this paper show significant improvement in the spatial resolution of the sampled light field as well as preserved post-capture capabilities.
The remainder of this paper is organized as follows: Section 2 describes light field parameterization and acquisition, Section 3 offers a short introduction to the compressive sensing framework, and Section 4 describes our measurement setup along with the acquisition and reconstruction process. Experimental results are presented in Section 5, and conclusions are drawn in Section 6.

Light field parameterization and acquisition
The 7D plenoptic function can be simplified to a 5D function by observing the light field at a single time instance (i.e. omitting t) and by adding a colour filter array on the photo-sensor (i.e. omitting λ). Moreover, by assuming that light rays propagate through free space without occlusions, the plenoptic function can be further simplified to the 4D light field representation LF(u, v, s, t) introduced in [3], where (s, t) denote the spatial and (u, v) the angular dimensions of the light field. In this paper, we use this two-plane parameterization of the light field.
Visualization of the two-plane light field parameterization is shown in Figure 1. The Lytro Illum camera contains a hexagonal micro-lens array placed in front of the imaging sensor. Each micro-lens is positioned at its corresponding coordinate in the (s, t) plane, so the number of lenslets defines the spatial resolution of the captured light field. The imaging sensor is placed at the focal distance of the MLA, and its pixels lie in the (u, v) plane; the number of pixels behind each micro-lens defines the angular resolution of the light field. Figure 2(a) shows an inset from a raw Lytro Illum camera image, with the hexagonal arrangement of micro-lenses visible in the magnified detail. Figure 2(b) shows the tiled light field format, in which each sub-image corresponds to a single sub-aperture of the Lytro camera. Sub-aperture images are constructed by taking the pixel with a specific (u, v) under each micro-lens. Essentially, such an extraction corresponds to holding (u, v) fixed and considering all (s, t) coordinates in the 4D light field representation.

Figure 1. Visualization of the two-plane light field parameterization. The (s, t) coordinate system corresponds to the spatial information of the light field. Each circle corresponds to a single micro-lens, and the number of micro-lenses defines the spatial resolution of the light field. The (u, v) coordinate system corresponds to the angular information of the light field. A single lenslet is shown with its corresponding pixels of the imaging sensor. The number of sensor pixels behind the micro-lens defines the angular resolution of the light field.

Light field imaging enables a wide variety of post-capture capabilities such as digital refocusing, viewpoint change, virtual aperture and depth sensing [7,8,14].
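The sub-aperture extraction described above is a fixed-index slice of the 4D array. A minimal NumPy sketch, assuming a hypothetical in-memory layout `lf[u, v, s, t]` for a single colour channel (decoding toolboxes may order the axes differently):

```python
import numpy as np


def sub_aperture(lf: np.ndarray, u: int, v: int) -> np.ndarray:
    """Return the sub-aperture image at angular coordinate (u, v).

    Assumes the hypothetical layout lf[u, v, s, t]: holding (u, v) fixed and
    keeping every (s, t) gives one view of the scene.
    """
    return lf[u, v, :, :]


def tile_light_field(lf: np.ndarray) -> np.ndarray:
    """Arrange all sub-aperture images in a (U*S) x (V*T) mosaic.

    Tile (u, v) occupies rows u*S:(u+1)*S and columns v*T:(v+1)*T.
    """
    U, V, S, T = lf.shape
    # Reorder axes to (u, s, v, t) so that each reshaped row index is u*S + s
    # and each column index is v*T + t.
    return lf.transpose(0, 2, 1, 3).reshape(U * S, V * T)


# Tiny synthetic light field: 3 x 3 views of 4 x 5 pixels.
lf = np.arange(3 * 3 * 4 * 5, dtype=float).reshape(3, 3, 4, 5)
mosaic = tile_light_field(lf)
print(mosaic.shape)  # (12, 15)
```

Applied to a decoded light field with 15 × 15 views, `tile_light_field` would give a 15 × 15 mosaic of sub-aperture images in the spirit of the tiled format in Figure 2(b).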
To model post-capture manipulation of the light field, the synthetic light field L'(u', v', s', t') was introduced in [7], inspired by the image E(s', t') formed on a synthetic sensor plane:

$$ E(s', t') = \frac{1}{D^2} \iint L'(u', v', s', t')\, A(u', v')\, \cos^4\theta \; du'\, dv', \tag{1} $$

where D is the separation between the sensor and the aperture, A is the aperture function, and θ is the angle of incidence that the ray (u', v', s', t') makes with the film plane [7]. All post-capture manipulations of the light field are essentially methods that numerically approximate this integral. For example, refocusing is conceptually just a summation of shifted versions of the images that form through pinholes over the entire (u, v) aperture.
The main drawback of most light field imaging devices is the aforementioned spatio-angular trade-off. For example, the Lytro Illum camera contains a 40MP imaging sensor, but the spatial resolution of the light field it captures is limited by the number of micro-lenses placed above the imaging sensor and amounts to only around 0.3MP after decoding into the 4D light field format. This resolution is small compared to the spatial resolution of modern imaging sensors.
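The size of the trade-off can be checked with the numbers quoted in this paper (15 × 15 angular samples and a decoded view of 626 × 434 pixels, as reported in the decoding section). This is plain arithmetic, not a model of the sensor; note that the decoded spatial grid need not equal the raw pixel count divided by the number of views, since decoding resamples the raw image:

```python
# Resolution figures quoted in the text.
n_views = 15 * 15            # angular samples (u, v) per decoded light field
n_spatial = 626 * 434        # decoded spatial samples (s, t) per view
n_raw = 40_000_000           # nominal 40MP raw sensor

print(n_views)               # 225
print(n_spatial)             # 271684, i.e. roughly 0.3MP
print(n_raw / n_views)       # about 1.78e5 raw pixels per angular sample
print(n_spatial / n_raw)     # a decoded view holds under 1% of the raw pixel count
```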
In this paper, our goal is to improve the spatial resolution of the light fields while retaining the original angular resolution by using a simple off-the-shelf CS measurement setup and by applying compressive sensing principles. In the following section, we provide a short overview of the CS framework.

Compressive sensing
Compressive sensing is a signal processing framework that consists of a linear measurement process and a nonlinear reconstruction process based on sparse optimization [15][16][17]. Research on various applications of the CS framework [18][19][20][21][22] has been very active in recent years across different scientific areas.
The CS measurement process can be modelled using inner products between the original signal x ∈ ℝ^N and a collection of measurement vectors φ_i:

$$ y_i = \langle x, \varphi_i \rangle, \quad i = 1, \dots, M. $$

In the CS measurement process, only a subset of the N measurements required by the Shannon-Nyquist theorem is acquired. Since M ≪ N, the quotient r = M/N is called the subsampling ratio.
The measurement results y_i can be arranged into an M × 1 vector y, and the measurement vectors φ_i^T can be written as the rows of an M × N measurement matrix Φ. The CS measurement process can then be written in matrix form as:

$$ y = \Phi x = \Phi \Psi^{-1} s = \Theta s, \tag{2} $$

where s = Ψx is the sparse representation of the signal x in a transformation domain Ψ, and Θ = ΦΨ^{-1} is the sensing matrix. The system in Equation (2) is underdetermined, with more unknowns than equations, and consequently has infinitely many solutions. The basic assumption in CS is that the observed signals have a sparse representation (i.e. only a small number of nonzero elements) in a certain transformation domain. This assumption holds, at least approximately, for most natural signals. However, finding the sparsest solution of Equation (2) (i.e. the one with the smallest l_0 norm) is NP-hard. Thus, the l_1 norm is commonly used as a convex relaxation of the l_0 norm, since the l_1 solution can be found using convex optimization algorithms [23].
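The matrix form of the measurement model can be checked numerically. The sketch below assumes an orthonormal DCT-II as Ψ (so that Ψ^{-1} = Ψ^T), a random binary Φ and illustrative sizes N = 64 and M = 24; it verifies that measuring x directly gives the same y as applying the sensing matrix Θ to the coefficients s:

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix Psi, so that s = Psi @ x and x = Psi.T @ s."""
    k = np.arange(n)[:, None]      # frequency index (rows)
    i = np.arange(n)[None, :]      # sample index (columns)
    psi = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    psi[0, :] *= np.sqrt(1.0 / n)  # DC row scaling
    psi[1:, :] *= np.sqrt(2.0 / n) # remaining rows scaling
    return psi


rng = np.random.default_rng(0)
N, M = 64, 24                                  # signal length, number of measurements
Psi = dct_matrix(N)
Phi = rng.choice([0.0, 1.0], size=(M, N))      # random binary measurement matrix
x = rng.random(N)                              # arbitrary test signal
s = Psi @ x                                    # its DCT coefficients
Theta = Phi @ Psi.T                            # sensing matrix Phi Psi^{-1}
y = Phi @ x                                    # M measurements of x

print(np.allclose(y, Theta @ s))               # True
print(M / N)                                   # subsampling ratio r = 0.375
```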
The CS reconstruction process can be modelled in its unconstrained form as:

$$ \hat{s} = \arg\min_{s} \; \frac{1}{2}\|y - \Theta s\|_2^2 + \lambda \|s\|_1, \tag{3} $$

where the left term corresponds to the Euclidean loss, the right term corresponds to the l_1 regularization, and λ > 0 balances the two. For the CS reconstruction to be feasible, the measurement matrix Φ has to be incoherent with the transformation matrix Ψ, meaning that the rows of Φ cannot sparsely represent the columns of Ψ, and vice versa. Random measurement matrices, such as random Gaussian or random binary matrices, are incoherent with any fixed transformation matrix with high probability [16].

Figure 3. Flowchart of the proposed method. First, detailed calibration of the camera-projector system is performed, followed by the CS measurement process. Sparse optimization is applied to the CS measurements, resulting in high-resolution sub-aperture reconstructions. Individual sub-aperture reconstructions are transformed from the projector coordinate system to the camera coordinate system using estimated perspective transformations. Finally, different post-capture effects can be applied to the reconstructed light field, including refocusing, virtual aperture and viewpoint change.
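To make the unconstrained l_1 problem concrete, the following sketch solves it with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is a stand-in chosen for brevity, not the SPAMS solver used in this paper; the problem sizes, the sparsity K = 3, the weight λ = 0.01 and the Gaussian sensing matrix are illustrative assumptions:

```python
import numpy as np


def soft_threshold(z: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def ista(Theta: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 5000) -> np.ndarray:
    """Minimise 0.5 * ||y - Theta s||_2^2 + lam * ||s||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(Theta, 2) ** 2   # 1 / Lipschitz constant of the gradient
    s = np.zeros(Theta.shape[1])
    for _ in range(n_iter):
        grad = Theta.T @ (Theta @ s - y)          # gradient of the quadratic (Euclidean) term
        s = soft_threshold(s - step * grad, step * lam)
    return s


# Illustrative problem: a K-sparse coefficient vector and a Gaussian sensing matrix.
rng = np.random.default_rng(1)
N, M, K = 64, 32, 3
s_true = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
s_true[support] = rng.choice([-1.0, 1.0], size=K) * rng.uniform(1.0, 2.0, size=K)
Theta = rng.standard_normal((M, N)) / np.sqrt(M)
y = Theta @ s_true

s_hat = ista(Theta, y, lam=0.01)
rel_err = np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true)
print(rel_err)
```

A small λ keeps the l_1 bias low for noiseless data; with noisy measurements it would typically be increased.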

Experiments
In this section, we describe the CS measurement setup and a reconstruction algorithm that enables high-resolution light field reconstruction from the CS measurements. The flowchart in Figure 3 shows the steps of the proposed CS light field reconstruction framework.

Measurement setup
The measurement setup proposed in this paper is a generalization of the compressive imaging setup proposed in [24]. It consists of a Lytro Illum light field camera and an InFocus IN3118HD DLP projector, along with mechanical integration elements and a computer used for the CS reconstruction. All of the components are unmodified off-the-shelf components. Measurements are performed in a darkroom to eliminate the influence of ambient illumination. The camera and projector are triggered remotely in order to preserve the geometry of the measurement setup. For simplicity, in our experiments we use a flat board with printed details as the scene of interest. We ensure that the scene has varying depth by carefully positioning the measurement setup: the camera is placed at an arbitrary angle (around 45° in our experiments) with respect to the scene, while the projector is kept perpendicular to it, as depicted in Figure 4.

Camera-projector calibration
Calibration of the camera-projector system is a crucial step that precedes the CS measurement process. During the CS measurement process, the projector is used to modulate the observed scene, while the camera captures the reflected intensities. Any mismatch between the computer-generated measurement patterns and the patterns actually projected directly degrades the reconstruction quality and is modelled as multiplicative noise. The calibration process for the proposed measurement setup closely follows the procedure described in our previous work [24].
Accordingly, the camera focus, ISO and exposure values are manually set and kept fixed during the measurement process. The camera exposure time is set to a multiple of the projector colour-wheel rotation period in order to avoid aliasing effects of the colour wheel. The colour wheel of the IN3118HD projector rotates at 120 Hz, so we set the camera exposure time to 1/60 s, i.e. exactly two rotation periods. Projector manufacturers often reduce the required bandwidth by using chroma subsampling, in which the resolution of the colour channels is decimated while the resolution of the luminance channel is preserved. Undesired intensity variations can occur when projecting patterns with a resolution higher than that of the projector's colour channels. Thus, we project measurement patterns that are 2× downsampled to match the 4:2:2 chroma subsampling of the IN3118HD. Since only binary measurement patterns are used in our experiments, gamma correction of the projector is not necessary.
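The two constraints above can be expressed in a few lines. The sketch below checks that a 1/60 s exposure spans a whole number of 120 Hz colour-wheel periods, and shows one way to produce reduced-resolution binary patterns by pixel replication. The helper `upsample_pattern` is hypothetical, and since the text does not state whether the 2× downsampling is applied horizontally only (as 4:2:2 subsampling would suggest) or in both directions, both factors are parameters:

```python
from fractions import Fraction

import numpy as np

wheel_period = Fraction(1, 120)      # colour wheel at 120 Hz, in seconds
exposure = Fraction(1, 60)           # camera exposure time, in seconds
periods = exposure / wheel_period
print(periods)                       # 2 -> a whole number of rotations


def upsample_pattern(pattern: np.ndarray, fy: int = 2, fx: int = 2) -> np.ndarray:
    """Hypothetical helper: replicate each low-resolution pattern pixel
    into an fy x fx block of identical projector pixels."""
    return np.repeat(np.repeat(pattern, fy, axis=0), fx, axis=1)


low = np.array([[1, 0], [0, 1]], dtype=np.uint8)
up = upsample_pattern(low)           # 4 x 4 projector-resolution pattern
print(up)
```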
CS assumes a linear measurement process. Consequently, we need to ensure that the camera sensor operates in its linear regime. The Lytro Illum camera stores raw sensor data as a 10-bit image in the Lytro Raw Format (LFR), and these raw images are used in our experiments. Additionally, the camera-projector system introduces noise into the CS measurements. The overall noise in the measurement system consists of camera noise (i.e. readout, dark current and photon noise [25]) and projector noise (i.e. stray projector light and background illumination). We estimate the overall noise as proposed in [24] by projecting a sequence of eight 'black' projector images (i.e. zeros in the R, G and B channels) and capturing them with the Lytro camera. The mean 'black' image I_b is calculated by averaging the captured images. Subtracting the mean 'black' image from each measurement image I_c cancels out the deterministic error pattern of the whole measurement system, resulting in the background-corrected camera image Î_c:

$$ \hat{I}_c = I_c - I_b. \tag{4} $$
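The background correction is a per-pixel mean followed by a subtraction. A minimal sketch on a synthetic 2 × 2 example, assuming the captured frames are already available as integer arrays (loading LFR files is out of scope here):

```python
import numpy as np


def mean_black_image(black_frames: np.ndarray) -> np.ndarray:
    """Average a stack of captured 'black' frames with shape (K, H, W)."""
    return black_frames.astype(np.float64).mean(axis=0)


def background_correct(I_c: np.ndarray, I_b: np.ndarray) -> np.ndarray:
    """Background correction I_c_hat = I_c - I_b (left unclipped, so noise may go negative)."""
    return I_c.astype(np.float64) - I_b


# Synthetic example: a fixed offset pattern plus zero-mean jitter over eight frames.
offset = np.array([[10, 20], [30, 40]], dtype=np.int32)
black = np.stack([offset + d for d in (-1, 0, 1, 0, -1, 0, 1, 0)])
signal = np.array([[100, 0], [0, 250]], dtype=np.int32)

I_b = mean_black_image(black)
I_c = offset + signal                  # one measurement image
I_c_hat = background_correct(I_c, I_b)
print(I_c_hat)
```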

Light field decoding
To decode the raw Lytro 2D lenslet image into the 4D light field representation, we closely follow the decoding procedure described in [26] and the Matlab implementation from [27]. The placement of the micro-lens array in a Lytro camera is not known exactly: the array has slight translational and rotational offsets relative to the imaging sensor. To estimate them, the decoding relies on the database of white images, taken through a diffuser, that each Lytro camera contains. Due to the vignetting effect of each micro-lens (see Figure 2(c)), the micro-lens centres appear as the brightest spots in each micro-lens image. After the micro-lens centroids are detected, the grid parameters are estimated by traversing the micro-lens centres using Delaunay triangulation. The rotation of the micro-lens array is estimated by measuring the angle of each individual row and column and calculating the mean angle over the whole lenslet array. Once the rotation is estimated, the mean horizontal and vertical spacing between the micro-lens centroids is calculated. The raw sensor images are then resampled so that the micro-lens centres are aligned with the pixel centres. Additional resampling occurs during the conversion of the hexagonal grid into an orthogonal grid.
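The rotation and spacing estimates can be illustrated on synthetic centroids. This sketch is not the toolbox code from [27]: it assumes the centroids have already been grouped into lenslet rows (the step the Delaunay traversal provides), fits a line to each row, and averages the row angles; the grid pitch and angle are made-up values:

```python
import numpy as np


def estimate_row_rotation(rows: list) -> float:
    """Mean angle of lenslet rows; each row is a (k, 2) array of (x, y) centroids."""
    angles = [np.arctan(np.polyfit(pts[:, 0], pts[:, 1], 1)[0]) for pts in rows]
    return float(np.mean(angles))


def estimate_horizontal_spacing(rows: list, angle: float) -> float:
    """Mean distance between neighbouring centroids after undoing the rotation."""
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])              # rotation by -angle
    gaps = [np.diff(np.sort((pts @ R.T)[:, 0])) for pts in rows]
    return float(np.mean(np.concatenate(gaps)))


# Synthetic hexagonal grid: odd rows offset by half a pitch, then rotated.
a_true, pitch = np.deg2rad(0.3), 14.0            # illustrative values
Rt = np.array([[np.cos(a_true), -np.sin(a_true)], [np.sin(a_true), np.cos(a_true)]])
rows = []
for r in range(6):
    j = np.arange(12, dtype=float)
    pts = np.column_stack([pitch * (j + 0.5 * (r % 2)),
                           np.full(12, r * pitch * np.sqrt(3) / 2)])
    rows.append(pts @ Rt.T)

angle = estimate_row_rotation(rows)
spacing = estimate_horizontal_spacing(rows, angle)
print(np.rad2deg(angle), spacing)
```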
The decoding procedure results in a light field with standard 4D parameterization. Light field obtained using the decoding procedure from [27] has angular resolution of 15 × 15 and spatial resolution of 626 × 434 pixels after the previously mentioned re-sampling steps.

Measurement process
In the experiments, we use a Hadamard measurement matrix [28], shown in Figure 5, since it can be implemented efficiently in our measurement system using binary measurement patterns. As previously mentioned, this simplifies the calibration procedure by eliminating the need for projector gamma correction. For the transformation matrix Ψ, we use the discrete cosine transform (DCT), as our scenes are approximately sparse in the DCT domain.
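A Hadamard matrix for 8 × 8 blocks has 64 rows, each of which can be reshaped into an 8 × 8 pattern. The sketch below builds it with the Sylvester construction and maps entries from ±1 to {0, 1} for binary projection. The random row selection and the ±1 to {0, 1} mapping are assumptions for illustration; the text does not state how either is done. With P = (Φ + 1)/2, the ±1 measurements follow from Φx = 2Px − Σx, where Σx could be obtained from the all-ones first Hadamard row:

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Sylvester-type Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1]], dtype=int)
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


B = 8                                       # block size in projector pixels
H = hadamard(B * B)                         # 64 x 64, entries +1 / -1
rng = np.random.default_rng(0)
M = 24
rows = rng.choice(B * B, size=M, replace=False)   # hypothetical random row selection
Phi = H[rows]                               # M x 64 measurement matrix
patterns = ((Phi + 1) // 2).astype(np.uint8).reshape(M, B, B)   # -1 -> 0, +1 -> 1
print(patterns.shape)                       # (24, 8, 8)
```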
The number of CS measurements M and the dimensionality of the observed signal N define the size of the measurement matrix Φ and the size of the sparse optimization problem that needs to be solved. The row vectors of the measurement matrix define the measurement patterns φ_i to be projected onto the scene (see Figure 5). Note that obtaining the CS measurements and performing sparse optimization for high-dimensional signals is time-consuming and computationally demanding. To overcome this problem, we proposed a phased CS measurement approach in [24]. The basic idea is to divide the original high-dimensional problem into a number of smaller sub-problems that can be measured and reconstructed independently. This approach significantly reduces the computational complexity and makes the CS reconstruction of high-dimensional signals feasible.
In our experiments, the projector image is partitioned into non-overlapping 8 × 8 pixel blocks. To establish per-block correspondences in the camera-projector system, we use multiple sine patterns shifted in phase (i.e. the multiple phase shift (MPS) approach [29,30]). The MPS patterns are projected onto the scene independently of the CS measurement patterns and captured by the Lytro camera. Raw Lytro images of the MPS patterns are first decoded into sub-aperture images, and the correspondences are found for each sub-aperture image independently. Images corresponding to the projected MPS patterns and the images captured by the camera (for a single sub-aperture) are shown in Figure 6(a,b), respectively. After estimating and unwrapping the phase of the captured sine pattern, we find the exact correspondences between the projector and camera pixels by matching the corresponding phase values.
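Phase estimation from shifted sine patterns can be sketched with the standard K-step phase-shifting formula. The pattern period, the number of shifts and the use of simple spatial unwrapping with `np.unwrap` are assumptions for illustration; the MPS method of [29,30] may unwrap differently (e.g. with several frequencies):

```python
import numpy as np


def wrapped_phase(frames: np.ndarray) -> np.ndarray:
    """Wrapped phase from K >= 3 frames I_k = A + B*cos(phi + 2*pi*k/K), shape (K, H, W)."""
    K = frames.shape[0]
    delta = 2 * np.pi * np.arange(K) / K
    num = -np.tensordot(np.sin(delta), frames, axes=1)   # equals -(K*B/2) * sin(phi)
    den = np.tensordot(np.cos(delta), frames, axes=1)    # equals  (K*B/2) * cos(phi)
    return np.arctan2(num, den)                          # phi wrapped to [-pi, pi]


# Synthetic check: a sine pattern with a 40-pixel period along projector columns.
K, W, period = 4, 200, 40.0
cols = np.arange(W, dtype=float)
phi_true = 2 * np.pi * cols / period
delta = 2 * np.pi * np.arange(K)[:, None, None] / K
frames = 0.5 + 0.4 * np.cos(phi_true[None, None, :] + delta)   # shape (K, 1, W)

phi = np.unwrap(wrapped_phase(frames), axis=-1)
projector_col = phi * period / (2 * np.pi)                     # camera pixel -> projector column
```

In a real capture, `frames` would hold one decoded sub-aperture image per projected shift, and the recovered projector coordinate at each camera pixel would identify the 8 × 8 block it sees.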
Next, we project the Hadamard measurement patterns onto the scene and capture the CS measurements using the Lytro camera, as shown in Figure 6(c,d). To keep the measurements of individual blocks separable, only a subset of blocks is projected in a single measurement phase. In our case, the blocks are divided into 4 measurement phases.
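One way to split blocks into 4 phases so that no two simultaneously lit blocks touch is to assign phases by the parity of the block row and column. This assignment is a hypothetical illustration; the text does not specify which blocks share a phase:

```python
import numpy as np


def block_phase_map(n_blocks_y: int, n_blocks_x: int) -> np.ndarray:
    """Hypothetical assignment of each block to phase 0..3 by row/column parity."""
    by, bx = np.indices((n_blocks_y, n_blocks_x))
    return 2 * (by % 2) + (bx % 2)


def phase_mask(phase_map: np.ndarray, phase: int, block: int = 8) -> np.ndarray:
    """Projector-resolution 0/1 mask of the blocks that are active in `phase`."""
    active = (phase_map == phase).astype(np.uint8)
    return np.kron(active, np.ones((block, block), dtype=np.uint8))


pm = block_phase_map(4, 6)
masks = [phase_mask(pm, p) for p in range(4)]
print(pm)
```

With this choice every block differs in phase from all eight of its neighbours, so each lit block is surrounded by dark ones, and the four masks together cover the projector image exactly once.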
A set of M measurement patterns (where M ≤ N) is projected onto the scene sequentially. The camera and projector lenses introduce geometric distortions, so the camera sensor captures a distorted projection of the scene. Since the camera and projector pixel grids are arbitrarily aligned, each projected block falls on several camera pixels. To isolate an individual measurement block, we use the previously obtained camera-projector correspondences. The intensities recorded by the camera sensor define the measurement results y: each measurement is obtained by summing the intensities of the camera pixels that correspond to an individual projected 8 × 8 block. Figure 7 depicts the calculation of the measurement vector for a single block in a single measurement phase. This procedure is performed for each block of the projected measurement pattern.
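Summing camera pixels per projector block is a grouped sum. A minimal sketch, assuming the correspondences have been turned into a hypothetical label map in which each camera pixel stores the index of the block it sees, or −1 if it sees none:

```python
import numpy as np


def block_measurements(camera_image: np.ndarray, block_labels: np.ndarray,
                       n_blocks: int) -> np.ndarray:
    """Sum camera intensities per projector block.

    block_labels[i, j] is the block index seen by camera pixel (i, j), or -1
    (hypothetical encoding of the camera-projector correspondences).
    """
    valid = block_labels >= 0
    return np.bincount(block_labels[valid], weights=camera_image[valid],
                       minlength=n_blocks)


labels = np.array([[0, 0, 1], [-1, 1, 2]])
img = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = block_measurements(img, labels, n_blocks=3)
print(y)  # [3. 8. 6.]
```

Repeating this for each of the M captured pattern images and stacking the results gives, for every block, its M-element measurement vector.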

CS light field reconstruction
In our approach, the CS reconstruction procedure is performed on the measurement results for each of the Lytro sub-apertures. For the reconstruction, we use a sparse optimization algorithm from the SPArse Modeling Software (SPAMS), an optimization toolbox for solving various sparse optimization problems [31]. Our sparse estimation problem is formulated following Equation (3). A single sub-aperture CS reconstruction results in a high-resolution image of the observed scene from the projector viewpoint.
To obtain the high-resolution reconstruction from the camera viewpoint, we apply a homography transformation.
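In the following, we give a short overview of homography estimation based on [32]. A homography is estimated between each of the Lytro camera sub-apertures and the projector using the previously obtained correspondences. Under a homography, the transformation of corresponding points in 3D from the camera to the projector coordinate system can be written as:

$$ \mathbf{X}_2 = H \mathbf{X}_1. \tag{5} $$

In the image planes, using homogeneous coordinates, we can write:

$$ \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}. \tag{6} $$

Equation (6) can be expanded as:

$$ x_2 = h_{11}x_1 + h_{12}y_1 + h_{13}z_1, \quad y_2 = h_{21}x_1 + h_{22}y_1 + h_{23}z_1, \quad z_2 = h_{31}x_1 + h_{32}y_1 + h_{33}z_1. \tag{7} $$

After rewriting Equation (7) in inhomogeneous coordinates with x_2' = x_2/z_2 and y_2' = y_2/z_2, and by setting z_1 = 1, we get the following system:

$$ x_2' = \frac{h_{11}x_1 + h_{12}y_1 + h_{13}}{h_{31}x_1 + h_{32}y_1 + h_{33}}, \quad y_2' = \frac{h_{21}x_1 + h_{22}y_1 + h_{23}}{h_{31}x_1 + h_{32}y_1 + h_{33}}, \tag{8} $$

which, after multiplying out the denominators, is linear in the entries of H:

$$ x_2'(h_{31}x_1 + h_{32}y_1 + h_{33}) = h_{11}x_1 + h_{12}y_1 + h_{13}, \quad y_2'(h_{31}x_1 + h_{32}y_1 + h_{33}) = h_{21}x_1 + h_{22}y_1 + h_{23}. \tag{9} $$

Given a set of corresponding points between the camera and the projector, we can form the following linear system of equations:

$$ A\mathbf{h} = \mathbf{0}, \tag{10} $$

where h = (h_11, h_12, h_13, h_21, h_22, h_23, h_31, h_32, h_33)^T and A is formed by stacking, for every pair of corresponding points, the rows

$$ \mathbf{a}_x^T = (-x_1, -y_1, -1, 0, 0, 0, x_2'x_1, x_2'y_1, x_2'), \quad \mathbf{a}_y^T = (0, 0, 0, -x_1, -y_1, -1, y_2'x_1, y_2'y_1, y_2'). $$

Equation (10) can be solved using the homogeneous linear least squares method (minimizing ||Ah|| subject to ||h|| = 1) in order to obtain the estimate of the homography matrix. In our experiments, we use the MATLAB function fitgeotrans to calculate the homography matrix from the estimated phase values in the camera images and the known projected phase values. Once the homography matrix is obtained, we transform the high-resolution image reconstruction from the projector viewpoint to the camera viewpoint. The same procedure is repeated for each of the Lytro camera sub-apertures, which results in a high-resolution reconstruction of the light field. Finally, the individually reconstructed sub-aperture images are reordered into the 4D light field format. The presented approach thus preserves all post-capture capabilities of the Lytro camera, as demonstrated in the next section.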
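The homogeneous least-squares step can be sketched with NumPy's SVD as a stand-in for fitgeotrans (this is not a reimplementation of that MATLAB function). Each correspondence contributes the two rows a_x and a_y, and h is the right singular vector of A belonging to the smallest singular value. The homography and points below are made up for the check; with real, noisy pixel coordinates, normalizing the coordinates first is usually advisable:

```python
import numpy as np


def estimate_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """DLT estimate of H such that dst ~ H @ src, from (n, 2) point arrays, n >= 4."""
    rows = []
    for (x1, y1), (x2, y2) in zip(src, dst):
        rows.append([-x1, -y1, -1.0, 0.0, 0.0, 0.0, x2 * x1, x2 * y1, x2])  # a_x
        rows.append([0.0, 0.0, 0.0, -x1, -y1, -1.0, y2 * x1, y2 * y1, y2])  # a_y
    A = np.asarray(rows)
    # Homogeneous least squares: minimise ||A h|| subject to ||h|| = 1.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the arbitrary scale (assumes h33 != 0)


def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (n, 2) points through H and return inhomogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


# Synthetic check with a known, made-up homography.
H_true = np.array([[1.10, 0.05, 3.0],
                   [0.02, 0.95, -2.0],
                   [1e-4, 2e-4, 1.0]])
gx, gy = np.meshgrid(np.linspace(0, 50, 5), np.linspace(0, 50, 5))
src = np.column_stack([gx.ravel(), gy.ravel()])
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
```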

Improved spatial resolution of light field
In this section, we discuss the results obtained using the proposed framework. First, we show the improvement in the spatial resolution of the captured light field. We select the central sub-aperture of the Lytro camera for visualization purposes; similar observations hold for the other sub-apertures, and we refer the interested reader to the supplementary materials for additional visualizations. Figure 8(a) shows a single measurement from the Lytro camera sensor. Note that this visualization corresponds to the per-block measurement results. Figures 8(b-d) show the CS reconstructions obtained using different numbers of measurements, i.e. M = 24, 32 and 64. As noted before, M corresponds to the total number of measurements obtained using the proposed setup. Since each 8 × 8 block contains N = 64 pixels, the reconstruction obtained using 64 measurements corresponds to the Nyquist sampling rate.
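Because each projected block has N = 8 × 8 = 64 pixels, the three reconstructions in Figure 8 correspond to the following subsampling ratios r = M/N (assuming, as the Nyquist remark implies, that M is counted per block):

```python
N = 8 * 8                                  # pixels per projected 8 x 8 block
ratios = {M: M / N for M in (24, 32, 64)}  # subsampling ratio r = M / N
print(ratios)                              # {24: 0.375, 32: 0.5, 64: 1.0}
```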

Viewpoint change
After we have successfully increased the spatial resolution of each of the Lytro sub-apertures, we focus on post-capture manipulations of the reconstructed light field. One possible post-capture manipulation in light field imaging is a change of viewpoint. Sub-aperture images are formed by extracting the same pixel under each micro-lens.
By traversing the sub-aperture images and selecting a single sub-aperture at a time, we get the scene reconstruction from different viewpoints. In Figure 9, a visualization of the central sub-aperture along with the tiled light field format is shown. Each tile corresponds to a single high-resolution sub-aperture image.
In Figure 10, a composite of two sub-aperture images is shown. Note that only an enlarged part of the reconstructed scene is shown. Since significant

Post-capture refocusing and virtual aperture
Refocusing is another potential post-capture manipulation of the light field. There are several algorithms for refocusing light fields [8,33], and in this paper we use the shift-and-sum algorithm from [34,35]. Refocusing with the shift-and-sum algorithm numerically approximates the integral from Equation (1), and is conceptually a summation of dilated and shifted versions of the sub-aperture images over the entire (u, v) aperture.
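A minimal integer-pixel variant of shift-and-sum is sketched below, assuming the hypothetical `lf[u, v, s, t]` layout and a single refocus parameter `slope` (pixels of shift per unit of angular offset). It omits the dilation and sub-pixel interpolation of the full algorithm in [34,35], and `np.roll` wraps around at the image borders:

```python
import numpy as np


def refocus(lf: np.ndarray, slope: float) -> np.ndarray:
    """Shift each view by slope * (u - uc, v - vc) pixels and average all views."""
    U, V = lf.shape[:2]
    uc, vc = (U - 1) / 2, (V - 1) / 2          # central angular coordinate
    out = np.zeros(lf.shape[2:], dtype=float)
    for u in range(U):
        for v in range(V):
            ds = int(round(slope * (u - uc)))  # integer shift along s
            dt = int(round(slope * (v - vc)))  # integer shift along t
            out += np.roll(lf[u, v], shift=(ds, dt), axis=(0, 1))
    return out / (U * V)


# Synthetic point whose position moves by -2 pixels per view (disparity d = 2).
U = V = 3
S = T = 11
d = 2
lf = np.zeros((U, V, S, T))
for u in range(U):
    for v in range(V):
        lf[u, v, 5 - d * (u - 1), 5 - d * (v - 1)] = 1.0

in_focus = refocus(lf, slope=2.0)    # all views align at (5, 5)
out_focus = refocus(lf, slope=0.0)   # plain average: the point is spread out
```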
Virtual change of the camera aperture is performed in a similar manner. By summing different numbers of sub-apertures without shifting them, we can obtain an arbitrary aperture; summing all of the sub-apertures yields the original physical aperture of the Lytro camera. In Figure 11, a composite of two images is shown. In the first image, the focus plane is located at the board plane, resulting in a sharp reconstruction of the scene. In the second image, the focus is set further away from the scene, resulting in a blurry reconstruction. The dashed line denotes the boundary between the images. In the magnified detail, the difference between the two images is clearly visible. A better visualization is available in the supplementary material as an animated GIF.

… and (8,13). Since boundary sub-apertures have significant vignetting, we do not visualize the leftmost and rightmost sub-apertures, but the near-boundary sub-apertures. The visualization is a composite of two sub-apertures, where the dashed line denotes the boundary between the views. Notice the slight horizontal offset visible in the enlarged detail.

Figure 11. Visualization of the post-capture light field refocusing. A composite of two images is shown, where one image is focused on the scene, while the other image is focused further away from the scene. The dashed line denotes the boundary between the two images. A better visualization can be found in the supplementary material in animated GIF format.
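Summing sub-apertures without shifting can be sketched in the same way. The circular aperture shape and the `radius` parameter (in units of angular samples) are assumptions for illustration, as is the `lf[u, v, s, t]` layout; the sketch averages rather than sums, which only changes a global scale factor:

```python
import numpy as np


def virtual_aperture(lf: np.ndarray, radius: float):
    """Average all views within `radius` of the central view; return (image, view count)."""
    U, V = lf.shape[:2]
    uc, vc = (U - 1) / 2, (V - 1) / 2
    uu, vv = np.meshgrid(np.arange(U), np.arange(V), indexing="ij")
    mask = (uu - uc) ** 2 + (vv - vc) ** 2 <= radius ** 2   # circular aperture
    return lf[mask].mean(axis=0), int(mask.sum())


rng = np.random.default_rng(0)
lf = rng.random((15, 15, 4, 4))            # 15 x 15 views, as in the decoded Lytro data
pinhole, n_small = virtual_aperture(lf, radius=0)     # central view only
full, n_full = virtual_aperture(lf, radius=100)       # every view: the full aperture
print(n_small, n_full)                     # 1 225
```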

Conclusion
In this paper, we proposed a novel compressive sensing framework for high-resolution light field reconstruction. The original 40MP sensor resolution of the Lytro Illum light field camera is split among 15² = 225 angles, resulting in a rather small spatial resolution after decoding the raw Lytro sensor image into the 4D light field format.
To overcome the spatio-angular trade-off of light field acquisition, we designed an off-the-shelf measurement setup consisting of a light field camera and a digital projector. After calibration, a set of random patterns was projected onto a static scene. The compressive sensing approach was used to successfully reconstruct all 225 Lytro sub-apertures at the projector resolution. The potential resolution of the virtual sensor in the proposed measurement system is up to 445MP, assuming a full HD projector (1920 × 1080). Post-capture capabilities of the light field, including viewpoint change, refocusing and virtual aperture, are preserved by the proposed method.
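The virtual-sensor figure is the product of the angular and projector resolutions. The arithmetic below shows that 225 views at 1920 × 1080 give 466,560,000 samples; the 445MP value appears consistent with counting one megapixel as 2^20 pixels, whereas in decimal megapixels the same count is about 467MP:

```python
n_views = 15 ** 2                     # angular samples per light field
proj_pixels = 1920 * 1080             # full HD projector resolution
virtual = n_views * proj_pixels       # samples in the virtual sensor

print(virtual)                        # 466560000
print(virtual / 1e6)                  # 466.56 (decimal megapixels)
print(virtual / 2 ** 20)              # about 444.95 (binary "megapixels")
```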
Since the correspondences between the projector and the light field camera views were modelled as a homography, the scene is limited to a plane at an arbitrary orientation in 3D space. Our results in high-resolution light field reconstruction have practical significance, but are still limited to a rather simple planar scene. Hence, this paper presents a proof of concept of an original approach that will be generalized to arbitrary scenes in our future research.