When Professor Alexei A. Efros first introduced me to this project, I was immediately captivated by Sergei Mikhailovich Prokudin-Gorskii's innovative mindset. Prokudin-Gorskii was a pioneer of color photography, foreseeing its future as early as 1907. After gaining special permission from the Tsar, he embarked on an ambitious journey across the Russian Empire, capturing thousands of images of people, architecture, and nature. His technique was to photograph each scene with three separate exposures on glass plates, one through a red filter, one through a green, and one through a blue. In his time, however, no technology existed to combine these black-and-white exposures into color images. His glass plate negatives were later digitized by the Library of Congress, and thus the objective of this project became clear: to bring Prokudin-Gorskii's hard work to fruition and display a unique glimpse of the Russian Empire's final years.
The objective of my project is to take the digitized glass plate images and use image processing techniques to recreate color images with minimal visual artifacts. To achieve this, I implemented and tested several methods for aligning the three color channels. Initially, I employed an exhaustive search using the L2 norm and normalized cross-correlation (NCC) on low-resolution images; based on the visual results, I determined that NCC offered better alignment. I then implemented a Gaussian image pyramid approach to handle higher-resolution images more efficiently. While this method successfully improved the alignment, it significantly increased computation time. To address this, I optimized my algorithm by vectorizing large array operations and parallelizing the work across CPU cores, which substantially reduced processing time.
To measure the quality of alignment between two images, I used Normalized Cross-Correlation (NCC) as the score metric. I chose NCC because it is a robust way to compare the similarity of two images: it measures the correlation between pixel intensities, normalized to account for differences in lighting and contrast. The score ranges from -1 to 1, where a score closer to 1 indicates better alignment.
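Here is a minimal sketch of the metric, assuming NumPy arrays of equal shape (the helper name ncc is mine):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-sized images.

    Both images are flattened, mean-centered, and scaled to unit length,
    so the dot product lands in [-1, 1]: a score of 1 means the
    intensities agree perfectly up to brightness and contrast shifts.
    """
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))
```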
The exhaustive align method is a single-scale search that aligns two images by testing every possible shift within a specified range. For each shift, the target image is displaced and its similarity to the reference image is measured with the NCC score. Here are the results on lower-resolution images, with a displacement range of [-16, 16]; a sketch of the search loop follows the results.
G displacement: (2, -3)
R displacement: (2, 3)
G displacement: (3, 3)
R displacement: (3, 6)
G displacement: (2, 5)
R displacement: (3, 12)
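Below is a minimal sketch of the search loop, reusing the ncc helper above. The fixed 10% interior crop used for scoring is an illustrative choice for keeping the plate borders out of the score (see the issues section below).

```python
def exhaustive_align(base, shift_img, max_disp=16):
    """Single-scale search over every (dy, dx) in [-max_disp, max_disp]^2.

    Each candidate shift rolls the target image, scores it against the
    reference with NCC, and the best-scoring shift wins. Scoring only an
    interior window keeps the plate borders from skewing the result.
    """
    h, w = base.shape
    ch, cw = max(1, int(0.1 * h)), max(1, int(0.1 * w))  # interior margins
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            shifted = np.roll(shift_img, (dy, dx), axis=(0, 1))
            score = ncc(base[ch:-ch, cw:-cw], shifted[ch:-ch, cw:-cw])
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```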
Furthermore, I implemented a pyramid align function, a multi-scale approach that leverages image pyramids to align two images more efficiently. This method progressively aligns images at different resolutions, starting from a low-resolution version and refining the alignment at higher resolutions. First, I construct Gaussian pyramids for both the reference (base) and target (shift) images by iteratively applying a Gaussian filter and downscaling at each level. Starting with the lowest-resolution images, I use the exhaustive align method defined above to compute the optimal shift. The alignment is then refined at each subsequent level by scaling the displacement to match that level's resolution. By progressively aligning from coarse to fine detail, this approach shrinks the search space at higher resolutions, resulting in much faster alignment for large, high-resolution images. Here are three results for high-resolution images, run with a displacement range of [-48, 48] and 5 pyramid levels; a sketch of the coarse-to-fine loop follows the results.
G displacement: (5, 42)
R displacement: (32, 87)
G displacement: (16, 59)
R displacement: (13, 124)
G displacement: (4, 25)
R displacement: (-4, 58)
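Here is a minimal sketch of the coarse-to-fine loop, reusing exhaustive_align from above. The sigma=1 blur before each halving and the small ±2 refinement window at finer levels are illustrative choices, not exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pyramid_align(base, shift_img, levels=5, max_disp=48):
    """Coarse-to-fine alignment using Gaussian pyramids.

    Each level blurs then halves the previous one; the full search runs
    only at the coarsest level, after which the estimate is doubled and
    refined within a small window at every finer level.
    """
    # build pyramids from fine (index 0) to coarse (index levels - 1)
    base_pyr, shift_pyr = [base], [shift_img]
    for _ in range(levels - 1):
        base_pyr.append(gaussian_filter(base_pyr[-1], sigma=1)[::2, ::2])
        shift_pyr.append(gaussian_filter(shift_pyr[-1], sigma=1)[::2, ::2])

    # the full-range exhaustive search is cheap at the coarsest resolution
    dy, dx = exhaustive_align(base_pyr[-1], shift_pyr[-1], max_disp)

    # walk back up, doubling the shift and refining in a +/-2 window
    for lvl in range(levels - 2, -1, -1):
        dy, dx = 2 * dy, 2 * dx
        pre_shifted = np.roll(shift_pyr[lvl], (dy, dx), axis=(0, 1))
        rdy, rdx = exhaustive_align(base_pyr[lvl], pre_shifted, max_disp=2)
        dy, dx = dy + rdy, dx + rdx
    return dy, dx
```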
1. Inaccuracy in Exhaustive Search: Initially, the exhaustive search method struggled with alignment accuracy, even on low-resolution images. The issue was caused by the borders of the glass plates, which disrupted the similarity calculations. I resolved this by manually cropping the images so the borders were excluded from scoring, which greatly improved the alignment results.
2. High Runtime: When I first ran the naive version of the pyramid align function, the runtime exceeded 10 minutes per image. To address this, I optimized performance in two key ways: first, I vectorized the large array calculations to take advantage of fast element-wise operations; second, I used Joblib to distribute alignment tasks across multiple CPU cores. I also used Numba to compile hot Python functions into machine code, speeding up the individual computations. These optimizations reduced the runtime to approximately 1 minute per image; a sketch appears after this list.
3. Accuracy Issues: After optimizing the runtime, the image alignment was still not as accurate as expected. The issue stemmed from the lack of a Gaussian blur during pyramid construction; once I blurred the images before downsampling, the alignment improved significantly. It was still not entirely accurate for three images (emir, self portrait, and melons), so for those I increased the displacement range to [-64, 64] while using only 3 levels in the Gaussian pyramid.
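Here is a minimal sketch of the two optimizations, assuming the pyramid_align function above. The Numba-compiled scorer and the per-channel Joblib split are illustrative of the approach rather than a verbatim copy of my code.

```python
import numpy as np
from joblib import Parallel, delayed
from numba import njit

@njit(cache=True)
def ncc_fast(a, b):
    """NCC compiled to machine code; a and b are 2-D float64 arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def align_image(blue, green, red):
    """Align the G and R plates against B on separate CPU cores."""
    g_shift, r_shift = Parallel(n_jobs=-1)(
        delayed(pyramid_align)(blue, ch) for ch in (green, red)
    )
    return g_shift, r_shift
```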
The following are the remaining colorized images processed from the Prokudin-Gorskii photo collection.
G displacement: (24, 49)
R displacement: (55, 102)
G displacement: (17, 41)
R displacement: (23, 89)
G displacement: (26, 51)
R displacement: (36, 108)
G displacement: (10, 82)
R displacement: (13, 178)
G displacement: (29, 78)
R displacement: (37, 176)
G displacement: (14, 53)
R displacement: (11, 112)
G displacement: (-11, 33)
R displacement: (-27, 140)
G displacement: (9, 50)
R displacement: (11, 112)
1. Automatic Contrasting: I implemented automatic contrasting using histogram equalization, as learned in lecture. I start with a properly aligned image and convert it from RGB to YUV color space. This works because the color information lives in the U and V channels, so equalizing only the Y (luminance) channel boosts contrast without shifting the hues. The algorithm works best on images that are too dark or washed out; a sketch of this step appears after this list. Here is the algorithm run on the melons image; hover over it to see the original colorized image.
2. Automatic Cropping: I implemented automatic cropping by leveraging edge detection and contour analysis. I start by converting the image to grayscale to simplify edge detection, then apply the Canny edge detection algorithm to highlight the boundaries of the objects in the image. From the resulting edge map, I identify the largest contour. To ensure a reasonable crop without losing important content, I defined a maximum allowable crop of 7% on each side. Using the bounding rectangle of the largest contour together with that cap, I compute the final cropped image. This method effectively trims unnecessary borders while preserving the central subject, making it ideal for images with extraneous background space. Below is the cropping process for the train and three generations images, from the original, to the edge map, to the cropped final result; a sketch of this step follows as well.
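Here is a minimal sketch of the contrasting step, shown with OpenCV as one possible implementation (the library choice is an assumption; the key point is that only the Y channel is equalized):

```python
import cv2

def auto_contrast(rgb):
    """Histogram-equalize only the luminance of an 8-bit RGB image.

    Converting to YUV isolates brightness in Y, so equalizing it
    stretches contrast without shifting the hues stored in U and V.
    """
    yuv = cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])  # Y = luminance
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)
```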
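And a minimal sketch of the cropping step, again with OpenCV; the Canny thresholds (50, 150) are illustrative values, not the exact ones used.

```python
import cv2

def auto_crop(rgb, max_frac=0.07):
    """Crop to the largest Canny contour, at most max_frac per side."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return rgb  # no edges found; leave the image untouched
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    H, W = gray.shape
    # clamp each side's crop so we never remove more than max_frac
    top    = min(y, int(max_frac * H))
    left   = min(x, int(max_frac * W))
    bottom = min(H - (y + h), int(max_frac * H))
    right  = min(W - (x + w), int(max_frac * W))
    return rgb[top:H - bottom, left:W - right]
```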