**Understanding Pinhole Cameras and Depth Problem**
For simplicity's sake, we'll model our camera as a pinhole camera. The optical center of the camera sits somewhere behind the image plane. Light from any object in the scene travels along a ray towards the optical center, intersecting the image plane on the way in. This happens for every point in the scene that the camera can see.

Now suppose we pick a point on the image plane and ask where it came from. The crucial problem is that it could have come from anywhere along the ray joining that pixel to the optical center, and from the image alone we have no way of telling which position along the ray produced it. That is the depth problem.
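The ambiguity is easy to demonstrate numerically. The sketch below uses the standard pinhole projection equations with a made-up focal length; any point scaled along the same ray lands on the same pixel.

```python
import numpy as np

# Pinhole projection: a 3D point (X, Y, Z) in camera coordinates maps to
# the image point (f*X/Z, f*Y/Z). The focal length f is an arbitrary
# illustrative value.
def project(point, f=1.0):
    X, Y, Z = point
    return np.array([f * X / Z, f * Y / Z])

# Every point along the same ray projects to the same pixel; this is
# exactly the depth ambiguity described above.
p_near = np.array([1.0, 2.0, 4.0])
p_far = 3.0 * p_near              # same ray, three times the depth
print(project(p_near))            # [0.25 0.5 ]
print(project(p_far))             # [0.25 0.5 ] -- indistinguishable
```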
Now add a second camera, with its own optical center somewhere behind its own image plane; rays from the scene intersect this image plane in just the same way. If we knew which pixel in the second image corresponds to our point, we could project a ray out of each camera, find where the two rays intersect, and use simple triangulation to work out how far away that position is. The trouble is that we don't know which pixel that is: the point's appearance changes between views, it may have rotated or shifted slightly, and it might not be visible in the second image at all. Searching for the same point across an entire two-dimensional image is a lot of work.
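The triangulation step itself is straightforward once a correspondence is known. Here is a minimal midpoint-method sketch with toy camera positions of my own choosing: given two camera centers and the ray directions through the matched pixels, it returns the point closest to both rays.

```python
import numpy as np

# Midpoint triangulation: given camera centers c1, c2 and unit ray
# directions d1, d2, solve for the ray parameters s, t that minimise
# |c1 + s*d1 - (c2 + t*d2)| and return the midpoint of the closest approach.
def triangulate(c1, d1, c2, d2):
    A = np.stack([d1, -d2], axis=1)          # 3x2 least-squares system
    b = c2 - c1
    (s, t), *_ = np.linalg.lstsq(A, b, rcond=None)
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

# Two cameras a metre apart, both seeing a point 5 m away (toy numbers).
X = np.array([0.5, 0.0, 5.0])
c1, c2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
d1 = (X - c1) / np.linalg.norm(X - c1)
d2 = (X - c2) / np.linalg.norm(X - c2)
print(triangulate(c1, d1, c2, d2))           # recovers [0.5 0.  5. ]
```

With perfect rays the two lines intersect exactly; with noisy matches, the midpoint of the closest approach is a reasonable estimate.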
You'd have to do that for every single pixel in the first image, scanning the whole of the second image for a possible match. That's far too much work, so we don't do it directly. Instead we use epipolar geometry to make the search easier. The two optical centers and the 3D point form a triangle; the plane containing that triangle cuts each image plane in a line, called the epipolar line.
This is what makes the search tractable: rather than hunting for a single point somewhere in the second image, we search along a line. The epipolar line is the projection into the second image of the entire ray from the first camera, so it passes through every possible match. And because we know where our cameras are, we know the line: when we try to find the position x1 of our point in the second image, we know it must lie somewhere along its epipolar line.
Since the match lies on that line, we have only a limited set of pixels to look through. All we need to do is go along the line, ask which pixel looks most like our point, and take the best match. Once we've found the matching pixel, we project rays from the two matched image points back towards the optical centers and triangulate to work out how far away the object is.
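The "go along the line and find the most similar pixel" step can be sketched with simple block matching. The example below assumes a rectified pair, where epipolar lines coincide with image rows, and uses a sum-of-squared-differences cost; the images are synthetic, with the right view shifted by a known amount.

```python
import numpy as np

# Block matching along an epipolar line (here a rectified image row):
# slide a small window along the row in the right image and pick the
# column whose sum-of-squared-differences with the left patch is lowest.
def match_along_row(left, right, row, col, half=2):
    patch = left[row - half:row + half + 1, col - half:col + half + 1]
    best_col, best_cost = None, np.inf
    for c in range(half, right.shape[1] - half):
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = np.sum((patch - cand) ** 2)
        if cost < best_cost:
            best_col, best_cost = c, cost
    return best_col

# Synthetic pair: the right image is the left shifted 3 pixels leftward,
# mimicking a disparity of 3.
rng = np.random.default_rng(0)
left = rng.random((20, 30))
right = np.roll(left, -3, axis=1)
print(match_along_row(left, right, row=10, col=15))  # finds column 12
```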
**Epipolar Geometry**
One edge of our triangle is the baseline between the optical centers of the two cameras, one runs from the first center through the image point and out into the world, and the third, from the second center to the 3D point, is the one we don't yet know. This makes the problem much easier to work with, because given the camera geometry and a pixel in the first image, there is a simple way to determine which epipolar line in the second image that pixel corresponds to.
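The usual algebraic tool for that pixel-to-line mapping, not named in the text above, is the fundamental matrix F: a point x1 in the first image (in homogeneous coordinates) maps to the epipolar line l2 = F·x1 in the second, and any valid match x2 satisfies x2ᵀF·x1 = 0. The F below is a hand-built example for a pure sideways baseline, for which the epipolar lines are horizontal rows.

```python
import numpy as np

# Fundamental matrix for identity rotation and translation t = (1, 0, 0):
# F = [t]_x, the cross-product matrix of t. This is a contrived example
# chosen so that epipolar lines are image rows.
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

x1 = np.array([15.0, 10.0, 1.0])   # pixel (15, 10), homogeneous coords
line = F @ x1                      # epipolar line a*x + b*y + c = 0
print(line)                        # [ 0. -1. 10.] -> the row y = 10

# A candidate match on that row satisfies the epipolar constraint:
x2 = np.array([12.0, 10.0, 1.0])
print(x2 @ F @ x1)                 # 0.0
```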
If we're writing a stereo reconstruction algorithm, then for every point in the first image we try to find the point along its particular epipolar line that best matches it. This is the correspondence problem, and it's really the core of what we're solving here: matching pixels between views so that we can determine their depth.
Finding a point in one image given a point in the other is not as straightforward as simply looking for a pixel with the same value. There are many confounding factors, such as changes in viewpoint, rotation, and lighting, which can make reliable correspondences difficult to find. Epipolar geometry doesn't remove these difficulties, but it reduces the search to one dimension and makes the problem far more tractable.
**Stereo Reconstruction**
Stereo reconstruction is the process of using two or more cameras to recover the three-dimensional structure of a scene. It involves finding the correspondence between pixels in different images and then using that information, via triangulation, to determine the depth of the objects in the scene.
When we're trying to find the depth of an object, we need to determine which pixel in one image corresponds to a given point in the other. Epipolar geometry helps here by collapsing the two-dimensional search over the whole image into a one-dimensional search along a line.
By comparing the pixels along these epipolar lines, we can find the ones that correspond to each other between images. Once we've found these correspondences, we can use triangulation to work out how far away the objects in the scene are. This is a simplified picture of stereo reconstruction: finding an accurate correspondence and depth for every pixel is considerably harder in practice.
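For a rectified stereo pair, the triangulation collapses to a single well-known formula, depth Z = f·B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity found by matching. The numbers below are illustrative, not from any real rig.

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# focal_px and baseline_m are made-up example values for a hypothetical rig.
def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.12):
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(21.0))   # 4.0 metres
```

Note the inverse relationship: nearby objects have large disparities, and small disparity errors on distant objects produce large depth errors.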
**Challenges and Limitations**
Stereo reconstruction has many challenges and limitations, particularly when it comes to finding reliable correspondences between images. The rotation and translation of objects between views can make matching difficult, and small errors in the matched position can translate into large errors in the recovered depth, especially for distant objects.
Additionally, stereo reconstruction requires a lot of processing power and computational resources to handle the large amounts of data involved. However, with the advent of more powerful computers and advanced algorithms, we're able to tackle these challenges and create more accurate 3D images from stereo pairs.
Despite the challenges, stereo reconstruction remains an important area of research in computer vision and robotics, with applications in fields such as augmented reality, virtual reality, and robotics.