3d point reconstruction from 2d Images
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
|
show 4 more comments
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– BConic
Nov 15 '18 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 '18 at 4:05
Your code doesn't include the the definition of thedino
function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 '18 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 '18 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 '18 at 20:15
|
show 4 more comments
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
python image-processing computer-vision 3d-reconstruction
edited Nov 21 '18 at 19:42


Alexander Reynolds
9,42311739
9,42311739
asked Oct 26 '18 at 13:38
flameliteflamelite
9031623
9031623
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– BConic
Nov 15 '18 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 '18 at 4:05
Your code doesn't include the the definition of thedino
function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 '18 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 '18 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 '18 at 20:15
|
show 4 more comments
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– BConic
Nov 15 '18 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 '18 at 4:05
Your code doesn't include the the definition of thedino
function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 '18 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 '18 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 '18 at 20:15
3
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– BConic
Nov 15 '18 at 20:05
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– BConic
Nov 15 '18 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 '18 at 4:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 '18 at 4:05
Your code doesn't include the the definition of the
dino
function, and neither does the code you link to. Can you please add it in?– tel
Nov 16 '18 at 8:16
Your code doesn't include the the definition of the
dino
function, and neither does the code you link to. Can you please add it in?– tel
Nov 16 '18 at 8:16
1
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 '18 at 20:12
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 '18 at 20:12
1
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 '18 at 20:15
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 '18 at 20:15
|
show 4 more comments
3 Answers
3
active
oldest
votes
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
add a comment |
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
add a comment |
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53010027%2f3d-point-reconstruction-from-2d-images%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
add a comment |
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
add a comment |
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
answered Nov 21 '18 at 19:35
Jomnipotent17Jomnipotent17
127419
127419
add a comment |
add a comment |
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
add a comment |
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
add a comment |
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
edited Nov 21 '18 at 6:19
answered Nov 20 '18 at 20:49


BConicBConic
6,41021538
6,41021538
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
add a comment |
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 '18 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– BConic
Nov 21 '18 at 6:22
add a comment |
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
add a comment |
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
add a comment |
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
answered Nov 21 '18 at 17:14


teltel
7,31121431
7,31121431
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53010027%2f3d-point-reconstruction-from-2d-images%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– BConic
Nov 15 '18 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 '18 at 4:05
Your code doesn't include the the definition of the
dino
function, and neither does the code you link to. Can you please add it in?– tel
Nov 16 '18 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 '18 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 '18 at 20:15