I am trying to understand the basics of 3D point reconstruction from 2D stereo images. What I have understood so far can be summarized as follows:

For 3D point (depth map) reconstruction, we need 2 images of the same object from 2 different views. Given such an image pair, we also need the camera matrices (say P1, P2).

We find the corresponding points in the two images using methods like SIFT or SURF.

After getting the corresponding keypoints, we find the essential matrix (say E) using at least 8 keypoint correspondences (as in the 8-point algorithm).

Placing camera 1 at the origin, we calculate the parameters of camera 2. Decomposing the essential matrix gives 4 possible camera parameters for camera 2; the correct one is the one that puts the triangulated points in front of both cameras.

Finally, we use the corresponding points and both camera matrices to estimate the 3D points by triangulation.
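To make the essential-matrix, pose-selection and triangulation steps above concrete, here is a minimal NumPy sketch on synthetic, noise-free data. It is not the `structure` module from the linked repo; the function names (`essential_8point`, `decompose_essential`, `triangulate`, `pick_pose`) and the synthetic scene are hypothetical. It assumes points are already normalized by the intrinsic matrix (x = inv(K) @ pixel), as `points1n`/`points2n` are in the question's code, uses the plain linear 8-point method without outlier rejection (with real SIFT/SURF matches you would normally estimate E robustly, e.g. with RANSAC), and applies the in-front-of-both-cameras test to all points rather than to a single one.

```python
import numpy as np


def essential_8point(x1, x2):
    """Linear 8-point estimate of E from normalized homogeneous points (3xN),
    such that x2[:, i] @ E @ x1[:, i] is ~0 for every correspondence i."""
    # Each correspondence gives one linear equation in the 9 entries of E.
    A = np.array([np.kron(x2[:, i], x1[:, i]) for i in range(x1.shape[1])])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the set of essential matrices: singular values (1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt


def decompose_essential(E):
    """Return the 4 candidate [R | t] matrices for camera 2 (camera 1 is [I | 0])."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flipping U or Vt keeps R a proper rotation.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]  # unit length: the baseline scale cannot be recovered from E
    return [np.hstack([R, s * t[:, None]])
            for R in (U @ W @ Vt, U @ W.T @ Vt) for s in (1.0, -1.0)]


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation; returns 4xN homogeneous points with last row 1."""
    X = np.empty((4, x1.shape[1]))
    for i in range(x1.shape[1]):
        A = np.vstack([x1[0, i] * P1[2] - P1[0],
                       x1[1, i] * P1[2] - P1[1],
                       x2[0, i] * P2[2] - P2[0],
                       x2[1, i] * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        X[:, i] = Vt[-1] / Vt[-1][3]
    return X


def pick_pose(E, x1, x2):
    """Cheirality test: keep the candidate that puts the most points in front of both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for P2 in decompose_essential(E):
        X = triangulate(P1, P2, x1, x2)
        count = np.sum((X[2] > 0) & ((P2 @ X)[2] > 0))
        if count > best_count:
            best, best_count = P2, count
    return P1, best


# Synthetic, noise-free scene: 20 points about 5 units in front of camera 1.
rng = np.random.default_rng(0)
X_true = np.vstack([rng.uniform(-1, 1, (2, 20)), rng.uniform(4, 6, (1, 20))])
a = np.deg2rad(10)
R_true = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
t_true = np.array([-1.0, 0.0, 0.0])          # unit-length baseline
x1 = X_true / X_true[2]                      # normalized coordinates in camera 1
X_cam2 = R_true @ X_true + t_true[:, None]
x2 = X_cam2 / X_cam2[2]                      # normalized coordinates in camera 2

E = essential_8point(x1, x2)
P1, P2 = pick_pose(E, x1, x2)
X_est = triangulate(P1, P2, x1, x2)
print(np.abs(X_est[:3] - X_true).max())
```

Because t is recovered only up to scale, the sketch fixes the true baseline to unit length so that `X_est` matches `X_true`; with real data, each image pair's reconstruction comes out with its own arbitrary scale.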

After going through the theory, as my first experiment I tried to run the code available here, which worked as expected. With a few modifications to the example.py code, I tried to run this example on all consecutive image pairs and merge the resulting 3D point clouds to reconstruct the object (dino), as below:

import glob

import cv2
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib

import features
import processor
import structure


def dino(img1_path, img2_path):
    # Load one image pair from the dino sequence
    img1 = cv2.imread(img1_path)
    img2 = cv2.imread(img2_path)
    pts1, pts2 = features.find_correspondence_points(img1, img2)
    points1 = processor.cart2hom(pts1)
    points2 = processor.cart2hom(pts2)

    # Show the matched keypoints on both images
    fig, ax = plt.subplots(1, 2)
    ax[0].autoscale_view('tight')
    ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
    ax[0].plot(points1[0], points1[1], 'r.')
    ax[1].autoscale_view('tight')
    ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
    ax[1].plot(points2[0], points2[1], 'r.')
    fig.show()

    height, width, ch = img1.shape
    intrinsic = np.array([  # for dino
        [2360, 0, width / 2],
        [0, 2360, height / 2],
        [0, 0, 1]])

    return points1, points2, intrinsic


points3d = np.empty((0, 0))
files = sorted(glob.glob("imgs/dinos/*.ppm"))
n_files = len(files)  # avoid shadowing the built-in len()

for item in range(n_files - 1):
    print(files[item], files[item + 1])
    # dino() takes 2 image paths as input and outputs the keypoint matches
    # (corresponding points in the two views) along with the camera intrinsic parameters.
    points1, points2, intrinsic = dino(files[item], files[item + 1])

    # Calculate essential matrix with 2d points.
    # Result will be up to a scale
    # First, normalize points
    points1n = np.dot(np.linalg.inv(intrinsic), points1)
    points2n = np.dot(np.linalg.inv(intrinsic), points2)
    E = structure.compute_essential_normalized(points1n, points2n)
    print('Computed essential matrix:', (-E / E[0][1]))

    # Given we are at camera 1, calculate the parameters for camera 2.
    # Decomposing the essential matrix returns 4 possible camera parameters.
    P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    P2s = structure.compute_P_from_essential(E)

    ind = -1
    for i, P2 in enumerate(P2s):
        # Find the correct camera parameters
        d1 = structure.reconstruct_one_point(
            points1n[:, 0], points2n[:, 0], P1, P2)

        # Convert P2 from camera view to world view
        P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
        d2 = np.dot(P2_homogenous[:3, :4], d1)

        if d1[2] > 0 and d2[2] > 0:
            ind = i

    P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
    # tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
    tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

    if not points3d.size:
        points3d = tripoints3d
    else:
        points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer matplotlib
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()

But I am getting a very unexpected result. Is the above method correct? If not, how can I merge multiple 3D point clouds to construct a single 3D structure?

edited Nov 21 '18 at 19:42 by Alexander Reynolds

asked Oct 26 '18 at 13:38 by flamelite

3

If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.

– BConic
Nov 15 '18 at 20:05

Thanks @aldurdisciple, yes, I learned your point a couple of days ago. That is why I updated my question to "How to merge multiple point clouds of different views?"

– flamelite
Nov 16 '18 at 4:05

Your code doesn't include the definition of the dino function, and neither does the code you link to. Can you please add it in?

– tel
Nov 16 '18 at 8:16

1

@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?

– tel
Nov 21 '18 at 20:12

1

Not sure what the best resource is for 3D scenes, but for 2D/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource; it gives a good high-level overview with really great references to dig into further. Hope it's helpful.

– Alexander Reynolds
Nov 21 '18 at 20:15


python image-processing computer-vision 3d-reconstruction


3 Answers
3


Another possible path to understanding would be to look at an open-source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python, and I think it is easy to navigate and understand. I often use it as a reference for my own work.

Just to give you a little more information to get started (if you choose to go down this path): structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them, while also solving for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame, and so is the point cloud).

The steps of OpenSfM at a high level:

Read the image EXIF data for any prior information you can use (e.g. focal length)

Extract feature points (e.g. SIFT)

Match feature points

Turn these feature point matches into tracks (e.g. if you saw a feature point in images 1, 2, and 3, then you can connect that into a track instead of match(1,2), match(2,3), etc.)

Incremental reconstruction (note that there is also a global approach). This process uses the tracks to incrementally add images to the reconstruction, triangulate new points, and refine the poses/point positions using a process called bundle adjustment.
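The "tracks" step above can be illustrated with a small union-find sketch. This is not OpenSfM's code; `build_tracks` and its input layout (a dict from an image pair to a list of matched feature indices) are assumptions made for illustration. Each (image, feature) node is merged with every node it is matched to, each connected component becomes one track, and components that contain two different features from the same image are dropped as inconsistent.

```python
def build_tracks(pairwise_matches):
    """pairwise_matches: {(img_a, img_b): [(feat_a, feat_b), ...]}.
    Returns a list of tracks, each a dict {image_id: feature_id}."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (img_a, img_b), matches in pairwise_matches.items():
        for feat_a, feat_b in matches:
            union((img_a, feat_a), (img_b, feat_b))

    # Group every (image, feature) node by its root.
    components = {}
    for node in list(parent):
        components.setdefault(find(node), []).append(node)

    tracks = []
    for nodes in components.values():
        images = [img for img, _ in nodes]
        if len(images) == len(set(images)):  # at most one feature per image
            tracks.append(dict(nodes))
    return tracks


matches = {(1, 2): [(10, 20), (11, 21)], (2, 3): [(20, 30)]}
print(build_tracks(matches))  # → [{1: 10, 2: 20, 3: 30}, {1: 11, 2: 21}]
```

A track spanning 3 or more images is what lets later images be registered against points that were already triangulated, instead of treating every pair in isolation.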

Hopefully that helps.

answered Nov 21 '18 at 19:35 by Jomnipotent17

-1

The general idea is as follows.

In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.

What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.

Here is how to do this.

First, before the loop, initialize an accumulation matrix absolute_P1:

points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

for item in range(len-1):
    # ...

Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:

# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

if not points3d.size:
    points3d = abs_tripoints3d
else:
    points3d = np.concatenate((points3d, abs_tripoints3d), 1)

# ...
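The general idea (express every pair's points in the first camera's frame by accumulating relative poses) can be checked on synthetic data with the sketch below. It uses assumed conventions, not necessarily those of the `structure` module: every pose is a 4x4 rigid transform, and `relative_poses[k]` maps camera-k coordinates to camera-(k+1) coordinates, with camera 0 as the first camera. The ground-truth poses and the helpers `rot_y`, `make_pose` and `merge_into_first_frame` are hypothetical. It also sidesteps the scale problem: each essential matrix gives the translation only up to scale, so with real data the per-pair scales must additionally be made consistent (for example from points seen in overlapping pairs) before chaining.

```python
import numpy as np


def rot_y(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])


def make_pose(R, t):
    """4x4 rigid transform X_dst = R @ X_src + t in homogeneous form."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def merge_into_first_frame(relative_poses, points_per_pair):
    """relative_poses[k]: 4x4 transform taking camera-k coordinates to camera-(k+1) coordinates.
    points_per_pair[k]: 4xN homogeneous points triangulated in camera-k coordinates.
    Returns all points expressed in camera-0 coordinates, concatenated column-wise."""
    cam_to_first = np.eye(4)  # maps camera-k coordinates to camera-0 coordinates
    merged = []
    for T_rel, X_k in zip(relative_poses, points_per_pair):
        merged.append(cam_to_first @ X_k)
        cam_to_first = cam_to_first @ np.linalg.inv(T_rel)
    return np.hstack(merged)


# Ground-truth world-to-camera poses for 3 cameras; camera 0 defines the world frame.
rng = np.random.default_rng(1)
X_world = np.vstack([rng.uniform(-1, 1, (3, 10)), np.ones((1, 10))])
world_to_cam = [np.eye(4),
                make_pose(rot_y(15), [-0.5, 0.0, 0.1]),
                make_pose(rot_y(30), [-1.0, 0.0, 0.3])]
relative = [world_to_cam[k + 1] @ np.linalg.inv(world_to_cam[k]) for k in range(2)]

# Pretend pair k triangulated the same world points in its own camera-k frame.
per_pair = [world_to_cam[k] @ X_world for k in range(2)]

expected = np.hstack([X_world, X_world])
naive = np.hstack(per_pair)  # plain concatenation, as in the question's loop
merged = merge_into_first_frame(relative, per_pair)
print(np.abs(naive - expected).max())   # clearly non-zero: frames disagree
print(np.abs(merged - expected).max())  # ~0: one common frame
```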

edited Nov 21 '18 at 6:19

answered Nov 20 '18 at 20:49 by BConic

This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.

– tel
Nov 21 '18 at 3:45

Thanks for the feedback @tel, I must have misunderstood the camera pose convention the OP is using. I updated the second code snippet; it should work better now.

– BConic
Nov 21 '18 at 6:22


-1

TL;DR

You may not be able to get the full 3D reconstruction you want by just combining all of the two-image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the two-image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply chaining all of the two-image poses just propagates the noise throughout the reconstruction.
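The noise-propagation argument can be seen in a deliberately simplified sketch: rotations only, all about one axis, with a hypothetical per-step error of 0.5 degrees (all numbers are made up for illustration, not measured on the dino data). The absolute orientation is the product of all relative rotations, so their errors accumulate instead of averaging out. This toy model only covers small random errors; the gross "unreasonable results" described above would do even more damage.

```python
import numpy as np


def rot_z(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])


def rotation_angle_deg(R):
    """Rotation angle of R, from trace(R) = 1 + 2 cos(angle)."""
    return np.degrees(np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)))


def chained_error_deg(n_steps, step_deg, noise_deg, rng):
    """Chain n_steps relative rotations, each with independent Gaussian angle noise,
    and return the angle between the estimated and true absolute orientation."""
    true_abs = np.eye(3)
    est_abs = np.eye(3)
    for _ in range(n_steps):
        true_abs = rot_z(step_deg) @ true_abs
        est_abs = rot_z(step_deg + rng.normal(0.0, noise_deg)) @ est_abs
    return rotation_angle_deg(est_abs @ true_abs.T)


rng = np.random.default_rng(0)
for n in (1, 4, 16, 36):
    mean_err = np.mean([chained_error_deg(n, 10.0, 0.5, rng) for _ in range(200)])
    print(n, round(mean_err, 2))
```

With independent errors the mean drift grows roughly like sqrt(n_steps) (a random walk), which is why a final global fitting step over all views is usually needed rather than pure pose chaining.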

The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and its approach is somewhat more involved. In addition to two-image reconstructions, they also use three-image reconstructions, and (maybe most importantly) a fitting step at the end that helps ensure that no single spurious result ruins the reconstruction.
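A common form of such a final fitting step in SfM pipelines is bundle adjustment (the process also named in the OpenSfM answer): all camera matrices and 3D points are refined jointly to minimize reprojection error. The minimal NumPy sketch below only computes that error; the function names, the principal point and the scene data are hypothetical, and only the focal length is taken from the question's code. An actual bundle adjuster would pass such residuals, over a suitable parameterization of rotations, translations and points, to a nonlinear least-squares solver such as `scipy.optimize.least_squares`.

```python
import numpy as np


def project(P, X):
    """Project 4xN homogeneous points with a 3x4 camera matrix; returns 2xN pixel coordinates."""
    x = P @ X
    return x[:2] / x[2]


def reprojection_residuals(cameras, points, observations):
    """cameras: list of 3x4 matrices K [R | t]; points: 4xM homogeneous points;
    observations: list of (camera_index, point_index, (u, v)) pixel measurements.
    Returns the stacked residuals whose sum of squares bundle adjustment minimizes."""
    residuals = [project(cameras[c], points[:, [p]])[:, 0] - np.asarray(uv, dtype=float)
                 for c, p, uv in observations]
    return np.concatenate(residuals)


# Hypothetical intrinsics: focal length from the question, made-up principal point.
K = np.array([[2360.0, 0.0, 360.0], [0.0, 2360.0, 288.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(10)
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
cameras = [K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
           K @ np.hstack([R, [[-1.0], [0.0], [0.0]]])]
points = np.array([[0.2, -0.3], [-0.1, 0.4], [5.0, 6.0], [1.0, 1.0]])

# Perfect measurements: every point seen by both cameras.
observations = [(c, p, project(cameras[c], points[:, [p]])[:, 0])
                for c in range(2) for p in range(2)]
print(np.abs(reprojection_residuals(cameras, points, observations)).max())  # ~0

moved = points.copy()
moved[0, 0] += 0.05  # nudge one point sideways
print(np.linalg.norm(reprojection_residuals(cameras, moved, observations)))
```

Because every observation of every point contributes a residual, one badly estimated pair shows up as large errors that the joint optimization can pull back, instead of silently shifting all later poses.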

code

...in progress

answered Nov 21 '18 at 17:14 by tel

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53010027%2f3d-point-reconstruction-from-2d-images%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

The steps of OpenSfM at a high level:

Read image exif for any prior information you can use (e.g. focal
length)

Extract feature points (e.g. SIFT)

Match feature points

Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)

Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.

Hopefully that helps.

answered Nov 21 '18 at 19:35

Jomnipotent17

127419


-1

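The general idea is to chain the pairwise reconstructions: keep an accumulated pose, absolute_P1, that maps coordinates from the current pair's first camera back into the frame of the very first camera, use it to move each new batch of triangulated points into that common frame, and then compose it with the new relative pose before moving on to the next image pair.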
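The chaining idea can be checked on synthetic data where the true poses are known. This sketch assumes each relative pose is a 4x4 transform `T_rel` that maps points from camera k+1's frame into camera k's frame, and that all poses share one consistent scale; the helpers `rot_z` and `to_4x4` and all numbers are made up for the demo. With real data both assumptions need care: a pose decomposed from an essential matrix only fixes translation up to an unknown per-pair scale, and depending on the library's convention the relative transform may first have to be inverted.

```python
import numpy as np


def rot_z(angle):
    """Rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_4x4(R, t):
    """Build a 4x4 homogeneous transform from a rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


# Ground truth: true_T_0k[k] maps camera-k coordinates into camera-0 coordinates.
true_T_0k = [np.eye(4),
             to_4x4(rot_z(0.1), [1.0, 0.0, 0.0]),
             to_4x4(rot_z(-0.2), [1.5, 0.5, 0.2])]

# Relative poses T_{k <- k+1}, as a pairwise step would provide them.
relative = [np.linalg.inv(true_T_0k[k]) @ true_T_0k[k + 1]
            for k in range(len(true_T_0k) - 1)]

# Points in camera 0's frame (homogeneous, 4 x N).
pts_cam0 = np.array([[0.0, 1.0, -1.0],
                     [0.0, 0.5, 2.0],
                     [5.0, 6.0, 7.0],
                     [1.0, 1.0, 1.0]])

absolute = np.eye(4)  # T_{0 <- k}, starting at k = 0
merged = []
for k, T_rel in enumerate(relative):
    # Points triangulated from pair (k, k+1) come out in camera k's frame.
    pts_cam_k = np.linalg.inv(true_T_0k[k]) @ pts_cam0
    merged.append(absolute @ pts_cam_k)  # map back into camera 0's frame
    absolute = absolute @ T_rel          # now T_{0 <- k+1}

print(all(np.allclose(m, pts_cam0) for m in merged))  # True
print(np.allclose(absolute, true_T_0k[-1]))           # True
```

Right-multiplying the accumulated pose by each new relative pose is what the loop below does with absolute_P1; whether np.linalg.inv(P2) there really is the camera k+1 to camera k transform depends on the pose convention of the code being used, which is the point raised in the comments under this answer.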

First, before the loop, initialize an accumulation matrix absolute_P1:

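import glob

import numpy as np

points3d = np.empty((4, 0))  # accumulated homogeneous points (4 x N), none yet
files = sorted(glob.glob("imgs/dinos/*.ppm"))  # sort so neighbouring files are neighbouring views
n_files = len(files)  # do not shadow the built-in len()
absolute_P1 = np.eye(4)  # accumulated 4x4 pose, starts at the first camera

for item in range(n_files - 1):
    # ...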

Then, inside the loop, triangulate the features, map the resulting 3D points into the coordinate frame of the first camera, and update the accumulated pose:

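# ...
# Pad the selected 3x4 camera matrix to 4x4 and invert it
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

# Make the triangulated points homogeneous (4 x N) if they are not already
if tripoints3d.shape[0] == 3:
    tripoints3d = np.vstack([tripoints3d, np.ones(tripoints3d.shape[1])])

# Map the points into the first camera's frame, then update the accumulated pose
abs_tripoints3d = np.matmul(absolute_P1, tripoints3d)
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2))  # P2 needs to be 4x4 here!

if not points3d.size:
    points3d = abs_tripoints3d
else:
    points3d = np.concatenate((points3d, abs_tripoints3d), 1)

# ...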
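Both the `np.vstack([..., [0, 0, 0, 1]])` padding and the `np.linalg.inv` calls above operate on rigid transforms. Assuming `P2s[ind]` is a proper `[R | t]` with an orthonormal rotation `R`, the inverse has the closed form `[R^T | -R^T t]`, which avoids a general matrix inversion and keeps `R` exactly orthonormal. The helper names below are hypothetical and the example pose is arbitrary.

```python
import numpy as np


def to_homogeneous_pose(P):
    """Append the row [0, 0, 0, 1] to a 3x4 [R | t] matrix."""
    return np.vstack([P, [0.0, 0.0, 0.0, 1.0]])


def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


angle = 0.3
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])  # rotation about y
P = np.hstack([R, np.array([[0.5], [-0.2], [1.0]])])   # example 3x4 [R | t]
T = to_homogeneous_pose(P)
print(np.allclose(invert_rigid(T), np.linalg.inv(T)))  # True
```

The closed form is only valid for rigid transforms; if `R` has drifted away from orthonormal (for example after accumulating many products), `np.linalg.inv` and the closed form will disagree.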

answered Nov 20 '18 at 20:49 by BConic, edited Nov 21 '18 at 6:19

This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.

– tel
Nov 21 '18 at 3:45

Thanks for the feedback @tel, I must have misunderstood the camera pose convention the OP is using. I updated the second code snippet; it should work better now.

– BConic
Nov 21 '18 at 6:22


-1

TL;DR

code

...in progress

answered Nov 21 '18 at 17:14 by tel
