I want to get all objects except text object as an image from PDF using iTextSharp

I am developing a program that converts PDF to PPTX, for specific reasons using iTextSharp.
So far I can extract all text objects and image objects together with their locations.
But I am having difficulty getting the table objects without their text.
Actually, it would be better if I could get them as images.
My plan is to merge all objects except the text objects into a background image and then put the text objects at their proper locations.
I looked for similar questions here but have had no luck so far.
If anyone knows how to do this, please answer.
Thanks.

asked Jan 2 at 9:25

Piao David

158

There is no such thing as a table object in a PDF (unless it's properly tagged, and even then it's merely a logical table object, not a graphical one). There are only chunks of text (or whatever table content you see) and probably some graphical objects like lines or colored rectangles. Thus, it is unclear what you want.

– mkl
Jan 2 at 11:12

mkl, thanks for your reply. I hope I can get your help again on this question. I agree that there are no table objects, but it's interesting that when I get all images I can't see any for the tables. I used IRenderListener. Looking forward to your answer.

– Piao David
Jan 2 at 12:52

1

Implement IExtRenderListener which extends IRenderListener but has additional callbacks for vector graphics related instructions. Most likely these additional callbacks will be invoked for the lines or colored rectangles structuring your table.

– mkl
Jan 2 at 17:45

Thanks a lot, mkl. I tried IExtRenderListener but have no idea how to use Path. Basically, what I need to do is draw all objects on the PPTX. I'm afraid Path includes all texts and images too. Alternatively, I'm thinking of removing all text objects from the PDF to get a temporary PDF. Then I could render the whole page (with text objects removed) as an image and use it as a background. Do you have any idea how to implement it this way, i.e. removing the text objects and making a new PDF without text? Thanks in advance.

– Piao David
Jan 3 at 3:39

@mkl, I'm still struggling. Looking forward to your answer

– Piao David
Jan 4 at 3:02

c# pdf itext


2 Answers

You say

What I've done so far is to get all text objects and image objects and locations.

but you don't go into detail about how you do so. I assume you use a matching IRenderListener implementation.

But IRenderListener, as you found out yourself, only extracts images and texts.

The main missing objects are paths and their usages.

To extract them, too, you should implement IExtRenderListener, which extends IRenderListener but also receives information about paths. To understand the callback methods, first be aware of how path-related instructions work in PDFs:

First there are instructions for building the actual path; these instructions essentially
- move to some position,
- add a line to some position from the previous position,
- add a Bézier curve to some position from the previous position using some control points, or
- add an upright rectangle at some position using some width and height information.

Then there is an optional instruction to intersect the current clip path with the generated path.

Finally, there is a drawing instruction for any combination of filling the inside of the path and stroking along the path, i.e. for doing both, either one, or neither one.

This corresponds to the callbacks you receive in your IExtRenderListener implementation:

/**
 * Called when the current path is being modified. E.g. new segment is being added,
 * new subpath is being started etc.
 *
 * @param renderInfo Contains information about the path segment being added to the current path.
 */
void ModifyPath(PathConstructionRenderInfo renderInfo);

is called one or more times to build the actual path. PathConstructionRenderInfo contains the actual instruction type in its Operation property (compare it to the PathConstructionRenderInfo constant members MOVETO, LINETO, etc. to determine the operation type) and the required coordinates / dimensions in its SegmentData property. The Ctm property additionally returns the affine transformation that is currently set to be applied to all drawing operations.
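To make the role of the transformation concrete, here is a minimal, library-independent sketch of how the coordinates of a rectangle segment could be mapped through an affine transformation. It assumes that the segment data of a rectangle holds x, y, width and height in that order, and that the six matrix values are given in the usual PDF order [a b c d e f]; CtmMath, Apply and RectCorners are hypothetical names, not part of iTextSharp, and reading the six values out of the Ctm object is left out.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class CtmMath
{
    // Applies the affine transformation m = [a b c d e f] to the point (x, y):
    // x' = a*x + c*y + e, y' = b*x + d*y + f.
    public static float[] Apply(float[] m, float x, float y)
    {
        return new[]
        {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    // Returns the four transformed corners of a rectangle given as
    // x, y, width, height, starting at (x, y) and going around the rectangle.
    public static float[][] RectCorners(float[] m, float x, float y, float w, float h)
    {
        return new[]
        {
            Apply(m, x, y),
            Apply(m, x + w, y),
            Apply(m, x + w, y + h),
            Apply(m, x, y + h)
        };
    }
}
```

For example, with a transformation that scales by 2 and translates by (10, 20), i.e. [2 0 0 2 10 20], the rectangle 1 2 3 4 ends up with the corners (12, 24), (18, 24), (18, 32) and (12, 32).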

Then

/**
 * Called when the current path should be set as a new clipping path.
 *
 * @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}
 */
void ClipPath(int rule);

is called if the current clip path shall be intersected with the constructed path.

Finally

/**
 * Called when the current path should be rendered.
 *
 * @param renderInfo Contains information about the current path which should be rendered.
 * @return The path which can be used as a new clipping path.
 */
Path RenderPath(PathPaintingRenderInfo renderInfo);

is called. PathPaintingRenderInfo contains the drawing operation in its Operation property (any combination of the PathPaintingRenderInfo constants STROKE and FILL, or NO_OP for neither), the rule for determining what "inside the path" means in its Rule property (NONZERO_WINDING_RULE or EVEN_ODD_RULE), and some other drawing details in the Ctm, LineWidth, LineCapStyle, LineJoinStyle, MiterLimit, and LineDashPattern properties.
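The three callbacks can be put together roughly as follows. This is only a sketch, assuming an iTextSharp 5.5.x version that ships IExtRenderListener; PathCollector, PaintedPaths and OperationName are hypothetical names introduced for illustration. It collects the segments of the current path as text and stores one entry per path that is actually stroked or filled, which is where table borders and cell backgrounds would most likely show up.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
internal class PathCollector : IExtRenderListener
{
    // Segments of the path currently under construction.
    private readonly List<string> _currentPath = new List<string>();

    // One entry per path that was stroked and/or filled.
    public readonly List<string> PaintedPaths = new List<string>();

    public void ModifyPath(PathConstructionRenderInfo renderInfo)
    {
        // SegmentData may be null, e.g. for a close-path operation.
        IList<float> data = renderInfo.SegmentData;
        string coords = data == null ? "" : string.Join(" ", data);
        _currentPath.Add((OperationName(renderInfo.Operation) + " " + coords).Trim());
    }

    public void ClipPath(int rule)
    {
        // Clipping is ignored in this sketch.
    }

    public iTextSharp.text.pdf.parser.Path RenderPath(PathPaintingRenderInfo renderInfo)
    {
        bool stroke = (renderInfo.Operation & PathPaintingRenderInfo.STROKE) != 0;
        bool fill = (renderInfo.Operation & PathPaintingRenderInfo.FILL) != 0;

        // Paths that are neither stroked nor filled are invisible; skip them.
        if (stroke || fill)
        {
            string usage = (stroke ? "stroke " : "") + (fill ? "fill " : "");
            PaintedPaths.Add(usage + "(line width " + renderInfo.LineWidth + "): "
                + string.Join(" | ", _currentPath));
        }

        // The painting instruction ends the current path.
        _currentPath.Clear();
        return null; // the return value is not needed in this sketch
    }

    private static string OperationName(int op)
    {
        if (op == PathConstructionRenderInfo.MOVETO) return "moveTo";
        if (op == PathConstructionRenderInfo.LINETO) return "lineTo";
        if (op == PathConstructionRenderInfo.RECT) return "rect";
        if (op == PathConstructionRenderInfo.CLOSE) return "close";
        return "curve"; // CURVE_123, CURVE_23 or CURVE_13
    }

    // Text and images are not of interest here.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
}
```

It could be used like any other render listener, for example via new PdfReaderContentParser(pdfReader).ProcessContent(1, new PathCollector()) for the first page. Note that the coordinates collected this way are not yet transformed by the Ctm.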

answered Jan 7 at 12:04

mkl


Thanks a lot, @mkl! I think this will also answer my other question. Please check it and post this as an answer there. I would really appreciate any comments on my thoughts in that question. Thanks again. LINK:stackoverflow.com/questions/54059341/…

– Piao David
Jan 7 at 12:20

Try implementing IRenderListener:

using System;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

internal class ImageExtractor : IRenderListener
{
    private int _currentPage = 1;
    private int _imageCount = 0;
    private readonly string _outputFilePrefix;
    private readonly string _outputFolder;
    private readonly bool _overwriteExistingFiles;

    private ImageExtractor(string outputFilePrefix, string outputFolder, bool overwriteExistingFiles)
    {
        _outputFilePrefix = outputFilePrefix;
        _outputFolder = outputFolder;
        _overwriteExistingFiles = overwriteExistingFiles;
    }

    /// <summary>
    /// Extract all images from a PDF file
    /// </summary>
    /// <param name="pdfPath">Full path and file name of PDF file</param>
    /// <param name="outputFilePrefix">Basic name of exported files. If null then uses same name as PDF file.</param>
    /// <param name="outputFolder">Where to save images. If null or empty then uses same folder as PDF file.</param>
    /// <param name="overwriteExistingFiles">True to overwrite existing image files, false to skip past them</param>
    /// <returns>Count of number of images extracted.</returns>
    public static int ExtractImagesFromFile(string pdfPath, string outputFilePrefix, string outputFolder, bool overwriteExistingFiles)
    {
        // Handle setting of any default values
        outputFilePrefix = outputFilePrefix ?? System.IO.Path.GetFileNameWithoutExtension(pdfPath);
        outputFolder = String.IsNullOrEmpty(outputFolder) ? System.IO.Path.GetDirectoryName(pdfPath) : outputFolder;

        var instance = new ImageExtractor(outputFilePrefix, outputFolder, overwriteExistingFiles);

        using (var pdfReader = new PdfReader(pdfPath))
        {
            if (pdfReader.IsEncrypted())
                throw new ApplicationException(pdfPath + " is encrypted.");

            var pdfParser = new PdfReaderContentParser(pdfReader);

            // Feed every page to this listener; RenderImage is called once per image
            while (instance._currentPage <= pdfReader.NumberOfPages)
            {
                pdfParser.ProcessContent(instance._currentPage, instance);
                instance._currentPage++;
            }
        }

        return instance._imageCount;
    }

    #region Implementation of IRenderListener

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        var imageObject = renderInfo.GetImage();

        // Number the files so that every image gets its own file, and take the
        // extension (jpg, png, ...) from the file type iTextSharp reports for the image
        var imageFileName = _outputFilePrefix + _imageCount + "." + imageObject.GetFileType();
        var imagePath = System.IO.Path.Combine(_outputFolder, imageFileName);

        if (_overwriteExistingFiles || !File.Exists(imagePath))
        {
            var imageRawBytes = imageObject.GetImageAsBytes();
            // Create a new file (or overwrite the existing one)
            File.WriteAllBytes(imagePath, imageRawBytes);
        }

        _imageCount++;
    }

    #endregion // Implementation of IRenderListener
}

answered Jan 2 at 10:08

Amine

383

Yes, I already tried IRenderListener. This method only extracts images and texts. It does not return anything about tables. There's no table-related function.

– Piao David
Jan 2 at 10:26

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54003886%2fi-want-to-get-all-objects-except-text-object-as-an-image-from-pdf-using-itextsha%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You say

What I've done so far is to get all text objects and image objects and locations.

but you don't go into detail how you do so. I assume you use a matching IRenderListener implementation.

But IRenderListener, as you found out yourself,

only extracts images and texts.

The main missing objects are paths and their usages.

First there are instructions for building the actual path; these instructions essentially
- move to some position,
- add a line to some position from the previous position,
- add a Bézier curve to some position from the previous position using some control points, or
- add an upright rectangle at some position using some width and height information.

Then there is an optional instruction to intersect the current clip path with the generated path.

Finally, there is a drawing instruction for any combination of filling the inside of the path and stroking along the path, i.e. for doing both, either one, or neither one.

This corresponds to the callbacks you retrieve in your IExtRenderListener implementation:

/**

 * Called when the current path is being modified. E.g. new segment is being added,

 * new subpath is being started etc.

 *

 * @param renderInfo Contains information about the path segment being added to the current path.

 */

void ModifyPath(PathConstructionRenderInfo renderInfo);

Then

/**

 * Called when the current path should be set as a new clipping path.

 *

 * @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}

 */

void ClipPath(int rule);

is called if the current clip path shall be intersected with the constructed path.

Finally

/**

 * Called when the current path should be rendered.

 *

 * @param renderInfo Contains information about the current path which should be rendered.

 * @return The path which can be used as a new clipping path.

 */

Path RenderPath(PathPaintingRenderInfo renderInfo);

answered Jan 7 at 12:04

mkl

55.1k1170149

Thanks a lot, @mkl! I think this will be an answer to my another question. Please check and share this link as an answer. I would really appreciate your any comments on my thought in that question. Thanks again. LINK:stackoverflow.com/questions/54059341/…

– Piao David
Jan 7 at 12:20

add a comment |

You say

What I've done so far is to get all text objects and image objects and locations.

but you don't go into detail how you do so. I assume you use a matching IRenderListener implementation.

But IRenderListener, as you found out yourself,

only extracts images and texts.

The main missing objects are paths and their usages.

First there are instructions for building the actual path; these instructions essentially
- move to some position,
- add a line to some position from the previous position,
- add a Bézier curve to some position from the previous position using some control points, or
- add an upright rectangle at some position using some width and height information.

Then there is an optional instruction to intersect the current clip path with the generated path.

Finally, there is a drawing instruction for any combination of filling the inside of the path and stroking along the path, i.e. for doing both, either one, or neither one.

This corresponds to the callbacks you retrieve in your IExtRenderListener implementation:

/**

 * Called when the current path is being modified. E.g. new segment is being added,

 * new subpath is being started etc.

 *

 * @param renderInfo Contains information about the path segment being added to the current path.

 */

void ModifyPath(PathConstructionRenderInfo renderInfo);

Then

/**

 * Called when the current path should be set as a new clipping path.

 *

 * @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}

 */

void ClipPath(int rule);

is called if the current clip path shall be intersected with the constructed path.

Finally

/**

 * Called when the current path should be rendered.

 *

 * @param renderInfo Contains information about the current path which should be rendered.

 * @return The path which can be used as a new clipping path.

 */

Path RenderPath(PathPaintingRenderInfo renderInfo);

answered Jan 7 at 12:04

mkl

55.1k1170149

Thanks a lot, @mkl! I think this will be an answer to my another question. Please check and share this link as an answer. I would really appreciate your any comments on my thought in that question. Thanks again. LINK:stackoverflow.com/questions/54059341/…

– Piao David
Jan 7 at 12:20

add a comment |

You say

What I've done so far is to get all text objects and image objects and locations.

but you don't go into detail how you do so. I assume you use a matching IRenderListener implementation.

But IRenderListener, as you found out yourself,

only extracts images and texts.

The main missing objects are paths and their usages.

First there are instructions for building the actual path; these instructions essentially
- move to some position,
- add a line to some position from the previous position,
- add a Bézier curve to some position from the previous position using some control points, or
- add an upright rectangle at some position using some width and height information.

Then there is an optional instruction to intersect the current clip path with the generated path.

Finally, there is a drawing instruction for any combination of filling the inside of the path and stroking along the path, i.e. for doing both, either one, or neither one.

This corresponds to the callbacks you retrieve in your IExtRenderListener implementation:

/**

 * Called when the current path is being modified. E.g. new segment is being added,

 * new subpath is being started etc.

 *

 * @param renderInfo Contains information about the path segment being added to the current path.

 */

void ModifyPath(PathConstructionRenderInfo renderInfo);

Then

/**

 * Called when the current path should be set as a new clipping path.

 *

 * @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}

 */

void ClipPath(int rule);

is called if the current clip path shall be intersected with the constructed path.

Finally

/**

 * Called when the current path should be rendered.

 *

 * @param renderInfo Contains information about the current path which should be rendered.

 * @return The path which can be used as a new clipping path.

 */

Path RenderPath(PathPaintingRenderInfo renderInfo);

answered Jan 7 at 12:04

mkl

55.1k1170149

You say

What I've done so far is to get all text objects and image objects and locations.

but you don't go into detail how you do so. I assume you use a matching IRenderListener implementation.

But IRenderListener, as you found out yourself,

only extracts images and texts.

The main missing objects are paths and their usages.

First there are instructions for building the actual path; these instructions essentially
- move to some position,
- add a line to some position from the previous position,
- add a Bézier curve to some position from the previous position using some control points, or
- add an upright rectangle at some position using some width and height information.

Then there is an optional instruction to intersect the current clip path with the generated path.

Finally, there is a drawing instruction for any combination of filling the inside of the path and stroking along the path, i.e. for doing both, either one, or neither one.

This corresponds to the callbacks you retrieve in your IExtRenderListener implementation:

/**

 * Called when the current path is being modified. E.g. new segment is being added,

 * new subpath is being started etc.

 *

 * @param renderInfo Contains information about the path segment being added to the current path.

 */

void ModifyPath(PathConstructionRenderInfo renderInfo);

Then

/**

 * Called when the current path should be set as a new clipping path.

 *

 * @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}

 */

void ClipPath(int rule);

is called if the current clip path shall be intersected with the constructed path.

Finally

/**

 * Called when the current path should be rendered.

 *

 * @param renderInfo Contains information about the current path which should be rendered.

 * @return The path which can be used as a new clipping path.

 */

Path RenderPath(PathPaintingRenderInfo renderInfo);

answered Jan 7 at 12:04

mkl

55.1k1170149

answered Jan 7 at 12:04

mkl

55.1k1170149

answered Jan 7 at 12:04

mkl

55.1k1170149

answered Jan 7 at 12:04

mkl

55.1k1170149

Thanks a lot, @mkl! I think this will be an answer to my another question. Please check and share this link as an answer. I would really appreciate your any comments on my thought in that question. Thanks again. LINK:stackoverflow.com/questions/54059341/…

– Piao David
Jan 7 at 12:20

add a comment |

Thanks a lot, @mkl! I think this will be an answer to my another question. Please check and share this link as an answer. I would really appreciate your any comments on my thought in that question. Thanks again. LINK:stackoverflow.com/questions/54059341/…

– Piao David
Jan 7 at 12:20

Thanks a lot, @mkl! I think this will be an answer to my another question. Please check and share this link as an answer. I would really appreciate your any comments on my thought in that question. Thanks again. LINK:stackoverflow.com/questions/54059341/…

– Piao David
Jan 7 at 12:20

add a comment |

try to implement IRenderListener

  internal class ImageExtractor : IRenderListener

{

    private int _currentPage = 1;

    private int _imageCount = 0;

    private readonly string _outputFilePrefix;

    private readonly string _outputFolder;

    private readonly bool _overwriteExistingFiles;



    private ImageExtractor(string outputFilePrefix, string outputFolder, bool overwriteExistingFiles)

    {

        _outputFilePrefix = outputFilePrefix;

        _outputFolder = outputFolder;

        _overwriteExistingFiles = overwriteExistingFiles;

    }



    /// <summary>

    /// Extract all images from a PDF file

    /// </summary>

    /// <param name="pdfPath">Full path and file name of PDF file</param>

    /// <param name="outputFilePrefix">Basic name of exported files. If null then uses same name as PDF file.</param>

    /// <param name="outputFolder">Where to save images. If null or empty then uses same folder as PDF file.</param>

    /// <param name="overwriteExistingFiles">True to overwrite existing image files, false to skip past them</param>

    /// <returns>Count of number of images extracted.</returns>

    public static int ExtractImagesFromFile(string pdfPath, string outputFilePrefix, string outputFolder, bool overwriteExistingFiles)

    {

        // Handle setting of any default values

        outputFilePrefix = outputFilePrefix ?? System.IO.Path.GetFileNameWithoutExtension(pdfPath);

        outputFolder = String.IsNullOrEmpty(outputFolder) ? System.IO.Path.GetDirectoryName(pdfPath) : outputFolder;



        var instance = new ImageExtractor(outputFilePrefix, outputFolder, overwriteExistingFiles);



        using (var pdfReader = new PdfReader(pdfPath))

        {

            if (pdfReader.IsEncrypted())

                throw new ApplicationException(pdfPath + " is encrypted.");



            var pdfParser = new PdfReaderContentParser(pdfReader);



            while (instance._currentPage <= pdfReader.NumberOfPages)

            {

                pdfParser.ProcessContent(instance._currentPage, instance);



                instance._currentPage++;

            }

        }



        return instance._imageCount;

    }



    #region Implementation of IRenderListener



    public void BeginTextBlock() { }

    public void EndTextBlock() { }

    public void RenderText(TextRenderInfo renderInfo) { }



    public void RenderImage(ImageRenderInfo renderInfo)

    {

        if (_imageCount == 0)

        {

            var imageObject = renderInfo.GetImage();



            var imageFileName = _outputFilePrefix + _imageCount; //to get multiple file (you should add .jpg or .png ...)

            var imagePath = System.IO.Path.Combine(_outputFolder, imageFileName);







            if (_overwriteExistingFiles || !File.Exists(imagePath))

            {

                var imageRawBytes = imageObject.GetImageAsBytes();

                //create a new file ()

                File.WriteAllBytes(imagePath, imageRawBytes);



            }

        }

        _imageCount++;

    }



    #endregion // Implementation of IRenderListener



}

answered Jan 2 at 10:08

Amine

383

Yes, I already tried IRenderListener. This method only extracts images and texts. It does not return anything about tables.There's no Table related function.

– Piao David
Jan 2 at 10:26

add a comment |

try to implement IRenderListener

using System;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

internal class ImageExtractor : IRenderListener
{
    private int _currentPage = 1;
    private int _imageCount = 0;
    private readonly string _outputFilePrefix;
    private readonly string _outputFolder;
    private readonly bool _overwriteExistingFiles;

    private ImageExtractor(string outputFilePrefix, string outputFolder, bool overwriteExistingFiles)
    {
        _outputFilePrefix = outputFilePrefix;
        _outputFolder = outputFolder;
        _overwriteExistingFiles = overwriteExistingFiles;
    }

    /// <summary>
    /// Extract all images from a PDF file
    /// </summary>
    /// <param name="pdfPath">Full path and file name of PDF file</param>
    /// <param name="outputFilePrefix">Basic name of exported files. If null then uses same name as PDF file.</param>
    /// <param name="outputFolder">Where to save images. If null or empty then uses same folder as PDF file.</param>
    /// <param name="overwriteExistingFiles">True to overwrite existing image files, false to skip past them</param>
    /// <returns>Count of number of images extracted.</returns>
    public static int ExtractImagesFromFile(string pdfPath, string outputFilePrefix, string outputFolder, bool overwriteExistingFiles)
    {
        // Handle setting of any default values
        outputFilePrefix = outputFilePrefix ?? System.IO.Path.GetFileNameWithoutExtension(pdfPath);
        outputFolder = String.IsNullOrEmpty(outputFolder) ? System.IO.Path.GetDirectoryName(pdfPath) : outputFolder;

        var instance = new ImageExtractor(outputFilePrefix, outputFolder, overwriteExistingFiles);

        using (var pdfReader = new PdfReader(pdfPath))
        {
            if (pdfReader.IsEncrypted())
                throw new ApplicationException(pdfPath + " is encrypted.");

            var pdfParser = new PdfReaderContentParser(pdfReader);

            // Feed every page through this listener; RenderImage is called once per image
            while (instance._currentPage <= pdfReader.NumberOfPages)
            {
                pdfParser.ProcessContent(instance._currentPage, instance);
                instance._currentPage++;
            }
        }

        return instance._imageCount;
    }

    #region Implementation of IRenderListener

    // Text callbacks are intentionally empty: this listener only handles images
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        var imageObject = renderInfo.GetImage();
        if (imageObject == null)
            return; // no image data available for this instruction

        // The running counter gives each image its own file;
        // GetFileType() supplies the extension (e.g. jpg or png)
        var imageFileName = _outputFilePrefix + _imageCount + "." + imageObject.GetFileType();
        // System.IO.Path is fully qualified because iTextSharp.text.pdf.parser also defines a Path class
        var imagePath = System.IO.Path.Combine(_outputFolder, imageFileName);

        if (_overwriteExistingFiles || !File.Exists(imagePath))
        {
            var imageRawBytes = imageObject.GetImageAsBytes();
            // Create (or overwrite) the image file
            File.WriteAllBytes(imagePath, imageRawBytes);
        }

        _imageCount++;
    }

    #endregion // Implementation of IRenderListener
}
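An IRenderListener only receives text and bitmap images, so the lines and rectangles that make up a table never reach it. As a minimal sketch, assuming an iTextSharp 5.5.x release that ships IExtRenderListener, something like the following could at least confirm whether a page contains vector graphics. The class name VectorGraphicsListener and its two counters are hypothetical names chosen for illustration; they are not part of the library.

```csharp
using iTextSharp.text.pdf.parser;

// Hypothetical helper (not part of iTextSharp): counts vector drawing
// instructions so you can see whether a page contains lines or rectangles.
internal class VectorGraphicsListener : IExtRenderListener
{
    // Rectangles added to the current path (the PDF "re" operator)
    public int RectangleCount { get; private set; }

    // Paths that were actually stroked or filled, i.e. became visible
    public int PaintedPathCount { get; private set; }

    public void ModifyPath(PathConstructionRenderInfo renderInfo)
    {
        // Called for each path construction instruction (move, line, curve, rectangle, close)
        if (renderInfo.Operation == PathConstructionRenderInfo.RECT)
            RectangleCount++;
    }

    public Path RenderPath(PathPaintingRenderInfo renderInfo)
    {
        // NO_OP means the path was ended without painting (typically used for clipping only)
        if (renderInfo.Operation != PathPaintingRenderInfo.NO_OP)
            PaintedPathCount++;
        return null; // this sketch does not hand a path back to the parser
    }

    public void ClipPath(int rule) { }

    // IRenderListener members are left empty: text and images are handled elsewhere
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
}
```

It can be passed to the same parser call as above, e.g. pdfParser.ProcessContent(pageNumber, new VectorGraphicsListener()). If the counters stay at zero for a page that visibly contains a table, the borders are probably drawn some other way, for example as part of an image or a form XObject.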

answered Jan 2 at 10:08

Amine

383

Yes, I already tried IRenderListener. This method only extracts images and text. It does not return anything about tables. There's no table-related function.

– Piao David
Jan 2 at 10:26

