getAcroForm() returns null, but with PDFTextStripper I am able to read the complete text

I have a PDF document whose fields I want to read, but docCatalog.getAcroForm() returns null, so there is no PDAcroForm object to work with. With PDFTextStripper I am able to get the complete PDF as text, but I want to read the fields.

The document is here.

edited Jan 4 at 18:12

halfer

asked Jan 3 at 9:21

Vijendra Singh

Please add some core code logic you used, and language tag.

– psyco
Jan 3 at 9:46

Please share the PDF. Maybe the fields were "flattened".

– Tilman Hausherr
Jan 3 at 11:02

Actually I can't see any option to upload a PDF file here. The code I am using is as below:

PDDocument pdDoc = null;
try {
    pdDoc = PDDocument.load(new FileInputStream(new File("Application for Individual Life Insurance.pdf")));
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
PDDocumentCatalog docCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
List fields = acroForm.getFields();

– Vijendra Singh
Jan 3 at 12:23

"Actually I can't see any option to upload pdf file here." - usually one uses a public file sharing service (Google drive, dropbox,...) and posts the url here.

– mkl
Jan 3 at 17:21

@halfer If getAcroForm() returns null, then there are no fields. But the user believes that there are fields, so she/he saw something. Further analysis requires some knowledge of the PDF specification that goes further than the PDFBox API.

– Tilman Hausherr
Jan 3 at 19:31

pdfbox

1 Answer

The PDF you shared does not contain any AcroForm form fields.

If you inspect the file using a PDF browser (like iText RUPS or PDFBox PDFDebugger), you'll see that the Catalog only contains a Pages and a Type entry:

[Screenshot: Catalog]

In particular, there is no AcroForm entry, which is what bundles the data of an AcroForm form. Thus, docCatalog.getAcroForm() has no field structure to return and yields null.
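A minimal sketch of a null-safe field read follows, assuming PDFBox 2.x (where PDDocument.load(File) is the loading call) on the classpath. The class name AcroFormCheck is made up for illustration, and the default file name is simply the one from the question's code.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// Hypothetical helper class; assumes PDFBox 2.x.
final class AcroFormCheck {

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0
                ? args[0]
                : "Application for Individual Life Insurance.pdf");

        // try-with-resources closes the document even if reading fails
        try (PDDocument document = PDDocument.load(file)) {
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();

            // getAcroForm() returns null when the catalog has no AcroForm entry
            if (acroForm == null) {
                System.out.println("No AcroForm entry in the catalog; no form fields to read.");
                return;
            }

            // The field tree iterates over all fields, including nested ones
            for (PDField field : acroForm.getFieldTree()) {
                System.out.println(field.getFullyQualifiedName()
                        + " = " + field.getValueAsString());
            }
        }
    }
}
```

For the PDF discussed here, this would be expected to take the null branch rather than throw a NullPointerException as the code from the question does.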

Looking at the last Contents stream of e.g. page 1, one sees:

Q
q
Q
q
1 0 0 1 329.78 655.45 cm
/Xi5 Do
Q
q
Q
q
1 0 0 1 324.17 624.51 cm
/Xi8 Do
Q
q
Q
q
1 0 0 1 265.95 702.31 cm
/Xi10 Do
Q
q
Q
q
1 0 0 1 554.46 655.6 cm
/Xi17 Do
Q
...

This is typical for a PDF that used to contain an AcroForm form definition which was then flattened into the page contents: for each former form field, an XObject (which previously defined the appearance of the form field's widget annotation) is now referenced directly from the page content stream.
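To make the pattern in the excerpt above concrete, here is a rough, library-free sketch that scans already-decoded content stream text for "/Name Do" invocations and reports the translation in effect at each one. The class ContentStreamXObjects and its Placement type are hypothetical names. The tokenizer is deliberately naive: it splits on whitespace, so it would break on string literals, arrays, or inline images, and it is only meant for simple excerpts like this one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: lists XObject invocations in simple content stream text.
final class ContentStreamXObjects {

    /** One "Do" invocation: resource name plus the origin given by the current matrix. */
    static final class Placement {
        final String name;
        final double x;
        final double y;

        Placement(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return name + " at (" + x + ", " + y + ")";
        }
    }

    static List<Placement> findPlacements(String contentStream) {
        List<Placement> placements = new ArrayList<>();
        Deque<double[]> saved = new ArrayDeque<>();   // graphics state stack (matrix only)
        double[] ctm = {1, 0, 0, 1, 0, 0};            // current transformation matrix
        List<String> operands = new ArrayList<>();

        for (String token : contentStream.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (isOperand(token)) {
                operands.add(token);
                continue;
            }
            switch (token) {
                case "q":   // save graphics state
                    saved.push(ctm.clone());
                    break;
                case "Q":   // restore; tolerate an unbalanced Q at the start of an excerpt
                    ctm = saved.isEmpty() ? new double[] {1, 0, 0, 1, 0, 0} : saved.pop();
                    break;
                case "cm":  // concatenate the given matrix onto the current one
                    if (operands.size() == 6
                            && operands.stream().noneMatch(o -> o.startsWith("/"))) {
                        double[] m = new double[6];
                        for (int i = 0; i < 6; i++) {
                            m[i] = Double.parseDouble(operands.get(i));
                        }
                        ctm = concat(m, ctm);
                    }
                    break;
                case "Do":  // paint the named XObject
                    if (operands.size() == 1 && operands.get(0).startsWith("/")) {
                        placements.add(
                                new Placement(operands.get(0).substring(1), ctm[4], ctm[5]));
                    }
                    break;
                default:    // any other operator is ignored in this sketch
                    break;
            }
            operands.clear();
        }
        return placements;
    }

    private static boolean isOperand(String token) {
        return token.startsWith("/") || token.matches("[-+]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    /** Returns m x ctm for matrices stored as [a b c d e f]. */
    private static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    public static void main(String[] args) {
        String excerpt = "Q q Q q 1 0 0 1 329.78 655.45 cm /Xi5 Do Q "
                + "q Q q 1 0 0 1 324.17 624.51 cm /Xi8 Do Q";
        for (Placement p : findPlacements(excerpt)) {
            System.out.println(p);
        }
    }
}
```

A long run of small XObjects, each positioned by its own translation, is a reasonable hint that a form was flattened, though it is not proof: logos and other reusable graphics are painted the same way.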

Thus, the only way to extract contents is via text extraction.

The obvious problem with text extraction is that it may be difficult to differentiate between former field contents and static form text like labels. Depending on the number of PDFs you have to extract data from, it might be worth extending PDFTextStripper to add a marker to text extracted from XObject contents (as opposed to immediate page contents). Such markers would allow you to differentiate quite well.
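One way such a marker could be approached is sketched below. It assumes PDFBox 2.x, where the stream engine calls showForm(PDFormXObject) for each form XObject painted via Do and PDFTextStripper receives each glyph through processTextPosition(TextPosition). The class FormXObjectTextCollector is a made-up name, transparency groups are not handled, and the collected strings are raw glyph sequences without the word spacing that the stripper infers later.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical helper, not part of PDFBox; assumes PDFBox 2.x.
final class FormXObjectTextCollector extends PDFTextStripper {

    private int formDepth = 0;                         // > 0 while inside a form XObject
    private StringBuilder current = new StringBuilder();
    private final List<String> formTexts = new ArrayList<>();

    FormXObjectTextCollector() throws IOException {
        super();
    }

    @Override
    public void showForm(PDFormXObject form) throws IOException {
        formDepth++;
        try {
            super.showForm(form);                      // processes the XObject's own stream
        } finally {
            formDepth--;
            // When the outermost XObject ends, store what it drew as one candidate value
            if (formDepth == 0 && current.length() > 0) {
                formTexts.add(current.toString());
                current = new StringBuilder();
            }
        }
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        if (formDepth > 0) {
            current.append(text.getUnicode());         // glyph drawn from inside an XObject
        }
        super.processTextPosition(text);               // keep normal extraction working
    }

    List<String> getFormTexts() {
        return formTexts;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            FormXObjectTextCollector stripper = new FormXObjectTextCollector();
            stripper.getText(document);                // runs the extraction over all pages
            for (String value : stripper.getFormTexts()) {
                System.out.println("[from XObject] " + value);
            }
        }
    }
}
```

Text collected this way is only a candidate for former field content, because static artwork drawn through XObjects ends up in the same list.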

answered Jan 7 at 16:00

mkl
