How can I detect whether a PDF has been edited, using Python?























I am designing a solution for a fintech company where we need to programmatically identify whether uploaded documents, such as bank statements, are original. We are using Python packages like Tika and tabula to extract the text and metadata. Is there any information in the metadata that could help identify whether a PDF document has been changed after it was downloaded from the original source?



Any reference material or papers would be greatly helpful.



Thanks.
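One partial answer to the metadata question: a PDF's Info dictionary can carry `/CreationDate`, `/ModDate`, `/Producer` and `/Creator` entries, and tools that save changes incrementally append a new section that ends in another `%%EOF` marker. The sketch below uses only the standard library (the helper name `pdf_edit_hints` is hypothetical) and scans the raw bytes for those signals. It assumes the entries are stored as simple, uncompressed literal strings; PDF 1.5+ files often keep them in compressed object streams, as hex or UTF-16 strings, or in XMP metadata, where this scan finds nothing. Anyone editing the file can also rewrite these fields, so treat the output as hints for manual review, not as proof of originality.

```python
import re
from typing import Optional


def _last_match(pattern: bytes, data: bytes) -> Optional[str]:
    """Return the last capture of `pattern`; incremental updates append newer objects at the end."""
    matches = re.findall(pattern, data)
    return matches[-1].decode("latin-1") if matches else None


def pdf_edit_hints(data: bytes) -> dict:
    """Collect weak signals that a PDF may have been modified. None of them is proof."""
    # Assumes uncompressed literal strings such as /ModDate (D:20181118090000Z).
    creation = _last_match(rb"/CreationDate\s*\((D:[^)]*)\)", data)
    modified = _last_match(rb"/ModDate\s*\((D:[^)]*)\)", data)
    return {
        # Each incremental save usually appends another %%EOF;
        # linearized PDFs legitimately contain two.
        "eof_markers": data.count(b"%%EOF"),
        "creation_date": creation,
        "mod_date": modified,
        "producer": _last_match(rb"/Producer\s*\(([^)]*)\)", data),
        "creator": _last_match(rb"/Creator\s*\(([^)]*)\)", data),
        "dates_differ": creation is not None and modified is not None and creation != modified,
    }
```

Usage, with a hypothetical file name: `with open("statement.pdf", "rb") as f: print(pdf_edit_hints(f.read()))`. In practice this is most useful when compared against what a given bank's statement generator normally writes: an unexpected producer, a modification date later than the creation date, or extra `%%EOF` markers are reasons to look closer.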
















python pdf apache-tika data-extraction






edited 2 days ago

























asked Nov 19 at 11:41









Ravi Siswaliya

95








    I would guess the only way to check is if the original PDF has a digital signature. If it doesn't, you could only know by having a copy of the original and comparing hashes. Python can generate and compare hashes, but unless the PDF has a signature you can verify with a public key or certificate authority, or you have the original to compare against, this wouldn't be very useful.
    – Dom
    Nov 19 at 12:15
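Both routes described in the comment can be sketched with the standard library. `matches_reference_hash` compares an upload against the SHA-256 digest of a known-original copy; where that digest comes from (for example, the issuing bank) is an assumption. `signature_coverage` only checks whether a signature's `/ByteRange` reaches the last byte of the file, i.e. whether anything was appended after signing; it does not verify the signature cryptographically, which needs a dedicated library and the signer's certificate chain. All function names here are hypothetical, and the `/ByteRange` scan assumes the signature dictionary is stored uncompressed.

```python
import hashlib
import hmac
import io
import re
from typing import BinaryIO, Optional


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_reference_hash(data: bytes, expected_hex: str) -> bool:
    """True only if the uploaded bytes are byte-for-byte identical to the known original."""
    actual = sha256_of_stream(io.BytesIO(data))
    # Normalise the expected digest, then compare in constant time.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# A signature dictionary's /ByteRange [a b c d] lists the two byte spans the signature covers.
_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def signature_coverage(data: bytes) -> Optional[bool]:
    """Check whether the last /ByteRange in the file reaches its final byte.

    Returns None if no /ByteRange is found, True if the second span ends at the
    end of the file, and False if bytes were appended after signing.
    The signature itself is NOT cryptographically verified here.
    """
    ranges = _BYTE_RANGE.findall(data)
    if not ranges:
        return None
    _, _, second_start, second_len = (int(x) for x in ranges[-1])
    return second_start + second_len == len(data)
```

A `False` from `signature_coverage` means content was added after the last signature (an incremental update); because some producers leave trailing whitespace, flagged files are better routed to review than rejected automatically. The hash route only works when you already trust a reference digest for the exact file.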













