How to find out whether a PDF has been edited, using Python?
I am designing a solution for a fintech company where we need to identify programmatically whether uploaded documents, such as bank statements, are original or not. We are using Python packages like tika and tabula to extract the text and metadata. My question: is there any information in the metadata that could help identify whether a PDF document was changed after it was downloaded from the original source?
Any reference material or papers would be greatly helpful.
Thanks
python pdf apache-tika data-extraction
edited 2 days ago
asked Nov 19 at 11:41


Ravi Siswaliya
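No metadata field certifies that a PDF is unedited, but a few structural hints can be read straight from the raw bytes. The sketch below is a heuristic, not a detector. It counts `%%EOF` markers (an incremental update appends a new body, cross-reference section and `%%EOF` to the end of the file) and pulls `/CreationDate`, `/ModDate` and `/Producer` from the document information dictionary with a regex. The names `PdfTamperHints`, `pdf_tamper_hints` and `looks_modified` are hypothetical. It assumes the Info dictionary is stored as plain, unencrypted text; if it sits in a compressed object stream or the file is encrypted, the dates will not be found this way. A tool that rewrites the whole file, or sets `ModDate` equal to `CreationDate`, leaves none of these traces.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdfTamperHints:
    appended_revisions: int       # extra %%EOF markers beyond what the file layout needs
    creation_date: Optional[str]  # raw /CreationDate value, if stored uncompressed
    mod_date: Optional[str]       # raw /ModDate value, if stored uncompressed
    producer: Optional[str]       # raw /Producer value (software that wrote the file)


def _literal(raw: bytes, key: bytes) -> Optional[str]:
    # Only handles simple literal strings such as /ModDate (D:20181119114100Z);
    # hex strings, escaped parentheses and compressed object streams are not parsed.
    m = re.search(rb"/" + re.escape(key) + rb"\s*\(([^)]*)\)", raw)
    return m.group(1).decode("latin-1") if m else None


def pdf_tamper_hints(raw: bytes) -> PdfTamperHints:
    # A linearized ("fast web view") file legitimately contains two %%EOF markers.
    expected_eofs = 2 if b"/Linearized" in raw[:1024] else 1
    return PdfTamperHints(
        appended_revisions=max(raw.count(b"%%EOF") - expected_eofs, 0),
        creation_date=_literal(raw, b"CreationDate"),
        mod_date=_literal(raw, b"ModDate"),
        producer=_literal(raw, b"Producer"),
    )


def looks_modified(h: PdfTamperHints) -> bool:
    # Heuristic only: True means "worth a manual look"; False proves nothing.
    dates_differ = (
        h.creation_date is not None
        and h.mod_date is not None
        and h.creation_date != h.mod_date
    )
    return h.appended_revisions > 0 or dates_differ
```

For example, read the upload with `open(path, "rb").read()`, then route files where `looks_modified` returns True, or whose `producer` differs from what the bank's statement export normally writes, to manual review rather than rejecting them outright.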
I would guess the only way to check would be if the original PDF had a signature. If it doesn't, then you could only know if you had a copy of the original and could compare hashes. Python can generate and compare hashes, but unless the PDF has a signature that you can verify with a public key or certificate authority, or you have the original to compare against, this wouldn't be very useful.
– Dom
Nov 19 at 12:15
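Both ideas in this comment can be sketched with the standard library alone. Comparing SHA-256 digests only works when you hold a copy of the original you obtained yourself. For signed files, each signature dictionary carries a `/ByteRange [a b c d]` array listing exactly which bytes the signature covers, so any bytes past the end of the last covered range were added after signing. This is only a structural check, assuming the `/ByteRange` arrays appear uncompressed in the file; it does not validate the signature value against the signer's certificate, which needs a dedicated library such as pyHanko. The function names are hypothetical.

```python
import hashlib
import hmac
import re
from typing import List, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def matches_reference(uploaded: bytes, reference: bytes) -> bool:
    # Only meaningful if `reference` came from the bank itself; any byte change,
    # even a harmless re-save, makes the digests differ.
    return hmac.compare_digest(sha256_hex(uploaded), sha256_hex(reference))


def signature_byte_ranges(raw: bytes) -> List[Tuple[int, int, int, int]]:
    # The signed bytes are raw[a:a+b] and raw[c:c+d], i.e. everything
    # except the embedded signature value itself.
    pattern = rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
    return [
        (int(a), int(b), int(c), int(d))
        for a, b, c, d in re.findall(pattern, raw)
    ]


def bytes_after_signatures(raw: bytes) -> Optional[int]:
    # None: no signature found, so the file alone cannot prove anything.
    # 0: the last signature covers the whole file.
    # >0: bytes were appended after signing (an incremental update to inspect).
    ranges = signature_byte_ranges(raw)
    if not ranges:
        return None
    covered_end = max(c + d for _, _, c, d in ranges)
    return len(raw) - covered_end
```

A non-zero result is not proof of forgery on its own: legitimate post-signing updates, such as adding validation data, also append bytes, so treat it as a reason to inspect what the appended revision changed.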