Python3 PyPDF2 - how to treat file handlers as BytesIO objects?
I have a nice, tested bit of Python PyPDF2 code, a .py script designed to operate on 'real' OS files. Having debugged it all, I am now trying to incorporate it into a PL/Python function, replacing files with io.BytesIO() objects, or whatever mechanism would be the best candidate for a seamless drop-in.
The file reads and writes will now go to PostgreSQL bytea columns. The input documents were loaded with PG copy functions, and the byte counts match the on-disk sizes; so far so good.
Original code expected files:
# infile = "myInputPdf.pdf"
# outfile = "myOutputPdf.pdf"
# inputStream = open(infile, "rb") # designed to open OS-based file
# --- Instead: 'document_in' loaded from PG bytea col:
inputStream = io.BytesIO(document_in)
# ---
pdf_reader = PdfFileReader(inputStream, strict=False)
# (lots of code in here, seems? to be working)
outputStream = io.BytesIO() # trying it the python3 way!
pdf_writer.write(outputStream)
(I've assumed the streams should be treated as bytes objects.)
Finally:
plan3 = plpy.prepare("UPDATE documents SET document_out=$2 WHERE name=$1", ["varchar"]["varchar"])
ERROR: TypeError: list indices must be integers, not str
(PostgreSQL 11.1, if it matters)
I have done similar things in the past using mkstemp techniques; now I'm trying to grow up into the bytes world!
python python-3.x postgresql plpython
edited Jan 7 at 17:26 by klin
asked Jan 2 at 21:57 by DrLou
1 Answer
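The second argument to plpy.prepare() is a single list of type names, one per placeholder; ["varchar"]["varchar"] instead indexes a list with a string, which raises the TypeError. The column type is bytea, not varchar. And you should pass bytes (not a file object) to update the column: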
plan3 = plpy.prepare("UPDATE documents SET document_out=$2 WHERE name=$1", ["varchar", "bytea"])
outputStream.seek(0)
bytes_out = outputStream.read()
plpy.execute(plan3, ['some name', bytes_out])
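For anyone who wants to see both points outside the database, here is a minimal sketch in plain Python. It does not use plpy or PyPDF2: the placeholder bytes stand in for whatever pdf_writer.write() would produce, and the variable names are made up for illustration. It shows why indexing a list with a string fails, and why the stream pointer matters when reading a BytesIO back.

```python
import io

# 1) Why the original call failed: ["varchar"]["varchar"] is not two lists.
#    It indexes the list ["varchar"] with the string "varchar".
type_list = ["varchar"]
try:
    type_list["varchar"]
    index_error = None
except TypeError as exc:
    index_error = exc

# What plpy.prepare() expects instead: one list, one type name per placeholder.
arg_types = ["varchar", "bytea"]

# 2) Why the stream has to be rewound before read().
#    The placeholder bytes stand in for the output of pdf_writer.write(stream).
stream = io.BytesIO()
stream.write(b"%PDF-1.4 placeholder bytes")

at_end = stream.read()       # pointer sits at the end, so nothing is returned
stream.seek(0)               # rewind to the start
after_seek = stream.read()   # now the full content comes back

# getvalue() returns the whole buffer regardless of the pointer position,
# so it can be used instead of seek(0) + read().
whole_buffer = stream.getvalue()
```

Either after_seek or whole_buffer is a bytes object of the kind that can be passed as the bytea parameter; which of the two approaches reads better is mostly a matter of taste.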
Really perfect, klin - thanks for your response. Duh...! Of course; gotta reset the stream's pointer. – DrLou, Jan 3 at 13:33
answered Jan 2 at 23:20 by klin