Strange characters in a string from an API that I can't decode

I'm creating a program that grabs data from an API and stores it in my own database. The problem is that some of the strings contain some sort of character code where quotation marks should be. On closer inspection it appears to be the hex code for the quotation mark, but it's double escaped, which confuses me along with all my decoders. I believe the string comes in as ASCII, and I don't have any issues with the other characters.

I know I can simply replace the specific character code with the actual character, but I need to catch stuff like this in the future. If it is hex, I need to comb strings for hex codes and replace them procedurally.

I've tried

clean_val = unicodedata.normalize('NFKD', val).encode('latin1').decode('utf8')

but I've gotten myself quite confused about the whole thing. This is how I fetch the data, followed by what it prints:

response = session.get(url)
if response.status_code == requests.codes.ok:
    print(response.content)



b'{"Description":"American Assets Trust, Inc. (the \\u0093company\\u0094) is a full service, vertically ..."}'

I think the string is stored in their database as \" to satisfy some SQL escaping protocol. When I get it, the escaping backslash gets mixed in with the character code, thereby messing up the encoding.
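One way to check what the payload really contains is to parse it as JSON and list every non-ASCII character with its code point and Unicode category. The following is a minimal sketch, assuming the body is JSON as the sample output suggests; the sample bytes are shortened, and the helper name `describe_suspects` is made up for illustration.

```python
import json
import unicodedata

# Shortened sample of the response body; the backslashes are part of the JSON text.
raw = b'{"Description":"(the \\u0093company\\u0094) is ..."}'
text = json.loads(raw)['Description']


def describe_suspects(s):
    """Return (index, code point, Unicode category) for every non-ASCII character."""
    return [
        (i, f'U+{ord(ch):04X}', unicodedata.category(ch))
        for i, ch in enumerate(s)
        if ord(ch) > 0x7F
    ]


print(describe_suspects(text))
# [(5, 'U+0093', 'Cc'), (13, 'U+0094', 'Cc')]
```

Category `Cc` means "control character": after JSON parsing, each escape is a single C1 control character rather than a backslash followed by digits.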

edited Mar 2 at 18:10 by snakecharmerb

asked Jan 2 at 23:29 by Jonathan Duncan

Some variation of Ascii2Uni might help. Perhaps there is a Python binding or port.

– Tom Blodget
Jan 3 at 23:48

python python-3.x character-encoding


1 Answer

It looks like these characters come from text that was encoded as cp1252: U+0093 and U+0094 are C1 control characters in Unicode, whereas the bytes 0x93 and 0x94 are the curly quotation marks in cp1252. It's possible to decode them:

>>> import json
>>> bs = b'{"Description":"American Assets Trust, Inc. (the \\u0093company\\u0094) is a full service, vertically ..."}'
>>> d = json.loads(bs)
>>> s = d['Description']
>>> # latin-1 maps code points 0-255 to the same byte values; cp1252 reads 0x93/0x94 as curly quotes
>>> decoded = s.encode('latin-1').decode('cp1252')
>>> decoded
'American Assets Trust, Inc. (the “company”) is a full service, vertically ...'

If you want plain ASCII quotation marks instead of the curly ones, you will have to replace them yourself, using str.replace or str.translate:

>>> table = str.maketrans('“”', '""')
>>> decoded.translate(table)
'American Assets Trust, Inc. (the "company") is a full service, vertically ...'
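To catch this kind of thing in the future, one option is to repair only the suspicious characters instead of re-encoding the whole string. The following is a minimal sketch, not a definitive implementation: it assumes that any C1 control character (U+0080 to U+009F) is a mislabelled cp1252 byte, and the names `repair_c1` and `repair_value` are hypothetical helpers, not part of any library.

```python
import json
import re

# C1 control characters (U+0080..U+009F) rarely occur in real text, so their
# presence is treated here as a sign of mislabelled cp1252 bytes (an assumption).
_C1_CONTROLS = re.compile('[\u0080-\u009f]')


def _fix_char(match):
    ch = match.group(0)
    try:
        # latin-1 turns the code point back into the original byte value
        return ch.encode('latin-1').decode('cp1252')
    except UnicodeDecodeError:
        # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252: keep as-is
        return ch


def repair_c1(text):
    """Replace C1 control characters with their cp1252 equivalents."""
    return _C1_CONTROLS.sub(_fix_char, text)


def repair_value(value):
    """Apply repair_c1 to every string inside a decoded JSON structure."""
    if isinstance(value, str):
        return repair_c1(value)
    if isinstance(value, list):
        return [repair_value(v) for v in value]
    if isinstance(value, dict):
        return {k: repair_value(v) for k, v in value.items()}
    return value


raw = b'{"Description":"(the \\u0093company\\u0094) caf\\u00e9 \\u20ac5"}'
cleaned = repair_value(json.loads(raw))
print(cleaned['Description'])
```

Working character by character matters when a string mixes damaged and healthy text: calling `.encode('latin-1')` on the whole string raises `UnicodeEncodeError` as soon as it meets a character outside Latin-1, such as the euro sign in the sample above.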

answered Mar 2 at 18:08 by snakecharmerb


