Get text after string
I'm looking for help to create a regular expression that can get a certain text after a given string using Python.
I'm trying to extract JSON from a page, and it looks like this:
var config = {aslkdjsakljdkalsj{asdasdas}askldjaskljd};
I need a regex that matches from the first { to the last }, without the trailing semicolon.
I've tried using
config = .*?(?=};)
but the output is
config = {sadasdasdas{a}asdasdasd
It gets the config = part and doesn't get the last }.
How can I fix it?
python regex
What language are you implementing this in?
– CertainPerformance
Jan 3 at 1:43
Also, {aslkdjsakljdkalsj{asdasdas}askldjaskljd}; is not JSON
– CertainPerformance
Jan 3 at 1:44
I'm not implementing yet. I'm trying to find the right pattern using link.
– Tauan Matos
Jan 3 at 1:48
And yes, in this example it's not a JSON.
– Tauan Matos
Jan 3 at 1:48
Which language you're using really matters - in some languages, this is impossible, in others, it's doable.
– CertainPerformance
Jan 3 at 1:50
asked Jan 3 at 1:42 by Tauan Matos
edited Jan 4 at 0:46 by CertainPerformance
1 Answer
If that line of JS is guaranteed to contain no newline characters before the terminating semicolon, then the problem is simple enough: match the literal text var config = , followed by non-newline characters captured in a group, and then match a semicolon and the end of the line. If the JSON is delimited with 's, then, for example, use the pattern
var config = '(.+)';$
and extract the first group.
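import re
# Raw string, so that the backslash in the sample is not read as a Python escape (\a is BEL).
text = r'''
var config = '{ "foo": "b\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}';
var someOtherVar = 'bar';
'''
# (?m) lets $ match at the end of every line, not only at the end of the text.
match = re.search(r"(?m)var config = '(.+)';$", text)
print(match.group(1))  # the text between the quotes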
If the JSON isn't guaranteed to be on its own line, then it's a lot more complicated. Parsing nested structures like JSON is difficult: the general problem is solvable with regular expressions only if the structure is known beforehand (which often isn't the case, and can require a lot of repetitive code in the pattern), or if the RE engine being used supports recursive matches. Without that, there's no way to express in the pattern that every { must be balanced by a }.
Luckily, if you're working with Python, even though Python's native REs don't support recursion, there's a third-party regex module available that does. You'll also need to make sure that any { and } that come inside strings in the JSON don't affect the current nesting level. For a raw string, you'd need a pattern like
var config = String.raw`\K({(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*})(?=`;)
The outside of the capture group is
var config = String.raw`\K({ ... })(?=`;)
matching the line you want and the string delimiters (\K drops everything matched so far from the reported match, and the lookahead checks for the closing delimiter and semicolon without consuming them), with a capturing group of
{(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*}
which means: a {, followed by any number of:
- "(?:\\|\\"|[^"])*" - match a string inside the JSON (either a key or a value), from its starting delimiter to its ending delimiter, ignoring escaped "s, or
- [^{}] - match anything that isn't a { or } (other characters can be ignored, since we just want to get the nesting level right), or
- (?1) - recurse the whole first capture group (the one that matches the { ... })
followed by the closing }. This ensures that the { } brackets are balanced by the end of the pattern.
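The same three cases (a quoted string, any other character, a nested block) can also be sketched as a plain scanner instead of a recursive pattern. This is only an illustration of the balancing idea, not the regex approach itself: the function name extract_braced and the sample input are made up for the example, and it assumes strings are always double-quoted with backslash escapes, as in JSON.

```python
def extract_braced(text, start):
    """Return the balanced {...} block that begins at text[start].

    Hypothetical helper: braces inside double-quoted strings are ignored,
    and a backslash inside a string escapes the next character.
    """
    if text[start] != "{":
        raise ValueError("expected '{' at the start position")
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
        i += 1
    raise ValueError("unbalanced braces")


# Invented sample: braces and an escaped quote appear inside string values.
source = 'var config = {"a": "x}{", "b": {"c": "q\\"}"}};\nvar other = {};'
marker = "var config = "
begin = source.index(marker) + len(marker)
block = extract_braced(source, begin)
print(block)
```

The scanner stops at the } that brings the depth back to zero, so the trailing semicolon is never part of the result.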
But the pattern above is an example where String.raw was used, where literal backslashes in the JavaScript code indicate literal backslashes in the string. With ' delimiters, on the other hand, literal backslashes need to be double-escaped in the JS, so the above input would look like
var config = '{ "foo": "b\\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}';
requiring double-escaping the backslashes in the pattern as well:
var config = '\K({(?:"(?:\\\\|\\\\"|[^"])*"|[^{}]|(?1))*})(?=';)
https://regex101.com/r/8rSrGf/1
It's pretty complicated. I'd recommend going with the first approach or a variation on it instead, if at all possible.
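One possible variation, which is not part of the approach above, is to let Python's standard json module find the end of the object. The sketch below assumes the page assigns a bare, valid JSON object (not a quoted JS string, and not the non-JSON placeholder from the question); extract_config, the marker text, and the sample page are illustrative names only.

```python
import json


def extract_config(page, marker="var config = "):
    """Parse the JSON value that directly follows marker.

    Returns (parsed_value, source_text). Hypothetical helper; raises
    ValueError if the marker is missing or the text is not valid JSON.
    """
    start = page.index(marker) + len(marker)
    # raw_decode parses one JSON document starting at `start` and reports
    # the index where it ended, so nothing after the object needs matching.
    value, end = json.JSONDecoder().raw_decode(page, start)
    return value, page[start:end]


# Invented sample page with a brace inside a string value.
page = 'var x = 1;\nvar config = {"a": {"b": [1, 2, {"c": "}"}]}};\nvar y = 2;'
value, raw = extract_config(page)
print(raw)
print(value["a"]["b"][2]["c"])
```

Because the decoder tracks nesting and string escapes itself, no brace counting is needed, but it only works when the embedded text really is JSON.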
Thanks for the editing and answer. Really helped me.
– Tauan Matos
Jan 6 at 3:37
answered Jan 4 at 0:44 by CertainPerformance