Get text after string




























I'm looking for help to create a regular expression that can get a certain text after a given string using Python.



I'm trying to extract JSON from a page, and it looks like this:



    var config = {aslkdjsakljdkalsj{asdasdas}askldjaskljd};


I need a regex that captures everything from the first { to the last }, without the trailing semicolon.



I've tried using



    config = .*?(?=};)


but the output is



    config = {sadasdasdas{a}asdasdasd


It gets the config = part and doesn't get the last }.



How can I fix it?







python regex














edited Jan 4 at 0:46 by CertainPerformance

asked Jan 3 at 1:42 by Tauan Matos













  • What language are you implementing this in?

    – CertainPerformance
    Jan 3 at 1:43











  • Also, {aslkdjsakljdkalsj{asdasdas}askldjaskljd}; is not JSON

    – CertainPerformance
    Jan 3 at 1:44













  • I'm not implementing yet. I'm trying to find the right pattern using link.

    – Tauan Matos
    Jan 3 at 1:48











  • And yes, in this example it's not a JSON.

    – Tauan Matos
    Jan 3 at 1:48











  • Which language you're using really matters - in some languages, this is impossible, in others, it's doable.

    – CertainPerformance
    Jan 3 at 1:50































1 Answer














If the line of JS is guaranteed to contain no newline characters before the terminating ;, then the problem is simple enough - match var config =, followed by non-newline characters captured in a group, and then match a semicolon and the end of the line. If the JSON is delimited with 's, then, for example, use the pattern



var config = '(.+)';$


and extract the first group.



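import re
# Raw string, so the backslash in "b\ar" reaches the regex engine unchanged.
page_source = r'''
var config = '{ "foo": "b\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}';
var someOtherVar = 'bar';
'''
# (?m) lets $ match at the end of every line, not only at the end of the input.
match = re.search(r"(?m)var config = '(.+)';$", page_source)
config_text = match.group(1)  # the text between the quotes, without the semicolon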
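The same single-line idea can be applied to the unquoted object literal shown in the question. This is only a sketch, assuming the whole statement sits on one line and ends with `};`; the names `page_source` and `config_text` are made up for illustration.

```python
import re

# Sample input modelled on the line from the question.
page_source = """
var config = {aslkdjsakljdkalsj{asdasdas}askldjaskljd};
var someOtherVar = 'bar';
"""

# The greedy .+ runs to the end of the line and backtracks until the
# remaining text is "};", so the group ends at the last "}" on the line.
match = re.search(r"(?m)var config = (\{.+\});$", page_source)
config_text = match.group(1) if match else None
print(config_text)
```

Putting the parentheses around the braces keeps `config = ` and the semicolon out of the captured group, which are the two things the original attempt got wrong.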


If the JSON isn't guaranteed to be on its own line, then it's a lot more complicated. Parsing nested structures like JSON is difficult - the only way the general problem is solvable with regular expressions is if the structure is known beforehand (which often isn't the case, and can require a lot of repetitive code in the pattern), or if the RE engine being used supports recursive matches. Without that, there's no way to express the need for a balanced number of {s and }s in the pattern.
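If a recursive regex engine is not available, the balancing can also be done without a regex at all, by counting braces. The following is a rough sketch of that alternative, not something from the answer itself: `extract_braced` and its `marker` parameter are hypothetical names, and it assumes strings are delimited by double quotes with backslash escapes.

```python
def extract_braced(source: str, marker: str = "var config = ") -> str:
    """Return the balanced {...} block that directly follows `marker`."""
    start = source.index(marker) + len(marker)  # ValueError if marker is absent
    if not source.startswith("{", start):
        raise ValueError("marker is not followed by '{'")

    depth = 0          # current brace nesting level
    in_string = False  # True while inside a double-quoted string
    escaped = False    # True right after a backslash inside a string
    for pos in range(start, len(source)):
        ch = source[pos]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Closing brace of the outermost block; the ";" is excluded.
                return source[start:pos + 1]
    raise ValueError("unbalanced braces")


print(extract_braced("var config = {aslkdjsakljdkalsj{asdasdas}askldjaskljd}; var x = {};"))
```

Braces inside quoted strings are skipped, so they do not change the nesting level.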



Luckily, if you're working with Python, even though Python's built-in re module doesn't support recursion, there's a third-party regex module available that does. You'll also need to make sure that any { and } characters inside strings in the JSON don't affect the current nesting level. For a JavaScript String.raw template literal, you'd need a pattern like



var config = String.raw`\K({(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*})(?=`;)


The outside of the capture group is



var config = String.raw`\K({ ... })(?=`;)


matching the line you want and the string delimiters (\K drops everything matched so far from the reported match), with a capturing group of



{(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*}


which means - {, followed by any number of: either



  • "(?:\\|\\"|[^"])*" - match a string inside the JSON (either a key or a value), from its starting delimiter to its ending delimiter, ignoring escaped "s, or


  • [^{}] - match anything that isn't a { or } - other characters can be ignored, since we just want to get the nesting level right, or


  • (?1) - recurse the whole first capture group (the one that matches the { ... })


This will ensure that the { } brackets are balanced by the end of the pattern.
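In Python, a recursive pattern along these lines could be run through the third-party `regex` package roughly as follows. This is a sketch under assumptions, not the answer's exact pattern: it targets the unquoted `var config = {...};` form from the question, reads group 1 instead of using `\K`, and swaps the string alternative for `"(?:\\.|[^"\\])*"`. `BALANCED_CONFIG` and `extract_config` are illustrative names.

```python
try:
    import regex  # third-party package (pip install regex), not the stdlib re
except ImportError:
    regex = None

# Group 1 is a balanced {...} block:
#   "(?:\\.|[^"\\])*"  a double-quoted string, including escaped characters
#   [^{}"]             any character that is not a brace or a quote
#   (?1)               recursion into group 1 for a nested {...}
BALANCED_CONFIG = r'''var config = (\{(?:"(?:\\.|[^"\\])*"|[^{}"]|(?1))*\})(?=;)'''


def extract_config(source):
    """Return the balanced {...} text after 'var config = ', or None."""
    if regex is None:
        raise RuntimeError("the third-party 'regex' package is required")
    match = regex.search(BALANCED_CONFIG, source)
    return match.group(1) if match else None
```

Calling `extract_config(page_text)` on the page source returns the object literal without the trailing semicolon, or `None` when no balanced block follows `var config = `.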





But - the above is an example where String.raw was used, where literal backslashes in the JavaScript code indicate literal backslashes in the string. With ' delimiters, on the other hand, literal backslashes need to be double-escaped in the JS, so the above input would look like



var config = '{ "foo": "b\\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}';


requiring double-escaping the backslashes in the pattern as well:



var config = '\K({(?:"(?:\\\\|\\\\"|[^"])*"|[^{}]|(?1))*})(?=';)


https://regex101.com/r/8rSrGf/1



It's pretty complicated. I'd recommend going with the first approach or a variation on it instead, if at all possible.
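If the first approach is used and the captured text really is JSON, passing it to `json.loads` both validates it and gives a usable Python object. A minimal sketch, assuming the JS string is single-quoted, sits on one line, and contains no JS-level escape sequences; the sample data and variable names are made up for illustration.

```python
import json
import re

page_source = """
var config = '{"foo": "bar", "items": ["buzz}", {"innerprop": "innerval"}]}';
var someOtherVar = 'bar';
"""

match = re.search(r"(?m)var config = '(.+)';$", page_source)
if match is None:
    raise ValueError("config line not found")

# json.loads raises json.JSONDecodeError if the captured text is not valid JSON.
config = json.loads(match.group(1))
print(config["items"][1]["innerprop"])
```

A decode error here is a useful signal that the single-line assumption does not hold for the page being scraped.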









































answered Jan 4 at 0:44 by CertainPerformance













  • Thanks for the editing and answer. Really helped me.

    – Tauan Matos
    Jan 6 at 3:37


















