Remove all text not wrapped in XML braces
I want to remove all invalid text from an XML document. I consider any text that is not wrapped in XML tags (i.e. not enclosed in its own element) to be invalid, and I want to strip it before translation.
The post "Regular expression to remove text outside the tags in a string" explains how to match pairs of XML tags. However, on my input it does not remove the text that sits outside the tags, as this example shows: https://regex101.com/r/6iUyia/1
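If the stray text always follows a closing tag, as it does in the example, a narrowly scoped regex can remove it. This is a hedged sketch, not the pattern from the linked post, and StrayTextRegex and Strip are hypothetical names. It only removes text that contains a non-whitespace character and starts directly after a closing tag such as </nested-A>, so it misses stray text before an element's first child or after a self-closing tag, and comments or CDATA that contain tags can fool it.

```csharp
using System.Text.RegularExpressions;

public static class StrayTextRegex
{
    // Matches a run of text that starts right after a closing tag (e.g. </nested-A>)
    // and contains at least one non-whitespace character before the next '<'.
    // .NET supports the variable-length lookbehind used here.
    private static readonly Regex AfterClosingTag =
        new Regex(@"(?<=</[^<>]+>)\s*[^<\s][^<]*");

    // Hypothetical helper: deletes the matched text and leaves every tag untouched.
    public static string Strip(string xml) => AfterClosingTag.Replace(xml, string.Empty);
}
```

Because a regex cannot track nesting, a parser-based approach is generally the safer choice; this pattern is only reasonable when the input layout is known in advance.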
From my initial research, I don't think this specific case has been asked on SO before.
In my code, the XML is held as a string, which I later use to create an XDocument. So string, Regex and XDocument methods are all available for removing the invalid text. A document may contain more than one piece of invalid text. I do not want to use XSLT to remove it.
One rudimentary idea I tried, without success, was to iterate over the string as a char array and remove any text that fell between '>' and '<', but I decided there must be a better way to achieve this (hence the question).
Here is an example input, with the invalid text between nested-A and nested-B:
<ASchema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<A>
<nested-A>valid text</nested-A>
Remove text not inside valid xml braces
<nested-B>more valid text here</nested-B>
</A>
</ASchema>
I expect the output to look like this:
<ASchema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<A>
<nested-A>valid text</nested-A>
<nested-B>more valid text here</nested-B>
</A>
</ASchema>
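Since the string ends up in an XDocument anyway, LINQ to XML can do the filtering without a JSON round trip. This is a minimal sketch of a different technique from the Json.NET answer, assuming "invalid" means a text node whose parent element also has child elements (mixed content); StrayTextRemover and RemoveStrayText are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class StrayTextRemover
{
    // Removes text nodes that sit beside child elements (mixed content)
    // and keeps text inside leaf elements such as <nested-A>valid text</nested-A>.
    public static XDocument RemoveStrayText(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        doc.Descendants()
           .Where(e => e.HasElements)                  // elements that have child elements
           .SelectMany(e => e.Nodes().OfType<XText>()) // their direct text children (XCData derives from XText)
           .Remove();                                  // LINQ to XML's Remove() copies the sequence before removing

        return doc;
    }
}
```

Applied to the example input, this should leave the same elements as the expected output above, re-indented by XDocument. Note that it also drops CDATA sections that appear in mixed content, which may or may not be wanted.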
c# xml xdoc
asked Jan 3 at 4:35
EightSquared
1 Answer
You could do the following. Please note that I have done only limited testing; let me know if it fails in some scenarios.
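// Needs: using System.Xml; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
XmlDocument doc = new XmlDocument();
doc.LoadXml(str);
// XML -> JSON: text that sits beside child elements becomes a "#text" property
var json = JsonConvert.SerializeXmlNode(doc);
// Drop all "#..." properties, then convert the JSON back to XML
string result = JToken.Parse(json).RemoveFields().ToString(Newtonsoft.Json.Formatting.None);
XmlDocument xml = JsonConvert.DeserializeXmlNode(result);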
where RemoveFields is defined as:
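using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class Extensions
{
    // Recursively removes every property whose name starts with '#'
    // (e.g. "#text", which Json.NET uses for text nodes that sit beside elements).
    public static JToken RemoveFields(this JToken token)
    {
        JContainer container = token as JContainer;
        if (container == null) return token;

        // Collect matches first: removing items while enumerating the same collection is unsafe.
        List<JToken> removeList = new List<JToken>();
        foreach (JToken el in container.Children())
        {
            JProperty p = el as JProperty;
            if (p != null && p.Name.StartsWith("#", StringComparison.Ordinal))
            {
                removeList.Add(el);
            }
            el.RemoveFields(); // recurse into nested objects and arrays
        }

        foreach (JToken el in removeList)
            el.Remove();
        return token;
    }
}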
Output
<ASchema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<A>
<nested-A>valid text</nested-A>
<nested-B>more valid text here</nested-B>
</A>
</ASchema>
Note that the code above uses Json.NET (Newtonsoft.Json).
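RemoveFields drops every property whose name starts with '#', which covers "#text" but also names such as "#comment" and "#cdata-section". It also affects an element that has both attributes and text: as far as I know, Json.NET serializes <item id="1">keep me</item> as {"item":{"@id":"1","#text":"keep me"}}, so that legitimate text would be lost. A hedged variant, assuming only mixed-content text should go, removes "#text" only from objects that also contain element properties. RemoveMixedText and IsElementName are hypothetical names, and the '@'/'?' prefix rule is an assumption about Json.NET's naming.

```csharp
using System.Linq;
using Newtonsoft.Json.Linq;

public static class MixedContentExtensions
{
    // Hypothetical variant of RemoveFields: removes "#text" only from objects that
    // also contain child elements, i.e. stray text in mixed content.
    public static JToken RemoveMixedText(this JToken token)
    {
        if (token is JObject obj && obj.Properties().Any(p => IsElementName(p.Name)))
        {
            obj.Remove("#text");
        }

        // Recurse into every child token (property values, array items).
        if (token is JContainer container)
        {
            foreach (JToken child in container.Children())
            {
                child.RemoveMixedText();
            }
        }

        return token;
    }

    // Assumption: attributes are written as "@name", declarations as "?name",
    // and special nodes as "#name"; any other property name is treated as an element.
    private static bool IsElementName(string name) =>
        name.Length > 0 && name[0] != '@' && name[0] != '#' && name[0] != '?';
}
```

To try it, call RemoveMixedText() in place of RemoveFields() in the conversion code above. Comments and CDATA sections are left alone by design.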
edited Jan 3 at 6:27
answered Jan 3 at 5:37


Anu Viswan
Hello, as the tags suggest, I'm not using Java, so I'm unable to test your code. Also, your output does not match what I'm looking for, since it wraps an additional <A> element around my schema. :(
– EightSquared
Jan 3 at 6:23
@EightSquared I have fixed the "<A>" issue; it was a mistake on my side. By the way, why do you think this is Java? This is C# code :) If it's because of JToken, that type comes from the Json.NET package.
– Anu Viswan
Jan 3 at 6:28
Hey Anu. On further investigation, your code works as expected. I need to adapt my code to use this extension. You're right about the Json.NET package too; I assumed it was Java!
– EightSquared
Jan 3 at 9:02