Remove all text not wrapped in XML braces






I want to remove all invalid text from an XML document. I consider any text not wrapped in <> XML brackets to be invalid, and I want to strip it prior to translation.

The post "Regular expression to remove text outside the tags in a string" explains how to match XML brackets together. However, on my example it doesn't clean up the text outside of the XML, as can be seen here: https://regex101.com/r/6iUyia/1

From my initial research, I don't think this specific case has been asked on Stack Overflow before.

Currently, I hold this XML as a string in my code and build an XDocument from it later on, so string, Regex and XDocument methods are all available for removing the text. There could be more than one piece of invalid text in these documents. I do not wish to use XSLT to remove these values.

One very rudimentary idea I tried and failed to implement was to iterate over the string as a char array and remove any character that falls outside of '>' and '<', but I decided there must be a better way to achieve this (hence the question).

This is an example of the input, with the invalid text appearing between nested-A and nested-B:



    <ASchema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes" xmlns:fn="http://www.w3.org/2005/xpath-functions">
      <A>
        <nested-A>valid text</nested-A>
        Remove text not inside valid xml braces
        <nested-B>more valid text here</nested-B>
      </A>
    </ASchema>


I expect the output to look like this:



    <ASchema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes" xmlns:fn="http://www.w3.org/2005/xpath-functions">
      <A>
        <nested-A>valid text</nested-A>
        <nested-B>more valid text here</nested-B>
      </A>
    </ASchema>

      c# xml xdoc






asked Jan 3 at 4:35 by EightSquared

1 Answer

You could do the following. Please note that I have done very limited testing, so kindly let me know if it fails in some scenarios.



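    using System.Xml;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    XmlDocument doc = new XmlDocument();
    doc.LoadXml(str); // str is the raw XML string
    // Mixed-content text becomes "#text" properties in the JSON form
    string json = JsonConvert.SerializeXmlNode(doc);

    // Strip properties whose names start with "#", then convert back to XML
    string result = JToken.Parse(json).RemoveFields().ToString(Newtonsoft.Json.Formatting.None);
    XmlDocument xml = JsonConvert.DeserializeXmlNode(result);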


where RemoveFields is defined as:



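    using System.Collections.Generic;
    using Newtonsoft.Json.Linq;

    public static class Extensions
    {
        // Recursively removes every JSON property whose name starts with "#"
        // (Json.NET uses names such as "#text" for non-element XML nodes).
        public static JToken RemoveFields(this JToken token)
        {
            JContainer container = token as JContainer;
            if (container == null) return token; // leaf values have nothing to remove

            // Collect first, so the container is not modified while it is being enumerated
            List<JToken> removeList = new List<JToken>();
            foreach (JToken el in container.Children())
            {
                JProperty p = el as JProperty;
                if (p != null && p.Name.StartsWith("#"))
                {
                    removeList.Add(el);
                }
                el.RemoveFields();
            }

            foreach (JToken el in removeList)
                el.Remove();

            return token;
        }
    }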


Output:

    <ASchema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes" xmlns:fn="http://www.w3.org/2005/xpath-functions">
      <A>
        <nested-A>valid text</nested-A>
        <nested-B>more valid text here</nested-B>
      </A>
    </ASchema>
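
One caveat worth checking against your own documents: as far as I know, Json.NET also uses a "#text" property for the text of an element that carries attributes (for example <nested-A id="1">valid text</nested-A>), so a filter that drops every "#" property would remove that valid text as well. Below is a possible sketch of a narrower filter. The names MixedTextExtensions and RemoveMixedText are hypothetical (they are not part of Json.NET), and the rule "only drop #text when the same object also has child elements" is my assumption about what counts as invalid text.

```csharp
using System.Linq;
using Newtonsoft.Json.Linq;

public static class MixedTextExtensions
{
    // Hypothetical variant of RemoveFields: drops "#text" only from JSON objects
    // that also contain child elements, i.e. properties that are neither
    // attributes ("@...") nor special nodes ("#...").
    public static JToken RemoveMixedText(this JToken token)
    {
        if (token is JObject obj)
        {
            bool hasChildElements = obj.Properties()
                .Any(p => !p.Name.StartsWith("@") && !p.Name.StartsWith("#"));

            if (hasChildElements)
                obj.Remove("#text"); // no-op when the property is absent
        }

        // Snapshot the children so the recursion never enumerates a live collection
        foreach (JToken child in token.Children().ToList())
            child.RemoveMixedText();

        return token;
    }
}
```

It would be called in place of RemoveFields(), for example JToken.Parse(json).RemoveMixedText(). Unlike RemoveFields, this variant leaves other "#" properties untouched.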


Please note that I am using Json.NET in the code above.
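
Since the question already ends up with an XDocument, the same clean-up might also be possible without the JSON round trip. Here is a minimal sketch using LINQ to XML, under the assumption that "invalid" means a text node sitting directly beside child elements. MixedContentCleaner and RemoveStrayText are hypothetical names introduced for illustration; treat this as a starting point rather than a tested solution.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class MixedContentCleaner
{
    // Parses the XML and removes every text node whose parent also has
    // child elements (mixed content). Text inside leaf elements is kept.
    public static XDocument RemoveStrayText(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Materialise the list first so nodes are not removed while
        // the tree is still being enumerated
        var strayText = doc.DescendantNodes()
            .OfType<XText>()
            .Where(t => t.Parent != null && t.Parent.Elements().Any())
            .ToList();

        foreach (XText t in strayText)
            t.Remove();

        return doc;
    }
}
```

Usage would be along the lines of XDocument cleaned = MixedContentCleaner.RemoveStrayText(str);, where str is the raw XML string. Note that XCData derives from XText, so CDATA sections in mixed content would be removed as well.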






• Hello, I'm not using Java as the tag suggests, so I'm unable to test your code. Additionally, your output does not match what I'm looking for, as you wrap an additional <A> element around my schema. :(
  – EightSquared, Jan 3 at 6:23

• @EightSquared I have fixed the "<A>" issue; it was a mistake on my side. By the way, why do you feel this is Java? This is C# code :) If you felt so because of JToken, it is from the Json.NET package.
  – Anu Viswan, Jan 3 at 6:28

• Hey Anu. On further investigation into your code, it works as expected. I need to adapt my code to reflect this extension. You're right about the Json.NET package too - I assumed it was Java!
  – EightSquared, Jan 3 at 9:02

edited Jan 3 at 6:27
answered Jan 3 at 5:37 by Anu Viswan