Check if XML Element has children or not, in ElementTree

I retrieve an XML document this way:

import urllib2

import xml.etree.ElementTree as ET



root = ET.parse(urllib2.urlopen(url))

for child in root.findall("item"):

  a1 = child[0].text # ok

  a2 = child[1].text # ok

  a3 = child[2].text # ok

  a4 = child[3].text # BOOM

  # ...

The XML looks like this:

<item>

  <a1>value1</a1>

  <a2>value2</a2>

  <a3>value3</a3>

  <a4>

    <a11>value222</a11>

    <a22>value22</a22>

  </a4>

</item>

How do I check if a4 (in this particular case, but it might've been any other element) has children?

edited Jan 1 at 6:38

smci

15.2k676108

asked Sep 20 '14 at 16:01

アレックス

9,2182499186

python xml elementtree children


5 Answers

You could try the list function on the element:

>>> xml = """<item>

  <a1>value1</a1>

  <a2>value2</a2>

  <a3>value3</a3>

  <a4>

    <a11>value222</a11>

    <a22>value22</a22>

  </a4>

</item>"""

>>> root = ET.fromstring(xml)

>>> list(root[0])

[]

>>> list(root[3])

[<Element 'a11' at 0x2321e10>, <Element 'a22' at 0x2321e48>]

>>> len(list(root[3]))

2

>>> print "has children" if len(list(root[3])) else "no child"

has children

>>> print "has children" if len(list(root[2])) else "no child"

no child

>>> # Or simpler, without a call to list within len, it also works:

>>> print "has children" if len(root[3]) else "no child"

has children

I modified your sample because calling findall("item") on the item root element finds nothing: findall searches the children of the current element, not the element itself. If you want to access the text of the subchildren afterwards in your working program, you could do:

for child in root.findall("item"):

  # if there are children, get their text content as well.

  if len(child): 

    for subchild in child:

      subchild.text

  # else just get the current child text.

  else:

    child.text

This would be a good fit for a recursive function, though.
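As a rough illustration of that recursive idea, here is a minimal sketch. The helper name `collect_text` is hypothetical (it is not part of ElementTree), and the XML is the sample from the question. It uses `len(element)` to decide whether to descend or to read the text.

```python
import xml.etree.ElementTree as ET

XML = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""


def collect_text(element):
    """Return the text of every childless element at or below `element`."""
    # len(element) is the number of direct child elements.
    if len(element) == 0:
        return [element.text]
    texts = []
    for child in element:
        # Descend into elements that have children of their own.
        texts.extend(collect_text(child))
    return texts


root = ET.fromstring(XML)
print(collect_text(root))
# ['value1', 'value2', 'value3', 'value222', 'value22']
```

Note that a childless element with no text contributes `None` to the list, so filter the result if that matters in your data.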

edited Sep 20 '14 at 17:50

answered Sep 20 '14 at 16:14

jlr

741515

doesn't work. Could you use my example with iteration?

– アレックス
Sep 20 '14 at 16:28

it does not work, because your iteration loop yields no elements, since there are no elements named 'item'

– marscher
Sep 20 '14 at 16:36

yes, it yields them in my real application.

– アレックス
Sep 20 '14 at 16:43

how do I get "<a11>" and "<a22>" elements?

– アレックス
Sep 20 '14 at 16:44

It works, check this pythonfiddle: pythonfiddle.com/check-if-element-has-children-or-not Else tell me exactly what did not work. Your sample did not work though, hence why I modified it. Let me modify my answer to tell you how to access the subchildren.

– jlr
Sep 20 '14 at 17:34


The simplest way I have been able to find is to use the bool value of the element directly. This means you can use a4 in a conditional statement as-is:

from xml.etree.ElementTree import Element

a4 = Element('a4')

if a4:

    print('Has kids')

else:

    print('No kids yet')



a4.append(Element('x'))

if a4:

    print('Has kids now')

else:

    print('Still no kids')

Running this code will print

No kids yet

Has kids now

The boolean value of an element does not say anything about text, tail or attributes. It only indicates the presence or absence of children, which is what the original question was asking.
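One caveat worth knowing: the standard library documentation cautions against relying on the truth value of an Element, and recent Python versions (3.12 and later) emit a DeprecationWarning for it. A minimal sketch of the same check written with an explicit `len()` follows; the helper name `has_children` is only an illustration, not an ElementTree API.

```python
from xml.etree.ElementTree import Element


def has_children(element):
    """True if `element` has at least one direct child element."""
    return len(element) > 0


a4 = Element('a4')
print(has_children(a4))   # False

a4.append(Element('x'))
print(has_children(a4))   # True
```

Like the boolean test, this says nothing about text, tail or attributes; it only counts child elements.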

edited Jul 22 '16 at 18:20

answered Jul 22 '16 at 18:13

Mad Physicist

38k1674108


The Element class has the getchildren method. So you could use something like this to check whether there are children, and store the text of the childless elements in a dictionary keyed by tag name:

result = {}

for child in root.findall("item"):

   if child.getchildren() == []:

      result[child.tag] = child.text
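Since `getchildren()` is deprecated (and no longer available in Python 3.9 and later), the same idea can be sketched with `len()` and plain iteration. This sketch assumes `root` is the `<item>` element from the question and loops over its direct children rather than calling `findall("item")`.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, parsed so that root is the <item> element.
root = ET.fromstring(
    "<item><a1>value1</a1><a2>value2</a2><a3>value3</a3>"
    "<a4><a11>value222</a11><a22>value22</a22></a4></item>"
)

result = {}
for child in root:            # iterate over the direct children of <item>
    if len(child) == 0:       # no sub-elements, so .text holds the value
        result[child.tag] = child.text

print(result)
# {'a1': 'value1', 'a2': 'value2', 'a3': 'value3'}
```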

answered Sep 20 '14 at 16:14

marscher

3101212

getchildren is deprecated though since version 2.7. From the documentation: Use list(elem) or iteration.

– jlr
Sep 20 '14 at 16:15

you're right. It should not be used anymore

– marscher
Sep 20 '14 at 16:16


I would personally recommend that you use an xml parser that fully supports xpath expressions. The subset supported by xml.etree is insufficient for tasks like this.

For example, in lxml I can do:

"give me all children of the children of the <item> node":

doc.xpath('//item/*/child::*') #equivalent to '//item/*/*', if you're being terse

Out[18]: [<Element a11 at 0x7f60ec1c1348>, <Element a22 at 0x7f60ec1c1888>]

or,

"give me all of <item>'s children that have no children themselves":

doc.xpath('/item/*[count(child::*) = 0]')

Out[20]: 

[<Element a1 at 0x7f60ec1c1588>,

 <Element a2 at 0x7f60ec1c15c8>,

 <Element a3 at 0x7f60ec1c1608>]

or,

"give me ALL of the elements that don't have any children":

doc.xpath('//*[count(child::*) = 0]')

Out[29]: 

[<Element a1 at 0x7f60ec1c1588>,

 <Element a2 at 0x7f60ec1c15c8>,

 <Element a3 at 0x7f60ec1c1608>,

 <Element a11 at 0x7f60ec1c1348>,

 <Element a22 at 0x7f60ec1c1888>]



# and if I only care about the text from those nodes...

doc.xpath('//*[count(child::*) = 0]/text()')

Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
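For comparison, the three selections above can be approximated with the standard library alone. This is a sketch under the assumption that `root` is the `<item>` element of the question's sample; it combines ElementTree's limited path syntax with `len()` checks in place of the `count(child::*) = 0` predicate, which ElementTree does not support.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>""")

# Children of the children of <item>.
grandchildren = root.findall("./*/*")

# Children of <item> that have no children themselves.
childless = [e for e in root if len(e) == 0]

# Every element in the tree without children, and the text of each.
leaves = [e for e in root.iter() if len(e) == 0]
leaf_text = [e.text for e in leaves]

print([e.tag for e in grandchildren])  # ['a11', 'a22']
print([e.tag for e in childless])      # ['a1', 'a2', 'a3']
print(leaf_text)  # ['value1', 'value2', 'value3', 'value222', 'value22']
```

The XPath version keeps the selection logic in one expression; the sketch above trades that compactness for having no third-party dependency.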

edited Dec 17 '17 at 13:14

Mad Physicist

38k1674108

answered Sep 20 '14 at 16:17

roippi

20.1k33253

Suggesting lxml assumes there is a problem with performance and xpath features are lacking. It's definitely better than ElementTree but I wouldn't go this way if there is no problem with the latter, especially considering that lxml requires installation and it's not always a nice walk in the park.

– jlr
Sep 20 '14 at 17:47

Performance is a thing, yes, but full xpath support means that you do all the work of selecting nodes in one compact place. xpath queries take me a few seconds to write; writing python code to walk the tree and select the nodes I want takes longer and is far likelier to generate bugs. There are lots of benefits other than performance.

– roippi
Sep 20 '14 at 17:56


You can use the iter method to walk every element in the tree and collect the non-blank text:

import xml.etree.ElementTree as ET



etree = ET.parse('file.xml')

root = etree.getroot()

a = []

for child in root.iter():

    if child.text:

        if len(child.text.split()) > 0:

            a.append(child.text)

print(a)

answered May 21 '18 at 11:17

David Córdoba Ruiz
