Check if XML Element has children or not, in ElementTree
I retrieve an XML document this way:
import urllib2
import xml.etree.ElementTree as ET
root = ET.parse(urllib2.urlopen(url))
for child in root.findall("item"):
    a1 = child[0].text # ok
    a2 = child[1].text # ok
    a3 = child[2].text # ok
    a4 = child[3].text # BOOM
    # ...
The XML looks like this:
<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>
How do I check whether a4 (in this particular case, but it might be any other element) has children?
python xml elementtree children
edited Jan 1 at 6:38 by smci
asked Sep 20 '14 at 16:01 by アレックス
5 Answers
You could call list() on the element:
>>> import xml.etree.ElementTree as ET
>>> xml = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""
>>> root = ET.fromstring(xml)
>>> list(root[0])
[]
>>> list(root[3])
[<Element 'a11' at 0x2321e10>, <Element 'a22' at 0x2321e48>]
>>> len(list(root[3]))
2
>>> print("has children" if len(list(root[3])) else "no child")
has children
>>> print("has children" if len(list(root[2])) else "no child")
no child
>>> # Or simpler, without a call to list within len, it also works:
>>> print("has children" if len(root[3]) else "no child")
has children
I modified your sample because calling findall("item") on the item root finds nothing: findall searches the direct children of the current element, not the element itself. If you want to access the text of the subchildren afterward in your working program, you could do:
for child in root.findall("item"):
    # if there are children, get their text content as well.
    if len(child):
        for subchild in child:
            subchild.text
    # else just get the current child text.
    else:
        child.text
This would be a good fit for a recursive function, though.
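The recursive approach hinted at above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper name collect_text is hypothetical, and it assumes you want the stripped text of every element that has no children.

```python
import xml.etree.ElementTree as ET


def collect_text(elem):
    """Recursively gather the stripped text of every childless element under elem."""
    if len(elem) == 0:
        # Leaf element: keep its text if it is not empty or whitespace-only.
        text = (elem.text or "").strip()
        return [text] if text else []
    texts = []
    for child in elem:
        # Element with children: descend into each child in document order.
        texts.extend(collect_text(child))
    return texts


xml = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""
root = ET.fromstring(xml)
texts = collect_text(root)
print(texts)
```

Because the function branches on len(elem), the same check answers the original question at every level of nesting.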
edited Sep 20 '14 at 17:50
answered Sep 20 '14 at 16:14 by jlr
doesn't work. Could you use my example with iteration? – アレックス Sep 20 '14 at 16:28
it does not work, because your iteration loop yields no elements, since there are no elements named 'item' – marscher Sep 20 '14 at 16:36
yes, it yields them in my real application. – アレックス Sep 20 '14 at 16:43
how do I get "<a11>" and "<a22>" elements? – アレックス Sep 20 '14 at 16:44
It works; check this pythonfiddle: pythonfiddle.com/check-if-element-has-children-or-not. Otherwise, tell me exactly what did not work. Your sample did not work, though, which is why I modified it. Let me modify my answer to tell you how to access the subchildren. – jlr Sep 20 '14 at 17:34
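The disagreement in these comments seems to come down to what findall searches. A minimal sketch, assuming the real document wraps <item> in some outer element (the <items> wrapper here is hypothetical and does not appear in the question):

```python
import xml.etree.ElementTree as ET

item_xml = "<item><a1>value1</a1><a4><a11>value222</a11></a4></item>"

# Case 1: <item> itself is the root, as in the sample XML from the question.
root_is_item = ET.fromstring(item_xml)
found_on_item = root_is_item.findall("item")  # looks only at children of <item>

# Case 2: <item> sits inside a wrapper element (hypothetical <items>).
wrapped = ET.fromstring("<items>" + item_xml + "</items>")
found_on_wrapper = wrapped.findall("item")  # <item> is now a direct child

print(len(found_on_item), len(found_on_wrapper))
```

In the first case the loop body never runs, which would explain why the sample "did not work" while the asker's real application did.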
The simplest way I have been able to find is to use the bool value of the element directly. This means you can use a4 in a conditional statement as-is:
from xml.etree.ElementTree import Element
a4 = Element('a4')
if a4:
    print('Has kids')
else:
    print('No kids yet')
a4.append(Element('x'))
if a4:
    print('Has kids now')
else:
    print('Still no kids')
Running this code will print
No kids yet
Has kids now
The boolean value of an element does not say anything about text, tail or attributes. It only indicates the presence or absence of children, which is what the original question was asking.
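One caveat worth knowing: as far as I am aware, recent Python versions (3.12 onward) emit a DeprecationWarning when an Element is tested for truth, and the ElementTree documentation suggests an explicit len(elem) or "elem is None" test instead. A small sketch of the explicit form, where has_children is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def has_children(elem):
    """Return True if elem has at least one child element."""
    return len(elem) > 0


a4 = ET.Element("a4")
before = has_children(a4)   # no children appended yet
a4.append(ET.Element("x"))
after = has_children(a4)    # one child now
print(before, after)
```

The explicit length check also avoids confusing "element not found" (None) with "element found but childless".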
edited Jul 22 '16 at 18:20
answered Jul 22 '16 at 18:13 by Mad Physicist
The Element class has a getchildren() method, so you could use something like this to check whether there are children and store the result in a dictionary keyed by tag name:
result = {}
for child in root.findall("item"):
    # an empty list means this element has no children
    if child.getchildren() == []:
        result[child.tag] = child.text
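Since getchildren() is deprecated (and, as far as I know, no longer available in recent Python 3 releases), the same idea can be sketched with len(). This version assumes the sample XML from the question, with <item> as the root, so it iterates over the root's children directly rather than calling findall("item"):

```python
import xml.etree.ElementTree as ET

xml = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""
root = ET.fromstring(xml)

result = {}
for child in root:
    # len(child) counts direct child elements; 0 means a leaf.
    if len(child) == 0:
        result[child.tag] = child.text
print(result)
```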
answered Sep 20 '14 at 16:14 by marscher
getchildren is deprecated though since version 2.7. From the documentation: Use list(elem) or iteration. – jlr Sep 20 '14 at 16:15
you're right. It should not be used anymore – marscher Sep 20 '14 at 16:16
I would personally recommend that you use an XML parser that fully supports XPath expressions. The subset supported by xml.etree is insufficient for tasks like this.
For example, in lxml I can do:
"give me all children of the children of the <item> node":
doc.xpath('//item/*/child::*') #equivalent to '//item/*/*', if you're being terse
Out[18]: [<Element a11 at 0x7f60ec1c1348>, <Element a22 at 0x7f60ec1c1888>]
or,
"give me all of <item>'s children that have no children themselves":
doc.xpath('/item/*[count(child::*) = 0]')
Out[20]:
[<Element a1 at 0x7f60ec1c1588>,
 <Element a2 at 0x7f60ec1c15c8>,
 <Element a3 at 0x7f60ec1c1608>]
or,
"give me ALL of the elements that don't have any children":
doc.xpath('//*[count(child::*) = 0]')
Out[29]:
[<Element a1 at 0x7f60ec1c1588>,
 <Element a2 at 0x7f60ec1c15c8>,
 <Element a3 at 0x7f60ec1c1608>,
 <Element a11 at 0x7f60ec1c1348>,
 <Element a22 at 0x7f60ec1c1888>]
# and if I only care about the text from those nodes...
doc.xpath('//*[count(child::*) = 0]/text()')
Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
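For comparison, here is a rough standard-library sketch of the same three selections on this particular document. It is not a general replacement for full XPath: it combines the limited path syntax that xml.etree.ElementTree does support with plain len() filtering, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

xml = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""
root = ET.fromstring(xml)  # root is the <item> element

# All children of the children of <item>.
grandchildren = root.findall("./*/*")

# Children of <item> that have no children themselves.
leaf_children = [child for child in root if len(child) == 0]

# Every element in the tree without children, and the text of those elements.
all_leaves = [elem for elem in root.iter() if len(elem) == 0]
leaf_text = [elem.text for elem in all_leaves]

print([e.tag for e in grandchildren])
print([e.tag for e in leaf_children])
print(leaf_text)
```

The count(child::*) predicate has no direct equivalent in the ElementTree path subset, which is why the filtering happens in Python here.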
edited Dec 17 '17 at 13:14 by Mad Physicist
answered Sep 20 '14 at 16:17 by roippi
Suggesting lxml assumes there is a problem with performance and xpath features are lacking. It's definitely better than ElementTree but I wouldn't go this way if there is no problem with the latter, especially considering that lxml requires installation and it's not always a nice walk in the park. – jlr Sep 20 '14 at 17:47
Performance is a thing, yes, but full xpath support means that you do all the work of selecting nodes in one compact place. xpath queries take me a few seconds to write; writing python code to walk the tree and select the nodes I want takes longer and is far likelier to generate bugs. There are lots of benefits other than performance. – roippi Sep 20 '14 at 17:56
You can use the iter method:
import xml.etree.ElementTree as ET
etree = ET.parse('file.xml')
root = etree.getroot()
a = []
# iter() walks the root and all of its descendants in document order
for child in root.iter():
    if child.text:
        # skip text that is only whitespace
        if len(child.text.split()) > 0:
            a.append(child.text)
print(a)
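To tie iter() back to the original question, a small sketch that records, for every element in the tree, whether it has children. It parses the question's sample from a string rather than from 'file.xml', and the dictionary name is an assumption made for illustration:

```python
import xml.etree.ElementTree as ET

xml = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""
root = ET.fromstring(xml)

# Map each tag to True when the element has at least one child element.
# (Tags are unique in this sample; repeated tags would overwrite each other.)
has_children_by_tag = {elem.tag: len(elem) > 0 for elem in root.iter()}
print(has_children_by_tag)
```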
answered May 21 '18 at 11:17 by David Córdoba Ruiz