Scraping with Python and Selenium - how should I return a 'null' if element not present
Good Day,
I am a newbie to Python and Selenium and have searched for a solution for a while now. While some answers come close, I can't seem to find one that solves my problem. The snippet of my code that is causing the problem is as follows:
    for url in links:
        driver.get(url)
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
While this works if all elements are present (and I can see the output in the Pandas dataframe), if one of the elements doesn't exist (either 'date' or 'title') Python raises the error:
IndexError: list index out of range
What I have tried thus far:
1) created a try/except (doesn't work)
2) tried if/else (if variable is not "")
I would like to insert "Null" if an element doesn't exist, so that the Pandas dataframe is populated with "Null" for that field.
Any assistance and guidance would be greatly appreciated.
EDIT 1:
I have tried the following:
    for url in links:
        driver.get(url)
        try:
            company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
            date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
            title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
            urlinf = driver.current_url #url info
        except:
            pass
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
and:
    for url in links:
        driver.get(url)
        try:
            company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
            date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
            title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
            urlinf = driver.current_url #url info
        except (NoSuchElementException, ElementNotVisibleException, InvalidSelectorException):
            pass
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
and:
    for url in links:
        driver.get(url)
        try:
            company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
            date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
            title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
            urlinf = driver.current_url #url info
        except:
            i = 'Null'
            pass
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
I tried the same try/except at the point of appending to Pandas.
EDIT 2:
The error I get:
IndexError: list index out of range
is attributed to the line:
    df = df.append({'Company': company[i].text, 'Date': date[i].text,
                    'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
python selenium selenium-chromedriver screen-scraping
Can you show your attempts with the try/except? That is the best way to handle errors and ignore them if needed.
– Moshe Slavin
Nov 22 '18 at 6:57
I've tried quite a few iterations and overwrote each one when I found that it didn't work, but I have added what I tried to my question.
– qbbq
Nov 22 '18 at 7:36
I'll take a look...
– Moshe Slavin
Nov 22 '18 at 10:10
I posted an answer; let me know if you need any other assistance!
– Moshe Slavin
Nov 22 '18 at 10:27
asked Nov 22 '18 at 5:12 by qbbq, edited Nov 22 '18 at 7:48
1 Answer
As your error shows, you have an index error.
To overcome it, add a try/except around the code that raises this error.
Also, you are using driver.current_url, which returns the URL as a single string.
But in your inner for loop you are trying to index it as if it were a list... this may be the origin of your error.
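To illustrate the point about indexing the URL, here is a minimal sketch. The URL is a hypothetical stand-in for whatever driver.current_url returns; no browser is needed to run it. Note that indexing past the end of a string reports "string index out of range", so the "list index out of range" message in the question suggests the failing index was on one of the element lists rather than on the URL.

```python
# Hypothetical stand-in for driver.current_url, which is a plain str.
urlinf = "https://example.com/node/1"

# Indexing a string returns a single character, not the whole URL.
first_char = urlinf[0]

# Indexing past the end of the string raises IndexError as well.
try:
    urlinf[len(urlinf)]
    raised = False
except IndexError:
    raised = True
```

So 'URL': urlinf[i] would store one character per row at best; 'URL': urlinf stores the whole address.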
In your case try this:
    for url in links:
        driver.get(url)
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
        num_page_items = len(date)
        for i in range(num_page_items):
            try:
                df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf}, ignore_index=True)
            except IndexError:
                df.append(None) # or df.append('Null')
Hope you find this helpful!
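If the goal is a "Null" in the specific field that is missing (rather than skipping the row), one possible approach is to look up each field separately and fall back to a placeholder. The sketch below is an assumption-laden illustration, not the original poster's code: FakeElement, text_or_null and build_rows are hypothetical names, and FakeElement only mimics the .text attribute of a Selenium element so the example runs without a browser. It also assumes that missing entries are at the end of each list; if an entry is missing in the middle, matching by position will pair the wrong values.

```python
NULL = "Null"  # placeholder the question asks for


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is used."""

    def __init__(self, text):
        self.text = text


def text_or_null(elements, i, default=NULL):
    """Return elements[i].text, or the default when the list is too short."""
    return elements[i].text if i < len(elements) else default


def build_rows(company, date, title, url):
    """Build one dict per item, padding missing fields with the placeholder."""
    n = max(len(company), len(date), len(title))
    return [
        {
            "Company": text_or_null(company, i),
            "Date": text_or_null(date, i),
            "Title": text_or_null(title, i),
            "URL": url,
        }
        for i in range(n)
    ]


company = [FakeElement("Acme"), FakeElement("Globex")]
date = [FakeElement("22 Nov 2018")]  # the second date is missing
title = []                           # no title was found at all
rows = build_rows(company, date, title, "https://example.com/node/1")
```

The resulting list of dicts can be passed to pandas.DataFrame(rows) once the loop is done. That also avoids DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0.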
This solution works! Thank you very much, I really appreciate it.
– qbbq
Nov 22 '18 at 11:27
Just as a matter of interest, I tried df.append('Null') and got this error message: TypeError: cannot concatenate object of type "<type 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
– qbbq
Nov 22 '18 at 11:28
It's a pandas issue probably... Just use None...
– Moshe Slavin
Nov 22 '18 at 11:31
Glad to help!!!
– Moshe Slavin
Nov 22 '18 at 11:33
Just an update to this: I decided to write directly to a CSV. However, with the original solution, the "None" / Null was creating a line break instead of making the variable = "Null". As a result I have added the following:
blank = "blank"
and
except IndexError: with open('results.csv', 'a') as f: f.write(blank)
However, my data in the CSV is getting offset by the missing value. Would you suggest I create if statements in the loop to check if the variable = ""?
– qbbq
Nov 26 '18 at 3:07
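On the CSV offset raised in the last comment: one way to keep columns aligned is to write every row through csv.DictWriter from the standard library, whose restval argument supplies a value for any column a row lacks. This is only a sketch under assumptions: FIELDS, write_rows and the sample rows are hypothetical, and an in-memory buffer stands in for results.csv.

```python
import csv
import io

FIELDS = ["Company", "Date", "Title", "URL"]  # assumed column order


def write_rows(handle, rows, missing="Null"):
    """Write dict rows as CSV, filling absent columns with the placeholder."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval=missing)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buffer = io.StringIO()  # stand-in for open('results.csv', 'w', newline='')
write_rows(buffer, [
    {"Company": "Acme", "Date": "22 Nov 2018", "Title": "Report",
     "URL": "https://example.com/1"},
    {"Company": "Globex", "URL": "https://example.com/2"},  # no date or title
])
lines = buffer.getvalue().splitlines()
```

Because each row is written as a whole record with a fixed set of columns, a missing value becomes "Null" in its own column instead of shifting the later values to the left.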
answered Nov 22 '18 at 10:23 by Moshe Slavin