Comparing 2 files containing IP addresses and returning the ones that are not common
This is what I have done so far:
import difflib
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    with open('diff.txt', 'w') as result:
        for line in diff:
            result.write(line)
python-3.x
asked Jan 2 at 22:38 by VKVNS
Could you include the expected output and the actual output? That would greatly help our efforts to help you.
– GeeTransit
Jan 2 at 22:50
What you are trying is too complicated. It may be better to load the second text file into a string or a list of strings, then iterate over the lines of 'node_list.txt' and check whether each IP from it is contained in that string with Python's str.find(), after removing the "\n" and "\r" characters. I have done this, but in C.
– Baj Mile
Jan 2 at 22:58
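A minimal Python sketch of the approach described in this comment: read the second file once, walk the lines of node_list.txt in order, strip the line endings, and keep each entry the second file does not contain. It swaps the single-string str.find() check for a set of whole lines, because a substring search would report ip-10-232-10-15 as present whenever ip-10-232-10-150 is. The function name missing_from_second and the in-memory sample data are hypothetical.

```python
import io


def missing_from_second(first_file, second_file):
    """Return entries of first_file (in file order) that are absent from second_file."""
    # Load the second file once; strip() removes '\n' and '\r' alike
    known = {line.strip() for line in second_file if line.strip()}
    missing = []
    for line in first_file:
        entry = line.strip()
        # Whole-entry membership test, not a substring search
        if entry and entry not in known:
            missing.append(entry)
    return missing


# io.StringIO stands in for open('node_list.txt') and open('aws_instances_dnsname.txt')
node_list = io.StringIO('ip-10-232-10-15\r\nip-10-232-10-149\r\nip-10-232-10-151\r\n')
aws_names = io.StringIO('ip-10-232-10-149\nip-10-232-10-150\n')
result = missing_from_second(node_list, aws_names)
print(result)
```

Because the membership test uses whole entries, ip-10-232-10-15 is correctly reported as missing even though it is a prefix of ip-10-232-10-150.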
The result I got is mostly repeating the whole list of file1, so there's something I am missing with finding the difference
– VKVNS
Jan 2 at 23:03
You could put set() around your IPs and then get the difference that way. This means you wouldn't need difflib.
– GeeTransit
Jan 2 at 23:03
@GeeTransit maybe I should try that.
– VKVNS
Jan 2 at 23:04
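difflib.ndiff() yields every line of both inputs, each prefixed with a two-character code: '- ' for lines only in the first sequence, '+ ' for lines only in the second, '  ' for lines in both, and '? ' for intraline hints. Writing every yielded line therefore reproduces most of both files, which fits the output described in the comments above. A minimal sketch that keeps only the '- ' lines, assuming both lists are in the same order (ndiff aligns the sequences, so an entry that merely moved position can show up as both removed and added); the helper name removed_lines is hypothetical:

```python
import difflib


def removed_lines(lines_a, lines_b):
    """Return the lines ndiff marks as present only in lines_a."""
    # Normalise line endings so 'x' and 'x\n' compare equal
    a = [line.strip() for line in lines_a if line.strip()]
    b = [line.strip() for line in lines_b if line.strip()]
    # Keep only '- ' entries and drop the two-character prefix
    return [line[2:] for line in difflib.ndiff(a, b) if line.startswith('- ')]


# In practice these would come from f1.readlines() and f2.readlines()
node_list = ['ip-10-232-10-149\n', 'ip-10-232-10-150\n', 'ip-10-232-10-151\n', 'ip-10-232-10-152\n']
aws_instances = ['ip-10-232-10-149\n', 'ip-10-232-10-152\n']
missing = removed_lines(node_list, aws_instances)
print(missing)  # ['ip-10-232-10-150', 'ip-10-232-10-151']
```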
3 Answers
Assuming both files are sorted (or otherwise list their entries in the same order), GNU diff can do this directly:
diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt
For sufficiently large files, the one-time cost of starting a subprocess will be smaller than the cost of doing the comparison in Python.
Example using the subprocess module (note that diff is expected to return 1 when the files differ, so we can't just use check_call):
#!/usr/bin/env python3
import subprocess

INPUT1 = 'file1.txt'
INPUT2 = 'file2.txt'
OUTPUT = 'common.txt'

with open(OUTPUT, 'wb') as out:
    # Print only the lines both files share; drop lines unique to either file
    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]
    rv = subprocess.call(cmd, stdout=out)
    # diff exits with 0 (identical) or 1 (different); 2 or more signals an error
    if rv >= 2:
        raise subprocess.CalledProcessError(rv, cmd)
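As written, the command keeps the lines the two files have in common (%L is given to --unchanged-line-format) and writes them to common.txt. The question asks for the opposite, so here is a sketch of the same subprocess approach with %L moved to --old-line-format, which keeps only the lines that appear in the first file but not in the second. It assumes GNU diff is on the PATH and that both files are sorted the same way; the function name lines_only_in_first and the temporary demo files are illustrative, not part of the original answer.

```python
import os
import subprocess
import tempfile


def lines_only_in_first(input1, input2, output):
    """Write the lines of input1 that diff finds no match for in input2 to output.

    Hypothetical helper; assumes GNU diff and identically sorted inputs.
    """
    # %L prints a line unchanged; only "old" lines (unique to input1) use it
    cmd = ['diff', '--old-line-format=%L', '--new-line-format=',
           '--unchanged-line-format=', input1, input2]
    with open(output, 'wb') as out:
        rv = subprocess.call(cmd, stdout=out)
    # 0 = no differences, 1 = differences found, 2 or more = trouble
    if rv >= 2:
        raise subprocess.CalledProcessError(rv, cmd)


# Demo on sorted sample host lists written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, 'file1.txt')
    file2 = os.path.join(tmp, 'file2.txt')
    only1 = os.path.join(tmp, 'only_in_file1.txt')
    with open(file1, 'w') as f:
        f.write('ip-10-232-10-149\nip-10-232-10-150\nip-10-232-10-151\nip-10-232-10-152\n')
    with open(file2, 'w') as f:
        f.write('ip-10-232-10-149\nip-10-232-10-152\n')
    lines_only_in_first(file1, file2, only1)
    with open(only1) as f:
        result = f.read().splitlines()

print(result)
```

Swapping the empty and %L formats is the only change; the exit-status handling stays the same as in the answer's code.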
edited Jan 3 at 0:53
answered Jan 3 at 0:07 by o11c
I didn't get the answer
– VKVNS
Jan 3 at 0:09
Edited to add full code, then.
– o11c
Jan 3 at 0:53
If your files were like this:
File 1:
apple
banana
kokonut
orange
File 2:
banana
strawberry
orange
lime
You could try something like this:
with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    # Collect each file's non-blank lines, without the trailing newline
    first = {line.strip() for line in f1 if line.strip()}
    second = {line.strip() for line in f2 if line.strip()}

diff = first - second  # set difference (items in first not in second)

with open('diff.txt', 'w') as result:
    # Sets are unordered, so sort for a stable output file
    for item in sorted(diff):
        result.write(item + '\n')
This would write the following to diff.txt:
apple
kokonut
Note how banana and orange don't appear in the file we've created.
edited Jan 3 at 0:59
answered Jan 2 at 23:08 by GeeTransit
Actually, this is just concatenating the 2 lists into a single file
– VKVNS
Jan 2 at 23:19
Okay. I've fixed it so it doesn't test with the \n characters on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.
– GeeTransit
Jan 2 at 23:29
It is working but it is just returning both the files in diff.txt
– VKVNS
Jan 2 at 23:52
Did you want only one of the files to be checked?
– GeeTransit
Jan 2 at 23:52
file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152; file b = ip-10-232-10-149 ip-10-232-10-152. I want only ip-10-232-10-150 and ip-10-232-10-151.
– VKVNS
Jan 2 at 23:54
Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.
# input file structure
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
nodes = set(f1.readlines())
dnsnames = set(f2.readlines())
diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))
diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))
ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))
for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))
**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
####################################################
####################################################
# input file structure
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
nodes = f1.read()
dnsnames = f2.read()
split_nodes = [x for x in nodes.split()]
split_dnsnames = [x for x in dnsnames.split()]
set_nodes = set(split_nodes)
set_dnsnames = set(split_dnsnames)
diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))
diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))
ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))
for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))
**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
Here is the way to accomplish this task using difflib and your original code.
import difflib
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
diff = difflib.ndiff(f1.readlines(), f2.readlines())
with open('diff.txt', 'w') as result:
for line in diff:
if re.search(r'-s(.*)', line):
print (line)
result.write(line)
**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151
Here is another way to accomplish this task, which using list comprehension.
node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145','ip-10-232-10-146','ip-10-232-10-147','ip-10-232-10-149', 'ip-10-232-10-152')
ip_address_differences = [ip_address for ip_address in node_list if
ip_address not in aws_instances_dnsname]
print (ip_address_differences)
**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']
This code is just adding in all the hosts from both the files, It is not really doing what I intend it to do. (which is to remove the duplicate ones from file1 present in file2)
– VKVNS
Jan 3 at 12:51
Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.
– Life is complex
Jan 3 at 14:08
add a comment |
Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.
# input file structure
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
nodes = set(f1.readlines())
dnsnames = set(f2.readlines())
diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))
diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))
ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))
for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))
**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
####################################################
####################################################
# input file structure
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
nodes = f1.read()
dnsnames = f2.read()
split_nodes = [x for x in nodes.split()]
split_dnsnames = [x for x in dnsnames.split()]
set_nodes = set(split_nodes)
set_dnsnames = set(split_dnsnames)
diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))
diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))
ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))
for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))
**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
Here is the way to accomplish this task using difflib and your original code.
import difflib
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
diff = difflib.ndiff(f1.readlines(), f2.readlines())
with open('diff.txt', 'w') as result:
for line in diff:
if re.search(r'-s(.*)', line):
print (line)
result.write(line)
**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151
Here is another way to accomplish this task, which using list comprehension.
node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145','ip-10-232-10-146','ip-10-232-10-147','ip-10-232-10-149', 'ip-10-232-10-152')
ip_address_differences = [ip_address for ip_address in node_list if
ip_address not in aws_instances_dnsname]
print (ip_address_differences)
**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']
This code is just adding in all the hosts from both the files, It is not really doing what I intend it to do. (which is to remove the duplicate ones from file1 present in file2)
– VKVNS
Jan 3 at 12:51
Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.
– Life is complex
Jan 3 at 14:08
add a comment |
Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.
# input file structure
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
nodes = set(f1.readlines())
dnsnames = set(f2.readlines())
diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))
diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))
ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))
for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))
**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
####################################################
####################################################
# input file structure
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
nodes = f1.read()
dnsnames = f2.read()
split_nodes = [x for x in nodes.split()]
split_dnsnames = [x for x in dnsnames.split()]
set_nodes = set(split_nodes)
set_dnsnames = set(split_dnsnames)
diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))
diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))
ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))
for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))
**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
Here is the way to accomplish this task using difflib and your original code.
import difflib
with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
diff = difflib.ndiff(f1.readlines(), f2.readlines())
with open('diff.txt', 'w') as result:
for line in diff:
if re.search(r'-s(.*)', line):
print (line)
result.write(line)
**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151
Here is another way to accomplish this task, which using list comprehension.
node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145', 'ip-10-232-10-146', 'ip-10-232-10-147', 'ip-10-232-10-149', 'ip-10-232-10-152')
ip_address_differences = [ip_address for ip_address in node_list
                          if ip_address not in aws_instances_dnsname]
print(ip_address_differences)
**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']
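The list comprehension keeps the order of node_list, but `in` on a tuple or list scans it element by element. For long host lists, building a set from the second collection once makes each membership test an average O(1) hash lookup, with the same result and order. A sketch using the same sample tuples:

```python
node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145', 'ip-10-232-10-146', 'ip-10-232-10-147',
                         'ip-10-232-10-149', 'ip-10-232-10-152')

# Build the lookup set once; each membership test is then a hash lookup.
aws_lookup = set(aws_instances_dnsname)
ip_address_differences = [ip for ip in node_list if ip not in aws_lookup]

print(ip_address_differences)  # ['ip-10-232-10-150', 'ip-10-232-10-151']
```

With only a handful of names the difference is negligible; it matters once the files hold thousands of hosts.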
edited Jan 3 at 21:08
answered Jan 3 at 5:01


Life is complex
This code is just adding in all the hosts from both files; it is not really doing what I intend it to do (which is to remove the entries from file1 that are duplicated in file2).
– VKVNS
Jan 3 at 12:51
Your question didn't ask to remove the duplicate IP addresses from file1 that are present in file2. You stated that the output from processing your input files should be "I want only ip-10-232-10-150 ip-10-232-10-151", which my answer provides without using difflib.
– Life is complex
Jan 3 at 14:08