Comparing 2 files containing IP addresses and returning ones that are not common












-1















This is what I have done so far:



import difflib

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    with open('diff.txt', 'w') as result:
        for line in diff:
            result.write(line)
































  • Could you include the expected output and the actual output? That would greatly help our efforts to help you.

    – GeeTransit
    Jan 2 at 22:50











  • What you are trying is too complicated. It may be better to load the second text file into a string or a list of strings, then iterate over the lines of 'node_list.txt' and check whether each IP from that file is contained in the string with Python's str.find(), after removing only the "\n" and "\r" characters. I have done this, but in C.

    – Baj Mile
    Jan 2 at 22:58











  • The result I got is mostly repeating the whole list of file1, so there's something I am missing with finding the difference

    – VKVNS
    Jan 2 at 23:03











  • You could put set() around your IPs and then get the difference that way. This means you wouldn't need difflib.

    – GeeTransit
    Jan 2 at 23:03













  • @GeeTransit maybe I should try that.

    – VKVNS
    Jan 2 at 23:04


















python-3.x
















asked Jan 2 at 22:38









VKVNS

























3 Answers


















0














Assuming the files are sorted or otherwise deterministically-ordered:



diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt


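For sufficiently large files, the one-time cost of starting a subprocess will be smaller than the cost of doing the comparison logic in Python. Note that the command above prints the lines the two files have in common; with GNU diff, to print the lines that appear only in file1.txt instead, set --old-line-format='%L' and leave the other two formats empty.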



Example using subprocess module (note that diff is expected to return 1, so we can't just use check_call):



#!/usr/bin/env python3

import subprocess


INPUT1 = 'file1.txt'
INPUT2 = 'file2.txt'
OUTPUT = 'common.txt'

with open(OUTPUT, 'wb') as out:
    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]
    rv = subprocess.call(cmd, stdout=out)
    # diff exits with 0 (no differences) or 1 (differences found); 2 or more means trouble
    if rv >= 2:
        raise subprocess.CalledProcessError(rv, cmd)































  • I didn't get the answer

    – VKVNS
    Jan 3 at 0:09











  • Edited to add full code, then.

    – o11c
    Jan 3 at 0:53



















0














If your files were like this:

File 1:

apple
banana
coconut
orange

File 2:

banana
strawberry
orange
lime

You could try something like this:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    # strip() removes the trailing newline; blank lines are skipped
    first = {line.strip() for line in f1 if line.strip()}
    second = {line.strip() for line in f2 if line.strip()}

diff = first - second  # set difference (items in first not in second)

with open('diff.txt', 'w') as result:
    for item in sorted(diff):
        result.write(item + '\n')

This would write the result to diff.txt, which would contain:

apple
coconut

Note how banana and orange don't appear in the file we've created.
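Sets do not remember the order of the first file, so if the original order matters, one option is to keep file 1 as a sequence and use a set only for the lookup against file 2. The following is a minimal sketch of that idea, not part of the original answer; the function name only_in_first and the sample lists are made up for illustration (the sample values come from the asker's comment).

```python
def only_in_first(first_lines, second_lines):
    """Return entries from first_lines that are absent from second_lines.

    The order of the first input is kept; blank lines and repeated
    entries are dropped.
    """
    # Build the lookup set once so each membership test is O(1) on average.
    seen_in_second = {line.strip() for line in second_lines if line.strip()}

    result = []
    already_emitted = set()
    for line in first_lines:
        entry = line.strip()
        if entry and entry not in seen_in_second and entry not in already_emitted:
            result.append(entry)
            already_emitted.add(entry)
    return result


# Hypothetical sample data, mirroring the asker's two files.
file_a = ["ip-10-232-10-149\n", "ip-10-232-10-150\n",
          "ip-10-232-10-151\n", "ip-10-232-10-152\n"]
file_b = ["ip-10-232-10-149\n", "ip-10-232-10-152"]

print(only_in_first(file_a, file_b))
# prints ['ip-10-232-10-150', 'ip-10-232-10-151']
```

Because open file objects iterate over their lines, the same function should also accept two files opened with open() in place of the lists.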
































  • Actually, this is just concatenating the 2 lists into a single file

    – VKVNS
    Jan 2 at 23:19











  • Okay. I've fixed it so it doesn't test with the \n's on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

    – GeeTransit
    Jan 2 at 23:29











  • It is working but it is just returning both the files in diff.txt

    – VKVNS
    Jan 2 at 23:52











  • Did you want only one of the files to be checked?

    – GeeTransit
    Jan 2 at 23:52











  • file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

    – VKVNS
    Jan 2 at 23:54





















0














Here are 2 ways to accomplish this task without using difflib. The methods differ because they assume different input file structures.



# input file structure
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    # drop the trailing newline so a last line without one still compares equal
    nodes = set(line.rstrip('\n') for line in f1)
    dnsnames = set(line.rstrip('\n') for line in f2)

# entries present in exactly one of the two files
diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))
diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))

ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))

for ip_address in sorted(ip_address_differences):
    print(ip_address)

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151

####################################################
####################################################

# input file structure
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    nodes = f1.read()
    dnsnames = f2.read()

# split() with no arguments splits on any whitespace, including newlines
set_nodes = set(nodes.split())
set_dnsnames = set(dnsnames.split())

# entries present in exactly one of the two files
diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))
diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))

ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))

for ip_address in sorted(ip_address_differences):
    print(ip_address)

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151


Here is a way to accomplish this task using difflib and your original code.

import difflib
import re

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    with open('diff.txt', 'w') as result:
        for line in diff:
            # ndiff prefixes lines found only in the first file with "- "
            if re.match(r'-\s(.*)', line):
                print(line, end='')
                result.write(line)

**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151
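The prefix filtering above can also be wrapped in a small function that removes the "- " marker again. This is only a sketch: the name removed_lines and the sample lists are hypothetical. One caveat is that ndiff compares the sequences in order, so a line that exists in both files at very different positions may still be reported with "- "; for a pure membership test a set is the safer tool.

```python
import difflib


def removed_lines(first_lines, second_lines):
    """Return the lines that ndiff marks as present only in first_lines."""
    removed = []
    for line in difflib.ndiff(first_lines, second_lines):
        # ndiff prefixes: "- " only in first, "+ " only in second,
        # "  " common to both, "? " hint line that is in neither input.
        if line.startswith("- "):
            removed.append(line[2:].rstrip("\n"))
    return removed


# Hypothetical sample data, mirroring the asker's two files.
first = ["ip-10-232-10-149\n", "ip-10-232-10-150\n",
         "ip-10-232-10-151\n", "ip-10-232-10-152\n"]
second = ["ip-10-232-10-149\n", "ip-10-232-10-152\n"]

print(removed_lines(first, second))
# prints ['ip-10-232-10-150', 'ip-10-232-10-151']
```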


Here is another way to accomplish this task, which uses a list comprehension.

node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145', 'ip-10-232-10-146', 'ip-10-232-10-147', 'ip-10-232-10-149', 'ip-10-232-10-152')

# keep only the entries of node_list that are missing from aws_instances_dnsname
ip_address_differences = [ip_address for ip_address in node_list
                          if ip_address not in aws_instances_dnsname]

print(ip_address_differences)

**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']
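It may help to see the two notions of "difference" side by side. The file-based snippets earlier in this answer combine both directions, which amounts to a symmetric difference, while the list comprehension is one-sided. The sketch below uses made-up sample sets (the variable names are reused from the answer only for readability) and Python's built-in set operators.

```python
# Hypothetical sample data; the second set has one host the first lacks.
nodes = {"ip-10-232-10-149", "ip-10-232-10-150",
         "ip-10-232-10-151", "ip-10-232-10-152"}
dnsnames = {"ip-10-232-10-145", "ip-10-232-10-149", "ip-10-232-10-152"}

# One-sided difference: hosts in nodes that are missing from dnsnames.
only_in_nodes = nodes - dnsnames

# Symmetric difference: hosts that appear in exactly one of the two sets.
in_exactly_one = nodes ^ dnsnames

print(sorted(only_in_nodes))
# prints ['ip-10-232-10-150', 'ip-10-232-10-151']
print(sorted(in_exactly_one))
# prints ['ip-10-232-10-145', 'ip-10-232-10-150', 'ip-10-232-10-151']
```

When the second file is a strict subset of the first, as in the asker's sample, both operations give the same result; they diverge as soon as the second file contains hosts the first one lacks.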































  • This code is just adding in all the hosts from both the files, It is not really doing what I intend it to do. (which is to remove the duplicate ones from file1 present in file2)

    – VKVNS
    Jan 3 at 12:51











  • Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.

    – Life is complex
    Jan 3 at 14:08





































Answer using diff and subprocess: answered Jan 3 at 0:07 by o11c, edited Jan 3 at 0:53













Answer using Python sets (fruit example): answered Jan 2 at 23:08 by GeeTransit, edited Jan 3 at 0:59













  • Actually, this is just concatenating the 2 lists into a single file

    – VKVNS
    Jan 2 at 23:19











  • Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

    – GeeTransit
    Jan 2 at 23:29











  • It is working but it is just returning both the files in diff.txt

    – VKVNS
    Jan 2 at 23:52











  • Did you want only one of the files to be checked?

    – GeeTransit
    Jan 2 at 23:52











  • file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

    – VKVNS
    Jan 2 at 23:54





















  • Actually, this is just concatenating the 2 lists into a single file

    – VKVNS
    Jan 2 at 23:19











  • Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

    – GeeTransit
    Jan 2 at 23:29











  • It is working but it is just returning both the files in diff.txt

    – VKVNS
    Jan 2 at 23:52











  • Did you want only one of the files to be checked?

    – GeeTransit
    Jan 2 at 23:52











  • file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

    – VKVNS
    Jan 2 at 23:54



















Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19





Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19













Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29





Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29













It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52





It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52













Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52





Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52













file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54







file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54













0














Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.



# input file structure
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:

nodes = set(f1.readlines())
dnsnames = set(f2.readlines())

diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))
diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))

ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))

for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151

####################################################
####################################################

# input file structure
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:

nodes = f1.read()
dnsnames = f2.read()

split_nodes = [x for x in nodes.split()]
split_dnsnames = [x for x in dnsnames.split()]

set_nodes = set(split_nodes)
set_dnsnames = set(split_dnsnames)

diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))
diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))

ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))

for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151


Here is the way to accomplish this task using difflib and your original code.



import difflib

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
diff = difflib.ndiff(f1.readlines(), f2.readlines())
with open('diff.txt', 'w') as result:
for line in diff:
if re.search(r'-s(.*)', line):
print (line)
result.write(line)

**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151


Here is another way to accomplish this task, which using list comprehension.



node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145','ip-10-232-10-146','ip-10-232-10-147','ip-10-232-10-149', 'ip-10-232-10-152')

ip_address_differences = [ip_address for ip_address in node_list if
ip_address not in aws_instances_dnsname]

print (ip_address_differences)

**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']





share|improve this answer


























  • This code is just adding in all the hosts from both the files, It is not really doing what I intend it to do. (which is to remove the duplicate ones from file1 present in file2)

    – VKVNS
    Jan 3 at 12:51











  • Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.

    – Life is complex
    Jan 3 at 14:08
















0














Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.



# input file structure
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:

nodes = set(f1.readlines())
dnsnames = set(f2.readlines())

diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))
diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))

ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))

for ip_address in sorted(ip_address_differences):
print (ip_address.rstrip('n'))

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151

####################################################
####################################################

# input file structure (all hosts on one line, separated by spaces)
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    nodes = f1.read()
    dnsnames = f2.read()

# split() breaks on any whitespace and drops the trailing newline
split_nodes = nodes.split()
split_dnsnames = dnsnames.split()

set_nodes = set(split_nodes)
set_dnsnames = set(split_dnsnames)

# hosts that appear in only one of the two files
diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))
diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))

ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))

for ip_address in sorted(ip_address_differences):
    print(ip_address)

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151
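
Both set-based versions collect the hosts that appear in exactly one of the two files (a symmetric difference). If only the hosts from `node_list.txt` that are missing from `aws_instances_dnsname.txt` are wanted, a one-way difference may be closer to the goal. The following is a minimal sketch rather than part of the original answer: the helper name `parse_hosts` is hypothetical, and the in-memory strings stand in for the file contents, using the sample hosts that appear elsewhere in this answer.

```python
def parse_hosts(text):
    """Return the set of whitespace-separated host names in text.

    Hypothetical helper: works for both one-host-per-line and
    space-separated layouts, because split() breaks on any whitespace.
    """
    return set(text.split())


# Stand-ins for f1.read() and f2.read() (assumed sample data).
nodes = parse_hosts(
    "ip-10-232-10-149\nip-10-232-10-150\nip-10-232-10-151\nip-10-232-10-152\n"
)
dnsnames = parse_hosts(
    "ip-10-232-10-145 ip-10-232-10-146 ip-10-232-10-147 "
    "ip-10-232-10-149 ip-10-232-10-152"
)

# Hosts in nodes that are missing from dnsnames (one-way difference).
only_in_nodes = sorted(nodes - dnsnames)

# Hosts that appear in exactly one of the two sets (symmetric difference).
in_exactly_one = sorted(nodes ^ dnsnames)

print(only_in_nodes)   # ['ip-10-232-10-150', 'ip-10-232-10-151']
print(in_exactly_one)
```

With these sample sets, the symmetric difference also lists the three hosts that exist only in `dnsnames`, which is likely why the two variants can give different results on real files.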


Here is a way to accomplish this task using difflib and your original code.



import difflib
import re

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    with open('diff.txt', 'w') as result:
        for line in diff:
            # keep only lines that ndiff marks as unique to the first file
            if re.search(r'^-\s(.*)', line):
                print(line.rstrip('\n'))
                result.write(line)

**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151
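
The same filtering can be tried on in-memory lists, which makes it easier to see what `ndiff` produces. This is only a sketch under assumptions: the function name `removed_lines` and the sample lists are illustrative, and `str.startswith` is swapped in for the regular expression.

```python
import difflib


def removed_lines(old_lines, new_lines):
    """Return the lines that ndiff marks as present only in old_lines.

    ndiff prefixes such lines with '- '; the two-character prefix and
    the trailing newline are stripped from each result.
    """
    return [
        line[2:].rstrip('\n')
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith('- ')
    ]


# Assumed sample data, shaped like the output of readlines().
node_lines = ['ip-10-232-10-149\n', 'ip-10-232-10-150\n', 'ip-10-232-10-151\n']
dns_lines = ['ip-10-232-10-149\n']

print(removed_lines(node_lines, dns_lines))
# ['ip-10-232-10-150', 'ip-10-232-10-151']
```

Note that `ndiff` compares the two inputs as ordered sequences, so a host that appears in both files at different positions can still be reported as removed. When line order carries no meaning, a set-based comparison is probably the safer choice.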


Here is another way to accomplish this task, which uses a list comprehension.



node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145','ip-10-232-10-146','ip-10-232-10-147','ip-10-232-10-149', 'ip-10-232-10-152')

ip_address_differences = [ip_address for ip_address in node_list
                          if ip_address not in aws_instances_dnsname]

print(ip_address_differences)

**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']
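
For longer lists, the `not in` test against a tuple scans the whole tuple for every host. One possible variation, sketched here with the hypothetical names `known_hosts` and `missing_from`, converts the reference collection to a set first while keeping the order of `node_list` in the result.

```python
def missing_from(candidates, reference):
    """Return the items of candidates that are not in reference.

    The order of candidates is preserved; reference is converted to a
    set so that each membership test is a hash lookup.
    """
    known_hosts = set(reference)
    return [item for item in candidates if item not in known_hosts]


node_list = ('ip-10-232-10-149', 'ip-10-232-10-150',
             'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145', 'ip-10-232-10-146',
                         'ip-10-232-10-147', 'ip-10-232-10-149',
                         'ip-10-232-10-152')

print(missing_from(node_list, aws_instances_dnsname))
# ['ip-10-232-10-150', 'ip-10-232-10-151']
```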













edited Jan 3 at 21:08

























answered Jan 3 at 5:01









Life is complex































