Comparing 2 files containing IP addresses and returning ones that are not common



This is what I have done so far:

import difflib

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    with open('diff.txt', 'w') as result:
        for line in diff:
            result.write(line)

asked Jan 2 at 22:38

VKVNS

Could you include the expected output and the actual output? That would greatly help our efforts to help you.

– GeeTransit
Jan 2 at 22:50

What you are trying is too complicated. It may be better to load the second text file into a string or a list of strings, then iterate over the lines of 'node_list.txt' only and check whether each IP from 'node_list.txt' is contained in that string with Python's str.find(), after removing the "\n" and "\r" characters. I have done this, but in C.

– Baj Mile
Jan 2 at 22:58

The result I got is mostly repeating the whole list of file1, so there's something I am missing with finding the difference

– VKVNS
Jan 2 at 23:03

You could put set() around your IPs and then get the difference that way. This means you wouldn't need difflib.

– GeeTransit
Jan 2 at 23:03

@GeeTransit maybe I should try that.

– VKVNS
Jan 2 at 23:04
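
The suggestions in these comments can be combined into a short sketch: load the second file into a set, then walk the first file and keep only the names that are not in it. The function name names_only_in_first is made up for illustration, and the sketch assumes one name per line in both files. It uses exact set membership rather than a substring search such as str.find(), because a substring test would treat ip-10-232-10-15 as present whenever ip-10-232-10-150 is.

```python
def names_only_in_first(first_path, second_path):
    """Return names listed in first_path but not in second_path, in file order.

    Hypothetical helper; assumes one name per line in both files.
    """
    with open(second_path, 'r') as f2:
        # strip() drops '\n', '\r' and stray spaces; blank lines are skipped
        known = {line.strip() for line in f2 if line.strip()}

    missing = []
    seen = set()
    with open(first_path, 'r') as f1:
        for line in f1:
            name = line.strip()
            # exact match against the set, not a substring search
            if name and name not in known and name not in seen:
                missing.append(name)
                seen.add(name)
    return missing
```

With the two files from the question, names_only_in_first('node_list.txt', 'aws_instances_dnsname.txt') would return the node names that have no matching entry in the AWS list.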


python-3.x


3 Answers

Assuming the files are sorted or otherwise deterministically-ordered:

diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt

For sufficiently large files, the one-time cost of starting a subprocess will be smaller than the cost of doing the comparison logic in Python.

Example using the subprocess module (note that diff exits with status 1 when the files differ, so we can't just use check_call):

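#!/usr/bin/env python3

import subprocess

INPUT1 = 'file1.txt'
INPUT2 = 'file2.txt'
OUTPUT = 'common.txt'

with open(OUTPUT, 'wb') as out:
    # Print only the lines that are unchanged between the two files (%L is the line with its newline)
    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]
    rv = subprocess.call(cmd, stdout=out)
    # diff exits with 0 (no differences) or 1 (differences found); 2 or more signals trouble
    if rv >= 2:
        raise subprocess.CalledProcessError(rv, cmd)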

edited Jan 3 at 0:53

answered Jan 3 at 0:07

o11c


I didn't get the answer

– VKVNS
Jan 3 at 0:09

Edited to add full code, then.

– o11c
Jan 3 at 0:53
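
The diff command in this answer prints the lines that both files share. If the goal is instead the lines that appear only in the first file, the same three format options can be swapped around. The following is a sketch under the assumption that GNU diff is installed (these line-format options are GNU extensions) and that both files are sorted the same way; the function names are made up for illustration.

```python
import subprocess


def only_in_first_cmd(path1, path2):
    """Build a GNU diff command that prints lines found only in path1."""
    return [
        'diff',
        '--old-line-format=%L',      # lines only in the first file: print as-is
        '--new-line-format=',        # lines only in the second file: print nothing
        '--unchanged-line-format=',  # lines in both files: print nothing
        path1,
        path2,
    ]


def write_only_in_first(path1, path2, out_path):
    """Run the command and write its output to out_path (hypothetical helper)."""
    cmd = only_in_first_cmd(path1, path2)
    with open(out_path, 'wb') as out:
        rv = subprocess.call(cmd, stdout=out)
    # 0 = no differences, 1 = differences found, 2 or more = diff itself failed
    if rv >= 2:
        raise subprocess.CalledProcessError(rv, cmd)
    return rv
```

Because diff compares the files line by line in order, this matches a set difference only when the shared lines appear in the same relative order in both files.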


If your files were like this:

File 1:

apple
banana
kokonut
orange

File 2:

banana
strawberry
orange
lime

You could try something like this:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    # strip() removes the trailing newline so 'banana\n' and 'banana' compare equal;
    # blank lines are skipped instead of ending the read early
    first = {line.strip() for line in f1 if line.strip()}
    second = {line.strip() for line in f2 if line.strip()}

diff = first - second  # set difference (items in first not in second)

with open('diff.txt', 'w') as result:
    # sets are unordered, so sort for a stable output file
    for item in sorted(diff):
        result.write(item + '\n')

This writes the following to diff.txt:

apple
kokonut

Note that banana and orange don't appear in the file we've created.

edited Jan 3 at 0:59

answered Jan 2 at 23:08

GeeTransit


Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19

Okay. I've fixed it so it doesn't test with the \n's on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29

It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52

Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52

file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152; file b = ip-10-232-10-149 ip-10-232-10-152. I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54


Here are 2 ways to accomplish this task without using difflib. The methods differ because the input file structures differ.

# input file structure (one name per line)
# ip-10-232-10-149
# ip-10-232-10-150
# ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    # strip() removes the trailing newline and stray spaces so equal names compare equal
    nodes = {line.strip() for line in f1 if line.strip()}
    dnsnames = {line.strip() for line in f2 if line.strip()}

# names that appear in exactly one of the two files
ip_address_differences = nodes.symmetric_difference(dnsnames)

for ip_address in sorted(ip_address_differences):
    print(ip_address)

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151



####################################################

####################################################



# input file structure (all names on one line, separated by whitespace)
# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    # split() with no arguments splits on any whitespace, including newlines
    set_nodes = set(f1.read().split())
    set_dnsnames = set(f2.read().split())

# names that appear in exactly one of the two files
ip_address_differences = set_nodes.symmetric_difference(set_dnsnames)

for ip_address in sorted(ip_address_differences):
    print(ip_address)

**OUTPUTS**
ip-10-232-10-150
ip-10-232-10-151

Here is a way to accomplish this task using difflib and your original code.

import difflib
import re

with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

    with open('diff.txt', 'w') as result:
        for line in diff:
            # ndiff prefixes lines found only in the first file with '- ';
            # re.match anchors the pattern at the start of the line
            if re.match(r'-\s(.*)', line):
                print(line.rstrip('\n'))
                result.write(line)

**OUTPUTS**
- ip-10-232-10-150
- ip-10-232-10-151
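
To see why filtering on the prefix works, it may help to run ndiff on small in-memory lists. ndiff marks each line with a two-character prefix: '- ' for lines found only in the first sequence, '+ ' for lines found only in the second, and two spaces for lines found in both. Written out unfiltered, that is why the output repeats nearly all of the first file. The sample names below are taken from the comments; the variable names are illustrative.

```python
import difflib

node_list = ['ip-10-232-10-149\n', 'ip-10-232-10-150\n',
             'ip-10-232-10-151\n', 'ip-10-232-10-152\n']
aws_instances = ['ip-10-232-10-149\n', 'ip-10-232-10-152\n']

# Every input line comes back with a two-character prefix
diff_lines = list(difflib.ndiff(node_list, aws_instances))

# Keep only lines unique to the first list and drop the '- ' prefix
only_in_first = [line[2:].strip() for line in diff_lines if line.startswith('- ')]

print(only_in_first)  # ['ip-10-232-10-150', 'ip-10-232-10-151']
```

ndiff compares the sequences in order, so if the two files list the same names in a different order, some shared names can still show up with '- ' and '+ ' prefixes. A set-based comparison does not have that limitation.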

Here is another way to accomplish this task, which uses a list comprehension.

node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145', 'ip-10-232-10-146', 'ip-10-232-10-147', 'ip-10-232-10-149', 'ip-10-232-10-152')

ip_address_differences = [ip_address for ip_address in node_list
                          if ip_address not in aws_instances_dnsname]

print(ip_address_differences)

**OUTPUTS**
['ip-10-232-10-150', 'ip-10-232-10-151']

edited Jan 3 at 21:08

answered Jan 3 at 5:01

Life is complex


This code is just adding in all the hosts from both files. It is not really doing what I intend it to do (which is to remove the duplicate ones from file1 that are present in file2).

– VKVNS
Jan 3 at 12:51

Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.

– Life is complex
Jan 3 at 14:08
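
The disagreement in these comments seems to come down to two different set operations. A plain difference keeps only the names from the first collection that are missing from the second, while a symmetric difference also keeps the names that appear only in the second. The sketch below reuses the sample tuples from this answer to show both; the result variable names are illustrative.

```python
node_list = ('ip-10-232-10-149', 'ip-10-232-10-150',
             'ip-10-232-10-151', 'ip-10-232-10-152')
aws_instances_dnsname = ('ip-10-232-10-145', 'ip-10-232-10-146',
                         'ip-10-232-10-147', 'ip-10-232-10-149',
                         'ip-10-232-10-152')

nodes = set(node_list)
dnsnames = set(aws_instances_dnsname)

# Difference: names in node_list that are missing from aws_instances_dnsname
only_in_nodes = sorted(nodes - dnsnames)

# Symmetric difference: names that appear in exactly one of the two collections
in_exactly_one = sorted(nodes ^ dnsnames)

print(only_in_nodes)
# ['ip-10-232-10-150', 'ip-10-232-10-151']
print(in_exactly_one)
# ['ip-10-232-10-145', 'ip-10-232-10-146', 'ip-10-232-10-147', 'ip-10-232-10-150', 'ip-10-232-10-151']
```

When every name in the second file also appears in the first, as in the sample given in the comments, both operations return the same result, which may be why the two readings of the question were hard to tell apart.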



3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Assuming the files are sorted or otherwise deterministically-ordered:

diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt

For sufficiently-large files, the one-shot cost of starting a subprocess will be smaller than the cost of doing logic in python.

Example using subprocess module (note that diff is expected to return 1, so we can't just use check_call):

#!/usr/bin/env python3



import subprocess





INPUT1 = 'file1.txt'

INPUT2 = 'file2.txt'

OUTPUT = 'common.txt'



with open(OUTPUT, 'wb') as out:

    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]

    rv = subprocess.call(cmd, stdout=out)

    if rv >= 2:

        raise subprocess.CalledProcessError(rv, cmd)

edited Jan 3 at 0:53

answered Jan 3 at 0:07

o11c

11k43155

I didn't get the answer

– VKVNS
Jan 3 at 0:09

Edited to add full code, then.

– o11c
Jan 3 at 0:53

add a comment |

Assuming the files are sorted or otherwise deterministically-ordered:

diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt

For sufficiently-large files, the one-shot cost of starting a subprocess will be smaller than the cost of doing logic in python.

Example using subprocess module (note that diff is expected to return 1, so we can't just use check_call):

#!/usr/bin/env python3



import subprocess





INPUT1 = 'file1.txt'

INPUT2 = 'file2.txt'

OUTPUT = 'common.txt'



with open(OUTPUT, 'wb') as out:

    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]

    rv = subprocess.call(cmd, stdout=out)

    if rv >= 2:

        raise subprocess.CalledProcessError(rv, cmd)

edited Jan 3 at 0:53

answered Jan 3 at 0:07

o11c

11k43155

I didn't get the answer

– VKVNS
Jan 3 at 0:09

Edited to add full code, then.

– o11c
Jan 3 at 0:53

add a comment |

Assuming the files are sorted or otherwise deterministically-ordered:

diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt

For sufficiently-large files, the one-shot cost of starting a subprocess will be smaller than the cost of doing logic in python.

Example using subprocess module (note that diff is expected to return 1, so we can't just use check_call):

#!/usr/bin/env python3



import subprocess





INPUT1 = 'file1.txt'

INPUT2 = 'file2.txt'

OUTPUT = 'common.txt'



with open(OUTPUT, 'wb') as out:

    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]

    rv = subprocess.call(cmd, stdout=out)

    if rv >= 2:

        raise subprocess.CalledProcessError(rv, cmd)

edited Jan 3 at 0:53

answered Jan 3 at 0:07

o11c

11k43155

Assuming the files are sorted or otherwise deterministically-ordered:

diff --old-line-format='' --new-line-format='' --unchanged-line-format='%L' file1.txt file2.txt

For sufficiently-large files, the one-shot cost of starting a subprocess will be smaller than the cost of doing logic in python.

Example using subprocess module (note that diff is expected to return 1, so we can't just use check_call):

#!/usr/bin/env python3



import subprocess





INPUT1 = 'file1.txt'

INPUT2 = 'file2.txt'

OUTPUT = 'common.txt'



with open(OUTPUT, 'wb') as out:

    cmd = ['diff', '--old-line-format=', '--new-line-format=', '--unchanged-line-format=%L', INPUT1, INPUT2]

    rv = subprocess.call(cmd, stdout=out)

    if rv >= 2:

        raise subprocess.CalledProcessError(rv, cmd)

edited Jan 3 at 0:53

answered Jan 3 at 0:07

o11c

11k43155

edited Jan 3 at 0:53

answered Jan 3 at 0:07

o11c

11k43155

answered Jan 3 at 0:07

o11c

11k43155

answered Jan 3 at 0:07

o11c

11k43155

I didn't get the answer

– VKVNS
Jan 3 at 0:09

Edited to add full code, then.

– o11c
Jan 3 at 0:53

add a comment |

I didn't get the answer

– VKVNS
Jan 3 at 0:09

Edited to add full code, then.

– o11c
Jan 3 at 0:53

I didn't get the answer

– VKVNS
Jan 3 at 0:09

Edited to add full code, then.

– o11c
Jan 3 at 0:53

add a comment |

If your files were like this:

File 1:

apple

banana

kokonut

orange

File 2:

banana

strawberry

orange

lime

You could try something like this:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:

    read_first = set()

    read_second = set()

    while True:

        line = f1.readline().strip()

        if line == '':

            break

        read_first.add(line)

    # end while

    while True:

        line = f2.readline().strip()

        if line == '':

            break

        read_second.add(line)

    # end while

    first = set()

    second = set()

    for i, j in zip(read_first, read_second):

        first.add(i.strip())

        second.add(j.strip())

    diff = first - second # set difference (items in first not in second)

    with open('diff.txt', 'w') as result:

        for item in diff:

            result.write(item + 'n')

This would output into diff.txt and would display:

apple

kokonut

Note how banana and orange don't appear on the file we've created.

edited Jan 3 at 0:59

answered Jan 2 at 23:08

GeeTransit

694316

Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19

Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29

It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52

Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52

file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54

|
show 11 more comments

If your files were like this:

File 1:

apple

banana

kokonut

orange

File 2:

banana

strawberry

orange

lime

You could try something like this:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:

    read_first = set()

    read_second = set()

    while True:

        line = f1.readline().strip()

        if line == '':

            break

        read_first.add(line)

    # end while

    while True:

        line = f2.readline().strip()

        if line == '':

            break

        read_second.add(line)

    # end while

    first = set()

    second = set()

    for i, j in zip(read_first, read_second):

        first.add(i.strip())

        second.add(j.strip())

    diff = first - second # set difference (items in first not in second)

    with open('diff.txt', 'w') as result:

        for item in diff:

            result.write(item + 'n')

This would output into diff.txt and would display:

apple

kokonut

Note how banana and orange don't appear on the file we've created.

edited Jan 3 at 0:59

answered Jan 2 at 23:08

GeeTransit

694316

Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19

Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29

It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52

Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52

file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54

|
show 11 more comments

If your files were like this:

File 1:

apple

banana

kokonut

orange

File 2:

banana

strawberry

orange

lime

You could try something like this:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:

    read_first = set()

    read_second = set()

    while True:

        line = f1.readline().strip()

        if line == '':

            break

        read_first.add(line)

    # end while

    while True:

        line = f2.readline().strip()

        if line == '':

            break

        read_second.add(line)

    # end while

    first = set()

    second = set()

    for i, j in zip(read_first, read_second):

        first.add(i.strip())

        second.add(j.strip())

    diff = first - second # set difference (items in first not in second)

    with open('diff.txt', 'w') as result:

        for item in diff:

            result.write(item + 'n')

This would output into diff.txt and would display:

apple

kokonut

Note how banana and orange don't appear on the file we've created.

edited Jan 3 at 0:59

answered Jan 2 at 23:08

GeeTransit

694316

If your files were like this:

File 1:

apple

banana

kokonut

orange

File 2:

banana

strawberry

orange

lime

You could try something like this:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:

    read_first = set()

    read_second = set()

    while True:

        line = f1.readline().strip()

        if line == '':

            break

        read_first.add(line)

    # end while

    while True:

        line = f2.readline().strip()

        if line == '':

            break

        read_second.add(line)

    # end while

    first = set()

    second = set()

    for i, j in zip(read_first, read_second):

        first.add(i.strip())

        second.add(j.strip())

    diff = first - second # set difference (items in first not in second)

    with open('diff.txt', 'w') as result:

        for item in diff:

            result.write(item + 'n')

This would output into diff.txt and would display:

apple

kokonut

Note how banana and orange don't appear on the file we've created.

edited Jan 3 at 0:59

answered Jan 2 at 23:08

GeeTransit

694316

edited Jan 3 at 0:59

answered Jan 2 at 23:08

GeeTransit

694316

answered Jan 2 at 23:08

GeeTransit

694316

answered Jan 2 at 23:08

GeeTransit

694316

Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19

Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29

It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52

Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52

file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54

|
show 11 more comments

Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19

Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29

It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52

Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52

file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54

Actually, this is just concatenating the 2 lists into a single file

– VKVNS
Jan 2 at 23:19

Okay. I've fixed it so it doesn't test with the ns on the end. If it's not working for you, could you post a sample of the error output or the traceback? Thanks.

– GeeTransit
Jan 2 at 23:29

It is working but it is just returning both the files in diff.txt

– VKVNS
Jan 2 at 23:52

Did you want only one of the files to be checked?

– GeeTransit
Jan 2 at 23:52

file a = ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151 ip-10-232-10-152 file b = ip-10-232-10-149 ip-10-232-10-152 I want only ip-10-232-10-150 ip-10-232-10-151

– VKVNS
Jan 2 at 23:54

|
show 11 more comments

Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.

# input file structure

# ip-10-232-10-149

# ip-10-232-10-150 

# ip-10-232-10-151



with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:



  nodes = set(f1.readlines())

  dnsnames = set(f2.readlines())



  diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))

  diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))



  ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))



  for ip_address in sorted(ip_address_differences):

    print (ip_address.rstrip('n'))



**OUTPUTS**

ip-10-232-10-150 

ip-10-232-10-151



####################################################

####################################################



# input file structure

# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151



with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:



nodes = f1.read()

dnsnames = f2.read()



split_nodes = [x for x in nodes.split()]

split_dnsnames = [x for x in dnsnames.split()]



set_nodes = set(split_nodes)

set_dnsnames = set(split_dnsnames)



diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))

diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))



ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))



for ip_address in sorted(ip_address_differences):

    print (ip_address.rstrip('n'))



**OUTPUTS**

ip-10-232-10-150 

ip-10-232-10-151

Here is the way to accomplish this task using difflib and your original code.

import difflib



with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:

diff = difflib.ndiff(f1.readlines(), f2.readlines())

with open('diff.txt', 'w') as result:

    for line in diff:

        if re.search(r'-s(.*)', line):

            print (line)

            result.write(line)



**OUTPUTS**

- ip-10-232-10-150 

- ip-10-232-10-151

Here is another way to accomplish this task, which using list comprehension.

node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')

aws_instances_dnsname = ('ip-10-232-10-145','ip-10-232-10-146','ip-10-232-10-147','ip-10-232-10-149', 'ip-10-232-10-152')



ip_address_differences = [ip_address for ip_address in node_list if 

ip_address not in aws_instances_dnsname]



print (ip_address_differences)



**OUTPUTS**

['ip-10-232-10-150', 'ip-10-232-10-151']

edited Jan 3 at 21:08

answered Jan 3 at 5:01

Life is complex

731518

This code is just adding in all the hosts from both the files, It is not really doing what I intend it to do. (which is to remove the duplicate ones from file1 present in file2)

– VKVNS
Jan 3 at 12:51

Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.

– Life is complex
Jan 3 at 14:08

add a comment |

Here are 2 ways to accomplish this task without using difflib. These methods are different, because the input file structure are different.

# input file structure

# ip-10-232-10-149

# ip-10-232-10-150 

# ip-10-232-10-151



with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:



  nodes = set(f1.readlines())

  dnsnames = set(f2.readlines())



  diff_between_nodes_dnsnames = list(nodes.difference(dnsnames))

  diff_between_dnsnames_nodes = list(dnsnames.difference(nodes))



  ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))



  for ip_address in sorted(ip_address_differences):

    print (ip_address.rstrip('n'))



**OUTPUTS**

ip-10-232-10-150 

ip-10-232-10-151



####################################################

####################################################



# input file structure

# ip-10-232-10-149 ip-10-232-10-150 ip-10-232-10-151



with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:



nodes = f1.read()

dnsnames = f2.read()



split_nodes = [x for x in nodes.split()]

split_dnsnames = [x for x in dnsnames.split()]



set_nodes = set(split_nodes)

set_dnsnames = set(split_dnsnames)



diff_between_nodes_dnsnames = list(set_nodes.difference(set_dnsnames))

diff_between_dnsnames_nodes = list(set_dnsnames.difference(set_nodes))



ip_address_differences = list(set(diff_between_nodes_dnsnames + diff_between_dnsnames_nodes))



for ip_address in sorted(ip_address_differences):

    print (ip_address.rstrip('n'))



**OUTPUTS**

ip-10-232-10-150 

ip-10-232-10-151

Here is the way to accomplish this task using difflib and your original code.

import difflib



with open('node_list.txt', 'r') as f1, open('aws_instances_dnsname.txt', 'r') as f2:

diff = difflib.ndiff(f1.readlines(), f2.readlines())

with open('diff.txt', 'w') as result:

    for line in diff:

        if re.search(r'-s(.*)', line):

            print (line)

            result.write(line)



**OUTPUTS**

- ip-10-232-10-150 

- ip-10-232-10-151

Here is another way to accomplish this task, which using list comprehension.

node_list = ('ip-10-232-10-149', 'ip-10-232-10-150', 'ip-10-232-10-151', 'ip-10-232-10-152')

aws_instances_dnsname = ('ip-10-232-10-145','ip-10-232-10-146','ip-10-232-10-147','ip-10-232-10-149', 'ip-10-232-10-152')



ip_address_differences = [ip_address for ip_address in node_list if 

ip_address not in aws_instances_dnsname]



print (ip_address_differences)



**OUTPUTS**

['ip-10-232-10-150', 'ip-10-232-10-151']

edited Jan 3 at 21:08

answered Jan 3 at 5:01

Life is complex

731518

This code is just adding in all the hosts from both the files, It is not really doing what I intend it to do. (which is to remove the duplicate ones from file1 present in file2)

– VKVNS
Jan 3 at 12:51

Your question didn't request to remove the duplicate ip address from file1 present in file2. You stated that the output from processing your input files should be -- I want only ip-10-232-10-150 ip-10-232-10-151 -- which my answer provides without using difflib.

– Life is complex
Jan 3 at 14:08




