Is piping, shifting, or parameter expansion more efficient?
I'm trying to find the most efficient way to iterate through certain values that are a consistent number of positions away from each other in a space-separated list of words (I don't want to use an array). For example,
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"
So I want to be able to just iterate through list and only access 1, 5, 6, 9 and 15.
EDIT: I should have made it clear that the values I'm trying to get from the list don't have to differ in format from the rest of the list. What makes them special is solely their position in the list (in this case, positions 1, 4, 7, ...). So the list could be
1 2 3 5 9 8 6 90 84 9 3 2 15 75 55
but I'd still want the same numbers. Also, I want to be able to do this without knowing the length of the list.
The methods I've thought of so far are:
Method 1
set $list
found=false
find=9
count=1
while [ $count -lt $# ]; do
if [ "${@:count:1}" -eq $find ]; then
found=true
break
fi
count=`expr $count + 3`
done
Method 2
set $list
found=false
find=9
while [ $# -ne 0 ]; do
if [ $1 -eq $find ]; then
found=true
break
fi
shift 3
done
Method 3
I'm pretty sure piping makes this the worst option, but I was trying to find a method that doesn't use set, out of curiosity.
found=false
find=9
count=1
num=`echo $list | cut -d ' ' -f$count`
while [ -n "$num" ]; do
if [ $num -eq $find ]; then
found=true
break
fi
count=`expr $count + 3`
num=`echo $list | cut -d ' ' -f$count`
done
So what would be most efficient, or am I missing a simpler method?
shell-script pipe performance cut
I wouldn't use a shell script in the first place if efficiency is an important concern. How big is your list that it makes a difference?
– Barmar
Feb 1 at 2:54
premature optimization is the source of all evil
– Barmar
Feb 1 at 2:56
Without doing statistics over actual instances of your problem, you will know nothing. This includes comparing to "programming in awk" etc. If statistics are too expensive, then looking for efficiency is probably not worth it.
– David Tonhofer
Feb 1 at 20:48
Levi, what exactly is the "efficient" way in your definition? You want to find a faster way to iterate?
– Sergiy Kolodyazhnyy
Feb 2 at 0:42
8 Answers
Pretty simple with awk. This will get you the value of every third field (the first, fourth, seventh, and so on) for input of any length:
$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "\n" ) }' <<< $list
1 5 6 9 15
This works by leveraging built-in awk variables such as NF (the number of fields in the record), and doing some simple for looping to iterate along the fields to give you the ones you want without needing to know ahead of time how many there will be.
Or, if you do indeed just want those specific fields as specified in your example:
$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15
As for the question about efficiency, the simplest route would be to test this or each of your other methods and use time to show how long it takes; you could also use tools like strace to see how the system calls flow. Usage of time looks like:
$ time ./script.sh
real 0m0.025s
user 0m0.004s
sys 0m0.008s
You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
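For instance, a minimal comparison harness might look like the following (the script names are hypothetical; each one would hold one of the methods from the question, repeated enough times for time to give a meaningful reading):
$ time sh -c 'for i in $(seq 1 1000); do sh method1.sh; done'
$ time sh -c 'for i in $(seq 1 1000); do sh method2.sh; done'
Whichever run shows the lower real time is the faster method on that particular system.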
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
@LeviUzodike Regarding echo vs <<<, "identical" is too strong a word. You could say that stuff <<< "$list" is nearly identical to printf "%s\n" "$list" | stuff. Regarding echo vs printf, I direct you to this answer
– JoL
Jan 31 at 20:59
@DopeGhoti Actually it does. <<< adds a newline at the end. This is similar to how $() removes a newline from the end. This is because lines are terminated by newlines. <<< feeds an expression as a line, so it must be terminated by a newline. "$()" takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.
– JoL
Feb 1 at 2:09
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pay large dividends.
– Joe
Feb 1 at 20:55
@LeviUzodike: Yes, awk is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl; e.g. use strace to check out the system calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins, even for small-ish list sizes. And sometimes you can write a #!/usr/bin/awk script instead of a #!/bin/sh script.
– Peter Cordes
Feb 2 at 4:36
First rule of software optimization: Don't.
Until you know the speed of the program is an issue, there's no need to think
about how fast it is. If your list is about that length or just ~100-1000 items
long, you probably won't even notice how long it takes. There's a chance you're spending more time thinking about the optimization than what the difference would be.
Second rule: Measure.
That's the sure way to find out, and the one that gives answers for your system.
Especially with shells, there are so many, and they aren't all identical. An
answer for one shell might not apply for yours.
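As a hedged sketch of such a measurement (assuming both dash and bash are installed), you can time the same loop under each shell directly:
$ time dash -c 'i=0; while [ $i -lt 100000 ]; do i=$((i+1)); done'
$ time bash -c 'i=0; while [ $i -lt 100000 ]; do i=$((i+1)); done'
The difference between the real times shows how much the choice of shell alone matters for a tight loop.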
In larger programs, profiling goes here too. The slowest part might not be the one you think it is.
Third, the first rule of shell script optimization: Don't use the shell.
Yeah, really. Many shells aren't made to be fast (since launching external
programs doesn't have to be), and they might even parse the lines of the source
code again each time.
Use something like awk or Perl instead. In a trivial micro-benchmark I did, awk was dozens of times faster than any common shell in running a simple loop (without I/O).
However, if you do use the shell, use the shell's builtin functions instead of external commands. Here, you're using expr, which isn't a builtin in any of the shells I found on my system, but it can be replaced with standard arithmetic expansion: e.g. i=$((i+1)) instead of i=$(expr $i + 1) to increment i. Your use of cut in the last example might also be replaceable with standard parameter expansions.
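For instance, here is a minimal sketch of what that replacement could look like (assuming the list is single-space separated, as in the question; this is not the asker's code, just one possible rewrite):
found=false
find=9
rest=$list
while [ -n "$rest" ]; do
    num=${rest%% *}    # first word of what's left
    if [ "$num" = "$find" ]; then found=true; break; fi
    case $rest in
        *\ *\ *\ *) rest=${rest#* * * } ;;    # drop three words
        *) break ;;
    esac
done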
See also: Why is using a shell loop to process text considered bad practice?
Steps #1 and #2 should apply to your question.
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
It's not that awk loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools like awk are fantastic at processing text data, because that's what shells and tools like awk are made for (respectively) in the first place.
– DopeGhoti
Jan 31 at 21:05
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower in dash than with gawk, and dash was the fastest shell I tested...
– ilkkachu
Jan 31 at 22:36
@Joe, it is :) dash and busybox don't support (( .. )) -- I think it's a nonstandard extension. ++ is also explicitly mentioned as not required, so as far as I can tell, i=$((i+1)) or : $(( i += 1)) are the safe ones.
– ilkkachu
Feb 1 at 23:10
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
I'm only going to give some general advice in this answer, and not benchmarks. Benchmarks are the only way to reliably answer questions about performance. But since you don't say how much data you're manipulating and how often you perform this operation, there's no way to do a useful benchmark. What's more efficient for 10 items and what's more efficient for 1000000 items is often not the same.
As a general rule of thumb, invoking external commands is more expensive than doing something with pure shell constructs, as long as the pure shell code doesn't involve a loop. On the other hand, a shell loop that iterates over a large string or a large number of strings is likely to be slower than one invocation of a special-purpose tool. For example, your loop invoking cut could well be noticeably slow in practice, but if you find a way to do the whole thing with a single cut invocation, that's likely to be faster than doing the same thing with string manipulation in the shell.
Do note that the cutoff point can vary a lot between systems. It can depend on the kernel, on how the kernel's scheduler is configured, on the filesystem containing the external executables, on how much CPU vs memory pressure there is at the moment, and many other factors.
Don't call expr to perform arithmetic if you're at all concerned about performance. In fact, don't call expr to perform arithmetic at all. Shells have built-in arithmetic, which is clearer and faster than invoking expr.
You seem to be using bash, since you're using bash constructs that don't exist in sh. So why on earth would you not use an array? An array is the most natural solution, and it's likely to be the fastest, too. Note that array indices start at 0.
list=(1 2 3 5 9 8 6 90 84 9 3 2 15 75 55)
for ((count = 0; count < ${#list[@]}; count += 3)); do
echo "${list[$count]}"
done
Your script may well be faster if you use sh, if your system has dash or ksh as sh rather than bash. If you use sh, you don't get named arrays, but you still get the one array of positional parameters, which you can set with set. To access an element at a position that is not known until runtime, you need to use eval (take care of quoting things properly!).
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
count=1
while [ $count -le $# ]; do
eval "value=${$count}"
echo "$value"
count=$((count+3))
done
If you only ever want to access the array once and are going from left to right (skipping some values), you can use shift instead of variable indices.
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
while [ $# -ge 1 ]; do
echo "$1"
shift && shift && shift
done
Which approach is faster depends on the shell and on the number of elements.
Another possibility is to use string processing. It has the advantage of not using the positional parameters, so you can use them for something else. It'll be slower for large amounts of data, but that's unlikely to make a noticeable difference for small amounts of data.
# List elements must be separated by a single space (not arbitrary whitespace)
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
while [ -n "$list" ]; do
echo "${list% *}"
case "$list" in * * * *) :;; *) break;; esac
list="${list#* * * }"
done
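As a quick hedged sanity check: for the sample list, each of the loops above should print the stride-3 elements, one per line:
1
5
6
9
15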
"On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool" but what if that tool has loops in it like awk? @ikkachu said awk loops are faster, but would you say that with < 1000 fields to iterate through, the benefit of faster loops wouldn't outweigh the cost of calling awk since it's an external command (assuming I could do the same task in shell loops with the use of only built in commands)?
– Levi Uzodike
Feb 1 at 16:40
@LeviUzodike Please re-read the first paragraph of my answer.
– Gilles
Feb 1 at 17:04
You could also replaceshift && shift && shift
withshift 3
in your third example - unless the shell you're using doesn't support it.
– Joe
Feb 1 at 21:14
2
@Joe Actually, no.shift 3
would fail if there were too few remaining arguments. You'd need something likeif [ $# -gt 3 ]; then shift 3; else set --; fi
– Gilles
Feb 1 at 21:25
awk is a great choice, if you can do all your processing inside of the awk script. Otherwise, you just end up piping the awk output to other utilities, destroying the performance gain of awk.
bash iteration over an array is also great, if you can fit your entire list inside the array (which for modern shells is probably a guarantee) and you don't mind the array syntax gymnastics.
However, a pipeline approach:
xargs -n3 <<< "$list" | while read -ra a; do echo $a; done | grep 9
Where:
xargs groups the whitespace-separated list into batches of three, each newline-separated
while read consumes that list and outputs the first column of each group
grep filters the first column (corresponding to every third position in the original list)
This improves understandability, in my opinion. People already know what these tools do, so it's easy to read from left to right and reason about what's going to happen. This approach also clearly documents the stride length (-n3) and the filter pattern (9), so it's easy to variabilize:
count=3
find=9
xargs -n "$count" <<< "$list" | while read -ra a; do echo $a; done | grep "$find"
When we ask questions of "efficiency", be sure to think about "total lifetime efficiency". That calculation includes the effort of maintainers to keep the code working, and we meat-bags are the least efficient machines in the whole operation.
Perhaps this?
cut -d' ' -f1,4,7,10,13 <<<$list
1 5 6 9 15
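If the length of the list isn't known in advance, a hedged variation of the same idea (assuming GNU seq, and relying on cut silently ignoring field numbers past the end of the line; 1000 is just an arbitrary upper bound) could generate the field list instead of hard-coding it:
$ cut -d' ' -f"$(seq -s, 1 3 1000)" <<< "$list"
1 5 6 9 15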
Sorry I wasn't clear before, but I wanted to be able to get the numbers at those positions without knowing the length of the list. But thanks, I forgot cut could do that.
– Levi Uzodike
Jan 31 at 19:51
Don't use shell loops if you want to be efficient. Limit yourself to pipes, redirections, substitutions etc., and programs. That's why the xargs and parallel utilities exist - because bash while loops are inefficient and very slow. Use bash loops only as a last resort.
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"
if
<<<"$list" tr -d -s '[0-9 ]' |
tr -s ' ' | tr ' ' 'n' |
grep -q -x '9'
then
found=true
else
found=false
fi
echo ${found}
But you should probably get somewhat faster results with a good awk.
Sorry I wasn't clear before, but I was looking for a solution that would be able to extract the values based only on their position in the list. I just made the original list like that because I wanted it to be obvious which values I wanted.
– Levi Uzodike
Jan 31 at 20:02
In my opinion the clearest solution (and probably the most performant too) is to use the RS and ORS awk variables:
awk -v RS=' ' -v ORS=' ' 'NR % 3 == 1' <<< "$list"
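A brief note on how this works: RS=' ' makes every space-separated word its own record, so NR % 3 == 1 selects records 1, 4, 7, and so on, and ORS=' ' joins the selected records back together with spaces. For the sample list this should print:
1 5 6 9 15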
Using GNU sed and POSIX shell script:
echo $(printf '%s\n' $list | sed -n '1~3p')
Or with bash's parameter substitution:
echo $(sed -n '1~3p' <<< ${list// /$'\n'})
Non-GNU (i.e. POSIX) sed, and bash:
sed 's/\([^ ]* \)[^ ]* *[^ ]* */\1/g' <<< "$list"
Or more portably, using both POSIX sed and shell script:
echo "$list" | sed 's/\([^ ]* \)[^ ]* *[^ ]* */\1/g'
Output of any of these:
1 5 6 9 15
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497985%2fis-piping-shifting-or-parameter-expansion-more-efficient%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
8 Answers
8
active
oldest
votes
8 Answers
8
active
oldest
votes
active
oldest
votes
active
oldest
votes
Pretty simple with awk
. This will get you the value of every fourth field for input of any length:
$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "n" ) }' <<< $list
1 5 6 9 15
This works be leveraging built-in awk
variables such as NF
(the number of fields in the record), and doing some simple for
looping to iterate along the fields to give you the ones you want without needing to know ahead of time how many there will be.
Or, if you do indeed just want those specific fields as specified in your example:
$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15
As for the question about efficiency, the simplest route would be to test this or each of your other methods and use time
to show how long it takes; you could also use tools like strace
to see how the system calls flow. Usage of time
looks like:
$ time ./script.sh
real 0m0.025s
user 0m0.004s
sys 0m0.008s
You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
1
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
2
@LeviUzodike Regardingecho
vs<<<
, "identical" is too strong a word. You could say thatstuff <<< "$list"
is nearly identical toprintf "%sn" "$list" | stuff
. Regardingecho
vsprintf
, I direct you to this answer
– JoL
Jan 31 at 20:59
5
@DopeGhoti Actually it does.<<<
adds a newline at the end. This is similar to how$()
removes a newline from the end. This is because lines are terminated by newlines.<<<
feeds an expression as a line, so it must be terminated by a newline."$()"
takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.
– JoL
Feb 1 at 2:09
3
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pa∕y large dividends.
– Joe
Feb 1 at 20:55
1
@LeviUzodike: Yesawk
is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl. e.g. usestrace
to check out system-calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins even for small-ish list sizes. And sometimes you can write a#!/usr/bin/awk
script instead of a#!/bin/sh
script.
– Peter Cordes
Feb 2 at 4:36
|
show 4 more comments
Pretty simple with awk
. This will get you the value of every fourth field for input of any length:
$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "n" ) }' <<< $list
1 5 6 9 15
This works be leveraging built-in awk
variables such as NF
(the number of fields in the record), and doing some simple for
looping to iterate along the fields to give you the ones you want without needing to know ahead of time how many there will be.
Or, if you do indeed just want those specific fields as specified in your example:
$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15
As for the question about efficiency, the simplest route would be to test this or each of your other methods and use time
to show how long it takes; you could also use tools like strace
to see how the system calls flow. Usage of time
looks like:
$ time ./script.sh
real 0m0.025s
user 0m0.004s
sys 0m0.008s
You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
1
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
2
@LeviUzodike Regardingecho
vs<<<
, "identical" is too strong a word. You could say thatstuff <<< "$list"
is nearly identical toprintf "%sn" "$list" | stuff
. Regardingecho
vsprintf
, I direct you to this answer
– JoL
Jan 31 at 20:59
5
@DopeGhoti Actually it does.<<<
adds a newline at the end. This is similar to how$()
removes a newline from the end. This is because lines are terminated by newlines.<<<
feeds an expression as a line, so it must be terminated by a newline."$()"
takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.
– JoL
Feb 1 at 2:09
3
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pa∕y large dividends.
– Joe
Feb 1 at 20:55
1
@LeviUzodike: Yesawk
is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl. e.g. usestrace
to check out system-calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins even for small-ish list sizes. And sometimes you can write a#!/usr/bin/awk
script instead of a#!/bin/sh
script.
– Peter Cordes
Feb 2 at 4:36
|
show 4 more comments
Pretty simple with awk
. This will get you the value of every fourth field for input of any length:
$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "n" ) }' <<< $list
1 5 6 9 15
This works be leveraging built-in awk
variables such as NF
(the number of fields in the record), and doing some simple for
looping to iterate along the fields to give you the ones you want without needing to know ahead of time how many there will be.
Or, if you do indeed just want those specific fields as specified in your example:
$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15
As for the question about efficiency, the simplest route would be to test this or each of your other methods and use time
to show how long it takes; you could also use tools like strace
to see how the system calls flow. Usage of time
looks like:
$ time ./script.sh
real 0m0.025s
user 0m0.004s
sys 0m0.008s
You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
Pretty simple with awk
. This will get you the value of every fourth field for input of any length:
$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "n" ) }' <<< $list
1 5 6 9 15
This works be leveraging built-in awk
variables such as NF
(the number of fields in the record), and doing some simple for
looping to iterate along the fields to give you the ones you want without needing to know ahead of time how many there will be.
Or, if you do indeed just want those specific fields as specified in your example:
$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15
As for the question about efficiency, the simplest route would be to test this or each of your other methods and use time
to show how long it takes; you could also use tools like strace
to see how the system calls flow. Usage of time
looks like:
$ time ./script.sh
real 0m0.025s
user 0m0.004s
sys 0m0.008s
You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
edited Jan 31 at 21:01
answered Jan 31 at 19:21
DopeGhotiDopeGhoti
46.9k56190
46.9k56190
1
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
2
@LeviUzodike Regardingecho
vs<<<
, "identical" is too strong a word. You could say thatstuff <<< "$list"
is nearly identical toprintf "%sn" "$list" | stuff
. Regardingecho
vsprintf
, I direct you to this answer
– JoL
Jan 31 at 20:59
5
@DopeGhoti Actually it does.<<<
adds a newline at the end. This is similar to how$()
removes a newline from the end. This is because lines are terminated by newlines.<<<
feeds an expression as a line, so it must be terminated by a newline."$()"
takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.
– JoL
Feb 1 at 2:09
3
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pa∕y large dividends.
– Joe
Feb 1 at 20:55
1
@LeviUzodike: Yesawk
is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl. e.g. usestrace
to check out system-calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins even for small-ish list sizes. And sometimes you can write a#!/usr/bin/awk
script instead of a#!/bin/sh
script.
– Peter Cordes
Feb 2 at 4:36
|
show 4 more comments
1
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
2
@LeviUzodike Regardingecho
vs<<<
, "identical" is too strong a word. You could say thatstuff <<< "$list"
is nearly identical toprintf "%sn" "$list" | stuff
. Regardingecho
vsprintf
, I direct you to this answer
– JoL
Jan 31 at 20:59
5
@DopeGhoti Actually it does.<<<
adds a newline at the end. This is similar to how$()
removes a newline from the end. This is because lines are terminated by newlines.<<<
feeds an expression as a line, so it must be terminated by a newline."$()"
takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.
– JoL
Feb 1 at 2:09
3
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pa∕y large dividends.
– Joe
Feb 1 at 20:55
1
@LeviUzodike: Yesawk
is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl. e.g. usestrace
to check out system-calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins even for small-ish list sizes. And sometimes you can write a#!/usr/bin/awk
script instead of a#!/bin/sh
script.
– Peter Cordes
Feb 2 at 4:36
1
1
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
Good point, @MichaelHomer; I've added an aside addressing the question of "how can I determine which method is the most efficient".
– DopeGhoti
Jan 31 at 20:58
2
2
@LeviUzodike Regarding
echo
vs <<<
, "identical" is too strong a word. You could say that stuff <<< "$list"
is nearly identical to printf "%sn" "$list" | stuff
. Regarding echo
vs printf
, I direct you to this answer– JoL
Jan 31 at 20:59
@LeviUzodike Regarding
echo
vs <<<
, "identical" is too strong a word. You could say that stuff <<< "$list"
is nearly identical to printf "%sn" "$list" | stuff
. Regarding echo
vs printf
, I direct you to this answer– JoL
Jan 31 at 20:59
5
5
@DopeGhoti Actually it does.
<<<
adds a newline at the end. This is similar to how $()
removes a newline from the end. This is because lines are terminated by newlines. <<<
feeds an expression as a line, so it must be terminated by a newline. "$()"
takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.– JoL
Feb 1 at 2:09
@DopeGhoti Actually it does.
<<<
adds a newline at the end. This is similar to how $()
removes a newline from the end. This is because lines are terminated by newlines. <<<
feeds an expression as a line, so it must be terminated by a newline. "$()"
takes lines and provides them as an argument, so it makes sense to convert by removing the terminating newline.– JoL
Feb 1 at 2:09
3
3
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pa∕y large dividends.
– Joe
Feb 1 at 20:55
@LeviUzodike awk is a much under-appreciated tool. It will make all sorts of seemingly complex problems easy to solve. Especially when you are trying to write a complex regex for something like sed, you can often save hours by instead writing it procedurally in awk. Learning it will pa∕y large dividends.
– Joe
Feb 1 at 20:55
1
1
@LeviUzodike: Yes
awk
is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl. e.g. use strace
to check out system-calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins even for small-ish list sizes. And sometimes you can write a #!/usr/bin/awk
script instead of a #!/bin/sh
script.– Peter Cordes
Feb 2 at 4:36
@LeviUzodike: Yes
awk
is a stand-alone binary that has to start up. Unlike perl or especially Python, the awk interpreter starts up quickly (still all the usual dynamic linker overhead of making quite a few system calls, but awk only uses libc/libm and libdl. e.g. use strace
to check out system-calls of awk startup). Many shells (like bash) are pretty slow, so firing up one awk process can be faster than looping over tokens in a list with shell built-ins even for small-ish list sizes. And sometimes you can write a #!/usr/bin/awk
script instead of a #!/bin/sh
script.– Peter Cordes
Feb 2 at 4:36
|
show 4 more comments
First rule of software optimization: Don't.
Until you know the speed of the program is an issue, there's no need to think
about how fast it is. If your list is about that length or just ~100-1000 items
long, you probably won't even notice how long it takes. There's a chance you're spending more time thinking about the optimization than what the difference would be.
Second rule: Measure.
That's the sure way to find out, and the one that gives answers for your system.
Especially with shells, there are so many, and they aren't all identical. An
answer for one shell might not apply for yours.
In larger programs, profiling goes here too. The slowest part might not be the one you think it is.
Third, the first rule of shell script optimization: Don't use the shell.
Yeah, really. Many shells aren't made to be fast (since launching external
programs doesn't have to be), and they might even parse the lines of the source
code again each time.
Use something like awk or Perl instead. In a trivial micro-benchmark I did,
awk
was dozens of times faster than any common shell in running a simple loop (without I/O).
However, if you do use the shell, use the shell's builtin functions instead of external commands. Here, you're using
expr
which isn't builtin in any shells I found on my system, but which can be replaced with standard arithmetic expansion. E.g.i=$((i+1))
instead ofi=$(expr $i + 1)
to incrementi
. Your use ofcut
in the last example might also be replaceable with standard parameter expansions.
See also: Why is using a shell loop to process text considered bad practice?
Steps #1 and #2 should apply to your question.
12
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
8
It's not thatawk
loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools likeawk
are fantastic at processing text data, because that's what shells and tools likeawk
are made for (respectively) in the first place.
– DopeGhoti
Jan 31 at 21:05
2
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower indash
than withgawk
, anddash
was the fastest shell I tested...
– ilkkachu
Jan 31 at 22:36
1
@Joe, it is :)dash
andbusybox
don't support(( .. ))
-- I think it's a nonstandard extension.++
is also explicitly mentioned as not required, so as far as I can tell,i=$((i+1))
or: $(( i += 1))
are the safe ones.
– ilkkachu
Feb 1 at 23:10
1
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
|
show 3 more comments
First rule of software optimization: Don't.
Until you know the speed of the program is an issue, there's no need to think
about how fast it is. If your list is about that length or just ~100-1000 items
long, you probably won't even notice how long it takes. There's a chance you're spending more time thinking about the optimization than what the difference would be.
Second rule: Measure.
That's the sure way to find out, and the one that gives answers for your system.
Especially with shells, there are so many, and they aren't all identical. An
answer for one shell might not apply for yours.
In larger programs, profiling goes here too. The slowest part might not be the one you think it is.
Third, the first rule of shell script optimization: Don't use the shell.
Yeah, really. Many shells aren't made to be fast (since launching external
programs doesn't have to be), and they might even parse the lines of the source
code again each time.
Use something like awk or Perl instead. In a trivial micro-benchmark I did,
awk
was dozens of times faster than any common shell in running a simple loop (without I/O).
However, if you do use the shell, use the shell's builtin functions instead of external commands. Here, you're using
expr
which isn't builtin in any shells I found on my system, but which can be replaced with standard arithmetic expansion. E.g.i=$((i+1))
instead ofi=$(expr $i + 1)
to incrementi
. Your use ofcut
in the last example might also be replaceable with standard parameter expansions.
See also: Why is using a shell loop to process text considered bad practice?
Steps #1 and #2 should apply to your question.
12
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
8
It's not thatawk
loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools likeawk
are fantastic at processing text data, because that's what shells and tools likeawk
are made for (respectively) in the first place.
– DopeGhoti
Jan 31 at 21:05
2
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower indash
than withgawk
, anddash
was the fastest shell I tested...
– ilkkachu
Jan 31 at 22:36
1
@Joe, it is :)dash
andbusybox
don't support(( .. ))
-- I think it's a nonstandard extension.++
is also explicitly mentioned as not required, so as far as I can tell,i=$((i+1))
or: $(( i += 1))
are the safe ones.
– ilkkachu
Feb 1 at 23:10
1
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
|
show 3 more comments
First rule of software optimization: Don't.
Until you know the speed of the program is an issue, there's no need to think
about how fast it is. If your list is about that length or just ~100-1000 items
long, you probably won't even notice how long it takes. There's a chance you're spending more time thinking about the optimization than what the difference would be.
Second rule: Measure.
That's the sure way to find out, and the one that gives answers for your system.
Especially with shells, there are so many, and they aren't all identical. An
answer for one shell might not apply for yours.
In larger programs, profiling goes here too. The slowest part might not be the one you think it is.
Third, the first rule of shell script optimization: Don't use the shell.
Yeah, really. Many shells aren't made to be fast (since launching external
programs doesn't have to be), and they might even parse the lines of the source
code again each time.
Use something like awk or Perl instead. In a trivial micro-benchmark I did,
awk
was dozens of times faster than any common shell in running a simple loop (without I/O).
However, if you do use the shell, use the shell's builtin functions instead of external commands. Here, you're using
expr
which isn't builtin in any shells I found on my system, but which can be replaced with standard arithmetic expansion. E.g.i=$((i+1))
instead ofi=$(expr $i + 1)
to incrementi
. Your use ofcut
in the last example might also be replaceable with standard parameter expansions.
See also: Why is using a shell loop to process text considered bad practice?
Steps #1 and #2 should apply to your question.
First rule of software optimization: Don't.
Until you know the speed of the program is an issue, there's no need to think
about how fast it is. If your list is about that length or just ~100-1000 items
long, you probably won't even notice how long it takes. There's a chance you're spending more time thinking about the optimization than what the difference would be.
Second rule: Measure.
That's the sure way to find out, and the one that gives answers for your system.
Especially with shells, there are so many, and they aren't all identical. An
answer for one shell might not apply for yours.
In larger programs, profiling goes here too. The slowest part might not be the one you think it is.
Third, the first rule of shell script optimization: Don't use the shell.
Yeah, really. Many shells aren't made to be fast (since launching external
programs doesn't have to be), and they might even parse the lines of the source
code again each time.
Use something like awk or Perl instead. In a trivial micro-benchmark I did,
awk
was dozens of times faster than any common shell in running a simple loop (without I/O).
However, if you do use the shell, use the shell's builtin functions instead of external commands. Here, you're using
expr
which isn't builtin in any shells I found on my system, but which can be replaced with standard arithmetic expansion. E.g.i=$((i+1))
instead ofi=$(expr $i + 1)
to incrementi
. Your use ofcut
in the last example might also be replaceable with standard parameter expansions.
See also: Why is using a shell loop to process text considered bad practice?
Steps #1 and #2 should apply to your question.
edited Feb 1 at 9:40
answered Jan 31 at 19:33
ilkkachuilkkachu
63.3k10104181
63.3k10104181
12
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
8
It's not thatawk
loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools likeawk
are fantastic at processing text data, because that's what shells and tools likeawk
are made for (respectively) in the first place.
– DopeGhoti
Jan 31 at 21:05
2
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower indash
than withgawk
, anddash
was the fastest shell I tested...
– ilkkachu
Jan 31 at 22:36
1
@Joe, it is :)dash
andbusybox
don't support(( .. ))
-- I think it's a nonstandard extension.++
is also explicitly mentioned as not required, so as far as I can tell,i=$((i+1))
or: $(( i += 1))
are the safe ones.
– ilkkachu
Feb 1 at 23:10
1
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
|
show 3 more comments
12
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
8
It's not thatawk
loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools likeawk
are fantastic at processing text data, because that's what shells and tools likeawk
are made for (respectively) in the first place.
– DopeGhoti
Jan 31 at 21:05
2
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower indash
than withgawk
, anddash
was the fastest shell I tested...
– ilkkachu
Jan 31 at 22:36
1
@Joe, it is :)dash
andbusybox
don't support(( .. ))
-- I think it's a nonstandard extension.++
is also explicitly mentioned as not required, so as far as I can tell,i=$((i+1))
or: $(( i += 1))
are the safe ones.
– ilkkachu
Feb 1 at 23:10
1
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
12
12
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
#0, quote your expansions :-)
– Kusalananda♦
Jan 31 at 19:59
8
8
It's not that
awk
loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools like awk
are fantastic at processing text data, because that's what shells and tools like awk
are made for (respectively) in the first place.– DopeGhoti
Jan 31 at 21:05
It's not that
awk
loops are necessarily any better or worse than shell loops. It's that the shell is really good at running commands and at directing input and output to and from processes, and frankly rather clunky at everything else; while tools like awk
are fantastic at processing text data, because that's what shells and tools like awk
are made for (respectively) in the first place.– DopeGhoti
Jan 31 at 21:05
2
2
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower in
dash
than with gawk
, and dash
was the fastest shell I tested...– ilkkachu
Jan 31 at 22:36
@DopeGhoti, shells do seem to be objectively slower, though. Some very simple while loops seem to be >25 times slower in
dash
than with gawk
, and dash
was the fastest shell I tested...– ilkkachu
Jan 31 at 22:36
1
1
@Joe, it is :)
dash
and busybox
don't support (( .. ))
-- I think it's a nonstandard extension. ++
is also explicitly mentioned as not required, so as far as I can tell, i=$((i+1))
or : $(( i += 1))
are the safe ones.– ilkkachu
Feb 1 at 23:10
@Joe, it is :)
dash
and busybox
don't support (( .. ))
-- I think it's a nonstandard extension. ++
is also explicitly mentioned as not required, so as far as I can tell, i=$((i+1))
or : $(( i += 1))
are the safe ones.– ilkkachu
Feb 1 at 23:10
1
1
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
Re "more time thinking": this neglects an important factor. How often does it run, and for how many users? If a program wastes 1 second, which could be fixed by the programmer thinking about it for 30 minutes, it might be a waste of time if there's only one user who's going to run it once. On the other hand if there's a million users, that's a million seconds, or 11 days of user time. If the code wasted a minute of a million users, that's about 2 years of user time.
– agc
Feb 4 at 2:49
|
show 3 more comments
I'm only going to give some general advice in this answer, and not benchmarks. Benchmarks are the only way to reliably answer questions about performance. But since you don't say how much data you're manipulating and how often you perform this operation, there's no way to do a useful benchmark. What's more efficient for 10 items and what's more efficient for 1000000 items is often not the same.
As a general rule of thumb, invoking external commands is more expensive than doing something with pure shell constructs, as long as the pure shell code doesn't involve a loop. On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool. For example, your loop invoking cut
could well be noticeably slow in practice, but if you find a way to do the whole thing with a single cut
invocation that's likely to be faster than doing the same thing with string manipulation in the shell.
Do note that the cutoff point can vary a lot between systems. It can depend on the kernel, on how the kernel's scheduler is configured, on the filesystem containing the external executables, on how much CPU vs memory pressure there is at the moment, and many other factors.
Don't call expr
to perform arithmetic if you're at all concerned about performance. In fact, don't call expr
to perform arithmetic at all. Shells have built-in arithmetic, which is clearer and faster than invoking expr
.
You seem to be using bash, since you're using bash constructs that don't exist in sh. So why on earth would you not use an array? An array is the most natural solution, and it's likely to be the fastest, too. Note that array indices start at 0.
list=(1 2 3 5 9 8 6 90 84 9 3 2 15 75 55)
for ((count = 0; count += 3; count < ${#list[@]})); do
echo "${list[$count]}"
done
Your script may well be faster if you use sh, if your system has dash or ksh as sh rather than bash. If you use sh, you don't get named arrays, but you still get the one array of positional parameters, which you can set with set. To access an element at a position that is not known until runtime, you need to use eval (take care of quoting things properly!).
# List elements must not contain whitespace or ?*\[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
count=1
while [ $count -le $# ]; do
  eval "value=\${$count}"
  echo "$value"
  count=$((count+3))
done
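(The backslash matters: $count expands first, so eval sees, say, value=${4} and assigns the fourth positional parameter.)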
If you only ever want to access the array once and are going from left to right (skipping some values), you can use shift instead of variable indices.
# List elements must not contain whitespace or ?*\[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
while [ $# -ge 1 ]; do
echo "$1"
shift && shift && shift
done
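As a comment below points out, shift 3 is shorter but errors out in some shells when fewer than three parameters remain; a guarded sketch of that variant:
while [ $# -ge 1 ]; do
  echo "$1"
  if [ $# -gt 3 ]; then shift 3; else set --; fi
done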
Which approach is faster depends on the shell and on the number of elements.
Another possibility is to use string processing. It has the advantage of not using the positional parameters, so you can use them for something else. It'll be slower for large amounts of data, but that's unlikely to make a noticeable difference for small amounts of data.
# List elements must be separated by a single space (not arbitrary whitespace)
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
while [ -n "$list" ]; do
echo "${list% *}"
case "$list" in * * * *) :;; *) break;; esac
list="${list#* * * }"
done
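(The case guard requires at least three spaces, i.e. at least four remaining words; without it, the final chop could fail to match and the loop would never terminate.)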
"On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool" but what if that tool has loops in it like awk? @ikkachu said awk loops are faster, but would you say that with < 1000 fields to iterate through, the benefit of faster loops wouldn't outweigh the cost of calling awk since it's an external command (assuming I could do the same task in shell loops with the use of only built in commands)?
– Levi Uzodike
Feb 1 at 16:40
@LeviUzodike Please re-read the first paragraph of my answer.
– Gilles
Feb 1 at 17:04
You could also replaceshift && shift && shift
withshift 3
in your third example - unless the shell you're using doesn't support it.
– Joe
Feb 1 at 21:14
2
@Joe Actually, no.shift 3
would fail if there were too few remaining arguments. You'd need something likeif [ $# -gt 3 ]; then shift 3; else set --; fi
– Gilles
Feb 1 at 21:25
add a comment |
I'm only going to give some general advice in this answer, and not benchmarks. Benchmarks are the only way to reliably answer questions about performance. But since you don't say how much data you're manipulating and how often you perform this operation, there's no way to do a useful benchmark. What's more efficient for 10 items and what's more efficient for 1000000 items is often not the same.
As a general rule of thumb, invoking external commands is more expensive than doing something with pure shell constructs, as long as the pure shell code doesn't involve a loop. On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool. For example, your loop invoking cut
could well be noticeably slow in practice, but if you find a way to do the whole thing with a single cut
invocation that's likely to be faster than doing the same thing with string manipulation in the shell.
Do note that the cutoff point can vary a lot between systems. It can depend on the kernel, on how the kernel's scheduler is configured, on the filesystem containing the external executables, on how much CPU vs memory pressure there is at the moment, and many other factors.
Don't call expr
to perform arithmetic if you're at all concerned about performance. In fact, don't call expr
to perform arithmetic at all. Shells have built-in arithmetic, which is clearer and faster than invoking expr
.
You seem to be using bash, since you're using bash constructs that don't exist in sh. So why on earth would you not use an array? An array is the most natural solution, and it's likely to be the fastest, too. Note that array indices start at 0.
list=(1 2 3 5 9 8 6 90 84 9 3 2 15 75 55)
for ((count = 0; count += 3; count < ${#list[@]})); do
echo "${list[$count]}"
done
Your script may well be faster if you use sh, if your system has dash or ksh as sh
rather than bash. If you use sh, you don't get named arrays, but you still get the array one of positional parameters, which you can set with set
. To access an element at a position that is not known until runtime, you need to use eval
(take care of quoting things properly!).
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
count=1
while [ $count -le $# ]; do
eval "value=${$count}"
echo "$value"
count=$((count+1))
done
If you only ever want to access the array once and are going from left to right (skipping some values), you can use shift
instead of variable indices.
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
while [ $# -ge 1 ]; do
echo "$1"
shift && shift && shift
done
Which approach is faster depends on the shell and on the number of elements.
Another possibility is to use string processing. It has the advantage of not using the positional parameters, so you can use them for something else. It'll be slower for large amounts of data, but that's unlikely to make a noticeable difference for small amounts of data.
# List elements must be separated by a single space (not arbitrary whitespace)
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
while [ -n "$list" ]; do
echo "${list% *}"
case "$list" in * * * *) :;; *) break;; esac
list="${list#* * * }"
done
"On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool" but what if that tool has loops in it like awk? @ikkachu said awk loops are faster, but would you say that with < 1000 fields to iterate through, the benefit of faster loops wouldn't outweigh the cost of calling awk since it's an external command (assuming I could do the same task in shell loops with the use of only built in commands)?
– Levi Uzodike
Feb 1 at 16:40
@LeviUzodike Please re-read the first paragraph of my answer.
– Gilles
Feb 1 at 17:04
You could also replaceshift && shift && shift
withshift 3
in your third example - unless the shell you're using doesn't support it.
– Joe
Feb 1 at 21:14
2
@Joe Actually, no.shift 3
would fail if there were too few remaining arguments. You'd need something likeif [ $# -gt 3 ]; then shift 3; else set --; fi
– Gilles
Feb 1 at 21:25
add a comment |
I'm only going to give some general advice in this answer, and not benchmarks. Benchmarks are the only way to reliably answer questions about performance. But since you don't say how much data you're manipulating and how often you perform this operation, there's no way to do a useful benchmark. What's more efficient for 10 items and what's more efficient for 1000000 items is often not the same.
As a general rule of thumb, invoking external commands is more expensive than doing something with pure shell constructs, as long as the pure shell code doesn't involve a loop. On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool. For example, your loop invoking cut
could well be noticeably slow in practice, but if you find a way to do the whole thing with a single cut
invocation that's likely to be faster than doing the same thing with string manipulation in the shell.
Do note that the cutoff point can vary a lot between systems. It can depend on the kernel, on how the kernel's scheduler is configured, on the filesystem containing the external executables, on how much CPU vs memory pressure there is at the moment, and many other factors.
Don't call expr
to perform arithmetic if you're at all concerned about performance. In fact, don't call expr
to perform arithmetic at all. Shells have built-in arithmetic, which is clearer and faster than invoking expr
.
You seem to be using bash, since you're using bash constructs that don't exist in sh. So why on earth would you not use an array? An array is the most natural solution, and it's likely to be the fastest, too. Note that array indices start at 0.
list=(1 2 3 5 9 8 6 90 84 9 3 2 15 75 55)
for ((count = 0; count += 3; count < ${#list[@]})); do
echo "${list[$count]}"
done
Your script may well be faster if you use sh, if your system has dash or ksh as sh
rather than bash. If you use sh, you don't get named arrays, but you still get the array one of positional parameters, which you can set with set
. To access an element at a position that is not known until runtime, you need to use eval
(take care of quoting things properly!).
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
count=1
while [ $count -le $# ]; do
eval "value=${$count}"
echo "$value"
count=$((count+1))
done
If you only ever want to access the array once and are going from left to right (skipping some values), you can use shift
instead of variable indices.
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
while [ $# -ge 1 ]; do
echo "$1"
shift && shift && shift
done
Which approach is faster depends on the shell and on the number of elements.
Another possibility is to use string processing. It has the advantage of not using the positional parameters, so you can use them for something else. It'll be slower for large amounts of data, but that's unlikely to make a noticeable difference for small amounts of data.
# List elements must be separated by a single space (not arbitrary whitespace)
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
while [ -n "$list" ]; do
echo "${list% *}"
case "$list" in * * * *) :;; *) break;; esac
list="${list#* * * }"
done
I'm only going to give some general advice in this answer, and not benchmarks. Benchmarks are the only way to reliably answer questions about performance. But since you don't say how much data you're manipulating and how often you perform this operation, there's no way to do a useful benchmark. What's more efficient for 10 items and what's more efficient for 1000000 items is often not the same.
As a general rule of thumb, invoking external commands is more expensive than doing something with pure shell constructs, as long as the pure shell code doesn't involve a loop. On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool. For example, your loop invoking cut
could well be noticeably slow in practice, but if you find a way to do the whole thing with a single cut
invocation that's likely to be faster than doing the same thing with string manipulation in the shell.
Do note that the cutoff point can vary a lot between systems. It can depend on the kernel, on how the kernel's scheduler is configured, on the filesystem containing the external executables, on how much CPU vs memory pressure there is at the moment, and many other factors.
Don't call expr
to perform arithmetic if you're at all concerned about performance. In fact, don't call expr
to perform arithmetic at all. Shells have built-in arithmetic, which is clearer and faster than invoking expr
.
You seem to be using bash, since you're using bash constructs that don't exist in sh. So why on earth would you not use an array? An array is the most natural solution, and it's likely to be the fastest, too. Note that array indices start at 0.
list=(1 2 3 5 9 8 6 90 84 9 3 2 15 75 55)
for ((count = 0; count += 3; count < ${#list[@]})); do
echo "${list[$count]}"
done
Your script may well be faster if you use sh, if your system has dash or ksh as sh
rather than bash. If you use sh, you don't get named arrays, but you still get the array one of positional parameters, which you can set with set
. To access an element at a position that is not known until runtime, you need to use eval
(take care of quoting things properly!).
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
count=1
while [ $count -le $# ]; do
eval "value=${$count}"
echo "$value"
count=$((count+1))
done
If you only ever want to access the array once and are going from left to right (skipping some values), you can use shift
instead of variable indices.
# List elements must not contain whitespace or ?*[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
while [ $# -ge 1 ]; do
echo "$1"
shift && shift && shift
done
Which approach is faster depends on the shell and on the number of elements.
Another possibility is to use string processing. It has the advantage of not using the positional parameters, so you can use them for something else. It'll be slower for large amounts of data, but that's unlikely to make a noticeable difference for small amounts of data.
# List elements must be separated by a single space (not arbitrary whitespace)
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
while [ -n "$list" ]; do
echo "${list% *}"
case "$list" in * * * *) :;; *) break;; esac
list="${list#* * * }"
done
answered Feb 1 at 7:59
Gilles
"On the other hand, a shell loop that iterates over a large string or a large amount of string is likely to be slower than one invocation of a special-purpose tool" but what if that tool has loops in it like awk? @ikkachu said awk loops are faster, but would you say that with < 1000 fields to iterate through, the benefit of faster loops wouldn't outweigh the cost of calling awk since it's an external command (assuming I could do the same task in shell loops with the use of only built in commands)?
– Levi Uzodike
Feb 1 at 16:40
@LeviUzodike Please re-read the first paragraph of my answer.
– Gilles
Feb 1 at 17:04
You could also replace shift && shift && shift with shift 3 in your third example - unless the shell you're using doesn't support it.
– Joe
Feb 1 at 21:14
2
@Joe Actually, no. shift 3 would fail if there were too few remaining arguments. You'd need something like if [ $# -gt 3 ]; then shift 3; else set --; fi
– Gilles
Feb 1 at 21:25
awk is a great choice, if you can do all your processing inside of the Awk script. Otherwise, you just end up piping the Awk output to other utilities, destroying the performance gain of awk.
bash iteration over an array is also great, if you can fit your entire list inside the array (which for modern shells is probably a guarantee) and you don't mind the array syntax gymnastics.
However, a pipeline approach:
xargs -n3 <<< "$list" | while read -ra a; do echo $a; done | grep 9
Where:
- xargs groups the whitespace-separated list into batches of three, each newline-separated
- while read consumes that list and outputs the first column of each group
- grep filters the first column (corresponding to every third position in the original list)
This improves understandability, in my opinion. People already know what these tools do, so it's easy to read from left to right and reason about what's going to happen. This approach also clearly documents the stride length (-n3) and the filter pattern (9), so it's easy to parameterize:
count=3
find=9
xargs -n "$count" <<< "$list" | while read -ra a; do echo $a; done | grep "$find"
When we ask questions of "efficiency", be sure to think about "total lifetime efficiency". That calculation includes the effort of maintainers to keep the code working, and we meat-bags are the least efficient machines in the whole operation.
answered Feb 1 at 19:08
bishop
Perhaps this?
cut -d' ' -f1,4,7,10,13 <<<$list
1 5 6 9 15
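If the list length isn't known in advance, the field list itself could be generated; a sketch (hypothetical, not part of the original answer) assuming GNU seq and wc:
# seq -s, 1 3 15 prints "1,4,7,10,13"; wc -w counts the words in the list
cut -d' ' -f"$(seq -s, 1 3 "$(echo "$list" | wc -w)")" <<< "$list"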
answered Jan 31 at 19:21
Doug O'Neal
Sorry I wasn't clear before, but I wanted to be able to get the numbers at those positions without knowing the length of the list. But thanks, I forgot cut could do that.
– Levi Uzodike
Jan 31 at 19:51
Don't use shell commands if you want to be efficient. Limit yourself to pipes, redirections, substitutions etc., and programs. That's why the xargs and parallel utilities exist - because bash while loops are inefficient and very slow. Use bash loops only as a last resort.
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"
if
<<<"$list" tr -d -s '[0-9 ]' |
tr -s ' ' | tr ' ' 'n' |
grep -q -x '9'
then
found=true
else
found=false
fi
echo ${found}
But you should probably get somewhat faster results with a good awk.
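One possible shape for that, testing by position as the OP later clarified (a sketch only, assuming a POSIX awk and bash's <<<):
find=9
found=$(awk -v RS=' ' -v find="$find" '
    NR % 3 == 1 && $0 == find { print "true"; exit }   # every third field, exact match
' <<< "$list")
echo "${found:-false}"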
edited Jan 31 at 19:27
answered Jan 31 at 19:19
Kamil Cuk
Sorry I wasn't clear before, but I was looking for a solution that would be able to extract the values based only on their position in list. I just made the original list like that because I wanted it to be obvious the values I wanted.
– Levi Uzodike
Jan 31 at 20:02
In my opinion the clearest solution (and probably the most performant too) is to use the RS and ORS awk variables:
awk -v RS=' ' -v ORS=' ' 'NR % 3 == 1' <<< "$list"
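With the example list this should print 1 5 6 9 15 (plus a trailing separator, since ORS is appended after every printed record).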
answered Feb 2 at 16:43
user000001
Using GNU sed and POSIX shell script:
echo $(printf '%s\n' $list | sed -n '1~3p')
Or with bash's parameter substitution:
echo $(sed -n '1~3p' <<< ${list// /$'\n'})
Non-GNU (i.e. POSIX) sed, and bash:
sed 's/\([^ ]* \)[^ ]* *[^ ]* */\1/g' <<< "$list"
Or more portably, using both POSIX sed and shell script:
echo "$list" | sed 's/\([^ ]* \)[^ ]* *[^ ]* */\1/g'
Output of any of these:
1 5 6 9 15
edited Feb 4 at 4:02
answered Feb 4 at 3:25
agc
1
Levi, what exactly is the "efficient" way in your definition? You want to find a faster way to iterate?
– Sergiy Kolodyazhnyy
Feb 2 at 0:42