What is the difference between “Redirection” and “Pipe”?
This question may sound a bit stupid, but I cannot really see the difference between redirection and pipes.
Redirection is used to redirect stdout/stdin/stderr, e.g. ls > log.txt.
Pipes are used to give the output of a command as input to another command, e.g. ls | grep file.txt.
But why are there two operators for the same thing?
Why not just write ls > grep to pass the output through? Isn't this just a kind of redirection as well? What am I missing?
pipe redirect
edited Apr 1 '15 at 9:38
John Threepwood
asked Aug 7 '12 at 13:22
8 Answers
Pipe is used to pass output to another program or utility.
Redirect is used to pass output to either a file or stream.
Example: thing1 > thing2 vs thing1 | thing2
thing1 > thing2
- Your shell will run the program named thing1
- Everything that thing1 outputs will be placed in a file called thing2. (Note - if thing2 exists, it will be overwritten.)
If you want to pass the output from program thing1 to a program called thing2, you could do the following:
thing1 > temp_file && thing2 < temp_file
which would
- run the program named thing1
- save the output into a file named temp_file
- run the program named thing2, pretending that the person at the keyboard typed the contents of temp_file as the input.
However, that's clunky, so they made pipes as a simpler way to do that. thing1 | thing2 does the same thing as thing1 > temp_file && thing2 < temp_file
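A quick sketch of that equivalence; the commands and the temp_file name are arbitrary examples:
# the pipe form
ls | sort -r
# the clunky temp-file form that produces the same result
ls > temp_file && sort -r < temp_file
rm temp_file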
EDIT to provide more details in response to the question in the comments:
If > tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: You are trying to write to a file. There already exists a file with that name that you wish to overwrite. However, the file is executable. Presumably, it would try to execute this file, passing the input. You'd have to do something like write the output to a new filename, then rename the file.
Second example: As Florian Diesch pointed out, what if there's another command elsewhere in the system with the same name (that is, somewhere in the execution path)? If you intended to make a file with that name in your current folder, you'd be stuck.
Thirdly: if you mistype a command, it wouldn't warn you that the command doesn't exist. Right now, if you type ls | gerp log.txt it will tell you bash: gerp: command not found. If > meant both, it would simply create a new file for you (and then complain that it doesn't know what to do with log.txt).
Thank you. You mentioned thing1 > temp_file && thing2 < temp_file as the thing pipes make easier. But why not re-use the > operator to do this, e.g. thing1 > thing2 for commands thing1 and thing2? Why an extra operator |?
– John Threepwood
Aug 7 '12 at 13:57
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file named less, for example? thing | less and thing > less are perfectly different, as they do different things. What you propose would create an ambiguity.
– Darkhogg
May 25 '14 at 9:55
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
@Sridhar-Sarnobat no, the tee command does something different. tee writes output to both the screen (stdout) and the file. Redirect does only the file.
– David Oneill
Jun 5 '14 at 9:16
If the meaning of foo > bar depended on whether there is a command named bar, that would make using redirection a lot harder and more error-prone: every time I wanted to redirect to a file, I would first have to check whether there's a command named like my destination file.
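A small sketch of why the current behaviour is convenient: less is a real command, yet the redirect below simply creates a file with that name in the current directory instead of running the pager:
$ ls > less    # creates (or overwrites) a file literally named "less"
$ ls | less    # this is how you actually send the listing to the pager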
This would be an issue only if you're writing to bar in a directory that's part of your $PATH env variable. If you're in something like /bin, then it could be a problem. But even then, bar would have to have the executable permission set, so the shell would check not just whether it can find an executable bar but whether it can actually execute it. And if the concern is with overwriting an existing file, the noclobber shell option should prevent overwriting existing files in redirections.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
There's a vital difference between the two operators:
ls > log.txt --> This command sends the output to the log.txt file.
ls | grep file.txt --> This command sends the output of ls to the grep command through the use of a pipe (|), and the grep command searches for file.txt in the input provided to it by the previous command.
If you had to perform the same task using the first scenario, then it would be:
ls > log.txt; grep 'file.txt' log.txt
So a pipe (with |) is used to send the output to another command, whereas redirection (with >) is used to redirect the output to some file.
From the Unix and Linux System Administration Handbook:
Redirection
The shell interprets the symbols <, >, and >> as instructions to reroute a command's input or output to or from a file.
Pipes
To connect the STDOUT of one command to the STDIN of another use the | symbol, commonly known as a pipe.
So my interpretation is: if it's command to command, use a pipe. If you are outputting to or from a file, use a redirect.
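A few one-liners illustrating that rule of thumb (the file names are made up):
$ sort < names.txt > sorted.txt    # file in, file out: redirection both ways
$ dmesg >> boot.log                # appending to a file: still redirection
$ ps aux | grep ssh                # command to command: pipe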
There's a big syntactic difference between the two:
- A redirect is an argument to a program
- A pipe separates two commands
You can think of redirects like this: cat [<infile] [>outfile]. This implies order doesn't matter: cat <infile >outfile is the same as cat >outfile <infile. You can even mix redirects up with other arguments: cat >outfile <infile -b and cat <infile -b >outfile are both perfectly fine. Also you can string together more than one input or output (inputs will be read sequentially and all output will be written to each output file): cat >outfile1 >outfile2 <infile1 <infile2. The target or source of a redirect can be either a filename or the name of a stream (like &1, at least in bash).
But pipes totally separate one command from another command, you can't mix them in with arguments:
[command1] | [command2]
The pipe takes everything written to standard output from command1 and sends it to the standard input of command2.
You can also combine piping and redirection, though note that an explicit redirection takes precedence over the pipe. For example:
cat <infile >outfile | cat <infile2 >outfile2
The shell first wires the first cat's stdout to the pipe, but the >outfile redirection then replaces it, so the first cat simply copies infile to outfile and nothing actually travels through the pipe. Likewise, the second cat's <infile2 redirection replaces the pipe on its standard input, so it copies infile2 to outfile2. After running this, outfile will be a copy of infile, and outfile2 will be a copy of infile2.
Finally, you can actually do something really similar to your example using "here string" redirection (bash family only) and backticks:
grep blah <<<`ls`
will give the same result as
ls | grep blah
But I think the redirection version will first read all of the output of ls into a buffer (in memory), and then feed that buffer to grep one line at a time, whereas the piped version will take each line from ls as it emerges, and pass that line to grep.
Nitpick: order matters in redirection if you redirect one fd to another: echo yes 1>&2 2>/tmp/blah; wc -l /tmp/blah; echo yes 2>/tmp/blah 1>&2; wc -l /tmp/blah. Further, redirection to a file will only use the last redirection: echo yes >/tmp/blah >/tmp/blah2 will only write to /tmp/blah2.
– muru
Aug 23 '14 at 22:49
A redirect is not actually an argument to the program. The program will not know or care where its output goes (or input comes from). It's just a way of telling bash how to arrange things before running the program.
– Alois Mahdal
Apr 23 '15 at 18:33
Note: This answer reflects my own understanding of these mechanisms to date, accumulated over research and reading the answers of peers on this site and unix.stackexchange.com, and will be updated as time goes on. Don't hesitate to ask questions or suggest improvements in the comments. I also suggest you try to see how syscalls work in the shell with the strace command. Also, please don't be intimidated by the notion of internals or syscalls - you don't have to know or be able to use them in order to understand how the shell does things, but they definitely help understanding.
TL;DR
- | pipes are not associated with an entry on disk, and therefore do not have an inode number on a disk filesystem (though they do have an inode in the pipefs virtual filesystem in kernel space), whereas redirections often involve files, which do have disk entries and therefore a corresponding inode.
- pipes are not lseek()'able, so commands can't read some data and then rewind back, but when you redirect with > or < the target is usually a file, which is an lseek()-able object, so commands can navigate however they please.
- redirections are manipulations on file descriptors, which can be many; pipes have only two file descriptors - one for the left command and one for the right command.
- redirection on standard streams and pipes are both buffered.
- pipes almost always involve forking, and therefore pairs of processes are involved; redirections - not always, though in both cases the resulting file descriptors are inherited by sub-processes.
- pipes always connect file descriptors (a pair); redirections either use a pathname or file descriptors.
- pipes are an Inter-Process Communication method, while redirections are just manipulations on open files or file-like objects.
- both employ dup2() syscalls underneath the hood to provide copies of file descriptors, where the actual flow of data occurs.
- redirections can be applied "globally" with the exec built-in command (see this and this), so if you do exec > output.txt every command will write to output.txt from then on. | pipes are applied only to the current command (which means either a simple command, a subshell like seq 5 | (head -n1; head -n2), or a compound command).
- When redirection is done on files, things like echo "TEST" > file and echo "TEST" >> file both use the open() syscall on that file (see also) and get a file descriptor from it to pass to dup2(). Pipes | only use the pipe() and dup2() syscalls.
- As far as the commands being executed are concerned, pipes and redirections are no more than file descriptors - file-like objects to which they may write blindly, or which they may manipulate internally (which may produce unexpected behaviors; apt, for instance, tends to not even write to stdout if it knows there's a redirection).
Introduction
In order to understand how these two mechanisms differ, it's necessary to understand their essential properties, the history behind the two, and their roots in the C programming language. In fact, knowing what file descriptors are, and how the dup2() and pipe() system calls work, is essential, as is lseek(). The shell is meant as a way of making these mechanisms abstract to the user, but digging deeper than the abstraction helps understand the true nature of the shell's behavior.
The Origins of Redirections and Pipes
According to Dennis Ritchie's article Prophetic Petroglyphs, pipes originated from a 1964 internal memo by Malcolm Douglas McIlroy, at the time when they were working on the Multics operating system. Quote:
To put my strongest concerns into a nutshell:
- We should have some ways of connecting programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.
What's apparent is that at the time programs were capable of writing to disk, but that was inefficient if the output was large. To quote Brian Kernighan's explanation in the Unix Pipeline video:
First, you don't have to write one big massive program - you've got existing smaller programs that may already do parts of the job...Another is that it's possible that the amount of data you're processing would not fit if you stored it in a file...because remember, we're back in the days when disks on these things had, if you were lucky, a Megabyte or two of data...So the pipeline never had to instantiate the whole output.
Thus the conceptual difference is apparent: pipes are a mechanism for making programs talk to one another, while redirections are a way of writing to a file at the most basic level. In both cases the shell makes these two things easy, but underneath the hood there's a whole lot going on.
Going deeper: syscalls and internal workings of the shell
We start with the notion of a file descriptor. A file descriptor basically describes an open file (whether that's a file on disk, in memory, or an anonymous file), and is represented by an integer number. The three standard data streams (stdin, stdout, stderr) are file descriptors 0, 1, and 2 respectively. Where do they come from? Well, in shell commands the file descriptors are inherited from their parent - the shell. And it's true in general for all processes - a child process inherits its parent's file descriptors. For daemons it is common to close all inherited file descriptors and/or redirect them to other places.
Back to redirection. What is it really? It's a mechanism that tells the shell to prepare file descriptors for the command (because redirections are done by the shell before the command runs) and point them where the user suggested. The standard definition of output redirection is
[n]>word
That [n] there is the file descriptor number. When you do echo "Something" > /dev/null the number 1 is implied there, while echo 2> /dev/null explicitly redirects file descriptor 2 (stderr).
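A small sketch of the [n]>word form in practice; the file names here are arbitrary examples:
# fd 1 (stdout) is implied when no number is given
ls /etc > listing.txt
# fd 2 (stderr) has to be named explicitly
ls /nonexistent 2> errors.txt
# both at once, to separate files
ls /etc /nonexistent > listing.txt 2> errors.txt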
Underneath the hood this is done by duplicating a file descriptor via the dup2() system call. Let's take df > /dev/null. The shell will create a child process where df runs, but before that it will open /dev/null as file descriptor #3, and dup2(3,1) will be issued, which makes a copy of file descriptor 3, and that copy will be 1. You know how when you have two files file1.txt and file2.txt, and you do cp file1.txt file2.txt, you end up with two identical files that you can nevertheless manipulate independently? That's kinda the same thing happening here. Often you can see that before running, bash will do dup2(1,10) to make a copy of file descriptor #1, which is stdout (and that copy will be fd #10), in order to restore it later. Important to note is that when you consider built-in commands (which are part of the shell itself and have no file in /bin or elsewhere), or simple commands in a non-interactive shell, the shell doesn't create a child process.
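A hedged way to watch this happen is to trace the shell with strace, as suggested in the note at the top; the exact syscall names (open vs openat) and descriptor numbers will vary between systems, but the pattern is the same:
$ strace -f -e trace=openat,dup2 bash -c 'df > /dev/null' 2>&1 | grep -E '/dev/null|dup2'
# typically shows something along the lines of:
#   openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
#   dup2(3, 1) = 1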
And then we have things like [n]>&[m] and [n]<&[m]. This duplicates file descriptors, and is the same mechanism as dup2(), only now it's in the shell syntax, conveniently available to the user.
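A common idiom built on that syntax (build.log is a made-up file name); note that, as the next paragraph explains, the order of the two redirections matters:
$ make > build.log 2>&1    # fd 2 is duplicated from fd 1 after the file redirection, so stderr follows stdout into build.log
$ make 2>&1 > build.log    # reversed: stderr is duplicated from the old stdout (the terminal) first, so only stdout ends up in the file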
One of the important things to note about redirections is that their order is not fixed, but it is significant to how the shell interprets what the user wants. Compare the following:
# Make copy of where fd 2 points , then redirect fd 2
$ ls -l /proc/self/fd/ 3>&2 2> /dev/null
total 0
lrwx------ 1 user user 64 Sep 13 00:08 0 -> /dev/pts/0
lrwx------ 1 user user 64 Sep 13 00:08 1 -> /dev/pts/0
l-wx------ 1 user user 64 Sep 13 00:08 2 -> /dev/null
lrwx------ 1 runner user 64 Sep 13 00:08 3 -> /dev/pts/0
lr-x------ 1 user user 64 Sep 13 00:08 4 -> /proc/29/fd
# redirect fd #2 first, then clone it
$ ls -l /proc/self/fd/ 2> /dev/null 3>&2
total 0
lrwx------ 1 user user 64 Sep 13 00:08 0 -> /dev/pts/0
lrwx------ 1 user user 64 Sep 13 00:08 1 -> /dev/pts/0
l-wx------ 1 user user 64 Sep 13 00:08 2 -> /dev/null
l-wx------ 1 user user 64 Sep 13 00:08 3 -> /dev/null
lr-x------ 1 user user 64 Sep 13 00:08 4 -> /proc/31/fd
The practical use of these in shell scripting can be versatile:
- saving into a variable the output of a program that writes only to stderr
- swapping stderr and stdout
- separating even input lines from odd input lines
and many others.
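Minimal sketches of the first two items (some_command stands in for any hypothetical program):
# capture stderr into a variable: point fd 2 at the captured stdout, discard the original stdout
err=$(ls /nonexistent 2>&1 1>/dev/null)
echo "captured: $err"
# swap stdout and stderr, using fd 3 as a temporary holder and closing it afterwards
some_command 3>&1 1>&2 2>&3 3>&-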
Plumbing with pipe() and dup2()
So how do pipes get created? Via the pipe() syscall, which takes as input an array (aka list) called pipefd of two items of type int (integer). Those two integers are file descriptors. pipefd[0] will be the read end of the pipe and pipefd[1] will be the write end. So in df | grep 'foo', grep will get a copy of pipefd[0] and df will get a copy of pipefd[1]. But how? Of course, with the magic of the dup2() syscall. For df in our example, let's say pipefd[1] is #4, so the shell will make a child, do dup2(4,1) (remember my cp example?), and then do execve() to actually run df. Naturally, df will inherit file descriptor #1, but will be unaware that it is no longer pointing at the terminal but at the write end of the pipe. Naturally, the same thing will occur with grep 'foo', except with different file descriptor numbers.
Now, an interesting question: could we make pipes that redirect fd #2 as well, not just fd #1? Yes, in fact that's what |& does in bash. The POSIX standard requires the shell command language to support the df 2>&1 | grep 'foo' syntax for that purpose, but bash does |& as well.
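A small sketch of the two equivalent forms (the error text matched here is what GNU ls prints, so treat it as an example):
$ ls /nonexistent 2>&1 | grep 'No such'    # portable POSIX form
$ ls /nonexistent |& grep 'No such'        # bash shorthand for the same thing
Both print the error line, because stderr has been joined to the stream flowing through the pipe.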
What's important to note is that pipes always deal with file descriptors. There exists the FIFO, or named pipe, which has a filename on disk and lets you use it as a file, yet behaves like a pipe. But the | kind of pipe is what's known as an anonymous pipe - it has no filename, because it's really just two objects connected together. The fact that we're not dealing with files also has an important implication: pipes aren't lseek()'able. Files, either in memory or on disk, are static - programs can use the lseek() syscall to jump to byte 120, then back to byte 10, then forward all the way to the end. Pipes are not static - they're sequential, and therefore you cannot rewind data you get from them with lseek(). This is how some programs know whether they're reading from a file or from a pipe, and thus they can make the necessary adjustments for efficient performance; in other words, a prog can detect whether I do cat file.txt | prog or prog < input.txt. A real working example of that is tail.
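A minimal sketch of a named pipe next to an anonymous one; the file name myfifo is arbitrary:
$ mkfifo myfifo        # create a named pipe: it has a name on disk, unlike |
$ seq 3 > myfifo &     # the writer blocks until a reader opens the other end
$ cat < myfifo         # reads 1 2 3, exactly as if the data had been piped
$ rm myfifo            # the name persists until removed, but the data never touches the disk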
The other two very interesting properties of pipes are that they have a buffer, which on Linux is 4096 bytes, and that they actually have a filesystem, as defined in the Linux source code! They're not simply an object for passing data around - they are a data structure themselves! In fact, because there exists the pipefs filesystem, which manages both pipes and FIFOs, pipes have an inode number on their respective filesystem:
# Stdout of ls is wired to pipe
$ ls -l /proc/self/fd/ | cat
lrwx------ 1 user user 64 Sep 13 00:02 0 -> /dev/pts/0
l-wx------ 1 user user 64 Sep 13 00:02 1 -> pipe:[15655630]
lrwx------ 1 user user 64 Sep 13 00:02 2 -> /dev/pts/0
lr-x------ 1 user user 64 Sep 13 00:02 3 -> /proc/22/fd
# stdin of ls is wired to pipe
$ true | ls -l /proc/self/fd/0
lr-x------ 1 user user 64 Sep 13 03:58 /proc/self/fd/0 -> 'pipe:[54741]'
On Linux, pipes are uni-directional, just like redirection. On some Unix-like implementations there are bi-directional pipes. Although, with the magic of shell scripting, you can make bi-directional pipes on Linux as well.
See Also:
- How to make output of any shell command unbuffered?
- Wikipedia example of how a pipeline is created in C using the pipe() syscall and dup2().
- why pipes are used instead of input redirection
- What is the difference between &> and 2>&1
- Redirections such as << and <<< are implemented as anonymous (unlinked) temp files in bash and ksh, while < <() uses anonymous pipes; /bin/dash uses pipes for <<. See What's the difference between <<, <<< and < < in bash?
To add to the other answers, there are subtle semantic differences too - e.g. pipes close more readily than redirects:
seq 5 | (head -n1; head -n1) # just 1
seq 5 > tmp5; (head -n1; head -n1) < tmp5 # 1 and 2
seq 5 | (read LINE; echo $LINE; head -n1) # 1 and 2
In the first example, when the first call to head finishes, it closes the pipe, and seq terminates, so there's no input available for the second head.
In the second example, head consumes the first line, but when it closes its own stdin, the file remains open for the next call to use.
The third example shows that if we use read to avoid closing the pipe, it is still available within the subprocess.
So the "stream" is the thing that we shunt data through (stdin etc), and is the same in both cases, but the pipe connects streams from two processes, where a redirection connects a streams between a process and a file, so you can see source of both the similarities and differences.
P.S. If you're as curious about and/or surprised by those examples as I was, you can dig in further using trap to see how the processes resolve, e.g.:
(trap 'echo seq EXITed >&2' EXIT; seq 5) | (trap 'echo all done' EXIT; (trap 'echo first head exited' EXIT; head -n1)
echo '.'
(trap 'echo second head exited' EXIT; head -n1))
Sometimes the first process closes before 1 is printed, sometimes afterwards.
I also found it interesting to use exec <&- to close the stream from the redirection to approximate the behaviour of the pipe (albeit with an error):
seq 5 > tmp5
(trap 'echo all done' EXIT
(trap 'echo first head exited' EXIT; head -n1)
echo '.'
exec <&-
(trap 'echo second head exited' EXIT; head -n1)) < tmp5
"when the first call to head finishes, it closes the pipe" This is actually inaccurate for two reasons. One, (head -n1; head -n1) is subshell with two commands, each of which inherits read end of pipe as descriptor 0, and thus subshell AND each command have that file descriptor open. Second reason, you can see that with strace -f bash -c 'seq 5 | (head -n1; head -n1)'. So first head closes only its copy of file descriptor
– Sergiy Kolodyazhnyy
Sep 3 '18 at 5:20
The third example is also inaccurate, because read consumes only the first line (that's one byte for 1 and one for the newline). seq sent 10 bytes in total (5 numbers and 5 newlines). So there's 8 bytes remaining in the pipe buffer, and that's why the second head works - there's data still available in the pipe buffer. Btw, head exits only if there are 0 bytes read, kinda as in head /dev/null
– Sergiy Kolodyazhnyy
Sep 3 '18 at 5:39
Thanks for the clarification. Am I understanding correctly that in seq 5 | (head -n1; head -n1) the first call empties the pipe, so it still exists in an open state but with no data for the second call to head? So the difference in behavior between the pipe and the redirect is because head pulls all the data out of the pipe, but only the 2 lines out of the file handle?
– Julian de Bhal
Sep 12 '18 at 5:10
That's correct. And it's something that can be seen with the strace command I gave in the first comment. With redirection, the tmp file is on disk, which makes it seekable (because it supports the lseek() syscall) - commands can jump around the file from first byte to last however they want. But pipes are sequential and not seekable. So the only way for head to do its job is to read everything first, or, if the file is big, map some of it to RAM via the mmap() call. I once did my own tail in Python, and ran into exactly the same problem.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 5:21
It's also important to remember that the read end of the pipe (a file descriptor) is given to the subshell (...) first, and the subshell will make a copy of its own stdin for each command inside (...). So they're technically reading from the same object. The first head thinks it's reading from its own stdin. The second head thinks it has its own stdin. But in reality their fd #0 (stdin) is just a copy of the same fd, which is the read end of the pipe. Also, I've posted an answer, so maybe it'll help clarify things.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 9:56
I've hit a problem with this in C today. Essentially pipes have different semantics to redirects as well, even when sent to stdin. Really I think that, given the differences, pipes should go somewhere other than stdin, so that stdin and - let's call it stdpipe (to make an arbitrary distinction) - can be handled in different ways.
Consider this. When piping one program's output to another, fstat seems to return zero as the st_size despite ls -lha /proc/{PID}/fd showing that there is a file. When redirecting a file this is not the case (at least on Debian wheezy, stretch and jessie vanilla, and Ubuntu 14.04 and 16.04 vanilla).
If you cat /proc/{PID}/fd/0 with a redirection, you'll be able to repeat the read as many times as you like. If you do this with a pipe, you'll notice that the second time you run the task consecutively, you don't get the same output.
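A hedged shell-level analogue of the same observation, using test -p (true if the descriptor is a pipe) and GNU stat; the exact sizes and output format may differ on other systems:
$ check() { if [ -p /dev/stdin ]; then echo "stdin is a pipe"; else echo "stdin is a file, size $(stat -Lc %s /dev/stdin)"; fi; }
$ echo hello | check          # prints: stdin is a pipe
$ check < /etc/hostname       # prints: stdin is a file, size <bytes>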
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "89"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f172982%2fwhat-is-the-difference-between-redirection-and-pipe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
8 Answers
8
active
oldest
votes
8 Answers
8
active
oldest
votes
active
oldest
votes
active
oldest
votes
Pipe is used to pass output to another program or utility.
Redirect is used to pass output to either a file or stream.
Example: thing1 > thing2
vs thing1 | thing2
thing1 > thing2
- Your shell will run the program named
thing1
- Everything that
thing1
outputs will be placed in a file calledthing2
. (Note - ifthing2
exists, it will be overwritten)
If you want to pass the output from program thing1
to a program called thing2
, you could do the following:
thing1 > temp_file && thing2 < temp_file
which would
- run program named
thing1
- save the output into a file named
temp_file
- run program named
thing2
, pretending that the person at the keyboard typed the contents oftemp_file
as the input.
However, that's clunky, so they made pipes as a simpler way to do that. thing1 | thing2
does the same thing as thing1 > temp_file && thing2 < temp_file
EDIT to provide more details to question in comment:
If >
tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: You are trying to write to a file. There already exists a file with that name that you wish to overwrite. However, the file is executable. Presumably, it would try to execute this file, passing the input. You'd have to do something like write the output to a new filename, then rename the file.
Second example: As Florian Diesch pointed out, what if there's another command elsewhere in the system with the same name (that is in the execute path). If you intended to make a file with that name in your current folder, you'd be stuck.
Thirdly: if you mis-type a command, it wouldn't warn you that the command doesn't exist. Right now, if you type ls | gerp log.txt
it will tell you bash: gerp: command not found
. If >
meant both, it would simply create a new file for you (then warn it doesn't know what to do with log.txt
).
Thank you. You mentionedthing1 > temp_file && thing2 < temp_file
to do more easier with pipes. But why not re-use the>
operator to do this, e.g.thing1 > thing2
for commandsthing1
andthing2
? Why an extra operator|
?
– John Threepwood
Aug 7 '12 at 13:57
1
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
1
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file namedless
, for example?thing | less
andthing > less
are perfectly different, as they do different things. What you propose would create an ambiguity.
– Darkhogg
May 25 '14 at 9:55
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
2
@Sridhar-Sarnobat no, thetee
command does something different.tee
writes output to both the screen (stdout
) and the file. Redirect does only the file.
– David Oneill
Jun 5 '14 at 9:16
|
show 6 more comments
Pipe is used to pass output to another program or utility.
Redirect is used to pass output to either a file or stream.
Example: thing1 > thing2
vs thing1 | thing2
thing1 > thing2
- Your shell will run the program named
thing1
- Everything that
thing1
outputs will be placed in a file calledthing2
. (Note - ifthing2
exists, it will be overwritten)
If you want to pass the output from program thing1
to a program called thing2
, you could do the following:
thing1 > temp_file && thing2 < temp_file
which would
- run program named
thing1
- save the output into a file named
temp_file
- run program named
thing2
, pretending that the person at the keyboard typed the contents oftemp_file
as the input.
However, that's clunky, so they made pipes as a simpler way to do that. thing1 | thing2
does the same thing as thing1 > temp_file && thing2 < temp_file
EDIT to provide more details to question in comment:
If >
tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: You are trying to write to a file. There already exists a file with that name that you wish to overwrite. However, the file is executable. Presumably, it would try to execute this file, passing the input. You'd have to do something like write the output to a new filename, then rename the file.
Second example: As Florian Diesch pointed out, what if there's another command elsewhere in the system with the same name (that is in the execute path). If you intended to make a file with that name in your current folder, you'd be stuck.
Thirdly: if you mis-type a command, it wouldn't warn you that the command doesn't exist. Right now, if you type ls | gerp log.txt
it will tell you bash: gerp: command not found
. If >
meant both, it would simply create a new file for you (then warn it doesn't know what to do with log.txt
).
Thank you. You mentionedthing1 > temp_file && thing2 < temp_file
to do more easier with pipes. But why not re-use the>
operator to do this, e.g.thing1 > thing2
for commandsthing1
andthing2
? Why an extra operator|
?
– John Threepwood
Aug 7 '12 at 13:57
1
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
1
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file namedless
, for example?thing | less
andthing > less
are perfectly different, as they do different things. What you propose would create an ambiguity.
– Darkhogg
May 25 '14 at 9:55
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
2
@Sridhar-Sarnobat no, thetee
command does something different.tee
writes output to both the screen (stdout
) and the file. Redirect does only the file.
– David Oneill
Jun 5 '14 at 9:16
|
show 6 more comments
Pipe is used to pass output to another program or utility.
Redirect is used to pass output to either a file or stream.
Example: thing1 > thing2
vs thing1 | thing2
thing1 > thing2
- Your shell will run the program named
thing1
- Everything that
thing1
outputs will be placed in a file calledthing2
. (Note - ifthing2
exists, it will be overwritten)
If you want to pass the output from program thing1
to a program called thing2
, you could do the following:
thing1 > temp_file && thing2 < temp_file
which would
- run program named
thing1
- save the output into a file named
temp_file
- run program named
thing2
, pretending that the person at the keyboard typed the contents oftemp_file
as the input.
However, that's clunky, so they made pipes as a simpler way to do that. thing1 | thing2
does the same thing as thing1 > temp_file && thing2 < temp_file
EDIT to provide more details to question in comment:
If >
tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: You are trying to write to a file. There already exists a file with that name that you wish to overwrite. However, the file is executable. Presumably, it would try to execute this file, passing the input. You'd have to do something like write the output to a new filename, then rename the file.
Second example: As Florian Diesch pointed out, what if there's another command elsewhere in the system with the same name (that is in the execute path). If you intended to make a file with that name in your current folder, you'd be stuck.
Thirdly: if you mis-type a command, it wouldn't warn you that the command doesn't exist. Right now, if you type ls | gerp log.txt
it will tell you bash: gerp: command not found
. If >
meant both, it would simply create a new file for you (then warn it doesn't know what to do with log.txt
).
Pipe is used to pass output to another program or utility.
Redirect is used to pass output to either a file or stream.
Example: thing1 > thing2
vs thing1 | thing2
thing1 > thing2
- Your shell will run the program named
thing1
- Everything that
thing1
outputs will be placed in a file calledthing2
. (Note - ifthing2
exists, it will be overwritten)
If you want to pass the output from program thing1
to a program called thing2
, you could do the following:
thing1 > temp_file && thing2 < temp_file
which would
- run program named
thing1
- save the output into a file named
temp_file
- run program named
thing2
, pretending that the person at the keyboard typed the contents oftemp_file
as the input.
However, that's clunky, so they made pipes as a simpler way to do that. thing1 | thing2
does the same thing as thing1 > temp_file && thing2 < temp_file
EDIT to provide more details to question in comment:
If >
tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: You are trying to write to a file. There already exists a file with that name that you wish to overwrite. However, the file is executable. Presumably, it would try to execute this file, passing the input. You'd have to do something like write the output to a new filename, then rename the file.
Second example: As Florian Diesch pointed out, what if there's another command elsewhere in the system with the same name (that is in the execute path). If you intended to make a file with that name in your current folder, you'd be stuck.
Thirdly: if you mis-type a command, it wouldn't warn you that the command doesn't exist. Right now, if you type ls | gerp log.txt
it will tell you bash: gerp: command not found
. If >
meant both, it would simply create a new file for you (then warn it doesn't know what to do with log.txt
).
edited Sep 29 '14 at 4:56
Ofer Zelig
1095
1095
answered Aug 7 '12 at 13:30
David OneillDavid Oneill
5,804114366
5,804114366
Thank you. You mentionedthing1 > temp_file && thing2 < temp_file
to do more easier with pipes. But why not re-use the>
operator to do this, e.g.thing1 > thing2
for commandsthing1
andthing2
? Why an extra operator|
?
– John Threepwood
Aug 7 '12 at 13:57
1
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
1
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file namedless
, for example?thing | less
andthing > less
are perfectly different, as they do different things. What you propose would create an ambiguity.
– Darkhogg
May 25 '14 at 9:55
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
2
@Sridhar-Sarnobat no, thetee
command does something different.tee
writes output to both the screen (stdout
) and the file. Redirect does only the file.
– David Oneill
Jun 5 '14 at 9:16
|
show 6 more comments
Thank you. You mentionedthing1 > temp_file && thing2 < temp_file
to do more easier with pipes. But why not re-use the>
operator to do this, e.g.thing1 > thing2
for commandsthing1
andthing2
? Why an extra operator|
?
– John Threepwood
Aug 7 '12 at 13:57
1
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
1
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file namedless
, for example?thing | less
andthing > less
are perfectly different, as they do different things. What you propose would create an ambiguity.
– Darkhogg
May 25 '14 at 9:55
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
2
@Sridhar-Sarnobat no, thetee
command does something different.tee
writes output to both the screen (stdout
) and the file. Redirect does only the file.
– David Oneill
Jun 5 '14 at 9:16
Thank you. You mentioned
thing1 > temp_file && thing2 < temp_file
to do more easier with pipes. But why not re-use the >
operator to do this, e.g. thing1 > thing2
for commands thing1
and thing2
? Why an extra operator |
?– John Threepwood
Aug 7 '12 at 13:57
Thank you. You mentioned
thing1 > temp_file && thing2 < temp_file
to do more easier with pipes. But why not re-use the >
operator to do this, e.g. thing1 > thing2
for commands thing1
and thing2
? Why an extra operator |
?– John Threepwood
Aug 7 '12 at 13:57
1
1
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
"Take the output and write it to a file" is a different action than "Take the output and pass it to a different program". I'll edit more thoughts into my answer...
– David Oneill
Aug 7 '12 at 14:01
1
1
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file named
less
, for example? thing | less
and thing > less
are perfectly different, as they do different things. What you propose would create an ambiguity.– Darkhogg
May 25 '14 at 9:55
@JohnThreepwood They have different meanings. What if I wanted to redirect something to a file named
less
, for example? thing | less
and thing > less
are perfectly different, as they do different things. What you propose would create an ambiguity.– Darkhogg
May 25 '14 at 9:55
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
Is it accurate to say that "thing1 > temp_file" is merely syntactic sugar for "thing1 | tee temp_file" ? Since finding out about tee I almost never use redirects.
– Sridhar-Sarnobat
Jun 5 '14 at 5:09
2
2
@Sridhar-Sarnobat no, the
tee
command does something different. tee
writes output to both the screen (stdout
) and the file. Redirect does only the file.– David Oneill
Jun 5 '14 at 9:16
@Sridhar-Sarnobat no, the
tee
command does something different. tee
writes output to both the screen (stdout
) and the file. Redirect does only the file.– David Oneill
Jun 5 '14 at 9:16
|
show 6 more comments
If the meaning of foo > bar
would depend on whether there is a command named bar
that would make using redirection a lot harder and more error prone: Every time I want to redirect to a file I first had to check whether there's a command named like my destination file.
This would be an issue only if you're writing tobar
in a directory that's part of your$PATH
env variable. If you're in something like /bin, then ot could be a problem. But even then,bar
would have to have executable permission set, so that shell checks not just for finding an executablebar
but actually can execute it. And if the concern is with overwriting existing file,noclober
shell option should prevent overwriting existing files in redirections.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
add a comment |
If the meaning of foo > bar
would depend on whether there is a command named bar
that would make using redirection a lot harder and more error prone: Every time I want to redirect to a file I first had to check whether there's a command named like my destination file.
This would be an issue only if you're writing tobar
in a directory that's part of your$PATH
env variable. If you're in something like /bin, then ot could be a problem. But even then,bar
would have to have executable permission set, so that shell checks not just for finding an executablebar
but actually can execute it. And if the concern is with overwriting existing file,noclober
shell option should prevent overwriting existing files in redirections.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
add a comment |
If the meaning of foo > bar
would depend on whether there is a command named bar
that would make using redirection a lot harder and more error prone: Every time I want to redirect to a file I first had to check whether there's a command named like my destination file.
If the meaning of foo > bar
would depend on whether there is a command named bar
that would make using redirection a lot harder and more error prone: Every time I want to redirect to a file I first had to check whether there's a command named like my destination file.
answered Aug 7 '12 at 13:40
Florian DieschFlorian Diesch
65.3k16162180
65.3k16162180
This would be an issue only if you're writing tobar
in a directory that's part of your$PATH
env variable. If you're in something like /bin, then ot could be a problem. But even then,bar
would have to have executable permission set, so that shell checks not just for finding an executablebar
but actually can execute it. And if the concern is with overwriting existing file,noclober
shell option should prevent overwriting existing files in redirections.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
add a comment |
This would be an issue only if you're writing tobar
in a directory that's part of your$PATH
env variable. If you're in something like /bin, then ot could be a problem. But even then,bar
would have to have executable permission set, so that shell checks not just for finding an executablebar
but actually can execute it. And if the concern is with overwriting existing file,noclober
shell option should prevent overwriting existing files in redirections.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
This would be an issue only if you're writing to
bar
in a directory that's part of your $PATH
env variable. If you're in something like /bin, then ot could be a problem. But even then, bar
would have to have executable permission set, so that shell checks not just for finding an executable bar
but actually can execute it. And if the concern is with overwriting existing file, noclober
shell option should prevent overwriting existing files in redirections.– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
This would be an issue only if you're writing to
bar
in a directory that's part of your $PATH
env variable. If you're in something like /bin, then ot could be a problem. But even then, bar
would have to have executable permission set, so that shell checks not just for finding an executable bar
but actually can execute it. And if the concern is with overwriting existing file, noclober
shell option should prevent overwriting existing files in redirections.– Sergiy Kolodyazhnyy
Sep 12 '18 at 19:20
add a comment |
There's a vital difference between the two operators:
ls > log.txt
--> This command sends the output to the log.txt file.ls | grep file.txt
--> This command sends the output of the ls to grep command through the use of pipe (|
), and the grep command searches for file.txt in the in the input provided to it by the previous command.
If you had to perform the same task using the first scenario, then it would be:
ls > log.txt; grep 'file.txt' log.txt
So a pipe (with |
) is used to send the output to other command, whereas redirection (with >
) is used to redirect the output to some file.
add a comment |
There's a vital difference between the two operators:
ls > log.txt
--> This command sends the output to the log.txt file.ls | grep file.txt
--> This command sends the output of the ls to grep command through the use of pipe (|
), and the grep command searches for file.txt in the in the input provided to it by the previous command.
If you had to perform the same task using the first scenario, then it would be:
ls > log.txt; grep 'file.txt' log.txt
So a pipe (with |
) is used to send the output to other command, whereas redirection (with >
) is used to redirect the output to some file.
add a comment |
There's a vital difference between the two operators:
ls > log.txt
--> This command sends the output to the log.txt file.ls | grep file.txt
--> This command sends the output of the ls to grep command through the use of pipe (|
), and the grep command searches for file.txt in the in the input provided to it by the previous command.
If you had to perform the same task using the first scenario, then it would be:
ls > log.txt; grep 'file.txt' log.txt
So a pipe (with |
) is used to send the output to other command, whereas redirection (with >
) is used to redirect the output to some file.
There's a vital difference between the two operators:
ls > log.txt
--> This command sends the output to the log.txt file.ls | grep file.txt
--> This command sends the output of the ls to grep command through the use of pipe (|
), and the grep command searches for file.txt in the in the input provided to it by the previous command.
If you had to perform the same task using the first scenario, then it would be:
ls > log.txt; grep 'file.txt' log.txt
So a pipe (with |
) is used to send the output to other command, whereas redirection (with >
) is used to redirect the output to some file.
edited Jan 10 '17 at 19:23
Eliah Kagan
82.3k22227367
82.3k22227367
answered Aug 7 '12 at 13:32
AnkitAnkit
2,385134374
2,385134374
add a comment |
add a comment |
From the Unix and Linux System Administration Handbook:
Redirection
The shell interprets the symbols <,>, and >> as instructions to reroute a command's input or output to or from a file.
Pipes
To connect the STDOUT of one command to the STDIN of another use the | symbol, commonly known as a pipe.
So my interpretation is: If it's command to command, use a pipe. If you are outputting to or from a file use the redirect.
add a comment |
From the Unix and Linux System Administration Handbook:
Redirection
The shell interprets the symbols <,>, and >> as instructions to reroute a command's input or output to or from a file.
Pipes
To connect the STDOUT of one command to the STDIN of another use the | symbol, commonly known as a pipe.
So my interpretation is: If it's command to command, use a pipe. If you are outputting to or from a file use the redirect.
add a comment |
From the Unix and Linux System Administration Handbook:
Redirection
The shell interprets the symbols <,>, and >> as instructions to reroute a command's input or output to or from a file.
Pipes
To connect the STDOUT of one command to the STDIN of another use the | symbol, commonly known as a pipe.
So my interpretation is: If it's command to command, use a pipe. If you are outputting to or from a file use the redirect.
From the Unix and Linux System Administration Handbook:
Redirection
The shell interprets the symbols <,>, and >> as instructions to reroute a command's input or output to or from a file.
Pipes
To connect the STDOUT of one command to the STDIN of another use the | symbol, commonly known as a pipe.
So my interpretation is: If it's command to command, use a pipe. If you are outputting to or from a file use the redirect.
edited Feb 16 '16 at 2:32
karel
59.6k13129151
59.6k13129151
answered Feb 16 '16 at 0:40
Mr WhateverMr Whatever
11112
11112
add a comment |
add a comment |
There's a big syntactic difference between the two:
- A redirect is an argument to a program
- A pipe separates two commands
You can think of redirects like this: cat [<infile] [>outfile]
. This implies order doesn't matter: cat <infile >outfile
is the same as cat >outfile <infile
. You can even mix redirects up with other arguments: cat >outfile <infile -b
and cat <infile -b >outfile
are both perfectly fine. Also you can string together more than one input or output (inputs will be read sequentially and all output will be written to each output file): cat >outfile1 >outfile2 <infile1 <infile2
. The target or source of a redirect can be either a filename or the name of a stream (like &1, at least in bash).
But pipes totally separate one command from another command, you can't mix them in with arguments:
[command1] | [command2]
The pipe takes everything written to standard output from command1 and sends it to the standard input of command2.
You can also combine piping and redirection. For example:
cat <infile >outfile | cat <infile2 >outfile2
The first cat
will read lines from infile, then simultaneously write each line to outfile and send it to the second cat
.
In the second cat
, standard input first reads from the pipe (the contents of infile), then reads from infile2, writing each line to outfile2. After running this, outfile will be a copy of infile, and outfile2 will contain infile followed by infile2.
Finally, you actually do something really similar to your example using "here string" redirection (bash family only) and backticks:
grep blah <<<`ls`
will give the same result as
ls | grep blah
But I think the redirection version will first read all of the output of ls into a buffer (in memory), and then feed that buffer to grep one line at a time, whereas the piped version will take each line from ls as it emerges, and pass that line to grep.
1
Nitpick: order matters in redirection if you redirect one fd to another:echo yes 1>&2 2>/tmp/blah; wc -l /tmp/blah; echo yes 2>/tmp/blah 1>&2; wc -l /tmp/blah
Further, redirection to a file will only use the last redirection.echo yes >/tmp/blah >/tmp/blah2
will only write to/tmp/blah2
.
– muru
Aug 23 '14 at 22:49
2
Redirect is not actually argument to the program. The program will not know or care where its output goes (or input comes from). It's just way of telling bash how to arrange things before running the program.
– Alois Mahdal
Apr 23 '15 at 18:33
add a comment |
There's a big syntactic difference between the two:
- A redirect is an argument to a program
- A pipe separates two commands
You can think of redirects like this: cat [<infile] [>outfile]
. This implies order doesn't matter: cat <infile >outfile
is the same as cat >outfile <infile
. You can even mix redirects up with other arguments: cat >outfile <infile -b
and cat <infile -b >outfile
are both perfectly fine. Also you can string together more than one input or output (inputs will be read sequentially and all output will be written to each output file): cat >outfile1 >outfile2 <infile1 <infile2
. The target or source of a redirect can be either a filename or the name of a stream (like &1, at least in bash).
But pipes totally separate one command from another command, you can't mix them in with arguments:
[command1] | [command2]
The pipe takes everything written to standard output from command1 and sends it to the standard input of command2.
You can also combine piping and redirection. For example:
cat <infile >outfile | cat <infile2 >outfile2
The first cat
will read lines from infile, then simultaneously write each line to outfile and send it to the second cat
.
In the second cat
, standard input first reads from the pipe (the contents of infile), then reads from infile2, writing each line to outfile2. After running this, outfile will be a copy of infile, and outfile2 will contain infile followed by infile2.
Finally, you actually do something really similar to your example using "here string" redirection (bash family only) and backticks:
grep blah <<<`ls`
will give the same result as
ls | grep blah
But I think the redirection version will first read all of the output of ls into a buffer (in memory), and then feed that buffer to grep one line at a time, whereas the piped version will take each line from ls as it emerges, and pass that line to grep.
1
Nitpick: order matters in redirection if you redirect one fd to another:echo yes 1>&2 2>/tmp/blah; wc -l /tmp/blah; echo yes 2>/tmp/blah 1>&2; wc -l /tmp/blah
Further, redirection to a file will only use the last redirection.echo yes >/tmp/blah >/tmp/blah2
will only write to/tmp/blah2
.
– muru
Aug 23 '14 at 22:49
2
Redirect is not actually argument to the program. The program will not know or care where its output goes (or input comes from). It's just way of telling bash how to arrange things before running the program.
– Alois Mahdal
Apr 23 '15 at 18:33
Note: The answer reflects my own understanding of these mechanisms to date, accumulated from research and from reading the answers of peers on this site and on unix.stackexchange.com, and will be updated as time goes on. Don't hesitate to ask questions or suggest improvements in the comments. I also suggest you try to see how the syscalls work in a shell with the strace command. And please don't be intimidated by the notion of internals or syscalls - you don't have to know or be able to use them in order to understand how the shell does things, but they definitely help understanding.
TL;DR
- | pipes are not associated with an entry on disk, and therefore have no inode number on a disk filesystem (they do have an inode in the pipefs virtual filesystem in kernel space), but redirections often involve files, which do have disk entries and therefore a corresponding inode.
- pipes are not lseek()'able, so commands can't read some data and then rewind, but when you redirect with > or < the target is usually an lseek()able file, so commands can navigate it however they please.
- redirections are manipulations on file descriptors, which can be many; a pipe has only two file descriptors - one for the left command and one for the right command.
- redirection on standard streams and pipes are both buffered.
- pipes almost always involve forking, and therefore pairs of processes are involved; redirections - not always, though in both cases the resulting file descriptors are inherited by sub-processes.
- pipes always connect file descriptors (a pair); redirections use either a pathname or file descriptors.
- pipes are an Inter-Process Communication method, while redirections are just manipulations on open files or file-like objects.
- both employ dup2() syscalls underneath the hood to provide copies of file descriptors, where the actual flow of data occurs.
- redirections can be applied "globally" with the exec built-in command (see this and this), so if you do exec > output.txt every command will write to output.txt from then on. | pipes apply only to the current command (which means either a simple command, a subshell like seq 5 | (head -n1; head -n2), or a compound command); see the sketch right after this list.
- when redirection is done on files, things like echo "TEST" > file and echo "TEST" >> file both use the open() syscall on that file (see also) and get a file descriptor from it to pass to dup2(). Pipes | only use the pipe() and dup2() syscalls.
As far as the commands being executed are concerned, pipes and redirections are no more than file descriptors - file-like objects, to which they may write blindly, or which they may manipulate internally (which can produce unexpected behavior; apt, for instance, tends not to write to stdout at all if it knows its output is redirected).
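A small sketch of the point about scope (output.txt is an arbitrary name):
$ bash -c 'exec > output.txt; echo one; echo two'   # exec redirects the whole shell: both echos land in output.txt
$ bash -c 'echo one | cat; echo two'                # the pipe affects only "echo one | cat"; "echo two" still writes to the terminal as usual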
Introduction
In order to understand how these two mechanisms differ, it's necessary to understand their essential properties, the history behind the two, and their roots in the C programming language. In fact, knowing what file descriptors are, and how the dup2() and pipe() system calls work, is essential, as is lseek(). The shell is meant to abstract these mechanisms away from the user, but digging deeper than the abstraction helps understand the true nature of the shell's behavior.
The Origins of Redirections and Pipes
According to Dennis Ritchie's article Prophetic Petroglyphs, pipes originated from a 1964 internal memo by Malcolm Douglas McIlroy, at the time when they were working on the Multics operating system. Quote:
To put my strongest concerns into a nutshell:
- We should have some ways of connecting programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.
What's apparent is that at the time programs were capable of writing to disk, but that was inefficient if the output was large. To quote Brian Kernighan's explanation in the Unix Pipeline video:
First, you don't have to write one big massive program - you've got existing smaller programs that may already do parts of the job...Another is that it's possible that the amount of data you're processing would not fit if you stored it in a file...because remember, we're back in the days when disks on these things had, if you were lucky, a Megabyte or two of data...So the pipeline never had to instantiate the whole output.
Thus the conceptual difference is apparent: pipes are a mechanism for making programs talk to one another, while redirections are, at a basic level, a way of writing to a file. In both cases the shell makes the two easy to use, but underneath the hood there's a whole lot going on.
Going deeper: syscalls and internal workings of the shell
We start with the notion of a file descriptor. A file descriptor basically describes an open file (whether that's a file on disk, a file in memory, or an anonymous file), represented by an integer number. The three standard data streams (stdin, stdout, stderr) are file descriptors 0, 1, and 2 respectively. Where do they come from? Well, in shell commands the file descriptors are inherited from their parent - the shell. And that's true in general for all processes - a child process inherits its parent's file descriptors. For daemons it is common to close all inherited file descriptors and/or redirect them to other places.
Back to redirection. What is it really? It's a mechanism that tells the shell to prepare file descriptors for a command (because redirections are done by the shell before the command runs) and point them where the user suggested. The standard definition of output redirection is
[n]>word
That [n] there is the file descriptor number. When you do echo "Something" > /dev/null the number 1 is implied there, while echo 2> /dev/null explicitly redirects file descriptor 2 (stderr).
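For instance (the file names are arbitrary):
$ ls /nonexistent 2> errors.txt   # the 2 names the descriptor being redirected: stderr
$ ls /etc 1> listing.txt          # same as plain > listing.txt, since 1 (stdout) is the default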
Underneath the hood this is done by duplicating file descriptors via the dup2() system call. Let's take df > /dev/null. The shell will create a child process where df runs, but before that it will open /dev/null as, say, file descriptor #3, and dup2(3,1) will be issued, which makes a copy of file descriptor 3, and that copy will be 1. You know how you have two files file1.txt and file2.txt, and when you do cp file1.txt file2.txt you have two identical files, but you can manipulate them independently? That's kind of the same thing happening here. Often you can see that before running a command, bash will duplicate file descriptor #1 (stdout) to a spare descriptor such as #10 in order to restore it later. It's important to note that for built-in commands (which are part of the shell itself and have no file in /bin or elsewhere), and in a few optimized cases in a non-interactive shell, the shell doesn't create a child process at all, so the redirection is applied to the shell's own descriptors and then undone.
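If you want to watch this descriptor juggling happen, strace works well here (a rough sketch - the exact syscall names, e.g. openat() instead of open(), and the descriptor numbers will vary by system):
$ strace -f -e trace=openat,dup2,fcntl bash -c 'df > /dev/null'
# somewhere in the output you should see /dev/null being opened and the resulting
# descriptor dup2()'d onto fd 1 before df runs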
And then we have things like [n]>&[m] and [n]<&[m]. This duplicates file descriptors, which is the same mechanism as dup2(), only now it's in the shell syntax, conveniently available to the user.
One of the important things to note about redirections is that their order is not fixed, but it is significant to how the shell interprets what the user wants. Compare the following:
# Make copy of where fd 2 points , then redirect fd 2
$ ls -l /proc/self/fd/ 3>&2 2> /dev/null
total 0
lrwx------ 1 user user 64 Sep 13 00:08 0 -> /dev/pts/0
lrwx------ 1 user user 64 Sep 13 00:08 1 -> /dev/pts/0
l-wx------ 1 user user 64 Sep 13 00:08 2 -> /dev/null
lrwx------ 1 runner user 64 Sep 13 00:08 3 -> /dev/pts/0
lr-x------ 1 user user 64 Sep 13 00:08 4 -> /proc/29/fd
# redirect fd #2 first, then clone it
$ ls -l /proc/self/fd/ 2> /dev/null 3>&2
total 0
lrwx------ 1 user user 64 Sep 13 00:08 0 -> /dev/pts/0
lrwx------ 1 user user 64 Sep 13 00:08 1 -> /dev/pts/0
l-wx------ 1 user user 64 Sep 13 00:08 2 -> /dev/null
l-wx------ 1 user user 64 Sep 13 00:08 3 -> /dev/null
lr-x------ 1 user user 64 Sep 13 00:08 4 -> /proc/31/fd
The practical uses of these in shell scripting are versatile:
- saving the output of a program that writes only to stderr into a variable
- swapping stderr and stdout
- separating even input lines from odd input lines
and many others; two of them are sketched just below.
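For example (ls /nonexistent /etc is just a stand-in for any command that writes to both streams):
# capture only stderr into a variable, discarding stdout
$ errs=$(ls /nonexistent /etc 2>&1 >/dev/null)
# swap stdout and stderr, using fd 3 as temporary storage
$ ls /nonexistent /etc 3>&1 1>&2 2>&3 3>&-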
Plumbing with pipe()
and dup2()
So how do pipes get created? Via the pipe() syscall, which takes as input an array called pipefd of two items of type int (integer). Those two integers are file descriptors: pipefd[0] will be the read end of the pipe and pipefd[1] will be the write end. So in df | grep 'foo', grep will get a copy of pipefd[0] and df will get a copy of pipefd[1]. But how? Of course, with the magic of the dup2() syscall. For df in our example, let's say pipefd[1] is fd #4, so the shell will make a child, do dup2(4,1) (remember my cp example?), and then do execve() to actually run df. Naturally, df will inherit file descriptor #1, but will be unaware that it is no longer pointing at the terminal but at fd #4, the write end of the pipe. Naturally, the same thing will occur with grep 'foo', except that it is the read end of the pipe that gets duplicated onto its fd #0.
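Again, strace lets you watch this plumbing (a sketch - on a modern kernel you will most likely see pipe2() rather than pipe()):
$ strace -f -e trace=pipe,pipe2,dup2,execve bash -c 'df | grep foo'
# you should see the shell create the pipe, then in each child a dup2() that puts the
# write end on fd 1 for df and the read end on fd 0 for grep, followed by execve()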
Now, an interesting question: could we make pipes that redirect fd #2 as well, not just fd #1? Yes, in fact that's what |& does in bash. The POSIX standard requires the shell command language to support the df 2>&1 | grep 'foo' syntax for that purpose, but bash accepts |& as well.
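For example, both of these feed ls's error message into grep:
$ ls /nonexistent 2>&1 | grep -i 'no such'
$ ls /nonexistent |& grep -i 'no such'     # bash shorthand for the line above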
What's important to note is that pipes always deal with file descriptors. There is the FIFO, or named pipe, which has a filename on disk and lets you use it like a file, but behaves like a pipe (see the sketch below). But the | type of pipe is what's known as an anonymous pipe - it has no filename, because it's really just two objects connected together. The fact that we're not dealing with files also has an important implication: pipes aren't lseek()'able. Files, either in memory or on disk, are static - programs can use the lseek() syscall to jump to byte 120, then back to byte 10, then forward all the way to the end. Pipes are not static - they're sequential, and therefore you cannot rewind data you get from them with lseek(). This is how some programs can tell whether they're reading from a file or from a pipe, and make the necessary adjustments for efficient performance; in other words, a prog can detect whether I do cat file.txt | prog or prog < input.txt. A real-world example of that is tail.
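A named pipe is easy to play with via mkfifo (the name myfifo is arbitrary):
$ mkfifo myfifo
$ ls -l myfifo        # the leading "p" in the mode shows it is a pipe, not a regular file
$ seq 5 > myfifo &    # opening a fifo for writing blocks until a reader shows up, hence the &
$ grep 3 < myfifo
3
$ rm myfifo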
Two other very interesting properties of pipes are that they are buffered (on Linux, writes of up to PIPE_BUF = 4096 bytes are guaranteed to be atomic, and the default capacity of a pipe is 64 KiB) and that they actually have a filesystem as defined in the Linux source code! They're not simply an object for passing data around, they are a data structure in their own right! In fact, because there exists the pipefs filesystem, which manages both pipes and FIFOs, pipes have an inode number on their respective filesystem:
# Stdout of ls is wired to pipe
$ ls -l /proc/self/fd/ | cat
lrwx------ 1 user user 64 Sep 13 00:02 0 -> /dev/pts/0
l-wx------ 1 user user 64 Sep 13 00:02 1 -> pipe:[15655630]
lrwx------ 1 user user 64 Sep 13 00:02 2 -> /dev/pts/0
lr-x------ 1 user user 64 Sep 13 00:02 3 -> /proc/22/fd
# stdin of ls is wired to pipe
$ true | ls -l /proc/self/fd/0
lr-x------ 1 user user 64 Sep 13 03:58 /proc/self/fd/0 -> 'pipe:[54741]'
On Linux pipes are unidirectional, just like redirections. Some Unix-like implementations have bidirectional pipes, though with a bit of shell scripting magic you can get bidirectional communication on Linux as well; see the sketch below.
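One way to do that in bash is the coproc builtin, which sets up a pair of pipes to a background command for you - a minimal sketch using cat as the co-process:
$ coproc CAT { cat; }
$ echo hello >&"${CAT[1]}"      # write to the co-process's stdin
$ read -r line <&"${CAT[0]}"    # read back from its stdout
$ echo "$line"
hello
$ eval "exec ${CAT[1]}>&-"      # closing its stdin lets cat exit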
See Also:
- How to make output of any shell command unbuffered?
Wikipedia example of how pipeline is created in C usingpipe()
syscall anddup2()
.- why pipes are used instead of input redirection
- What is the differences between &> and 2>&1
- Redirections such as
<<
,<<<
are implemented as anonymous (unlinked) temp files inbash
andksh
, while< <()
uses anonymous pipes ;/bin/dash
uses pipes for<<
. See What's the difference between <<, <<< and < < in bash?
edited Jan 30 at 1:02
answered Sep 12 '18 at 9:26 by Sergiy Kolodyazhnyy
To add to the other answers, there are subtle semantic differences too - e.g. pipes close more readily than redirects:
seq 5 | (head -n1; head -n1) # just 1
seq 5 > tmp5; (head -n1; head -n1) < tmp5 # 1 and 2
seq 5 | (read LINE; echo $LINE; head -n1) # 1 and 2
In the first example, when the first call to head
finishes, it closes the pipe, and seq
terminates, so there's no input available for the second head
.
In the second example, head consumes the first line, but when it closes its own stdin
pipe, the file remains open for the next call to use.
The third example shows that if we use read
to avoid closing the pipe it is still available within the subprocess.
So the "stream" is the thing that we shunt data through (stdin etc), and is the same in both cases, but the pipe connects streams from two processes, where a redirection connects a streams between a process and a file, so you can see source of both the similarities and differences.
P.S. If you're as curious about and/or surprised by those examples as I was, you can dig in further using trap
to see how the processes resolve, e.g.:
(trap 'echo seq EXITed >&2' EXIT; seq 5) | (trap 'echo all done' EXIT; (trap 'echo first head exited' EXIT; head -n1)
echo '.'
(trap 'echo second head exited' EXIT; head -n1))
Sometimes the first process closes before 1
is printed, sometimes afterwards.
I also found it interesting to use exec <&-
to close the stream from the redirection to approximate the behaviour of the pipe (albeit with an error):
seq 5 > tmp5
(trap 'echo all done' EXIT
(trap 'echo first head exited' EXIT; head -n1)
echo '.'
exec <&-
(trap 'echo second head exited' EXIT; head -n1)) < tmp5
"when the first call to head finishes, it closes the pipe" This is actually inaccurate for two reasons. One, (head -n1; head -n1) is subshell with two commands, each of which inherits read end of pipe as descriptor 0, and thus subshell AND each command have that file descriptor open. Second reason, you can see that with strace -f bash -c 'seq 5 | (head -n1; head -n1)'. So first head closes only its copy of file descriptor
– Sergiy Kolodyazhnyy
Sep 3 '18 at 5:20
The third example is also inaccurate, because read consumes only the first line (that's one byte for the 1 plus a newline). seq sent 10 bytes in total (5 numbers and 5 newlines), so there are 8 bytes remaining in the pipe buffer, and that's why the second head works - there's still data available in the pipe buffer. Btw, head exits only if it reads 0 bytes, kinda as in head /dev/null
– Sergiy Kolodyazhnyy
Sep 3 '18 at 5:39
Thanks for the clarification. Am I understanding correctly that in seq 5 | (head -n1; head -n1) the first call empties the pipe, so it still exists in an open state but with no data for the second call to head? So the difference in behavior between the pipe and the redirect is because head pulls all the data out of the pipe, but only the 2 lines out of the file handle?
– Julian de Bhal
Sep 12 '18 at 5:10
That's correct. And it's something that can be seen with the strace command I gave in the first comment. With redirection, the tmp file is on disk, which makes it seekable (because commands can use the lseek() syscall to jump around the file from first byte to last however they want). But pipes are sequential and not seekable. So the only way for head to do its job is to read everything first, or if the file is big - map some of it to RAM via mmap(). I once did my own tail in Python, and ran into exactly the same problem.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 5:21
It's also important to remember that the read end of the pipe (a file descriptor) is given to the subshell (...) first, and the subshell will make a copy of its own stdin for each command inside (...). So they're technically reading from the same object. The first head thinks it's reading from its own stdin. The second head thinks it has its own stdin. But in reality their fd #0 (stdin) is just a copy of the same fd, which is the read end of the pipe. Also, I've posted an answer, so maybe it'll help clarify things.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 9:56
To add to the other answers, there are subtle semantic difference too - e.g. pipes close more readily than redirects:
seq 5 | (head -n1; head -n1) # just 1
seq 5 > tmp5; (head -n1; head -n1) < tmp5 # 1 and 2
seq 5 | (read LINE; echo $LINE; head -n1) # 1 and 2
In the first example, when the first call to head
finishes, it closes the pipe, and seq
terminates, so there's no input available for the second head
.
In the second example, head consumes the first line, but when it closes it's own stdin
pipe, the file remains open for the next call to use.
The third example shows that if we use read
to avoid closing the pipe it is still available within the subprocess.
So the "stream" is the thing that we shunt data through (stdin etc), and is the same in both cases, but the pipe connects streams from two processes, where a redirection connects a streams between a process and a file, so you can see source of both the similarities and differences.
P.S. If you're as curious about and/or surprised by those examples as I was, you can get dig in further using trap
to see how the processes resolve, E.g:
(trap 'echo seq EXITed >&2' EXIT; seq 5) | (trap 'echo all done' EXIT; (trap 'echo first head exited' EXIT; head -n1)
echo '.'
(trap 'echo second head exited' EXIT; head -n1))
Sometimes the first process closes before 1
is printed, sometimes afterwards.
I also found it interesting to use exec <&-
to close the stream from the redirection to approximate the behaviour of the pipe (albeit with an error):
seq 5 > tmp5
(trap 'echo all done' EXIT
(trap 'echo first head exited' EXIT; head -n1)
echo '.'
exec <&-
(trap 'echo second head exited' EXIT; head -n1)) < tmp5`
To add to the other answers, there are subtle semantic difference too - e.g. pipes close more readily than redirects:
seq 5 | (head -n1; head -n1) # just 1
seq 5 > tmp5; (head -n1; head -n1) < tmp5 # 1 and 2
seq 5 | (read LINE; echo $LINE; head -n1) # 1 and 2
In the first example, when the first call to head
finishes, it closes the pipe, and seq
terminates, so there's no input available for the second head
.
In the second example, head consumes the first line, but when it closes it's own stdin
pipe, the file remains open for the next call to use.
The third example shows that if we use read
to avoid closing the pipe it is still available within the subprocess.
So the "stream" is the thing that we shunt data through (stdin etc), and is the same in both cases, but the pipe connects streams from two processes, where a redirection connects a streams between a process and a file, so you can see source of both the similarities and differences.
P.S. If you're as curious about and/or surprised by those examples as I was, you can get dig in further using trap
to see how the processes resolve, E.g:
(trap 'echo seq EXITed >&2' EXIT; seq 5) | (trap 'echo all done' EXIT; (trap 'echo first head exited' EXIT; head -n1)
echo '.'
(trap 'echo second head exited' EXIT; head -n1))
Sometimes the first process closes before 1
is printed, sometimes afterwards.
I also found it interesting to use exec <&-
to close the stream from the redirection to approximate the behaviour of the pipe (albeit with an error):
seq 5 > tmp5
(trap 'echo all done' EXIT
(trap 'echo first head exited' EXIT; head -n1)
echo '.'
exec <&-
(trap 'echo second head exited' EXIT; head -n1)) < tmp5`
edited Sep 2 '18 at 22:59
answered Jun 5 '18 at 0:54
Julian de Bhal
1213
"when the first call to head finishes, it closes the pipe" This is actually inaccurate for two reasons. One, (head -n1; head -n1) is subshell with two commands, each of which inherits read end of pipe as descriptor 0, and thus subshell AND each command have that file descriptor open. Second reason, you can see that with strace -f bash -c 'seq 5 | (head -n1; head -n1)'. So first head closes only its copy of file descriptor
– Sergiy Kolodyazhnyy
Sep 3 '18 at 5:20
The third example is also inaccurate, because read consumes only the first line (that's one byte for 1 plus a newline). seq sent 10 bytes in total (5 numbers and 5 newlines), so there are 8 bytes remaining in the pipe buffer, and that's why the second head works - there's data still available in the pipe buffer. Btw, head exits only if there's 0 bytes read, kinda as in head /dev/null
– Sergiy Kolodyazhnyy
Sep 3 '18 at 5:39
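A quick way to check that byte arithmetic (a sketch of my own, assuming bash and GNU coreutils; bash's read pulls single bytes from a pipe, so it consumes exactly the first line and nothing more):
seq 5 | wc -c                     # 10 bytes: five digits plus five newlines
seq 5 | { read LINE; wc -c; }     # 8 bytes left once read has consumed "1" and its newline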
Thanks for the clarification. Am I understanding correctly that in seq 5 | (head -n1; head -n1) the first call empties the pipe, so it still exists in an open state but with no data for the second call to head? So the difference in behavior between the pipe and the redirect is because head pulls all the data out of the pipe, but only the 2 lines out of the file handle?
– Julian de Bhal
Sep 12 '18 at 5:10
That's correct. And it's something that can be seen with the strace command I gave in the first comment. With redirection, the tmp file is on disk, which makes it seekable (because commands can use the lseek() syscall to jump around the file from first byte to last however they want). But pipes are sequential and not seekable, so the only way for head to do its job is to read everything first, or if the file is big, map some of it to RAM via an mmap() call. I once did my own tail in Python, and ran into exactly the same problem.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 5:21
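To see that lseek() with your own eyes, here is a rough sketch of my own (strace and GNU coreutils assumed; the exact numbers can vary): on a regular file, head reads a block, prints the first line, then seeks back so the unread bytes stay available to whatever reads next; a pipe offers no such call.
seq 5 > tmp5
strace -e trace=lseek head -n1 < tmp5 > /dev/null
# you may see something like: lseek(0, -8, SEEK_CUR) = 2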
It's also important to remember that the read end of the pipe (a file descriptor) is given to the subshell (...) first, and the subshell makes a copy of its own stdin for each command inside (...). So they're technically reading from the same object. The first head thinks it's reading from its own stdin. The second head thinks it has its own stdin. But in reality their fd #0 (stdin) is just a copy of the same fd, which is the read end of the pipe. Also, I've posted an answer, so maybe it'll help clarify things.
– Sergiy Kolodyazhnyy
Sep 12 '18 at 9:56
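A small way to peek at those inherited descriptors (my own sketch, assuming Linux's /proc): both commands inside the subshell point at the very same pipe object.
seq 5 | (ls -l /proc/self/fd/0; ls -l /proc/self/fd/0)
# both lines show something like 0 -> 'pipe:[123456]', with the same inode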
I've hit a problem with this in C today. Essentially pipes have different semantics to redirects as well, even when sent to stdin. Really I think, given the differences, pipes should go somewhere other than stdin, so that stdin and, let's call it, stdpipe (to make an arbitrary distinction) can be handled in different ways.
Consider this: when piping one program's output to another, fstat seems to return zero as the st_size, despite ls -lha /proc/{PID}/fd showing that there is a file. When redirecting a file this is not the case (at least on debian wheezy, stretch and jessie vanilla, and ubuntu 14.04 and 16.04 vanilla).
If you cat /proc/{PID}/fd/0 with a redirection, you'll be able to repeat the read as many times as you like. If you do this with a pipe, you'll notice that the second time you run the task consecutively you don't get the same output.
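A shell-level way to reproduce that fstat observation (a hedged sketch of my own, assuming Linux and GNU stat, which follows /dev/stdin to whatever sits on file descriptor 0):
seq 5 > tmp5
stat -c 'type=%F size=%s' /dev/stdin < tmp5     # regular file, size 10
seq 5 | stat -c 'type=%F size=%s' /dev/stdin    # fifo, size 0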
edited Feb 16 '18 at 12:37
answered Oct 26 '17 at 16:17
MrMesees
1188