LPI 010-160 – Searching and Extracting Data from Files and Archiving
- Command Line Pipes
Sometimes when we’re running a command, we want to take the output of one command and push it into the input of the next command. To do that, we use what’s known as a command line pipe. A command line pipe, or pipeline, takes the standard output from one program or command and redirects it as the standard input of a second program or command. A pipe is created by typing a vertical bar (|) between two commands. On most US keyboards this key sits just above the Enter key, and you type it by holding Shift and pressing the backslash (\) key. Pipelines can be useful in many different ways. For example, a command like cat may produce a lengthy output that displays a lot of information on the screen.
If you pipe that output into the less command, you’ll see only one screen at a time until you move to the next page. This avoids jumping all the way to the end of the output without seeing the content at the beginning or in the middle. The pg and more commands can also be used, with varying results, to do the same type of thing.
Check out the man pages for cat, less, pg and more for additional information about these commands, and I’ll show you how to use them later in a demonstration. Pipes let users combine two or more commands on a single command line. The commands in a pipeline actually run at the same time, each one processing data as it arrives from the previous one. This lets us perform complex tasks quickly and with fewer pauses, because the system doesn’t have to wait for you, the user, to type in the next command.
Most commonly, I see this being done with the grep command. grep is often used in pipelines to search for keywords in another command’s output. So I might run a directory listing, then a pipe, then a grep command to search for a keyword in that directory listing. grep is really useful this way because it finds the strings you specify and returns the matching lines, or the names of the files that contain them. The grep command can also search directly through one or more specified files for a specified string.
To use grep, you type the command name grep, followed by any options you want to use, a regular expression, and then, optionally, the name of the file or files you want to search. grep supports a large number of options, and all of them are documented in detail in its man page.
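Here is a minimal sketch of the listing-then-grep pattern and of grep’s own syntax, assuming bash and a throwaway directory from mktemp; the file names and search strings are hypothetical.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the listing is predictable.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch report.txt notes.txt photo.png

# The pipe sends ls's standard output into grep's standard input;
# grep prints only the lines (file names) that match the pattern.
ls | grep 'txt'

# grep's own syntax: grep [options] pattern [file...]
# -i ignores case, -n prefixes each match with its line number.
printf 'alpha\nBeta\ngamma\n' > words.txt
grep -in 'beta' words.txt
```

The first command prints notes.txt and report.txt; the second prints the matching line together with its line number.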
- I/O Redirection
Sometimes from the command line, you need to redirect data into or out of a program, and that’s where redirection comes into play. When you need to save a program’s output for future reference, you can redirect it to an output file. And if a program needs to take something as input, you can redirect it from an input file. I know input redirection can sound strange, but some programs rely on this feature to process data, such as raw text files that are fed through a program so it can search them for patterns. In addition to redirecting output to files, a program’s output can also be passed to another program as its input by using piping, as we just learned. There’s also a related technique that uses a command called xargs. The xargs command lets you build command line arguments from files or from other programs’ output. All of this becomes especially helpful when you start writing scripts. So let’s take a look at some commonly used redirection operators.
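To make the xargs idea concrete, here is a minimal sketch, assuming bash, GNU or BSD xargs, and a scratch directory; names.list and the file names in it are hypothetical.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# A file that lists other file names, one per line.
printf 'one.txt\ntwo.txt\n' > names.list

# xargs reads names.list on standard input and turns each name into
# a command line argument, so this runs: touch one.txt two.txt
xargs touch < names.list

# The same idea fed by a pipe: ls output becomes arguments to wc -l.
ls one.txt two.txt | xargs wc -l
```

xargs splits its input on whitespace, so file names containing spaces need extra care (for example, GNU xargs offers -0 for NUL-separated input).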
The first one is the greater than sign (>). It creates a new file containing the standard output. If the specified file already exists, it is overwritten with the program’s new output. Next, we have two greater than signs (>>). This appends standard output to an existing file, so we don’t overwrite the file, we just add to it. This is very useful for log files, for example, because you want to add new entries to the end without losing what was already there. Another one we can use is the number two followed by the greater than sign (2>).
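A minimal sketch of the difference between > and >>, assuming bash and a scratch directory; log.txt is a hypothetical name.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

echo "first run"  > log.txt   # > creates log.txt (or truncates it if it exists)
echo "second run" > log.txt   # so this replaces "first run"
echo "third run" >> log.txt   # >> appends instead of overwriting
cat log.txt                   # second run, then third run
```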
This creates a new file containing any standard error output the program produces. If the specified file exists, it is overwritten as well. Just as we could append standard output to a file, we can do the same thing with standard error: to append standard error to an existing file, you use the number two followed by two greater than signs (2>>). If the specified file doesn’t exist, a new file is created. Another way to redirect is with the ampersand and greater than sign (&>). This creates a new file containing both the standard output and the standard error. If the specified file already exists, it is overwritten. Let’s look at another redirection operator, the less than sign (<). This sends the contents of the specified file to the program as its standard input.
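The error and input operators above can be tried with a short sketch, assuming bash (&> is a bash shorthand, not POSIX sh) and a scratch directory; present.txt and the missing names are hypothetical.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch present.txt

# 2> sends only standard error (file descriptor 2) to err.txt, overwriting it;
# "present.txt" still prints on the screen.
ls present.txt missing.txt 2> err.txt
# 2>> appends a second error message instead of overwriting.
ls also-missing.txt 2>> err.txt

# &> sends both standard output and standard error to one file.
ls present.txt missing.txt &> both.txt

# < feeds a file to the program's standard input instead of the keyboard.
wc -l < err.txt   # counts the two error lines
```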
What is standard input? Normally, it’s your keyboard: whatever you type is the standard input. In this case, though, we can use a file as our input instead of typing it ourselves. Next, we have two less than signs (<<). This accepts the text on the following lines as standard input, up to a marker word you choose, so instead of reading from a file, the program reads the lines you type right after the command. If we use a less than sign followed by a greater than sign (<>), the specified file is opened for both reading and writing and attached to the program’s standard input, so the program can read from that file and, if it chooses, write back to it. So remember, when it comes to output, we have two types: standard output, which carries the program’s normal messages and normally goes to the screen, and standard error, which carries its error messages and also goes to the screen by default.
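A minimal sketch of << (a "here document") and <>, assuming bash and a scratch directory; the END marker and data.txt are hypothetical choices.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# <<END: the lines up to the END marker become wc's standard input,
# as if they had been typed at the keyboard. This prints 2.
wc -l <<END
line one
line two
END

# <> opens data.txt for reading and writing on standard input.
# Unlike >, it does not truncate the file; cat reads it through stdin.
printf 'kept\n' > data.txt
cat <> data.txt
```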
If we write these to a file instead, using our redirection operators, we can look at them later and use them as part of our log files. Let’s say I’m using the grep command to search for information about a particular user in all of the configuration files located in the /etc directory. Without redirection, the output of the command is displayed on the screen, and I can’t use it for future reference. And if grep finds a lot of matches, the output scrolls past on the screen, making it very hard to read it all. If I run this as root, for instance, there will be so many matches in these configuration files that it will just keep scrolling and scrolling.
Instead, if I redirect all of that to a file, I can send that file to somebody else to look at, print it out, or search through it again myself. This makes things a lot easier. So how would I do that? I would type grep username /etc/* > userfile.txt, where username is the name I’m searching for. This tells grep to search through all the files in the /etc directory and write anything it finds to a file called userfile.txt. That way I have userfile.txt with all the results, which I can look through later. Now let’s take a look at another example and discuss the idea of standard error versus standard output. Let’s say I run grep username /etc/* as a normal user. What would I get? I’d see every line, in every file, where the username appears. However, I’d also get a lot of error messages, because a normal user doesn’t have permission to read all of the files in the /etc directory.
The matching lines are sent to standard output, while the permission errors are sent to standard error. So if I redirect these into separate files for standard output and standard error, I can pull apart the good results from the errors, which makes things easier for me. Now, what if I don’t care about the errors at all and never want to look at them? I can redirect them to the null device, /dev/null. This is a special device file that acts as a trash can for data we don’t care about: anything written to it is discarded. So if I wanted to search for the username david, I could use grep david /etc/* 2> /dev/null.
This searches for the word david through all the configuration files located in the /etc directory and discards any error messages by sending them to /dev/null. Any matches are still shown on the screen, so I don’t have all those error messages in the way when I’m trying to find the information I’m really looking for.
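A sketch of both /etc searches, assuming bash; to keep the result predictable it uses a scratch conf directory as a stand-in for /etc, and the file contents and missing.conf are hypothetical.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir conf   # stand-in for /etc
echo "david:x:1000:1000::/home/david:/bin/bash" > conf/passwd
echo "nothing here" > conf/hosts

# Save the matches to a file instead of letting them scroll past.
grep david conf/* > userfile.txt

# missing.conf does not exist, so grep writes an error to standard error;
# 2> /dev/null discards it while the match still reaches the screen.
grep david conf/* conf/missing.conf 2> /dev/null
```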
- Piping and Redirection
In this lesson, we’re going to work on redirecting and piping. To do this, we first need some files to play with, so we’re going to use some of our file and directory creation commands to give us some files. Then we’re going to do directory listings, create some errors, and redirect and pipe these things in and out of different files. First, we need to make a directory, so I’m going to make a directory called test inside my Documents folder. Now I’m going to run cd test, and in here there’s nothing, because it’s a blank folder we just created. Next, I want to create a few different files. We’re going to use our touch command, and we’ll call the first one barry.txt.
We’ll run touch bob, touch example.png, touch firstfile, touch foo1, and touch video.mpeg. All of these are actually empty files. They’re not PNG files or movie files or anything like that, and that’s fine. I just want to make sure we have some files here, so that when we do different directory listings, we can see output on the screen or generate errors. For instance, when I run ls, you see those files show up on the screen.
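The setup above can be reproduced with a short sketch, assuming bash; a mktemp scratch directory stands in for the Documents folder. The file is named firstfile because touch first file (unquoted) would create two separate files, first and file.

```shell
#!/usr/bin/env bash
# Assumption: a scratch directory stands in for ~/Documents.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir test
cd test || exit 1

# touch creates empty files; the extensions are just part of the names,
# so example.png and video.mpeg are not real images or videos.
touch barry.txt bob example.png firstfile foo1 video.mpeg
ls
```

On a real system the equivalent would be mkdir ~/Documents/test followed by cd ~/Documents/test.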
Now, if I run ls with something like foo2, which doesn’t exist, I should get an error on the screen, and that’s standard error. The first thing I want to do here is my listing, which is ls, but instead of sending it to the screen, I want it to go into a file called myoutput. So I’m going to run ls and redirect it, using the greater than sign, into a file called myoutput. If I hit Enter, nothing shows up on the screen; instead, all that content should have gone into myoutput.
To see if that really happened, we need to display the contents of myoutput. To do that, we use the cat command: cat myoutput displays the file on the screen. What we have is seven files listed, one per line, in this text file, because when ls writes to a file instead of the screen, it puts each file name on its own line. Notice that myoutput itself is in the list, because the shell creates the file before ls runs. Now that we have this file, we’re going to try some other techniques and write their results to it. As we saw, we were able to redirect output from a command into a file, and when we ran the ls command, myoutput didn’t exist yet.
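A minimal sketch of the first redirection in the demo, assuming bash and a scratch directory holding the lesson’s six files.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch barry.txt bob example.png firstfile foo1 video.mpeg

# The shell creates myoutput BEFORE ls runs, so myoutput shows up
# in its own listing: 6 files + myoutput = 7 lines, one name per line.
ls > myoutput
cat myoutput
```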
So the shell created the file and put all the contents into it. The next thing we’re going to use is a program called word count, which is wc. wc has a lot of different options, including -l, which reports the number of lines instead of the number of words. Let’s say I wanted to count the number of lines in myoutput. If we did that and sent it to the screen, what would we get? We’d get seven, because there are seven lines in that file, as you can see above.
They are barry.txt, bob, example.png, firstfile, foo1, myoutput and video.mpeg. Now, what if I wanted to put that count into a file, like the myoutput file? If I do it directly, by running wc -l myoutput > myoutput, I’m going to end up overwriting myoutput. Let’s see what happens if we do that and then cat it. Instead of the list of files, we see a single line saying there are zero lines in myoutput. Why zero? Because the shell truncates (empties) myoutput before wc even starts, so by the time wc counts the lines, the file is already blank. That was the issue there.
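A sketch that reproduces the zero-line surprise, assuming bash and a scratch directory with the lesson’s files.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch barry.txt bob example.png firstfile foo1 video.mpeg
ls > myoutput                # myoutput now holds 7 lines

# The shell truncates myoutput before wc starts, so wc sees an empty file.
wc -l myoutput > myoutput
cat myoutput                 # a count of 0, followed by the name myoutput
```

In bash, set -o noclobber makes > refuse to overwrite an existing file, which guards against this kind of accident.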
So let’s try something else. We’ll run ls and redirect that into barry.txt. Now if I cat barry.txt, I have seven items. If I run wc -l barry.txt and redirect that into myoutput, what are we going to get? We should get seven, along with the file name barry.txt. Let’s see what happens. We run that, then cat myoutput, and you’ll see it says seven lines were in the file barry.txt. So you can see how that works. The problem is that I just overwrote all the contents of myoutput, which used to hold all seven lines of the listing. What if I wanted to add this line at the end instead of overwriting the file? To do that, we need a different kind of redirection: we want to append to the file instead of replacing it.
As you may remember from the lesson, we do that by using two greater than signs instead of a single one. That adds the output as an additional line at the end of the file instead of overwriting what’s currently there. So if I run wc -l barry.txt >> myoutput right now, what do you think will happen? If it works correctly, we should have two lines: the existing line of 7 barry.txt and a new line of 7 barry.txt.
Let’s go ahead and check that out. And there we go: you can see we can add more data to the file. We can do the same thing with the ls command and append its listing to our myoutput file. Now we should have 7 barry.txt, 7 barry.txt, and the seven file names. And there you go: you can see we appended them at the bottom. That is great for sending data to a file, but how do we get data back from a file? Let’s say I wanted to count the lines in myoutput. I know there are currently nine lines: 7 barry.txt twice, and then the seven file names. To count those, I can again use wc -l on myoutput. What should I get on the screen? I should get the number nine for nine lines, followed by the file name, myoutput. And there we go.
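The overwrite-then-append sequence from the demo, condensed into one sketch, assuming bash and a scratch directory; myoutput is created up front with touch so the listings match the lesson’s seven names.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch barry.txt bob example.png firstfile foo1 myoutput video.mpeg

ls > barry.txt                # 7 names, one per line
wc -l barry.txt > myoutput    # myoutput: "7 barry.txt"
wc -l barry.txt >> myoutput   # >> adds a second "7 barry.txt" line
ls >> myoutput                # appends the 7 names: 9 lines in total
wc -l myoutput                # reports 9 lines and the name myoutput
```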
We can see that right there on the screen. Now, instead of passing the file name to wc, we can bring the file in as input. To do that, we use wc -l followed by the less than sign and the file name, which feeds the file into wc’s standard input as if it were a data stream typed on the keyboard. This looks a little different, because we get only the line count and not the file name. The reason is that wc no longer knows the data came from a file called myoutput; as far as wc is concerned, the data is just arriving on standard input, and the shell is supplying it from the file automatically. So here we just get nine. So you saw how we can send things to a file, and you saw how we can take things from a file.
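A minimal sketch comparing the two forms of the count, assuming bash and a scratch directory; the nine-line myoutput is generated with printf rather than by the earlier steps.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' one two three four five six seven eight nine > myoutput

wc -l myoutput     # count plus the file name: wc opened the file itself
wc -l < myoutput   # count only: wc just reads a stream on standard input
```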
Can we do both in the same command? Yes, we can, using something like wc -l again. This time I want to take input from the file barry.txt and send output to the file myoutput: wc -l < barry.txt > myoutput. This takes the contents of barry.txt as input, counts the number of lines, and, instead of showing the result on the screen, saves it to myoutput, overwriting its current contents. Let’s see how that works. Again, nothing appears on the screen, because I redirected my standard output to the file myoutput. If I cat myoutput, I see 7, the number of lines that came in from the barry.txt file. To verify that, we can cat barry.txt and count the entries: barry.txt, bob, example.png, firstfile, foo1, myoutput and video.mpeg. That is seven files.
That’s seven lines, which is how we got the number seven, and that was saved into myoutput. So you can see how we can chain these together to take input from one file and save output to another file. And again, if we wanted to, we could append instead of overwriting by using two greater than signs before myoutput. Then myoutput should have two sevens, one on each line.
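A sketch of input and output redirection on the same command, assuming bash and a scratch directory; barry.txt is filled with printf to stand in for the earlier ls listing.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' barry.txt bob example.png firstfile foo1 myoutput video.mpeg > barry.txt

# Input from barry.txt, output to myoutput (overwritten): myoutput holds 7.
wc -l < barry.txt > myoutput
# Same input, but >> appends, so myoutput now holds two lines of 7.
wc -l < barry.txt >> myoutput
cat myoutput
cat barry.txt   # the input file is untouched
```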
So let’s cat myoutput, and we see seven and seven. The seven lines in barry.txt are still there; we haven’t affected that input file. And we have the two sevens: the first came from the command that overwrote myoutput, and the second was appended by the command that used two greater than signs. So you can see how these things work together, and we can combine them to get more effects if we need to. Now let’s talk about errors and redirecting standard error. For instance, if I run ls video.mpeg blafoo, where blafoo doesn’t exist and isn’t one of our files, I should get an error saying something like cannot access blafoo.
And we do get this error. The second thing listed is video.mpeg, because ls found that file. Instead of getting these errors on the screen, I can redirect them to a text file and save them for later. To do this, I’ll bring the same ls command back up, but this time I’ll add the number two and the greater than sign and send that into a file called errors.txt: ls video.mpeg blafoo 2> errors.txt. This sends any error output to the file and any normal output to the screen. So here I should see just video.mpeg and nothing else on the screen, because any errors are going to the errors file.
If I do that, that’s exactly what I see. And if I run ls now, you’ll see that I have a new file called errors.txt. If I cat errors.txt, you can see the error I would have gotten: cannot access blafoo: No such file or directory. It’s now stored in the error file instead of being displayed on the screen. So that’s how you redirect errors. Now, if you want to save both your normal output and your error messages into a single file, you can do that by redirecting the standard output stream into a file and then redirecting the standard error stream to wherever standard output is going.
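A minimal sketch of capturing only the error stream, assuming bash and a scratch directory; the exact wording of the error message varies between ls implementations.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch video.mpeg

# The valid name still prints on the screen; the error goes to errors.txt.
ls video.mpeg blafoo 2> errors.txt
cat errors.txt   # something like: ls: cannot access 'blafoo': No such file or directory
```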
I know that sounds kind of complicated, but really it’s not that hard. We’ll bring up ls video.mpeg blafoo just like before, redirect its output to the file myoutput, and then add 2>&1 at the end: ls video.mpeg blafoo > myoutput 2>&1. Now standard output is going to myoutput, and standard error is sent to the same place as standard output, so everything ends up in that file.
If I hit Enter, you should see nothing but the command prompt come back, which is what happened. And now if I cat myoutput, you’ll see the same output we would have seen on the screen, both the error and video.mpeg, because we sent everything to this file instead of the screen. So you can see how this stuff starts working together.
So what is this 2>&1? What does the ampersand one mean? As you saw earlier, the 2 specifies where to send standard error, like the errors file we used before. In this case, we’re sending it to &1. &1 refers to file descriptor 1, which is standard output, so standard error goes wherever standard output is currently going, in this case the file myoutput. Do I really need this part? Yes: standard output and standard error are separate streams that both happen to go to the screen by default, so redirecting standard output alone doesn’t capture the errors. Without 2>&1, the error message would still show up on the screen. Alternatively, we might want standard output to go to myoutput and standard error to go to a separate file such as errors.txt, by using > myoutput 2> errors.txt. You can do that kind of thing too.
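A sketch showing why 2>&1 is needed and why its position matters, assuming bash and a scratch directory; onlyout and wrongorder are hypothetical file names.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch video.mpeg

# stdout goes to myoutput, then 2>&1 points stderr at the same place,
# so both the file name and the error land in the file.
ls video.mpeg blafoo > myoutput 2>&1
cat myoutput

# Without 2>&1 only stdout is captured; the error still hits the screen.
ls video.mpeg blafoo > onlyout

# Order matters: here stderr is copied to the screen BEFORE stdout is
# redirected, so the error is not captured either.
ls video.mpeg blafoo 2>&1 > wrongorder
```

Redirections are processed left to right, which is why > myoutput has to come before 2>&1.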
You really have a lot of control over where your standard output and your standard error go. Now let’s clear our screen and talk about piping. Piping is where we combine multiple commands together. For instance, if I run ls, I have just one command, and it lists everything on the screen. But if I run ls followed by a pipe, I can add another command that takes the output from ls and does something with it. For example, maybe I want to use head -3. What does head -3 do? head gives you the first lines of its input, and the -3 says to give me the first three lines. I could use -2 to get the first two, but we’ll use three. So what I should get is the first three names in the listing, which are barry.txt, bob and errors.txt. Let’s see if that happens.
There you go. If I do the same thing but ask for four, I see four; if I ask for two, I get two. So I can control that. What’s happening is that the output of the first command, ls, is routed into the second command, in this case head, instead of to the screen. So far we’ve piped two commands together, but we can pipe as many as we want. For example, let’s say I run ls and want to see the first five items. The first five items would be barry.txt, bob, errors.txt, example.png and firstfile. But now let’s say I want to see just the last two of those five, which are example.png and firstfile. Where head gives you the first lines, there’s a command called tail that gives you the last ones.
I can ask tail for the last line, the last two, the last 50, whatever I need. tail is a very helpful command when you’re looking at log files, because the last entries in a log are the most recent ones, which are usually the ones you care about, while the lines at the head might be from days ago. So here, running ls | head -5 | tail -2, the directory listing produces the full list, barry.txt through video.mpeg. head keeps only the first five, which are barry.txt, bob, errors.txt, example.png and firstfile. Then tail keeps the last two of those, which should give us example.png and firstfile. Let’s see if it does. There you go. The output of each command became the input of the next, and that’s how we chain commands together.
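A sketch of the head and tail pipelines, assuming bash and a scratch directory with the lesson’s files; LC_ALL=C is an added assumption that fixes the sort order ls uses.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
export LC_ALL=C   # assumption: byte-order sorting for a predictable listing
touch barry.txt bob errors.txt example.png firstfile foo1 myoutput video.mpeg

ls | head -n 3               # barry.txt, bob, errors.txt
ls | head -n 5 | tail -n 2   # example.png, firstfile
```

head -n 3 is the POSIX spelling; the shorter head -3 used in the demo is an older form that GNU and BSD head still accept.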
We can chain as many commands as we want, and we can even take that same pipeline and redirect its output to a file. Let’s call it directory.txt. I should see nothing on my screen. When I open directory.txt with the cat command, I expect to see example.png and firstfile, but instead I see errors.txt and example.png. Why? Remember that we just created a new file called directory.txt, and where does directory.txt fall in the listing? Right between bob and errors.txt. The shell created this file while setting up the pipeline so it could save the data there, and in this run that happened before ls read the directory. So the first five names ls produced were barry.txt, bob, directory.txt, errors.txt and example.png. From those five, tail gave me the last two, errors.txt and example.png. That’s how we ended up with that result.
So you have to be careful when you save results to a file, because if the output file is in the same directory you’re listing, it can end up in your results. Again, these are fairly simple commands. In practice, we’d more often use this when grepping through files, extracting what we need, and saving it to a file in another directory. We wouldn’t save to the directory we’re listing like I did here. But it is something to be aware of, because sometimes you’ll get unexpected results like that. In this case, I expected to see example.png and firstfile because I didn’t think about the fact that directory.txt would show up near the beginning of the listing. If I had used a name like zzz.txt instead, this wouldn’t have been an issue, because zzz.txt would be listed last, after video.mpeg.
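A sketch of the output-file surprise, assuming bash, LC_ALL=C sorting, and a scratch directory. In a plain pipeline the output file is opened by tail’s process while ls is already starting, so the file usually, but not always, appears in the listing; redirecting a brace group, as below, makes the shell create the file before ls runs every time.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
export LC_ALL=C   # assumption: byte-order sorting for a predictable listing
touch barry.txt bob errors.txt example.png firstfile foo1 myoutput video.mpeg

# zzz.txt sorts last, so it does not disturb the first five names.
{ ls | head -n 5 | tail -n 2; } > zzz.txt
cat zzz.txt         # example.png, firstfile

# directory.txt sorts between bob and errors.txt, shifting the first five.
{ ls | head -n 5 | tail -n 2; } > directory.txt
cat directory.txt   # errors.txt, example.png
```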
So just keep that in mind as you play with this stuff. As I said, you can combine pipes together, combine standard input and standard output redirection, and send things to files. It all depends on what you want to do, and you have a lot of control here. So what I recommend is to get into the terminal, create a couple of files using the touch and mkdir commands, play around with listing files or any other commands you want, and try piping them together. Try redirecting your output or your input to different places and get comfortable with these concepts. It will really help you out and make you much more comfortable working in Linux.