LPI 101-500 – 103.2: Process text streams with filters
- cat, head, tail
First, let’s look at a few ways to view text files. There is the command cat, which I am sure you know from the other videos. You can use cat to output or concatenate text files. How to output a file with cat should be clear: cat file1, for example, and the content of file1 is displayed here. It is the word hello. So how do you combine two files with the help of cat? In principle, it’s very simple. You just use cat on two files, in this case cat file1 file2, and we see the result: here is the content of file1 and here is the content of file2, joined together by cat. Would you like to have the result in a new file? You can easily save it in a third file, like this: cat file1 file2 > file22. We use the greater-than symbol, which means that the output is redirected into the new file. This is easy to remember because the tip of the symbol points to the file into which the content is written, in this case file22. If I cat file22 now, we see that the file has been saved accordingly. Another way to display a text file is the program less. With less you usually open larger text files, because less outputs the content page by page. I have prepared a file here, namely the Lorem Ipsum file.
And if I simply use cat on the Lorem Ipsum file, it runs right through to the bottom, and we see a lot of text here. Let me clear the screen. If I now say less lorem_ipsum, we see the lines from top to bottom, and here it pauses. To go one page further I press the space bar, and again, and so on. With the up and down arrow keys I can also scroll back and forth line by line. At some point I’m at the very end; it is a lot of text. Now you see, here we are at the end, and we can leave less with q for quit. And there is nothing more to say about less.
Two further options for displaying files or outputting content are the head and tail commands. head shows only the first ten lines, while tail shows the last ten lines. Let’s take a look at head, for example head /etc/passwd. The passwd file contains the users of the system, their groups, home directories and so on.
This file is significantly longer than ten lines, but only the first ten are shown here. You can also set the output to any number of lines. For this we use the -n option, which would then be head -n 20 /etc/passwd for 20 lines, and now we see 20 lines. You can also omit the n. Then it would be written as head -20 /etc/passwd, and here are 20 lines too. There are more options, but none of them are important in my opinion. As always, you can check the man page of the head command. tail is used the same way as head: tail /etc/passwd, and in that case tail shows us the last ten lines of the /etc/passwd file. And here we can again define the number of lines ourselves by using the -n option with tail, for example tail -n 30 /etc/passwd for the last 30 lines.
And now we have the last 30 lines of the file /etc/passwd. tail is used relatively often in day-to-day work as a system administrator to check log files when troubleshooting. Log files can be several tens of thousands or even several hundred thousand lines long. If you were to open such a file with cat, you would have to watch the text scroll across the screen for ten minutes. That is why one prefers to use tail here and, for example, show the last 100 lines. If the error occurred in the last 100 lines and not three months ago, you can find it that way. With tail you can also monitor log files. For example, if you have installed new software and it is now starting, you can view log files live with tail and the option -f. You see the last lines, and if the log file grows, the new lines are displayed dynamically.
We simulate this with the dpkg log file. I will open it with tail /var/log/dpkg.log, and here we see what dpkg did last. We have the date here: 2020, December 3rd, at 10:30. Now I open the file with tail and the option -f, so tail -f /var/log/dpkg.log, and we can see that I cannot enter anything in this terminal any more. So for the simulation I have to open another terminal window. Okay, now I would like to install the program htop in this second terminal: sudo apt install htop. And now we have to pay attention to the first terminal. Something should happen now, because the dpkg log file documents what dpkg does, for example during our installation, and we see that something has happened here. htop is installed, and we could read live what was happening.
We could also see that in this window. But as a rule, if you install a larger program and start it, a start like this can sometimes take five minutes because various components within the application still have to be deployed. You can follow this in the log file with tail and the option -f. So I remove htop again with sudo apt purge htop and confirm with yes. And here you see that something happens again. So you can watch the log file live. To leave this view, press the key combination Ctrl+C, and we are back in the normal console.
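The cat, head and tail calls from this part can be tried on small throwaway files. This is only a sketch: the file names file1, file2, file3 and numbers.txt are made up for illustration, and it assumes that the seq command is available to generate sample lines.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small sample files (hypothetical names).
printf 'hello\n' > file1
printf 'world\n' > file2

# Concatenate both files and redirect the result into a third file.
cat file1 file2 > file3
cat file3

# A 30-line sample file with one number per line.
seq 1 30 > numbers.txt

# head prints the first 10 lines by default; -n sets another count.
head numbers.txt
first_three=$(head -n 3 numbers.txt)

# tail prints the last 10 lines by default; -n sets another count.
tail numbers.txt
last_two=$(tail -n 2 numbers.txt)

printf '%s\n' "$first_three" "$last_two"

# To follow a growing log file live, one would run: tail -f <logfile>
# and stop it with Ctrl+C.
```

The tail -f call is left as a comment because it does not return until it is interrupted.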
- zcat, bzcat, xzcat
If text files have been compressed, we can display them with special variants of cat without having to unpack them beforehand. Which command we use depends on the compression method used. In the case of a file compressed with gzip, so with the ending .gz, we use the zcat command. I created a file here with the ending .gz, so let’s try it out. We use zcat, and we see the content of the file here. For comparison, if we just use the plain cat command with the same file, then something is output, but of course nobody can do anything with it.
How does zcat work? The file is decompressed in the background and the content is written to the output; the compressed file itself stays unchanged. All of this happens without us noticing anything. If we are dealing with a file that was compressed with bzip2, then we use bzcat. You see, here I have a file with the ending .bz2. Here we use bzcat, and here too we see a correct result. If we have a text file that has been compressed with xz, we can use xzcat. And here too we have a correct result.
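A minimal sketch of the three commands, assuming gzip is installed and checking for bzip2 and xz before using them. The file name note.txt is hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small sample text file (hypothetical name).
printf 'hello from a compressed file\n' > note.txt

# Compress a copy with gzip (-c writes to standard output) and read it with zcat.
gzip -c note.txt > note.txt.gz
gz_text=$(zcat note.txt.gz)
printf '%s\n' "$gz_text"

# The same idea for bzip2 and xz, only if those tools are installed.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c note.txt > note.txt.bz2
    bzcat note.txt.bz2
fi
if command -v xz >/dev/null 2>&1; then
    xz -c note.txt > note.txt.xz
    xzcat note.txt.xz
fi

# The compressed file is still there after reading it.
ls -l note.txt.gz
```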
- nl, wc, od
Linux has a handful of commands that you can use to view statistics on text. The nl command stands for number lines and outputs a document with its lines numbered. For example, we run nl on a file and see that the file is output right away, including the line numbers, here 48 lines. Let’s look again at the Lorem Ipsum file. It is not so obvious because the text comes in blocks, but there are empty lines in this file. These get no line numbers by default. So nl lorem_ipsum, and now we see a line number here followed by a text block. This is a whole block, and then we have a blank line where no line number appears, and so on. If you want the empty lines to have a line number too, we use the -b option together with a. The b stands for body numbering and the a stands for all lines, so that all lines are counted. So nl -b a lorem_ipsum; sorry, the a is written without a minus.
And here we can see that the blank lines are now numbered as well. With the command wc, which stands for word count, you can count the words in a text file. For example wc lorem_ipsum, and we get this result displayed. What does that mean exactly? The first number shows the number of lines in the text document, so we have 397 lines. The second number shows the number of words in the document, so here we have 18,462 words. The third number shows the size of the document in bytes, in this case 113,173. And the file name is at the end.
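The behaviour of nl and wc described so far can be reproduced on a tiny sample file. This is only a sketch: the file sample.txt and its content are invented, and the counting of numbered lines with grep is just one way to check the result.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample file with 3 text lines and 2 blank lines (hypothetical content).
printf 'first block\n\nsecond block\n\nthird block\n' > sample.txt

# By default nl numbers only non-empty lines.
nl sample.txt
# With -b a every line is numbered, including the blank ones.
nl -b a sample.txt

# Count how many output lines actually carry a number.
numbered_default=$(nl sample.txt | grep -c '^ *[0-9]')
numbered_all=$(nl -b a sample.txt | grep -c '^ *[0-9]')

# wc prints lines, words and bytes, followed by the file name.
wc sample.txt
line_count=$(wc -l < sample.txt)
word_count=$(wc -w < sample.txt)
byte_count=$(wc -c < sample.txt)

echo "numbered by default: $numbered_default, with -b a: $numbered_all"
echo "lines: $line_count, words: $word_count, bytes: $byte_count"
```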
I wanted to say let’s print the file to see if that’s true; we cannot count the words by hand, but I think we can check the lines. You see, with plain nl we only get 199 numbered lines, but if we use the command with the -b a option, then we have 397 lines. To display only the number of words, we choose the option -w for words, so wc -w lorem_ipsum, and here you see we have 18,462 words. To display only the number of lines, you can use the -l option: wc -l lorem_ipsum, and here you have 397 lines. This last command with -l ultimately gives the same count as nl -b a lorem_ipsum, where we also see 397 lines.
The od command creates a so-called dump of files in octal or other formats. A dump is a copy of raw contents, for example of memory or of a file. It is usually used for troubleshooting and can be output in various formats.
These would be, for example, character format, hexadecimal format or octal format. od stands for octal dump, which means that the default output is octal. So od lorem_ipsum, and here we see a dump of the Lorem Ipsum file in octal notation. For the exam it is not necessary to be able to read this notation. We should only know what od is and what it is used for. For example, to display the file as characters, we choose the option -c. Invisible characters, for example a line break, are displayed as well. So od -c lorem_ipsum, and we now see the individual characters of the words here. The \n is the character for a line break. If we would like to see the file byte by byte, we use the -b option with od -b lorem_ipsum. This shows every byte of our text as a three-digit octal number; despite the letter b, it is not binary code.
You can also combine both options, -c and -b, for example with od -cb lorem_ipsum. Then we have every character and its octal byte value one below the other, which may be a little easier to read. To display a dump of the file in hexadecimal format we can use the -x option, od -x lorem_ipsum, and this is what it looks like in hexadecimal format. The ASCII format with named characters is also possible. That would be the -a option, od -a lorem_ipsum, and the result is similar to that of the character format. Only line breaks and spaces are displayed differently. With od, programmers can find characters in files that do not belong there and may cause errors.
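To see what the od formats look like, it helps to dump a very short input instead of a whole file. The sketch below feeds the three bytes H, i and a line break to od. The -An option, which hides the offset column, and -t x1, which prints one hexadecimal value per byte, are additions that were not shown in the lesson.

```shell
#!/bin/sh
# Character view: printable characters as they are, the line break as \n.
char_dump=$(printf 'Hi\n' | od -An -c)

# Octal view, one three-digit octal value per byte (this is what -b does).
octal_bytes=$(printf 'Hi\n' | od -An -b)

# Hexadecimal view, one value per byte; plain -x would group two bytes each.
hex_bytes=$(printf 'Hi\n' | od -An -t x1)

# Named characters: the line break appears as nl.
named_dump=$(printf 'Hi\n' | od -An -a)

printf 'characters: %s\n' "$char_dump"
printf 'octal:      %s\n' "$octal_bytes"    # 110 151 012
printf 'hex:        %s\n' "$hex_bytes"      # 48 69 0a
printf 'named:      %s\n' "$named_dump"
```

The three bytes are the same in every view; only the notation changes.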
- md5sum, sha256sum, sha512sum
This lesson is about message digests. You have probably heard the word checksum or hash value before. When you download a file from the Internet, there is often an additional file that contains a hash value. With this hash value we can use several commands to check whether the download is correct and complete, or whether the file was subsequently altered, which could indicate a manipulated file. Let’s look at the first command, md5sum. I choose the file lorem_ipsum that is stored in this directory, and we are now shown a hash value of the file. If we had downloaded this file from the Internet, we could compare this value with the value that would have been supplied as a text file.
If the values are the same, we can assume that the file is correct and complete. If the value is different, it could be that the file is damaged, incomplete or even compromised. We execute the command again and save the hash value in a file. By the way, with the up arrow key you can step through the last commands. So now I have md5sum lorem_ipsum here, and I want the result in the file li.md5, so I redirect it with > li.md5. Let me check the content of this file, and we see that the hash has been saved, and it’s the same as this one.
So everything is okay. Now let’s change the content of the Lorem Ipsum file with vi lorem_ipsum. I just enter a few additional characters here, something like that. That’s enough; save. And let me check the hash value again. Then we compare these two hash values, and we can see at first glance that the complete hash value is simply different. Although I only changed a few characters, it’s completely different from this one. We saved the hash value in a file for a specific reason: there is an option -c. With this option we can compare the current hash value of the file with the hash value in the saved MD5 file. For this we simply run md5sum -c li.md5, and we see FAILED here. The calculated checksum did not match.
The result shows us that the hash value does not match. We don’t have to name the original file here because md5sum has recorded its file name in the saved file. If we just cat the saved file, here is the hash and here is the file name. So the program knows which file it should compare the hash with. The two commands sha256sum and sha512sum work in the same way. It may be that a package has to be installed in order to be able to use the two tools; on most current distributions they already come with coreutils. The package is called hashalot, so sudo apt install hashalot. Now we use them in the same way, with sha256sum and then lorem_ipsum, and we see that the hash value is significantly longer than the one from md5sum, which means that the hash algorithm is significantly stronger. If we do the whole thing with sha512sum, it should be a lot longer still.
With sha512sum lorem_ipsum the hash value is 512 bits long and is accordingly much stronger and longer than that of sha256sum, and even more so than that of md5sum. So again, for comparison, md5sum lorem_ipsum, and we see what a difference that is. Here is the hash value from md5sum and here is the hash value from sha512sum. And here too we can use the option -c to check whether the hash value of the file is correct. This time I have not changed anything in the file, so it should match. We check that with sha512sum, option -c, and then li.sha512, and in that case we see OK. So everything is fine: the file is correct and complete.
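The whole check can be replayed on a small file. This is a sketch under the assumption that md5sum, sha256sum and sha512sum are installed, as they are with GNU coreutils; the file names sample.txt, sample.md5, sample.sha256 and sample.sha512 are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'some sample text\n' > sample.txt

# Save the checksums; each saved line holds the hash and the file name.
md5sum sample.txt > sample.md5
sha256sum sample.txt > sample.sha256
sha512sum sample.txt > sample.sha512
cat sample.md5

# Length of each hash in hexadecimal characters (32, 64 and 128).
md5_len=$(cut -d ' ' -f 1 sample.md5 | tr -d '\n' | wc -c)
sha256_len=$(cut -d ' ' -f 1 sample.sha256 | tr -d '\n' | wc -c)
sha512_len=$(cut -d ' ' -f 1 sample.sha512 | tr -d '\n' | wc -c)
echo "hash lengths: $md5_len $sha256_len $sha512_len"

# Unchanged file: the check reports OK and returns exit status 0.
md5sum -c sample.md5
check_before=$?

# Append a few characters: the check now reports FAILED (non-zero status).
printf 'extra\n' >> sample.txt
md5sum -c sample.md5
check_after=$?
sha512sum -c sample.sha512
sha512_after=$?

echo "exit status before: $check_before, after: $check_after"
```

The exit status makes the result usable in scripts: 0 means every listed file matched.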
- sort, uniq, tr, cut, paste
Now we come to the commands with which one can manipulate files or output them in a modified form. The first command is sort. Let’s take a quick look at the man page: man sort, sort lines of text files. So this command sorts the content of files according to certain criteria. I have prepared a table here; you can see it here, table.csv. And here we see the birthdays of employees in the department. The first column shows the day of birth, the second column the month of birth, and the third the year of birth. The fourth column shows the first names of the colleagues, and the fifth column shows the employment relationship.
We only have employees here. So let’s run sort on this table without choosing any option: sort table.csv. We see that the default sort is based on the first column. But sort compares the lines character by character as text rather than as complete numbers, because otherwise the 2 or the 4 would have to be at the top, and not as shown here. The -n option uses the complete number in the first column and not just its first digit. So we try this again with sort -n table.csv, and now it works exactly the way we want it to. You see here 2, 4, 10, 11, 17, so it’s the correct order. Suppose we want to sort the list by year. The years are not at the beginning, but in the third column. How do we go about this with sort? First of all, if we look at the list, we see that the individual columns are separated from one another by a comma: comma, comma, comma, everywhere a comma. So we have to teach sort that the columns are separated by a comma.
We do this with the option -t followed by the comma in quotation marks, so that sort knows that the comma separates the individual columns. This is followed by the -k option, which stands for key and means that a certain key is used for sorting. The key is then 3 for the third column. So we try this: sort -t ',' -k 3 table.csv, and we see that this time the table was sorted by the year: 1965, 67, 70, 72, 79, 88, 89 and 1990. Maybe that was a little much at once, so let’s briefly repeat it. We are using sort with the option -t, which tells sort which character is used to separate the individual columns. The corresponding character follows the -t in quotation marks, here the comma. It goes without saying that if we had a semicolon in our table, then we would have to write a semicolon here.
This is followed by the -k option; k stands for key. With 3 we say that the third key, or column, should be used. And then we also specify the file. Accordingly, sort uses the third column and sorts according to the value in it.
Now we come to the uniq command, which can be used to remove repeated lines. Let’s look at an example. I’ve prepared a little text file with the name uniq.txt. We see that we have two different sentences that occur several times and are sometimes written directly below one another. Here the same sentence three times, then another sentence twice, then the sentence from above once, and then twice the sentence from the middle. What happens now if we use uniq on this text file? Let’s check that out: uniq and the name of the file, uniq.txt.
Identical lines that were directly below each other were removed, so that each such line only appears once. However, this only applies to identical lines that are directly adjacent. If there is another line in between, the repeated line is treated as a new line and not removed. This is a test was written three times in the first lines, and uniq made only one line out of it. Then we had and another test twice, and it became one single line. Then we had this is a test again, one time, and we have it one time here too. And at the end we had the text and another test twice, and we have it once here. With the option -c we are shown how often the corresponding lines occurred. So uniq -c uniq.txt, and we see that the line this is a test came three times, the next twice, this one once and the last twice. With the option --group the whole thing can be displayed more clearly, for example uniq --group uniq.txt. We have to use a double minus here, of course.
And here the output is grouped, and a blank line is inserted between the groups. So we have this is a test three times, then a blank line, then the next sentence, and another test, then another blank line, and so on. I think it is clear what the option --group does.
We come to the command tr, which is the abbreviation for translate. The tr command can be used to globally change individual characters or digits in a file. Whole words cannot be changed, only individual characters and symbols. Let’s look again at the table with the birthdays: cat table.csv. We have the birthdays here and a lot of commas. We can now, for example, replace the commas with semicolons. The tr command does not take a file name as an argument; it only processes input that is passed to it, for example the output of another command. So when we list the table with cat, we pipe the result to the tr command.
This is the result of cat table.csv, and we pass this result on to another command. We do this by entering the pipe symbol, this one, followed by another command, which is then tr. Now we have to tell tr, of course, which character it should exchange for which. Immediately after tr comes the character that is already in the file and that we want to replace, so the comma, enclosed in single quotes, followed by the character that we would like to use instead, in this case the semicolon, again in quotation marks. So let me just write that down: cat table.csv, then we pipe the output to the tr command and say that we want to exchange the commas for semicolons: cat table.csv | tr ',' ';'.
And here we can see that all commas have been replaced with semicolons. By the way, as with the commands sort and uniq, the file itself is not changed, only the output. If I output the table again, everything is the same as before. So cat table.csv, and you see, we still have our commas here. With the option -d you can remove characters. So let’s remove, for example, the commas. We pipe the result to tr, select the option -d and name what should be deleted, here the comma. So cat table.csv | tr -d ',' and now you see that our list is output without any commas. With tr we can even convert lowercase letters into uppercase letters, and vice versa. For example, we give a to z in lower case and A to Z in upper case. So let me show that: cat table.csv | tr 'a-z' 'A-Z'.
And now we see that every letter is a capital letter. We can even change several things at the same time. All we have to do is pass the result of tr to another tr command. The result that has been output here is then not displayed but passed to another tr, where we say that we also want to exchange the commas for semicolons, for example. So cat table.csv | tr 'a-z' 'A-Z' | tr ',' ';': we want capital letters, and the output is forwarded to another tr that changes the commas to semicolons. And now we have capitalized everything and changed the commas to semicolons.
Okay, now we come to the cut command. With cut we can cut columns out of a file and output only these. We are using the birthday table again, and now we only want to display the names of the colleagues. For this we use the following command: cut -d ',' -f 4 table.csv.
By default, cut assumes that the separator between columns is a tab. If this were actually the case, we wouldn’t need the -d option, because -d tells cut that we want to use the comma as a separator instead. The -f stands for fields, and in this case the 4 stands for the fourth column from the left. And then, of course, the name of the table follows. We see that only the names from the table have been output here, and the other information has been omitted. Why? Because we told cut that we only want to output the fourth column: first, second, third, fourth. And cut has done what we told it. If we want to output several columns, we list several numbers after the -f, separated by commas. So, for example, cut -d ',' -f 1,2,4 table.csv outputs the first, the second and the fourth column. To repeat: with the option -d you tell cut which separator is used to separate the individual columns from each other.
With the option -f you choose which columns you want to display. And if we had a table where a tab is used instead of a comma, we would omit the -d and the comma here.
At the end of this lesson we will look at the paste command. With the paste command you can merge the lines of two files side by side. Whether the output makes sense depends on the files. I have prepared two files here; let’s take a look at them. With cat paste1.txt we have the numbers from one to twelve, and with cat paste2.txt we have the months from January to December. And we see that this might make sense, with one to twelve and January to December.
And now we use both files with the paste command: paste paste1.txt paste2.txt. I think the result makes it clear what exactly paste does. The two files have been merged into one. The two columns are separated by a tab; just like with cut, the tab is used by default. And we can use the -d option to ensure that a semicolon is used instead of the tab.
For example, with paste -d ';' paste1.txt paste2.txt we use a semicolon instead of a tab. And now the columns are not separated by a tab, but by a semicolon, just as we wanted. With the option -s you can ensure that the result is not written in column form, but serially, one entry directly after the other. So we try this: paste -d ';' -s paste1.txt paste2.txt. We still want to have the semicolon, and we add the -s option. And now the individual entries of each file are written one after the other. So that’s all for this lesson. The next lesson is about sed. See you then.
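The commands of this lesson can be tried together on small sample data. The following is only a sketch: the rows in table.csv, the names and the contents of uniq.txt, paste1.txt and paste2.txt are invented stand-ins for the files used in the lesson. Two deliberate additions: LC_ALL=C makes the ordering predictable, and the numeric sort is limited to the first field with -k 1,1n, because in some locales sort -n may read the commas as thousands separators.

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-wise ordering, no locale surprises
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented sample table: day,month,year,name,status
cat > table.csv <<'EOF'
17,3,1979,Anna,employee
2,11,1990,Ben,employee
10,7,1965,Carla,employee
4,1,1988,Dev,employee
EOF

# sort: as text, numerically on the first field, and by the third field (year).
sorted_text=$(sort table.csv)
sorted_num=$(sort -t ',' -k 1,1n table.csv)
sorted_year=$(sort -t ',' -k 3 table.csv)

# uniq: only directly adjacent duplicates are collapsed; -c counts them.
printf '%s\n' 'this is a test' 'this is a test' 'and another test' 'this is a test' > uniq.txt
uniq_plain=$(uniq uniq.txt)
uniq -c uniq.txt
uniq --group uniq.txt   # --group is a GNU coreutils option

# tr: reads standard input only; here upper-casing, then commas to semicolons.
upper_semicolon=$(cat table.csv | tr 'a-z' 'A-Z' | tr ',' ';')
no_commas=$(cat table.csv | tr -d ',')

# cut: the fourth field only, then fields 1, 2 and 4.
names=$(cut -d ',' -f 4 table.csv)
cut -d ',' -f 1,2,4 table.csv

# paste: side by side with a semicolon, then serially with -s.
printf '%s\n' 1 2 3 > paste1.txt
printf '%s\n' January February March > paste2.txt
pasted=$(paste -d ';' paste1.txt paste2.txt)
serial=$(paste -d ';' -s paste1.txt paste2.txt)

printf '%s\n' "$sorted_num" "$sorted_year" "$upper_semicolon" "$names" "$pasted" "$serial"
```

None of these commands changes table.csv; every result only goes to the output or into a shell variable.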
- sed, split
This lesson is about sed, which is short for stream editor. sed is a very large and very complex topic, but for the LPIC-1 exam we do not have to know the full range of the sed command. It is only about replacing certain contents of files with other contents. For this we use our birthday list again, and let’s replace the word employee with the word lawyer. We do that with the following command: sed 's/employee/lawyer/g' table.csv. Okay, what does this command mean? We start the stream editor with sed. The expression that comes after sed must be in single quotation marks. You can see it here: first the opening quotation mark, followed by an s, which stands for substitute, which means replace or exchange, followed by a slash. Then comes the word that should be replaced, in this case employee, followed by another slash. And then the word we would like to use instead, in this case lawyer. The word lawyer must then be closed with a slash as well. In this case we have added the g flag, which stands for global. It means that not only the first occurrence of employee on a line is changed, but all of them.
The expression must be closed with a single quotation mark, and we must specify the file on which the command is to be executed, in this case table.csv. And we see that the word employee has been replaced with the word lawyer in the whole file. The word is only replaced in the output, not in the file itself, so it is not saved. We can check this by outputting the file again, and here we have employee again. I showed earlier how you can save the output of a command in a file. You can do the same with sed, for example like this: sed 's/employee/lawyer/g' table.csv > table2.csv, to save the result in a new file with the name table2.csv. And if we output the table2.csv file, we see that we have lawyer instead of employee and that the file has been saved. In contrast to sort, sed also offers an option that makes the changes directly in the file. That is the option -i. So we try that out: sed with the option -i, then the s for substitute, then the word employee.
Then the word lawyer, with the global flag, and table.csv, so sed -i 's/employee/lawyer/g' table.csv. And now the word employee should be replaced with the word lawyer and saved directly in the file table.csv. We can check that, of course, with cat table.csv, and we see that the word is not employee anymore, but lawyer. So that’s it about sed; that is what we need for the LPIC-1 exam, and you don’t need to know more about it.
The last command in this chapter is the split command. split divides a file into several files. By default, a new file is started every thousand lines, but of course you can adjust this value with the appropriate options. Take, for example, our Lorem Ipsum file.
Let me check how many lines we had: we have 397 lines here. So split would only make a copy of the file because, as I said, the file is split every thousand lines. We have only 397 lines, so that would not show anything. Instead we use the -b option for this example. The b stands for bytes, and we can choose to create one file per thousand bytes. So, for example, split -b 1000 lorem_ipsum. And we see that a lot of files have been created here, and they are all one thousand bytes in size, except for the last one; I think that one is only 173 bytes. We can look at one of those files to see what 1000 bytes look like. Let me check that with the file xcf: these are 1000 bytes here.
It is a standard feature of split that the files get these names: a new file name always begins with an x, followed by two more letters. We delete the files again. We use the rm command here with rm x*, and now the files are deleted. And that’s it for the topic of split.
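The sed and split steps of this lesson can be sketched as follows. The table rows and the file numbers.txt are invented; the -i option is used in the GNU sed form without a backup suffix, which is an assumption about the sed version, and the prefix part_ for the byte-wise split is chosen only to keep the two sets of pieces apart.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented sample table: day,month,year,name,status
cat > table.csv <<'EOF'
17,3,1979,Anna,employee
2,11,1990,Ben,employee
10,7,1965,Carla,employee
4,1,1988,Dev,employee
EOF

# Replace in the output only; the file itself stays as it is.
sed 's/employee/lawyer/g' table.csv
still_employee=$(grep -c 'employee' table.csv)

# Save the changed output in a new file.
sed 's/employee/lawyer/g' table.csv > table2.csv

# Change the file in place (GNU sed syntax for -i).
sed -i 's/employee/lawyer/g' table.csv
now_lawyer=$(grep -c 'lawyer' table.csv)

# split: 2500 lines give xaa and xab with 1000 lines each and xac with 500.
seq 1 2500 > numbers.txt
split numbers.txt
last_piece_lines=$(wc -l < xac)

# Byte-wise split with a custom prefix: pieces of 1000 bytes each.
split -b 1000 numbers.txt part_
first_piece_bytes=$(wc -c < part_aa)
piece_count=$(ls part_* | wc -l)

echo "lines in xac: $last_piece_lines, bytes in part_aa: $first_piece_bytes, pieces: $piece_count"

# Remove the pieces again.
rm x?? part_*
```

Without a prefix argument, split names its pieces xaa, xab and so on, which is why the lesson can delete them with a pattern that starts with x.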