Navigating the Filesystem
This section gives a brief overview of some Linux commands you can use to navigate and interact with the filesystem on your VM. If you do not have a basic understanding of what a filesystem is, some background reading is suggested first. The Unix Shell lesson by Software Carpentry is a good starting point: https://swcarpentry.github.io/shell-novice/. Below is a list of commands you may find useful. This list is not exhaustive, so please feel free to explore this topic further on the internet. To get further information on a given command x, enter the command “man x” in your terminal window while logged into your VM. This will bring up the manual page for that command.
- pwd: This command prints the full path of the current working directory. This allows you to easily determine where in the filesystem you currently are.
- ls: This command lists the contents (files and directories) of the current directory.
- cd <dir>: This command is used to navigate the filesystem. If <dir> is not included, you will be moved to your home directory. Otherwise, you will be moved to <dir>.
- mkdir <name>: This command will create an empty directory called <name>.
- rmdir <name>: This command will remove an empty directory called <name>. It will fail if the directory <name> is not empty.
- rm <name>: This command will remove the file called <name>. If <name> is a directory, the command will fail. To remove a directory with rm, add the parameter -r (rm -r <name>). This removes <name> recursively (i.e. that directory and everything in it will be deleted).
- mv <old> <new>: This command will move the file/directory <old> to <new> (i.e. it renames the file/directory). If <new> is an existing directory, <old> is moved into it instead.
- cp <old> <new>: This command will copy the file <old> to <new>. To copy a directory and everything in it, add the parameter -r (cp -r <old> <new>).
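The commands above can be tried out safely in a throwaway directory. The following is a minimal sketch rather than a prescribed workflow: the names sandbox, notes.txt, backup.txt and old.txt are made up for illustration, and mktemp and touch (which are not covered in the list above) are used only to create a scratch directory and an empty file.

```shell
# Work inside a temporary directory so nothing important is touched.
# mktemp -d creates a fresh, empty directory and prints its path.
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir"
pwd                       # print where we are now

mkdir sandbox             # create an empty directory (hypothetical name)
cd sandbox                # move into it
touch notes.txt           # create an empty file to work with
cp notes.txt backup.txt   # copy the file
mv backup.txt old.txt     # rename the copy
ls                        # lists notes.txt and old.txt

rm notes.txt old.txt      # remove both files
cd ..                     # go back up one level
rmdir sandbox             # succeeds because sandbox is now empty
listing=$(ls)             # the scratch directory should be empty again

# Return to where we started and remove the scratch directory.
cd "$start_dir"
rmdir "$workdir"
```

If rmdir sandbox were run before the rm line, it would fail because the directory would still contain files.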
Every file and directory in Linux is owned by a user, and a group. Linux is a multiuser environment, where your files are separated from others on the same Linux system.
Linux has three levels of permissions: user, group, and other. Each of these three levels lets you set a read, write, and execute permission, so in total there are 9 standard permissions in Linux. Read means you are able to look at a file’s content, write means you can save to the file, and execute means that if the file is a program, you can run it.
If your user matches a file’s user, then the user permissions apply to you. If one of your groups matches a file’s group, then the group permissions apply to you. If neither matches, the other permissions apply.
When you run ls -l you will see the long listing format for files. In this format, the third and fourth columns show the user and group associated with each file. The file’s permissions are shown in the first column as a sequence of r, w, x, and - characters, after a leading character that indicates the file type (for example, - for a regular file or d for a directory). The nine permissions are listed in the order rwx for each of the three levels: user, group, other. If a permission is replaced with a - then the file does not have that particular permission.
You can change permissions with the chmod command. For example, to remove group read from a file: chmod g-r example.txt
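To see the effect of chmod in the long listing, the sketch below creates a scratch copy of example.txt, sets it to a known starting state, and then removes group read. The temporary directory and the starting permissions are assumptions chosen for illustration; touch and cut are helper commands not covered above.

```shell
# Use a temporary directory so no real file is modified.
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir"

touch example.txt                # create an empty file
chmod u=rw,g=r,o=r example.txt   # known starting state: rw- r-- r--
ls -l example.txt                # first column starts with -rw-r--r--

chmod g-r example.txt            # remove group read
ls -l example.txt                # first column now starts with -rw----r--

# Keep just the file type character plus the nine permission characters.
perms=$(ls -l example.txt | cut -c1-10)
echo "$perms"

# Clean up the scratch directory.
cd "$start_dir"
rm -r "$workdir"
```

The same pattern works for the other levels and permissions, for example chmod o-r to remove read for other, or chmod u+x to add execute for the user.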
Basic Text Editing
A good basic text editor for the Linux command line is nano. To edit a file, just run the command nano followed by the filename. In the nano editor, special commands are listed at the bottom, with the “^” character representing the CTRL key. If you prefer a graphical text editor, you may use gedit instead of nano.
Monitoring Load, Resources and Processes
This section will give a set of commands that you can use to monitor the current load and resource use on your VM.
- top / htop: Both top and htop give you a dynamic, real-time view of all of the processes running on your system. There is a lot of data displayed for each process. Likely the most important information for you is the %CPU and %MEM details of the processes you are running. If you are running a multithreaded application, %CPU can go over 100%. The maximum %CPU is the number of cores assigned to your VM (n) times 100%. If you notice that your process (or the sum of the CPU usage over your processes) is using close to n × 100% CPU, you are using all available cores. You also need to monitor %MEM. If this gets too high (i.e. approaches 100%), your process will likely crash. If you notice that your process is slowly but continually using more memory, it will likely crash eventually.
- ps <ID>: Every process running on your system is given a unique ID. The ID assigned to each process is listed in the first column of the top output. This command gives more information on a specific running process.
- free -h: This command gives a quick summary of the free and used memory across your VM. Including the option -h displays the output using human readable units.
- df -h: This command gives a quick usage summary of the available disk storage. An intervention should be made if you notice any listed volume approaching 100% usage.
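The monitoring commands above can also be run one after another to get a quick, non-interactive snapshot. This is a sketch under a few assumptions: it assumes the procps version of top commonly shipped with Linux distributions (whose -b and -n options produce a single batch-mode snapshot), it uses head only to shorten the output, and it uses the shell's own process ID ($$) as a convenient example ID for ps. htop is left out because it may not be installed on every VM.

```shell
# One snapshot of the process list instead of the live, full-screen view.
# -b = batch mode, -n 1 = exit after one update; head keeps the first lines.
top -b -n 1 | head -n 12

# Details for a single process; -p selects it by process ID.
# $$ expands to the process ID of the current shell.
ps -p "$$"

# Memory summary in human-readable units.
free -h

# Disk usage summary for each mounted volume, in human-readable units.
df -h
```

In the df output, the Use% column is the figure to watch for volumes approaching 100%.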
Running Processes in the Background
Sometimes, if a job is going to take a long time to execute, we would like to run it in the background. This lets the job execute while we keep full control over our terminal. To run a job in the background, simply add an & to the end of the command line before executing it. For example:
command &
When we do this, the shell launches command and returns us to the command prompt. If we run top, we should see it in our process list. Several jobs can be run in the background at once by starting each of them this way. A job started with & is already running in the background, so it does not need to be followed by bg.
If you have already started a job in the foreground and want to relegate it to the background, press CTRL-Z. This will suspend the job. Follow this with bg and the job will resume running in the background.
To get a list of jobs running in the background, we use the jobs command. Each background job is assigned a job ID, which is shown in the jobs output. If we want to switch to a background job (i.e. bring it to the foreground), we use the command fg <ID>. <ID> can be omitted if there is only one background job running.
If you want to end a process that is running, you can use the kill <ID> command. The kill command takes a process ID as a parameter. This ID can be determined using the top command.
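The sequence of starting a background job, listing it, and ending it can be sketched as follows. Here sleep 60 is only a stand-in for a long-running job, and the wait command (not covered above) is used just to confirm that the job has ended. CTRL-Z, bg and fg are left out because they only make sense when typed at an interactive prompt.

```shell
# Start a stand-in for a long job in the background.
sleep 60 &
job_pid=$!          # $! holds the process ID of the most recent background job

jobs                # list this shell's background jobs
kill "$job_pid"     # ask the process to end (sends the default TERM signal)

# wait collects the job's exit status. A process ended by TERM normally
# reports 143, which is 128 plus the signal number 15.
status=0
wait "$job_pid" 2>/dev/null || status=$?
echo "job ended with status $status"
```

Using $! avoids having to look up the process ID in top, but it only remembers the most recently started background job.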
Redirecting Process Output to a File
Running certain programs causes a lot of information to be dumped to the console. Sometimes this happens so quickly that we cannot follow what is going on, or we would like to save the output for later. To do this, we can use something called a redirect. A redirect causes the information normally dumped to your screen to be saved to a file. You do this by using ‘>’:
command > results.txt
This will save the output of command to results.txt. This is great, however there are actually two sets of output being produced here: the regular output produced by command (known as stdout) and the errors produced (known as stderr). These “output streams” are normally sent to the same location, the screen, so the end user does not know which stream a given line came from. A plain ‘>’ redirects only stdout, so errors still appear on screen. Sometimes it is useful to separate the stdout from the stderr. We can redirect either of these streams to a file by identifying them with ‘1’ (stdout) or ‘2’ (stderr). For example, if we want to save the stdout to a file and have the errors print to screen, we would do the following (this is equivalent to the plain ‘>’ above):
command 1> results.txt
On the other hand, if we would like to print the stdout to screen and save the errors to a file, we would do the following:
command 2> errors.txt
If we want to save both streams to separate files, we would do the following:
command 1> results.txt 2> errors.txt
Finally, we can combine this with what we learned in the previous section to send this entire process to the background:
command 1> results.txt 2> errors.txt &
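To check that the two streams really end up in separate files, the sketch below uses a small shell function in place of command. The function name noisy and its messages are hypothetical, and the >&2 inside it (not covered above) is simply how a shell command writes a line to stderr instead of stdout.

```shell
# Use a temporary directory for the output files.
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for "command": one line to stdout, one to stderr.
noisy() {
    echo "normal output"
    echo "something went wrong" >&2   # >&2 sends this line to stderr
}

# Save stdout and stderr to separate files.
noisy 1> results.txt 2> errors.txt

cat results.txt    # prints: normal output
cat errors.txt     # prints: something went wrong

# Keep the contents for later inspection, then clean up.
saved_out=$(cat results.txt)
saved_err=$(cat errors.txt)
cd "$start_dir"
rm -r "$workdir"
```

Note that ‘>’ overwrites the target file each time it is used, so rerunning a command replaces the earlier results.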
Keep Jobs Running After Logout
Running jobs in the background is a useful way to run many long programs at once. However, your shell session stops all its jobs when you log out or disconnect, even background jobs. Long-running jobs will therefore be stopped if your local computer turns off or if there is any network interruption. It is also convenient to be able to switch between various computers, e.g. at home and school, and continue monitoring your long jobs from both places.
To run a job that can be disconnected from your shell, you should do two things. First, tell your job to ignore any disconnect (“hangup” or HUP) signals from your shell using the nohup command. Second, redirect your output to a file for later viewing. For example:
nohup command > results.txt 2> errors.txt &
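A small end-to-end version of this pattern is sketched below. The job itself (a one-second sleep followed by an echo, run through sh -c) is a made-up stand-in for a real long-running program, and the wait line is there only so the sketch can show the result immediately. In real use you would log out instead and read the files after logging back in.

```shell
# Use a temporary directory for the output files.
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a long job: pause briefly, then print one line.
# nohup makes the job ignore hangup signals; output goes to files.
nohup sh -c 'sleep 1; echo finished' > results.txt 2> errors.txt &
job_pid=$!

# For demonstration only: wait for the job so its output can be shown.
wait "$job_pid"

cat results.txt    # prints: finished
cat errors.txt     # may contain a short notice from nohup itself

# Keep the result for later inspection, then clean up.
result=$(cat results.txt)
cd "$start_dir"
rm -r "$workdir"
```

While a real job is still running, the same files can be inspected with cat at any time from a new login session to check on progress.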