Useful Linux Commands
User and Group management
Add a user (abasar) with a home directory and sudo permission.
sudo useradd -m -d /home/abasar abasar
sudo usermod -aG wheel abasar
Additional commands to copy the existing public key to the new user. This is especially helpful when you want to switch from the default user to your preferred username.
sudo mkdir /home/abasar/.ssh
sudo cp ~/.ssh/authorized_keys /home/abasar/.ssh/
sudo chown abasar:abasar -R /home/abasar/.ssh/
sudo chmod 700 -R /home/abasar/.ssh/
sudo chmod 600 /home/abasar/.ssh/authorized_keys
sudo ls -la /home/abasar/.ssh
Add public ssh key to the new user so that you can use the default ssh key for login.
ssh -i ~/.ssh/key.pem abasar@einext03
cat ~/.ssh/id_rsa.pub | ssh -i key.pem abasar@einext03 'cat >> .ssh/authorized_keys && echo "Key copied"'
ssh abasar@einext03
Add an existing user to a group
$ sudo usermod -aG <groupname> <username>
Check members of a group
$ grep -i <group name> /etc/group
Check the groups a user belongs to
$ groups <username>
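For scripts it is often handier to test membership than to read the output of `groups`. The following is a minimal sketch, assuming bash and the standard `id` utility; the function name `in_group` is made up for illustration, and group names that contain spaces are not handled.

```shell
#!/usr/bin/env bash
# Return success if USER is a member of GROUP (primary or supplementary).
in_group() {
    local user="$1" group="$2"
    # id -nG prints the user's group names separated by spaces;
    # put each name on its own line so grep -x can match a whole name.
    id -nG "$user" | tr ' ' '\n' | grep -qxF -- "$group"
}

# Example: the current user always belongs to its own primary group.
if in_group "$(id -un)" "$(id -gn)"; then
    echo "member"
else
    echo "not a member"
fi
```

A typical use would be `in_group abasar wheel && echo "abasar can sudo"` after running the usermod command above.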
Package management
Base repo http://fedoraproject.org/wiki/EPEL
Update existing packages
$ sudo yum update
Search yum repos
$ sudo yum search <package name>
Install
$ sudo yum install <package name> -y
Yum download only
$ sudo yum install yum-downloadonly
$ yum install --downloadonly --downloaddir=<directory> <package>
Install from local rpm
$ sudo yum localinstall <package name> -y
Remove
$ sudo yum remove <package name>
View installed packages
$ sudo yum list installed
View configured repos list
$ sudo yum repolist -v
$ sudo ls -l /etc/yum.repos.d/
List the files of an installed package
$ sudo rpm -ql package-name
Add a new repo
- Download .repo file in /etc/yum.repos.d/
- Optionally add the GPG key to verify the downloaded packages using the rpm --import command.
Example: http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_cdh5_install.html#topic_4_4_2
Tar and untar
Tar a directory and compress the tar file with gzip
$ tar -zcf anaconda3.tar.gz anaconda3
Untar
$ tar xf anaconda3.tar.gz
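Two tar options that pair well with the commands above are `-t`, which lists an archive without extracting it, and `-C`, which switches directory before archiving or extracting. The sketch below works entirely in a scratch directory; the directory and file names are placeholders, not part of the original notes.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"            # scratch area so nothing real is touched
mkdir -p "$work/src/project" "$work/restore"
echo "hello" > "$work/src/project/notes.txt"

# -C changes directory first, so the archive stores the relative path "project/..."
tar -czf "$work/project.tar.gz" -C "$work/src" project

# -t lists the archive contents without extracting anything
tar -tzf "$work/project.tar.gz"

# Extract into a chosen directory instead of the current one
tar -xzf "$work/project.tar.gz" -C "$work/restore"
cat "$work/restore/project/notes.txt"
```

Listing first is a cheap way to check whether an archive will unpack into a single top-level directory or scatter files into the current one.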
Explore Processes
View Running jvm processes
$ sudo jps -lv
View java thread dump
$ jstack <pid>
Find whether a given process is running
$ sudo ps -ef | grep -i <process name e.g. java>
Tools to automatically restart process daemons: daemontools, supervisor
Resources Used by a process
Show CPU and memory every 1 second
while true; do ps -p $PID -o %cpu,%mem ; sleep 1; done
Process details
ps -f -p $PID
CPU/memory usage
pidstat -p $PID 3
Disk read/write
sudo iotop -p $PID
Find ports used
lsof -i | grep $PID
Files accessed
sudo strace -f -t -e trace=file -p $PID
OS limits
sudo cat /proc/$PID/limits
Process status
sudo cat /proc/$PID/status
Command line argument
sudo cat /proc/$PID/cmdline
Network usage by network interface (system-wide, not per process)
dstat -dnyc -N <interface> -C total -f 5
Network usage for all processes (output is grouped by pid)
sudo nethogs
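The `/proc/$PID` files listed above are easier to read with a little post-processing: `cmdline` separates its arguments with NUL bytes, and `status` holds many more fields than are usually needed. Here is a small sketch, assuming Linux and bash; the function name `proc_summary` and the choice of fields are illustrative only.

```shell
#!/usr/bin/env bash
# Print a short summary of one process from /proc (Linux only).
proc_summary() {
    local pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # cmdline is NUL-separated; turn the NULs into spaces to make it readable
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    # Pick a few fields out of the status file
    awk -F':[ \t]*' '$1 == "State" || $1 == "Threads" || $1 == "VmRSS" { print $1 ": " $2 }' "/proc/$pid/status"
}

# Summarize the current shell
proc_summary "$$"
```

Reading another user's process may still need sudo, as in the commands above.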
Device Mounting
Manually mount
sudo mkdir -p /nvme
sudo mount /dev/nvme0n1p1 /nvme
Unmount
sudo umount /nvme
To mount on startup, add entries for the partitions to /etc/fstab. Each partition must already be formatted with a file system, e.g. ext4.
/dev/sdb1 /hdd1 ext4 defaults 0 0
/dev/nvme0n1p1 /nvme ext4 defaults 0 0
Activate the /etc/fstab entries
sudo mount -a
NFS Mount
# View shared directories published by a server
sudo showmount -e server01
# Create a local directory
sudo mkdir /mnt/server01
# Mount the NFS server path to the local
sudo mount -t nfs4 server01:/hdd1 /mnt/server01
Mount ISO image
Create a directory to serve as the mount location:
sudo mkdir /media/iso
Mount the ISO in the target directory:
sudo mount -o loop path/to/iso/file/YOUR_ISO_FILE.ISO /media/iso
Unmount the ISO:
sudo umount /media/iso
Resource Utilization
View disk I/O for each process
$ sudo yum install iotop -y
$ sudo iotop -o
Disk io at the disk level
$ sudo yum install sysstat -y
$ sudo iostat -dx 5
# Options #
d : Display the Disk utilization
x : Display extended statistics
5 : Interval in seconds
View mem and CPU utilization
$ top
View memory information
$ cat /proc/meminfo
View network utilization
$ sudo yum install iftop -y
$ sudo iftop -n
There are several other commands. See the list below.
iftop -- shows current open connections and their transfer rates
vmstat -- shows virtual memory status (found in procps package)
iostat -- shows current IO transfer rate by devices (found in sysstat package)
dstat -- combines vmstat, iostat and iostat-like info for network IO
iotop -- like top, but the focus is on IO transfer rate
lsof -- get info from currently open files
fuser -- does a subset of lsof: identify processes and users using certain files
Find file access by a process id
$ sudo strace -f -t -e trace=file -p <pid>
Disk and memory utilization using sar command.
Memory Utilization
$ sar -p -r 1 10
Linux 4.15.0-48-generic (einext02) 05/29/2020 _x86_64_ (8 CPU)
08:49:22 AM kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
08:49:23 AM 45064688 51654268 20790624 31.57 171280 6977412 22193668 16.71 18861288 1541152 168
08:49:24 AM 45085708 51675288 20769604 31.54 171280 6977412 22193668 16.71 18839796 1541152 36
08:49:25 AM 45085708 51675288 20769604 31.54 171280 6977412 22193668 16.71 18839796 1541152 36
08:49:26 AM 45085708 51675288 20769604 31.54 171280 6977412 22193668 16.71 18839796 1541152 36
08:49:27 AM 45085708 51675284 20769604 31.54 171296 6977392 22193668 16.71 18839800 1541148 36
08:49:28 AM 45085628 51675216 20769684 31.54 171296 6977412 22226616 16.73 18839808 1541152 436
08:49:29 AM 45085932 51675516 20769380 31.54 171300 6977412 22226616 16.73 18839816 1541152 444
08:49:30 AM 45085932 51675516 20769380 31.54 171300 6977412 22226616 16.73 18839816 1541152 444
08:49:31 AM 45085932 51675516 20769380 31.54 171300 6977412 22226616 16.73 18840140 1541152 444
08:49:32 AM 45085932 51675520 20769380 31.54 171308 6977408 22226616 16.73 18840140 1541156 444
Average: 45083688 51673270 20771624 31.54 171292 6977410 22210142 16.72 18842020 1541152 252
Disk utilization
$ sar -p -d 1 3
Linux 4.15.0-48-generic (einext02) 05/29/2020 _x86_64_ (8 CPU)
08:52:01 AM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
08:52:02 AM loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:02 AM sda 3.00 0.00 72.00 24.00 0.04 14.67 14.67 4.40
08:52:02 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:02 AM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
08:52:03 AM loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:03 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:03 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:03 AM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
08:52:04 AM loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:04 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:52:04 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
Average: loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sda 1.00 0.00 24.00 24.00 0.01 14.67 14.67 1.47
Average: sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Test Disk Speed
sudo dd if=/dev/zero of=/tmp/test.img bs=1M count=1K oflag=dsync
sudo hdparm -Tt /dev/nvme0n1
Upgrade gcc to 4.9
sudo yum install centos-release-scl
sudo yum install devtoolset-3-gcc devtoolset-3-gcc-c++
scl enable devtoolset-3 bash
Verify gcc version
$ gcc --version
gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE
Services
Find exact name of available services
$ sudo ls -l /etc/init.d/<first few letters of name e.g. hadoop>*
Start/Stop/Restart/Find status of a service
$ sudo service <name of a service e.g. hadoop-hdfs-namenode> [start|stop|restart|status]
Start/Stop/Restart/Find status of a service that matches a pattern
$ for service in /etc/init.d/hadoop*; do sudo $service status;done
System Activity Information
$ sudo sar
Run a command every n seconds
$ watch -n 1 cat /proc/meminfo
Copy files using rsync
$ rsync -avhuz --progress --rsh="ssh -p2222" Downloads training@localhost:~
You can also set the SSH port through the RSYNC_RSH environment variable, which takes the place of the --rsh option.
$ export RSYNC_RSH='ssh -p2222'
If you want to delete the files in the target that have been deleted at source, add --delete argument.
Pipe Commands
$ aws logs describe-log-groups --output text | cut -f 4 | while read -r line; do aws logs delete-log-group --log-group-name "$line"; done
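Since the pipeline above deletes things, it can help to rehearse the same `cut | while read` pattern as a dry run first. The sketch below uses a stand-in function with made-up tab-separated records instead of the real AWS call, and only prints the command it would run; `list_records` and its sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Stand-in for a command that prints tab-separated records (hypothetical data).
list_records() {
    printf 'GROUP\tid-1\t1590000000\t/app/web\n'
    printf 'GROUP\tid-2\t1590000001\t/app/worker\n'
}

# Dry run: take the 4th tab-separated field of each record and print
# the command that would be executed for it, without executing anything.
list_records | cut -f 4 | while IFS= read -r name; do
    echo "would run: delete-log-group --log-group-name \"$name\""
done
```

Once the printed commands look right, the `echo` can be swapped for the real command. Quoting `"$name"` keeps names with spaces or glob characters intact.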
Copy only .pdf files from one directory
$ rsync -avm --include='*.pdf' --include='*/' --exclude='*' --prune-empty-dirs /source /destination
Synchronize two directories every second
$ while sleep 1; do rsync -avuz /Users/user01/workspace/scala/heavy --exclude "target" user01@server01:/home/user01/workspace/scala; done
Clush
Clush is an open source tool that allows you to execute commands in parallel across the nodes in the cluster
$ sudo -i # Login as root
$ yum install clustershell -y
$ vi /etc/clustershell/groups
Add the cluster nodes
all: server[01-04]
Run a command on all nodes in the cluster
$ clush -a date
Copy the file /etc/hadoop/conf/core-site.xml to all cluster nodes.
$ clush -a -c /etc/hadoop/conf/core-site.xml
Verify:
$ clush -ab ls -l /etc/hadoop/conf/
Find external IP
$ dig +short myip.opendns.com @resolver1.opendns.com
or
$ curl -4 icanhazip.com
Text Editing
Compare two files
$ vim -d file1 file2
Find and replace text (inside vim; the c flag asks for confirmation on each match)
:%s/find_string/replace_with_string/gc
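Outside an editor, the same substitution can be scripted with sed. This is a minimal sketch on a temporary file, assuming a POSIX sed; the file contents and the word `replacement` are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

file="$(mktemp)"   # hypothetical file to edit
printf 'find_string one\nkeep this\nfind_string two find_string\n' > "$file"

# s/old/new/g replaces every match on every line. Writing to a new file
# and then moving it avoids the -i flag, whose syntax differs between
# GNU and BSD sed.
sed 's/find_string/replacement/g' "$file" > "$file.new"
mv "$file.new" "$file"

cat "$file"
```

Unlike the vim command with the c flag, sed does not ask for confirmation, so it is worth previewing the output before overwriting the original.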
Working with directories and files
View directory listing
$ ls
Change directory
$ cd <dir name, absolute path or relative path>
Know current directory
$ pwd
Create directory
$ mkdir -p <directory you want to create... with -p creates directory layers>
Move or Rename file/dir
$ mv <source path> <destination path>
Delete a file/dir
$ rm -rf <file/dir path>
Search files by name
$ find ./ -name "*.py"
Search content of files recursively
$ grep -rn "lookup-by-id" src/app/services/*
See file/folder sizes in descending order
$ sudo du -chd 1 | sort -hr
Download multiple files using urls in a file
Create a file links.txt and put links in the file - one line per link. Then run the wget command.
$ wget -i links.txt
Download all files mentioned in links on a page
$ wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/
Find ports used in the current system
$ sudo lsof -i -P | grep -i "listen"
$ netstat -an -ptcp | grep LISTEN # mac
$ sudo netstat -pant | grep LISTEN #ubuntu
View binary data
hexdump -C data/version-2/log.1 | head -n 10
00000000 5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 |ZKLG............|
00000010 00 00 00 00 fd e3 0b f9 00 00 00 30 01 00 02 d9 |...........0....|
00000020 f1 8a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000030 00 00 01 72 e7 5a 53 bc ff ff ff f6 00 00 13 88 |...r.ZS.........|
00000040 00 00 00 02 00 00 00 00 51 c6 d6 60 42 00 00 00 |........Q..`B...|
00000050 00 36 10 18 99 00 00 00 80 01 00 02 d9 f1 8a 00 |.6..............|
00000060 00 00 00 00 02 00 00 00 00 00 00 00 02 00 00 01 |................|
00000070 72 e7 5a 53 de 00 00 00 01 00 00 00 08 2f 65 78 |r.ZS........./ex|
00000080 61 6d 70 6c 65 00 00 00 24 33 30 33 32 65 39 32 |ample...$3032e92|
00000090 39 2d 63 32 32 66 2d 34 61 33 62 2d 62 61 66 33 |9-c22f-4a3b-baf3|
$ echo "hello world" | hexdump -C
00000000 68 65 6c 6c 6f 20 77 6f 72 6c 64 0a |hello world.|
0000000c
How to Disable ipv6
in /etc/sysctl.conf : net.ipv6.conf.all.disable_ipv6 = 1
in /etc/sysconfig/network : NETWORKING_IPV6=no
in /etc/sysconfig/network-scripts/ifcfg-eth0 : IPV6INIT="no"
disable ip6tables: chkconfig --level 345 ip6tables off
reboot
Test network speed between 2 machines
Machine 1: Start Netcat to Listen
$ nc -lk 2112 >/dev/null
Machine 2:
$ dd if=/dev/zero bs=16000 count=625 | nc -v <machine 1 ip/name> 2112
Use iperf tool
$ sudo apt install iperf -y
Machine 1:
$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 MByte (default)
------------------------------------------------------------
[ 4] local 172.21.2.13 port 5001 connected with 172.21.2.14 port 64776
------------------------------------------------------------
Client connecting to 172.21.2.14, TCP port 5001
TCP window size: 16.0 MByte (default)
------------------------------------------------------------
[ 6] local 172.21.2.13 port 65452 connected with 172.21.2.14 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 1.08 GBytes 929 Mbits/sec
[ 4] 0.0-10.8 sec 512 MBytes 397 Mbits/sec
Machine 2:
$ iperf -c 172.21.2.13 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: -1.00 Byte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.21.2.13, TCP port 5001
TCP window size: 16.0 MByte (default)
------------------------------------------------------------
[ 5] local 172.21.2.14 port 64776 connected with 172.21.2.13 port 5001
[ 4] local 172.21.2.14 port 5001 connected with 172.21.2.13 port 65452
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 512 MBytes 427 Mbits/sec
[ 4] 0.0-10.1 sec 1.08 GBytes 917 Mbits/sec
Complete guide: http://openmaniak.com/iperf.php
Test Internet speed using speedtest-cli
$ wget https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
$ python3 speedtest.py
Retrieving speedtest.net configuration...
Testing from ACT (106.51.31.233)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by E-Infrastructure & Entertainment India Pvt. Ltd (Bangalore) [4.55 km]: 12.169 ms
Testing download speed................................................................................
Download: 30.84 Mbit/s
Testing upload speed....................................................................................................
Upload: 31.62 Mbit/s
Test Disk-IO using dd (sequential read/write)
$ dd if=/dev/zero of=/tmp/testfile bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.14984 s, 175 MB/s
Random Read/Write test using fio tool
$ sudo yum install fio -y # or: sudo apt install fio -y
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
Reference:
https://www.binarylane.com.au/support/solutions/articles/1000055889-how-to-benchmark-disk-i-o
Source a file as stream
Randomly sample lines from a text file and write them to a new file. The sample size is a random number from 0 to 9, so the output may occasionally be empty.
$ shuf -n $(($RANDOM % 10)) ~/tweets.small.json > /user/mapr/tweets_raw/$(date +%s).json
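Because `$RANDOM % 10` can evaluate to 0, the command above sometimes writes an empty file. If that is unwanted, shifting the range by one is a simple fix. The sketch below assumes bash and GNU `shuf`, and uses temporary files as stand-ins for the real input and output paths.

```shell
#!/usr/bin/env bash
set -euo pipefail

src="$(mktemp)"     # stand-in for the input file
out="$(mktemp)"     # stand-in for the output file
seq 1 100 > "$src"

# RANDOM % 10 gives 0..9; adding 1 shifts the range to 1..10,
# so the output file is never empty.
n=$(( RANDOM % 10 + 1 ))
shuf -n "$n" "$src" > "$out"

echo "wrote $(wc -l < "$out") of $(wc -l < "$src") lines"
```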
Looping in Shell
$ for i in `seq 1 10`; do echo $i; done
Looping with delay of 1 sec
$ for i in `seq 1 10`; do echo $i; sleep 1 ; done
Print the lines of a file one at a time with a delay of one second
$ for i in `seq 1 10`; do head -n $i ~/tweets.small.json | tail -n 1; sleep 1; done
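The `head | tail` loop above re-reads the file from the start on every iteration. A `while read` loop reads each line once, which scales better for large files. A minimal sketch, assuming bash; the function name `stream_lines` and the sample file are illustrative.

```shell
#!/usr/bin/env bash
# Print the first COUNT lines of FILE, pausing DELAY seconds after each one.
stream_lines() {
    local file="$1" count="$2" delay="${3:-1}"
    local i=0 line
    while [ "$i" -lt "$count" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
        sleep "$delay"
    done < "$file"
}

sample="$(mktemp)"            # hypothetical input file
printf 'first\nsecond\nthird\n' > "$sample"
stream_lines "$sample" 2 0    # a delay of 0 keeps the demo instant
```

With the file from the example above, the equivalent call would be `stream_lines ~/tweets.small.json 10 1`.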
SSH keygen (quiet mode)
ssh-keygen -q -t rsa -N '' -f ./id_rsa <<<y >/dev/null 2>&1
Using grep and awk
$ head /data/ml-latest-small/movies.csv
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
Find lines that contain "comedy"
$ cat /data/ml-latest-small/movies.csv | grep -i "comedy" | head
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
7,Sabrina (1995),Comedy|Romance
11,"American President, The (1995)",Comedy|Drama|Romance
12,Dracula: Dead and Loving It (1995),Comedy|Horror
18,Four Rooms (1995),Comedy
19,Ace Ventura: When Nature Calls (1995),Comedy
20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller
Show the movie title in the above output.
$ cat /data/ml-latest-small/movies.csv | grep -i "comedy" | awk -F "," '{print $2}' | head
Toy Story (1995)
Grumpier Old Men (1995)
Waiting to Exhale (1995)
Father of the Bride Part II (1995)
Sabrina (1995)
"American President
Dracula: Dead and Loving It (1995)
Four Rooms (1995)
Ace Ventura: When Nature Calls (1995)
Money Train (1995)
It looks like some titles have been clipped, possibly because the title itself contains a comma and the field was split at that comma. Let's check: does every line contain exactly 3 fields (id, title and genres)?
$ cat /data/ml-latest-small/movies.csv | grep -i "comedy" | awk -F "," 'NF != 3 {print}' | head
11,"American President, The (1995)",Comedy|Drama|Romance
54,"Big Green, The (1995)",Children|Comedy
58,"Postman, The (Postino, Il) (1994)",Comedy|Drama|Romance
119,"Steal Big, Steal Little (1995)",Comedy
141,"Birdcage, The (1996)",Comedy
144,"Brothers McMullen, The (1995)",Comedy
166,"Doom Generation, The (1995)",Comedy|Crime|Drama
203,"To Wong Foo, Thanks for Everything! Julie Newmar (1995)",Comedy
239,"Goofy Movie, A (1995)",Animation|Children|Comedy|Romance
255,"Jerky Boys, The (1995)",Comedy
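One way to recover the full title with plain awk is to rejoin every field between the first and the last. This is a sketch under the assumption that the id and genres columns never contain a comma, which holds for the lines shown here. It does not parse CSV quoting, so the surrounding double quotes are kept. The function name `titles` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the title column of an "id,title,genres" CSV where the title
# may contain commas: rejoin every field between the first and the last.
titles() {
    awk -F ',' '{
        title = $2
        for (i = 3; i < NF; i++) title = title "," $i
        print title
    }' "$@"
}

printf '%s\n' \
    '7,Sabrina (1995),Comedy|Romance' \
    '11,"American President, The (1995)",Comedy|Drama|Romance' \
    '58,"Postman, The (Postino, Il) (1994)",Comedy|Drama|Romance' | titles
```

In the pipeline above, `titles` would take the place of the `awk -F "," '{print $2}'` stage.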