Useful Linux Commands

User and Group management


Add a user (abasar) with a home directory and sudo permission.

sudo useradd -m -d /home/abasar abasar

sudo usermod -aG wheel abasar


Additional commands to copy the existing public key to the new user. This is especially helpful when you want to switch from the default user to your preferred username.

sudo mkdir /home/abasar/.ssh

sudo cp ~/.ssh/authorized_keys /home/abasar/.ssh/

sudo chown abasar:abasar -R /home/abasar/.ssh/

sudo chmod 700 -R /home/abasar/.ssh/

sudo chmod 600 /home/abasar/.ssh/authorized_keys

sudo ls -la /home/abasar/.ssh
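
With its default StrictModes setting, sshd generally ignores an authorized_keys file whose directory or file permissions are too open, so it can be worth checking the modes after the commands above. The following is a minimal sketch, assuming GNU stat (the -c option). check_ssh_perms is a hypothetical helper name, and the demo runs on a scratch directory rather than a real home directory.

```shell
# Hypothetical helper: report whether an .ssh directory has the expected modes.
# Assumes GNU coreutils stat (the -c option); BSD/macOS stat uses -f instead.
check_ssh_perms() {
    dir=$1
    dir_mode=$(stat -c '%a' "$dir")
    key_mode=$(stat -c '%a' "$dir/authorized_keys")
    if [ "$dir_mode" = "700" ] && [ "$key_mode" = "600" ]; then
        echo "ok"
    else
        echo "unexpected modes: dir=$dir_mode authorized_keys=$key_mode"
    fi
}

# Demo on a scratch directory so nothing under /home is touched.
demo=$(mktemp -d)
mkdir "$demo/.ssh"
touch "$demo/.ssh/authorized_keys"
chmod 700 "$demo/.ssh"
chmod 600 "$demo/.ssh/authorized_keys"
check_ssh_perms "$demo/.ssh"
rm -rf "$demo"
```

Against the real directory the call would be something like: sudo sh -c '...' or running the two stat commands by hand with sudo, since the directory belongs to another user.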

Add public ssh key to the new user so that you can use the default ssh key for login.

ssh -i ~/.ssh/key.pem abasar@einext03

cat ~/.ssh/id_rsa.pub | ssh -i ~/.ssh/key.pem abasar@einext03 'cat >> .ssh/authorized_keys && echo "Key copied"'

ssh abasar@einext03


Add an existing user to a group (-a appends; without it, -G replaces the user's other supplementary groups)

$ sudo usermod -aG <groupname> <username>

Check members of a group

$ grep -i <group name> /etc/group

Check the groups a user belongs to

$ groups <username>
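
A grep on the group file matches any line that contains the text, not just the group with that name. The sketch below is one way to print only the member list of an exactly named group. group_members is a hypothetical helper, and the sample file is made up for the demo. Note that the group file lists supplementary members only, so a user whose primary group it is may not appear.

```shell
# Hypothetical helper: print the member list of a group, one user per line.
# The optional second argument is a group file; it defaults to /etc/group.
group_members() {
    group=$1
    file=${2:-/etc/group}
    # Fields are name:password:GID:comma-separated-members.
    awk -F: -v g="$group" '$1 == g { print $4 }' "$file" | tr ',' '\n' | sed '/^$/d'
}

# Demo against a sample file in the same format as /etc/group.
sample=$(mktemp)
printf 'wheel:x:10:abasar,training\nusers:x:100:\n' > "$sample"
group_members wheel "$sample"
rm -f "$sample"
```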

Package management

Base repo http://fedoraproject.org/wiki/EPEL

Update existing packages

$ sudo yum update

Search yum repos

$ sudo yum search <package name>

Install

$ sudo yum install <package name> -y

Yum download only

$ sudo yum install yum-downloadonly

$ yum install --downloadonly --downloaddir=<directory> <package>

Install from local rpm

$ sudo yum localinstall <package name> -y

Remove

$ sudo yum remove <package name>

View installed packages

$ sudo yum list installed

View configured repos list

$ sudo yum repolist -v

$ sudo ls -l /etc/yum.repos.d/

List the files installed by a package

$ sudo rpm -ql package-name

Add a new repo

- Download .repo file in /etc/yum.repos.d/

- Optionally add the GPG key used to verify the downloaded packages, using the rpm --import command.

Example: http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_cdh5_install.html#topic_4_4_2

Tar and untar

Tar a directory and compress the tar file with gzip

$ tar -zcf anaconda3.tar.gz anaconda3

Untar

$ tar xf anaconda3.tar.gz
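
Two tar options that often go with the commands above are -t, which lists an archive without extracting it, and -C, which changes directory before archiving or extracting. A small sketch on a scratch directory follows; the directory name "project" and its contents are made up for illustration.

```shell
# Work in a scratch directory; "project" is a made-up directory name.
work=$(mktemp -d)
mkdir -p "$work/project/src"
echo "hello" > "$work/project/src/a.txt"

# Create the compressed archive; -C changes directory first so paths stay relative.
tar -czf "$work/project.tar.gz" -C "$work" project

# List the archive contents without extracting.
tar -tzf "$work/project.tar.gz"

# Extract into a different directory with -C.
mkdir "$work/restore"
tar -xzf "$work/project.tar.gz" -C "$work/restore"
cat "$work/restore/project/src/a.txt"

# The scratch directory can be removed afterwards with: rm -rf "$work"
```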

Explore Processes

View running JVM processes

$ sudo jps -lv

View Java thread dump

$ jstack <pid>

Find whether a given process is running

$ sudo ps -ef | grep -i <process name e.g. java>

Tools to automatically restart process daemons: daemontools, supervisor


Resources Used by a process


Show CPU and memory every 1 second

while true; do ps -p $PID -o %cpu,%mem ; sleep 1; done

Process details

ps -f -p $PID

CPU/memory usage

pidstat -p $PID 3

Disk read/write

sudo iotop -p $PID

Find ports used

lsof -i | grep $PID

Files accessed

sudo strace -f -t -e trace=file -p $PID

OS limits

sudo cat /proc/$PID/limits

Process status

sudo cat /proc/$PID/status

Command line argument

sudo cat /proc/$PID/cmdline
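
The arguments in /proc/<pid>/cmdline are separated by NUL bytes, so plain cat prints them run together. A minimal sketch that turns the separators into spaces is shown below. show_cmdline is a hypothetical helper, and the demo uses a sample file laid out the same way so it does not depend on a particular process.

```shell
# Hypothetical helper: print a NUL-separated argument file with spaces instead.
show_cmdline() {
    tr '\0' ' ' < "$1" | sed 's/ $//'
    echo
}

# Demo on a sample file laid out like /proc/<pid>/cmdline.
sample=$(mktemp)
printf 'java\0-Xmx1g\0-jar\0app.jar\0' > "$sample"
show_cmdline "$sample"
rm -f "$sample"

# On Linux the same helper can read a live process, e.g. the current shell:
# show_cmdline /proc/$$/cmdline
```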


Network usage by network interface (system-wide totals, no per-process breakdown)

dstat -dnyc -N <interface> -C total -f 5

Network usage by process (output shows pid)

sudo nethogs



Resource Utilization

View disk io for each process

$ sudo yum install iotop -y

$ sudo iotop -o

Disk io at the disk level

$ sudo yum install sysstat -y

$ sudo iostat -dx 5

# Options #

d : Display the Disk utilization

x : Display extended statistics

5 : Interval in seconds

View mem and CPU utilization

$ top

View memory information

$ cat /proc/meminfo

View network utilization

$ sudo yum install iftop -y

$ sudo iftop -n

There are several other commands. See the list below.

iftop -- shows current open connections and their transfer rates

vmstat -- shows virtual memory status (found in procps package)

iostat -- shows current IO transfer rate by devices (found in sysstat package)

dstat -- combines vmstat, iostat and ifstat-like info for network IO

iotop -- like top, but the focus is on IO transfer rate

lsof -- get info from currently open files

fuser -- does a subset of lsof: identify processes and users using certain files


Find file access by a process id

$ sudo strace -f -t -e trace=file -p <pid>




Disk and memory utilization using sar command.

Memory Utilization


$ sar -p -r 1 10

Linux 4.15.0-48-generic (einext02) 05/29/2020 _x86_64_ (8 CPU)


08:49:22 AM kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty

08:49:23 AM 45064688 51654268 20790624 31.57 171280 6977412 22193668 16.71 18861288 1541152 168

08:49:24 AM 45085708 51675288 20769604 31.54 171280 6977412 22193668 16.71 18839796 1541152 36

08:49:25 AM 45085708 51675288 20769604 31.54 171280 6977412 22193668 16.71 18839796 1541152 36

08:49:26 AM 45085708 51675288 20769604 31.54 171280 6977412 22193668 16.71 18839796 1541152 36

08:49:27 AM 45085708 51675284 20769604 31.54 171296 6977392 22193668 16.71 18839800 1541148 36

08:49:28 AM 45085628 51675216 20769684 31.54 171296 6977412 22226616 16.73 18839808 1541152 436

08:49:29 AM 45085932 51675516 20769380 31.54 171300 6977412 22226616 16.73 18839816 1541152 444

08:49:30 AM 45085932 51675516 20769380 31.54 171300 6977412 22226616 16.73 18839816 1541152 444

08:49:31 AM 45085932 51675516 20769380 31.54 171300 6977412 22226616 16.73 18840140 1541152 444

08:49:32 AM 45085932 51675520 20769380 31.54 171308 6977408 22226616 16.73 18840140 1541156 444

Average: 45083688 51673270 20771624 31.54 171292 6977410 22210142 16.72 18842020 1541152 252


Disk utilization

$ sar -p -d 1 3

Linux 4.15.0-48-generic (einext02) 05/29/2020 _x86_64_ (8 CPU)


08:52:01 AM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util

08:52:02 AM loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

08:52:02 AM sda 3.00 0.00 72.00 24.00 0.04 14.67 14.67 4.40

08:52:02 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


08:52:02 AM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util

08:52:03 AM loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

08:52:03 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

08:52:03 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


08:52:03 AM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util

08:52:04 AM loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

08:52:04 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

08:52:04 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Average: DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util

Average: loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Average: sda 1.00 0.00 24.00 24.00 0.01 14.67 14.67 1.47

Average: sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00





Upgrade gcc to 4.9

sudo yum install centos-release-scl

sudo yum install devtoolset-3-gcc devtoolset-3-gcc-c++

scl enable devtoolset-3 bash

Verify gcc version

$ gcc --version

gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)

Copyright (C) 2014 Free Software Foundation, Inc.

This is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

Services

Find exact name of available services

$ sudo ls -l /etc/init.d/<first few letters of name e.g. hadoop>*

Start/Stop/Restart/Find status of a service

$ sudo service <name of a service e.g. hadoop-hdfs-namenode> [start|stop|restart|status]

Start/Stop/Restart/Find status of a service that matches a pattern

$ for service in /etc/init.d/hadoop*; do sudo "$service" status; done

System Activity Information

$ sudo sar

Run a command every n seconds

$ watch -n 1 cat /proc/meminfo

Copy files using rsync

$ rsync -avhuz --progress --rsh="ssh -p2222" Downloads training@localhost:~

You can also set the remote shell, including the SSH port, through an environment variable.

$ export RSYNC_RSH='ssh -p2222'


To delete files in the target that have been deleted at the source, add the --delete argument.


Pipe Commands

$ aws logs describe-log-groups --output text | cut -f 4 | while read -r line; do aws logs delete-log-group --log-group-name $line; done
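
Because the loop above deletes whatever the fourth field happens to contain, it can help to rehearse the same pipeline with echo first. The sketch below uses made-up tab-separated records in place of the real command output. list_records and its field layout are assumptions for illustration, not the actual format of any CLI.

```shell
# Made-up tab-separated records standing in for the output of a listing command.
list_records() {
    printf 'GROUP\tarn-a\t100\t/app/one\n'
    printf 'GROUP\tarn-b\t200\t/app/two\n'
}

# cut -f 4 keeps the fourth tab-separated field; read -r takes one line at a time.
list_records | cut -f 4 | while read -r name; do
    # echo stands in for the destructive command so the loop can be reviewed first.
    echo "would delete: $name"
done
```

Quoting "$name" in the real command keeps a value with spaces or glob characters as a single argument.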


Copy only .pdf files from one directory

$ rsync -avm --include='*.pdf' --include='*/' --exclude='*' --prune-empty-dirs /source /destination

Synchronize two directories every second

$ while sleep 1; do rsync -avuz /Users/user01/workspace/scala/heavy --exclude "target" user01@server01:/home/user01/workspace/scala; done

Clush

Clush is an open source tool that allows you to execute commands in parallel across the nodes in the cluster

$ sudo -i # Login as root

$ yum install clustershell -y

$ vi /etc/clustershell/groups

Add the cluster nodes

all: server[01-04]

Run commands to all nodes in the cluster

$ clush -a date

Copy the file /etc/hadoop/conf/core-site.xml to all cluster nodes.

$ clush -a -c /etc/hadoop/conf/core-site.xml

Verify :

$ clush -ab ls -l /etc/hadoop/conf/

Find external IP

$ dig +short myip.opendns.com @resolver1.opendns.com

or

$ curl -4 icanhazip.com

Text Editing

Compare two files

$ vim -d file1 file2

Find and replace text

:%s/find_string/replace_with string/gc
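
In the vim command above, g replaces every match on a line and c asks for confirmation before each change. For a non-interactive replacement outside the editor, sed is a common choice. The sketch below is one possible form. replace_all is a hypothetical helper, and it writes to standard output rather than editing the file in place.

```shell
# Hypothetical helper: print file $3 with every match of pattern $1 replaced by $2.
# $1 is a sed regular expression; a "/" in either string would need escaping.
replace_all() {
    sed "s/$1/$2/g" "$3"
}

# Demo on a scratch file.
sample=$(mktemp)
printf 'find_string here\nand find_string there\n' > "$sample"
replace_all 'find_string' 'replacement' "$sample"
rm -f "$sample"
```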

Working with directories and files

View directory listing

$ ls

Change directory

$ cd <dir name, absolute path or relative path>

Know current directory

$ pwd

Create directory

$ mkdir -p <directory you want to create... with -p creates directory layers>

Move or Rename file/dir

$ mv <source path> <destination path>

Delete a file/dir

$ rm -rf <file/dir path>


Search files by name

$ find ./ -name "*.py"

Search content of files recursively

$ grep -rn "lookup-by-id" src/app/services/*
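
The two searches above can be combined to look for a string only in files whose names match a pattern. This is a minimal sketch on a scratch tree; py_files_containing is a hypothetical helper, and the file names and contents are made up.

```shell
# Build a small scratch tree; file names and contents are made up.
tree=$(mktemp -d)
mkdir -p "$tree/pkg"
printf 'import os\nprint("lookup-by-id")\n' > "$tree/pkg/a.py"
printf 'lookup-by-id in a text file\n' > "$tree/notes.txt"

# Hypothetical helper: list the .py files under a directory that contain a string.
# -exec ... {} + passes many files to one grep; -l prints matching file names only;
# -F treats the search text as a fixed string rather than a regular expression.
py_files_containing() {
    find "$1" -name '*.py' -exec grep -lF -- "$2" {} +
}

py_files_containing "$tree" "lookup-by-id"

# The scratch tree can be removed afterwards with: rm -rf "$tree"
```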


Download multiple files using urls in a file

Create a file links.txt and put links in the file - one line per link. Then run the wget command.

$ wget -i links.txt


Download all files mentioned in links on a page

$ wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/


Find ports used in the current system

$ sudo lsof -i -P | grep -i "listen"

$ netstat -an -ptcp | grep LISTEN # mac

$ sudo netstat -pant | grep LISTEN #ubuntu



View binary data

hexdump -C data/version-2/log.1 | head -n 10

00000000 5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 |ZKLG............|

00000010 00 00 00 00 fd e3 0b f9 00 00 00 30 01 00 02 d9 |...........0....|

00000020 f1 8a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|

00000030 00 00 01 72 e7 5a 53 bc ff ff ff f6 00 00 13 88 |...r.ZS.........|

00000040 00 00 00 02 00 00 00 00 51 c6 d6 60 42 00 00 00 |........Q..`B...|

00000050 00 36 10 18 99 00 00 00 80 01 00 02 d9 f1 8a 00 |.6..............|

00000060 00 00 00 00 02 00 00 00 00 00 00 00 02 00 00 01 |................|

00000070 72 e7 5a 53 de 00 00 00 01 00 00 00 08 2f 65 78 |r.ZS........./ex|

00000080 61 6d 70 6c 65 00 00 00 24 33 30 33 32 65 39 32 |ample...$3032e92|

00000090 39 2d 63 32 32 66 2d 34 61 33 62 2d 62 61 66 33 |9-c22f-4a3b-baf3|


$ echo "hello world" | hexdump -C

00000000 68 65 6c 6c 6f 20 77 6f 72 6c 64 0a |hello world.|

0000000c



How to Disable ipv6

in /etc/sysctl.conf : net.ipv6.conf.all.disable_ipv6 = 1

in /etc/sysconfig/network : NETWORKING_IPV6=no

in /etc/sysconfig/network-scripts/ifcfg-eth0 : IPV6INIT="no"

disable ip6tables : chkconfig --level 345 ip6tables off

reboot

Test network speed between 2 machines

Machine 1: Start Netcat to Listen

$ nc -lk 2112 >/dev/null

Machine 2:

$ dd if=/dev/zero bs=16000 count=625 | nc -v <machine 1 ip/name> 2112

Use iperf tool

$ sudo yum install iperf -y

Machine 1:

$ iperf -s

Machine 2:

$ iperf -c <machine 1 address> -d

Complete guide: http://openmaniak.com/iperf.php

Test Internet speed using speedtest-cli

$ wget https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py

$ python3 speedtest.py

Retrieving speedtest.net configuration...

Testing from ACT (106.51.31.233)...

Retrieving speedtest.net server list...

Selecting best server based on ping...

Hosted by E-Infrastructure & Entertainment India Pvt. Ltd (Bangalore) [4.55 km]: 12.169 ms

Testing download speed................................................................................

Download: 30.84 Mbit/s

Testing upload speed....................................................................................................

Upload: 31.62 Mbit/s

Test Disk-IO using dd (sequential write)

$ dd if=/dev/zero of=/tmp/testfile bs=1G count=1 oflag=direct

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB) copied, 6.14984 s, 175 MB/s

Random Read/Write test using fio tool

$ sudo yum install fio -y

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Reference:

  • https://www.binarylane.com.au/support/solutions/articles/1000055889-how-to-benchmark-disk-i-o

Source a file as stream

Randomly sample lines from a text file and write them to a new file. The sample size is itself random: $RANDOM % 10 yields a number from 0 to 9.

$ shuf -n $(($RANDOM % 10)) ~/tweets.small.json > /user/mapr/tweets_raw/$(date +%s).json
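
Since $RANDOM % 10 can evaluate to 0, the command above occasionally writes an empty file. If at least one line is wanted each time, adding 1 shifts the range to 1 through 10. The sketch below assumes bash (for RANDOM) and GNU shuf; random_subset is a hypothetical helper, and a generated 100-line file stands in for the tweets file.

```shell
# Hypothetical helper: write between 1 and 10 random lines of a file to stdout.
# RANDOM is a bash variable; RANDOM % 10 is 0..9, so adding 1 gives 1..10.
random_subset() {
    shuf -n "$((RANDOM % 10 + 1))" "$1"
}

# Demo with a generated 100-line file in place of the tweets file.
input=$(mktemp)
seq 1 100 > "$input"
random_subset "$input" | wc -l
rm -f "$input"
```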

Looping in Shell

$ for i in `seq 1 10`; do echo $i; done


Looping with delay of 1 sec

$ for i in `seq 1 10`; do echo $i; sleep 1 ; done


Print a file line by line with a delay of one second between lines

$ for i in `seq 1 10`; do head -n $i ~/tweets.small.json | tail -n 1; sleep 1; done
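
The head and tail combination above rereads the file from the top for every line. For longer files, a while read loop that makes a single pass is a common alternative. This is a sketch, with drip_file as a hypothetical helper name; it prints the whole file, so pipe the input through head -n 10 first to limit it to ten lines.

```shell
# Hypothetical helper: print a file one line at a time, pausing between lines.
# $1 is the file, $2 the delay in seconds (defaults to 1).
drip_file() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
        sleep "${2:-1}"
    done < "$1"
}

# Demo with a three-line sample and no delay.
sample=$(mktemp)
printf 'first\nsecond\nthird\n' > "$sample"
drip_file "$sample" 0
rm -f "$sample"
```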


Using grep and awk

$ head /data/ml-latest-small/movies.csv

movieId,title,genres

1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy

2,Jumanji (1995),Adventure|Children|Fantasy

3,Grumpier Old Men (1995),Comedy|Romance

4,Waiting to Exhale (1995),Comedy|Drama|Romance

5,Father of the Bride Part II (1995),Comedy

6,Heat (1995),Action|Crime|Thriller

7,Sabrina (1995),Comedy|Romance

8,Tom and Huck (1995),Adventure|Children

9,Sudden Death (1995),Action

Find lines that contain "comedy"

$ cat /data/ml-latest-small/movies.csv | grep -i "comedy" | head

1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy

3,Grumpier Old Men (1995),Comedy|Romance

4,Waiting to Exhale (1995),Comedy|Drama|Romance

5,Father of the Bride Part II (1995),Comedy

7,Sabrina (1995),Comedy|Romance

11,"American President, The (1995)",Comedy|Drama|Romance

12,Dracula: Dead and Loving It (1995),Comedy|Horror

18,Four Rooms (1995),Comedy

19,Ace Ventura: When Nature Calls (1995),Comedy

20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller

Show the movie title in the above output.

$ cat /data/ml-latest-small/movies.csv | grep -i "comedy" | awk -F "," '{print $2}' | head

Toy Story (1995)

Grumpier Old Men (1995)

Waiting to Exhale (1995)

Father of the Bride Part II (1995)

Sabrina (1995)

"American President

Dracula: Dead and Loving It (1995)

Four Rooms (1995)

Ace Ventura: When Nature Calls (1995)

Money Train (1995)

It looks like some titles have been clipped, possibly because the title itself contains a comma and the field was split at that comma. Let's check: does each line contain exactly 3 fields (id, title and genres)?

$ cat /data/ml-latest-small/movies.csv | grep -i "comedy" | awk -F "," 'NF != 3 {print}' | head

11,"American President, The (1995)",Comedy|Drama|Romance

54,"Big Green, The (1995)",Children|Comedy

58,"Postman, The (Postino, Il) (1994)",Comedy|Drama|Romance

119,"Steal Big, Steal Little (1995)",Comedy

141,"Birdcage, The (1996)",Comedy

144,"Brothers McMullen, The (1995)",Comedy

166,"Doom Generation, The (1995)",Comedy|Crime|Drama

203,"To Wong Foo, Thanks for Everything! Julie Newmar (1995)",Comedy

239,"Goofy Movie, A (1995)",Animation|Children|Comedy|Romance

255,"Jerky Boys, The (1995)",Comedy
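
One way to recover the full title with plain awk is to treat everything between the first and the last field as the title. The sketch below assumes the id and genres fields never contain commas, which holds for the lines shown here but is not verified for the whole file. print_titles is a hypothetical helper, and it does not handle quotes escaped inside a title.

```shell
# Hypothetical helper: print the title column of movies.csv-style lines on stdin.
# Assumes the first field (id) and the last field (genres) never contain commas.
print_titles() {
    awk -F',' '{
        title = $2
        # Rejoin the pieces of a title that was split on its own commas.
        for (i = 3; i < NF; i++) title = title "," $i
        # Drop the surrounding double quotes, if any.
        gsub(/^"|"$/, "", title)
        print title
    }'
}

# Demo on three lines taken from the listings above.
printf '%s\n' \
    '7,Sabrina (1995),Comedy|Romance' \
    '11,"American President, The (1995)",Comedy|Drama|Romance' \
    '58,"Postman, The (Postino, Il) (1994)",Comedy|Drama|Romance' | print_titles
```

On the real file the helper would sit in the same place as the earlier awk call, for example: grep -i "comedy" /data/ml-latest-small/movies.csv | print_titles | head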