Wednesday, July 28, 2010

Multithreading with nothing but Korn Shell

Multithreading usually brings C++ or Java to mind. But sometimes we need a "quick and dirty" system that will allow us to run multiple threads to do some work in the background - like database archiving, all sorts of cleanups, etc.
Below are a couple of shell scripts that let you do just that: create multiple threads (or, rather, processes), with a parent thread (process) watching the children and stopping the whole system when a child has a problem. Here is the code with comments.


---------- PARENT ---------------

This is the parent shell script, the main thread that controls all the children.


#!/usr/bin/ksh

HOME_DIR=/host/linux_dev
LOG_DIR=${HOME_DIR}/logs

export HOME_DIR
export LOG_DIR

#
# Clean up old files (-f: no error if the directory is already empty)
#
rm -f ${LOG_DIR}/*

echo PARENTPID=$$ > ${LOG_DIR}/allPids.txt

total_children=3
sleep_before_start=1
sleep_check_interval=2

echo Starting ${total_children} children
i=0
while [[ $i -lt ${total_children} ]]
do
  sleep $sleep_before_start
  child $i >> ${LOG_DIR}/child_log_$i.txt 2>&1 &
  i=$((i+1))
done

sleep $sleep_before_start

. ${LOG_DIR}/allPids.txt

echo Children started, checking on them
while [[ $i -lt 1000 ]]
do
  sleep $sleep_check_interval
  j=0
  while [[ $j -lt ${total_children} ]]
  do
    . ${LOG_DIR}/CHILD_${j}_STATE
    if [[ ${CHILDSTATE} == 'DONE' ]]
    then
      echo CHILD $j is DONE
    elif [[ ${CHILDSTATE} == 'SUCCESS' ]]
    then
      eval "CHILDINSTANCEVAR=CHILD_${j}_PID"
      eval "CHILDINSTANCEPID=\$$CHILDINSTANCEVAR"
      echo "Checking CHILDINSTANCEPID=${CHILDINSTANCEPID}"
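      # Ask ps about this PID only; grepping the full process list would also
      # match unrelated lines that happen to contain the same digits.
      CHILDINSTANCES=`ps -p ${CHILDINSTANCEPID} -o pid= | wc -l`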
      if [[ ${CHILDINSTANCES} -lt 1 ]]
      then
        echo "CHILD $j IS NOT DONE, BUT NOT RUNNING, Exiting"
        exit 1
      else
        echo "CHILD $j is running fine"
      fi
    else
      echo "CHILD $j needs attention, Exiting"
      exit 1
    fi
    j=$((j+1))
  done

  i=$((i+1))
done

------------------------------------ CHILD ----------------------------------

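This is the child shell script. Put the code below into a file called "child", make it executable, and keep it in a directory on your PATH so that the parent can launch it by name.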



#!/usr/bin/ksh

THREAD_NUM=$1

echo CHILD_${THREAD_NUM}_PID=$$ >> ${LOG_DIR}/allPids.txt
# actual work

echo Starting doing actual work

cycles=10
sleep_interval=5

i=0
while [[ $i -lt $cycles ]]
do
  #
  # checking if Parent is alive
  #
  . ${LOG_DIR}/allPids.txt
  # Ask ps about the parent's PID only, rather than grepping the full list
  PARENTINSTANCES=`ps -p ${PARENTPID} -o pid= | wc -l`
  if [[ ${PARENTINSTANCES} -lt 1 ]]
  then
    echo "PARENT IS NOT RUNNING, Exiting"
    exit 1
  fi
  #
  # Doing work
  #
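  echo "Working $(date): $i"
  # ">" truncates and rewrites the file, so no rm is needed; removing the file
  # first would leave a window in which the parent finds nothing to source.
  echo "CHILDSTATE=SUCCESS" > ${LOG_DIR}/CHILD_${THREAD_NUM}_STATE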
  sleep ${sleep_interval}
  i=$((i+1))
done

echo "CHILDSTATE=DONE" > ${LOG_DIR}/CHILD_${THREAD_NUM}_STATE
echo Thread $1 is DONE
-------------------------------------------------------------------------------
Here is how it all works

1. The parent thread starts the children in the background in a loop (the number of children is controlled by total_children=3) by executing the following command:


child $i >> ${LOG_DIR}/child_log_$i.txt 2>&1 &

Each instance of a child is aware of its id (it accepts one parameter, $i, supplied by the parent).
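
As a rough illustration of this launch step, here is a minimal sketch. It assumes a hypothetical worker function standing in for the "child" script and a temporary directory standing in for LOG_DIR. It also records $!, which the shell sets to the PID of the most recent background job, as one possible way for a parent to learn its children's PIDs.

```shell
# Sketch only: worker() is a hypothetical stand-in for the "child" script.
worker() {
  echo "worker $1 started"
}

demo_dir=$(mktemp -d)      # temporary stand-in for LOG_DIR
total_children=3
pids=""

i=0
while [ "$i" -lt "$total_children" ]
do
  # Each worker gets its own log file, like child_log_$i.txt in the parent.
  worker "$i" >> "$demo_dir/child_log_$i.txt" 2>&1 &
  pids="$pids $!"          # $! holds the PID of the job just started
  i=$((i+1))
done

wait                       # block until every background job has finished
started=$(echo $pids | wc -w | tr -d ' ')
echo "started $started workers"
```

The scripts in this post share PIDs through a file instead, which has the advantage that the children can read the parent's PID from the same place.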


2. Children communicate with the parent by means of files. At the start, the parent writes its PID into a file (allPids.txt), and every child appends its own PID to the same file. Once all children have started, the parent sources the file; each child sources it as well, on every pass through its work loop. The line of code in the parent where the PID is written is


echo PARENTPID=$$ > ${LOG_DIR}/allPids.txt


Children add their PIDs to the communication file by running the command below


echo CHILD_${THREAD_NUM}_PID=$$ >> ${LOG_DIR}/allPids.txt


The resulting file looks as follows

PARENTPID=1234
CHILD_0_PID=1235
CHILD_1_PID=1236
CHILD_2_PID=1237
where 1234, etc. are the PIDs of each "thread"
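
The parent script sources this file and uses eval to reach a child's PID. As an alternative, here is a minimal sketch that looks a value up by name without executing the file's contents. The lookup_pid function and the temporary pid_file are hypothetical names used only for this illustration; the PIDs are the sample values shown above.

```shell
# Sketch only: pid_file is a temporary stand-in for ${LOG_DIR}/allPids.txt.
pid_file=$(mktemp)
cat > "$pid_file" <<'EOF'
PARENTPID=1234
CHILD_0_PID=1235
CHILD_1_PID=1236
CHILD_2_PID=1237
EOF

# lookup_pid NAME: print the value stored for NAME, or nothing if absent.
# tail keeps the last match in case the same name was appended twice.
lookup_pid() {
  sed -n "s/^$1=//p" "$pid_file" | tail -n 1
}

lookup_pid CHILD_1_PID     # prints 1236
```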

With all that done, the children start doing their work, with the parent checking on their PIDs periodically. If the parent sees that a child's process is not visible, it stops itself. Children check for the parent's existence on every pass through their work loop.
If the parent process is not visible, the child stops itself.
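
The scripts above test for a process by looking at ps output. Another common way to ask "is this PID still there?" is kill -0, which sends no signal and only reports whether the process exists and may be signalled by the current user. The sketch below is an alternative under that assumption, not what the scripts above do; is_alive is a hypothetical helper name, and a background sleep plays the role of a child.

```shell
# is_alive PID: succeed if a process with that PID exists and can be signalled.
is_alive() {
  kill -0 "$1" 2>/dev/null
}

sleep 30 &                 # a background job standing in for a child
bg_pid=$!

if is_alive "$bg_pid"; then before=running; else before=gone; fi

kill "$bg_pid"                       # stop the stand-in child
wait "$bg_pid" 2>/dev/null || true   # reap it so the PID really disappears

if is_alive "$bg_pid"; then after=running; else after=gone; fi
echo "before=$before after=$after"
```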

Additionally, children can report the status of their work in the CHILD_${THREAD_NUM}_STATE file. If the status says "ERROR", for example, the parent can also stop, stopping all the children as a result.
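
To make the status-file idea concrete, here is a minimal sketch of writing and reading such files, including an "ERROR" state. The write_state and read_state helpers and the temporary state_dir are hypothetical names for this illustration. The write goes to a temporary file that is then renamed, an assumption added here so that a reader never sees a missing or half-written file.

```shell
state_dir=$(mktemp -d)     # temporary stand-in for LOG_DIR

# write_state N STATE: publish child N's state via temp file plus rename.
write_state() {
  printf 'CHILDSTATE=%s\n' "$2" > "$state_dir/CHILD_$1_STATE.tmp"
  mv "$state_dir/CHILD_$1_STATE.tmp" "$state_dir/CHILD_$1_STATE"
}

# read_state N: print child N's state, or UNKNOWN if it has no state file.
read_state() {
  if [ -f "$state_dir/CHILD_$1_STATE" ]; then
    sed -n 's/^CHILDSTATE=//p' "$state_dir/CHILD_$1_STATE"
  else
    echo UNKNOWN
  fi
}

write_state 0 SUCCESS
write_state 1 ERROR        # child 2 deliberately writes nothing

for n in 0 1 2
do
  case $(read_state "$n") in
    SUCCESS|DONE) echo "child $n ok" ;;
    ERROR)        echo "child $n reported an error" ;;
    *)            echo "child $n has no readable state" ;;
  esac
done
```

In a real parent, the ERROR and unreadable branches are where it would exit, which in turn makes the children stop on their next parent check.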

In the case of DB work (isql for Sybase), children can report their work status by running something like "select 'CHILDSTATE='+@status", where @status is set from the results of the SQL executed. The status is written into a file that the parent can source and evaluate. A child can also stop itself, which results in all the other children stopping.

This program can be modified and improved, but it serves as a good first step.

Feel free to use this script (or any part of it) anywhere you need.