Below are a couple of shell scripts that let you do just that: create multiple threads (or, rather, processes), with a parent thread (process) watching its children and stopping the whole system when a child has a problem. Here is the code with comments.
---------- PARENT ---------------
This is the parent shell script, the main thread that controls all the kids.
#!/usr/bin/ksh
HOME_DIR=/host/linux_dev
LOG_DIR=${HOME_DIR}/logs
export HOME_DIR
export LOG_DIR
#
# Cleanup old files
#
rm -f ${LOG_DIR}/*
echo PARENTPID=$$ > ${LOG_DIR}/allPids.txt
total_children=3
sleep_before_start=1
sleep_check_interval=2
echo Starting ${total_children} children
i=0
while [[ $i -lt ${total_children} ]]
do
    sleep $sleep_before_start
    child $i >> ${LOG_DIR}/child_log_$i.txt 2>&1 &
    i=$((i+1))
done
sleep $sleep_before_start
# Load PARENTPID and the CHILD_<n>_PID variables
. ${LOG_DIR}/allPids.txt
echo Children started, checking on them
# Poll the children; i keeps counting from total_children up to 1000
while [[ $i -lt 1000 ]]
do
    sleep $sleep_check_interval
    j=0
    while [[ $j -lt ${total_children} ]]
    do
        # Sets CHILDSTATE for child j
        . ${LOG_DIR}/CHILD_${j}_STATE
        if [[ ${CHILDSTATE} == 'DONE' ]]
        then
            echo CHILD $j is DONE
        elif [[ ${CHILDSTATE} == 'SUCCESS' ]]
        then
            # Look up the value of CHILD_<j>_PID indirectly
            eval "CHILDINSTANCEVAR=CHILD_${j}_PID"
            eval "CHILDINSTANCEPID=\$$CHILDINSTANCEVAR"
            echo "Checking CHILDINSTANCEPID=${CHILDINSTANCEPID}"
            CHILDINSTANCES=`ps auxwww | grep ${CHILDINSTANCEPID} | grep -v grep | wc -l`
            if [[ ${CHILDINSTANCES} -lt 1 ]]
            then
                echo "CHILD $j IS NOT DONE, BUT NOT RUNNING, Exiting"
                exit 1
            else
                echo "CHILD $j is running fine"
            fi
        else
            echo "CHILD $j needs attention, Exiting"
            exit 1
        fi
        j=$((j+1))
    done
    i=$((i+1))
done
------------------------------------ CHILD ----------------------------------
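This is the child shell script. Put the code below into a file called "child", make it executable, and make sure its directory is on the PATH, since the parent starts it as "child $i".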
#!/usr/bin/ksh
THREAD_NUM=$1
echo CHILD_${THREAD_NUM}_PID=$$ >> ${LOG_DIR}/allPids.txt
# actual work
echo Starting doing actual work
cycles=10
sleep_interval=5
i=0
while [[ $i -lt $cycles ]]
do
    #
    # checking if Parent is alive
    #
    . ${LOG_DIR}/allPids.txt
    PARENTINSTANCES=`ps auxwww | grep ${PARENTPID} | grep -v grep | wc -l`
    if [[ ${PARENTINSTANCES} -lt 1 ]]
    then
        echo "PARENT IS NOT RUNNING, Exiting"
        exit 1
    fi
    #
    # Doing work
    #
    rm -f ${LOG_DIR}/CHILD_${THREAD_NUM}_STATE
    echo "Working $(date): $i"
    echo "CHILDSTATE=SUCCESS" > ${LOG_DIR}/CHILD_${THREAD_NUM}_STATE
    sleep ${sleep_interval}
    i=$((i+1))
done
echo "CHILDSTATE=DONE" > ${LOG_DIR}/CHILD_${THREAD_NUM}_STATE
echo Thread $1 is DONE
Here is how it all works
1. The parent thread starts the children in the background in a loop (the number of children is controlled by total_children=3) by executing the following command:
child $i >> ${LOG_DIR}/child_log_$i.txt 2>&1 &
Each instance of the child is aware of its id (it accepts one parameter, $i, supplied by the parent).
2. Children communicate with the parent by means of files. At the start, the parent writes its PID ($$) into a file (allPids.txt), and every child appends its own PID to the same file. Once all children have started, the file is sourced by the parent, as well as by all the children. The line of code in the parent where the PID is written is
echo PARENTPID=$$ > ${LOG_DIR}/allPids.txt
Children add their PIDs to the communication file by running the command below
echo CHILD_${THREAD_NUM}_PID=$$ >> ${LOG_DIR}/allPids.txt
The resulting file looks as follows
PARENTPID=1234
CHILD_0_PID=1235
CHILD_1_PID=1236
CHILD_2_PID=1237
where 1234 and so on are the PIDs of each "thread".
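Steps 1 and 2 can be sketched in one short script. This is a variation, not the code above: here the parent itself records each child's PID from $! right after starting it, instead of having every child append to the file. The worker function and the temporary file are hypothetical stand-ins for the child script and ${LOG_DIR}/allPids.txt, and the sketch uses plain sh syntax so it also runs outside ksh.

```shell
#!/bin/sh
# Hypothetical stand-in for the "child" script.
worker() {
    sleep 1
}

# Hypothetical stand-in for ${LOG_DIR}/allPids.txt.
pid_file=$(mktemp)
echo "PARENTPID=$$" > "$pid_file"

total_children=3
i=0
while [ "$i" -lt "$total_children" ]
do
    worker "$i" &
    # $! holds the PID of the most recently started background job.
    echo "CHILD_${i}_PID=$!" >> "$pid_file"
    i=$((i+1))
done

# Sourcing the file turns every NAME=value line into a shell variable.
. "$pid_file"

# Indirect lookup of CHILD_<j>_PID, the same eval trick the parent uses.
j=1
eval "child_pid=\$CHILD_${j}_PID"
echo "parent $PARENTPID sees child $j as PID $child_pid"

wait                # let the background workers finish
rm -f "$pid_file"
```

Because only the parent writes to the file in this variation, nothing depends on the children having started before the file is sourced.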
With all that done, the children start doing their work, with the parent checking on their PIDs periodically. If the parent sees that a child's process is no longer visible, it stops itself. Children check on the parent's existence on every iteration of their work loop.
If the parent process is not visible, the child stops itself.
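The scripts check for a process by grepping the ps output for its PID, which can also match unrelated lines (a PID of 123 matches 1234, for instance). A possible alternative, sketched here and not part of the original scripts, is kill -0, which sends no signal and only reports whether the process can be signalled. The is_alive helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the process with the given PID
# exists and we are allowed to signal it.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Start a throwaway background process to check on.
sleep 30 &
bg_pid=$!

if is_alive "$bg_pid"; then before=alive; else before=gone; fi

# Stop it and reap it, then check again.
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true

if is_alive "$bg_pid"; then after=alive; else after=gone; fi

echo "before=$before after=$after"
```

One caveat: kill -0 also fails when the process exists but belongs to another user, so this sketch assumes the parent and the children run under the same account.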
Additionally, children can report the status of their work in the CHILD_${THREAD_NUM}_STATE file. If the status says "ERROR", for example, the parent can stop as well. Since every child exits once it sees that the parent is gone, this stops all the children as a result.
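A minimal sketch of that status check follows, assuming the same CHILDSTATE=... file format. The check_child helper, the temporary directory, and the verdict variable are hypothetical. Where the sketch sets verdict=stop, the real parent would exit.

```shell
#!/bin/sh
# Hypothetical stand-in for ${LOG_DIR}, with two sample state files.
state_dir=$(mktemp -d)
echo "CHILDSTATE=SUCCESS" > "$state_dir/CHILD_0_STATE"
echo "CHILDSTATE=ERROR"   > "$state_dir/CHILD_1_STATE"

# Hypothetical helper: returns 0 if child $1 is healthy, 1 otherwise.
check_child() {
    # Reset first, so a missing file is not mistaken for the
    # state left over from the previous child.
    CHILDSTATE=UNKNOWN
    [ -f "$state_dir/CHILD_${1}_STATE" ] && . "$state_dir/CHILD_${1}_STATE"
    case "$CHILDSTATE" in
        SUCCESS|DONE) return 0 ;;
        *)            return 1 ;;
    esac
}

verdict=continue
for j in 0 1
do
    if ! check_child "$j"; then
        echo "CHILD $j reported $CHILDSTATE, stopping"
        verdict=stop      # the real parent would exit 1 here
        break
    fi
done

rm -rf "$state_dir"
```

Treating anything other than SUCCESS or DONE as a reason to stop means a missing or unreadable state file is handled the same way as an explicit ERROR.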
For database work (isql for Sybase), children can return their work status by running something like "select 'CHILDSTATE='+@status", where @status is set from the results of the SQL executed. The status is written into a file that can be sourced and evaluated by the parent. A child can also stop itself, with the result of stopping all other children.
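As a rough sketch of the database case, the child can filter the query output down to the single CHILDSTATE=... line before writing the state file. The run_sql function below is a hypothetical stand-in for the real isql call, and the output layout it prints (a header, a dashed rule, the value, a row count) is only an assumption about what the client returns.

```shell
#!/bin/sh
# Hypothetical stand-in for the isql call; the layout of the
# printed lines is an assumption, not captured isql output.
run_sql() {
    printf '%s\n' ' status' ' ------' ' CHILDSTATE=ERROR' '' '(1 row affected)'
}

# Hypothetical stand-in for ${LOG_DIR}/CHILD_${THREAD_NUM}_STATE.
state_file=$(mktemp)

# Keep only a line of the form CHILDSTATE=<UPPERCASE WORD>,
# dropping headers, blank lines, and the row count.
run_sql | sed -n 's/^[[:space:]]*\(CHILDSTATE=[A-Z]*\).*$/\1/p' > "$state_file"

# The parent can now source the file as it does for any other child.
. "$state_file"
echo "state read back: $CHILDSTATE"

rm -f "$state_file"
```

Filtering before sourcing also keeps stray client output from being executed as shell code when the parent reads the file.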
This program can be modified/improved, but serves as a good first step.
Feel free to use this script (or any part of it) anywhere you need.
Very useful. Thanks for the information.
Excellent example and explanation. Thank you for taking the time to post this.