Below are a couple of shell scripts that allow you to do just that: create multiple "threads" (or, rather, processes), with a parent thread (process) watching over the children and stopping the whole system when a child runs into a problem. Here is the code, with comments.
---------- PARENT ---------------
This is the parent shell script, the main thread that controls all the children.
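#!/usr/bin/ksh
HOME_DIR=/host/linux_dev
LOG_DIR=${HOME_DIR}/logs
export HOME_DIR
export LOG_DIR
#
# Cleanup old files (-f: no error if the directory is already empty)
#
rm -f ${LOG_DIR}/*
echo PARENTPID=$$ > ${LOG_DIR}/allPids.txt
total_children=3
sleep_before_start=1
sleep_check_interval=2
max_checks=1000
echo Starting ${total_children} children
i=0
while [[ $i -lt ${total_children} ]]
do
    sleep $sleep_before_start
    # "child" must be executable and on PATH; $i is the child's ID
    child $i >> ${LOG_DIR}/child_log_$i.txt 2>&1 &
    i=$((i+1))
done
sleep $sleep_before_start
echo Children started, checking on them
check=0
while [[ $check -lt $max_checks ]]
do
    sleep $sleep_check_interval
    # Re-read the PID file every cycle in case a child registered late
    . ${LOG_DIR}/allPids.txt
    done_count=0
    j=0
    while [[ $j -lt ${total_children} ]]
    do
        # Reset first, so a missing state file never reuses another child's value
        CHILDSTATE=NONE
        if [[ -f ${LOG_DIR}/CHILD_${j}_STATE ]]
        then
            . ${LOG_DIR}/CHILD_${j}_STATE
        fi
        if [[ ${CHILDSTATE} == 'DONE' ]]
        then
            echo "CHILD $j is DONE"
            done_count=$((done_count+1))
        elif [[ ${CHILDSTATE} == 'SUCCESS' || ${CHILDSTATE} == 'NONE' ]]
        then
            # Look up CHILD_<j>_PID indirectly
            eval "CHILDINSTANCEPID=\${CHILD_${j}_PID}"
            echo "Checking CHILDINSTANCEPID=${CHILDINSTANCEPID}"
            # kill -0 sends no signal; it only tests that the process exists
            # (unlike "ps | grep PID", it cannot match 1234 when looking for 123)
            if kill -0 "${CHILDINSTANCEPID}" 2>/dev/null
            then
                echo "CHILD $j is running fine"
            else
                echo "CHILD $j IS NOT DONE, BUT NOT RUNNING, Exiting"
                exit 1
            fi
        else
            echo "CHILD $j needs attention (CHILDSTATE=${CHILDSTATE}), Exiting"
            exit 1
        fi
        j=$((j+1))
    done
    if [[ $done_count -eq ${total_children} ]]
    then
        echo "All children are DONE, Exiting"
        exit 0
    fi
    check=$((check+1))
done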
------------------------------------ CHILD ----------------------------------
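This is the child shell script. Put the code below into a file called "child", make it executable (chmod +x child), and place it in a directory on your PATH, since the parent starts it simply as child.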
#!/usr/bin/ksh
THREAD_NUM=$1
STATE_FILE=${LOG_DIR}/CHILD_${THREAD_NUM}_STATE
# Register this child's own PID ($$) in the shared file
echo CHILD_${THREAD_NUM}_PID=$$ >> ${LOG_DIR}/allPids.txt
# actual work
echo Starting actual work
cycles=10
sleep_interval=5
i=0
while [[ $i -lt $cycles ]]
do
    #
    # Checking if the parent is alive
    #
    . ${LOG_DIR}/allPids.txt
    if ! kill -0 "${PARENTPID}" 2>/dev/null
    then
        echo "PARENT IS NOT RUNNING, Exiting"
        exit 1
    fi
    #
    # Doing work
    #
    echo "Working $(date): $i"
    # Write the state to a temp file and rename it, so the parent never
    # sources a missing or half-written state file
    echo "CHILDSTATE=SUCCESS" > ${STATE_FILE}.tmp && mv -f ${STATE_FILE}.tmp ${STATE_FILE}
    sleep ${sleep_interval}
    i=$((i+1))
done
echo "CHILDSTATE=DONE" > ${STATE_FILE}.tmp && mv -f ${STATE_FILE}.tmp ${STATE_FILE}
echo Thread ${THREAD_NUM} is DONE
Here is how it all works:
1. The parent starts the children in the background in a loop (the number of children is controlled by total_children=3), using the following command:
child $i >> ${LOG_DIR}/child_log_$i.txt 2>&1 &
Each child instance knows its ID: it accepts a single parameter, $i, supplied by the parent.
2. Children communicate with the parent by means of files. At the start, the parent writes its PID into a file (allPids.txt), and every child appends its own PID to the same file. Once all the children have started, the file is sourced by the parent, as well as by each of the children. The line in the parent that writes its PID is:
echo PARENTPID=$$ > ${LOG_DIR}/allPids.txt
The children add their PIDs to the communication file by running the command below:
echo CHILD_${THREAD_NUM}_PID=$$ >> ${LOG_DIR}/allPids.txt
The resulting file looks like this:
PARENTPID=1234
CHILD_0_PID=1235
CHILD_1_PID=1236
CHILD_2_PID=1237
where 1234, etc. are the PIDs of the individual "threads".
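Steps 1 and 2 can be tried out in isolation. The following minimal POSIX sh sketch is not the original scripts: an inline sh -c command stands in for the child script (inside sh -c, just as in a separate script, $$ is the new process's own PID), the files go into a temporary directory rather than /host/linux_dev/logs, and child_pid is a hypothetical helper name for the eval-based lookup of CHILD_<n>_PID by index.

```shell
#!/bin/sh
# Sketch of steps 1 and 2: start N background children, let each one
# register its own PID in a shared file, then source that file.
# The inline "sh -c" script is a hypothetical stand-in for "child".
LOG_DIR=$(mktemp -d)
PID_FILE="$LOG_DIR/allPids.txt"
export PID_FILE

echo "PARENTPID=$$" > "$PID_FILE"   # parent truncates and writes first

total_children=3
i=0
while [ "$i" -lt "$total_children" ]; do
    # Inside "sh -c", $$ is the new process's own PID and $1 is its ID.
    sh -c 'echo "CHILD_${1}_PID=$$" >> "$PID_FILE"; echo "child $1 registered"' \
        child "$i" >> "$LOG_DIR/child_log_$i.txt" 2>&1 &
    i=$((i + 1))
done
wait            # let every stand-in finish registering

. "$PID_FILE"   # defines PARENTPID and CHILD_0_PID .. CHILD_2_PID

# Indirect lookup of CHILD_<n>_PID by index, as the parent does with eval.
child_pid() {
    eval "echo \"\${CHILD_${1}_PID}\""
}

j=0
while [ "$j" -lt "$total_children" ]; do
    echo "child $j has PID $(child_pid "$j")"
    j=$((j + 1))
done
```

The wait is there only so the sketch ends cleanly once every stand-in has registered; the real parent keeps monitoring its children instead.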
With all that done, the children start doing their work, while the parent periodically checks that their processes are still running. If the parent finds that a child's process has disappeared before the child reported DONE, it stops itself. The children, in turn, check that the parent still exists on every pass through their work loop; if the parent process is gone, the child stops itself.
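Checking for a process with ps auxwww | grep PID can produce false matches: a search for PID 123 also matches PID 1234, or any command line that happens to contain those digits. A stricter check, sketched below in POSIX sh under the assumption that all the processes run as the same user, is kill -0, which delivers no signal and only reports whether the process exists and may be signalled. The is_alive name is hypothetical.

```shell
#!/bin/sh
# Sketch: liveness test for a PID. is_alive is a hypothetical helper name.
is_alive() {
    # kill -0 delivers no signal; its exit status says whether the
    # process exists and may be signalled by us.
    kill -0 "$1" 2>/dev/null
}

sleep 30 &                  # a stand-in for a child process
worker_pid=$!               # $! = PID of the job just started

if is_alive "$worker_pid"; then echo "PID $worker_pid is running"; fi

kill "$worker_pid"
wait "$worker_pid" 2>/dev/null   # reap it before checking again
if ! is_alive "$worker_pid"; then echo "PID $worker_pid is gone"; fi
```

An exited process that has not yet been reaped (a zombie) still passes kill -0, just as ps still lists it as defunct; in the sketch, wait reaps it before the second check.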
Additionally, each child reports the status of its work in its CHILD_${THREAD_NUM}_STATE file. If the status is something other than SUCCESS or DONE, for example "ERROR", the parent can stop as well, which in turn stops all the children.
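If a state file is deleted and recreated, or written in place, the parent can read it at a moment when it is missing or only half written, and a parent that skips a missing file can end up acting on a stale CHILDSTATE from an earlier read. A minimal POSIX sh sketch of one way to avoid both: write to a temporary file and rename it with mv (a rename within one file system replaces the target in a single step), and reset CHILDSTATE before each read. write_state, read_state and the UNKNOWN placeholder are hypothetical names.

```shell
#!/bin/sh
# Sketch: atomic state writes and defensive reads.
# write_state, read_state and the UNKNOWN value are hypothetical.
STATE_DIR=$(mktemp -d)

write_state() {   # $1 = child number, $2 = state (SUCCESS, DONE, ERROR, ...)
    # Write a temp file in the same directory, then rename it over the
    # real one; a reader sees either the old or the new file, never a gap.
    echo "CHILDSTATE=$2" > "$STATE_DIR/CHILD_${1}_STATE.tmp" &&
        mv -f "$STATE_DIR/CHILD_${1}_STATE.tmp" "$STATE_DIR/CHILD_${1}_STATE"
}

read_state() {    # prints the state, or UNKNOWN if nothing was reported yet
    CHILDSTATE=UNKNOWN   # reset, so a value from an earlier read is never reused
    if [ -f "$STATE_DIR/CHILD_${1}_STATE" ]; then
        . "$STATE_DIR/CHILD_${1}_STATE"
    fi
    echo "$CHILDSTATE"
}

read_state 0            # UNKNOWN
write_state 0 SUCCESS
read_state 0            # SUCCESS
write_state 0 DONE
read_state 0            # DONE
```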
For database work (isql for Sybase), a child can report its work status by running something like "select 'CHILDSTATE='+@status", where @status is set from the results of the SQL executed. The output is written into a file that the parent can source and evaluate. A child can also stop itself; the parent then notices that the child is gone and exits, and the remaining children stop once they see that the parent is no longer running.
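Because the captured query output is later sourced by the shell, it is worth letting through only a line of the expected CHILDSTATE=WORD form and mapping anything else to an error. A POSIX sh sketch, in which run_status_query is a hypothetical stand-in for the real isql call and the output it prints is invented for illustration (no particular isql output format is assumed):

```shell
#!/bin/sh
# Sketch: turn query output into a sourceable state file.
# run_status_query is a hypothetical stand-in for the real isql call;
# the output it prints is invented for illustration only.
run_status_query() {
    printf '%s\n' ' ----------' ' CHILDSTATE=SUCCESS' ''
}

# Pass through only a line of the form CHILDSTATE=WORD (surrounding
# whitespace stripped); if there is none, print CHILDSTATE=ERROR instead.
extract_state() {
    sed -n 's/^[[:space:]]*\(CHILDSTATE=[A-Za-z_][A-Za-z_]*\)[[:space:]]*$/\1/p' |
        head -n 1 | grep . || echo "CHILDSTATE=ERROR"
}

STATE_FILE=$(mktemp)
run_status_query | extract_state > "$STATE_FILE"
. "$STATE_FILE"
echo "$CHILDSTATE"      # SUCCESS
```

A query that fails outright or returns no matching line is thus reported to the parent as CHILDSTATE=ERROR, a state it treats as needing attention.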
This program can certainly be modified and improved, but it serves as a good first step.
Feel free to use this script (or any part of it) anywhere you need.