Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

Solaris Volume Manager Administration Guide
Previous Next

Monitoring Solaris Volume Manager With a cron Job

How to Automate Checking for Errors in Volumes

  • To automatically check your Solaris Volume Manager configuration for errors, create a script that the cron utility can periodically run.

    The following example shows a script that you can adapt and modify for your needs.


    Note - This script serves as a starting point for automating error checking for Solaris Volume Manager. You probably need to modify this script for your own configuration.


    #
    #!/bin/ksh
    #ident "@(#)metacheck.sh   1.3     96/06/21 SMI"
    # ident='%Z%%M%   %I%     %E% SMI'
    #
    # Copyright (c) 1999 by Sun Microsystems, Inc.
    #
    # metacheck
    #
    # Check on the status of the metadevice configuration.  If there is a problem
    # return a non zero exit code.  Depending on options, send email notification.
    #
    # -h
    #    help
    # -s setname
    #    Specify the set to check.  By default, the 'local' set will be checked.
    # -m recipient [recipient...]
    #    Send email notification to the specified recipients.  This
    #    must be the last argument. The notification shows up as a short 
    #    email message with a subject of 
    #        "Solaris Volume Manager Problem: metacheck.who.nodename.setname"
    #    which summarizes the problem(s) and tells how to obtain detailed 
    #    information. The "setname" is from the -s option, "who" is from 
    #    the -w option, and "nodename" is reported by uname(1).
    #    Email notification is further affected by the following options:
    #        -f    to suppress additional messages after a problem 
    #            has been found. 
    #        -d    to control the supression.
    #        -w    to identify who generated the email.
    #        -t    to force email even when there is no problem.
    # -w who
    #    indicate who is running the command. By default, this is the
    #    user-name as reported by id(1M). This is used when sending
    #    email notification (-m).
    # -f 
    #    Enable filtering.  Filtering applies to email notification (-m).
    #    Filtering requires root permission.  When sending email notification
    #    the file /etc/lvm/metacheck.setname.pending is used to 
    #    controll the filter.  The following matrix specifies the behavior
    #    of the filter:
    #
    #    problem_found    file_exists
    #      yes          no        Create file, send notification
    #      yes          yes        Resend notification if the current date 
    #                    (as specified by -d datefmt) is 
    #                    different than the file date.
    #      no          yes        Delete file, send notification 
    #                    that the problem is resolved.
    #      no          no        Send notification if -t specified.
    #    
    # -d datefmt
    #    Specify the format of the date for filtering (-f).  This option 
    #    controls the how often re-notification via email occurs. If the 
    #    current date according to the specified format (strftime(3C)) is 
    #    identical to the date contained in the 
    #    /etc/lvm/metacheck.setname.pending file then the message is 
    #    suppressed. The default date format is "%D", which will send one 
    #    re-notification per day.
    # -t
    #    Test mode.  Enable email generation even when there is no problem.
    #    Used for end-to-end verification of the mechanism and email addresses.
    #    
    #
    # These options are designed to allow integration of metacheck
    # into crontab.  For example, a root crontab entry of:
    #
    # 0,15,30,45 * * * * /usr/sbin/metacheck -f -w SVMcron \
    #   -d '\%D \%h' -m [email protected] [email protected]
    #
    # would check for problems every 15 minutes, and generate an email to
    # [email protected] (and send to an email pager service) every hour when 
    # there is a problem.  Note the \ prior to the '%' characters for a 
    # crontab entry.  Bounced email would come back to root@nodename.
    # The subject line for email generated by the above line would be
    # Solaris Volume Manager Problem: metacheck.SVMcron.nodename.local
    #
    
    # display a debug line to controlling terminal (works in pipes)
    decho()
    {
        if [ "$debug" = "yes" ] ; then
        echo "DEBUG: $*"    < /dev/null > /dev/tty 2>&1
        fi
    }
    
    # if string $1 is in $2-* then return $1, else return ""
    strstr()
    {
        typeset    look="$1"
        typeset    ret=""
    
        shift
    #   decho "strstr LOOK .$look. FIRST .$1."
        while [ $# -ne 0 ] ; do
        if [ "$look" = "$1" ] ; then
            ret="$look"
        fi
        shift
        done
        echo "$ret"
    }
    
    # if string $1 is in $2-* then delete it. return result
    strdstr()
    {
        typeset    look="$1"
        typeset    ret=""
    
        shift
    #   decho "strdstr LOOK .$look. FIRST .$1."
        while [ $# -ne 0 ] ; do
        if [ "$look" != "$1" ] ; then
            ret="$ret $1"
        fi
        shift
        done
        echo "$ret"
    }
    
    merge_continued_lines()
    {
        awk -e '\
        BEGIN { line = "";} \
        $NF == "\\" { \
            $NF = ""; \
            line = line $0; \
            next; \
        } \
        $NF != "\\" { \
            if ( line != "" ) { \
            print line $0; \
            line = ""; \
            } else { \
            print $0; \
            } \
        }'
    }
    
    # trim out stuff not associated with metadevices
    find_meta_devices()
    {
        typeset    devices=""
    
    #   decho "find_meta_devices .$*."
        while [ $# -ne 0 ] ; do
        case $1 in
        d+([0-9]) )    # metadevice name
            devices="$devices $1"
            ;;
        esac
        shift
        done
        echo "$devices"
    }
    
    # return the list of top level metadevices
    toplevel()
    {
        typeset    comp_meta_devices=""
        typeset    top_meta_devices=""
        typeset    devices=""
        typeset    device=""
        typeset    comp=""
    
        metastat$setarg -p | merge_continued_lines | while read line ; do
        echo "$line"
        devices=`find_meta_devices $line`
        set -- $devices
        if [ $# -ne 0 ] ; then
            device=$1
            shift
            # check to see if device already refered to as component
            comp=`strstr $device $comp_meta_devices`
            if [ -z $comp ] ; then 
            top_meta_devices="$top_meta_devices $device"
            fi
            # add components to component list, remove from top list
            while [ $# -ne 0 ] ; do
            comp=$1
            comp_meta_devices="$comp_meta_devices $comp"
            top_meta_devices=`strdstr $comp $top_meta_devices`
            shift
            done
        fi
        done > /dev/null 2>&1
        echo $top_meta_devices
    }
    
    #
    # - MAIN
    #
    METAPATH=/usr/sbin
    PATH=//usr/bin:$METAPATH
    USAGE="usage: metacheck [-s setname] [-h] [[-t] [-f [-d datefmt]] \
        [-w who] -m recipient [recipient...]]"
    
    datefmt="%D"
    debug="no"
    filter="no"
    mflag="no"
    set="local"
    setarg=""
    testarg="no"
    who=`id | sed -e 's/^uid=[0-9][0-9]*(//' -e 's/).*//'`
    
    while getopts d:Dfms:tw: flag
    do
        case $flag in
        d)    datefmt=$OPTARG;
        ;;
        D)    debug="yes"
        ;;
        f)    filter="yes"
        ;;
        m)    mflag="yes"
        ;;
        s)    set=$OPTARG;
        if [ "$set" != "local" ] ; then
            setarg=" -s $set";
        fi
        ;;
        t)    testarg="yes";
        ;;
        w)    who=$OPTARG;
        ;;
        \?)    echo $USAGE
        exit 1
        ;;
        esac
    done
    
    # if mflag specified then everything else part of recipient
    shift `expr $OPTIND - 1`
    if [ $mflag = "no" ] ; then
        if [ $# -ne 0 ] ; then 
        echo $USAGE
        exit 1
        fi
    else
        if [ $# -eq 0 ] ; then 
        echo $USAGE
        exit 1
        fi
    fi
    recipients="$*"
    
    curdate_filter=`date +$datefmt`
    curdate=`date`
    node=`uname -n`
    
    # establish files
    msg_f=/tmp/metacheck.msg.$$
    msgs_f=/tmp/metacheck.msgs.$$
    metastat_f=/tmp/metacheck.metastat.$$
    metadb_f=/tmp/metacheck.metadb.$$
    metahs_f=/tmp/metacheck.metahs.$$
    pending_f=/etc/lvm/metacheck.$set.pending 
    files="$metastat_f $metadb_f $metahs_f $msg_f $msgs_f"
    
    rm -f $files                            > /dev/null 2>&1
    trap "rm -f $files > /dev/null 2>&1; exit 1" 1 2 3 15
    
    # Check to see if metadb is capable of running
    have_metadb="yes"
    metadb$setarg                             > $metadb_f 2>&1
    if [ $? -ne 0 ] ; then
        have_metadb="no"
    fi
    grep "there are no existing databases"      < $metadb_f    > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
        have_metadb="no"
    fi
    grep "/dev/md/admin"                < $metadb_f    > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
        have_metadb="no"
    fi
    
    # check for problems accessing metadbs
    retval=0
    if [ "$have_metadb" = "no" ] ; then
        retval=1
        echo "metacheck: metadb problem, can't run '$METAPATH/metadb$setarg'" \
                                    >> $msgs_f
    else
        # snapshot the state
        metadb$setarg 2>&1 | sed -e '1d' | merge_continued_lines    > $metadb_f
        metastat$setarg 2>&1 | merge_continued_lines        > $metastat_f
        metahs$setarg -i 2>&1 | merge_continued_lines        > $metahs_f
    
        #
        # Check replicas for problems, capital letters in the flags
        # indicate an error, fields are seperated by tabs.
        #
        problem=`awk < $metadb_f -F\t '{if ($1 ~ /[A-Z]/) print $1;}'`
        if [ -n "$problem" ] ; then
        retval=`expr $retval + 64`
        echo "\
    metacheck: metadb problem, for more detail run:\n\t$METAPATH/metadb$setarg -i" \
                                    >> $msgs_f
        fi
    
        #
        # Check the metadevice state
        #
        problem=`awk < $metastat_f -e \
            '/State:/ {if ($2 != "Okay" && $2 != "Resyncing") print $0;}'`
        if [ -n "$problem" ] ; then
        retval=`expr $retval + 128`
        echo "\
    metacheck: metadevice problem, for more detail run:" \
                                    >> $msgs_f
    
        # refine the message to toplevel metadevices that have a problem
        top=`toplevel`
        set -- $top
        while [ $# -ne 0 ] ; do
            device=$1
            problem=`metastat $device | awk -e \
            '/State:/ {if ($2 != "Okay" && $2 != "Resyncing") print $0;}'`
            if [ -n "$problem" ] ; then
            echo "\t$METAPATH/metastat$setarg $device"    >> $msgs_f
            # find out what is mounted on the device
            mp=`mount|awk -e '/\/dev\/md\/dsk\/'$device'[ \t]/{print $1;}'`
            if [ -n "$mp" ] ; then
                echo "\t\t$mp mounted on $device"        >> $msgs_f
            fi
            fi
            shift
        done
        fi
    
        #
        # Check the hotspares to see if any have been used.
        #
        problem=""
        grep "no hotspare pools found"    < $metahs_f        > /dev/null 2>&1
        if [ $? -ne 0 ] ; then
        problem=`awk < $metahs_f -e \
            '/blocks/ { if ( $2 != "Available" ) print $0;}'`
        fi
        if [ -n "$problem" ] ; then
        retval=`expr $retval + 256`
        echo "\
    metacheck: hot spare in use, for more detail run:\n\t$METAPATH/metahs$setarg -i" \
                                     >> $msgs_f
        fi
    fi
    
    # If any errors occurred, then mail the report
    if [ $retval -ne 0 ] ; then
        if [ -n "$recipients" ] ; then 
        re=""
        if [ -f $pending_f ] && [ "$filter" = "yes" ] ; then
            re="Re: "
            # we have a pending notification, check date to see if we resend
            penddate_filter=`cat $pending_f | head -1`
            if [ "$curdate_filter" != "$penddate_filter" ] ; then
            rm -f $pending_f                > /dev/null 2>&1
            else
             if [ "$debug" = "yes" ] ; then
                echo "metacheck: email problem notification still pending"
                cat $pending_f
            fi
            fi
        fi
        if [ ! -f $pending_f ] ; then
            if [ "$filter" = "yes" ] ; then
            echo "$curdate_filter\n\tDate:$curdate\n\tTo:$recipients" \
                                    > $pending_f
            fi
            echo "\
    Solaris Volume Manager: $node: metacheck$setarg: Report: $curdate"        >> $msg_f
            echo "\
    --------------------------------------------------------------" >> $msg_f
            cat $msg_f $msgs_f | mailx -s \
            "${re}Solaris Volume Manager Problem: metacheck.$who.$set.$node" $recipients
        fi
        else
        cat $msgs_f
        fi
    else
        # no problems detected,
        if [ -n "$recipients" ] ; then
        # default is to not send any mail, or print anything.
        echo "\
    Solaris Volume Manager: $node: metacheck$setarg: Report: $curdate"        >> $msg_f
        echo "\
    --------------------------------------------------------------" >> $msg_f
        if [ -f $pending_f ] && [ "$filter" = "yes" ] ; then
            # pending filter exista, remove it and send OK
            rm -f $pending_f                    > /dev/null 2>&1
            echo "Problem resolved"                >> $msg_f
            cat $msg_f | mailx -s \
            "Re: Solaris Volume Manager Problem: metacheck.$who.$node.$set" $recipients
        elif [ "$testarg" = "yes" ] ; then
            # for testing, send mail every time even thought there is no problem
            echo "Messaging test, no problems detected"        >> $msg_f
            cat $msg_f | mailx -s \
            "Solaris Volume Manager Problem: metacheck.$who.$node.$set" $recipients
        fi
        else
        echo "metacheck: Okay"
        fi
    fi
    
    rm -f $files                            > /dev/null 2>&1
    exit $retval

    For information on invoking scripts by using the cron utility, see the cron(1M) man page.

Previous Next

 
 
  Published under the terms fo the Public Documentation License Version 1.01. Design by Interspire