TITLE: OpenVMS VMS73_MEM_CHAN-V0100 Alpha V7.3 Memory Channel ECO Summary
 
NOTE:  An OpenVMS saveset or PCSI installation file is stored
       on the Internet in a self-expanding compressed file.
 
       For OpenVMS savesets, the name of the compressed saveset
       file will be kit_name.a-dcx_vaxexe for OpenVMS VAX or
       kit_name.a-dcx_axpexe for OpenVMS Alpha. Once the OpenVMS
       saveset is copied to your system, expand the compressed
       saveset by typing RUN kit_name.a-dcx_vaxexe or kit_name.a-dcx_axpexe.
 
       For PCSI files, once the PCSI file is copied to your system,
       rename the PCSI file to kitname.pcsi-dcx_axpexe or
       kitname.pcsi-dcx_vaxexe, then it can be expanded by typing
       RUN kitname.pcsi-dcx_axpexe or kitname.pcsi-dcx_vaxexe.  The
       resultant file will be the PCSI installation file which can be
       used to install the ECO.
 

New Kit Date:       30-OCT-2002
Modification Date:  22-JUL-2003
Modification Type:  KIT RELEASED WITH ADDED TEXT


**********************************************************************************
*                    OPENVMS ECO KIT INSTALLATION WARNING                        *
*                    ------------------------------------                        *
*                                                                                *
* After the installation of the VMS73_MEM_CHAN-V0100 ECO kit, systems may        *
* hang when using the Memory Channel SCS-port.  The system will hang and not     *
* crash, requiring manual intervention and a system-HALT (Console ^P) to         *
* recover.  This hang only occurs if there is high SCS-data-transfer activity    *
* (MSCP/TMSCP disk/tape serving) with high IPL-8 fork latency on the Memory      *
* Channel target node.                                                           *
*                                                                                *
* TECHNICAL NOTE:                                                                *
* --------------                                                                 *
* The system-hang is the result of VMS timer-queue TQE-element queue linkage     *
* corruption, due to double insertion of the SYS$PMDRIVER TQE.   The VMS         *
* system-hang is manifested as an infinite loop within the timer-service routine,*
* which can never successfully remove the TQE element with corrupted linkage.    *
*                                                                                *
* In order to avoid this system hang problem, HP recommends that customers       *
* who install the VMS73_MEM_CHAN-V0100 patch kit, and who use Memory             *
* Channel for MSCP disk-serving or TMSCP tape-serving, perform the following     *
* steps immediately after installing the kit:                                    *
*                                                                                *
*    1. Delete the [SYS$LDR]SYS$PMDRIVER.EXE image version installed by          *
*	the kit.  This image should be the highest version of the image listed on*
*	the system.  You can verify the image to be deleted by performing the    *
*	following command:                                                       *
*		                                                                 *
*		$ ANALYZE/IMAGE [SYS$LDR]SYS$PMDRIVER.EXE                        *
*                                                                                *
*	The Image Identification Information fields will contain the following   *
*	information:                                                             *
*                                                                                *
*		Image Identification Information                                 *
*			                                                         *
*		image name: "SYS$PMDRIVER"                                       *
*		image file identification:  "X-30"                               *
*		image file build identification:  "X91Y-0060010009"              *
*		link date/time: 25-JUL-2002 00:24:39.72                          *
*		linker identification:  "A11-50"                                 *
*                                                                                *
*    2. During kit installation, the previous                                    *
*	[SYS$LDR]SYS$PMDRIVER.EXE was renamed to                                 *
*	[SYS$LDR]SYS$PMDRIVER.EXE_OLD.  Rename this image back                   *
*	to [SYS$LDR]SYS$PMDRIVER.EXE:                                            *
*                                                                                *
*		$ RENAME [SYS$LDR]SYS$PMDRIVER.EXE_OLD -                         *
*		_$   [SYS$LDR]SYS$PMDRIVER.EXE                                   *
*		                                                                 *
*    3.	Reboot the system.                                                       *
*                                                                                *
*	This  system hang  problem will be corrected in a future ECO kit.        *
*                                                                                *
**********************************************************************************


Copyright (c) Hewlett-Packard Company 2002,2003.  All rights reserved.
                    
OP/SYS:     OpenVMS Alpha V7.3

COMPONENT:  Memory Channel

SOURCE:     Hewlett-Packard Company

ECO INFORMATION:

     ECO Kit Name:  VMS73_MEM_CHAN-V0100
                    DEC-AXPVMS-VMS73_MEM_CHAN-V0100--4.PCSI
     ECO Kits Superseded by This ECO Kit: None
     ECO Kit Approximate Size: 688 Blocks
     Kit Applies To:  OpenVMS Alpha V7.3
     System/Cluster Reboot Necessary: Yes
     Rolling Re-boot Supported:  Yes
     Installation Rating:  INSTALL_2
                           2 : To be installed by all customers using the
                               following feature(s):

                               Memory Channel
                           
     Kit Dependencies:

       The following remedial kit(s), or later, must be installed BEFORE
       installation of this, or any required kit:

         VMS73_UPDATE-V0100

       In order to receive all the corrections listed in this
       kit, the following remedial kits should also be installed:

         None 


ECO KIT SUMMARY:

An ECO kit exists for Memory Channel on OpenVMS Alpha V7.3.
This kit addresses the following problems: 

PROBLEMS ADDRESSED IN VMS73_MEM_CHAN-V0100 KIT


     o  1.  A Memory Channel virtual hub (VHUB) fails to come
            "ONLINE" and form an SCS virtual-circuit link-up if the
            Memory Channel VHUB VH0/Master node is not booted before
            the VHUB VH1/Slave MC node.

        2.  If a VH0/Master Memory Channel node crashes and/or reboots
            while the VH1/Slave Memory Channel node remains running,
            the Memory Channel link fails and the MCA0 (and MCB0, if
            applicable) devices on both VHUB Memory Channel nodes
            remain "OFFLINE".

          This MCx0 "OFFLINE" problem may also occur during MCA0/MCB0
          adapter/link error-handling/recovery.

          The following symptoms are manifestations of this MC VHUB BOOT
          "OFFLINE" problem:

          OPA0: console errors:
          --------------------

          %MCA0 CPU00:  19-SEP-2000 04:17:50  Slave but adapter_ok
                        off, retrying.
          %MCA0 CPU00:  19-SEP-2000 04:17:50 MC re-init 5 second timer.
          %MCA0 CPU00:  19-SEP-2000 04:17:55 Slave but adapter_ok
                        off, retrying.
          %MCA0 CPU00:  19-SEP-2000 04:17:55 MC re-init 5 second timer.
                               .
                               .
          ....... after 20 retries ...............
                               .
                               .
          %MCA0 CPU00:  19-SEP-2000 04:18:00 Slave but adapter_ok
                        off, retrying.
          %MCA0 CPU00:  19-SEP-2000 04:18:00 MC re-init 10 minute timer.
          %MCA0 CPU00:  19-SEP-2000 04:28:00 Slave but adapter_ok off,
                        retrying.
          %MCA0 CPU00:  19-SEP-2000 04:28:00 MC re-init 10 minute timer.
                               .
                               .
                               .
          ON REMOTE NODE ATTEMPTING MC SW INIT .........
          MCA0 CPU00:  19-SEP-2000 04:27:50 node state retries exceeded


          DCL SHOW DEVICE command output:
          -------------------------------
          SHOW DEVICE reports MCA0: and PMA0: (and MCB0:/PMB0:, if
          present) as Offline:

          $ SHOW DEVICE MC
           Device                  Device           Error
            Name                   Status           Count
            MCA0:                   Offline              2
            MCB0:                   Offline             16

          $ SHOW DEVICE PM
           Device                  Device           Error
            Name                   Status           Count
            PMA0:                   Offline              0
            PMB0:                   Offline              0


          Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
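
          A monitoring script can flag this "OFFLINE" condition from
          captured SHOW DEVICE output.  The Python sketch below is
          illustrative only and is not part of the kit.  The function
          name offline_devices is hypothetical, and the parsing
          assumes each data row has the form "NAME: Status Count",
          as in the listing above.

```python
def offline_devices(show_device_output: str) -> list[str]:
    """Return the names of devices whose status column reads 'Offline'."""
    offline = []
    for line in show_device_output.splitlines():
        fields = line.split()
        # Data rows look like 'MCA0:  Offline  2'; header rows have no
        # trailing colon on the first field and are skipped.
        if (len(fields) >= 2
                and fields[0].endswith(":")
                and fields[1].lower() == "offline"):
            offline.append(fields[0].rstrip(":"))
    return offline


# Sample text copied from the SHOW DEVICE MC listing in this summary.
sample = """\
 Device                  Device           Error
  Name                   Status           Count
  MCA0:                   Offline              2
  MCB0:                   Offline             16
"""
print(offline_devices(sample))  # ['MCA0', 'MCB0']
```

          An empty result means no device in the captured output was
          reported as Offline.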


     o  An MC_INCONSTATE (SYS$MCDRIVER) bugcheck may occur during
        local/remote Memory Channel node reboot or Memory Channel
        adapter/link error recovery.  This bugcheck can occur
        regardless of the Memory Channel hub configuration (VHUB or
        real hub).  It typically occurs when a "nested error" (an
        MCDRIVER-internal error or an MC-adapter hardware error) is
        encountered while recovering from a Memory Channel link
        error or a local/remote Memory Channel node crash/reboot.

        The MC_INCONSTATE bugcheck is easy to recognize, and is
        nearly always caused by this nested error-handling bug.  A
        typical MCx0: error-log event sequence and SDA> crash
        summary are shown below:

          MCx0: ERROR-LOG SUMMARY: Unsuccessful events:
          ---------------------------------------------
          MCB0 - Hardware error, reinitializing.
          MCB0 -
                  Node 0:     State:  Uninitialized
                  Node 1:     State:  Uninitialized
          MCB0 - Memory channel link online failure 2
          MCB0 - We shouldn't be here.
                  CRASH - MC_INCONSTATE

          Crashdump Summary Information:
          ------------------------------
          Bugcheck Type:     MC_INCONSTATE, Fatal error
                             detected by Memory Channel
          Failing PC:        FFFFFFFF.E2983A44  SYS$MCDRIVER+0BA44
          Failing PS:        30000000.00000804
          Module:            SYS$MCDRIVER (Link Date/Time:
                              29-DEC-1999 04:09:37.99)
          Offset:            0000BA44


          Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE



     o  Memory Channel Receive channel (RX_MESS_CHAN) message
        processing may hang after processing 512 RX_MESS_CHAN messages
        during a single fork-thread ([MEM_CHAN]MC$HANDLE_MESS_CHAN_INT
        routine).  This could occur with heavy Memory Channel
        SCS-traffic and high IPL-8 fork-thread scheduling latency.  A
        Memory Channel RX_MESS_CHAN message-handling hang will lead to
        CNXMGR/LOCK_MGR stalls (and potential cluster hangs) as well
        as SCS "virtual-circuit timeouts".

          OPA0: CONSOLE PM/MC ERROR MESSAGES:
          -----------------------------------
          %PMA0 CPU00:  ... MC$_CHAN_QUE_EMPTY
                            channel = 541C8  ppd = 83DD4CC0
          %PMA0 CPU00:  ... stall state CLEAR
                            channel = 541C8  ppd = 83DD4CC0
          %MCA0 CPU00:  ... Timeslice exceeded while in workque
                            for node RM763A
          %MCA0 CPU00:  ... Timeslice exceeded while in workque
                            for node RM763A
          %MCA0 CPU00:  ... Timeslice exceeded while in workque
                            for node RM763A
          %PMA0, Virtual Circuit Timeout - REMOTE PORT  xxxx


          SCS VC-TIMEOUT ERRLOG ENTRY:
          ----------------------------
                                   .
                                   .
                                   .
          Error Type/SubType     x4009    Signaled via Packet, Virtual
                                          Circuit Timeout.

          The "...  Timeslice exceeded" error may continue to occur
          after this fix is applied.  However, MC RX_MESS_CHAN
          processing will no longer hang after this event.

          Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE




     o  Following a boot-time Memory Channel C unit-init/self-test
        "LOOPBACK WRITE TEST" failure, which indicates a Memory
        Channel adapter PCI-DMA error, the MCDRIVER will enter an
        infinite HW/SW initialization error-retry loop.  The following
        OPA0:/console errors will be issued at 5 second intervals,
        changing to 10 minute intervals after 20 retries:

          %MCA0 CPU00:  ... MC loopback write interrupt test failed.
          %MCA0 CPU00:  ... Couldn't get mgmt lock.
          %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
          %MCA0 CPU00:  ... Couldn't get mgmt lock.
          %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
          %MCA0 CPU00:  ... Couldn't get mgmt lock.
          %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .


          Note:  The first error message occurs on the first pass only.

          Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
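
          The retry cadence described above (5-second intervals for
          the first 20 retries, then 10-minute intervals) can be
          modelled as follows.  This is a toy Python model for
          estimating how long a node has been retrying, not MCDRIVER
          code; retry_delays and its parameter names are hypothetical.

```python
from itertools import islice


def retry_delays(fast_delay_s=5, fast_retries=20, slow_delay_s=600):
    """Yield the wait in seconds before each successive retry, forever."""
    # First phase: a fixed number of short waits.
    for _ in range(fast_retries):
        yield fast_delay_s
    # Second phase: long waits with no upper bound on the retry count.
    while True:
        yield slow_delay_s


first_22 = list(islice(retry_delays(), 22))
print(sum(first_22[:20]))  # 100 seconds spent in the 5-second phase
print(first_22[20:])       # [600, 600]
```

          Because the second phase never terminates, the loop only
          ends when the underlying error state is cleared.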




     o  CPUSPINWAIT bugchecks may occur on any GSxxx AlphaServer
        platform (GS140, GS80/160/320) with a Memory Channel adapter.
        The bugchecks occur due to an error in the SYS$MCDRIVER
        "MC$ALLOCATE_MESSAGE" routine performing Memory Channel
        message free-queue-header "loopback WRITE", and an incorrect
        timer implementation.  The CPUSPINWAIT bugcheck will always
        involve an SMP$TIMEOUT acquiring the SCS-spinlock while
        another SMP-CPU is holding the SCS-spinlock within the
        SYS$MCDRIVER / [MEM_CHAN]MCCHANNELS.C MC$ALLOCATE_MESSAGE
        routine.

          Crashdump Summary Information:
          ------------------------------
          Bugcheck Type:     CPUSPINWAIT, CPU spinwait timer expired
          Failing PC:        FFFFFFFF.8007A384    SMP$TIMEOUT_C+00064
          Failing PS:        28000000.00000804
          Module:            SYSTEM_SYNCHRONIZATION_MIN
          Offset:            00000384


          NOTE:  The "MC loopback write interrupt test failed" error is
          typically due to a leftover/stale Memory Channel adapter
          PCI-logic error-state that will only clear with a CONSOLE >>>
          INIT operation (to perform PCI-bus RESET).  Users who
          frequently reboot without using the CONSOLE >>> BOOT_RESET =
          ON switch (Environment Variable) or without performing a
          CONSOLE >>> INIT command are susceptible to this "MC loopback
          write test" error.

          Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE





     o  Any "0-length" SCS data transfer using the Memory Channel
        (MC) SCS-port results in an "INVPTEFMT, Invalid page table
        entry format" bugcheck.  The bugcheck occurs within
        IOC_STD$PTETOPFN, as a result of a call to IOC_STD$FILSPT
        from PMDRIVER.C/SETUP_COPY.

          Crashdump Summary Information:
          ------------------------------
          Bugcheck Type:     INVPTEFMT, Invalid page table
                             entry format
          Current Process:   NULL
          Current Image:     
          Failing PC:        FFFFFFFF.800B88FC 
                             IOC_STD$PTETOPFN_C+0008C
          Failing PS:        38000000.00000804
          Module:            IO_ROUTINES (Link Date/Time:
                             13-DEC-2000 00:39:37.49)
          Offset:            000048FC


          Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE
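
          When triaging a crash, the "Bugcheck Type" field of a
          crashdump summary can be matched against the bugchecks
          named in this summary.  The Python sketch below is
          illustrative only; the function names are hypothetical, and
          the lookup table is taken from the "Images Affected" lines
          of this document.  A match only suggests that this kit may
          be relevant, since a bugcheck such as CPUSPINWAIT can have
          unrelated causes.

```python
import re

# Bugcheck types named in this ECO summary and the image each fix is in.
BUGCHECK_TO_IMAGE = {
    "MC_INCONSTATE": "SYS$MCDRIVER.EXE",
    "CPUSPINWAIT": "SYS$MCDRIVER.EXE",
    "INVPTEFMT": "SYS$PMDRIVER.EXE",
}


def bugcheck_type(crash_summary: str):
    """Extract the bugcheck name from a crashdump summary, or None."""
    match = re.search(r"Bugcheck Type:\s*([A-Z0-9_$]+)", crash_summary)
    return match.group(1) if match else None


def affected_image(crash_summary: str):
    """Return the image listed for this bugcheck, or None if unlisted."""
    return BUGCHECK_TO_IMAGE.get(bugcheck_type(crash_summary))


summary = """\
Bugcheck Type:     INVPTEFMT, Invalid page table
                   entry format
Failing PS:        38000000.00000804
"""
print(bugcheck_type(summary))   # INVPTEFMT
print(affected_image(summary))  # SYS$PMDRIVER.EXE
```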




     o  SCS "SEND MESSAGE" (typically LOCK_MGR and MSCP disk commands)
        and SCS data transfer commands, issued over a PM/MC SCS
        virtual circuit (VC), can stall or hang following exhaustion
        of Memory Channel "channel-free-queue" entries.  The duration
        of this stall or hang is entirely dependent on SCS-sysap
        traffic and flow-control (SCS "credit") patterns and will
        persist until one of the following occurs:

           o  SCS VC timeout error closes the VC

           o  SCS-sysap sends a message that breaks the stalemate

           o  SCS VC timeout mechanism sends a message that breaks the
              stalemate

           o  PMx0:  SCS-port timeout occurs, crashing the MC port


        This SYS$PMDRIVER MC-SCS-command processing hang/stall can
        occur under the following two conditions:

           -  HANG:  Under heavy and primarily unidirectional loads.

           -  STALL:  Under more bidirectional loads, where stalls
              drastically reduce performance over the Memory Channel
              VC.


        Because this hang/stall will block internode SCS-sysap cluster
        communications, symptoms can be obscure and numerous, or may
        manifest as:

           o  Performance degradation over Memory Channel based SCS VCs

           o  An SCS VC timeout

           o  A LOCK_MGR stall/hang or performance loss

           o  MSCP served disk command timeouts or disk I/O slowdowns

           o  Customer LOCK_MGR-dependent application stalls, hangs, or
              slowdowns

          Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE




INSTALLATION NOTES:

This kit requires a system reboot.  Compaq strongly recommends that
a reboot be performed immediately after kit installation to avoid
system instability.

If you have other nodes in your OpenVMS cluster, they must also be
rebooted in order to make use of the new image(s).  If it is not
possible or convenient to reboot the entire cluster at this time, a
rolling re-boot may be performed.

INSTALLATION INSTRUCTIONS:

Install this kit with the POLYCENTER Software Installation utility
by logging in to the SYSTEM account and typing the following at the
DCL prompt:

PRODUCT INSTALL VMS73_MEM_CHAN /SOURCE=[location of Kit]

The kit location may be a tape drive, CD, or a disk directory that
contains the kit.

Additional help on installing PCSI kits can be found by typing
HELP PRODUCT INSTALL at the system prompt.

Special Installation Instructions:

     o  Scripting of Answers to Installation Questions

        During installation, this kit will ask and require user
        response to several questions.  If you wish to automate the
        installation of this kit and avoid having to provide responses
        to these questions, you must create a DCL command procedure
        that includes the following definitions and commands:

           -  $ DEFINE/SYS NO_ASK$BACKUP TRUE

           -  $ DEFINE/SYS NO_ASK$REBOOT TRUE

           -  Add the following qualifiers to the PRODUCT INSTALL
              command and add that command to the DCL procedure.

                /PROD=DEC/BASE=AXPVMS/VER=V1.0


           -  De-assign the logicals assigned

        For example, a sample command file to install the
        VMS73_MEM_CHAN-V0100 kit would be:

          $
          $ DEFINE/SYS NO_ASK$BACKUP TRUE
          $ DEFINE/SYS NO_ASK$REBOOT TRUE
          $!
          $ PROD INSTALL VMS73_MEM_CHAN/PROD=DEC/BASE=AXPVMS/VER=V1.0
          $!
          $ DEASSIGN/SYS NO_ASK$BACKUP
          $ DEASSIGN/SYS NO_ASK$REBOOT
          $!
          $ exit
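
        Sites that install several kits this way may prefer to
        generate the command file rather than edit it by hand.  The
        Python sketch below renders the same sequence as the sample
        above for a given kit name.  It is an illustration, not an
        HP-supplied tool; install_procedure and its parameter names
        are hypothetical, and the defaults simply repeat the
        qualifier values shown in this summary.

```python
def install_procedure(kit: str, producer: str = "DEC",
                      base: str = "AXPVMS", version: str = "V1.0") -> str:
    """Render a DCL command file that installs `kit` without prompting."""
    logicals = ("NO_ASK$BACKUP", "NO_ASK$REBOOT")
    lines = ["$"]
    # Define the logicals that suppress the installation questions.
    lines += [f"$ DEFINE/SYS {name} TRUE" for name in logicals]
    lines.append("$!")
    lines.append(
        f"$ PROD INSTALL {kit}/PROD={producer}/BASE={base}/VER={version}"
    )
    lines.append("$!")
    # Remove the logicals again so later installations prompt as usual.
    lines += [f"$ DEASSIGN/SYS {name}" for name in logicals]
    lines += ["$!", "$ exit"]
    return "\n".join(lines) + "\n"


print(install_procedure("VMS73_MEM_CHAN"), end="")
```

        The generated text would still need to be copied to the
        OpenVMS system and reviewed before it is run.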

All trademarks are the property of their respective owners.

Files on this server are as follows:
»dec-axpvms-vms73_mem_chan-v0100--4.README
»dec-axpvms-vms73_mem_chan-v0100--4.CHKSUM
»dec-axpvms-vms73_mem_chan-v0100--4.pcsi-dcx_axpexe