TITLE: OpenVMS VMS73_MEM_CHAN-V0200 Alpha V7.3 Memory Channel ECO Summary
NOTE:  An OpenVMS saveset or PCSI installation file is stored
       on the Internet in a self-expanding compressed file.
       For OpenVMS savesets, the name of the compressed saveset
       file will be kit_name.a-dcx_vaxexe for OpenVMS VAX or
       kit_name.a-dcx_axpexe for OpenVMS Alpha. Once the OpenVMS
       saveset is copied to your system, expand the compressed
       saveset by typing RUN kit_name.a-dcx_vaxexe or RUN kit_name.a-dcx_axpexe.
       For PCSI files, once the PCSI file is copied to your system,
       rename the PCSI file to kitname.pcsi-dcx_axpexe or
       kitname.pcsi-dcx_vaxexe, then it can be expanded by typing
       RUN kitname.pcsi-dcx_axpexe or kitname.pcsi-dcx_vaxexe.  The
       resultant file will be the PCSI installation file which can be
       used to install the ECO.
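
The naming convention in the note above can be sketched in Python.  This
is a minimal illustration only: the function name compressed_kit_name
and its parameters are hypothetical, and the suffixes are copied from
the note.

```python
def compressed_kit_name(kit_name: str, platform: str, pcsi: bool = False) -> str:
    """Build the self-expanding file name described in the note above.

    platform is "vax" or "alpha"; pcsi selects the PCSI form of the name.
    These argument names are illustrative assumptions, not part of the kit.
    """
    suffixes = {"vax": "dcx_vaxexe", "alpha": "dcx_axpexe"}
    if platform not in suffixes:
        raise ValueError(f"unknown platform: {platform!r}")
    middle = "pcsi" if pcsi else "a"
    return f"{kit_name}.{middle}-{suffixes[platform]}"


print(compressed_kit_name("kit_name", "alpha"))
print(compressed_kit_name("kit_name", "vax", pcsi=True))
```

The string returned is the file name that would be given to the RUN
command to expand the kit.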


New Kit Date:       22-SEP-2003
Modification Date:  14-OCT-2003
Modification Type:  Kit released with corrected dependency information.

Copyright (c) Hewlett-Packard Company 2002,2003.  All rights reserved.
OP/SYS:     OpenVMS Alpha V7.3

COMPONENT:  Memory Channel

SOURCE:     Hewlett-Packard Company


     ECO Kit Name:  VMS73_MEM_CHAN-V0200
     ECO Kits Superseded by This ECO Kit: Yes
     ECO Kit Approximate Size: 720 Blocks
     Kit Applies To:  OpenVMS Alpha V7.3
     System/Cluster Reboot Necessary: Yes
     Rolling Re-boot Supported:  Yes
     Installation Rating:  INSTALL_2
                             2 : To be installed by all customers using the following
                                 Memory Channel
     Kit Dependencies:

       The following remedial kit(s), or later, must be installed BEFORE
       installation of this, or any required kit:


       Note: the previously listed required kit, VMS73_UPDATE-V0100,
       was incorrect.  The correct required kit is VMS73_UPDATE-V0200.

       In order to receive all the corrections listed in this
       kit, the following remedial kits should also be installed:




      o  [SYS$LDR]SYS$MCDRIVER.EXE (new image)

         Image Identification Information

         image name: "SYS$MCDRIVER"
         image file identification:  "X-59"
         image file build identification:  "X91Y-0060010000"
         link date/time: 14-MAY-2002 07:04:33.89
         linker identification:  "A11-50"

      o  [SYS$LDR]SYS$PMDRIVER.EXE (new image)

         Image Identification Information

         image name: "SYS$PMDRIVER"
         image file identification:  "X-31"
         image file build identification:  "X91Y-0060010010"
         link date/time: 12-MAY-2003 15:51:28.65
         linker identification:  "A11-50"


New problems addressed in the VMS73_MEM_CHAN-V0200 kit


               After installation of the VMS73_MEM_CHAN-V0100 ECO kit,
               systems may hang when using the Memory Channel SCS-port.
               The system will hang and not crash, requiring manual
               intervention and a system-HALT (Console ^P) to recover.
               This hang only occurs if there is high SCS-data-transfer
               activity (MSCP/TMSCP disk/tape serving) with high IPL-8
               fork latency on the Memory Channel target node.

               A forced operator crash-dump and analysis will reveal the
               OpenVMS EXEC looping within the following routines:

               Primary SMP CPU stuck scanning EXE$GL_TQFL
               TQE-queue; check PCs on CPU-0 stack.

               + SYS$PMDRIVER.EXE:      PM$COMQ_RETRY
               V7.2-2: TQE$L_FPC: SYS$PMDRIVER+13CC0
               SDA> FORMAT/TYPE=TQE @.
               SDA> REPEAT ..........

               The OpenVMS EXE$GL_TQFL TQE-timer-queue will be
               corrupted, typically with the first TQE linked back to

               + SDA> VAL QUE EXE$GL_TQFL

               Occasionally, there will be an ACCVIO within
               TIMESCHDL_xxx (SYSTEM_PRIMITIVES) while servicing

               Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE
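
As an illustration of what validating a doubly linked queue involves,
the Python sketch below walks the forward links from a queue header and
checks that each entry's back link points to its predecessor.  It is a
simplified model and not the SDA implementation: the Entry class, the
helper names, and the in-memory layout are all hypothetical, and the
corruption shown is just one possible case chosen for the example.

```python
class Entry:
    """Hypothetical queue entry with forward (flink) and backward (blink) links."""

    def __init__(self, name):
        self.name = name
        self.flink = self
        self.blink = self


def insert_tail(head, entry):
    """Insert entry at the tail of the circular queue rooted at head."""
    tail = head.blink
    entry.flink, entry.blink = head, tail
    tail.flink = entry
    head.blink = entry


def validate_queue(head, limit=10000):
    """Return (ok, entries_walked).

    ok is False if a back link does not match the previous entry, or if
    the walk does not return to the header within limit entries.
    """
    prev, cur, count = head, head.flink, 0
    while cur is not head:
        if cur.blink is not prev or count >= limit:
            return False, count
        prev, cur, count = cur, cur.flink, count + 1
    return head.blink is prev, count


head = Entry("header")
entries = [Entry(f"tqe{i}") for i in range(3)]
for e in entries:
    insert_tail(head, e)
print(validate_queue(head))     # intact queue

entries[0].flink = entries[0]   # illustrative corruption: entry points at itself
print(validate_queue(head))
```

A scan that follows forward links without such a check would never
return to the header of the corrupted queue in this model, which is
consistent with a CPU appearing stuck while scanning the queue.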

Problems addressed in the VMS73_MEM_CHAN-V0100 kit

     o  Memory Channel virtual-hub (VHUB) can fail to come

               1.  A Memory Channel virtual-hub (VHUB) will fail to come
                   "ONLINE" and form SCS virtual-circuit link-up if the
                   Memory Channel VHUB VH0/Master node is not booted
                   first, prior to booting the VHUB VH1/Slave MC-node.

               2.  If a VH0/Master Memory Channel node crashes and/or
                   reboots while the VH1/Slave Memory Channel node
                   remains running, the Memory Channel link will fail
                   and both VHUB Memory Channel nodes MCA0 (and MCB0 if
                   applicable) will remain "OFFLINE".

               This MCx0 "OFFLINE" problem may also occur during
               MCA0/MCB0 adapter/link error-handling/recovery.

               The following symptoms are manifestations of this MC VHUB
               BOOT "OFFLINE" problem:

               OPA0: console errors:

               %MCA0 CPU00:  19-SEP-2000 04:17:50  Slave but adapter_ok
                             off, retrying.
               %MCA0 CPU00:  19-SEP-2000 04:17:50 MC re-init 5 second timer.
               %MCA0 CPU00:  19-SEP-2000 04:17:55 Slave but adapter_ok
                             off, retrying.
               %MCA0 CPU00:  19-SEP-2000 04:17:55 MC re-init 5 second timer.
               ON REMOTE NODE ATTEMPTING MC SW INIT .........
               %MCA0 CPU00:  19-SEP-2000 04:27:50 node state retries exceeded

               DCL SHOW DEVICE command output:

               $ DCL SHOW DEVICE MCA0: & PMA0: (& MCB0:/PMB0:) = OFFLINE:

               $ SHOW DEVICE MC
                Device                  Device           Error
                 Name                   Status           Count
                 MCA0:                  Offline           2
                 MCB0:                  Offline           16
               $ SHOW DEVICE PM
                Device                  Device           Error
                 Name                   Status           Count
                 PMA0:                  Offline            0
                 PMB0:                  Offline            0

               Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
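
When captured SHOW DEVICE output needs to be checked on many nodes, a
short script can pick out the offline devices.  The Python sketch below
assumes the three-column layout shown above (device name, status, error
count); real SHOW DEVICE output may contain other columns, and the
function name offline_devices is hypothetical.

```python
def offline_devices(show_device_output: str) -> dict:
    """Map device name -> error count for every device reported Offline."""
    result = {}
    for line in show_device_output.splitlines():
        fields = line.split()
        # Data rows look like "MCA0:   Offline   2"; header rows are skipped
        # because their first field does not end with a colon.
        if len(fields) == 3 and fields[0].endswith(":") and fields[2].isdigit():
            if fields[1].lower() == "offline":
                result[fields[0]] = int(fields[2])
    return result


sample = """\
 Device                  Device           Error
  Name                   Status           Count
  MCA0:                  Offline           2
  MCB0:                  Offline           16
"""
print(offline_devices(sample))
```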


               An MC_INCONSTATE (SYS$MCDRIVER) bugcheck may occur during
               local/remote Memory Channel node reboot or Memory Channel
               adapter/Memory Channel link error-recovery.  This
               bugcheck can occur regardless of the Memory Channel hub
               configuration:  VHUB or real-HUB.  The MC_INCONSTATE
               bugcheck will typically occur when a "nested error
               (MCDRIVER-internal or MC-adapter HW-error)" is
               encountered while recovering from a Memory Channel link
               error or local/remote Memory Channel node crash/reboot.

               The "MC_INCONSTATE" bugcheck is obvious, and is nearly
               always caused by this "nested error-handling" bug.  A
               typical MCx0:  error-log event sequence, and SDA> crash
               summary are shown below:

               MCx0: ERROR-LOG SUMMARY: Unsuccessful events:
               MCB0 - Hardware error, reinitializing.
               MCB0 -
                       Node 0:     State:  Uninitialized
                       Node 1:     State:  Uninitialized
               MCB0 - Memory channel link online failure 2
               MCB0 - We shouldn't be here.
                       CRASH - MC_INCONSTATE

               Crashdump Summary Information:
               Bugcheck Type:     MC_INCONSTATE, Fatal error
                                  detected by Memory Channel
               Failing PC:        FFFFFFFF.E2983A44  SYS$MCDRIVER+0BA44
               Failing PS:        30000000.00000804
               Module:            SYS$MCDRIVER (Link Date/Time:
                           29-DEC-1999 04:09:37.99)
               Offset:            0000BA44

               Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
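
The relationship between the Failing PC and the Offset fields in the
crash summary above is simple hexadecimal arithmetic: subtracting the
offset from the PC gives the address that the offset is measured from.
The Python sketch below works this through for the values shown; the
function name image_base is hypothetical, and treating the result as
the base of the loaded image is an assumption drawn from the
"SYS$MCDRIVER+0BA44" notation.

```python
def image_base(failing_pc: str, offset: str) -> str:
    """Subtract the module offset from the failing PC.

    failing_pc uses the dotted 64-bit form shown in the crash summary,
    for example "FFFFFFFF.E2983A44"; offset is a hex string such as
    "0000BA44".  The result is returned in the same dotted form.
    """
    pc = int(failing_pc.replace(".", ""), 16)
    base = pc - int(offset, 16)
    text = f"{base:016X}"
    return f"{text[:8]}.{text[8:]}"


print(image_base("FFFFFFFF.E2983A44", "0000BA44"))
```

For the summary above this yields FFFFFFFF.E2978000.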

     o  Memory Channel Receive channel (RX_MESS_CHAN) message
                 processing may hang
               Memory Channel Receive channel (RX_MESS_CHAN) message
               processing may hang after processing 512 RX_MESS_CHAN
               messages during a single fork-thread
               ([MEM_CHAN]MC$HANDLE_MESS_CHAN_INT routine).  This could
               occur with heavy Memory Channel SCS-traffic and high
               IPL-8 fork-thread scheduling latency.  A Memory Channel
               RX_MESS_CHAN message-handling hang will lead to
               CNXMGR/LOCK_MGR stalls (and potential cluster hangs) as
               well as SCS "virtual-circuit timeouts".

               %PMA0 CPU00:  ... MC$_CHAN_QUE_EMPTY
                                 channel = 541C8  ppd = 83DD4CC0
               %PMA0 CPU00:  ... stall state CLEAR
                                 channel = 541C8  ppd = 83DD4CC0
               %MCA0 CPU00:  ... Timeslice exceeded while in workque
                                 for node RM763A
               %MCA0 CPU00:  ... Timeslice exceeded while in workque
                                 for node RM763A
               %MCA0 CPU00:  ... Timeslice exceeded while in workque
                                 for node RM763A
               %PMA0, Virtual Circuit Timeout - REMOTE PORT xxxx

               Error Type/SubType     x4009    Signaled via Packet, Virtual
                                               Circuit Timeout.

               The "...  Timeslice exceeded" error may continue to occur
               after this fix is applied.  However, MC RX_MESS_CHAN
               processing will no longer hang after this event.

               Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
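
Because the "Timeslice exceeded" message may still appear after the fix
is applied, it can be useful to count how often it is logged for each
node in a saved console log.  The Python sketch below is one way to do
that, assuming the message wording shown above; the function name
timeslice_counts and the sample text are illustrative only.

```python
import re
from collections import Counter

# Message wording as shown in the console output above.
TIMESLICE = re.compile(r"Timeslice exceeded while in workque for node (\S+)")


def timeslice_counts(console_text: str) -> Counter:
    """Count "Timeslice exceeded" messages per node name."""
    # Collapse all whitespace first so that messages wrapped across
    # console lines still match as a single string.
    flat = " ".join(console_text.split())
    return Counter(TIMESLICE.findall(flat))


sample = """\
%MCA0 CPU00:  ... Timeslice exceeded
           while in workque for node RM763A
%MCA0 CPU00:  ... Timeslice exceeded while in workque
                  for node RM763A
%PMA0, Virtual Circuit Timeout - REMOTE PORT xxxx
"""
print(timeslice_counts(sample))
```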

     o  MCDRIVER enters an infinite Hardware/Software
        initialization error-retry loop

               Following a boot-time Memory Channel C
               unit-init/self-test "LOOPBACK WRITE TEST" failure, which
               indicates a Memory Channel adapter PCI-DMA error, the
               MCDRIVER will enter an infinite HW/SW initialization
               error-retry loop.  The following OPA0:/console errors
               will be issued at 5 second intervals, changing to 10
               minute intervals after 20 retries:

               %MCA0 CPU00:  ... MC loopback write interrupt test failed.
               %MCA0 CPU00:  ... Couldn't get mgmt lock.
               %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
               %MCA0 CPU00:  ... Couldn't get mgmt lock.
               %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
               %MCA0 CPU00:  ... Couldn't get mgmt lock.
               %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .

               Note:  The first error message occurs on the first pass

               Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
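
The retry timing described above (5 second intervals, changing to 10
minute intervals after 20 retries) can be modelled with a few lines of
Python.  This is a sketch of the schedule only, not driver code: the
function name retry_delays is hypothetical, and the exact point at
which the interval changes is an assumption, since the text does not
say whether the 21st message follows a 5 second or a 10 minute wait.

```python
def retry_delays(total_retries: int, fast: int = 5, slow: int = 600,
                 fast_count: int = 20):
    """Yield the delay in seconds before each retry.

    The first fast_count retries use the short interval; all later
    retries use the long interval (assumed boundary, see above).
    """
    for attempt in range(total_retries):
        yield fast if attempt < fast_count else slow


delays = list(retry_delays(23))
print(sum(delays[:20]))   # total seconds spent in the short-interval phase
print(delays[20:])        # intervals used after the change
```

Under these assumptions the console sees 20 sets of messages in the
first 100 seconds and one further set every 10 minutes after that.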

     o  System crashes with a CPUSPINWAIT, CPU spinwait timer
        expired bugcheck.

               CPUSPINWAIT bugchecks may occur on any GSxxx AlphaServer
               platform (GS140, GS80/160/320) with a Memory Channel
               adapter.  The bugchecks occur due to an error in
               the SYS$MCDRIVER "MC$ALLOCATE_MESSAGE" routine performing
               Memory Channel message free-queue-header "loopback
               WRITE", and an incorrect timer implementation.  The
               CPUSPINWAIT bugcheck will always involve an SMP$TIMEOUT
               acquiring the SCS-spinlock while another SMP-CPU is
               holding the SCS-spinlock within the SYS$MCDRIVER /

               Crashdump Summary Information:
               Bugcheck Type:     CPUSPINWAIT, CPU spinwait timer expired
               Failing PC:        FFFFFFFF.8007A384    SMP$TIMEOUT_C+00064
               Failing PS:        28000000.00000804
               Module:            SYSTEM_SYNCHRONIZATION_MIN
               Offset:            00000384

               NOTE:  The "MC loopback write interrupt test failed"
               error is typically due to a leftover/stale Memory Channel
               adapter PCI-logic error-state that will only clear with a
               CONSOLE >>> INIT operation (to perform PCI-bus RESET).
               Users who frequently reboot without using the CONSOLE >>>
               BOOT_RESET = ON switch (Environment Variable) or without
               performing a CONSOLE >>> INIT command are susceptible to
               this "MC loopback write test" error.

               Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE

     o  System can crash with an INVPTEFMT, Invalid page table
        entry format

               Any SCS-data-transfer of "0-length", using the
               Memory-Channel/MC SCS-port will result in an "INVPTEFMT,
               Invalid page table entry format" bugcheck.  The bugcheck is
               within IOC_STD$PTETOPFN, as a result of a call to

               Crashdump Summary Information:
               Bugcheck Type:     INVPTEFMT, Invalid page table
                                  entry format
               Current Process:   NULL
               Current Image:     
               Failing PC:        FFFFFFFF.800B88FC
               Failing PS:        38000000.00000804
               Module:            IO_ROUTINES (Link Date/Time:
                           13-DEC-2000 00:39:37.49)
               Offset:            000048FC

               Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE

     o  SCS "SEND MESSAGE" and SCS data transfer commands can
        stall or hang

               SCS "SEND MESSAGE" (typically LOCK_MGR and MSCP disk
               commands) and SCS data transfer commands, issued over a
               PM/MC SCS virtual circuit (VC), can stall or hang
               following exhaustion of Memory Channel
               "channel-free-queue" entries.  The duration of this stall
               or hang is entirely dependent on SCS-sysap traffic and
               flow-control (SCS "credit") patterns and will persist
               until one of the following occurs:

                o  SCS VC timeout error closes the VC

                o  SCS-sysap sends a message that breaks the stalemate

                o  SCS VC timeout mechanism sends a message that breaks
                   the stalemate

                o  PMx0:  SCS-port timeout occurs, crashing the MC port

               This SYS$PMDRIVER MC-SCS-command processing hang/stall
               can occur under the following two conditions:

                -  HANG:  Under heavy and primarily unidirectional

                -  STALL:  Under more bi-directional loads, stalls will
                   create low performance over the Memory Channel VC,
                   drastically reducing Memory Channel performance under

               Because this hang/stall will block internode SCS-sysap
               cluster communications, symptoms can be obscure and
               numerous, or may manifest as:

                o  Performance degradation over Memory Channel based SCS

                o  An SCS VC-timeout

                o  A LOCK_MGR stall/hang or performance loss

                o  MSCP served disk command timeouts or disk I/O

                o  Customer LOCK_MGR-dependent application stalls,
                   hangs, or slowdowns

               Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE


This kit requires a system reboot.  Compaq strongly recommends that
a reboot be performed immediately after kit installation to avoid
system instability.

If you have other nodes in your OpenVMS cluster, they must also be
rebooted in order to make use of the new image(s).  If it is not
possible or convenient to reboot the entire cluster at this time, a
rolling re-boot may be performed.


Install this kit with the POLYCENTER Software Installation utility
by logging into the SYSTEM account, and typing the following at the
DCL prompt:


The kit location may be a tape drive, CD, or a disk directory that
contains the kit.

Additional help on installing PCSI kits can be found by typing
HELP PRODUCT INSTALL at the system prompt.

Special Installation Instructions:

     o  Scripting of Answers to Installation Questions

        During installation, this kit will ask and require user
        response to several questions.  If you wish to automate the
        installation of this kit and avoid having to provide responses
        to these questions, you must create a DCL command procedure
        that includes the following definitions and commands:



           -  Add the following qualifiers to the PRODUCT INSTALL
              command and add that command to the DCL procedure.


           -  De-assign the logicals assigned

        For example, a sample command file to install the
        VMS73_MEM_CHAN-V0100 kit would be:

          $ exit

All trademarks are the property of their respective owners.
