Order Number: EKAS120SV. A01

This manual is for anyone who services these systems. It includes troubleshooting information, configuration rules, and instructions for removal and replacement of field-replaceable units.

Digital Equipment Corporation
Maynard, Massachusetts

First Printing, January 1998

Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

The software, if any, described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license. No responsibility is assumed for the use or reliability of software or equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

Copyright 1998 by Digital Equipment Corporation. All rights reserved.

The following are trademarks of Digital Equipment Corporation: AlphaServer, OpenVMS, StorageWorks, VAX, and the DIGITAL logo.

The following are third-party trademarks: Lifestyle 28.8 DATA/FAX Modem is a trademark of Motorola, Inc. UNIX is a registered trademark in the U.S. and other countries, licensed exclusively through X/Open Company Ltd. U.S. Robotics and Sportster are registered trademarks of U.S. Robotics. Windows NT is a trademark of Microsoft, Inc.

All other trademarks and registered trademarks are the property of their respective holders.

FCC Notice: The equipment described in this manual generates, uses, and may emit radio frequency energy.
The equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of FCC Rules, which are designed to provide reasonable protection against such radio frequency interference. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense will be required to take whatever measures are required to correct the interference.

Shielded Cables: If shielded cables have been supplied or specified, they must be used on the system in order to maintain international regulatory compliance.

Warning! This is a Class A product. In a domestic environment this product may cause radio interference, in which case the user may be required to take adequate measures.

Achtung! Dieses ist ein Gerät der Funkstörgrenzwertklasse A. In Wohnbereichen können bei Betrieb dieses Gerätes Rundfunkstörungen auftreten, in welchen Fällen der Benutzer für entsprechende Gegenmaßnahmen verantwortlich ist.

Avertissement! Cet appareil est un appareil de Classe A. Dans un environnement résidentiel, cet appareil peut provoquer des brouillages radioélectriques. Dans ce cas, il peut être demandé à l'utilisateur de prendre les mesures appropriées.

Contents

Preface

Chapter 1 Overview
  1.1 System Enclosure
  1.2 Operator Control Panel and Drives
  1.3 System Consoles
  1.4 System Architecture
  1.5 CPU Types
  1.6 Memory
  1.7 Memory Addressing
  1.8 System Motherboard
    1.8.1 System Bus (Backplane)
    1.8.2 System Bus to PCI Bus Bridge
    1.8.3 PCI I/O Subsystem
    1.8.4 Remote Control Logic
    1.8.5 Power Control Logic
  1.9 Power Circuit and Cover Interlock
  1.10 Power Supply
  1.11 Power Up/Down Sequence
  1.12 Maintenance Bus (I²C Bus)
  1.13 StorageWorks

Chapter 2 Power-Up
  2.1 Control Panel
  2.2 Power-Up Sequence
  2.3 SROM Power-Up Test Flow
  2.4 SROM Errors Reported
  2.5 XSROM Power-Up Test Flow
  2.6 XSROM Errors Reported
  2.7 Console Power-Up Tests
  2.8 Console Device Determination
  2.9 Console Power-Up Display
  2.10 Fail-Safe Loader

Chapter 3 Troubleshooting
  3.1 Troubleshooting with LEDs
  3.2 Troubleshooting Power Problems
  3.3 Running Diagnostics -- Test Command
  3.4 Releasing Secure Mode
  3.5 Testing an Entire System
    3.5.1 Testing Memory
    3.5.2 Testing PCI
  3.6 Other Useful Console Commands

Chapter 4 Error Logs
  4.1 Using Error Logs
    4.1.1 Hard Errors
    4.1.2 Soft Errors
    4.1.3 Error Log Events
  4.2 Using DECevent
    4.2.1 Translating Event Files
    4.2.2 Filtering Events
    4.2.3 Selecting Alternative Reports
  4.3 Error Log Examples and Analysis
    4.3.1 MCHK 670 CPU-Detected Failure
    4.3.2 MCHK 670 CPU and IOD-Detected Failure
    4.3.3 MCHK 670 Read Dirty CPU-Detected Failure
    4.3.4 MCHK 660 IOD-Detected Failure (System Bus Error)
    4.3.5 MCHK 660 IOD-Detected Failure (PCI Error)
    4.3.6 MCHK 630 Correctable CPU Error
    4.3.7 MCHK 620 Correctable Error
  4.4 Troubleshooting IOD-Detected Errors
    4.4.1 System Bus ECC Error
    4.4.2 System Bus Nonexistent Address Error
    4.4.3 System Bus Address Parity Error
    4.4.4 PIO Buffer Overflow Error (PIO_OVFL)
    4.4.5 Page Table Entry Invalid Error
    4.4.6 PCI Master Abort
    4.4.7 PCI System Error
    4.4.8 PCI Parity Error
    4.4.9 Broken Memory
    4.4.10 Command Codes
    4.4.11 Node IDs
  4.5 Double Error Halts and Machine Checks While in PAL Mode
    4.5.1 PALcode Overview
    4.5.2 Double Error Halt
    4.5.3 Machine Checks While in PAL

Chapter 5 Error Registers
  5.1 External Interface Status Register - EI_STAT
  5.2 External Interface Address Register - EI_ADDR
  5.3 MC Error Information Register 0 (MC_ERR0 - Offset = 800)
  5.4 MC Error Information Register 1 (MC_ERR1 - Offset = 840)
  5.5 CAP Error Register (CAP_ERR - Offset = 880)
  5.6 PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040)

Chapter 6 Removal and Replacement
  6.1 System Safety
  6.2 FRU List
  6.3 System Exposure
  6.4 CPU Removal and Replacement
  6.5 CPU Fan Removal and Replacement
  6.6 Memory Riser Card Removal and Replacement
  6.7 DIMM Removal and Replacement
  6.8 System Motherboard Removal and Replacement
  6.9 PCI/EISA Option Removal and Replacement
  6.10 Power Supply Removal and Replacement
  6.11 Power Harness Removal and Replacement
  6.12 System Fan Removal and Replacement
  6.13 Cover Interlock Removal and Replacement
  6.14 Operator Control Panel Removal and Replacement
  6.15 CD-ROM Removal and Replacement
  6.16 Floppy Removal and Replacement
  6.17 SCSI Disk Removal and Replacement
  6.18 StorageWorks Backplane Removal and Replacement
  6.19 StorageWorks Ultra SCSI Bus Extender Removal and Replacement

Appendix A Running Utilities
  A.1 Running Utilities from a Graphics Monitor
  A.2 Running Utilities from a Serial Terminal
  A.3 Running ECU
  A.4 Running RAID Standalone Configuration Utility
  A.5 Updating Firmware with LFU
    A.5.1 Updating Firmware from the CD-ROM
    A.5.2 Updating Firmware from the Floppy Disk -- Creating the Diskettes
    A.5.3 Updating Firmware from the Floppy Disk -- Performing the Update
    A.5.4 Updating Firmware from a Network Device
    A.5.5 LFU Commands
  A.6 Updating Firmware from AlphaBIOS
  A.7 Upgrading AlphaBIOS

Appendix B Halts, Console Commands, and Environment Variables
  B.1 Halt Button Functions
  B.2 Using the Halt Button
  B.3 Halt Assertion
  B.1 Summary of SRM Console Commands
    B.1.1 Summary of SRM Environment Variables
  B.2 Recording Environment Variables

Appendix C Operating the System Remotely
  C.1 RCM Overview
  C.2 First-Time Setup
    C.2.1 Configuring the Modem
    C.2.2 Dialing In and Invoking RCM
    C.2.3 Using RCM Locally
  C.3 RCM Commands
  C.4 Dial-Out Alerts
  C.5 Using the RCM Switchpack
  C.6 Troubleshooting Guide
  C.7 Modem Dialog Details

Index

Examples
  2-1 SROM Errors Reported at Power-Up
  2-2 XSROM Errors Reported at Power-Up
  3-1 Test Command Syntax
  3-2 Releasing/Reestablishing Secure Mode
  3-3 Sample Test Command
  3-4 Sample Test Memory Command
  3-5 Sample Test Command for PCI
  3-6 Show Power
  3-7 Show Memory
  3-8 Show FRU
  4-1 MCHK 670
  4-2 MCHK 670 CPU and IOD-Detected Failure
  4-3 MCHK 670 Read Dirty Failure
  4-4 MCHK 660 IOD-Detected Failure (System Bus Error)
  4-5 MCHK 660 IOD-Detected Failure (PCI Error)
  4-6 MCHK 630 Correctable CPU Error
  4-7 MCHK 620 Correctable Error
  4-8 INFO 3 Command
  4-9 INFO 5 Command
  4-10 INFO 8 Command
  A-1 Starting LFU from the SRM Console
  A-2 Booting LFU from the CD-ROM
  A-3 Updating Firmware from the Internal CD-ROM
  A-4 Creating Update Diskettes on an OpenVMS System
  A-5 Updating Firmware from the Internal Floppy Disk
  A-6 Selecting AS1200FW to Update Firmware from the Internal Floppy
  A-7 Updating Firmware from a Network Device
  C-1 Sample Remote Dial-In Dialog
  C-2 Invoking and Leaving RCM Locally
  C-3 Configuring the Modem for Dial-Out Alerts
  C-4 Typical RCM Dial-Out Command

Figures
  1-1 System Enclosure
  1-2 Cover Interlock Circuit
  1-3 Control Panel Assembly
  1-4 Architecture Diagram
  1-5 CPU Module Placement
  1-6 Memory Placement
  1-7 How Memory Addressing Is Calculated
  1-8 System Motherboard
  1-9 System Bus Block Diagram
  1-10 System Bus to PCI Bus Bridge Block Diagram
  1-11 PCI Block Diagram
  1-12 Remote Control Logic
  1-13 Power Control Logic
  1-14 Power Circuit Diagram
  1-15 Back of Power Supply and Location
  1-16 Power Up/Down Sequence Flowchart
  1-17 I²C Bus Block Diagram
  1-18 StorageWorks Drive Location
  2-1 Control Panel and LCD Display
  2-2 Power-Up Flow
  2-3 Contents of FEPROMs
  2-4 Console Code Critical Path (1200 Block Diagram)
  2-5 SROM Power-Up Test Flow
  2-6 XSROM Power-Up Flowchart
  2-7 Console Device Determination Flowchart
  3-1 System Motherboard LEDs
  4-1 Error Detector Placement
  6-1 System FRU Locations
  6-2 Exposing the System
  6-3 Removing CPU Module
  6-4 Removing CPU Fan
  6-5 Removing Memory Riser Card
  6-6 Removing a DIMM from a Memory Riser Card
  6-7 Removing System Motherboard
  6-8 Removing PCI/EISA Option
  6-9 Removing Power Supply
  6-10 Removing Power Harness
  6-11 Removing System Fan
  6-12 Removing Cover Interlocks
  6-13 Removing OCP
  6-14 Removing CD-ROM
  6-15 Removing Floppy
  6-16 Removing StorageWorks Disk
  6-17 Removing StorageWorks Backplane
  6-18 Removing StorageWorks Ultra SCSI Bus Extender
  A-1 Running a Utility from a Graphics Monitor
  A-2 Starting LFU from the AlphaBIOS Console
  A-3 AlphaBIOS Setup Screen
  A-1 System Partition Not Defined
  C-1 RCM Connections
  C-2 Location of RCM Switchpack on System Board
  C-3 RCM Switches (Factory Settings)

Tables
  1-1 PCI Motherboard Slot Numbering
  2-1 Control Panel Display
  2-2 SROM Tests
  2-3 XSROM Tests
  2-4 Memory Tests
  2-5 IOD Tests
  2-6 PCI Motherboard Tests
  4-1 Types of Error Log Events
  4-2 DECevent Report Formats
  4-3 CAP Error Register Data Pattern
  4-4 System Bus ECC Error Data Pattern
  4-5 System Bus Nonexistent Address Error Troubleshooting
  4-6 Address Parity Error Troubleshooting
  4-7 Cause of PIO_OVFL Error
  4-8 ECC Syndrome Bits Table
  4-9 Decoding Commands
  4-10 Node IDs
  5-1 External Interface Status Register
  5-2 Loading and Locking Rules for External Interface Registers
  5-3 MC Error Information Register 0
  5-4 MC Error Information Register 1
  5-5 CAP Error Register
  5-6 PCI Error Status Register 1
  6-1 Field-Replaceable Unit Part Numbers
  A-1 AlphaBIOS Option Key Mapping
  A-2 File Locations for Creating Update Diskettes on a PC
  A-3 LFU Command Summary
  B-1 Results of Pushing the Halt Button
  B-2 Summary of SRM Console Commands
  B-3 Environment Variable Summary
  C-1 RCM Command Summary
  C-2 RCM Status Command Fields
  C-3 Elements of the Dial-Out String
  C-4 RCM Troubleshooting

Preface

Intended Audience

This manual is written for the customer service engineer.

Document Structure

This manual uses a structured documentation design. Topics are organized into small sections for efficient online and printed reference. Each topic begins with an abstract, followed by an illustration or example, and ends with descriptive text.

This manual has six chapters and three appendixes, as follows:

  • Chapter 1, System Overview, introduces the DIGITAL AlphaServer 1200 and the DIGITAL Ultimate Workstation 533 systems. It describes each system component.
  • Chapter 2, Power-Up, provides information on how to interpret the power-up display on the operator control panel, the console screen, and system LEDs. It also describes how hardware diagnostics execute when the system is initialized.
  • Chapter 3, Troubleshooting, describes troubleshooting during power-up and booting, as well as the test command.
  • Chapter 4, Error Logs, explains how to interpret error logs and how to use DECevent.
  • Chapter 5, Error Registers, describes the error registers used to hold error information.
  • Chapter 6, Removal and Replacement, describes removal and replacement procedures for field-replaceable units (FRUs).
  • Appendix A, Running Utilities, explains how to run utilities such as the EISA Configuration Utility and RAID Standalone Configuration Utility.
  • Appendix B, Halts, Console Commands, and Environment Variables, summarizes the commands used to examine and alter the system configuration.
  • Appendix C, Operating the System Remotely, describes how to use the Remote Console Manager (RCM) to monitor and control the system remotely.
Documentation Titles

Table 1 lists books in the documentation set for both systems.

Table 1 System Documentation

  Title                                                   Order Number
  ------------------------------------------------------  ------------
  User and Installation Documentation Kit                 QZ011AAGW
    AlphaServer 1200 User's Guide                         EKAS120UG
    AlphaServer 1200 Basic Installation                   EKAS120IG
  User and Installation Documentation Kit                 QZ013AAGW
    DIGITAL Ultimate Workstation 533 User's Guide         EKUW120UG
    DIGITAL Ultimate Workstation 533 Basic Installation   EKUW120IG
  Service Information
    AlphaServer 1200/DIGITAL Ultimate Workstation 533
    Service Manual                                        EKAS120SV

Information on the Internet

Using a Web browser, you can access the AlphaServer InfoCenter at:

  http://www.digital.com/info/alphaserver/products.html

Access the latest system firmware either with a Web browser or via FTP as follows:

  ftp://ftp.digital.com/pub/Digital/Alpha/firmware/

Interim firmware released since the last firmware CD is located at:

  ftp://ftp.digital.com/pub/Digital/Alpha/firmware/interim/

Chapter 1 System Overview

The DIGITAL AlphaServer 1200 and DIGITAL Ultimate Workstation 533 systems are made from the same base system unit. The base unit consists of up to two CPUs, up to 2 Gbytes of memory, 6 I/O slots, and up to 7 SCSI storage devices. Both systems are enclosed in pedestals. AlphaServer 1200 systems can be mounted in a standard 19" rack.

AlphaServer 1200 systems support OpenVMS, DIGITAL UNIX, and Windows NT. Ultimate Workstation 533 systems support Windows NT and graphics.

Topics in this chapter include the following:

  • System Enclosure
  • Operator Control Panel and Drives
  • System Consoles
  • System Architecture
  • CPU Types
  • Memory
  • Memory Addressing
  • System Motherboard
  • System Bus (Backplane)
  • System Bus to PCI Bus Bridge
  • PCI I/O Subsystem
  • Remote Control Logic
  • Power Control Logic
  • Power Circuit and Cover Interlock
  • Power Supply
  • Power Up/Down Sequence
  • Maintenance Bus (I²C Bus)
  • StorageWorks Drives

1.1 System Enclosure

The system has up to two CPU modules and up to 2 Gbytes of memory.
A single fast wide or fast wide Ultra SCSI StorageWorks shelf provides storage.

Figure 1-1 System Enclosure

The numbered callouts in Figure 1-1 refer to the system components.

  1. System card cage, which holds the system motherboard and the CPU, memory, and system I/O.
  2. PCI/EISA section of the system card cage.
  3. Operator control panel assembly, which includes the control panel, the LCD display, and the floppy drive.
  4. CD-ROM drive.
  5. Cooling section containing two fans.
  6. StorageWorks shelf.

Cover Interlock

The system has a single cover interlock switch tripped by the top cover.

Figure 1-2 Cover Interlock Circuit
(Diagram labels: Power Supply, Cover Interlock Switch, Push-button ON/OFF Switch, OCP, Motherboard, J2, J7, J30, DC_ENABLE_L)

NOTE: The cover interlock must be engaged to enable power-up. To override the cover interlock, use a suitable object to close the interlock circuit. Disk damage will result if the system is run with the top cover off.

1.2 Operator Control Panel and Drives

The control panel includes the On/Off, Halt, and Reset buttons and an LCD display.

Figure 1-3 Control Panel Assembly
(Diagram labels: CD-ROM, Floppy, OCP Display, and numbered callouts 1 to 3)

OCP display. The OCP display is a 16-character LCD that indicates status during power-up and self-test. While the operating system is running, the LCD displays the system type. Its controller is on the XBUS.

CD-ROM. The CD-ROM drive is used to load software, firmware, and updates. Its controller is on PCI1 on the PCI backplane on the system motherboard.

Floppy disk. The floppy drive is used to load software and firmware updates. The floppy controller is on the XBUS on the PCI backplane on the system motherboard.

On/Off button. Powers the system on or off. When the LED to the right of the button is lit, the power is on. The On/Off button is connected to the power supplies through the system interlock and the RCM logic.

Reset button.
Initializes the system.

Halt button. The main function of the Halt button is to stop whatever the machine is doing and return the system to the SRM console. The exact result depends on the state of the machine when the button is pressed:
  On systems running OpenVMS or DIGITAL UNIX, press the Halt button to get to the SRM console.
  On systems running Windows NT, press the Halt button and then press the Reset button. (Pressing the Halt button when the system is running Windows NT causes a "halt assertion" flag to be set in the firmware. When Reset is pressed, the console reads the "halt assertion" flag and ignores environment variables that would cause the system to boot.)
See Section B.1 for a full discussion of the Halt button.

1.3 System Consoles

There are two console programs: the SRM console and the AlphaBIOS console.

SRM Console Prompt

On systems running the DIGITAL UNIX or OpenVMS operating system, the following console prompt is displayed after system startup messages are displayed, or whenever the SRM console is invoked:

  P00>>>

NOTE: The console prompt displays only after the entire power-up sequence is complete. This can take up to several minutes if the memory is very large.

AlphaBIOS Boot Menu

On systems running the Windows NT operating system, the Boot menu is displayed when the AlphaBIOS console is invoked:

  AlphaBIOS 5.32
  Please select the operating system to start:
    Windows NT Server 4.0
  Press Enter to choose.

SRM Console

The SRM console is a command-line interface that is used to boot the DIGITAL UNIX and OpenVMS operating systems. It also provides support for examining and modifying the system state and configuring and testing the system.
The SRM console can be run from a serial terminal or a graphics monitor.

AlphaBIOS Console

The AlphaBIOS console is a menu-based interface that supports the Microsoft Windows NT operating system. AlphaBIOS is used to set up operating system selections, boot Windows NT, and display information about the system configuration. The EISA Configuration Utility and the RAID Standalone Configuration Utility are run from the AlphaBIOS console. AlphaBIOS runs on either a serial or graphics terminal. Windows NT requires a graphics monitor.

Environment Variables

Environment variables are software parameters that define, among other things, the system configuration. They are used to pass information to different pieces of software running in the system at various times. The os_type environment variable, which can be set to VMS, UNIX, or NT, determines which of the two consoles is used. The SRM console is always brought into memory, but AlphaBIOS is loaded if os_type is set to NT and the Halt LED is not lit.

Refer to Appendix B of this guide for a list of the environment variables used to configure a system. Refer to your system User's Guide for information on setting environment variables.

Most environment variables are stored in the NVRAM that is placed in a socket on the system motherboard. Even though the NVRAM can be removed and replaced on a new system motherboard, it is recommended that you keep a record of the environment variables for each system that you service. Some environment variable settings are lost when a module is swapped and must be restored after the new module is installed. Refer to Appendix B for a convenient worksheet for recording environment variable settings.

1.4 System Architecture

Alpha microprocessor chips are used in these systems. The CPU, memory, and the I/O modules are connected to the system motherboard.
Figure 1-4 Architecture Diagram

Note: When the EISA/ISA slot on PCI Bus 0 is used, the last PCI slot on PCI Bus 1 is not available.

Both systems use the Alpha chip for the CPU. The CPU, memory, and I/O devices connect to the system motherboard. The system motherboard contains:
  The system bus
  Two system bus to PCI bus chip sets that bridge two PCI buses to the system bus
  Two 64-bit PCI buses with three PCI option slots each
  One EISA/ISA bus bridged to one of the PCIs (if an EISA/ISA option is used, one PCI slot cannot be used)
  One CD-ROM controller built in to the other PCI
  One EISA/ISA to XBUS bridge to the built-in XBUS options

A fully configured system can have two CPUs, eight DIMM memory pairs, and a total of six I/O options. The I/O options can be all PCI options or a combination of PCI options and a single EISA/ISA option.

The system bus has a 128-bit data bus, protected by 16 bits of ECC, and a 40-bit command/address bus, protected by parity. The bus speed is set to 66.6 MHz. The 40-bit address bus can address one terabyte (roughly a million million bytes). The bus connects CPUs, memory, and the system bus to PCI bus bridge(s).

There is a cache external to the CPU chip on CPU modules. The Alpha chip has an 8-Kbyte instruction cache (I-cache), an 8-Kbyte write-through data cache (D-cache), and a 96-Kbyte, write-back secondary data cache (S-cache). The cache system is write-back. The system supports up to two CPUs.
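The system bus figures quoted in this section can be checked with a little arithmetic. The following Python sketch is illustrative only and is not part of any DIGITAL software. The constant and function names are invented for this example, and the bandwidth figure assumes one 16-byte transfer on every bus cycle, which ignores arbitration and command overhead, so treat it as a theoretical ceiling rather than a measured rate.

```python
# Figures taken from the text: 128 data bits + 16 ECC bits, a 40-bit
# command/address bus, and a 66.6 MHz bus clock.
DATA_BUS_BITS = 128
ECC_BITS = 16
ADDRESS_BITS = 40
BUS_CLOCK_HZ = 66_600_000


def data_path_lines(data_bits: int, ecc_bits: int) -> int:
    """Total number of lines carrying data plus ECC check bits."""
    return data_bits + ecc_bits


def addressable_bytes(address_bits: int) -> int:
    """Size of the address space reachable with the given address width."""
    return 1 << address_bits


def peak_bytes_per_second(data_bits: int, clock_hz: int) -> int:
    """Upper bound, ASSUMING one full-width data transfer per bus cycle."""
    return (data_bits // 8) * clock_hz


if __name__ == "__main__":
    print("data + ECC lines :", data_path_lines(DATA_BUS_BITS, ECC_BITS))
    print("address space    :", addressable_bytes(ADDRESS_BITS), "bytes")
    print("peak transfer    :",
          peak_bytes_per_second(DATA_BUS_BITS, BUS_CLOCK_HZ), "bytes/s")
```

The 144 lines on the data path are the 128 data bits and the 16 ECC bits together; the ECC bits are carried in addition to the data bits, not as part of them.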
Memory on these systems is constructed of DIMM memory pairs placed onto two memory modules called riser cards. The riser cards are placed into the two memory slots on the system motherboard. One member of a DIMM pair is placed onto one riser card, and the other member is placed onto the other riser card. Each riser card drives half of the system bus, along with the associated ECC bits. Memory pairs consist of two synchronous DIMMs of the same size and are placed into the same slot on each riser card.

The system bus to PCI bus bridge chip set translates system bus commands and data addressed to I/O space to PCI commands and data. It also translates PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. The PCI bus is a 64-bit wide bus used for I/O.

Logic and sensors on the system motherboard monitor power status and the system environment (temperature and fan speeds).

1.5 CPU Types

There are several CPU variants differentiated by CPU speeds.

Figure 1-5 CPU Module Placement

Alpha Chip Composition

The Alpha chip is made using state-of-the-art chip technology, has a transistor count of 9.3 million, consumes 50 watts of power, and is air cooled (a fan is on the chip). The default cache system is write-back and when the module has an external cache, it is write-back. The Alpha chip used in these systems is the 21164.
Chip Description

Unit          Description
Instruction   8-Kbyte cache, 4-way issue
Execution     4-way execution; 2 integer units, 1 floating-point adder,
              1 floating-point multiplier
Memory        Merge logic, 8-Kbyte write-through first-level data cache,
              96-Kbyte write-back second-level data cache, bus interface unit

CPU Variants

Module Variant   Clock Frequency   Onboard Cache   Color
B3007-AA         400 MHz           4 Mbytes        Orange
B3007-CA         533 MHz           4 Mbytes        Violet

CPU Configuration Rules
  The first CPU must be in CPU slot 0 to provide the system clock.
  The second CPU should be installed in CPU slot 1.
  Both CPUs must have the same Alpha chip clock speed. The system bus may hang without an error message if the oscillators clocking the CPUs are different.

1.6 Memory

Memory consists of two riser cards and up to eight pairs of DIMMs. Each riser card receives one of the two DIMMs in the DIMM pair. There are two DIMM variants: a 32-Mbyte version and a 128-Mbyte version.

Figure 1-6 Memory Placement

Memory Variants

Memory consists of two riser cards supporting eight DIMM pairs. There are two DIMM variants: a 32-Mbyte version and a 128-Mbyte version. Maximum memory using 32-Mbyte DIMMs is 128 Mbytes and the maximum memory using 128-Mbyte DIMMs is 2 Gbytes. All memory is synchronous.

                                     DRAM
Option     Size     Module        Type     Number   Size
MS300-BA   64 MB    54-25084-DA   Synch.   18       4M x 72 =
                    20-47405-D3                     32MB
MS300-DA   256 MB   54-25092-DA   Synch.   18       16M x 72 =
                    20-45619-D3                     128MB

Memory Operation

Each DIMM in the pair provides half the data, or 64 bits plus 8 ECC bits, of the octaword (16 byte) transferred on the system bus.
DIMMs are placed in slots on the riser cards, and the riser cards are placed in the slots designated MEM L and MEM H on the system motherboard.

NOTE: Memory in slot MEM L does not drive the lower 8 bytes, and memory in slot MEM H does not drive the higher 8 bytes of the 16-byte transfer. Some bits originating from MEM L are high order bits, and some bits originating from MEM H are low order bits.

Memory drives the system bus in bursts. Upon each memory fetch, data is transferred in 4 consecutive cycles transferring 64 bytes.

Memory Configuration Rules

In a system, memories of different sizes are permitted, but:
  DIMMs are installed and used in pairs. Both DIMMs in a memory pair must be of the same size.
  Each riser card receives one DIMM of the DIMM pair.
  The largest DIMM pair must be in riser card slot 0.
  Other memory pairs must be the same size or smaller than the first memory pair.
  Memory pairs must be installed in consecutive slots.
  Memory configurations that have a 64-Mbyte pair in riser card slot 0 are limited to two DIMM pairs or 128 Mbytes for the system. (The reason for this restriction is that the bit map describing memory holes can grow larger than physical memory.)

1.7 Memory Addressing

Memory addressing in these systems is fixed regardless of the size of the DIMMs. The address of a DIMM pair is fixed according to the slot in which the pair is placed. The address of each pair in each slot on the riser card starts on a 512-Mbyte boundary.

Figure 1-7 How Memory Addressing Is Calculated

Riser Card Slot   Starting Address   Address Space (Gbytes)
7                 e0000000           3.5
6                 c0000000           3.0
5                 a0000000           2.5
4                 80000000           2.0
3                 60000000           1.5
2                 40000000           1.0
1                 20000000           0.5
0                 00000000           0

The rules for addressing memory are as follows:
1. A memory pair consists of two DIMMs of the same size.
2. Memory pairs in riser cards may be of different sizes.
3. The memory pair in slot 0 must be the largest of all memory pairs.
   Other memory pairs may be as large but none may be larger.
4. The physical starting address of each memory pair is N times 512 Mbytes (2000 0000 hexadecimal), where N is the slot number on the riser card.
5. Memory addresses are contiguous within each memory pair.
6. If memory pairs do not completely fill the 512-Mbyte space provided, memory "holes" occur in the physical address space.
7. Software creates contiguous virtual memory even though physical memory may not be contiguous.

1.8 System Motherboard

The system motherboard contains five major logic sections performing five major system functions.

Figure 1-8 System Motherboard

The five sections on the system motherboard are:
  The system bus or the CPU and memory backplane
  The power control logic
  The remote control logic
  The system bus to PCI bus bridges
  The PCI backplane containing two PCI buses, an EISA/ISA bus, a built-in CD-ROM controller, and an XBUS with several devices integral to the system

1.8.1 System Bus (Backplane)

The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals and clocks. The system bus is part of the system motherboard.
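The memory addressing rules of Section 1.7 can be sketched in a few lines of Python. This is an illustration only, not console or firmware code. The function name, the tuple layout, and the choice to express pair sizes in Mbytes are assumptions made for the example, and the sketch accepts only the two pair sizes listed under Memory Variants (64 and 256 Mbytes).

```python
SLOT_STRIDE = 512 * 1024 * 1024   # 0x2000_0000: one 512-Mbyte window per slot
MAX_SLOTS = 8                     # riser card slots 0 through 7
MBYTE = 1024 * 1024
PAIR_SIZES_MB = (64, 256)         # pair sizes listed under Memory Variants


def memory_map(pair_sizes_mb):
    """Return (slot, start, end, unused) for each installed DIMM pair.

    pair_sizes_mb lists pair sizes in Mbytes, slot 0 first, with pairs
    installed in consecutive slots. 'unused' is the part of the slot's
    512-Mbyte window that the pair does not fill.
    """
    if len(pair_sizes_mb) > MAX_SLOTS:
        raise ValueError("at most eight DIMM pairs are supported")
    for size_mb in pair_sizes_mb:
        if size_mb not in PAIR_SIZES_MB:
            raise ValueError(f"unsupported pair size: {size_mb} Mbytes")
    # Rule 3: no pair may be larger than the pair in slot 0.
    if any(size_mb > pair_sizes_mb[0] for size_mb in pair_sizes_mb[1:]):
        raise ValueError("the largest pair must be in slot 0")

    regions = []
    for slot, size_mb in enumerate(pair_sizes_mb):
        size = size_mb * MBYTE
        start = slot * SLOT_STRIDE          # rule 4: N times 512 Mbytes
        end = start + size - 1              # rule 5: contiguous within a pair
        regions.append((slot, start, end, SLOT_STRIDE - size))
    return regions


if __name__ == "__main__":
    for slot, start, end, unused in memory_map([256, 64]):
        print(f"slot {slot}: {start:08x}-{end:08x}, "
              f"{unused // MBYTE} Mbytes of the window unused")
```

Because neither supported pair size fills a 512-Mbyte window, any system with more than one pair has holes in its physical address space, which is why rule 7 leaves it to software to present contiguous virtual memory.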
Figure 1-9 System Bus Block Diagram

The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals, clocks, and a bus arbiter. The bus requires that all CPUs have the same high-speed oscillator providing the clock to the Alpha chip.

The system bus connects up to two CPUs, up to eight DIMM memory pairs on two riser cards, and two I/O bus bridges.

The system bus clock is provided by an oscillator on the CPU in slot CPU0. This oscillator is adjusted to maintain the system bus at a 66 MHz speed no matter what the speed of the CPU is.

The system bus backplane initiates memory refresh transactions.

Five volt, 3.43 volt, and 12 volt power is provided directly to the motherboard from the power supplies.

1.8.2 System Bus to PCI Bus Bridge

The bridge is the physical interconnect between the system bus and the PCI bus.

Figure 1-10 System Bus to PCI Bus Bridge Block Diagram

The system bus to PCI bus bridge module converts system bus commands and data addressed to I/O space to PCI commands and data, and converts PCI bus commands and data addressed to system memory or CPUs to system bus commands and data.

The bridge has two major components:
  Command/address processor (CAP) chip
  Two data path chips (MDPA and MDPB)
There are two sets of these three chips, one set for each PCI.

The interface on the system bus side of the bridge responds to system bus commands addressed to the upper 64 Gbytes of I/O space. I/O space is addressed whenever bit <39> on the system bus address lines is set.
The space so defined is 512 Gbytes in size. The first 448 Gbytes are reserved and the last 64 Gbytes, when bits <38:36> are set, are mapped to the PCI I/O buses.

The interface on the PCI side of the bridge responds to commands addressed to CPUs and memory on the system bus. On the PCI side, the bridge provides the interface to the PCIs. Each PCI bus is addressed separately. The bridge does not respond to devices communicating with each other on the same PCI bus. However, should a device on one PCI bus address a device on the other PCI bus, commands, addresses, and data run through the bridge out onto the system bus and back through the bridge to the other PCI bus.

In addition to its bridge function, the system bus to PCI bus bridge module monitors every transaction on the system bus for errors. It monitors the data lines for ECC errors and the command/address lines for parity errors.

1.8.3 PCI I/O Subsystem

The I/O subsystem consists of two 64-bit PCI buses. One has an embedded EISA/ISA bridge and three PCI option slots; the other has a built-in CD-ROM controller and three PCI option slots.

Figure 1-11 PCI Block Diagram

Table 1-1 PCI Motherboard Slot Numbering

Slot   PCI0                     PCI1
1      PCI to EISA/ISA bridge   Internal CD-ROM controller
2      PCI slot                 PCI slot
3      PCI slot                 PCI slot
4      PCI slot                 PCI slot

The logic for the two PCI buses is on the system motherboard.

PCI0 is a 64-bit bus with a built-in PCI to EISA/ISA bus bridge. PCI0 has three PCI slots and one EISA/ISA slot.
When the EISA/ISA slot is used, PCI slot 4 on PCI bus 1 is not available. An 8-bit XBUS is connected to the EISA/ISA bus. On this bus there is an interface to the system I2C bus; mouse and keyboard support; an I/O combo controller supporting two serial ports, the floppy controller, and a parallel port; a real-time clock; two 1-Mbyte flash ROMs containing system firmware; and an 8-Kbyte NVRAM.

PCI1 is a 64-bit bus with a built-in CD-ROM SCSI controller and three PCI slots.

Cable connectors to the CD-ROM, the floppy, and the OCP are on the motherboard. Connectors for the mouse, keyboard, two COM ports, the parallel port, and a modem are on the system bulkhead. The bulkhead is part of the system motherboard.

1.8.4 Remote Control Logic

A section of the motherboard provides remote control operation of the system. A four-switch switchpack enables or disables remote control features.

Figure 1-12 Remote Control Logic

The system allows both local and remote control. A set of switches enables or disables remote control.

Table 1-2 Remote Control Switch Functions

Switch        Condition       Function
1 EN RCM      On (default)    Allows remote system control
              Off             Does not allow remote system control
2 Modem Off   On              Disables the RCM modem port
              Off (default)   Enables the RCM modem port
3 RPD DIS     On              Disables remote power down
              Off (default)   Enables remote power down
4 SET DEF     On              Resets the RCM microprocessor defaults
              Off (default)   Allows use of conditions set by the user

The default settings allow complete remote control. To obtain any other level of control, the user must change the switch settings. See Appendix C for information on controlling the system remotely.

The remote console manager connects to a modem through the modem port on the bulkhead. The RCM uses VAUX power provided by the system power supplies.
The standard I/O ports (keyboard, mouse, COM1 and COM2 serial ports, and parallel ports) are on the same bulkhead.

1.8.5 Power Control Logic

The power control section of the motherboard controls power sequencing and monitors power supply voltage, system temperature, and fans.

Figure 1-13 Power Control Logic

The power control logic performs these functions:
  Monitors system temperature and powers down the system 30 seconds after it detects that the internal temperature of the system is above the value of the environment variable over_temp. Default = 55° C.
  Monitors the system and CPU fans at one-second intervals and powers down the system 30 seconds after it detects a fan failure.
  Provides some visual indication of faults through LEDs.
  Controls reset sequencing.
  Provides an I2C interface for fans, power supplies, and temperature signals:
    Power supply 0, 1: present
    Power supply 0, 1: power OK
    CPU fan 0, 1: OK
    CPU 1: present
    Overtemp: Temp OK
    System fan 0, 1: OK
    Fan Kit OK

1.9 Power Circuit and Cover Interlock

Power is distributed throughout the system. The circuit can be broken mechanically by the On/Off switch or the cover interlock, or remotely through the RCM.

Figure 1-14 Power Circuit Diagram

Figure 1-14 shows the distribution of power throughout the system. DC power applied to the system is interrupted by an open in the circuit, by the RCM signal RCM_DC_EN_L, or by a power fault detected by a power supply. The opens can be caused by the On/Off button or the cover interlock.

A failure anywhere in the circuit will result in the removal of DC power. A potential failure is the relay used in the remote control logic to control the RCM_DC_EN_L signal.

The cover interlock is located under the top cover between the system card cage and the storage area.
To override the interlock, place a suitable object in the interlock switch that closes it.

1.10 Power Supply

Two power supplies provide system power.

Figure 1-15 Back of Power Supply and Location