Project EE/06/B/FPP-169000 Learning Materials for Information Technology Professionals (EUCIP-Mat)


Authors: P. Prinetto, Politecnico di Torino, A. Cilardo, L. Coppolino and N. Mazzocca, University “Federico II” Naples, Italy

1. Number of study hours: 15

2. Short description of the course

This module provides the fundamentals of modern computer architecture. The module gives a detailed description of hardware components usually found in a computer system and the internal architecture of both computer systems and processors. A significant portion of the module is devoted to input/output units and components involved in user interaction.

In particular, this module:
• describes the key features of computer system architectures
• presents the structure of a processor, introducing concepts such as registers, memories, control logic, etc.
• explains the life cycle of programs
• provides both theoretical and practical information on memories normally found in a computer
• presents the mechanisms used to support interaction with input/output units
• gives a detailed explanation of the structure of uniprocessor systems
• presents mass storage devices
• describes the characteristics of the main peripherals normally used in modern computers
• gives an overview of the different types of modern computer systems.

4. Target groups

The employers of IT core level professionals are the target sector. The project objectives directly address the promotion of high standards of knowledge and skills in the IT area and, in particular, provide an innovative approach to training. The first target group consists of IT students (vocational school basic-level IT training and the first courses of colleges and universities) in the technology area and IT practitioners who do not yet hold vocational certificates.

5. Prerequisites

As the first module of the EUCIP Operate section, this module does not require any specific prior technical knowledge from the reader.

6. Aim of the course - learning outcomes

On completing this module, the reader will know the basic aspects of processor architecture, low-level programming, the internal organization of a computer, and the role of the different components within a computer system, with special emphasis on input/output units.

7. C.1 Computing components and architectures

C.1.1 Main Hardware

C.1.1.1 The main components of a computer system

A computer is a device that executes programs, i.e., predefined sequences of operations (instructions), and interacts with the external world in order to receive input information (input data) and produce the corresponding output information (output data). A key feature of these systems is that they can solve problems of a different nature by using different, specifically designed programs (software), while the physical structure (hardware) remains the same. A program is described by means of a language (programming language), characterized by a specific set of syntactic and semantic rules. Although each language is potentially able to describe any sequence of operations, having several programming languages enables us to choose the most appropriate one for a specific problem.

The structure of a computer system

A computer system usually includes the following functional units:
• Input/Output Units, enabling the exchange of information with the external world (users, peripherals, other computers, ...) interacting with the computer. They convert such information, when necessary, into formats compatible with those used internally by the system.
• Memory Unit, needed to store programs and data (input and output data, and intermediate results) to be processed.
• Processing Unit (Processor), which decodes and executes program instructions stored in the Memory Unit.
• Interconnection Unit (Bus), needed to connect the above units and allow the required exchange of information.

The architecture of a computer system

The processing unit

While executing a program, the processing unit, or processor, iterates the same sequence of basic steps (known as the Von Neumann cycle):
• Reading memory and storing the next instruction to be executed within the processor (Instruction Fetch step)
• Figuring out which elementary operations the processor needs to perform in order to execute the instruction just loaded (Decode step)
• Identifying and loading the appropriate instruction operands (Operand Fetch step)
• Executing the instruction as a sequence of the elementary operations defined in the Decode step (Execute step)
• Storing the result, when required by the instruction (Store step)
• Determining the location, within the Memory Unit, of the next instruction to execute.

A detailed description of processor structure and operation is given in Sections 2 and 3 of this module.
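The cycle described above can be illustrated with a minimal sketch of a toy machine. The instruction format (an opcode paired with a memory address), the opcode names LOAD, ADD, STORE and HALT, and the single accumulator register are assumptions made for this example only; they do not correspond to any real processor.

```python
# Toy machine: the instruction format and opcodes are hypothetical.
def run(memory):
    """Execute the program stored in `memory` until a HALT instruction."""
    pc = 0    # address of the next instruction
    acc = 0   # single accumulator register
    while True:
        instruction = memory[pc]           # Instruction Fetch
        opcode, address = instruction      # Decode
        pc += 1                            # location of the next instruction
        if opcode == "HALT":
            return memory
        if opcode == "STORE":
            memory[address] = acc          # Store
            continue
        operand = memory[address]          # Operand Fetch
        if opcode == "LOAD":               # Execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")

program = [
    ("LOAD", 4),    # acc = memory[4]
    ("ADD", 5),     # acc = acc + memory[5]
    ("STORE", 6),   # memory[6] = acc
    ("HALT", 0),
    2, 3, 0,        # data stored at addresses 4, 5 and 6
]
result = run(program)
print(result[6])  # prints 5
```

Note that program and data share the same memory, which is the defining trait of the model: the processor only distinguishes instructions from data by the address it fetches from.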

C.1.1.2 The main types of peripheral units of a basic computer system

An Input/Output unit, also called a peripheral, is in charge of enabling the exchange of input and output data between a computer system and the external environment (user, sensors, actuators, other computer systems, etc.). Input/Output units enable the interaction with various devices and are also in charge of converting information into a format intelligible to the processor.

To exchange information with peripherals, a processor needs:
• an interface allowing it to send signals to and receive signals from peripherals
• a medium used for physical interconnection, called bus
• a sequence of signals (“protocol”), managing the interaction between the processor and peripherals.

A peripheral is in turn made up of two separate parts:
• a part managing the protocol with the processor and the behavior of the peripheral itself, mostly independent of the specific function accomplished by the peripheral
• a part which implements the specific function of the peripheral.

The main peripherals usually found in a computer system are described below.

Serial and parallel port

A computer system is typically provided with two ports, called the “serial” and “parallel” ports, which offer different ways to exchange data with the external world. Internally, a computer system typically transfers data in a parallel fashion: data are organized in groups containing a certain number of bits, typically a multiple of 8. The serial port serializes outgoing data, and parallelizes incoming data, in order to enable data exchange with the outside through a single wire. A parallel port, on the other hand, allows incoming and outgoing data to be exchanged in parallel. The parallel port was originally introduced for interfacing printers, while the serial port was designed for slow devices, such as keyboards, mice, modems, etc.
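As a rough illustration of what “serializing” and “parallelizing” mean, the sketch below splits one byte into a stream of single bits and rebuilds it at the other end. The function names and the choice of sending the least significant bit first are assumptions of this example; framing details such as start and stop bits are deliberately left out.

```python
def serialize(byte):
    """Split an 8-bit value into a list of bits, least significant bit first."""
    return [(byte >> i) & 1 for i in range(8)]

def parallelize(bits):
    """Rebuild the 8-bit value from bits received one at a time."""
    value = 0
    for i, bit in enumerate(bits):
        value |= bit << i
    return value

bits = serialize(0x41)
print(bits)               # [1, 0, 0, 0, 0, 0, 1, 0]
print(parallelize(bits))  # 65
```

A parallel port would instead present all eight bits at the same time on eight separate wires, so no such conversion is needed.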

A parallel connector and a serial connector

The visible portion of both ports is a “connector” made of 9 pins for the serial port and 25 holes for the parallel port. For the serial port, the RS232 standard defines the connector form, the meaning of each signal, and the serial communication protocol.

Keyboards

The keyboard is made of a set of keys, each corresponding to a letter or a symbol. When a key is pressed, the device sends the computer system a numerical value corresponding to that key. The so-called QWERTY keyboards are the most widespread type of keyboard. They are named after the letters of the first six keys in the upper alphabetical line. There are different keyboards corresponding to different languages (e.g., the Italian keyboard of the Macintosh has a QZERTY configuration). In working environments we often use the so-called “ergonomic” keyboards, which have a user-friendly form and a layout designed to speed up typing. Many modern keyboards use wireless connections (i.e., they are cordless), which require an autonomous power supply and hence the use of batteries.

Mice

Mice are pointing devices, i.e., devices that translate the user’s physical movements into movements of the pointer on the screen. In order to detect the user’s movements, first generation mice adopted an electro-mechanical solution: by rolling on a flat surface, a metal ball covered with rubber interacted with internal sensors which translated the movement into an electrical signal sent to the system. In order to work properly, this solution required rough and perfectly clean surfaces. During the last decade, electro-mechanical solutions were replaced by optical solutions: the underside of the mouse is provided with a red LED and a light sensor. The LED emits a light beam, which is reflected by the desk surface and instantaneously read by the sensor, thereby detecting the direction of the movement. To improve performance, recent devices use laser light instead of red LEDs. Like keyboards, mice are often provided with radio links, which eliminate the need for wires.

A wireless keyboard/mouse

Monitors

A monitor is a device that receives signals and displays them as images or text. Traditional monitors are also called Cathode Ray Tube (CRT) monitors, after the technology used to produce them. Images on a monitor are formed by composing the so-called pixels. Each pixel has three color channels associated with it (Red-Green-Blue, RGB model), whose combination makes up the point displayed.

A TFT Monitor

Network adapters

These devices allow communication between multiple systems through a network. The device must manage the physical connection and the translation of information between the format used by the system and the one used by the network. The type of connection may vary depending on the nature of the network to which the system is connected. For example, in order to connect a PC to an Ethernet LAN we need a network adapter, while for connecting to a dial-up line we need a modem.

An Ethernet RJ45 connector

Modems

Modems (MOdulators/DEModulators) are devices used for transmitting and receiving analog or digital data serially. In the past, modems were used for transmitting data on dial-up lines, and hence they had to translate digital data into an analog signal that could be transmitted over a telephone line. At the receiving end, the modem received the analog signal and transformed it into a digital signal intelligible to the receiving PC.

A scheme of modem transmission

Wireless devices

In order to connect external devices to a computer system, we need a suitable transmission medium. Air is increasingly being used as a transmission medium, enabling the so-called wireless connections. Different technologies are available for connecting wireless devices. IrDA (or infrared) connection was initially used to connect PCs to mobile phones and PDAs. Today, however, we are increasingly adopting radio transmission technologies, such as WiFi and Bluetooth, which ensure increased reliability and performance.

An IrDA connector

USB devices

USB ports have become the most common way to connect external devices to a PC. Devices connected to a USB port are powered by the port itself and do not need an external power supply. The USB protocol supports hot swapping, i.e., devices can be connected and disconnected without restarting the computer.

A USB connector

C.1.1.3 Features and performances of peripheral units

Monitors

Typical characteristics of a monitor are size, resolution, and refresh frequency. The size of a monitor is expressed by measuring its diagonal in inches. The resolution indicates the number of pixels forming an image, and is given as the number of pixels in a horizontal line times the number of pixels in a vertical line. The higher the resolution, the more detailed the image displayed. The refresh frequency is the number of times the image is updated per second, and is measured in Hertz (Hz). If a low refresh frequency is set, the human eye may perceive flicker in the image.
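The relation between resolution and refresh frequency can be made concrete with a little arithmetic. The sketch below is only indicative: the 1024 x 768 resolution, the 60 Hz refresh frequency and the figure of one byte per RGB color channel (three bytes per pixel) are assumptions chosen for the example.

```python
def pixel_count(width, height):
    """Number of pixels forming an image at the given resolution."""
    return width * height

def raw_data_rate(width, height, refresh_hz, bytes_per_pixel=3):
    """Bytes per second needed to redraw every pixel at each refresh."""
    return pixel_count(width, height) * bytes_per_pixel * refresh_hz

print(pixel_count(1024, 768))        # 786432
print(raw_data_rate(1024, 768, 60))  # 141557760
```

Doubling either the resolution or the refresh frequency increases the amount of data the system has to deliver to the monitor in the same proportion.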

Network adapters

From the user’s perspective, it is important to know the speed that can be reached with a network connection. For a standard Ethernet LAN this speed is 10 million bits per second (Mbps). However, latest-generation systems are normally provided with a 100 Mbps Fast Ethernet connection. In spite of the different speeds, the two types of connection appear physically identical to the user, since they both use RJ45 connectors.

Modems

Modems are classified as internal (made of a board placed inside the PC) or external (connected to the PC through a serial cable) and have different transmission speeds (typically 56 Kbps). Today, with the introduction of digital connections (ISDN and ADSL), the analog/digital translation required on dial-up lines is no longer needed. As a consequence, ADSL modems are now conceptually very similar to network adapters.

Wireless devices

IrDA ensures low power consumption, although the connection is not reliable and requires that the communicating devices be placed within a very short range (a few centimeters) with no object in between (the two devices have to “see” each other). Such a connection is not suitable, for example, for keyboards and mice (the latter, in particular, need to be moved continuously). For this reason, we are increasingly adopting radio transmission technologies, such as WiFi and Bluetooth.

USB devices

The first version of the USB standard ensured a bandwidth (i.e., the amount of data sent per second) of 1.5 Mb/s. The second generation of the USB standard (1.1) extended the bandwidth to 12 Mb/s, while the latest version of the standard (USB 2.0) defines a bandwidth of 480 Mb/s. These three speeds are also referred to as Low Speed, Full Speed and High Speed USB. The term USB also refers to the logical port (i.e., the USB bus and the electrical driver), rather than to the physical connector. A USB port, in fact, can drive up to 127 devices (by means of appropriate USB hubs). In this case, the bandwidth is shared among all connected devices.

C.1.1.4 The main types of memory technology

A memory is a functional unit in a computer system which stores programs and data. Each piece of information needs a binary coding to be stored in a memory. The elementary unit is the bit. Eight bits make up a byte. Information is usually handled as a sequence of bytes. A memory is usually structured as a set of cells (or locations), each containing one byte or a group of bytes. A given piece of information may need several cells to be stored. Memory size is measured as the number of bits or bytes it contains. In particular, for indicating the size of a memory, we use the following prefixes: K (Kilo - 1024), M (Mega - 1024*1024), G (Giga - 1024*1024*1024), followed by the word bit or byte. In the following, we will use the letters b and B to refer to bit and byte, respectively.

The elementary operations that can be performed on a memory are: selection, reading, and writing. The selection operation determines the physical portion of the memory where a datum will be stored or from which it will be fetched. Selection of a physical portion in the memory is based on a value uniquely assigned to that portion (the “address”). A piece of information can be stored in memory by means of a writing operation. The operation performed to retrieve a datum from the memory is called reading. A memory can thus be represented with the model in the following figure.

The memory logical model
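The logical model can also be expressed as a short sketch in which the three elementary operations appear explicitly. The class name Memory, its method names and the choice of one byte per cell are assumptions made for illustration; the size constants follow the prefixes defined above.

```python
KILO = 1024
MEGA = 1024 * KILO
GIGA = 1024 * MEGA

class Memory:
    """A set of byte-wide cells, each identified by a unique address."""

    def __init__(self, size_in_bytes):
        self.cells = bytearray(size_in_bytes)  # all cells start at zero

    def _select(self, address):
        # Selection: check that the address identifies an existing cell.
        if not 0 <= address < len(self.cells):
            raise IndexError(f"address out of range: {address}")
        return address

    def write(self, address, value):
        self.cells[self._select(address)] = value

    def read(self, address):
        return self.cells[self._select(address)]

ram = Memory(64 * KILO)   # a 64 KB memory
ram.write(0x1000, 0x2A)
print(ram.read(0x1000))   # 42
print(len(ram.cells))     # 65536
```

Both reading and writing go through the same selection step, which mirrors the role of the address in the model.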

Memories are characterized by many different parameters. A first aspect is the order in which we can access memory cells. A sequential memory (e.g., a tape) has to pass through all the cells preceding the requested cell. A random access memory (RAM), on the other hand, can directly access single cells. A second aspect relates to the type of operations we can perform on the memory. A read only memory (ROM) allows writing only at production time; afterwards, data can only be read. A read/write memory allows repeated writing operations. A third aspect concerns how the memory retains its data over time. A volatile memory loses the information when the power supply is removed, while a non-volatile memory retains the information even when there is no power supply.

In order to compare different memories we need to define some parameters:
• memory speed: the time it takes to read a datum from, or to write a datum to, the memory
• memory size: the amount of data we can store in it
• cost per bit, or cost per byte: the ratio between the cost of a memory and its size in terms of bits or bytes, respectively
• density: the ratio between the size of a memory and its physical size.

In order to enable a comparison between such parameters, we often use relative units of measure rather than absolute ones, like those defined above. An ideal memory is fast, large and cheap. Unfortunately, these requirements often conflict: fast memories are usually small and expensive, while large, cheap memories are usually slow. The architectural solution normally adopted in a computer system is based on the idea of using different types of memory at the same time. A first memory, directly accessible to the processor, is called main memory and is made of electronic components enabling high speed accesses, with sizes typically less than 1 Gbyte. A second memory unit, which is accessed by the processor through Input/Output units, is called secondary memory, and is based on electromagnetic or optical storage systems. These enable larger sizes than the main memory, but at a lower speed. Furthermore, unlike secondary memory, main memory is usually volatile.

The main memory in a computer system is typically of RAM type, and is used to store data and programs while they are executed. Precisely, it is a random access, read/write, volatile memory. A RAM is partitioned into words, which in turn are made of one or more cells, each storing a single bit. Operations on RAM usually refer to a word, so the selection operation and the subsequent reading/writing operation affect the whole word. A RAM can be implemented in static technologies (SRAM) or dynamic technologies (DRAM). An SRAM is faster than a DRAM, but has a higher cost per cell and a lower density. DRAMs require additional hardware components to periodically refresh the content of the memory. RAM is generally manufactured in chips of 8 or 16 MB. Chips are mounted on a small board sold as a single unit. This unit is called SIMM if electrical pins (connectors) are placed on a single side or DIMM if connectors are placed on both sides. Typically, the size of a SIMM is 32 MB or 64 MB, while for DIMMs the size is 128 MB, 256 MB or 512 MB. Connectors of a board include address, data, and control signals (e.g., for selecting a read or write operation or for power supply).

The processor, the main memory, the bus and some Input/Output devices are normally placed on a board, called motherboard, which provides electrical connections between the different components. Memory boards are plugged into slots on the motherboard, allowing the connection with the bus of the computer.

Unlike RAMs, ROMs are non-volatile and must be set at production time. Among other uses, ROMs are used in those portions of the main memory that store non-volatile data and programs: a typical example is the program in charge of loading the Operating System (BIOS). A variant of the ROM, called PROM, allows data to be stored (programmed) by the user after production. EPROMs are erasable and reprogrammable ROMs. Erasing is performed by exposing the chip to ultraviolet light (hence, it is not possible to erase only a part of the memory), and thus requires the memory module to be physically removed. EEPROMs are a further evolution: they can be erased by means of electrical signals and hence are reprogrammable in the field. Flash memories are a particular type of EEPROM, and are increasingly used today.

C.1.2 Processors

C.1.2.1 The processor architecture. CISC and RISC processor design

The base architecture of a processor

A processor includes three fundamental components:
• The Processing Unit, which performs elementary operations on data (typically, logical and arithmetic operations). Information on the operations to perform and the data to process is provided by the Control Unit.
• A Register File, which constitutes a kind of internal memory inside the processor.
• The Control Unit, which is in charge of deciding which operations the Processing Unit has to perform and which values the Register File has to store.

The base architecture of a processor

Registers

Every processor contains a certain set of registers. The register size, i.e., the number of bits in a register (typically 8, 16, 32, or 64 bits), is called the processor parallelism. Registers are usually classified as:
• General purpose registers. These can be accessed by the assembly language programmer and are used to store data and/or operands used by the instructions. General purpose registers are sometimes further classified as fixed point registers and floating point registers.
• Specialized registers, which have some special purpose in the processor operation.

The Register File is typically connected to the Address Bus and to the Data Bus. The main “specialized” registers, which are typically found in modern processors, include:
• The Program Counter (PC): contains the address of the memory location from which the next instruction to be executed will be read. It is initialized by the programmer or by the Operating System and is automatically updated by the processor.
• The Instruction Register (IR): contains the instruction read from the memory during the last Instruction Fetch step.
• The Memory Address Register (MAR): contains the address of the memory location the processor is currently accessing.
• The Memory Data Register (MDR): contains the datum read from the memory or to be written to memory.
• The Status Register (SR): contains information concerning the execution state of the processor.
• The Stack Pointer (SP): manages the portion of the memory used as a Stack.

The architecture of a processor

Single-bus architecture

The internal structure of a processor can be organized as a single bus architecture. In this case, both the input and output lines of all general purpose registers, and one of the ALU inputs, are connected to a single bus. The other input of the ALU is connected to a scratchpad register (labeled Y in the figure). Operations requiring two operands are performed in two steps: first, the first operand is loaded, through the bus, into register Y. Then, during the following step, the second operand is driven onto the bus, so that both operands are available as inputs to the ALU. ALU outputs are typically connected to an output register (labeled Z in the figure), from which we can read the result and possibly write it to a register.

Single bus architecture
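The sequence of bus transfers needed for a two-operand operation can be traced with a small sketch. The register names R1, R2 and R3 and the function add_single_bus are hypothetical; Y and Z play the roles of the scratchpad and output registers described above, and each assignment to `bus` stands for one step in which the single bus is occupied.

```python
def add_single_bus(registers, src1, src2, dest):
    """Perform registers[dest] = registers[src1] + registers[src2] on a single bus.

    Returns the number of bus transfers (steps) used.
    """
    steps = 0

    bus = registers[src1]      # step 1: first operand driven onto the bus
    y = bus                    #         and latched into Y
    steps += 1

    bus = registers[src2]      # step 2: second operand driven onto the bus
    z = y + bus                #         ALU adds Y and the bus value into Z
    steps += 1

    bus = z                    # step 3: Z driven onto the bus
    registers[dest] = bus      #         and written to the destination register
    steps += 1
    return steps

regs = {"R1": 7, "R2": 5, "R3": 0}
print(add_single_bus(regs, "R1", "R2", "R3"))  # 3
print(regs["R3"])                              # 12
```

Two steps are spent delivering the operands to the ALU and a third one writing the result back, because the bus can carry only one value at a time.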

Multiple-bus architectures

Although simpler in their structure, single bus architectures have a fundamental limitation: in each clock cycle they can perform just one transfer or ALU operation. The single bus hence becomes the bottleneck from a performance point of view. To mitigate this problem, modern processors are normally provided with several internal buses, so that more than one operand coming from different registers can be driven at the same time. The figure below depicts a typical multiple bus architecture. There is a bus directly connected to the ALU output and two other buses connected to the two inputs of the ALU. The output lines of the registers are connected to the ALU input buses. Hence, we can read both ALU operands and write the result to a register in a single step.

A multiple bus architecture

Types of instructions

The operations performed by the machine instructions of a processor can normally be grouped into the following classes:
• data transfer (register-to-register, memory-to-register and vice versa)
• logical/arithmetic operations (additions, subtractions, rotation of operand bits, right-shift, left-shift, etc.)
• bit manipulation
• string manipulation
• control flow (conditional and unconditional branches, subroutines, etc.)
• exception handling
• Input/Output management
• processor operation control.

Memory areas and stack

The main memory of a computer system is generally organized in different areas, each supporting a specific function. An executing program can, for example, partition its memory into a “code” area containing programs and a “data” area. In order to enable a simple partitioning of the available memory into specialized areas, processors typically resort to specialized registers, which manage the addresses inside the different areas. In particular, nearly all processors support a structure known as a stack in the main memory. Some processors, such as the Motorola 68000, can support different stacks depending on the execution mode (i.e., user/supervisor). In such a case, one distinguishes between the user stack and the supervisor stack.
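A stack kept in main memory and managed through a Stack Pointer can be sketched as follows. The class name, the fact that the stack grows towards lower addresses, and the choice of decrementing the pointer before each write are assumptions of this example; real processors differ in these conventions.

```python
class Stack:
    """A last-in, first-out area of memory managed through a stack pointer."""

    def __init__(self, memory, top_address):
        self.memory = memory
        self.sp = top_address   # an empty stack points just past its area

    def push(self, value):
        self.sp -= 1                     # reserve the next cell
        self.memory[self.sp] = value     # then store the value in it

    def pop(self):
        value = self.memory[self.sp]     # read the most recent value
        self.sp += 1                     # then release its cell
        return value

memory = [0] * 16
stack = Stack(memory, top_address=16)
stack.push(10)
stack.push(20)
print(stack.pop())  # 20
print(stack.pop())  # 10
print(stack.sp)     # 16
```

The values come back in the reverse order of insertion, and after as many pops as pushes the stack pointer returns to its initial value.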

Subroutines

Normally, programs contain sequences of instructions which are executed more than once during program execution. A typical example is a code fragment which receives as input a string of characters and converts all upper case characters appearing in the string to lower case. In such situations, instead of repeating the code segment whenever we need it, it is more convenient to set up a mechanism allowing the execution of the same segment at different moments and, possibly, on different data. Such code segments are called subroutines and, to be managed properly, they require some mechanisms (machine instructions) supporting:
• jumps to the subroutine and return to the calling program
• data parameter exchange
• the possibility for a routine to call another routine.
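The example mentioned above, a fragment that converts upper case characters to lower case, can be written once as a subroutine and then invoked on different data. The sketch below uses a high-level language rather than machine instructions; the names to_lower and normalize are hypothetical, and only ASCII letters are handled.

```python
def to_lower(text):
    """Return `text` with upper-case ASCII letters converted to lower case."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            ch = chr(ord(ch) + 32)   # 'a' - 'A' == 32 in ASCII
        result.append(ch)
    return "".join(result)

def normalize(words):
    """A routine that calls another routine on each of its inputs."""
    return [to_lower(w) for w in words]

print(to_lower("Hello World"))       # hello world
print(normalize(["EUCIP", "Mat"]))   # ['eucip', 'mat']
```

The three mechanisms listed above are all visible here: the call and return, the exchange of parameters and results, and one routine calling another.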

Interrupts and exceptions

The Von Neumann model iterates a fetch/decode/execute cycle indefinitely. With this base model it is impossible to address those situations (called exceptions) where the system must react to external events at instants of time that are not known to the program. Typical examples are the interaction with the Operating System, data exchange with peripherals, debugging operations, errors such as instruction errors, etc. In order to manage exceptions properly, we need specific mechanisms which can interrupt the processor (i.e., temporarily suspend the executing program), launch a subroutine in charge of managing that specific exception (called Interrupt Service Routine, ISR), and, when the Interrupt Service Routine is finished, resume the interrupted program at the exact point where it was suspended.
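The suspend, service and resume sequence can be sketched with a very simplified model in which pending events are checked between one instruction and the next. Everything in the example is hypothetical: instructions are modeled as plain functions, and the `interrupts` table stating when an event arrives stands in for what would be an asynchronous hardware signal.

```python
def run(program, interrupts, isr):
    """Run `program` (a list of callables), servicing interrupts between instructions.

    `interrupts` maps an instruction index to an event that arrives just
    before that instruction; `isr` is the Interrupt Service Routine.
    """
    log = []
    pc = 0
    while pc < len(program):
        if pc in interrupts:
            saved_pc = pc                         # save the point of suspension
            log.append(isr(interrupts.pop(pc)))   # run the ISR
            pc = saved_pc                         # resume exactly where suspended
        log.append(program[pc]())
        pc += 1
    return log

program = [lambda: "instr 0", lambda: "instr 1", lambda: "instr 2"]
log = run(program, {1: "key pressed"}, lambda event: f"ISR handled: {event}")
print(log)
# ['instr 0', 'ISR handled: key pressed', 'instr 1', 'instr 2']
```

The interrupted program is unaware of the event: its own instructions run in the same order and produce the same results as they would without the interrupt.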

CISC and RISC approaches in processor design

Processor architecture paradigms are normally classified as CISC and RISC (Complex and Reduced Instruction Set Computer, respectively). They basically differ in the nature of the instruction sets interpreted by their central processing units.

A typical characteristic of CISC processors is that they tend to reduce the number of instructions in a program by providing a large set of instructions that cover a broad range of tasks. Instructions are often translated by the CPU into a sequence of micro-instructions that perform the complex operations required by the CISC instruction. Due to the variety of tasks performed, CISC instructions often require a variable number of bits to be coded, and more than one cycle to be fetched and executed.

RISC systems, on the other hand, tend to reduce the time needed to perform an elementary operation (the clock cycle) by simplifying the nature of the instructions supported by the processor. Typically, such instructions are not translated into micro-operations, but are directly implemented by hardware circuits. As a drawback, this approach typically requires more instructions to implement a given program, compared to a CISC processor, and the translation between programs and machine instructions is usually more difficult. On the other hand, RISC systems are better suited to processing instructions in parallel in pipelined implementations. In that case, the CPU works on more than one instruction at once, by overlapping the execution of consecutive instructions, greatly increasing throughput. For this reason, RISC systems are normally believed to be faster than their CISC counterparts.

Motorola processors of the 680x0 family are examples of the CISC approach. Examples of RISC processors include the PowerPC, SPARC, MIPS, and the Alpha. Processors of the Intel x86 (IA-32) family are often described as CISC; in practice, they tend to mix the two approaches in their internal implementation.

C.1.2.2 Instruction pipelining and instruction-level parallelism

Pipelining

Pipelining exploits the fact that any instruction can be viewed as a sequence of separate micro-operations. Pipelining is based on a temporal overlapping of micro-operations corresponding to different instructions, just as in an assembly line. For example, each instruction is separately loaded to the decode register (fetch), decoded, and finally executed. At a given moment the processor can manage three different instructions, one in the fetch step, one in the decode step, and the other in the execute step. Pipelines typically include five steps:
• Fetching instructions from memory
• Reading registers and decoding the instruction
• Executing the instruction and computing the address
• Accessing an operand in the data memory
• Writing results to the registers.

It is not always possible to overlap the execution of consecutive instructions in a pipeline. It may happen, for example, that the choice of the next instruction depends on the result of the current instruction (conditional branch instruction). In this case, we need to wait for the completion of the current instruction before starting the next one. In such a situation, we say that the pipeline has a stall. A second example of a stall happens when an instruction requires as an operand the result of the previous instruction. Of course, stopping the pipeline degrades the performance of the system. Several techniques exist to avoid stalls, including, for instance, instruction reordering (trying to avoid that a given instruction is immediately preceded by instructions upon which it depends).
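The benefit of overlapping, and the cost of stalls, can be estimated with simple arithmetic. The sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and a new instruction enters at each cycle unless a stall occurs; the function names and the figures used are illustrative only.

```python
def cycles_without_pipeline(instructions, stages):
    """Each instruction goes through all stages before the next one starts."""
    return instructions * stages

def cycles_with_pipeline(instructions, stages, stall_cycles=0):
    """Cycles for an ideal pipeline, plus any cycles lost to stalls."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` cycles; each later one completes
    # one cycle after its predecessor.
    return stages + (instructions - 1) + stall_cycles

print(cycles_without_pipeline(100, 5))                # 500
print(cycles_with_pipeline(100, 5))                   # 104
print(cycles_with_pipeline(100, 5, stall_cycles=20))  # 124
```

With many instructions the speed-up approaches the number of stages, while every stalled cycle is added directly to the total.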

Instruction Level Parallelism

Instruction Level Parallelism (ILP) refers to the possibility of executing several instructions concurrently. There are two alternative approaches to ILP:
• Executing several instructions concurrently, which however must be in different execution stages. This is the case of pipelining.
• Allowing distinct instructions to be in the same stage. This technique of course requires that execution resources be replicated. This is the case of superscalar processors.

A different approach is taken by Explicitly Parallel Instruction Computing (EPIC), especially Very Long Instruction Word (VLIW) architectures, where a single macro-instruction groups more instructions using different functional units.

Superscalar architectures

Superscalar architectures are those provided with more than one pipeline. In such architectures, functional units within the pipeline are replicated. Hence, in addition to executing more than one instruction concurrently, we can have distinct instructions in the same execution stage. There are two different approaches to superscalar processors:
• independent pipelines: each functional unit belongs to a given pipeline
• overlapped pipelines: in order to save costs related to the implementation of functional units, some units are shared by different pipelines.

In the second case, we need additional hardware components to resolve conflicts when pipelines try to access the same shared unit at the same time. The Intel Pentium processor, for example, has two pipelines: pipeline u, which can be used for any operation, and pipeline v, which can be used only for simple operations performed on integer operands.

C.1.2.3 Features and performance of a microprocessor

It is very important to be able to measure the performance of a processor, from both the designer’s and the user’s perspective, in order to compare different architectures. The complexity of modern processors, however, makes it difficult to derive consistent measurements of processor performance. In the past, the clock frequency, i.e., the inverse of the clock period (the time taken by a clock cycle to complete), was usually adopted as a parameter to measure processor performance. In modern systems, clock frequency by itself is not very meaningful since, for example, a pipelined processor with n stages can be up to n times as fast as a processor with no pipeline running at the same clock frequency. Later, MIPS, or Millions of Instructions Per Second, was introduced as a unit of measure. But this unit also proved to be inappropriate for comparing different architectures. Currently, we resort to suitable software suites (called benchmarks). When executed on different systems, they allow designers to compare performance based on experimental data.
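A short calculation shows why clock frequency alone can be misleading. The sketch below relates MIPS to the clock frequency and to the average number of clock cycles each instruction takes; the two processors compared, with their frequencies and cycle counts, are invented for the example.

```python
def mips(clock_hz, cycles_per_instruction):
    """Millions of instructions per second for a given clock and average cycle count."""
    return clock_hz / (cycles_per_instruction * 1_000_000)

def execution_time(instruction_count, clock_hz, cycles_per_instruction):
    """Seconds needed to execute `instruction_count` instructions."""
    return instruction_count * cycles_per_instruction / clock_hz

# Hypothetical processors: A has the faster clock but no pipeline,
# B has a slower clock but completes one instruction per cycle.
print(mips(200_000_000, 4))   # 50.0
print(mips(100_000_000, 1))   # 100.0
```

Processor B executes twice as many instructions per second despite having half the clock frequency. MIPS in turn ignores how much work each instruction does, which is one reason benchmarks are preferred for comparing different architectures.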

C.1.3 Computer Architectures

C.1.3.1 The architecture of a general-purpose computer

A computer system is essentially made of the following functional units:
• Input/Output Units, enabling the exchange of information with the external world (users, peripherals, other computers, ...) the computer has to interact with. They adapt such information, when necessary, into different formats compatible with those used internally by the system.
• Memory Unit, needed to store the programs and data (input and output data, and intermediate results) to be processed.
• Processing Unit (Processor), which decodes and executes the program instructions stored in the Memory Unit.
• Interconnection Unit (Bus), needed to connect the above units and allow the required exchange of information.

A memory is a functional unit in a computer system that stores programs and data. The elementary operations that can be performed on a memory are selection, reading, and writing. The selection operation determines the physical portion of the memory (“location”, or “cell”) where a datum will be stored (by means of a writing operation) or from which the datum will be fetched (by means of a reading operation). Selection of a physical portion of the memory is based on a value uniquely assigned to that portion (the “address”).

For cost and performance optimization, computer systems often contain several types of memory, differing in technology and access modes, and used for different purposes at different moments. In particular, the system normally contains a main memory, typically made of RAM components, which contains the programs currently executed and the related data, and a mass memory, typically made of magnetic disks, used to store archives of data and programs not currently in use.

Input/Output units, or simply peripherals, are in charge of exchanging information and input/output data between the computer and the external world. They are also required to convert information into formats that are intelligible to both the processor and the external world. Typical examples of Input units are keyboards and mice, while typical examples of Output units are monitors, printers, and speakers. There are also units that can act as both Input and Output units, for example modems and network interfaces. In order for the processor to exchange information with a peripheral we need:
• a physical medium (“bus”)
• an appropriate “interface” device, allowing the processor to exchange signals with the peripheral
• a sequence of signals (“protocol”), to manage the interaction between the processor and the peripheral.

There are many parameters influencing the performance of a computer system. These include:
• Processor parallelism: the width, in number of bits, of internal registers and functional units; typical values are 8, 16, 32, and 64 bits.
• Memory parallelism: the width, in number of bits, of a memory word, i.e., the number of bits that can be accessed concurrently with a single reading or writing operation.
• Bus parallelism: the width, in number of bits, of the bus.
• Clock frequency: the frequency of the synchronization signal (clock) driving the functional units within the processor.
• Machine cycle: the time it takes for the processor to perform a complete fetch-decode-execute cycle. It is influenced by both processor frequency and memory speed.
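As a rough illustration of how bus parallelism and clock frequency combine, the sketch below estimates a theoretical peak transfer rate as width times frequency. The function name and the `transfers_per_clock` parameter are assumptions introduced for this example; the estimate ignores protocol overhead, arbitration, and wait states, so real buses achieve less.

```python
def peak_transfer_rate_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak rate, in millions of bytes per second, of a bus
    that moves width_bits bits on every transfer."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


if __name__ == "__main__":
    # Illustrative width/frequency pairs, not a statement about any
    # specific product.
    examples = [
        ("8-bit bus at 4.77 MHz", 8, 4.77),
        ("32-bit bus at 33 MHz", 32, 33),
        ("64-bit bus at 66 MHz", 64, 66),
    ]
    for label, width, clock in examples:
        rate = peak_transfer_rate_mb_s(width, clock)
        print(f"{label}: about {rate:.2f} MB/s peak")
```

Doubling either the width or the number of transfers per clock doubles the estimate, which is why both parameters appear in the list above.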

C.1.3.2 Types of buses in a computer system

A bus is a physical structure, typically made of a set of electrical conductors, which links two or more devices. A bus is a shared structure, in the sense that the values written by a device to the bus are accessible to all devices connected to that bus. Conceptually, a bus is typically used to exchange three types of information: data, address, and state/control information. As a consequence, to simplify the description we often speak of three types of buses: Data Bus, Address Bus, and Control Bus. In practice, in order to minimize area requirements, a single physical structure is often used to exchange different types of information (data, address, state/control) at different moments.

In recent computer systems, in order to optimize performance we often find architectural solutions similar to that presented in the figure of Section 0, where several levels of buses are introduced. A solution based on several buses is motivated by the fact that the memory bus needs a high-performance implementation, since it manages the exchange of information between the processor and the memory and is therefore critical for performance, while I/O buses are often less critical as they connect slower peripherals.

Bus standards

This section gives a short description of the main types of buses.

ISA (Industry Standard Architecture) is a bus system for IBM PCs and PC clones. The original standard, from 1981, was an 8 bit bus that ran at 4.77 MHz. In 1984, with the introduction of the IBM AT computer (which used the 80286 processor, introduced by Intel in 1982), ISA was expanded to a 16 bit bus that ran at 8.3 MHz.

MCA (Micro Channel Architecture) is a 32 bit bus introduced in 1987 by IBM with the PS/2 computer that used the Intel 80386 processor. IBM attempted to license MCA bus to other manufacturers, but they rejected it because of the lack of ability to use the wide variety of existing ISA devices. IBM continues to use a modern variation of MCA in some of its server computers.

EISA (Extended Industry Standard Architecture) is a 32 bit bus running at 8.3 MHz created by the clone industry in response to the MCA bus. EISA is backwards compatible so that ISA devices could be connected to it. EISA also can automatically set adaptor card configurations, freeing users from having to manually set jumper switches.

NuBus is a 32 bit bus created by Texas Instruments and used in the Macintosh II and other 680x0 based Macintoshes. NuBus supports automatic configuration (for “plug and play”).

VL bus (VESA Local bus) was created in 1992 by the Video Electronics Standards Association for the Intel 80486 processor. The VL bus is 32 bits wide and runs at 33 MHz. The VL bus requires the use of manually set jumper switches.

PCI (Peripheral Component Interconnect) is a bus created by Intel in 1993. PCI is available in both a 32 bit version running at 33 MHz and a 64 bit version running at 66 MHz. PCI supports automatic configuration (for “plug and play”). PCI automatically checks data transfers for errors. PCI uses a burst mode, increasing bus efficiency by sending several sets of data to one address.

DIB (Dual Independent Bus) was created by Intel to increase performance by giving the processor two independent buses: a frontside bus to the main memory and a backside bus to the L2 cache.

SECC (Single Edge Contact Cartridge) was created by Intel for high speed backside L2 cache.

AGP (Accelerated Graphics Port) was created by Intel to increase performance by separating video data from the rest of the data on PCI I/O buses. AGP is 32 bits and runs at 66 MHz. AGP 2X double pumps the data, doubling the amount of throughput at the same bus width and speed. AGP 4X runs four sets of data per clock, quadrupling the throughput.

DRDRAM was a memory bus created by Rambus to increase the speed of connections between the processor and memory. DRDRAM is a 33 bit bus running at 800 MHz. 16 bits are for data, with the other 17 bits reserved for address functions.

C.1.3.3 Memory hierarchy

It should be clear from the considerations above that in a computer system we usually find different types of memory, which constitute a kind of “memory hierarchy”. We can think of it as a pyramid. Memories at the upper levels are typically faster, smaller and, in general, volatile. Memories at the lower levels are typically slower, larger and, in general, non-volatile. Lower levels correspond to large memories used for permanently storing large sets of data.

The memory hierarchy

Cache memories

Modern computer systems are typically provided with two (or more) levels of cache, which differ in speed and size: the first level is placed on the same chip as the CPU, while the second level, slower and larger than the first, is usually external to the CPU and placed on the motherboard. Caches are organized in lines (or blocks). Accessing the memory requires loading an entire block of data into the cache. The locality principle ensures that subsequent accesses are likely to require data already loaded into the cache and will thus be performed efficiently. We speak of a cache hit when the requested datum is found in the cache, and of a cache miss otherwise. A cache miss requires loading into the cache a new block containing the requested datum. If the cache line to be used for the new block is already in use, we must replace the block that occupies that line.

Since data in the cache may be modified by the processor, a further problem is to ensure that data remain coherent with the content of the main memory. There are different architectural solutions ensuring such coherency. The most commonly used are known as the write back and write through policies. With the write back policy, data are modified in the cache only and, when a block is evicted, it is copied to the main memory only if its content was modified. With the write through policy, on the other hand, all writing operations are performed by the processor on both the cache memory and the main memory.
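The behaviour described above (lines, hits and misses, block replacement, write back versus write through) can be sketched with a toy simulation. This is a minimal illustration, not a model of any real cache: it assumes a direct-mapped organization, in which each memory block can go to exactly one line, and it loads the block on a write miss. The class name, its parameters, and the sizes used are all hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache in front of a list-based main memory.
    The memory length is assumed to be a multiple of line_size."""

    def __init__(self, memory, num_lines=4, line_size=4, write_back=True):
        self.memory = memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.write_back = write_back
        self.lines = [None] * num_lines  # each line: tag, data, dirty flag
        self.hits = 0
        self.misses = 0

    def _base_address(self, index, tag):
        # First memory address of the block identified by (index, tag).
        return (tag * self.num_lines + index) * self.line_size

    def _access(self, address):
        block = address // self.line_size
        index, tag = block % self.num_lines, block // self.num_lines
        line = self.lines[index]
        if line is not None and line["tag"] == tag:
            self.hits += 1
        else:
            self.misses += 1
            if line is not None and line["dirty"]:
                # Write back: copy the evicted block only if modified.
                base = self._base_address(index, line["tag"])
                self.memory[base:base + self.line_size] = line["data"]
            base = self._base_address(index, tag)
            line = {"tag": tag, "dirty": False,
                    "data": self.memory[base:base + self.line_size]}
            self.lines[index] = line
        return line, address % self.line_size

    def read(self, address):
        line, offset = self._access(address)
        return line["data"][offset]

    def write(self, address, value):
        line, offset = self._access(address)
        line["data"][offset] = value
        if self.write_back:
            line["dirty"] = True          # main memory updated on eviction
        else:
            self.memory[address] = value  # write through: update both


if __name__ == "__main__":
    memory = list(range(32))
    cache = DirectMappedCache(memory, write_back=True)
    cache.read(0)       # miss: loads the block holding addresses 0..3
    cache.read(1)       # hit, thanks to spatial locality
    cache.write(1, 99)  # modifies the cache only
    print("memory[1] before eviction:", memory[1])
    cache.read(16)      # maps to the same line: the dirty block is evicted
    print("memory[1] after eviction: ", memory[1])
    print("hits:", cache.hits, "misses:", cache.misses)
```

With `write_back=False` the same write updates the main memory immediately, at the cost of one memory access per write.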

Cache performance

In case of a cache miss, the main memory must be searched for the requested datum. The cache lookup and the main memory access can be started concurrently, so that the access time in the best case (cache hit) is equal to the cache access time (less than the main memory access time), while in the worst case (cache miss) it is equal to the main memory access time. More precisely, a cache miss also requires replacing a block, inducing a penalty with respect to the access time of the main memory. A cache may thus decrease performance if the probability of a cache hit (hit ratio) is below a certain threshold. In that respect, it is fundamental to define an appropriate size for cache lines. If the line is too small, we will not exploit the locality principle, while if the line is too large, we will often load into the cache data that are not needed.
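The threshold mentioned above can be made concrete with a simple model: a hit costs the cache access time, while a miss costs the main memory access time plus a replacement penalty. The sketch below computes the resulting average access time and the hit ratio below which the cache slows the system down. The function names and the timing values (in nanoseconds) are hypothetical, and the model is deliberately simplified.

```python
def average_access_time(hit_ratio, cache_time, memory_time, miss_penalty):
    """Average access time when a hit costs cache_time and a miss costs
    memory_time plus the block replacement penalty."""
    miss_ratio = 1 - hit_ratio
    return hit_ratio * cache_time + miss_ratio * (memory_time + miss_penalty)


def break_even_hit_ratio(cache_time, memory_time, miss_penalty):
    """Hit ratio at which the average access time equals memory_time.
    Below this value the cache makes accesses slower on average."""
    return miss_penalty / (memory_time + miss_penalty - cache_time)


if __name__ == "__main__":
    cache_time, memory_time, miss_penalty = 2.0, 50.0, 10.0  # assumed, in ns
    for hit_ratio in (0.10, 0.50, 0.95):
        t = average_access_time(hit_ratio, cache_time, memory_time, miss_penalty)
        print(f"hit ratio {hit_ratio:.2f}: {t:.1f} ns on average")
    threshold = break_even_hit_ratio(cache_time, memory_time, miss_penalty)
    print(f"the cache pays off above a hit ratio of about {threshold:.3f}")
```

The break-even expression follows from setting the average access time equal to the main memory access time and solving for the hit ratio.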

Secondary memory

Secondary (or mass storage) memory is a unit within the computer system used to store large amounts of data permanently. This unit can be accessed by the processor by means of appropriate input/output devices. Historically, the first mass storage memories were made of magnetic tapes. Currently, due to technological developments, there exist several types of mass storage memory based on magnetic supports, optical supports, and flash memory devices (solid state memories). Magnetic supports available on the market can contain hundreds of Gigabytes (GB). The size of optical supports can reach tens of Gigabytes. Devices based on flash memories have sizes in the order of some Gigabytes.

Magnetic supports are based on substances with particular magnetic properties. Reading and writing operations are accomplished by means of a special device (the head) placed over the magnetizable surface. During a writing operation, the head is driven with an electrical current, which stores a 0 or 1 value (bit) by polarizing the underlying magnetic surface in one of two different directions. A reading operation, on the other hand, is performed by sensing the polarization direction of the surface. Both magnetic tapes and disks are based on this principle. In a mass storage memory based on tapes, the surface is partitioned into different areas (tracks) placed in consecutive positions. In a mass storage memory based on magnetic disks, the surface is made of one or more magnetic platters, organized in concentric circles (tracks), each divided into sectors.

Hard disks use one or more magnetic platters, each made of two magnetizable surfaces. There is one reading/writing head for each surface. Heads are positioned on an arm, which can be moved along the radial direction. The set of tracks placed at the same distance from the center is called a cylinder. Accessing a datum within the hard disk requires the radial positioning of the heads, i.e., the selection of a cylinder (requiring the so-called seek time), the rotation of the platters until the required sector reaches the head (rotation latency), and the data transfer itself. Seek time is in the order of milliseconds. Modern disks have a high rotation speed, typically 5400 or 7200 RPM (rotations per minute), with an average rotation latency in the order of milliseconds. The data transfer time for a single sector is much shorter, in the order of tens of microseconds, since a sector covers only a small fraction of a rotation. In modern disks, performance is improved by installing electronic memories in the disk, constituting a rapid-access buffer for the information stored on the disk.

Commercial disks can be divided into two categories: IDE (Integrated Drive Electronics) and SCSI (Small Computer System Interface). The former type is the most widespread in Intel computers. The latter type is largely used in UNIX workstations, Macintosh computers, and more sophisticated Intel PCs. Essentially, SCSI disks differ from IDE disks in their interface, while the organization in cylinders, tracks, and sectors is basically the same. The performance of SCSI disks is generally higher (typical rotation speeds are 10,000 or 15,000 RPM). Moreover, systems equipped with SCSI buses turn out to be more scalable than IDE systems.
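The three components of a disk access described above (seek time, rotation latency, data transfer) can be estimated from the rotation speed. The sketch below assumes that, on average, the required sector is half a rotation away from the head. The function names, the seek time, and the number of sectors per track are hypothetical values chosen only for illustration.

```python
def rotation_time_ms(rpm):
    """Time of one full rotation, in milliseconds."""
    return 60_000 / rpm


def average_rotation_latency_ms(rpm):
    """On average the head waits for half a rotation."""
    return rotation_time_ms(rpm) / 2


def sector_transfer_time_ms(rpm, sectors_per_track):
    """Time for one sector to pass under the head."""
    return rotation_time_ms(rpm) / sectors_per_track


def average_access_time_ms(seek_ms, rpm, sectors_per_track):
    """Seek time + average rotation latency + transfer of one sector."""
    return (seek_ms
            + average_rotation_latency_ms(rpm)
            + sector_transfer_time_ms(rpm, sectors_per_track))


if __name__ == "__main__":
    seek_ms = 9.0            # assumed average seek time
    sectors_per_track = 500  # assumed; varies between disks and tracks
    for rpm in (7200, 15000):
        latency = average_rotation_latency_ms(rpm)
        total = average_access_time_ms(seek_ms, rpm, sectors_per_track)
        print(f"{rpm} RPM: latency {latency:.2f} ms, access {total:.2f} ms")
```

In this model the mechanical delays (seek and rotation) dominate, which is why a buffer of electronic memory inside the disk improves performance.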

Virtual memory

In order to work properly, a computer system requires that the program currently executed and a part of the processed data be stored in the main memory. We need suitable mechanisms (either software or hardware) to handle the case in which programs require more memory than is actually available in the system. Such techniques, known as virtual memory, conceptually partition both programs and main memory into pages. Only some of the program pages are loaded into the main memory, while the remaining pages are stored in the secondary memory. When the processor requires a page that is not stored in the main memory, it is up to the Operating System to load it from the secondary memory, possibly replacing a page in the main memory that is not needed anymore.
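The paging mechanism described above can be sketched as a small simulation. The text does not say how the page to be replaced is chosen, so this example assumes a first-in first-out (FIFO) policy purely for simplicity. The class name, the page size, and the number of frames are hypothetical, and the actual copying of page contents from the secondary memory is omitted.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy demand paging: maps virtual addresses to physical addresses,
    loading pages on demand and evicting the oldest page (FIFO)."""

    def __init__(self, num_frames, page_size=4096):
        self.page_size = page_size
        self.free_frames = list(range(num_frames))
        self.page_table = OrderedDict()  # page -> frame, in loading order
        self.page_faults = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, self.page_size)
        if page not in self.page_table:
            # Page fault: the page must be brought into main memory.
            self.page_faults += 1
            if self.free_frames:
                frame = self.free_frames.pop(0)
            else:
                # No free frame: replace the page loaded earliest.
                _, frame = self.page_table.popitem(last=False)
            self.page_table[page] = frame
        return self.page_table[page] * self.page_size + offset


if __name__ == "__main__":
    vm = PagedMemory(num_frames=2)
    for address in (0, 4101, 8192, 10, 8193):
        physical = vm.translate(address)
        print(f"virtual {address:5d} -> physical {physical:5d}")
    print("page faults:", vm.page_faults)
```

With two frames and three distinct pages in use, the program still runs; it simply pays a page fault whenever it touches a page that is not currently resident.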

C.1.3.4 The range of computer systems

A computer system can be viewed in different ways by different people. Let's consider some of them.
• The “Designer” looks at the system as a set of “elementary blocks” (e.g., boards, integrated circuits, architecture components, logic gates, etc.), properly connected to each other.
• The “Assembler programmer” looks at the system as a set of “functional blocks” (e.g., processor registers, ALU, memories, peripherals, etc.) properly connected to buses and able to execute only the operations described by machine instructions, as defined by the designer of the processor.
• The “High level programmer” looks at the system as a “virtual machine” able to execute programs described in the chosen language, typically working on suitable data structures defined by the programmer.
• The “User” looks at the system as a “service provider” and/or as a “command executor” (the commands being those of the Operating System and/or those provided by the application she/he is interested in).

We emphasize that both the architecture and the operating model of the processor, previously introduced, are completely general and underpin nearly all systems we interact with in our everyday life. All computer systems, ranging from servers and Personal Computers (PCs) to the so-called “embedded” systems, which are hidden from the end user (mobile phones, Personal Digital Assistants (PDAs), game consoles such as the PlayStation, controllers of domestic appliances, etc.), are characterized by a similar architecture and operate based on a sequence of steps similar to that presented above, independent of their performance and implementation (e.g., a System-on-Board rather than a single System-on-Chip).

8. Links to additional materials:
[HVZ01] Hamacher, C., Vranesic, Z., and Zaky, S. “Computer Organization”, McGraw-Hill, 5th edition, 2001.
[S02] Stallings, W. “Computer Organization and Architecture”, Prentice Hall, 2002.
[T05] Tanenbaum, A.S. “Structured Computer Organization”, Prentice Hall, 2005.
[HP02] Hennessy, J.L., Patterson, D.A., Goldberg, D. “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann, 2002.
[PHALS04] Patterson, D.A., Hennessy, J.L., Ashenden, P.J., Larus, J.R., Sorin, D.J. “Computer Organization and Design: The Hardware/Software Interface”, Morgan Kaufmann, 2004.

9. Test Questions

Question 01
There is one incorrect sentence in the following. Indicate which one.
Answer A) A computer system is a machine able to execute programs.
Answer B) The physical structure of a computer does not change although it can solve different problems.
Answer C) Each computer system can execute only programs written in a specific programming language.
Answer D) Computer programs can be described with languages which are intelligible for human beings.

Question 02
In a computer, the memory unit:
Answer A) contains data and instructions of the program being executed
Answer B) is a physical part of the processor
Answer C) allows the user to enter data
Answer D) stores all programs and data contained in the computer

Question 03
There is one incorrect sentence in the following statements concerning Input/Output units. Indicate which one.
Answer A) A program cannot be executed without an input/output unit.
Answer B) Input/Output units are essential in order for a program to provide its results to the external world.
Answer C) Keyboards are an input unit.
Answer D) The computer system must “know” the peripheral it wants to communicate with.

Question 04
A bus is:
Answer A) a device allowing the system to be connected to a network
Answer B) a uniform set of registers within the processor used for general purposes
Answer C) a logical structure connecting one or more computers
Answer D) a physical structure providing an electrical connection between two or more devices

Question 05
Which of the following figures is suitable to describe the quality of a peripheral instead of the performance of a computer system?
Answer A) Processor parallelism
Answer B) Memory parallelism
Answer C) Processor clock frequency
Answer D) Monitor resolution

Question 06
A computer system can be seen from several points of view in several different ways. In that respect, pick the incorrect sentence among the following ones:
Answer A) A computer system can be seen as a set of “elementary blocks” (e.g., circuits, logic gates, etc.) suitably connected to each other.
Answer B) A computer system can be seen as a set of “functional blocks” (e.g., processor registers, ALU, memories, peripherals, etc.) appropriately connected by means of buses.
Answer C) A computer system can be seen as a combination of a memory unit and an Input/Output unit.
Answer D) A computer system can be seen as a system providing services to its users.

Question 07
Among the specialized registers within a processor…
Answer A) the Memory Address Register (MAR) contains the instruction to be executed.
Answer B) the Status Register (SR) contains the data to be processed by the current instruction.
Answer C) the Program Counter (PC) is used to scan the program under execution.
Answer D) the Program Counter (PC) counts the number of instructions already executed.

Question 08
Which of the following acronyms identify the packaging of memory modules? (multiple answers possible)
Answer A) SIMM
Answer B) SRAM
Answer C) DIMM
Answer D) EPROM

Question 09
A second level cache is:
Answer A) A backup cache used when the main cache fails
Answer B) A cache ensuring a lower hit rate
Answer C) A cache which is larger and slower than a first level cache
Answer D) A cheaper cache used in low-end devices as a replacement of normal caches

Question 10
QWERTY keyboards:
Answer A) are named after the position of their keys
Answer B) are ergonomic devices
Answer C) are cordless keyboards
Answer D) are Macintosh keyboards

Question 11
Which of the following sentences is true?
Answer A) A USB port can host at most 10 devices.
Answer B) Devices connected to a USB port need an autonomous power supply.
Answer C) The USB protocol supports hot swapping.
Answer D) If several devices are connected to a port, they get the same bandwidth as when they are connected independently.

Answers (correct and false)

Answers to Question 01 Answer C) is the only incorrect sentence.

Answers to Question 02 The correct answer is Answer A).

Answers to Question 03 Answer A) is the only incorrect sentence.

Answers to Question 04 The correct answer is Answer D).

Answers to Question 05 The correct answer is Answer D)

Answers to Question 06 The correct answer is Answer C)

Answers to Question 07 The correct answer is Answer C)

Answers to Question 08 Answer A) and Answer C) indicate a memory packaging.

Answers to Question 09 Answer C)

Answers to Question 10 Answer A)

Answers to Question 11 Answer C)

Feedback for answering

Question 01
Answer A) The sentence is correct: a computer system is a machine able to execute predefined sequences of operations (called instructions) which form a program.
Answer B) The sentence is correct: computer systems are characterized by the fact that, in spite of a fixed physical structure (hardware), they are able to solve different problems by simply using different programs (software).
Answer C) The sentence is incorrect: programs are written in abstract languages, independent of the computer, and are then converted into a specific executable format which is different for different architectures.
Answer D) The sentence is correct: a program can be described in a language intelligible for human beings (programming language) characterized by a specific set of syntactical and semantic rules, which let it be automatically converted to a format read by the machine.

Question 02
Answer A) The sentence is correct: the memory unit's role is to store the instructions of the program being executed and the data (input data, output data, and intermediate data) being processed.
Answer B) The sentence is incorrect: the memory unit and the processing unit (i.e., the processor) are two distinct parts of the computer architecture.
Answer C) The sentence is incorrect: it is up to the Input/Output unit to read data from the user.
Answer D) The sentence is incorrect: it is up to mass storage devices (e.g., hard disks) to store all programs and data available within the computer. Main memory, much faster but limited in size, only contains data and programs while they are executed.

Question 03
Answer A) The sentence is incorrect: the processor and the memory unit alone are sufficient to execute a program, provided that no interaction with the external world is required.
Answer B) The sentence is correct: in fact, Input/Output units allow the exchange of input and output information between the computer and the external world (user, sensors, actuators, other computers) by translating information into formats intelligible to the computer, at one end, and to the external world, at the other end.
Answer C) The sentence is correct: keyboards allow the user to enter data into the system.
Answer D) The sentence is correct: in fact, in order for the processor to communicate with peripherals it is not sufficient to have a physical device. We also need an interface, allowing the processor to exchange signals with peripherals, a physical connection (bus), and a sequence of signals (protocol), managing the interaction between the processor and the peripheral.

Question 04
Answer A) The sentence is incorrect: it describes the function of a network adapter.
Answer B) The sentence is incorrect: we refer to such a set of registers as a “register file”, not as a bus.
Answer C) The sentence is incorrect: it describes the function of a computer network.
Answer D) The sentence is correct: a bus is in fact a physical structure, with a set of communication rules associated with it, allowing two or more devices to communicate with each other.

Question 05
Answer A) Processor parallelism represents the number of bits in the internal registers of the processing unit (typical values are 8, 16, 32, and 64 bits), and affects the amount of information the processor can process at once.
Answer B) Memory parallelism represents the number of bits in memory words, i.e., the number of bits that can be accessed at once during a reading or writing operation.
Answer C) Processor clock frequency represents the speed of the processor in executing elementary operations.
Answer D) Monitor resolution is a quality figure of an output device and has nothing to do with the level of performance of the computer system.

Question 06 Answer A) This answer reflects the designer’s point of view. Answer B) This answer reflects the assembler programmer’s point of view. Answer C) In this answer, there is no Processing Unit. Answer D) This answer reflects the user’s point of view.

Question 07
Answer A) The sentence is incorrect: the Memory Address Register (MAR) contains the address of the memory location from/to which we are reading/writing data.
Answer B) The sentence is incorrect: the Status Register (SR) contains the information related to the state of the processor.
Answer C) The sentence is correct: the Program Counter (PC) is used to scan the program currently being executed and always contains the address of the memory location from which the next instruction will be fetched.
Answer D) The sentence is incorrect: the Program Counter (PC) does not count the instructions already executed. It contains the address of the memory location from which the next instruction will be fetched.

Question 08
Answer A) and Answer C) indicate a memory packaging, while the others indicate memory technologies.

Question 09
Answer A) A second level cache is a larger and slower cache (compared to a first level cache). It is sometimes placed on the motherboard, between the main memory and the first level cache.
Answer B) Since a second level cache is larger than a first level cache, its hit rate is generally higher.
Answer C) The answer is correct.
Answer D) Second level caches are usually implemented with less expensive technologies than first level caches (for a given memory size). This, however, does not make them an alternative to first level caches.

Question 10
Answer A) QWERTY is the sequence of the first six keys in the upper line of alphabetical keys.
Answer B) A QWERTY keyboard may be ergonomic or not. These are independent features.
Answer C) A QWERTY keyboard may be cordless or not. These are independent features.
Answer D) QWERTY denotes a key layout, not a manufacturer. Keyboards for any computer, including Macintosh computers, may or may not use the QWERTY layout.

Question 11
Answer A) The sentence is incorrect: by means of special hubs, a USB port can drive up to 127 devices.
Answer B) The sentence is incorrect: devices connected to a USB port can be powered by the port itself.
Answer C) The sentence is correct: it is possible to connect or disconnect devices without shutting down the PC.
Answer D) The sentence is incorrect: devices connected to the same USB port share the original bandwidth.