Embedded System Design Archives

Sequential custom single purpose processor design

Combinational and sequential logic design techniques are used to build datapath components and controllers.
Consider designing a single-purpose processor that computes the greatest common divisor (GCD) of two numbers.
Figure 4.3(a) shows a black-box diagram of the desired system.
The data inputs are x_i and y_i, and the data output is d_o.

The system’s functionality is straightforward: the output should represent the GCD of the inputs.
If the inputs are 12 and 8, the GCD is 4.
If the inputs are 13 and 5, the GCD is 1.
Figure 4.3(b) provides a simple program with this functionality.
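For reference, the GCD behavior can be sketched in C using repeated subtraction, which matches the two subtractions, the less-than comparison and the inequality comparison counted later in the datapath discussion. This is a minimal sketch, not the Figure 4.3(b) program itself: the function name `gcd` is illustrative, and both inputs are assumed to be positive (a zero input would make the loop run forever).

```c
#include <stdint.h>

/* Subtraction-based GCD (sketch). x_i and y_i are the data inputs;
 * the return value plays the role of the data output d_o.
 * Assumes x_i > 0 and y_i > 0. */
uint32_t gcd(uint32_t x_i, uint32_t y_i)
{
    uint32_t x = x_i;
    uint32_t y = y_i;

    while (x != y) {          /* inequality comparison */
        if (x < y) {          /* less-than comparison */
            y = y - x;        /* first subtraction */
        } else {
            x = x - y;        /* second subtraction */
        }
    }
    return x;                 /* d_o = x */
}
```

Calling `gcd(12, 8)` yields 4 and `gcd(13, 5)` yields 1, matching the examples above.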
To begin building a single-purpose processor that implements the GCD program, we first convert the program into a complex state diagram, in which states and arcs may include arithmetic expressions.
These expressions may use external inputs and outputs as well as variables.
In contrast, a basic state diagram includes only boolean expressions, which use external inputs and outputs but not variables.
A complex state diagram thus looks like a sequential program in which statements are scheduled into states.
Templates for converting a program into a state diagram are shown in Figure 4.2.
Each statement is classified as an assignment statement, a loop statement, or a branch (if-then-else or case) statement, and the corresponding template is applied.

For an assignment statement

Create a state with the statement as its action, and add an arc from this state to the state of the next statement.

For a loop statement

For a loop statement, create a condition state C and a join state J, both with no actions.
Add an arc with the loop's condition (cond) from C to the first statement in the loop body.
Add a second arc with the complement of the condition (!cond) from C to the next statement after the loop.
Connect the last statement of the loop body to J, and add an arc from J back to C.

For a branch statement

For a branch statement, create a condition state C and a join state J, both with no actions.
Create an arc with the first branch's condition (C1) from C to the branch's first statement (C1 stmts).
Create another arc with the condition !C1*C2 (C1 false and C2 true) from C to C2 stmts.
Repeat this for the other branches.
Finally, connect the arcs leaving C1 stmts, C2 stmts, and the other branches' statements to J.
Add an arc from J to the next statement.
Using this template approach, we convert the GCD program into the complex state diagram of Figure 4.3(c).
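One way to see what the complex state diagram does is to simulate it in C: each `case` below is one state, the states with no action are the condition and join states created by the templates, and the machine advances one state per loop iteration. This is a sketch under assumptions: the state names are hypothetical, it follows the subtraction-based GCD, and it may not match Figure 4.3(c) state for state.

```c
#include <stdint.h>

/* Hypothetical state names for the complex state diagram. */
typedef enum {
    S_LOAD_X,   /* x = x_i                              */
    S_LOAD_Y,   /* y = y_i                              */
    S_LOOP_C,   /* loop condition state (no action)     */
    S_IF_C,     /* branch condition state (no action)   */
    S_SUB_Y,    /* y = y - x                            */
    S_SUB_X,    /* x = x - y                            */
    S_IF_J,     /* branch join state (no action)        */
    S_LOOP_J,   /* loop join state (no action)          */
    S_OUT,      /* d_o = x                              */
    S_DONE
} state_t;

/* Executes the state diagram; arithmetic actions and conditions use variables,
 * which is what makes it a "complex" state diagram. Assumes x_i, y_i > 0. */
uint32_t gcd_fsmd(uint32_t x_i, uint32_t y_i)
{
    uint32_t x = 0, y = 0, d_o = 0;
    state_t s = S_LOAD_X;

    while (s != S_DONE) {
        switch (s) {
        case S_LOAD_X: x = x_i; s = S_LOAD_Y; break;
        case S_LOAD_Y: y = y_i; s = S_LOOP_C; break;
        case S_LOOP_C: s = (x != y) ? S_IF_C : S_OUT; break;   /* cond / !cond arcs */
        case S_IF_C:   s = (x < y) ? S_SUB_Y : S_SUB_X; break;  /* C1 / !C1 arcs */
        case S_SUB_Y:  y = y - x; s = S_IF_J; break;
        case S_SUB_X:  x = x - y; s = S_IF_J; break;
        case S_IF_J:   s = S_LOOP_J; break;                      /* branch join */
        case S_LOOP_J: s = S_LOOP_C; break;                      /* arc from J back to C */
        case S_OUT:    d_o = x; s = S_DONE; break;
        default:       s = S_DONE; break;
        }
    }
    return d_o;
}
```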
Now let us design a custom single-purpose processor that executes the GCD program.
The next step is to divide the functionality into a datapath part and a controller part, as shown in Figure 4.4.
The datapath part should consist of an interconnection of combinational and sequential components.
The controller part should consist of a basic state diagram, i.e., one containing only boolean actions and conditions.

Construct the datapath through a four-step process:
1. Create a register for each declared variable. In the example, the variables are x and y, so create registers x and y, which are loaded from the input ports x_i and y_i. Treat the output port as an implicit variable: create a register d and connect it to the output port d_o.
2. Create a functional unit for each arithmetic operation in the state diagram. In the example, there are two subtractions, one less-than comparison and one inequality comparison, yielding two subtractors and two comparators.
3. Connect the ports, registers and functional units. For each write to a variable, connect the write's source, which may be an input port, a functional unit, or another register, to the variable's register; when a register has more than one source, add a multiplexer in front of it.
4. Finally, create a unique identifier for each control input and output of the datapath components.
Now that we have a complete datapath, we can build a state diagram for our controller.
Figure 4.4 shows the controller implementation model, and Figure 4.5 shows a state table.
Note that there are 7 inputs to the controller, resulting in 128 rows for the table.

Don't-cares for some input combinations can reduce the number of rows in the state table, and further optimization is possible using computer-aided design (CAD) tools.
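To make the datapath/controller split concrete, here is a minimal C sketch that simulates the GCD datapath (registers x, y and d, two subtractors, two comparators, and a multiplexer in front of x and y) driven by a controller state machine that uses only boolean control and status signals. The signal names (x_sel, x_ld, x_neq_y and so on) are hypothetical identifiers chosen for this sketch, the action-free join states are folded away, and any start/go handshake a real design might need is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t x, y, d; } datapath_t;               /* registers */
typedef struct { bool x_sel, y_sel, x_ld, y_ld, d_ld; } ctrl_t; /* control inputs */
typedef struct { bool x_neq_y, x_lt_y; } status_t;             /* comparator outputs */

static status_t dp_status(const datapath_t *dp)
{
    status_t st = { dp->x != dp->y, dp->x < dp->y };
    return st;
}

/* One clock edge. Next values are computed from current values first,
 * so all registers update together. */
static void dp_clock(datapath_t *dp, ctrl_t c, uint32_t x_i, uint32_t y_i)
{
    uint32_t x_next = c.x_sel ? dp->x - dp->y : x_i;  /* mux: subtractor or input port */
    uint32_t y_next = c.y_sel ? dp->y - dp->x : y_i;
    uint32_t d_next = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
    if (c.d_ld) dp->d = d_next;
}

typedef enum { C_LOAD_X, C_LOAD_Y, C_CHECK, C_CMP, C_SUB_Y, C_SUB_X, C_OUT, C_DONE } cstate_t;

/* Controller: reads only boolean status signals, writes only boolean controls. */
uint32_t gcd_rtl(uint32_t x_i, uint32_t y_i)
{
    datapath_t dp = {0, 0, 0};
    cstate_t s = C_LOAD_X;

    while (s != C_DONE) {
        ctrl_t c = {0};                     /* every control signal defaults to 0 */
        status_t st = dp_status(&dp);
        cstate_t next = C_DONE;

        switch (s) {
        case C_LOAD_X: c.x_ld = true;                  next = C_LOAD_Y; break;
        case C_LOAD_Y: c.y_ld = true;                  next = C_CHECK;  break;
        case C_CHECK:  next = st.x_neq_y ? C_CMP : C_OUT;               break;
        case C_CMP:    next = st.x_lt_y ? C_SUB_Y : C_SUB_X;            break;
        case C_SUB_Y:  c.y_sel = true; c.y_ld = true;  next = C_CHECK;  break;
        case C_SUB_X:  c.x_sel = true; c.x_ld = true;  next = C_CHECK;  break;
        case C_OUT:    c.d_ld = true;                  next = C_DONE;   break;
        default:       break;
        }
        dp_clock(&dp, c, x_i, y_i);
        s = next;
    }
    return dp.d;
}
```

Keeping every arithmetic operation inside `dp_clock` and every decision in the controller's `switch` mirrors the split described above: the controller never touches the data values directly.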


RT-level sequential components

For complex sequential systems, abstract sequential components are used.
The components are:
Register
Shift register
Counter

Register

A register stores n bits from its n-bit data input I, with those stored bits appearing at its output Q.
A register has two control inputs clock and load.
For a rising-edge-triggered register, the inputs I are stored only when load is 1 and clock is rising from 0 to 1.
The clock input is usually drawn as a small triangle, as shown in the figure.
Another common register control input is clear, which resets all bits to 0, regardless of the value of I.
Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register.
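The register's behavior can be sketched as a small C model that detects a rising clock edge and stores all n bits at once. This is a behavioral sketch, not hardware: the type and function names are hypothetical, n is limited to 32 bits, and clear is assumed to be asynchronous here (it could equally be synchronous).

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of an n-bit parallel-load register (1 <= n <= 32). */
typedef struct {
    uint32_t q;        /* stored bits, visible at output Q               */
    unsigned n;        /* width in bits                                   */
    bool prev_clk;     /* previous clock level, to detect a rising edge  */
} reg_t;

static uint32_t width_mask(unsigned n)
{
    return n >= 32 ? 0xFFFFFFFFu : ((1u << n) - 1u);
}

/* Apply the current input levels. I is stored only on a 0->1 clock
 * transition while load is 1; clear (assumed asynchronous) wins. */
void reg_update(reg_t *r, bool clk, bool load, bool clear, uint32_t i)
{
    bool rising = clk && !r->prev_clk;

    if (clear) {
        r->q = 0;                        /* reset all bits regardless of I */
    } else if (rising && load) {
        r->q = i & width_mask(r->n);     /* all n bits stored in parallel  */
    }
    r->prev_clk = clk;
}
```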

Shift Register

A shift register stores n bits, but these bits cannot be stored in parallel.
These bits are shifted into the register serially, meaning one bit per clock edge.
A shift register has a one-bit data input I, and at least two control inputs clock and shift.
When clock is rising and shift is 1:
the value of I is stored in the nth bit,
the nth bit is stored in the (n-1)th bit,
the (n-1)th bit is stored in the (n-2)th bit,
and so on, until the second bit is stored in the first bit.
The first bit is typically shifted out, meaning it appears at an output Q.
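A serial-in shift register can be sketched in C by treating the most significant bit as the nth bit and the least significant bit as the first bit. This is a minimal behavioral sketch with hypothetical names, limited to n ≤ 32; each call represents one rising clock edge.

```c
#include <stdbool.h>
#include <stdint.h>

/* n-bit serial-in shift register (1 <= n <= 32).
 * Bit n (most significant) receives I; bit 1 (least significant) drives Q. */
typedef struct {
    uint32_t bits;
    unsigned n;
} shreg_t;

/* Call once per rising clock edge; returns the current value of output Q. */
bool shreg_clock(shreg_t *s, bool shift, bool i)
{
    if (shift) {
        s->bits >>= 1;                                  /* bit k+1 moves into bit k */
        if (i)
            s->bits |= (uint32_t)1u << (s->n - 1);      /* I enters the nth bit     */
    }
    return s->bits & 1u;                                /* first bit appears at Q   */
}
```

Shifting in 1, 0, 1, 1 on a 4-bit register leaves 1101 stored, and the first bit shifted in has reached Q after n = 4 clock edges.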

Counter

A counter is a register that can also increment (add binary 1) to its stored binary value.
A counter has a clear input, which resets all stored bits to 0.
It has a count input, which enables incrementing on the clock edge.
A counter often also has a parallel load data input and associated control signal.
A common counter feature is both up and down counting (incrementing and decrementing), which requires an additional control input to indicate the count direction.
These control inputs can be either synchronous or asynchronous.
Asynchronous inputs take effect independently of the clock.
Synchronous inputs take effect only on a clock edge.
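An up/down counter with clear and parallel load can be sketched in C as follows. This is a behavioral sketch with hypothetical names; all control inputs are treated as synchronous (the function is called once per rising clock edge), and the priority order clear, then load, then count is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* n-bit up/down counter (1 <= n <= 32) with synchronous clear and load. */
typedef struct {
    uint32_t q;
    unsigned n;
} counter_t;

void counter_clock(counter_t *c, bool clear, bool load, uint32_t i,
                   bool count, bool up)
{
    uint32_t mask = (c->n >= 32) ? 0xFFFFFFFFu : ((1u << c->n) - 1u);

    if (clear)
        c->q = 0;                                       /* reset all bits        */
    else if (load)
        c->q = i & mask;                                /* parallel load         */
    else if (count)
        c->q = (up ? c->q + 1u : c->q - 1u) & mask;     /* wraps modulo 2^n      */
}
```

A 3-bit counter holding 7 wraps to 0 when counting up, and wraps from 0 back to 7 when counting down.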


Sequential logic design

A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values.
Sequential logic possesses memory.
One of the most basic sequential circuits is the flip-flop.
A flip-flop stores a single bit.

D-Flip-Flop

The simplest type of flip-flop is the D flip-flop.
It has two inputs: D and clock.
When clock is 1, the value of D is stored in the flip-flop output Q.
When clock is 0, the value of D is ignored; the output Q maintains its value.

SR-Flip-Flop

An SR flip-flop has three inputs: S, R, and clock.
S stands for Set and R stands for Reset.
When clock is 0, the previously stored bit is maintained and appears at output Q.
When clock is 1, the inputs S and R are examined.
If S is 1, 1 is stored in Q.
If R is 1, 0 is stored in Q.
If both are 0, there is no change.
If both are 1, the behavior is undefined.

JK-Flip-Flop

A JK flip-flop has three inputs: J, K, and clock.
A JK flip-flop is the same as an SR flip-flop except that when J and K are both 1, the stored bit toggles from 1 to 0 or from 0 to 1.
When clock is 1, the inputs J and K are examined.
If J is 1, 1 is stored in Q.
If K is 1, 0 is stored in Q.
If both are 0, there is no change.
If both are 1, the output toggles.
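The three flip-flops can be summarized as next-state functions in C, following the level-sensitive description above. This is a sketch with hypothetical function names: each call models one evaluation while the clock is at the given level, the undefined SR case is flagged through `*valid` and simply holds Q, and a real JK flip-flop is built edge-triggered (or master-slave) so that J = K = 1 toggles only once per clock rather than repeatedly while the clock stays at 1.

```c
#include <stdbool.h>

/* D flip-flop: clock 1 stores D, clock 0 holds Q. */
bool d_ff(bool q, bool clk, bool d)
{
    return clk ? d : q;
}

/* SR flip-flop: *valid becomes false for the undefined S = R = 1 case. */
bool sr_ff(bool q, bool clk, bool s, bool r, bool *valid)
{
    *valid = true;
    if (!clk)   return q;                        /* clock 0: hold           */
    if (s && r) { *valid = false; return q; }    /* undefined in hardware   */
    if (s)      return true;                     /* set                     */
    if (r)      return false;                    /* reset                   */
    return q;                                    /* S = R = 0: no change    */
}

/* JK flip-flop: like SR, except J = K = 1 toggles the stored bit. */
bool jk_ff(bool q, bool clk, bool j, bool k)
{
    if (!clk)   return q;
    if (j && k) return !q;                       /* toggle                  */
    if (j)      return true;
    if (k)      return false;
    return q;
}
```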


RT level Combinational Components

A large combinational circuit would be very complex to design.
For example, a circuit with 16 inputs would have 2^16, or 64K, rows in its truth table.
One way to reduce the complexity is to use combinational components instead of logic gates.
Such combinational components are often called register-transfer (RT) level components.

Multiplexer

A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the output O.
It thus connects only one of multiple input tracks to a single output track.
An m×1 multiplexer has m data inputs, one data output, and log2(m) select lines S.
The binary value of S determines which data input passes through;
00…00 means I0 may pass,
00…01 means I1 may pass,
00…10 means I2 may pass, and so on.
For example, an 8×1 multiplexor has 8 data inputs and thus 3 select lines.
If S = 110, then I6 passes through to the output: if I6 is 1, the output is 1; if I6 is 0, the output is 0.
More commonly used is an n-bit multiplexer, a more complex device in which each data input, and the output, consists of n bits (lines).
For example, in a 4-bit 8×1 multiplexer with S = 110, if I6 were 0110, then the output would be 0110.
Note that n is independent of the number of select lines.
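A behavioral sketch of an n-bit m×1 multiplexer in C: each array element is one n-bit data input, and the numeric value of the select lines indexes the array. The function names are hypothetical, n is limited to 32, and returning 0 for an out-of-range select value is a choice made only for this sketch. The helper `select_lines` rounds up, which equals log2(m) when m is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest k with 2^k >= m: the number of select lines for an m x 1 mux. */
unsigned select_lines(size_t m)
{
    unsigned k = 0;
    while (((size_t)1 << k) < m)
        k++;
    return k;
}

/* n-bit m x 1 multiplexer (n <= 32): the binary value of s chooses
 * which data input passes to the output. */
uint32_t mux(const uint32_t inputs[], size_t m, size_t s)
{
    return (s < m) ? inputs[s] : 0u;
}
```

With eight 4-bit inputs where I6 = 0110, selecting S = 110 (6) returns 0110, and `select_lines(8)` returns 3.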

Decoder

A decoder is another combinational circuit.
A decoder converts its binary input I into a one-hot output O.
“One-hot” means that exactly one of the output lines is 1 at a given time.
An n×2^n decoder has n inputs and 2^n outputs.
Equivalently, a log2(n)×n decoder has log2(n) inputs and n outputs.
For example, a 3×8 decoder has 3 inputs and 8 outputs.
If the input is 000, then the output O0 would be 1.
If the input is 001, then the output O1 would be 1, and so on.
An extra input is enable: when enable is 0, all outputs are 0; when enable is 1, the decoder functions as before.
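An n×2^n decoder with enable can be sketched in C by returning the outputs as a bit vector, where bit k stands for output O_k. This is a minimal sketch with a hypothetical function name; n is limited to 5 so that all 2^n outputs fit in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* n x 2^n decoder with enable (1 <= n <= 5). Bit k of the result is O_k:
 * one-hot when enable is 1, all zeros when enable is 0. */
uint32_t decoder(unsigned n, uint32_t i, bool enable)
{
    if (!enable)
        return 0;                  /* all outputs 0 */
    i &= (1u << n) - 1u;           /* keep only the n input bits */
    return 1u << i;                /* exactly one output line is 1 */
}
```

For a 3×8 decoder, input 000 sets O0, input 001 sets O1, and input 110 sets O6.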

Adder

An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry.
For example, a 4-bit adder would have a 4-bit A input, a 4-bit B input, a 4-bit sum output, and a 1-bit carry output.
If A is 1010 and B is 1001, then sum would be 0011 and carry would be 1.
An adder often also has a carry input, which allows adders to be cascaded to create larger adders.
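The 4-bit adder and the cascading idea can be sketched in C. The 4-bit adder below is built bit by bit from full-adder equations (a ripple-carry structure, chosen here for illustration; the text does not prescribe an internal structure), and two of them are cascaded into an 8-bit adder by feeding the low adder's carry out into the high adder's carry in. Function names are hypothetical.

```c
#include <stdint.h>

/* 4-bit adder with carry in and carry out (ripple-carry sketch). */
uint8_t add4(uint8_t a, uint8_t b, unsigned cin, unsigned *cout)
{
    uint8_t sum = 0;
    unsigned c = cin & 1u;

    for (unsigned k = 0; k < 4; k++) {
        unsigned ak = (a >> k) & 1u, bk = (b >> k) & 1u;
        sum |= (uint8_t)((ak ^ bk ^ c) << k);      /* full-adder sum bit   */
        c = (ak & bk) | (ak & c) | (bk & c);       /* full-adder carry out */
    }
    *cout = c;
    return sum;
}

/* Two cascaded 4-bit adders form an 8-bit adder. */
uint8_t add8(uint8_t a, uint8_t b, unsigned *cout)
{
    unsigned c_mid;
    uint8_t lo = add4(a & 0x0F, b & 0x0F, 0, &c_mid);
    uint8_t hi = add4(a >> 4, b >> 4, c_mid, cout);
    return (uint8_t)((hi << 4) | lo);
}
```

`add4(0xA, 0x9, 0, &c)` reproduces the example above: sum 0011 with carry 1.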

Comparator

A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B.
If A were 1010 and B were 1001, then less would be 0, equal would be 0, and greater would be 1.
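A behavioral sketch of an n-bit comparator for unsigned inputs, returning the three outputs as a struct; exactly one of them is 1 for any pair of inputs. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool lt, eq, gt; } cmp_out_t;

/* n-bit magnitude comparator (n <= 32, unsigned inputs). */
cmp_out_t compare(uint32_t a, uint32_t b)
{
    cmp_out_t r = { a < b, a == b, a > b };
    return r;
}
```

`compare(0xA, 0x9)` gives less = 0, equal = 0, greater = 1, as in the example above.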

ALU

An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B.
The select lines S choose the current function.
If there are m possible functions, then there must be at least log2(m) select lines.
Common functions include addition, subtraction, AND, and OR.
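With the four common functions listed above, log2(4) = 2 select lines suffice. The sketch below assumes a hypothetical encoding (00 add, 01 subtract, 10 AND, 11 OR); real ALUs define their own encodings. Results are truncated to n bits (n ≤ 32).

```c
#include <stdint.h>

/* Hypothetical 2-bit select encoding. */
enum { ALU_ADD = 0, ALU_SUB = 1, ALU_AND = 2, ALU_OR = 3 };

/* n-bit ALU: the select value s chooses the current function. */
uint32_t alu(unsigned s, uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t r;

    switch (s & 3u) {
    case ALU_ADD: r = a + b; break;
    case ALU_SUB: r = a - b; break;      /* wraps modulo 2^n */
    case ALU_AND: r = a & b; break;
    default:      r = a | b; break;      /* ALU_OR */
    }
    return r & mask;
}
```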

Shifter

Another common RT-level component is a shifter.
An n-bit input I can be shifted left or right and then output to an output O.
For example, a 4-bit shifter with an input 1010 would output 0101 when shifting right one position.
Shifters come with an additional input indicating what value should be shifted in and an additional output indicating the value of the bit being shifted out.
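A one-position shifter with a shift-in input and a shift-out output can be sketched in C as follows. The function name and parameters are hypothetical, and n is limited to 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* n-bit shifter (1 <= n <= 32), one position per call.
 * in: value shifted into the vacated position; *out: bit shifted out. */
uint32_t shift1(uint32_t i, unsigned n, bool right, bool in, bool *out)
{
    uint32_t mask = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    i &= mask;

    if (right) {
        *out = i & 1u;                                   /* LSB leaves */
        i = (i >> 1) | ((uint32_t)in << (n - 1));        /* in enters at MSB */
    } else {
        *out = (i >> (n - 1)) & 1u;                      /* MSB leaves */
        i = ((i << 1) | (uint32_t)in) & mask;            /* in enters at LSB */
    }
    return i;
}
```

Shifting 1010 right with 0 shifted in gives 0101 with 0 shifted out, matching the example above.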


Basic Combinational Logic Design

A combinational circuit is a digital circuit whose output is purely a function of its current inputs.
It has no memory of past inputs.
A simple technique for designing a combinational circuit from basic logic gates is:
Step 1:- Describe the problem (output in terms of inputs).
Step 2:- Translate the description into a truth table with all possible input configurations.
Step 3:- Derive the output equations from the output columns.
Step 4:- Minimize the output equations, for example using K-maps, to reduce the number of logic gates.
Step 5:- Obtain the final circuit from the output equations.
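The five steps can be checked mechanically on a small example. The problem here is hypothetical, chosen only for illustration: y = 1 when at least two of the three inputs a, b, c are 1. The unminimized sum of products has one term per truth-table row where y = 1, and a K-map reduces it to y = ab + bc + ac; the sketch walks all 2^3 rows to confirm the equations agree with the description.

```c
#include <stdbool.h>

/* Step 1: problem description (hypothetical example). */
static bool spec(bool a, bool b, bool c) { return (a + b + c) >= 2; }

/* Step 3: sum of products read off the output column (rows 011, 101, 110, 111). */
static bool sop(bool a, bool b, bool c)
{
    return (!a && b && c) || (a && !b && c) || (a && b && !c) || (a && b && c);
}

/* Steps 4-5: K-map minimized form, y = ab + bc + ac (three ANDs, one OR). */
static bool minimized(bool a, bool b, bool c)
{
    return (a && b) || (b && c) || (a && c);
}

/* Step 2: enumerate every truth-table row; returns how many rows agree. */
int check_majority(void)
{
    int ok = 0;
    for (int row = 0; row < 8; row++) {
        bool a = row & 4, b = row & 2, c = row & 1;
        if (spec(a, b, c) == sop(a, b, c) && sop(a, b, c) == minimized(a, b, c))
            ok++;
    }
    return ok;
}
```

`check_majority()` returns 8 when all rows agree, confirming the minimized circuit is equivalent to the original description.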


Combinational logic design

A transistor is the basic electrical component of digital systems.
Combinations of transistors form logic gates.
Logic gates are the basic building block of digital systems.
A transistor acts as a simple on/off switch.
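The most popular transistor technology for digital circuits is CMOS (complementary metal-oxide-semiconductor), which uses nMOS and pMOS transistors.
An nMOS transistor is shown in Fig 4.1(a).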
The gate controls whether or not current flows from the source to the drain.
When a high voltage (typically +5 Volts, which we’ll refer to as logic 1) is applied to the gate, the transistor conducts, so current flows.
When low voltage (or) logic 0 is applied to the gate, the transistor does not conduct.

Fig 4.1(c) shows an Inverter.
When the input x is logic 0, the top transistor conducts (and the bottom does not), so logic 1 appears at the output F.
When the input x is logic 1, the bottom transistor conducts (and the top does not), so logic 0 appears at F.

Fig 4.1(d) shows a NAND gate.
x, y are inputs to NAND gate and F is the output.
When at least one of the inputs x and y is logic 0, then at least one of the top transistors conducts (and the bottom transistors do not), so logic 1 appears at F.
If both inputs are logic 1, then neither of the top transistors conducts but both of the bottom transistors conduct, so logic 0 appears at F.

Fig 4.1(e) shows a NOR gate.
When both inputs are logic 0, the output F is 1.
When at least one input is logic 1, the output F is 0.
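The gate descriptions above can be checked with a switch-level model in C. The sketch assumes the "top" transistors are pMOS (conducting when their gate is 0) and the "bottom" transistors are nMOS (conducting when their gate is 1), as in a standard CMOS gate. F is driven to 1 through the top network and to 0 through the bottom network, and `*ok` reports that exactly one network conducts. All names are hypothetical.

```c
#include <stdbool.h>

static bool nmos_on(bool g) { return g; }    /* conducts when gate is 1 */
static bool pmos_on(bool g) { return !g; }   /* conducts when gate is 0 */

/* F = 1 if the top (pull-up) network conducts, 0 if the bottom one does. */
static bool drive(bool up, bool down, bool *ok)
{
    *ok = (up != down);                      /* exactly one network conducts */
    return up;
}

bool cmos_inv(bool x, bool *ok)
{
    return drive(pmos_on(x), nmos_on(x), ok);
}

bool cmos_nand(bool x, bool y, bool *ok)
{
    bool up   = pmos_on(x) || pmos_on(y);    /* top transistors in parallel */
    bool down = nmos_on(x) && nmos_on(y);    /* bottom transistors in series */
    return drive(up, down, ok);
}

bool cmos_nor(bool x, bool y, bool *ok)
{
    bool up   = pmos_on(x) && pmos_on(y);    /* top transistors in series */
    bool down = nmos_on(x) || nmos_on(y);    /* bottom transistors in parallel */
    return drive(up, down, ok);
}
```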


Custom Single Purpose Processors

Introduction:-

A single-purpose processor is a digital system intended to solve a specific computation task.
A manufacturer builds a standard single-purpose processor for use in a variety of applications.
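A designer builds a custom single-purpose processor to execute a specific task within an embedded system.
Benefits of a custom single-purpose processor are:
Performance may be fast, due to fewer clock cycles resulting from a customized datapath.
Clock cycles may be shorter, due to simpler functional units, fewer multiplexers, or simpler control logic.
Size may be small, due to a simpler datapath.
No program memory is needed.
As a result, the processor may be faster and smaller than a general-purpose processor implementing the same functionality; the trade-off is a higher NRE cost.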


Digital Signal Processors (DSPs)

DSPs are powerful special-purpose 8/16/32-bit microprocessors designed specifically to meet the computational demands and power constraints of today’s embedded audio, video, and communications applications.
Digital Signal Processors are 2 to 3 times faster than general-purpose microprocessors in signal processing applications.
DSPs implement algorithms in hardware, which speeds up execution, whereas general-purpose processors implement the algorithm in firmware, so their execution speed depends primarily on the processor clock.
A DSP can be viewed as a microchip designed to perform high-speed computational operations such as addition, subtraction, multiplication and division.
A typical Digital Signal Processor incorporates the following key units:
Program Memory.
Data Memory.
Computational Engine.
I/O Unit.
Audio and video signal processing, telecommunication and multimedia applications are typical examples where DSPs are employed.

Microcontrollers

A microcontroller is a highly integrated silicon chip containing a CPU, scratchpad RAM, special and general purpose register arrays, on-chip ROM/flash memory for program storage, timer and interrupt control units, and dedicated I/O ports.
Microcontrollers can be considered a superset of microprocessors.
A microcontroller can be general purpose (like the Intel 8051, designed for generic applications and domains) or application specific (like the Automotive AVR from Atmel Corporation, designed specifically for automotive applications).
Since a microcontroller contains all the functional blocks needed to work independently, microcontrollers have found wider use than microprocessors in the embedded domain.
Microcontrollers are cheap, cost effective and readily available in the market.
The Texas Instruments TMS 1000 is considered the world’s first microcontroller.

Microprocessors

A microprocessor is a silicon chip representing a Central Processing Unit (CPU), which is capable of performing arithmetic as well as logical operations according to a predefined set of instructions specific to the manufacturer.

In general, the CPU contains the Arithmetic and Logic Unit (ALU), Control Unit and working registers.

A microprocessor is a dependent unit: it requires other hardware such as memory, a timer unit and an interrupt controller to function properly.

Intel claims the credit for developing the first Microprocessor unit Intel 4004, a 4 bit processor which was released in Nov 1971.

Early microprocessors and their developers:

Intel – Intel 4004 – November 1971 (4-bit).
Intel – Intel 4040.
Intel – Intel 8008 – April 1972.
Intel – Intel 8080 – April 1974 (8-bit).
Motorola – Motorola 6800.
Intel – Intel 8085 – 1976.
Zilog – Z80 – July 1976.

The Core of the Embedded Systems

The core of the embedded system falls into any one of the following categories.

General Purpose and Domain Specific Processors:
  Microprocessors.
  Microcontrollers.
  Digital Signal Processors.
Programmable Logic Devices (PLDs).
Application Specific Integrated Circuits (ASICs).
Commercial off the shelf Components (COTS).

Programmers view

A programmer

writes the program instructions that carry out the desired functionality.
may not need to know detailed information about the processor’s architecture or operation.

Instead, the programmer may deal with an architectural abstraction.
The level of abstraction depends on the level of programming.

The two levels of programming are:

Assembly language programming.
Structured-language programming.
Assembly language programming:- represents processor specific instructions as mnemonics.
Structured language programming:- uses processor independent instructions.
A compiler automatically translates processor independent instructions into processor specific instructions.

Instruction set

The assembly language programmer must know the processor’s instruction set.

The instruction set describes the bit configurations allowed in the IR, indicating the atomic processor operations that the programmer may invoke; assembly instructions drawn from this set in turn form an assembly program.
Instructions are classified into three categories.
Data transfer instructions:- move data between memory and registers, between input-output channels and registers, and between registers themselves.
Arithmetic/logical instructions:- configure the ALU to carry out a particular function, channel data from the registers through the ALU, and channel data from the ALU back to a register.
Branch instructions:- determine the address of the next program instruction, possibly based on datapath status signals.
Branches can be further categorized as unconditional jumps, conditional jumps, or procedure call and return instructions.
An unconditional jump always determines the address of the next instruction, while a conditional jump does so only if its condition evaluates to true.
Each instruction has two parts: an opcode field and an operand field.
Opcode field:- specifies the operation to take place during the instruction.
Operand field:- specifies the location of the actual data that takes part in an operation.
Source operands serve as input to the operation, while a destination operand stores the output.
The number of operands per instruction varies among processors.
The operand field may indicate the data’s location through one of several addressing modes.

Addressing modes

In immediate addressing, the operand field contains the data itself.
In register direct addressing, the operand field contains the address of a datapath register in which the data resides.
In register-indirect addressing, the operand field contains the address of a register, which in turn contains the address of a memory location in which the data resides.
In direct addressing, the operand field contains the address of a memory location in which the data resides.
In indirect addressing, the operand field contains the address of a memory location, which in turn contains the address of a memory location in which the data resides.
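The five addressing modes differ only in how many lookups stand between the operand field and the data. The sketch below models a hypothetical machine with an 8-entry register file and 256 bytes of data memory; it is not the instruction set of Figure 2.6, and all names are illustrative.

```c
#include <stdint.h>

enum { NREGS = 8, MEMSIZE = 256 };

typedef enum { IMMEDIATE, REG_DIRECT, REG_INDIRECT, DIRECT, INDIRECT } addr_mode_t;

typedef struct {
    uint8_t reg[NREGS];      /* datapath registers */
    uint8_t mem[MEMSIZE];    /* data memory        */
} machine_t;

/* Returns the data that an operand field refers to under each mode. */
uint8_t fetch_operand(const machine_t *m, addr_mode_t mode, uint8_t field)
{
    switch (mode) {
    case IMMEDIATE:    return field;                           /* field is the data           */
    case REG_DIRECT:   return m->reg[field % NREGS];           /* register holds the data     */
    case REG_INDIRECT: return m->mem[m->reg[field % NREGS]];   /* register holds an address   */
    case DIRECT:       return m->mem[field];                   /* field is a memory address   */
    case INDIRECT:     return m->mem[m->mem[field]];           /* memory holds the address    */
    }
    return 0;
}
```

With reg[2] = 17 and mem[17] = 99, register-indirect addressing through register 2 yields 99; with mem[5] = 40 and mem[40] = 7, indirect addressing with field 5 yields 7.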

Figure 2.6 shows a (trivial) instruction set with 4 data transfer instructions, 2 arithmetic instructions, and 1 branch instruction, for a hypothetical processor.

Figure 2.7(a) shows a program, written in C, that adds the numbers 1 through 10. Figure 2.7(b) shows that same program written in assembly language using the given instruction set.

Program and data memory space

The ES programmer

must be aware of the size of program and data memory.
must check the available on-chip program and data memory.

must not exceed program and data memory limits.

Registers

The assembly-language programmer

Must know how many registers are available for data storage.
Must be familiar with registers with special functions.

Such registers are used for configuring built-in timers, counters, and serial communication devices.

I/O

The programmer

should be aware of the processor’s input and output (I/O) facilities.
can read or write a port by reading or writing a special register.

One common I/O facility is parallel I/O.
Another common I/O facility is a system bus, which consists of address and data ports.
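Reading or writing a port through a special register usually looks like the C sketch below. On real hardware the register sits at a fixed, processor-specific address taken from the data sheet; here an ordinary variable (`fake_port`, hypothetical) stands in for it so the sketch runs on a desktop machine. `volatile` tells the compiler that every access must actually happen, which matters for hardware registers.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped 8-bit port register (hypothetical). */
static volatile uint8_t fake_port;
static volatile uint8_t *const PORT = &fake_port;

void    port_write(uint8_t value) { *PORT = value; }                     /* drive all 8 pins */
uint8_t port_read(void)           { return *PORT; }                      /* read all 8 pins  */
void    pin_set(unsigned bit)     { *PORT |= (uint8_t)(1u << bit); }     /* one pin to 1     */
void    pin_clear(unsigned bit)   { *PORT &= (uint8_t)~(1u << bit); }    /* one pin to 0     */
int     pin_test(unsigned bit)    { return (*PORT >> bit) & 1u; }        /* sample one pin   */
```

On a microcontroller, only the definition of `PORT` would change; the read-modify-write helpers stay the same.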

Interrupts

An interrupt causes the processor to suspend execution of the main program and jump to an interrupt service routine (ISR).
The processor stores the current PC and sets it to the address of the ISR.
After the ISR executes, control returns to the main program by restoring the PC.
The programmer

should be aware of the types of interrupts.
must place each ISR at a specific address in program memory.
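The PC bookkeeping can be sketched with a toy processor model in C. The model is hypothetical: it keeps a single saved PC (real processors usually push the PC onto a stack, which also supports nested interrupts), and the ISR address used in the usage example is made up.

```c
#include <stdint.h>

/* Toy processor state, only enough to show the PC bookkeeping. */
typedef struct {
    uint16_t pc;        /* program counter                        */
    uint16_t saved_pc;  /* where the PC is kept during the ISR    */
    int in_isr;
} cpu_t;

/* Interrupt: store the current PC, then set the PC to the ISR address. */
void take_interrupt(cpu_t *c, uint16_t isr_addr)
{
    c->saved_pc = c->pc;
    c->pc = isr_addr;
    c->in_isr = 1;
}

/* Return from interrupt: restore the PC so the main program resumes. */
void return_from_interrupt(cpu_t *c)
{
    c->pc = c->saved_pc;
    c->in_isr = 0;
}
```

If the main program is at address 0x0120 when an interrupt with ISR address 0x0003 arrives, the PC becomes 0x0003, and after the ISR it returns to 0x0120 no matter how far the ISR advanced the PC.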

Operating system

An operating system is

a layer of software that provides low-level services to the application layer.
decides what program is to run next on the CPU and for how long.

It does process/task scheduling.

It services various H/W interrupts.
Implements an environment for management of high-level application programs.

A system call is the mechanism by which an application invokes the operating system.
When a program requires an operating system service, it typically generates a predefined software interrupt that the operating system services.
Parameters are typically passed from (to) the application program to (from) the operating system through CPU registers.

Development environment

Software and hardware tools support the programming of general-purpose processors.
Two processors are involved in ES development:

The development processor, on which we write and debug our program. This processor is part of our desktop computer.
The other processor is the target processor, to which we will send our program and which will form part of our ES’s implementation.

For example, we may develop our system on a Pentium processor, but use a Motorola 68HC11 as our target processor.
Sometimes the two processors are the same.
Assemblers translate assembly instructions to binary machine instructions.
An assembler may also translate symbolic labels into actual addresses.
The mapping of assembly instructions to machine instructions is one-to-one.
A linker is a software tool that creates an executable file.
Compilers translate structured programs into machine (or assembly) programs.
Cross-compilers, which run on one processor but generate code for a different (target) processor, are extremely common in embedded system development.
Debuggers help programmers evaluate and correct their programs.
A source-level debugger enables step-by-step execution.
Device programmers download a binary machine program from the development processor’s memory into the target processor’s memory.
Emulators support debugging of the program while it executes on the target processor.
An emulator typically consists of a debugger coupled with a board connected to the desktop processor via a cable.
An in-circuit emulator can be plugged into the actual embedded system in place of the target processor, enabling one to control and monitor the program’s execution in the real circuit.
The availability of low-cost or high-quality development environments for a processor influences the choice of a processor.

Microcontrollers

Many processor manufacturers market devices specifically for the embedded systems domain. These devices may include several features.
First, they may include peripheral devices, such as timers, analog to digital converters, and serial communication devices, on the same IC as the processor.
Second, they may include some program and data memory on the same IC.
Third, they may provide the programmer with direct access to a number of pins of the IC.
Fourth, they may provide specialized instructions for bit manipulation operations.
A microcontroller is a device possessing some or all of these features.
Incorporating peripherals and memory onto the same IC reduces the number of required ICs, resulting in compact and low-power implementations.
Providing pin access allows programs to easily monitor sensors, set actuators, and transfer data with other devices.
Providing specialized instructions improves performance for embedded systems applications; thus, microcontrollers can be considered ASIPs to some degree.
Many manufacturers also market devices referred to as “embedded processors”.

Pipelining

Pipelining is a common way to increase the instruction throughput of a microprocessor.
Instruction throughput is the number of instructions a processor completes per unit time.
In pipelined instruction execution, after the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction, and so on.
The idea of pipelining is illustrated in Figure 2.4.
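The throughput gain can be worked out with a simple cycle count. Assuming k stages that each take one clock cycle and no stalls or branch penalties (an idealization), a non-pipelined processor needs n·k cycles for n instructions, while a pipelined one needs k cycles to fill the pipeline and then completes one instruction per cycle, for k + (n − 1) cycles in total.

```c
/* Ideal cycle counts for n instructions on a k-stage processor,
 * assuming one cycle per stage and no stalls. */
unsigned long cycles_unpipelined(unsigned long n, unsigned long k)
{
    return n * k;                          /* each instruction finishes before the next starts */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    return n == 0 ? 0 : k + (n - 1);       /* fill once, then one result per cycle */
}
```

For 10 instructions on a 5-stage processor this gives 50 cycles without pipelining and 14 with it; the stalls and mispredicted branches discussed below push the real figure above 14.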

For pipelining to work well, instruction execution must be decomposable into roughly equal length stages.
In addition, each instruction should require the same number of cycles.
Branches pose a problem for pipelining, since the address of the next instruction is not known until the branch has been at least partly executed.
One solution is to stall the pipeline when a branch is in the pipeline.
An alternative is to guess which way the branch will go and fetch the next instruction accordingly:
if the guess is right, execution proceeds with no penalty;
if it is wrong, the wrongly fetched instructions must be discarded, which incurs a penalty.
Modern pipelined microprocessors have built in sophisticated branch predictors.

Operation

Instruction execution in a microprocessor has several basic stages:

Fetch instruction: the task of reading the next instruction from memory into the instruction register.
Decode instruction: the task of determining what operation the instruction in the instruction register represents.
Fetch operands: the task of moving the instruction’s operand data into appropriate registers.
Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.
Store results: the task of writing a register into memory.

If each stage takes one clock cycle, then we can see that a single instruction may take several cycles to complete.
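The stages can be seen in a tiny instruction-set interpreter. The sketch uses a hypothetical four-instruction ISA (it is not any real processor's instruction set): each loop iteration fetches an instruction into `ir`, advances the PC, decodes the opcode, fetches operands, executes, and writes the result back. The sample program sums 1 through 10.

```c
#include <stdint.h>

typedef enum { MOVI, ADD, JNZ, HALT } opcode_t;      /* hypothetical ISA */
typedef struct { opcode_t op; int a, b; } instr_t;

/* Runs prog until HALT; returns r0. *count receives instructions executed. */
int run(const instr_t prog[], int prog_len, unsigned *count)
{
    int reg[4] = {0};
    int pc = 0;
    unsigned executed = 0;

    while (pc >= 0 && pc < prog_len) {
        instr_t ir = prog[pc];                        /* fetch into IR          */
        pc++;                                         /* default next PC        */
        executed++;
        switch (ir.op) {                              /* decode                 */
        case MOVI: reg[ir.a] = ir.b; break;           /* immediate operand      */
        case ADD: {
            int lhs = reg[ir.a], rhs = reg[ir.b];     /* fetch operands         */
            reg[ir.a] = lhs + rhs;                    /* execute, store result  */
            break;
        }
        case JNZ: if (reg[ir.a] != 0) pc = ir.b; break;   /* conditional jump  */
        case HALT: if (count) *count = executed; return reg[0];
        }
    }
    if (count) *count = executed;
    return reg[0];
}

/* r0 = sum, r1 = counter, r2 = -1 used as a decrement. */
static const instr_t sum_prog[] = {
    { MOVI, 0, 0 },     /* 0: r0 = 0             */
    { MOVI, 1, 10 },    /* 1: r1 = 10            */
    { MOVI, 2, -1 },    /* 2: r2 = -1            */
    { ADD,  0, 1 },     /* 3: r0 = r0 + r1       */
    { ADD,  1, 2 },     /* 4: r1 = r1 + r2       */
    { JNZ,  1, 3 },     /* 5: if r1 != 0 goto 3  */
    { HALT, 0, 0 },     /* 6: stop               */
};
```

`run(sum_prog, 7, &n)` returns 55 after executing 34 instructions; if each instruction passes through five one-cycle stages without overlap, that is 170 cycles.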

General Purpose Processors Software-Basic Architecture

Introduction

The designer of a general-purpose processor, or microprocessor, builds a programmable device suitable for a variety of applications.
An embedded system designer simply uses a general-purpose processor, programming the processor’s memory to carry out the required functionality.
This part of an implementation is known as software.
Using a general-purpose processor in an embedded system results in several design metric benefits.

Basic architecture

A general-purpose processor, sometimes called a CPU or a microprocessor, consists of a datapath and a controller, tightly linked with a memory.
The figure shows the Basic Architecture of a processor.

Datapath

The datapath consists of the circuitry for transforming data and for storing temporary data.
The datapath contains an arithmetic-logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inverting, and shifting.
The ALU also generates status signals, often stored in a status register.
The datapath also contains registers capable of storing temporary data.
Temporary data may include data brought in from memory but not yet sent through the ALU, or data coming from the ALU that will be needed for later ALU operations or will be sent back to memory.
The internal data bus is the bus over which data travels within the datapath, while the external data bus is the bus over which data is brought to and from the data memory.

Controller

The controller consists of circuitry for retrieving program instructions, and for moving data to, from, and through the datapath according to those instructions.
The controller contains a program counter (PC) that holds the address of the next instruction.
The controller also contains an instruction register (IR) to hold the fetched instruction.
Based on this instruction, the controller’s control logic generates the appropriate signals to control the flow of data in the datapath, such as storing ALU results into a particular register or moving data between memory and a register.
Finally, the next-state logic determines the next value of the PC.
For a non-branch instruction, this logic increments the PC.
For a branch instruction, this logic looks at the datapath status signals and the IR to determine the appropriate next address.
The PC’s bit-width represents the processor’s address size.
The address size is independent of the data word size; the address size is often larger.
The address size determines the number of directly accessible memory locations, referred to as the address space or memory space.
If the address size is M, then the address space is 2^M.
For each instruction, the controller typically sequences through several stages, such as fetching the instruction from memory, decoding it, fetching operands, executing the instruction in the datapath, and storing results.
Each stage may consist of one or more clock cycles. A clock cycle is usually the longest time required for data to travel from one register to another.
The path through the datapath or controller that results in this longest time is called the critical path.
The inverse of the clock cycle is the clock frequency.
The shorter the critical path, the higher the clock frequency.
Clock frequency is one of the measures used to compare processors.
For the same program and the same number of cycles per instruction, a higher clock frequency means faster program execution.
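Two of the relationships above are simple arithmetic: an M-bit address gives 2^M directly accessible locations, and the clock frequency is the inverse of the clock period. The sketch assumes the clock period equals the critical-path delay with no safety margin, and expresses delays in nanoseconds (1 / 1 ns = 1000 MHz). Function names are hypothetical.

```c
#include <stdint.h>

/* Number of directly accessible locations for an M-bit address (M < 64). */
uint64_t address_space(unsigned m)
{
    return (uint64_t)1 << m;
}

/* Clock frequency in MHz for a given critical-path delay in nanoseconds. */
double clock_mhz(double critical_path_ns)
{
    return 1000.0 / critical_path_ns;
}
```

A 16-bit PC addresses 65,536 (64K) locations, and a 10 ns critical path allows at most a 100 MHz clock.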

Memory

Registers serve a processor’s short term storage requirements.
Memory serves the processor’s medium and long-term information-storage requirements.
We can classify stored information as either program or data.
Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality.
Data information represents the values being input, output and transformed by the program.
We can store program and data together or separately.
In a Princeton architecture, data and program words share the same memory space.
In a Harvard architecture, the program memory space is distinct from the data memory space.
Figure 2.2 illustrates these two methods.

The Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary.
A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so may result in improved performance.
Most machines have a Princeton architecture.
The Intel 8051 is a well-known Harvard architecture.
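To make the performance argument concrete, the sketch below counts memory-access cycles under a deliberately idealized model (an assumption, not a description of any real processor): every instruction needs one fetch, some also need one data access, and each memory connection serves one access per cycle. The instruction mnemonics are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    text: str            # hypothetical mnemonic, for readability only
    accesses_data: bool  # True if the instruction also reads or writes data memory


def memory_cycles_princeton(program: List[Instruction]) -> int:
    """One shared memory connection: fetches and data accesses take turns."""
    return sum(2 if ins.accesses_data else 1 for ins in program)


def memory_cycles_harvard(program: List[Instruction]) -> int:
    """Separate program and data connections: fetches and data accesses can
    proceed at the same time, so in this model each instruction costs one cycle."""
    return len(program)


program = [
    Instruction("MOV R0, #5", accesses_data=False),
    Instruction("LOAD R1, [0x20]", accesses_data=True),
    Instruction("ADD R0, R1", accesses_data=False),
    Instruction("STORE R0, [0x21]", accesses_data=True),
]
print(memory_cycles_princeton(program))  # 6
print(memory_cycles_harvard(program))    # 4
```

The gap grows with the fraction of instructions that touch data memory; the price is the second memory connection.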
Memory may be read-only memory (ROM) or readable and writable memory (RAM).
ROM is usually much more compact than RAM.
An embedded system often uses ROM for program memory, since, unlike in desktop systems, an embedded system’s program typically does not change.
Memory may be on-chip or off-chip.
On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC.
The processor can usually access on-chip memory much faster than off-chip memory.

Cache Memory

To reduce the time needed to access (read or write) memory, a local copy of memory may be kept in a fast memory called cache, as illustrated in Figure 2.3.
Cache memory often resides on-chip, and often uses fast and expensive static RAM technology.

When the processor accesses a particular memory location, it first checks whether a copy of that location resides in the cache.
If the copy does reside in cache, it’s a cache hit, and the access is fast.
If the copy does not reside in cache, it’s a cache miss; the location and its neighbors (a block) are then copied from memory into the cache.
For caching to be effective, the ratio of cache hits to cache misses must be very high.
Caches are used for both program memory (often called instruction cache, or I-cache) and data memory (often called data cache, or D-cache).
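The hit/miss behavior described above can be sketched with a small cache model. The text does not specify how memory blocks are placed in the cache; this sketch assumes a direct-mapped organization (one common scheme), and the cache geometry in the example is hypothetical. Only hits and misses are tracked, not the data itself.

```python
class DirectMappedCache:
    """Direct-mapped cache model that counts hits and misses (no data stored).

    Assumed geometry: `num_lines` lines, each holding one block of
    `block_size` consecutive memory locations.
    """

    def __init__(self, num_lines: int, block_size: int) -> None:
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # tag of the block held in each line
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Read or write `address`; return True on a cache hit."""
        block = address // self.block_size  # memory block containing the address
        index = block % self.num_lines      # the only line this block can occupy
        tag = block // self.num_lines       # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: copy the location and its neighbors (the whole block) into cache.
        self.tags[index] = tag
        self.misses += 1
        return False

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = DirectMappedCache(num_lines=4, block_size=4)
for _ in range(2):               # sweep the same 16 locations twice
    for address in range(16):
        cache.access(address)
print(cache.hits, cache.misses)  # 28 4
```

Because a miss brings in the neighbors too, sequential accesses miss only once per block, which is why such access patterns achieve a high hit ratio.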

Interfacing with a general-purpose processor

The most common communication situation in embedded systems is the input and output (I/O) of data to and from a general-purpose processor.
I/O is relative to the processor: input means data comes into the processor, while output means data goes out of the processor.
The three main processor I/O issues are: I/O addressing, interrupts, and direct memory access.
A general-purpose processor in this section means a microprocessor.

I/O addressing

A microprocessor may have tens or hundreds of pins, many of which are control pins, such as a pin for the clock input and an input pin for resetting the microprocessor.
Other pins, the processor I/O pins, are used to communicate data to and from the microprocessor.
There are two common methods of using pins to support I/O: ports and system buses.
A port is a set of pins that can be read and written just like any register in the microprocessor.
The port is usually connected to a dedicated register.
In contrast to a port, a system bus is a set of pins consisting of address pins, data pins, and control pins (for strobing or handshaking).
The microprocessor uses the bus to access memory as well as peripherals.
We normally consider the access to peripherals as I/O, but not the access to memory, since memory is considered more a part of the microprocessor.
A microprocessor may use one of two methods for communication over a system bus:
memory-mapped I/O or I/O-mapped I/O (standard I/O).

Interrupts

Interrupt-driven I/O:-

Interrupts are one of the main issues in microprocessor I/O.
Servicing: the program running on a microprocessor must often read and process data from a peripheral whenever that peripheral has new data.
Since new data arrives at unpredictable intervals, one way to detect it is to repeatedly check for a 1 in a particular bit in a register of the peripheral.
This repeated checking by the microprocessor for data is called polling.
Polling is simple to implement, but the repeated checking wastes many clock cycles, and new data may not be processed quickly.
To overcome these drawbacks of polling, most microprocessors support external interrupts.
An external interrupt uses a microprocessor pin, which we label Int.
At the end of executing each machine instruction, the processor checks Int.
If Int is asserted, the microprocessor jumps to a particular address at which a subroutine exists, called an Interrupt Service Routine (ISR). Such I/O is called interrupt-driven I/O.
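The difference between polling and interrupt-driven I/O can be sketched as a small simulation. The peripheral, its status-bit position, and the arrival times are hypothetical, and each loop iteration stands in for one instruction. The polling loop spends a program instruction testing the status bit on every iteration, whereas in the interrupt-driven loop the Int check is done by the processor hardware, so program instructions are spent only when the ISR runs.

```python
from typing import Dict, List, Tuple


class Peripheral:
    """Hypothetical peripheral with a status register and a data register."""

    READY_BIT = 0x01  # assumed position of the "new data" flag in the status register

    def __init__(self, arrivals: Dict[int, int]) -> None:
        self.arrivals = arrivals  # time step -> data value that arrives then
        self.status = 0
        self.data = 0

    def tick(self, step: int) -> None:
        if step in self.arrivals:
            self.data = self.arrivals[step]
            self.status |= self.READY_BIT

    def int_asserted(self) -> bool:
        # Assumed: for interrupt-driven I/O the peripheral drives Int from its ready flag.
        return bool(self.status & self.READY_BIT)

    def read(self) -> int:
        self.status &= ~self.READY_BIT  # reading the data clears the flag
        return self.data


def run_polling(p: Peripheral, steps: int) -> Tuple[List[int], int]:
    received, checks = [], 0
    for step in range(steps):
        p.tick(step)
        checks += 1  # the program itself spends an instruction testing the bit
        if p.status & Peripheral.READY_BIT:
            received.append(p.read())
    return received, checks


def run_interrupt_driven(p: Peripheral, steps: int) -> Tuple[List[int], int]:
    received, isr_runs = [], 0
    for step in range(steps):
        p.tick(step)
        # ... one instruction of the main program executes here ...
        if p.int_asserted():  # hardware checks Int after each instruction
            isr_runs += 1
            received.append(p.read())  # body of the ISR
    return received, isr_runs


arrivals = {3: 0x41, 10: 0x42}
print(run_polling(Peripheral(arrivals), 20))           # ([65, 66], 20)
print(run_interrupt_driven(Peripheral(arrivals), 20))  # ([65, 66], 2)
```

Both versions receive the same data; the difference is that polling spent 20 program checks, while the interrupt-driven program ran its ISR only twice.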
There are two methods to identify the address of the ISR.

Interrupt address vector:-

The address to which the microprocessor jumps on an interrupt is fixed.
The assembly programmer either puts the ISR there or, if not enough bytes are available at that address, puts a jump to the real ISR there.

Vectored Interrupt:-

Here, the peripheral must provide the ISR address.
This method is common when a microprocessor has multiple peripherals connected by a system bus.
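The two ways of locating the ISR can be contrasted in a short sketch. Memory is modeled as a dictionary from addresses to routines; the fixed address 0x0003, the vector values, and the routine names are hypothetical. In the fixed case, the entry at the fixed address stands in for a jump to the real ISR, as when not enough bytes are available there.

```python
from typing import Callable, Dict

# Hypothetical model of program memory: address -> routine stored there.
Memory = Dict[int, Callable[[], str]]

FIXED_INT_ADDRESS = 0x0003  # assumed fixed interrupt address


def service_fixed(memory: Memory) -> str:
    """Fixed interrupt address: the processor always jumps to the same address."""
    return memory[FIXED_INT_ADDRESS]()


class VectoredPeripheral:
    """Hypothetical peripheral that supplies its own ISR address (its vector)."""

    def __init__(self, name: str, vector: int) -> None:
        self.name = name
        self.vector = vector

    def acknowledge(self) -> int:
        # When the processor acknowledges the interrupt, the peripheral
        # provides the ISR address (for example, by placing it on the data bus).
        return self.vector


def service_vectored(memory: Memory, requester: VectoredPeripheral) -> str:
    """Vectored interrupt: jump to whatever address the peripheral provides."""
    return memory[requester.acknowledge()]()


def real_isr() -> str:
    return "real ISR ran"


memory: Memory = {
    # Too little room at the fixed address, so it only "jumps" to the real ISR.
    FIXED_INT_ADDRESS: real_isr,
    0x0100: lambda: "timer ISR ran",
    0x0200: lambda: "UART ISR ran",
}

print(service_fixed(memory))                                         # real ISR ran
print(service_vectored(memory, VectoredPeripheral("uart", 0x0200)))  # UART ISR ran
```

With several peripherals on one bus, the vectored scheme lets each one select its own ISR without the processor having to ask which device interrupted.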

Direct Memory Access:-

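The data being accumulated in a peripheral should first be stored in memory before being processed by a program running on the microprocessor.
Such temporary storage to await processing is called buffering.
Without DMA, the microprocessor must perform each peripheral-to-memory transfer itself, typically by running an ISR that reads the data from the peripheral and writes it to memory, which wastes processor time.
The I/O method of direct memory access (DMA) eliminates these inefficiencies.
A DMA controller is a separate single-purpose processor that transfers data between a peripheral and memory over the system bus, so the microprocessor does not have to execute the transfers.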

Memory-mapped I/O or I/O mapped I/O (Standard I/O)

Memory mapped I/O	I/O mapped I/O
A method to perform I/O operations between the CPU and peripheral devices in a computer that uses one address space for memory and IO devices	A method to perform I/O operations between the CPU and peripheral devices in a computer that uses two separate address spaces for memory and IO devices
Uses the same memory address space for both memory and IO devices	Uses two separate address spaces for memory and IO devices
Fewer addresses are available for memory, since some are used by I/O devices	All the addresses can be used by the memory
Uses the same instructions for both IO and memory operations	Uses separate instructions for IO and memory operations
In memory-mapped I/O, peripherals occupy specific addresses in the existing address space.	In standard I/O (also known as I/O-mapped I/O), the bus includes an additional pin, which we label M/IO, to indicate whether the access is to memory or to a peripheral (i.e., an I/O device).
For example, consider a bus with a 16-bit address. The lower 32K addresses may correspond to memory addresses, while the upper 32K may correspond to I/O addresses.	For example, when M/IO is 0, the address on the address bus corresponds to a memory address. When M/IO is 1, the address corresponds to a peripheral.
An advantage of memory-mapped I/O is that the microprocessor need not include special instructions for communicating with peripherals.	Advantages of standard I/O include no loss of memory addresses to use as I/O addresses, and potentially simpler address decoding logic in peripherals.
Interrupts: Can share interrupt vectors with memory	Interrupts: Requires separate interrupt vectors for I/O devices.
Hardware: Less complex, because it reuses the existing memory-addressing mechanism	Hardware: Requires extra circuitry and is more complex
Less efficient	More efficient
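The two address-decoding schemes from the table can be sketched as follows, using the table's own example of a 16-bit address bus: under memory-mapped I/O the upper 32K addresses are taken to be I/O, and under standard I/O the M/IO pin selects the space. The function names and the string return values are hypothetical.

```python
ADDRESS_BITS = 16
ADDRESS_SPACE = 1 << ADDRESS_BITS  # 64K addresses
IO_BASE = ADDRESS_SPACE // 2       # memory-mapped I/O region starts at 32K


def _check(address: int) -> None:
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError(f"address {address:#x} does not fit in {ADDRESS_BITS} bits")


def decode_memory_mapped(address: int) -> str:
    """One shared address space: the address alone selects memory or I/O."""
    _check(address)
    return "memory" if address < IO_BASE else "io"


def decode_standard_io(address: int, m_io: int) -> str:
    """Separate spaces: the extra M/IO pin selects memory (0) or a peripheral (1)."""
    _check(address)
    return "io" if m_io == 1 else "memory"


def memory_locations(memory_mapped: bool) -> int:
    """Addresses left for memory: half the space with memory-mapped I/O, all of it otherwise."""
    return IO_BASE if memory_mapped else ADDRESS_SPACE


print(decode_memory_mapped(0x1234), decode_memory_mapped(0x8004))              # memory io
print(decode_standard_io(0x8004, m_io=0), decode_standard_io(0x0004, m_io=1))  # memory io
print(memory_locations(True), memory_locations(False))                         # 32768 65536
```

The last line shows the "loss of memory addresses" row of the table: memory-mapped I/O gives up half of this example's address space to peripherals.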

Hardware Protocol basics

The basic protocol concepts include: actors, data direction, addresses, time multiplexing, and control methods.

Actor

An actor is a processor or memory involved in the data transfer.
A protocol typically involves two actors: a master and a servant.
A master initiates the data transfer. A servant (usually called a slave) responds to the initiation request.
In the bus-structure example of Figure 6.1, the processor is the master and the memory is the servant;
i.e., the memory cannot initiate a data transfer.
The servant could also be another processor.
Masters are usually general-purpose processors, while servants are usually peripherals and memories.

Data direction

Data direction denotes the direction that the transferred data moves between the actors.
The direction is indicated by denoting each actor as either receiving or sending data.
In particular, a master may be either the receiver of data or the sender of data.

Addresses

Addresses represent a special type of data used to indicate where regular data should go to or come from.
A protocol often includes both an address and regular data, as in Figure 6.1.
The address specifies where the data should be read from or written to in the memory.
An address is necessary when a general-purpose processor communicates with multiple peripherals over a single bus.
The address not only specifies a particular peripheral, but also may specify a particular register within that peripheral.

Time multiplexing

Multiplexing means sharing a single set of wires for multiple pieces of data.
In time multiplexing, the multiple pieces of data are sent one at a time over the shared wires.
For example, Figure 6.2(a) shows a master sending 16 bits of data over an 8-bit bus using time-multiplexed data.
The master first sends the high-order byte, then the low-order byte.
The servant must receive the bytes and then demultiplex the data.
This serializing of data can be done to any extent.
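The example of sending 16 bits over an 8-bit bus can be sketched as a pair of functions: one for the master (serializing, high-order chunk first) and one for the servant (demultiplexing). The function names are hypothetical; the bus width is a parameter, so the same code also shows serializing to other extents, down to one bit at a time.

```python
from typing import List


def serialize(word: int, word_bits: int, bus_bits: int) -> List[int]:
    """Master side: split `word` into bus-width chunks, high-order chunk first."""
    if word_bits % bus_bits:
        raise ValueError("word size must be a multiple of the bus width")
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    mask = (1 << bus_bits) - 1
    shifts = range(word_bits - bus_bits, -1, -bus_bits)  # e.g. 8, 0 for 16 bits over 8
    return [(word >> s) & mask for s in shifts]


def deserialize(chunks: List[int], bus_bits: int) -> int:
    """Servant side: demultiplex the chunks back into one word."""
    word = 0
    for chunk in chunks:
        word = (word << bus_bits) | chunk
    return word


chunks = serialize(0xABCD, word_bits=16, bus_bits=8)
print([hex(c) for c in chunks])              # ['0xab', '0xcd']
print(hex(deserialize(chunks, bus_bits=8)))  # 0xabcd
```

Narrower buses need fewer wires but more transfers: 16 bits take two transfers over 8 wires and sixteen over a single wire.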

Control methods

Control methods are schemes for initiating and ending the transfer.
Two of the most common methods are strobe and handshake.
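In a strobe protocol, the master uses one control line, called the request line, to initiate the data transfer.
The transfer is considered to be complete after some fixed time interval after the initiation, so the servant must complete its part within that interval.
For example, Figure 6.3(a) shows a strobe protocol.
The second common control method is a handshake protocol, shown in Figure 6.3(b): the master asserts the request line, and the servant asserts an acknowledge line when it has completed its part of the transfer.
A handshake protocol can accommodate servants with varying response times, which a strobe protocol cannot. However, when the response time is known, a handshake protocol may be slower than a strobe protocol, since it adds the acknowledge exchange.

A compromise protocol is often used, as shown in Figure 6.4: the master waits a fixed time as in a strobe protocol, and a servant that cannot respond within that time asserts an additional wait line, so the transfer completes like a handshake.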
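The timing tradeoff among the three control methods can be sketched with a simple model. Times are in arbitrary cycles; the one-cycle handshake overhead and the example response times are assumptions chosen only for illustration, and `None` stands for data that was not yet valid when a strobe transfer ended.

```python
from typing import Optional, Tuple

HANDSHAKE_OVERHEAD = 1  # assumed extra cycles for the acknowledge exchange

Result = Tuple[Optional[int], int]  # (data received or None, completion time)


def strobe(response_time: int, value: int, fixed_wait: int) -> Result:
    """Transfer ends after a fixed interval, whether or not the servant was ready."""
    data = value if response_time <= fixed_wait else None
    return data, fixed_wait


def handshake(response_time: int, value: int) -> Result:
    """Master waits for the servant's acknowledge, so any response time works."""
    return value, response_time + HANDSHAKE_OVERHEAD


def compromise(response_time: int, value: int, fixed_wait: int) -> Result:
    """Strobe timing for a fast servant; a slow servant asserts wait instead."""
    if response_time <= fixed_wait:
        return value, fixed_wait
    return value, response_time + HANDSHAKE_OVERHEAD


# Known response time of 2 cycles: the strobe can be tuned to it and finishes first.
print(strobe(2, 0x5A, fixed_wait=2), handshake(2, 0x5A))                 # (90, 2) (90, 3)
# A slow servant (5 cycles) breaks the strobe but not the compromise.
print(strobe(5, 0x5A, fixed_wait=2), compromise(5, 0x5A, fixed_wait=2))  # (None, 2) (90, 6)
```

The compromise keeps strobe speed for fast servants while remaining correct for slow ones, which is why it is popular.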

Interfacing, Timing Diagrams, Timing Diagrams of read and write operations

Interfacing

Introduction to Interfacing

As stated earlier, we use processors to implement processing, memories to implement storage, and buses to implement communication.
The earlier chapters described processors and memories. This chapter describes implementing communication with buses, i.e., interfacing.
Buses implement communication among processors or among processors and memories.
Communication is the transfer of data among those components.
For a general-purpose processor, communication means reading or writing a memory or a peripheral’s register.

Timing diagrams

A bus consists of wires connecting two or more processors or memories.
Figure 6.1(a) shows the wires of a simple bus connecting a processor with a memory.
Note that each wire may be unidirectional, as are rd/wr, enable and addr, or bidirectional, as is data.
Also note that a set of wires with the same function is typically drawn as a thick line (or a line with a small angled line drawn through it).
addr and data each represent a set of wires:
the addr wires transmit an address, while the data wires transmit data.
A pin is the actual conducting device through which a signal is input to or output from the processor.
A bus must have an associated protocol describing the rules for transferring data over those wires.
This chapter primarily discusses low-level hardware protocols; higher-level protocols, like IP (Internet Protocol), can be built on top of them using a layered approach.
Interfacing with a general-purpose processor is extremely common.
The three issues relating to such interfacing are: addressing, interrupts, and direct memory access.
When multiple processors attempt to access a single bus or memory simultaneously, resource contention exists.
The most common method for describing a hardware protocol is a timing diagram.
Consider the example processor-memory bus of Figure 6.1(a).
Figure 6.1(b) uses a timing diagram to describe the protocol for reading the memory over the bus.
In the diagram, time proceeds to the right along the x-axis.

Timing diagram for read operation:-

Figure 6.1(b) shows that the processor must set the rd/wr line low for a read to occur.
The diagram also shows, using two vertical lines, that the processor must place the address on addr for at least tsetup time before setting the enable line high.
The diagram shows that the high enable line triggers the memory to put data on the data wires after a time tread.
Note that a timing diagram represents control lines, like rd/wr and enable, as either high or low,
while it represents data lines, like addr and data, as either invalid (a single horizontal line) or valid (two horizontal lines).
The value of data lines is not normally relevant when describing a protocol.
A control line may be active low, meaning that a 0 on the line activates it. Such a control line is typically written with a bar above it, a single quote after it (e.g., enable’), or an underscore l after it (e.g., enable_l).
The term “assert” means setting a control line to its active value (i.e., to 1 for an active high line, to 0 for an active low line).
The term “deassert” means setting the control line to its inactive value.

Timing diagram for write operation:-

Figure 6.1(c) shows that the processor must set the rd/wr line high for a write to occur.
The diagram also shows, using two vertical lines, that the processor must place the address on addr for at least tsetup time before setting the enable line high.
The diagram shows that the high enable line triggers the memory to write the data from the data wires after a time twrite.
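The read and write protocols of Figure 6.1 can be sketched as a behavioral memory model that checks the tsetup constraint and reports when each transfer completes. The numeric values of tsetup, tread, and the write delay are hypothetical, times are in nanoseconds, and rd/wr uses 0 for read and 1 for write as in the diagrams. A helper also shows what “assert” means for active-high and active-low lines.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

T_SETUP = 2  # hypothetical minimum time addr must be valid before enable rises
T_READ = 5   # hypothetical delay from enable high until data is valid (read)
T_WRITE = 4  # hypothetical delay from enable high until data is stored (write)

READ, WRITE = 0, 1  # rd/wr low = read, rd/wr high = write


@dataclass
class BusCycle:
    rd_wr: int
    addr: int
    addr_valid_at: int  # time at which the processor has placed addr on the bus
    enable_at: int      # time at which the processor sets enable high
    data: int = 0       # value the processor drives on data (writes only)


@dataclass
class BusMemory:
    cells: Dict[int, int] = field(default_factory=dict)

    def transfer(self, c: BusCycle) -> Tuple[Optional[int], int]:
        """Return (data read, or None for a write; time the transfer completes)."""
        if c.enable_at - c.addr_valid_at < T_SETUP:
            raise ValueError("setup violation: addr not valid tsetup before enable")
        if c.rd_wr == READ:
            # Unwritten cells read as 0 in this model.
            return self.cells.get(c.addr, 0), c.enable_at + T_READ
        self.cells[c.addr] = c.data
        return None, c.enable_at + T_WRITE


def asserted_level(active_low: bool) -> int:
    """Level that asserts a control line: 0 if active low, 1 if active high."""
    return 0 if active_low else 1


mem = BusMemory()
print(mem.transfer(BusCycle(WRITE, addr=0x10, addr_valid_at=0, enable_at=2, data=0x7F)))  # (None, 6)
print(mem.transfer(BusCycle(READ, addr=0x10, addr_valid_at=10, enable_at=13)))           # (127, 18)
print(asserted_level(active_low=True))  # 0
```

Setting enable high less than tsetup after the address becomes valid raises an error, mirroring the constraint marked by the two vertical lines in the diagrams.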

IC Technology

The IC technology is divided into:

Full-custom/VLSI.
Semi-custom ASIC (gate array and standard cell).
PLD (Programmable Logic Device).

Introduction to IC Technology

Every processor must eventually be implemented on an IC.
IC technology involves the manner in which a digital (gate-level) implementation is mapped onto an IC.
An IC (Integrated Circuit), often called a “chip,” is a semiconductor device consisting of a set of connected transistors and other devices.
The most popular technology for fabricating ICs is CMOS (Complementary Metal Oxide Semiconductor).
IC technology is independent of processor technology.
Any type of processor can be mapped to any type of IC technology.
To understand the differences among IC technologies, first recognize that semiconductors consist of numerous layers.
The bottom layers form the transistors.
The middle layers form logic gates.
The top layers connect these gates with wires.
The task of building the layers is actually one of designing appropriate masks.
A set of masks is often called a layout.

Full-custom/VLSI

In a full-custom IC technology, all layers are optimized for a particular embedded system’s digital implementation.
Such optimization includes:
Placing the transistors to minimize interconnection lengths.
Sizing the transistors to optimize signal transmissions.
Routing wires among the transistors.
Once all the masks are completed, the mask specifications are sent to a fabrication plant that builds the actual ICs.
Full-custom VLSI design has very high NRE cost and long turnaround times (typically months) before the IC becomes available.
It can yield excellent performance with small size and power.
It is usually used only in high-volume or extremely performance-critical applications.

Semi-custom ASIC (gate array and standard cell)

In an ASIC (Application-Specific IC) technology, the lower layers are fully or partially built, leaving us to finish the upper layers.
In a gate array technology, the masks for the transistor and gate levels are already built (i.e., the IC already consists of arrays of gates).
The remaining task is to connect these gates to achieve our particular implementation.
In a standard cell technology, logic-level cells (such as an AND gate or an AND-OR-INVERT combination) have their mask portions pre-designed, usually by hand. The remaining task is to arrange these portions into complete masks and to connect the cells.
Figure 1.7: ICs consist of several layers. The figure shows a simplified CMOS transistor; an IC may possess millions of these, connected by layers of metal (not shown).

PLD

In a PLD (Programmable Logic Device) technology, all layers already exist, so the actual IC can be purchased ready-made.
The layers implement a programmable circuit, where programming has a lower-level meaning than a software program.
The programming that takes place may consist of creating or destroying connections between wires either by blowing a fuse, or setting a bit in a programmable switch.
PLDs can be divided into two types, simple and complex.

One type of simple PLD is a PLA (Programmable Logic Array), which consists of a programmable array of AND gates and a programmable array of OR gates.

Another type is a PAL (Programmable Array Logic), which uses just one programmable array (the AND array, feeding a fixed OR array) to reduce the number of expensive programmable components.
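The AND-array/OR-array structure of a PLA can be sketched in software. Here “programming” the device means choosing which inputs each AND gate is connected to and which AND gates feed each OR gate; the data layout is a hypothetical representation, and the example programs a half adder (sum = a XOR b, carry = a AND b).

```python
from typing import Dict, List, Sequence, Set

# One AND-array row: input index -> required value (1 = true literal,
# 0 = complemented literal). Inputs not listed are not connected.
ProductTerm = Dict[int, int]


def eval_pla(inputs: Sequence[int],
             and_plane: List[ProductTerm],
             or_plane: List[Set[int]]) -> List[int]:
    """Evaluate a PLA: programmable AND array, then programmable OR array."""
    products = [int(all(inputs[i] == v for i, v in term.items()))
                for term in and_plane]
    # Each output ORs together the product terms it is connected to.
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Half adder with inputs a (index 0) and b (index 1).
and_plane = [
    {0: 1, 1: 0},  # p0 = a AND NOT b
    {0: 0, 1: 1},  # p1 = NOT a AND b
    {0: 1, 1: 1},  # p2 = a AND b
]
or_plane = [
    {0, 1},        # sum   = p0 OR p1
    {2},           # carry = p2
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, eval_pla([a, b], and_plane, or_plane))
```

In a PAL the same model applies, except that the `or_plane` connections are fixed by the device rather than chosen by the designer.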
One type of complex PLD is the FPGA (Field Programmable Gate Array).
It offers connectivity among blocks of logic, rather than just arrays of logic as with PLAs and PALs, and is thus able to implement far more complex designs.
PLDs offer very low NRE cost and instant IC availability.
However, PLDs are typically bigger than ASICs, may have higher unit cost, may consume more power, and may be slower (especially FPGAs).
They still provide reasonable performance, though, so are especially well suited to rapid prototyping.
As mentioned earlier and illustrated in Figure 1.8, the choice of an IC technology is independent of processor types.
For example, a general-purpose processor can be implemented on a PLD, semi-custom, or full-custom IC.

Design Technology

Design technology involves converting desired system functionality into an implementation.
The implementation requires optimization of design metrics.
Variations of a top-down design process are illustrated in the figure.
The designer refines the system through several abstraction levels.
At the system level, the designer describes the desired functionality, for example in C; this is called the system specification.
The designer refines the system specification into behavioral specifications by distributing portions of it among general-purpose and single-purpose processors.
The designer refines these specifications into register-transfer (RT) specifications by converting behavior on general-purpose processors to assembly code, and by converting behavior on single-purpose processors to a connection of register-transfer components and state machines.
The designer then refines the register-transfer-level specification of a single-purpose processor into a logic specification consisting of Boolean equations.

Compilation/Synthesis

Compilation/synthesis lets a designer specify desired functionality in an abstract manner.
It then automatically generates lower-level implementation details.
Describing a system at high abstraction levels can improve productivity.
A logic synthesis tool converts Boolean expressions into a connection of logic gates (called a net list).
A register-transfer (RT) synthesis tool converts finite-state machines and register-transfers into a datapath of RT components and a controller of Boolean equations.
A behavioral synthesis tool converts a sequential program into finite-state machines and register transfers.
A software compiler converts a sequential program to assembly code, which is register-transfer code.
Finally, a system synthesis tool converts an abstract system specification into a set of sequential programs on general and single-purpose processors.
The RT and behavioral synthesis tools have enabled a unified view of the design process for single-purpose and general-purpose processors.
Design for single-purpose processors is called “hardware design”.
Design for general-purpose processors is called “software design”.
In the past, the two design processes were very different:
software designers wrote sequential programs, while hardware designers connected components.
But today, synthesis tools have converted the hardware design process essentially into one of writing sequential programs.
This has led to synthesis tools and simulators that enable the co-development of systems using both hardware and software.
The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility.

Libraries/IP

Libraries involve re-use of pre-existing implementations.
Re-use improves designer productivity.
A logic level library may consist of layouts for gates and cells.
An RT-level library may consist of layouts for RT components, like registers, multiplexers, decoders and functional units.
A behavioral-level library may consist of bus interfaces, display controllers, and even general-purpose processors (cores).
A system-level library might consist of complete systems with operating systems and programs.

Test/Verification

Test/Verification involves ensuring that functionality is correct.

At the logic level, gate-level simulators provide output signal timing waveforms given input signal waveforms.
At the RT-level, hardware description language (HDL) simulators execute RT-level descriptions and provide output waveforms given input waveforms.
At the behavioral level, HDL simulators simulate sequential programs, to enable hardware/software co-verification.
At the system level, model simulators/checkers simulate the initial system specification to verify correctness and completeness of the specification.

Application-Specific-Instruction Processors (ASIP)

An application-specific instruction processor (ASIP) can serve as
a compromise between the other processor options.
• An ASIP is a programmable processor optimized for a particular
class of applications, such as embedded control, digital signal
processing, and telecommunications.
• ASIPs fill the architectural spectrum between General Purpose
Processors and Application Specific Integrated Circuits (ASICs).
• The need for an ASIP arises when traditional general-purpose
processors are unable to meet the increasing application needs.
• ASIPs incorporate a processor, the on-chip peripherals demanded
by the application, and program and data memory.
• An example is a DSP (digital signal processor).

Single-purpose processor-Hardware

A single-purpose processor is a digital circuit designed to execute
exactly one program.
• An embedded system designer may create a single-purpose
processor by designing a custom digital circuit.
• Alternatively, the designer may purchase a predesigned
single-purpose processor.
• This part of the implementation is referred to as the “hardware”
portion.
• Other common terms include co-processor, accelerator, and
peripheral.
• Examples include a JPEG codec, a DMA controller, and a Fourier
transformer.
Drawbacks
• It requires long design time.
• It provides low flexibility.
• It has higher NRE cost.
Benefits
• It is faster than a general-purpose processor.
• It consumes less power.
• It is smaller in size.

General-purpose processors-Software

• The designer of a general-purpose processor, or microprocessor,
builds a programmable device suitable for a variety of
applications.
• An embedded system designer simply uses a general-purpose
processor, by programming the processor’s memory to carry out
the required functionality.
• This part of an implementation is known as software.
• Using a general-purpose processor in an embedded system results
in several design metric benefits.
• Examples include microprocessors and microcontrollers.
Drawbacks
• It has lower performance.
• It has increased size and power consumption.
Benefits
• It provides short time-to-market and low NRE cost.
• It provides high flexibility.

Design challenges in Embedded System Design/Design metrics of ES

Unit cost: The monetary cost of manufacturing each copy of the
system, excluding NRE cost.
NRE cost (Non-Recurring Engineering cost): The
monetary cost of designing the system. Once the system is
designed, any number of units can be manufactured without
incurring any additional design cost (hence the term
“non-recurring”).
Size: the physical space required by the system, often measured in
bytes for software, and gates or transistors for hardware.
Performance: The execution time or throughput of the system.
Time-to-market: The amount of time required to design and
manufacture the system to the point that it can be sold to customers.
Time-to-prototype: The amount of time required to build a
working version of the system.
Type and amount of hardware: Hardware based on SOC or VLSI
design has very high NRE cost. Design challenge is to select most
appropriate type and amount of hardware.
Power dissipation: Power is energy dissipated per second. An
embedded system may need to run continuously, without being
switched off, so limiting power dissipation is a design challenge.
Energy Consumption: Challenge is to optimize the energy
consumption by appropriate hardware and software design.
Flexibility: Providing flexibility in design at little cost overhead is a
challenge. A product may need to offer flexible features.
Ability to upgrade: Ability to upgrade the design while keeping the
cost minimum and without any significant engineering cost is a
challenge.
Reliability: Designing a reliable product needs appropriate design,
thorough testing, verification and validation.
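Two of these metrics combine with simple arithmetic. Because NRE cost is paid once while unit cost is paid per copy, the cost per product falls as volume rises; and because power is energy per second, energy consumed is power multiplied by time. The dollar, wattage, and volume figures below are hypothetical.

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """NRE cost is paid once; unit cost is paid for every copy manufactured."""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """NRE cost amortized over all units, plus the recurring unit cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost(nre, unit_cost, units) / units


def energy_joules(power_watts: float, seconds: float) -> float:
    """Power is energy per second, so energy = power x time."""
    return power_watts * seconds


# Hypothetical design: $2000 NRE, $100 per unit.
print(per_product_cost(2000, 100, 10))    # 300.0
print(per_product_cost(2000, 100, 1000))  # 102.0
# Hypothetical device drawing 0.5 W for one hour.
print(energy_joules(0.5, 3600))           # 1800.0
```

At low volume the NRE share dominates, which is one reason high-NRE options such as full-custom VLSI suit only high-volume products.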

Embedded System Components

An embedded system has three main components embedded into
it:
• Hardware.
• System software.
• Real Time Operating System.
Hardware:-
• It embeds hardware similar to a computer. The processors may be
embedded processor cores.
• The hardware includes embedded memory, peripherals, and
input-output devices. It also embeds the main application software.
• The application software may perform multiple tasks concurrently.
• The figure shows the components of embedded system hardware.

System Software:-
• Software usually embeds in the ROM, flash memory or media
card.
• The system most often does not have a secondary hard disk and
CD memory as in a computer.
Real Time Operating System:-
• It embeds a Real Time Operating System. The RTOS supervises
the application software and controls the access to system
resources.
• It enables finishing the execution of the tasks of a program within
specified time intervals.
• Examples include QNX, ThreadX, and Embedded Linux.