Rethinking SoC architecture around I/O, real-time tasks, and debugging

By Dave Alsup
Innovasic Semiconductor

System-on-Chip (SoC) devices offer a high degree of integration, but the search for the right device often hinges on just one missing feature or performance attribute. Rethinking SoC architecture has produced a device that combines programmable I/O, optimized real-time performance, and easier debugging.

When defining an SoC for a product, designers face a make/buy decision. They can make a custom ASIC or an FPGA-based solution, or buy an existing IC that meets the system’s needs. Each choice presents challenges.

Building a custom ASIC is a huge undertaking. Even with purchased IP, integrating it, meeting time constraints, and adding the features the available IP does not cover are substantial tasks, not to mention getting the part itself produced. An FPGA solution presents many of the same difficulties. Production is less of a problem, but component cost is generally higher.

In addition to developing code for an ASIC or FPGA, designers have to come up with a tools solution, software libraries, and other pieces to complete the environment.

Alternatively, designers can purchase an off-the-shelf part already in production. Off-the-shelf solutions, however, pose other issues, including finding a part that has the required feature set and that will be in production long enough to support the final product’s expected lifetime.

Another concern with off-the-shelf solutions is that they either lack some of the required features, forcing the addition of peripheral ICs or FPGAs, or they include large blocks that go unused, adding cost, complexity, and power consumption.

For many years, parts have been available with some kind of programmable peripheral blocks, such as complex timer units, communications engines, and more. These parts provide a solution when that one feature is needed but often fall short when multiple interfaces of various types are needed.

Rethinking I/O
With the limitations of most off-the-shelf parts as well as the benefits of an FPGA in mind, Innovasic Semiconductor created a new SoC architecture to address the needs of the industrial market.

The flexible input deterministic output (fido) 1100 architecture extends the SoC model by combining a powerful 32-bit microprocessor with four programmable I/O modules, known as Universal I/O Controllers (UICs), and other basic blocks such as timers and A/D functions. Each UIC has 18 bidirectional pins, a dedicated RISC engine optimized for I/O management that can time-share among up to four independent threads of execution, and blocks for SERDES, CRC generation, and similar functions.

A single UIC can support 100 Mb Ethernet, two UARTs, CAN, one UART and SPI, SPI and a custom timer function, or other combinations as shown in Figure 1. The functionality in a particular system is determined by loading a specific set of UIC firmware. These I/O modules are interfaced to the main processor through the peripheral management unit, which allows the user to allocate buffer space to each UIC as appropriate for the protocols instantiated on it.
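As a rough illustration of allocating buffer space per UIC, the following C sketch plans buffers for the four UICs out of one shared pool. The firmware names, the structure layout, the shared-pool model, and the function `pmu_plan_buffers` are all assumptions made for illustration; the article does not describe the actual peripheral management unit interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UIC_COUNT 4 /* the article describes four UICs */

/* Hypothetical firmware personalities; real firmware names are not given. */
typedef enum {
    UIC_FW_NONE,
    UIC_FW_ETHERNET,
    UIC_FW_DUAL_UART,
    UIC_FW_CAN,
    UIC_FW_UART_SPI
} uic_firmware_t;

typedef struct {
    uic_firmware_t firmware; /* which protocol set is loaded on this UIC */
    uint32_t buffer_bytes;   /* buffer space requested for this UIC */
    uint32_t buffer_offset;  /* offset assigned within the shared pool */
} uic_config_t;

/*
 * Packs the requested buffers back to back in a pool of pool_bytes
 * (the single shared pool is an assumption). UICs with no firmware
 * get no buffer. Returns false if the requests do not fit.
 */
bool pmu_plan_buffers(uic_config_t cfg[UIC_COUNT], uint32_t pool_bytes)
{
    assert(cfg != NULL);
    uint32_t next = 0;
    for (size_t i = 0; i < UIC_COUNT; i++) {
        if (cfg[i].firmware == UIC_FW_NONE) {
            cfg[i].buffer_bytes = 0;
        }
        if (cfg[i].buffer_bytes > pool_bytes - next) {
            return false; /* request exceeds the remaining pool */
        }
        cfg[i].buffer_offset = next;
        next += cfg[i].buffer_bytes;
    }
    return true;
}
```

In this model, a UIC running Ethernet might request a few kilobytes while a UART pair requests a few hundred bytes, and the plan is rejected up front if the total exceeds the pool.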

Figure 1: A single UIC can support 100 Mb Ethernet, two UARTs, CAN, one UART and SPI, SPI and a custom timer function, or other combinations.

This approach allows the system designer to select the specific set of UIC firmware that tailors the functionality of the part to the system requirements. Specialized functionality can be programmed into custom UIC firmware to eliminate the need for external logic and/or reduce the load on the main processor. Designers can now have a customized solution without waiting for silicon or developing and integrating an FPGA design.

Embedded software complexity
Designing or selecting a chip is only one part of the problem. To complete the system, software must be developed. As the complexity of SoC hardware increases, so does the complexity of the associated software. With this complexity comes increased difficulty in developing a quality system on time.

A typical system performs multiple functions. Some of these functions are tightly coupled, and some have nothing to do with the others. Some functions are time critical, with tight limits on latency or jitter, and others are not. Some run at high frequencies, some low; some are managed with only a few hundred lines of code, some take hundreds of thousands; some are written from scratch, while others are off-the-shelf; some functions are critical for overall system performance, others less so.

This variety of software further complicates development. How do designers guarantee that small, timing-critical functions will run on time when they have little insight into third-party code and no time to test thoroughly whether it makes proper use of the Real-Time Operating System (RTOS)? What guarantees that safety-critical functions will execute when interrupt-masking critical sections are scattered throughout the code?

Or, perhaps the system isn’t all that complicated. Say the design combines a handful of simple functions. The team isn’t happy about having to bite off a full-blown RTOS to get them to play together but doesn’t want to write a custom executive.

Rethinking the kernel
Again, the fido1100 has a set of features that can help design teams manage this complexity. The main 32-bit processor on the part includes an RTOS Kernel in a Chip.

The architecture has taken some of the core RTOS functionality and moved it into hardware, as shown in Figure 2. These functions include:

  • A set of five Hardware Contexts – the equivalent of the data stored by an RTOS during a thread switch (user registers, stack pointer, program counter, status register)
  • A preemptive priority-based scheduler to select one of the five contexts for execution
  • A rich set of timers and coordination primitives for implementing semaphores with hardware priority inheritance, signals, and so on
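To make the first two items concrete, here is a minimal software model in C of five contexts and a preemptive, priority-based selection among them. This is only a sketch of the idea: on the fido1100 this work is done in hardware, and the register count, the `ready` flag, the convention that a larger number means higher priority, and the tie-break rule are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HW_CONTEXT_COUNT 5 /* the article describes five Hardware Contexts */

/* State an RTOS would otherwise save and restore on a thread switch. */
typedef struct {
    uint32_t user_regs[16]; /* register count is an assumption */
    uint32_t stack_pointer;
    uint32_t program_counter;
    uint32_t status_register;
    uint8_t priority;       /* assumption: larger value = more urgent */
    bool ready;             /* assumption: context has work to do */
} hw_context_t;

/*
 * Returns the index of the ready context with the highest priority,
 * or -1 if no context is ready. Ties go to the lowest index.
 */
int select_context(const hw_context_t ctx[HW_CONTEXT_COUNT])
{
    assert(ctx != NULL);
    int best = -1;
    for (int i = 0; i < HW_CONTEXT_COUNT; i++) {
        if (!ctx[i].ready) {
            continue;
        }
        if (best < 0 || ctx[i].priority > ctx[best].priority) {
            best = i;
        }
    }
    return best;
}
```

Because each context keeps its own copy of the registers, switching to the selected context in this model requires no save or restore step, which is the point of holding the contexts in hardware.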

Figure 2: The architecture has taken some of the core RTOS functionality and moved it into hardware.

Other features of the part, such as the Memory Protection Unit (MPU) and interrupt management, are fully integrated into the hardware kernel, allowing interrupts to be directed to any of the five contexts. DMA transfers are paused, without any software intervention, when a high-priority context executes, and MPU settings are applied based on the current context or DMA channel. These features can be used in a number of ways depending on the nature of the system at hand.

Each Hardware Context can be put in a specific mode to allow it to operate as a full thread (like a virtual copy of the CPU), an optimized interrupt-handling context (faster interrupt handling, with no pushing/popping of registers, status, and program counter necessary), or a dedicated, single-purpose thread (no interrupts, operates in a loop). System software can then be organized to optimize performance and reliability, as in the following examples:

  • A handful of threads in Hardware Contexts with some very fast and deterministic interrupt handling in a dedicated context.
  • A complete multithreading RTOS in one Hardware Context, including timer interrupts, I/O, and other features, with the other Hardware Contexts used for a handful of safety-critical and/or highly deterministic operations.
  • An RTOS or stand-alone TCP/IP stack in one Hardware Context to perform general application/Ethernet processing, such as an HTTP server, with a dedicated stack installed in another context to handle high-priority, real-time Ethernet transmissions. With the programmability of the UIC, priority filtering can also be done at the MAC layer, supporting multiple transmit and receive queues (high- and low-priority queues).
  • A third-party application in a dedicated context, using hardware timers to guarantee that it doesn’t exceed its allocated execution time on a periodic basis or per activation. Using the MPU also guarantees that any bad behavior does not corrupt the data or stack of the other contexts.
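The execution-time guarantee in the last example can be pictured with a small budget model in C. This is a sketch under stated assumptions, not the fido1100 timer interface: the type `exec_budget_t`, the tick unit, and both function names are hypothetical, and on the real part the accounting would be done by hardware timers rather than by software calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-activation execution budget for one guarded context. */
typedef struct {
    uint32_t budget_ticks; /* ticks allowed per activation */
    uint32_t used_ticks;   /* ticks consumed so far, never above budget */
    bool overrun;          /* set once the budget has been exceeded */
} exec_budget_t;

/* Resets the accounting at the start of each activation or period. */
void budget_start_activation(exec_budget_t *b)
{
    assert(b != NULL);
    b->used_ticks = 0;
    b->overrun = false;
}

/*
 * Charges elapsed ticks against the budget. Returns true while the
 * context is within budget, false once it has exceeded it, at which
 * point a supervisor could suspend the offending context.
 */
bool budget_charge(exec_budget_t *b, uint32_t ticks)
{
    assert(b != NULL);
    uint32_t remaining = b->budget_ticks - b->used_ticks;
    if (ticks > remaining) {
        b->used_ticks = b->budget_ticks;
        b->overrun = true;
        return false;
    }
    b->used_ticks += ticks;
    return true;
}
```

Using exactly the full budget is allowed in this model; only the first tick beyond it flags an overrun.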

In one example, a quality, commercial embedded TCP/IP stack was used to perform both high- and low-priority Ethernet communications on a fido1100. However, because of the quantity of low-priority traffic, the jitter in the high-priority communications exceeded 1 ms. Using a second context made it possible to run separate high- and low-priority instantiations of the stack, and with UIC firmware modified to provide separate high-priority receive and transmit queues, jitter was reduced to less than 140 µs on a 66 MHz system. Figure 3 shows an example of how one fido can be configured to meet multiple system I/O needs.
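The article does not say how jitter was measured in this example. One common definition, assumed here, is the worst deviation of the interval between consecutive packets from the nominal period. The C sketch below computes that figure from a list of arrival timestamps; the function name and the microsecond unit are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the largest absolute deviation, in microseconds, between any
 * interval separating consecutive arrivals and the nominal period.
 * Returns 0 when fewer than two arrivals are given.
 */
uint32_t max_period_jitter_us(const uint32_t *arrival_us, size_t count,
                              uint32_t period_us)
{
    assert(arrival_us != NULL || count == 0);
    uint32_t worst = 0;
    for (size_t i = 1; i < count; i++) {
        /* Unsigned subtraction stays correct across one timer wrap. */
        uint32_t interval = arrival_us[i] - arrival_us[i - 1];
        uint32_t deviation = interval > period_us ? interval - period_us
                                                  : period_us - interval;
        if (deviation > worst) {
            worst = deviation;
        }
    }
    return worst;
}
```

For a nominal 1000 µs period and arrivals at 0, 1000, 2100, 3000, and 4000 µs, the intervals are 1000, 1100, 900, and 1000 µs, so the function returns 100.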

Figure 3: An example of how one fido can be configured to meet multiple system I/O needs.

Debugging still needed
Once the system is designed, it still must be made to work dependably. Another critical component is the debug environment. Beyond the standard issues of breakpoints and source-level debuggers, a complex SoC has additional features that place a burden on the integrator, such as the tremendous number of control and status registers for all the peripherals.

While any standard debugger will allow the user to view or modify memory-mapped registers as a bunch of hexadecimal values, this is a time-consuming and error-prone task. Innovasic, in partnership with CodeSourcery, provides a quality debug environment that includes contextualized browsing of all control and status registers on the part. It makes quick work of double-checking initialization software or even modifying values on-the-fly to tune an interface.
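To show why named fields are easier to work with than raw hexadecimal values, the C sketch below extracts and updates fields of a 32-bit register from a field description. The register layout and the field names `ENABLE` and `BAUD_DIV` are invented for illustration; they are not taken from the fido1100 register map or from the debug tools mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Describes one named bit field within a 32-bit register. */
typedef struct {
    const char *name;
    uint8_t lsb;   /* position of the field's least significant bit */
    uint8_t width; /* field width in bits, 1 to 32 */
} reg_field_t;

/* Hypothetical layout of a UART control register. */
static const reg_field_t UART_CTRL_FIELDS[] = {
    { "ENABLE",   0,  1 },
    { "BAUD_DIV", 8, 12 },
};

static uint32_t field_mask(const reg_field_t *f)
{
    assert(f != NULL && f->width >= 1 && f->lsb + f->width <= 32);
    /* Avoid shifting by 32, which is undefined in C. */
    return f->width == 32 ? 0xFFFFFFFFu : ((1u << f->width) - 1u);
}

/* Extracts the field's value from a raw register value. */
uint32_t reg_field_get(uint32_t reg, const reg_field_t *f)
{
    return (reg >> f->lsb) & field_mask(f);
}

/* Returns the register value with only the given field replaced. */
uint32_t reg_field_set(uint32_t reg, const reg_field_t *f, uint32_t value)
{
    uint32_t mask = field_mask(f);
    return (reg & ~(mask << f->lsb)) | ((value & mask) << f->lsb);
}
```

With this layout, a raw value of 0x000A3501 reads as ENABLE = 1 and BAUD_DIV = 0xA35, which is far quicker to check against an initialization routine than the hexadecimal word itself.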

Another problem exacerbated by short schedules and third-party or legacy code is trying to understand what is going on in the software. For this reason, the fido1100 includes hardware support for tracing code execution and data.

Consider the previous Ethernet example. Suppose that the system is set up and everything looks fine, except that about every two or three days, a delayed packet ruins the jitter numbers. Enabling the call trace feature on fido captures a good set of transactions; the remaining challenge is to have the software detect one of the delayed packets and disable trace at that point. Trace data is captured in system memory and downloaded as needed through the standard debug environment (JTAG, serial, or Ethernet), so experiments can be run with the processor in the full-up system without any dedicated cables and boxes to capture the trace. Viewing the good and bad traces side by side quickly shows what is causing the rogue jitter.
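The capture-until-trigger idea can be sketched in C as a circular buffer that keeps overwriting its oldest entries until software freezes it. This is a software model under stated assumptions: the buffer depth, the choice of a 32-bit value per entry, and the function names are hypothetical, and the fido1100 performs the actual capture with hardware support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 8 /* small depth chosen for illustration */

typedef struct {
    uint32_t entries[TRACE_DEPTH]; /* e.g. call target addresses */
    size_t next;                   /* slot the next entry is written to */
    size_t count;                  /* valid entries, at most TRACE_DEPTH */
    bool frozen;                   /* true once the trigger has fired */
} trace_buf_t;

/* Records one entry, overwriting the oldest when the buffer is full. */
void trace_record(trace_buf_t *t, uint32_t entry)
{
    assert(t != NULL);
    if (t->frozen) {
        return; /* preserve the history leading up to the trigger */
    }
    t->entries[t->next] = entry;
    t->next = (t->next + 1) % TRACE_DEPTH;
    if (t->count < TRACE_DEPTH) {
        t->count++;
    }
}

/* Called when software detects the event of interest, such as a late packet. */
void trace_freeze(trace_buf_t *t)
{
    assert(t != NULL);
    t->frozen = true;
}

/* Copies the captured entries into out, oldest first; returns how many. */
size_t trace_dump(const trace_buf_t *t, uint32_t out[TRACE_DEPTH])
{
    assert(t != NULL && out != NULL);
    size_t start = (t->next + TRACE_DEPTH - t->count) % TRACE_DEPTH;
    for (size_t i = 0; i < t->count; i++) {
        out[i] = t->entries[(start + i) % TRACE_DEPTH];
    }
    return t->count;
}
```

Freezing on the delayed packet means the buffer holds the events that led up to it, which can then be compared with a capture taken during normal operation.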

Architecture rethought
Rethinking the SoC architecture to provide programmable I/O, better handling of real-time tasks, and easier debugging creates a better device for designers. The fido1100 provides a flexible solution to system designers who need a specific or custom I/O set, deterministic real-time response, reliability, and rapid system development.

©MMIX Industrial Embedded Systems. An OpenSystems Media publication.
