Rethinking SoC architecture around I/O, real-time tasks, and debugging
System-on-Chip (SoC) devices offer a high level of integration, but the search for the right device often hinges on a single missing feature or performance attribute. Rethinking SoC architecture has produced a device that combines I/O programmability, optimized real-time performance, and easier debugging.
Building a custom ASIC is a huge undertaking. Even with purchased IP, integrating it, meeting time constraints, and adding all the features the available IP does not cover are substantial tasks, not to mention getting the part itself produced. An FPGA solution presents many of the same difficulties. Production is less of a problem, but component cost is generally higher. In addition to developing code for an ASIC or FPGA, designers have to come up with a toolchain, software libraries, and the other pieces that complete the development environment.
Alternatively, designers can purchase an off-the-shelf part already in production. Off-the-shelf solutions pose other issues, however, including finding a part that has the required feature set and that will remain in production long enough to support the final product's expected lifetime. Such parts also tend either to lack some of the required features, forcing the addition of peripheral ICs or FPGAs, or to carry large blocks the design never uses, adding cost, complexity, and power consumption.
For many years, parts have been available with some kind of programmable peripheral block, such as complex timer units or communications engines. These parts provide a solution when that one feature is needed but often fall short when multiple interfaces of various types are needed.
Rethinking I/O
The flexible input deterministic output (fido) 1100 architecture extends the model for an SoC, combining four programmable I/O modules known as Universal I/O Controllers (UICs) with a powerful 32-bit microprocessor and other basic blocks, including timers and A/D functions. Each I/O module has 18 bidirectional pins, a dedicated RISC engine optimized for I/O management that can time-share among up to four independent threads of execution, and blocks for SERDES, CRC generation, and similar functions.
A single UIC can support 100 Mb Ethernet; two UARTs; CAN; one UART and SPI; SPI and a custom timer function; or other combinations, as shown in Figure 1. The functionality in a particular system is determined by loading a specific set of UIC firmware. The I/O modules interface to the main processor through the peripheral management unit, which lets the user allocate buffer space to each UIC as appropriate for the protocols instantiated on it.
Figure 1
This approach allows the system designer to select the specific set of UIC firmware that tailors the part to the system requirements. Specialized functionality can be programmed into custom UIC firmware to eliminate the need for external logic, reduce the load on the main processor, or both. Designers can now have a customized solution without waiting for silicon or developing and integrating a design in an FPGA.
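As a rough illustration of the planning step this implies, the sketch below describes one firmware choice per UIC and checks that the plan respects the 18-pin limit and fits a shared buffer pool. It is a host-side model only: the type and function names, firmware labels, pin counts, and buffer sizes are all hypothetical and are not taken from the fido1100 tools or documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical planning model; none of these names come from the fido1100 API. */
enum { NUM_UICS = 4, UIC_PINS = 18 };

typedef struct {
    const char *firmware;  /* label for the firmware image chosen for this UIC */
    uint8_t     pins_used; /* pins that firmware claims (illustrative values) */
    uint32_t    buf_bytes; /* buffer space requested for this UIC */
} uic_config;

/* Example plan: one firmware choice per UIC. Pin counts and sizes are made up. */
static const uic_config example_plan[NUM_UICS] = {
    { "ethernet_100mb", 18, 4096 },
    { "uart_x2",         4,  512 },
    { "can",             2,  256 },
    { "uart_spi",        6,  512 },
};

/* Returns 0 if the plan fits, -1 if a UIC claims more than 18 pins,
 * -2 if the requested buffers exceed the pool. */
int uic_plan_check(const uic_config cfg[NUM_UICS], uint32_t pool_bytes)
{
    uint32_t total = 0;
    for (size_t i = 0; i < NUM_UICS; i++) {
        if (cfg[i].pins_used > UIC_PINS)
            return -1;
        /* total never exceeds pool_bytes here, so the subtraction cannot wrap */
        if (cfg[i].buf_bytes > pool_bytes - total)
            return -2;
        total += cfg[i].buf_bytes;
    }
    return 0;
}
```

Calling `uic_plan_check(example_plan, 8192)` accepts the example plan, while a 4096-byte pool is rejected because the Ethernet buffers alone consume it. A check like this is cheap to run at build time, before any firmware is loaded.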
Embedded software complexity
A typical system performs multiple functions. Some of these functions are tightly coupled, and some have nothing to do with the others. Some are time critical, affecting latency or jitter, and others are not. Some run at high frequencies, some at low; some are managed with only a few hundred lines of code, while others take hundreds of thousands; some are written from scratch, while others are off-the-shelf; some are critical for overall system performance, others less so.
This multitude of software classes further complicates development. How do designers guarantee that small, timing-critical functions will occur on time when they have little understanding of third-party code and no time to test thoroughly whether it takes full advantage of the Real-Time Operating System (RTOS) in use? What guarantees that safety-critical functions will execute when interrupt-masking critical sections are scattered throughout the code?
Or perhaps the system isn't all that complicated. Say the design combines a handful of simple functions. The team isn't happy about taking on a full-blown RTOS to get them to play together but doesn't want to write a custom executive.
Rethinking the kernel
The architecture takes some of the core RTOS functionality and moves it into hardware, as shown in Figure 2. These functions include:
Figure 2
Other features of the part, such as the Memory Protection Unit (MPU) and interrupt management, are fully integrated into the hardware kernel, allowing interrupts to be directed to any of the five contexts. DMA transfers are paused, without any software intervention, when a high-priority context executes, and MPU settings are applied based on the current context or DMA channel.
These features can be used in a number of ways depending on the nature of the system at hand. Each hardware context can be put in a specific mode so that it operates as a full thread (like a virtual copy of the CPU), as an optimized interrupt-handling context (faster interrupt handling, with no pushing or popping of registers, status, and program counter), or as a dedicated, single-purpose thread (no interrupts, operating in a loop). System software can now be organized around these features to optimize performance and reliability.
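As one illustration of organizing software around the context modes described above, the sketch below models the five contexts as a configuration table, refuses to route interrupts to a dedicated-loop context, and picks the highest-priority ready context. It is a host-side model under stated assumptions: the names, the roles, and the numeric priority scheme (larger value wins) are hypothetical, and on the real part this selection happens in hardware rather than in a C function.

```c
/* Hypothetical model of five hardware contexts; not the fido1100 register map. */
enum { NUM_CONTEXTS = 5 };

typedef enum {
    CTX_FULL_THREAD,    /* behaves like a virtual copy of the CPU */
    CTX_FAST_IRQ,       /* interrupt handling without register push/pop */
    CTX_DEDICATED_LOOP  /* single-purpose loop that takes no interrupts */
} ctx_mode;

typedef struct {
    ctx_mode    mode;
    int         priority; /* assumption: larger value means higher priority */
    const char *role;     /* illustrative label only */
} ctx_config;

static const ctx_config example_contexts[NUM_CONTEXTS] = {
    { CTX_FULL_THREAD,    0, "low-priority stack"    },
    { CTX_FULL_THREAD,    1, "application"           },
    { CTX_FAST_IRQ,       3, "high-priority traffic" },
    { CTX_FAST_IRQ,       2, "timer service"         },
    { CTX_DEDICATED_LOOP, 4, "control loop"          },
};

/* Returns 1 if an interrupt may be directed to context ctx, else 0. */
int irq_route_ok(const ctx_config plan[NUM_CONTEXTS], int ctx)
{
    if (ctx < 0 || ctx >= NUM_CONTEXTS)
        return 0;
    return plan[ctx].mode != CTX_DEDICATED_LOOP;
}

/* Returns the index of the highest-priority context whose bit is set in
 * ready_mask, or -1 if no context is ready. */
int pick_context(const ctx_config plan[NUM_CONTEXTS], unsigned ready_mask)
{
    int best = -1;
    for (int i = 0; i < NUM_CONTEXTS; i++) {
        if (!(ready_mask & (1u << i)))
            continue;
        if (best < 0 || plan[i].priority > plan[best].priority)
            best = i;
    }
    return best;
}
```

With this table, a ready mask of `0x05` (contexts 0 and 2) selects context 2, so the high-priority traffic handler runs ahead of the low-priority stack.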
In one example, a high-quality commercial embedded TCP/IP stack was used to perform both high- and low-priority Ethernet communications on a fido1100. Because of the quantity of low-priority traffic, however, the jitter in the high-priority communications exceeded 1 ms. Using another context to run separate high- and low-priority instantiations of the stack, together with modified UIC firmware that provided separate high-priority receive and transmit queues, reduced jitter to less than 140 µs on a 66 MHz system. Figure 3 shows an example of how one fido1100 can be configured to meet multiple system I/O needs.
Figure 3
Debugging still needed
Another problem exacerbated by short schedules and third-party or legacy code is understanding what is going on in the software. For this reason, the fido1100 includes hardware support for tracing code execution and data.
Consider the previous Ethernet example. Suppose that the system is set up and everything looks fine, except that about every two or three days a delayed packet ruins the jitter numbers. Enabling the call trace feature on the fido1100 captures a good set of transactions; the remaining task is to wait until the software detects one of the delayed packets and then disable trace. Trace data is captured in system memory and downloaded as needed through the standard debug environment (JTAG, serial, or Ethernet), so experiments can be run with the processor in the fully assembled system without dedicated cables and boxes to capture the trace. Viewing the good and bad traces side by side quickly shows what is causing the rogue jitter.
Architecture rethought






