A New Standard for Multiprocessing DSP Systems

Today's applications for computer technology are more demanding than ever before, and nowhere is this more evident than in the world of digital signal processors (DSPs). DSP chip manufacturers and board vendors are responding to this need with products that inherently support scalable multiprocessing system architectures. These systems are powerful enough to address the most demanding DSP applications of today and provide a clear upgrade path for the future.

The most common DSP applications involve a completely embedded system that is dedicated to performing a particular task in real time. As with all computer applications, speed is of the essence for both processing and data I/O. In addition, constraints on available space and power may further complicate the picture. Requirements such as these present a significant challenge that DSP chip designers and board vendors must face head on.

Multiprocessing has emerged as the only viable way of addressing a vast assortment of high-end DSP applications. Application areas that require multiprocessing performance include medical systems, 3D graphics acceleration, military systems, industrial control, high-end audio, telephony, and wireless communications infrastructure. Designers of these systems judge DSP performance by three metrics: processing per dollar, processing per watt, and processing per unit area.

The real advantage of multiprocessing systems is the ability to tune the performance and cost of a system to yield the required functionality and processing performance. This feature of multiprocessing system architectures is known as scalability. A scalable architecture allows users to tune performance based on the number of processing nodes required.

In fact, the ability of a multiprocessing system to scale is not just convenient, it's required; and the ability to design scalable multiprocessing systems starts at the chip level. The new ADSP-21160 SHARC processor, with its large internal memory blocks, multiple internal bus structure, and integrated I/O subsystem, possesses all of the features necessary to build multiprocessing systems that provide true scalability to any number of processors. Like its predecessor (the ADSP-21060 SHARC), the ADSP-21160 is setting the pace for high-performance multiprocessing floating-point DSP systems.

Defining Multiprocessing Systems

Processor technology has progressed at such a fast rate over the past decade that most of us cannot even remember how impressed we were with the power of our new 80286 PCs in the late 1980s. But while chip manufacturers have awed us with their huge strides in processor technology, they have also exposed the performance limitations inherent in single-processor systems. Thus it is not surprising that high-performance system designers have started using aggregates of processors to build more powerful (multiprocessing) computing systems.

Figure 1. Multiprocessing Systems: (top) shared memory, (left) distributed memory, and (right) shared & distributed memory

This trend has become very apparent in the embedded DSP industry. As DSP applications become more and more demanding, board-level suppliers are responding with PCI and VME system components that squeeze larger numbers of processors into smaller spaces. Packaging technology has played a role here, as DSP chip-level manufacturers developed smaller package sizes that are relatively easy to cool.

However, a system of multiple processors cannot be considered truly multiprocessing based solely on the fact that more than one processor is used. The term multiprocessing implies that the processors in the system are able to work together, in an efficient manner, to perform the required calculations. This means that the exchange of data between processors is critical, and an effective multiprocessing DSP must possess a means for achieving this data transfer.

The SHARC processor family has answered this challenge with an internal I/O processor (or DMA engine) that allows data communication to occur without impeding the progress of the processing core. As a result, every time a SHARC processor is added to a multiprocessing network, both processing horsepower and data communication bandwidth are increased. This feature of the SHARC family, together with its unique link-port architecture, is one of the most important ingredients in its ability to support multiprocessing system design.

The new ADSP-21160 increases the number of DMA channels from the 10 available in the first-generation ADSP-21060 SHARC to 14. This provides a separate, independent DMA channel for each transmit and receive buffer of the 2 serial ports (4 channels), each of the 6 bidirectional link port buffers, and each of the 4 bidirectional external port buffers. With these enhanced DMA capabilities, the ADSP-21160 has the flexibility to support a variety of scalable multiprocessing system architectures.
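The channel count above can be checked with a short tally. This is only an arithmetic sketch in Python: the dictionary keys are illustrative labels chosen for this example, not register names or identifiers from any Analog Devices toolchain.

```python
# Tally of ADSP-21160 DMA channels as described in the text.
# Keys are illustrative labels (assumptions), not hardware or API names.
DMA_CHANNELS = {
    "serial_port_tx_rx": 2 * 2,   # 2 serial ports, one transmit and one receive channel each
    "link_port_buffers": 6,       # 6 bidirectional link port buffers
    "external_port_buffers": 4,   # 4 bidirectional external port buffers
}


def total_dma_channels(channels=DMA_CHANNELS):
    """Return the total number of DMA channels across all groups."""
    return sum(channels.values())


if __name__ == "__main__":
    print(total_dma_channels())  # 14
```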

The Link Port Architecture

Multiprocessing system architectures come in two basic flavors: shared memory and distributed memory. The ADSP-21160 SHARC possesses built-in features that allow it to gluelessly support both of these architectures, as well as architectural hybrids. The key lies in the ADSP-21160's unique link-port architecture.

In shared memory systems, every processor has access to a global memory block (made up of internal and external memory) with processors exchanging data via a shared bus. This approach is reminiscent of traditional single-processor programming since all of the data is located in a single memory block. However, the shared memory architecture lacks the inherent ability to scale, since the addition of each new device on the bus decreases the average bus bandwidth available to each processor.

The SHARC family of processors gets around this issue through the use of dedicated data communication ports known as link ports. Link ports provide high-bandwidth, point-to-point connections between processors for the sole purpose of inter-processor communication. This allows the ADSP-21160 to support a distributed memory architecture in which all inter-processor communication takes place over the links, leaving the full bandwidth of the data bus for servicing external memory and I/O peripherals. Distributed memory architectures are truly scalable, and they allow users to configure very large scale multiprocessing networks using a natural mesh-like architecture.
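The scaling argument can be illustrated with a deliberately simple model. The Python sketch below is not a performance model of any real SHARC system: the bandwidth figures are made-up units, the function names are hypothetical, and the link calculation assumes every link port is wired to a neighbor. It only shows the trend the text describes: the per-processor share of one shared bus shrinks as processors are added, while aggregate point-to-point bandwidth grows.

```python
def shared_bus_share(bus_bandwidth, num_processors):
    """Average bandwidth each processor gets on a single shared bus."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return bus_bandwidth / num_processors


def aggregate_link_bandwidth(link_bandwidth, links_per_processor, num_processors):
    """Total point-to-point bandwidth, assuming every link port is connected.

    Each physical link joins two processors, so it is counted once.
    """
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return link_bandwidth * links_per_processor * num_processors / 2


if __name__ == "__main__":
    # Hypothetical numbers in arbitrary units, chosen only for illustration.
    for n in (2, 4, 8):
        share = shared_bus_share(120.0, n)
        links = aggregate_link_bandwidth(10.0, 6, n)
        print(f"{n} processors: bus share {share}, aggregate links {links}")
```

In this toy model the shared-bus share halves each time the processor count doubles, while the aggregate link bandwidth doubles.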

One of the key strengths of the link-port architecture, however, is that system designers are not forced to choose between shared and distributed memory. Architectural hybrids combining these two philosophies are easy to construct, allowing users to glove-fit their system to their application.

The ADSP-21160's ability to support these multiple system architectures is another key aspect of system scaling. System designers are provided the freedom to easily tune their system's form and functionality as well as processing performance.

A Balanced Approach

It is well known that the most serious problem facing multiprocessing DSP system and chip-level designers is data flow. For a DSP to even approach its peak computational performance, it must be fed with a constant stream of data. This means that a multiprocessing system's ability to route data among the various nodes in the system is just as important as its ability to process the data.

Early multiprocessing system architectures suffered from the malady of having high theoretical MFLOPS numbers but very few usable MFLOPS. This came about as a result of attaching rather inefficient communication engines to very high-performance RISC-style processors that were not designed to be used in multiprocessing systems. The result was a sub-linear scaling characteristic in which system performance increased only slightly as processors were added to the system.

The SHARC processor family, on the other hand, places its I/O subsystem and processing core on equal footing, creating a balance between processing and data routing efficiency. This balance allows the DSP application to supply the ADSP-21160's high-speed SIMD core with a constant stream of data, resulting in nearly linear scaling over a wide range of system sizes.
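The difference between sub-linear and nearly linear scaling can be sketched with a toy timing model. Everything here is an assumption made for illustration: the work is normalized to 1.0 on one processor, each processor in a multiprocessor system pays a fixed communication cost `comm`, and the two functions (hypothetical names) differ only in whether that cost stalls the core or is overlapped with computation, as a DMA-driven I/O processor would allow.

```python
def speedup_blocking(n, comm):
    """Speedup when communication stalls the core (costs add up)."""
    if n == 1:
        return 1.0  # a single processor has nobody to talk to
    return 1.0 / (1.0 / n + comm)


def speedup_overlapped(n, comm):
    """Speedup when communication runs in parallel with computation."""
    if n == 1:
        return 1.0
    # The step takes as long as the slower of compute and communication.
    return 1.0 / max(1.0 / n, comm)


if __name__ == "__main__":
    comm = 0.05  # hypothetical communication cost, as a fraction of the total work
    for n in (1, 2, 4, 8):
        blocking = round(speedup_blocking(n, comm), 2)
        overlapped = round(speedup_overlapped(n, comm), 2)
        print(f"{n} processors: blocking {blocking}, overlapped {overlapped}")
```

In this model the overlapped case scales linearly until the per-processor compute time drops below the communication cost, while the blocking case falls behind as soon as a second processor is added.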

Of course, taking advantage of this balance in an actual application is a software development task as well. Third-party board-level and software vendors supply software development tools for the SHARC family of processors to simplify this effort.

Today, a variety of native and portable programming tools are available, including SHARC-specific run-time environments and industry-standard real-time operating systems. These products not only simplify the task of targeting an application at a multiprocessing network, they also help programmers take full advantage of the SHARC's balanced hardware design and squeeze the maximum performance out of their embedded SHARC systems.

COTS and the Multiprocessing DSP

Multiprocessing digital signal processing systems have become commonplace in a wide variety of military and commercial applications, including RADAR, SONAR, industrial control, image processing, and telecommunications. As the need for higher-speed and more compact systems arises, multiprocessing DSP systems will become even more widespread.

One of the fastest-growing opportunities for the ADSP-21160 processor is the commercial off-the-shelf (COTS) board-level vendor market. The use of COTS products, rather than custom board developments, has become a mandate for both military and commercial users. Fueling this trend is a need for lower product development costs and a faster time to market.

The SHARC family, with its unique multiprocessing architecture, lends itself very well to the development of modular system components that can be used together to build high-end multiprocessing systems with essentially any performance and functionality characteristics. This flexibility is the key feature required by COTS customers.

Potential application areas for the 21160 appear boundless, with opportunities in many different markets. Regardless of the application, however, it is clear that the multiprocessing movement is here to stay in the embedded DSP marketplace. The ADSP-21160 SHARC from Analog Devices has secured a position to lead this processing revolution into the next millennium, truly setting a new standard for multiprocessing digital signal processing.

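Q: What new demands do today's computer technology applications place on digital signal processors (DSPs)?
A: Applications are more demanding than ever. In the DSP world this shows up as a need for products that support scalable multiprocessing system architectures, that meet speed requirements for both processing and data I/O, and that work within constraints on available space and power.
Q: What advantages does multiprocessing offer in DSP applications?
A: It can meet the needs of many high-end DSP applications, including medical, 3D graphics acceleration, military, industrial control, high-end audio, telephony, and wireless communications infrastructure. Performance and cost can be tuned to the functionality and processing performance a system requires, which makes the approach scalable.
Q: What are the features of the new ADSP-21160 SHARC processor?
A: It has large internal memory blocks, a multiple internal bus structure, and an integrated I/O subsystem, so it can be used to build multiprocessing systems that provide true scalability. It also increases the number of DMA channels from the 10 of the first-generation ADSP-21060 SHARC to 14.
Q: What are the two basic multiprocessing system architectures?
A: Shared memory and distributed memory.
Q: What is the drawback of the shared memory architecture?
A: It lacks the inherent ability to scale: each new device added to the bus decreases the average bus bandwidth available to each processor.
Q: What is the advantage of the distributed memory architecture?
A: It is truly scalable and allows users to configure very large-scale multiprocessing networks using a natural mesh-like architecture.
Q: What are the advantages of the ADSP-21160 SHARC's link-port architecture?
A: It gluelessly supports shared memory, distributed memory, and hybrid architectures. System designers are not forced to choose between shared and distributed memory and can easily build hybrids that fit their application.
Q: What is the most serious problem facing multiprocessing DSP system and chip-level designers?
A: Data flow. A multiprocessing system's ability to route data among its nodes is as important as its ability to process the data. Early multiprocessing architectures had high theoretical MFLOPS numbers but very few usable MFLOPS.
Q: How does the SHARC processor family address the data flow problem?
A: It places the I/O subsystem and the processing core on equal footing, balancing processing and data routing efficiency. This lets the DSP application supply the 21160's high-speed SIMD core with a constant stream of data, resulting in nearly linear scaling.
Q: What is the opportunity for the ADSP-21160 in the commercial off-the-shelf (COTS) board-level vendor market?
A: It is one of the processor's fastest-growing opportunities. Driven by the need for lower product development costs and a faster time to market, COTS products have become a mandate for both military and commercial users. The SHARC family's unique multiprocessing architecture lends itself well to modular system components, and the potential application areas are broad.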
