Behind the Birth of M³

The Growth of the Multi-Core Processor

Figure: Demand for multi-core processor-based devices (Source: IHS iSuppli)


Enjoying rich multimedia applications on PCs and mobile devices has become part of everyday life. At the same time, 3D games and video players demand ever-higher video resolution and audio quality, so the need for CPUs that process data efficiently has never been greater. And in industries where competitive advantage is won (and lost) on processing speed, the demand for more CPU performance and efficiency has no ceiling.

To meet these demands, semiconductor manufacturers kept raising CPU clock frequencies through the first half of the 2000s. But as clock frequencies reached the 4 GHz range, rising power consumption and the costs that came with it became a serious problem, and the rate of performance growth seen in earlier years could no longer be sustained. To break through this limit, manufacturers turned to multi-core processors, which harness several cores to process data in parallel.
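A common first-order model of dynamic (switching) power in CMOS chips helps explain why raising the clock stopped paying off. It is a textbook approximation added here for illustration: it ignores leakage current and assumes that the supply voltage V must rise roughly in proportion to the clock frequency f, with α the switching activity factor and C the switched capacitance.

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f, \qquad V \propto f \;\Longrightarrow\; P_{\text{dyn}} \propto f^{3}
```

Under these assumptions, a core running at half the clock draws about (1/2)^3 = 1/8 of the power, so two such cores together draw about 1/4 of the original while offering the same nominal throughput, provided the work can actually be split between them.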

Today, multi-core processors are used not only in PCs and servers but also widely in smartphones, tablets, and other mobile devices. The chart above, based on research by IHS iSuppli, shows the annual growth in multi-core processor shipments in selected market segments. According to that research, shipments of multi-core processors are expected to grow by 40% per year from 2012 onward.

The Increasing Complexity of the Programming Environment

For hardware manufacturers, moving to multi-core processing was the logical next step for improving performance, but it brought no automatic benefit to software developers. To take full advantage of multiple cores, existing serial programs had to be rewritten as parallel programs, and even in the second half of the 2000s, software developers were slow to adopt parallel programming. Software written for a single core will not necessarily run faster on the latest CPU, because it still uses only one of the available cores. To use the full capabilities of the newest CPUs, software must be written to run on multiple cores in parallel.
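To make the serial-versus-parallel rewrite concrete, here is a minimal Python sketch, not taken from M³, that brightens a flat list of pixel values first with a single serial loop and then by partitioning the pixels into chunks handled by a pool of workers. The function names (`brighten`, `adjust_serial`, `split_chunks`, `adjust_parallel`) and the brightness operation are hypothetical. Note that in CPython, pure-Python threads are limited by the global interpreter lock, so this shows the restructuring a parallel rewrite requires rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(value: int, delta: int = 40) -> int:
    # Hypothetical per-pixel operation: add delta and clamp to the 8-bit range.
    return min(255, value + delta)


def adjust_serial(pixels: list[int]) -> list[int]:
    # Serial version: a single loop, the way single-core code is written.
    return [brighten(p) for p in pixels]


def split_chunks(items: list[int], n: int) -> list[list[int]]:
    # Divide the data into n contiguous, nearly equal chunks (some may be empty).
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def adjust_parallel(pixels: list[int], workers: int = 4) -> list[int]:
    # Parallel version: partition, process each chunk concurrently, reassemble in order.
    chunks = split_chunks(pixels, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(adjust_serial, chunks))  # map preserves chunk order
    return [p for chunk in results for p in chunk]


if __name__ == "__main__":
    image = [0, 100, 200, 250, 30, 60, 90, 120, 150, 180]
    print(adjust_parallel(image))  # [40, 140, 240, 255, 70, 100, 130, 160, 190, 220]
```

The parallel version leaves the per-pixel kernel unchanged and only adds partitioning and reassembly around it; in production code the same structure would typically be built on processes, native threads, or a parallel framework.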

Although “parallel computing” is often discussed as a single idea, what it involves varies widely with each layer of the hardware. The diagram below illustrates the layered structure of parallel processing. Suppose, for example, that we build a cluster from four workstations to speed up image processing.

Figure: The layered complexity of multi-core systems

  1. At the top layer, the system consists of several computers (nodes), and the tasks or data must be partitioned among them, so a mechanism for running the work in parallel across nodes is needed.
  2. Next, because each node has a processor with several cores, the work assigned to each node must in turn be partitioned across its cores.
  3. Within each core, vector functional units support SIMD (Single Instruction, Multiple Data), which applies one instruction to multiple data elements at once; using them requires parallelism at the level of individual operations (“operation parallelism”).
  4. Finally, when specialized processors such as GPUs and FPGAs are added to the system as accelerators, the code must also be optimized for each accelerator’s architecture.
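The first two layers of that hierarchy can be sketched in Python as nested partitioning: the data is split across the four workstations from the example, and each workstation's share is split again across its cores. This is a single-process simulation under stated assumptions, not M³ code: the node and core counts, the `core_kernel` sum-of-squares computation, and all function names are hypothetical, and the SIMD and accelerator layers are not modeled because they depend on compiler and hardware support.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = 4           # hypothetical: the four workstations in the cluster
CORES_PER_NODE = 4  # hypothetical: cores per workstation


def split(items, n):
    # Contiguous, nearly equal partition of items into n parts.
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def core_kernel(values):
    # Layer 3 placeholder: the work done by one core. A real implementation
    # would vectorize this loop (SIMD); here it is a plain Python loop.
    return sum(v * v for v in values)


def run_node(node_data):
    # Layer 2: split this node's share across its cores, then combine partial results.
    with ThreadPoolExecutor(max_workers=CORES_PER_NODE) as cores:
        return sum(cores.map(core_kernel, split(node_data, CORES_PER_NODE)))


def run_cluster(data):
    # Layer 1: split the whole job across nodes. In a real cluster this step
    # would send data over the network (e.g. with MPI); here it is a local loop.
    return sum(run_node(part) for part in split(data, NODES))


if __name__ == "__main__":
    pixels = list(range(1000))
    print(run_cluster(pixels) == sum(p * p for p in pixels))  # True
```

Even in this toy version, each layer adds its own partition-and-combine step, and a real system would use a different programming model at each one.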

To realize the system’s full potential, then, developers must take all of these layers into account. This is the biggest problem software developers face: each level has its own parallel programming model. For example, the techniques used to parallelize work across nodes do little to help with parallelism inside a node. Frameworks and languages exist for each layer to simplify development, but learning each of them and then building software that combines them is a massive undertaking. The code grows increasingly complex, and the number of cases to test multiplies.

Using specialized processors as accelerators raises a whole new set of issues. Such processors usually have to be programmed in their own specific languages (CUDA for NVIDIA GPUs, for example), and software written in those languages will not run on other processors. As a result, if the next generation of a product requires a hardware architecture different from the current one, the software may have to be rewritten from scratch. That can delay the product’s market launch, increase costs, and ultimately erode its competitiveness.
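One widely used way to limit that rewrite cost is to keep application code behind a small, hardware-neutral interface and confine architecture-specific code to interchangeable backends. The Python sketch below illustrates that general pattern only; it is not M³'s interface. `Backend`, `SerialBackend`, `ThreadedBackend`, and `invert_image` are hypothetical names, and a real accelerator backend (for example, one wrapping CUDA code) would sit behind the same `map` method.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Protocol, Sequence


class Backend(Protocol):
    # Hardware-neutral interface: apply fn to every element, preserving order.
    def map(self, fn: Callable[[int], int], data: Sequence[int]) -> list[int]: ...


class SerialBackend:
    # Single-core backend: a plain loop.
    def map(self, fn: Callable[[int], int], data: Sequence[int]) -> list[int]:
        return [fn(x) for x in data]


class ThreadedBackend:
    # Multi-core-style backend: hands elements to a thread pool.
    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map(self, fn: Callable[[int], int], data: Sequence[int]) -> list[int]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, data))


def invert_image(pixels: Sequence[int], backend: Backend) -> list[int]:
    # Application code: written once against the interface, not a specific backend.
    return backend.map(lambda p: 255 - p, pixels)


if __name__ == "__main__":
    image = [0, 64, 128, 255]
    for backend in (SerialBackend(), ThreadedBackend()):
        print(type(backend).__name__, invert_image(image, backend))  # [255, 191, 127, 0]
```

Porting to a new architecture then means writing one more backend class rather than rewriting application functions such as `invert_image`.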

The Future with M³

M³ is designed to eliminate these issues by hiding the complexity of the parallel programming environment behind a user-friendly interface. Software developers can drastically reduce the cost of developing for parallel processing and optimizing for specific hardware, and focus on writing good code and creating new applications.

Along with the parallel programming framework, M³ also offers libraries for image processing, arithmetic operations, and utility functions, optimized for different architectures. With these libraries, developers can efficiently build applications for a variety of platforms.
