M3 for Batch Processing
Expanding the scope of Asakusa Framework™ for further speedup and lower TCO
M3 for Batch Processing (hereinafter referred to as M3 for BP) is a framework for high-speed processing of tasks expressed as a directed acyclic graph (DAG). It is provided as an execution platform for "Asakusa Framework™ (*1)", which is used to efficiently develop and operate batch processing for business systems.
It is optimized to maximize execution performance in multi-core/multiprocessor environments on a single node, delivering even faster business batch processing and an even lower TCO.
(*1) Asakusa Framework™ is an open source framework for running large-scale core batch processing on a parallel distributed processing infrastructure. Asakusa Framework™ supports Hadoop® MapReduce, Spark™ and M3 for BP as its parallel distributed processing engines.
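The idea of running tasks expressed as a DAG can be illustrated with a minimal Python sketch. This is not the M3 for BP API: the function `run_dag` and the task names are hypothetical, and the standard-library thread pool is used only to show the scheduling order, in which a task starts as soon as all of its predecessors have finished and independent tasks may run side by side.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run each task once all of its predecessors have finished.

    tasks: task name -> callable that receives a dict of predecessor results
    deps:  task name -> set of predecessor task names
    (Both structures are hypothetical, chosen for this illustration only.)
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    pending = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors are all done.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, set())}
                pending[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task completes.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                name = pending.pop(future)
                results[name] = future.result()
                sorter.done(name)  # may unlock successor tasks
    return results


# A tiny batch: two independent inputs, a join, and a summary.
tasks = {
    "sales": lambda _: [("apple", 3), ("pear", 2), ("apple", 1)],
    "prices": lambda _: {"apple": 100, "pear": 150},
    "join": lambda r: [(item, qty * r["prices"][item]) for item, qty in r["sales"]],
    "total": lambda r: sum(amount for _, amount in r["join"]),
}
deps = {"join": {"sales", "prices"}, "total": {"join"}}

results = run_dag(tasks, deps)
print(results["total"])  # 700
```

Here "sales" and "prices" have no dependencies and can run concurrently, while "join" and "total" wait for their inputs. Python threads do not speed up CPU-bound work, so the sketch shows only the dependency-driven ordering, not the multi-core performance the engine is optimized for.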
Some batch jobs in business systems run for several hours to several tens of hours on host computers, mainframes, and databases. Until a few years ago, the only way to shorten this processing time and keep up with demands for higher throughput was an expensive scale-up solution. As a more cost-effective scale-out alternative, Nautilus Technologies Inc. developed Asakusa Framework, a framework that leverages the distributed processing infrastructures Hadoop® MapReduce and Spark™.
Asakusa Framework has shortened batch processing that used to take tens of hours to several tens of minutes. However, MapReduce targets data in the TB to PB range and Spark targets large data of hundreds of GB to TB; small to medium data (several GB to tens of GB) is outside their target range.
Most business data, such as the data of core systems, falls into this small to medium range. Even when Asakusa Framework was applied to such batch processing, the overhead of distributed processing became relatively large and limited the achievable speedup. In addition, MapReduce and Spark require multiple nodes, and the resulting operational difficulty was an obstacle to adoption.
A recent two-way server can hold more than 80 CPU cores and terabyte-class memory on a single node. Data that previously could not be processed without distributing it across multiple nodes can now be processed on a single node. To solve the problems described above, Fixstars and Nautilus Technologies jointly developed M3 for BP.
Fixstars specializes in "heterogeneous computing", that is, programs that run across heterogeneous platforms, and has been developing software for Cell Broadband Engine™ and GPUs. For OpenCL, which was created for heterogeneous computing, Fixstars has provided compiler products and application development services from an early stage, and has also held programming seminars and written books for software developers.
Asakusa on M3BP, the cost-effective framework
Asakusa on M3BP provides a set of functions for using M3 for BP as the execution platform for batch applications created with the Asakusa Framework development environment, including Asakusa DSL. With M3 for BP as the execution platform, business batches written with Asakusa Framework for Apache Hadoop and Spark can be processed at high speed on a single node. For small to medium-sized data in particular, it achieved higher performance than Spark even though it ran on a single node.
A comparison of batch processing times using actual business data showed that a job that took several hours on a relational database took less than 40 minutes with Asakusa Framework on MapReduce, less than 4 minutes on Spark, and less than 2 minutes on M3 for BP. In this real application, Asakusa on M3BP thus achieved about twice the performance of Asakusa on Spark with one fifth the number of nodes, which amounts to about 10 times the cost-effectiveness.
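The cost-effectiveness figure follows from simple arithmetic, sketched below. The sketch assumes that cost is proportional to the number of nodes, which the text does not state explicitly, and it uses the "less than 4 minutes" and "less than 2 minutes" bounds as stand-ins for the actual run times. The function name is hypothetical.

```python
from fractions import Fraction


def cost_effectiveness_ratio(speedup, node_fraction):
    """Performance per node relative to the baseline.

    speedup:       how many times faster than the baseline
    node_fraction: nodes used, as a fraction of the baseline node count
    Assumes cost scales linearly with node count (an assumption of this sketch).
    """
    return Fraction(speedup) / Fraction(node_fraction)


# Upper bounds quoted in the text, used here as approximate run times.
spark_minutes = 4
m3bp_minutes = 2

speedup = Fraction(spark_minutes, m3bp_minutes)  # about 2x
ratio = cost_effectiveness_ratio(speedup, Fraction(1, 5))
print(ratio)  # 10
```

Doubling the performance while using one fifth of the nodes gives 2 ÷ (1/5) = 10 times the performance per node.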