Performance Engineering Platform

Fixstars AIBooster

Install it on your GPU servers and let it gather runtime data from your AI workloads. It identifies bottlenecks, automatically accelerates performance, and provides the detailed insights you need for further manual optimization.

What is Fixstars AIBooster?

Simply install Fixstars AIBooster on your GPU servers to gather detailed performance data from active AI workloads and clearly visualize bottlenecks.

Performance Observability
  • Monitors and visualizes the performance of AI training and inference.
  • Identifies bottlenecks and performance issues (a sketch of the kind of per-GPU data involved follows this list).
Performance Intelligence
  • Provides a suite of tools for automatic acceleration based on collected performance observation data.
  • Based on data provided by Performance Observability, users can manually accelerate their AI workloads for further performance improvements.
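
How this data is collected on AIBooster's side is not detailed on this page. As a rough illustration of the kind of per-GPU runtime data a monitoring agent samples, here is a minimal sketch using NVIDIA's NVML bindings (the pynvml module); the fields and one-second interval are illustrative assumptions, not AIBooster internals.

    # Minimal per-GPU sampling sketch via NVML (pip install nvidia-ml-py).
    # Illustrative only; this is not AIBooster's implementation.
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU on this node

    for _ in range(5):  # a few samples for illustration
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        power = pynvml.nvmlDeviceGetPowerUsage(handle)       # milliwatts
        print(f"sm={util.gpu}% mem_io={util.memory}% "
              f"vram={mem.used / 2**30:.1f}GiB power={power / 1000:.0f}W")
        time.sleep(1)

    pynvml.nvmlShutdown()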

Processing speed: up to 5.0x faster (based on our actual projects)
GPU costs: up to 80% savings (based on our actual projects)

Workflow

1. Data Analysis
  • Calculates training efficiency to identify the potential for acceleration (one common metric is sketched below).
  • Identifies the areas that need acceleration from the collected performance data.

2. Acceleration
  • Provides a suite of tools for automatic acceleration based on the performance analysis.
  • Offers the documentation needed to help users achieve further manual acceleration.
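
The exact efficiency metrics AIBooster computes are not specified on this page. One widely used measure of training efficiency is model FLOPs utilization (MFU): the ratio of the FLOPs a training run actually achieves to the hardware's peak. A minimal sketch, with hypothetical numbers for a 70B-parameter run on eight GPUs:

    # Model FLOPs utilization (MFU), a common training-efficiency metric.
    # All numbers below are hypothetical; AIBooster's own metrics may differ.

    def mfu(n_params: float, tokens_per_sec: float, peak_flops_total: float) -> float:
        flops_per_token = 6 * n_params  # standard estimate for dense transformer training
        return flops_per_token * tokens_per_sec / peak_flops_total

    # Hypothetical: 70B model, 7,500 tokens/s total, 8 GPUs at ~989 TFLOPS peak each.
    print(f"MFU = {mfu(70e9, 7_500, 8 * 989e12):.1%}")  # ~39.8%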

Performance Engineering Services (Contact us for details)

Fixstars acceleration experts will improve your performance based on AIBooster analysis data, tailored to your environment and requirements.

Examples of Training Acceleration
  • Hyperparameter Tuning (Learn more)
  • Model Compression
  • Applying appropriate parallelization methods for AI models
  • Optimizing communication library parameters
  • Improving memory bandwidth efficiency through re-computation
Examples of Inference Acceleration
  • Fully Automated Inference Acceleration (Learn more)
  • Automatic Mixed Precision Quantization
Hyperparameter Tuning Tool (ZenithTune)

We provide the ZenithTune library, which helps you achieve peak performance with minimal coding, unlocking your application's full potential.

Learn more about ZenithTune
(Figure: Optimization history plot)
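
ZenithTune's own API is not shown on this page, so the sketch below uses the open-source Optuna library as a stand-in to illustrate the general shape of automated hyperparameter tuning: define an objective, let the tuner search, then read off the best configuration. The knobs, ranges, and cost function are all hypothetical.

    # Generic hyperparameter-tuning sketch using Optuna as a stand-in
    # (pip install optuna); ZenithTune's actual API may differ.
    import optuna

    def run_short_training(lr: float, batch: int) -> float:
        # Dummy cost surface standing in for a real, short training run
        # that returns something to minimize (e.g., step time or val loss).
        return (lr - 1e-3) ** 2 + abs(batch - 64) / 64

    def objective(trial: optuna.Trial) -> float:
        lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
        batch = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
        return run_short_training(lr, batch)

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=50)
    print("best params:", study.best_params)
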
Fully Automated Inference Acceleration Tool (AcuiRT)

Challenges in Accelerating Deep Learning Model Inference on NVIDIA GPUs

  • Complex Model Structures: The latest AI models have massive and intricate architectures.
  • Limitations of Manual Optimization: Manually converting every pattern is too time-consuming and impractical.
  • Need for Specialized Knowledge: Deep technical knowledge and experience with GPUs and TensorRT are required.

AcuiRT fully automates the conversion of AI models built with PyTorch into TensorRT. It dramatically reduces development time and boosts inference speed without requiring specialized expertise.

Learn more about AcuiRT
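
AcuiRT's interface is not documented on this page. To make the conversion step concrete, here is what one manual PyTorch-to-TensorRT compilation looks like with the open-source Torch-TensorRT compiler; the model and precision choice are illustrative. Automating this analysis and conversion across complex multi-module models is the part AcuiRT takes over.

    # One manual PyTorch -> TensorRT conversion via the open-source
    # torch_tensorrt compiler (illustrative; not AcuiRT itself).
    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet50(weights=None).eval().cuda()
    example = torch.randn(1, 3, 224, 224, device="cuda")

    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.half},  # allow FP16 kernels
    )

    with torch.no_grad():
        print(trt_model(example).shape)  # torch.Size([1, 1000])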
Automated Optimization Process
  1. PyTorch Model: a complex multi-module structure
  2. Automatic Structure Analysis: automatically understands the module structure
  3. Step-by-Step Optimization: executes the optimization completely automatically
  4. Optimized Model: immediately ready for use

Performance Engineering Cycle

Performance is not constant—it evolves due to new model adoption, parameter changes, and infrastructure updates. By continuously running the performance improvement cycle, you can prevent degradation and always achieve peak performance.

Factors Contributing to Performance Degradation
  • Adoption of New Models/Methods
    Updates to Transformer architectures and the shift toward multimodal models change computation patterns, disrupting the balance between GPU utilization and memory bandwidth.
  • Changes in Hardware Configuration/Cloud Plans
    Changes in instance types, price revisions, and region migrations can make previously cost-optimized configurations obsolete, leading to over-provisioning or performance bottlenecks.
  • Library/Framework Updates
    Version updates of CUDA, cuDNN, PyTorch, etc., can alter internal algorithms and memory management, causing unexpected increases in latency or deterioration of memory footprint.

Incorporating this performance engineering cycle as a continuous practice keeps your workloads at peak performance even as these factors change.
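
One lightweight practice that supports this cycle is recording an environment fingerprint next to every performance measurement, so a regression can be traced back to a library or driver change. A minimal sketch; the fields are illustrative, not AIBooster's metadata schema:

    # Record an environment fingerprint alongside performance measurements
    # (illustrative fields; not AIBooster's actual schema).
    import json
    import torch

    fingerprint = {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if unavailable
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    }
    print(json.dumps(fingerprint, indent=2))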

Proven Performance Improvements

  • Broadcasting Company - LLM 70B Continued Pre-training
  • Telecom Company - LLM 70B Continued Pre-training
  • LLM 7B Model Training
  • LLM Single-batch Inference
  • LLM Multi-batch Inference

Note: These results include both automatic accelerations by Fixstars AIBooster and additional manual accelerations based on the collected performance data.

Software Configuration

AIBooster consists of two main components:

AIBooster Agent
The Agent is a Linux application that you install on the GPU compute nodes you manage. It collects performance data from each node and sends it to the Server. The compute nodes can be in the cloud or on-premises.

AIBooster Server
The Server stores the received data and provides a dashboard for easy data visualization. By simply accessing the dashboard from your browser, you can monitor the performance of each compute node.

AIBooster supports multi-cloud environments and server clusters distributed across multiple locations. From a single dashboard, you can view the status of your entire system, detailed information for each node, and even detailed information for each compute job.

FAQ

Q: How much overhead does AIBooster add to monitored workloads?
A: The software runs as a Linux daemon, so it is always active with minimal overhead; we describe this as "near-zero overhead."

Any other questions? Please contact us.

Performance Engineering with
Fixstars AIBooster

Detect hidden bottlenecks and automatically accelerate your AI workloads.
Achieve further acceleration manually using the collected performance data.