Performance Engineering Platform

What is Fixstars AIBooster?

Whether in the cloud or on-premises, install Fixstars AIBooster on your GPU servers to gather detailed performance data from active AI workloads and clearly visualize bottlenecks.

Free for permanent use

Performance Observability

Monitors and visualizes performance of AI training and inference.
Identifies bottlenecks and performance issues.

Paid with free trial

Performance Intelligence

Provides a suite of tools for automatic acceleration based on collected performance observation data.
Based on data provided by Performance Observability, users can manually accelerate their AI workloads for further performance improvements.

Faster processing speed and GPU cost savings (based on our actual projects)

Free for permanent use

Performance Observability

Visualization of Hardware Usage
Aggregates GPU, CPU, memory, NIC, PCIe, storage, and other information, managing it as time-series data.
Visualization of AI Workload
Samples AI workloads at the function and thread level, managing this information as time-series data.

Continuous Monitoring of Hardware Usage and AI Workloads

Efficiently collects hardware and AI workload data as time-series.
Supports multiple platforms (AWS, Azure, GCP, and on-premises), seamlessly monitoring diverse system architectures in one place.

Example metrics: GPU Utilization, GPU SM Activity, Network Send Bandwidth, Network Recv Bandwidth, Storage Write Bandwidth, Storage Read Bandwidth, CPU Utilization, Memory Bandwidth, L2 Cache Hit Ratio, L3 Cache Hit Ratio
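
As a rough illustration of what managing a metric such as GPU utilization as time-series data can look like, the sketch below stores timestamped samples and averages them into fixed windows. The class name, metric name, and sample values are hypothetical and are not part of AIBooster's interface.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MetricSeries:
    """Hypothetical in-memory time series for one metric on one node."""
    name: str
    samples: list = field(default_factory=list)  # (unix_seconds, value) pairs

    def record(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))

    def downsample(self, window_seconds: int) -> dict:
        """Average samples into fixed windows keyed by window start time."""
        buckets = defaultdict(list)
        for timestamp, value in self.samples:
            window_start = int(timestamp // window_seconds) * window_seconds
            buckets[window_start].append(value)
        return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up samples: one GPU utilization reading per second.
gpu_util = MetricSeries("gpu_utilization_percent")
for second, value in enumerate([90, 92, 30, 28, 95, 97]):
    gpu_util.record(second, value)

per_window = gpu_util.downsample(window_seconds=2)
for window_start, average in per_window.items():
    print(window_start, average)
```

Averaging into windows keeps long histories compact while still exposing dips such as the one in the middle window.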

Profiling of Running Applications

Continuously saves flame graphs, breaking down application processing time to visualize internal processing details.
Identifies which functions or libraries in the program are bottlenecks.
Analyzes differences in application configurations under varying hardware utilization conditions.
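
To make the flame graph idea concrete, the sketch below folds sampled call stacks into the semicolon-separated "folded stack" form that flame graph renderers commonly take as input, then computes how often a function appears on the stack. It is a minimal sketch with made-up samples and hypothetical function names, not a description of how AIBooster collects or stores its profiles.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical call stacks; each sample is a root-to-leaf list of frames."""
    return Counter(";".join(stack) for stack in samples)


def inclusive_share(folded, function):
    """Fraction of samples in which `function` appears anywhere on the stack."""
    total = sum(folded.values())
    hits = sum(count for stack, count in folded.items()
               if function in stack.split(";"))
    return hits / total


# Made-up samples from a hypothetical training loop.
samples = [
    ["train_step", "forward", "matmul"],
    ["train_step", "forward", "matmul"],
    ["train_step", "backward", "matmul"],
    ["train_step", "data_loader", "decode_image"],
]

folded = fold_stacks(samples)
for stack, count in sorted(folded.items()):
    print(stack, count)
```

A function with a high inclusive share, such as `matmul` in these samples, is a candidate bottleneck; the width of its bar in a flame graph reflects the same proportion.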


Paid with free trial

Performance Intelligence

Workflow

1. Data Analysis
- Calculates training efficiency (identifies potential for acceleration)
- Identifies areas needing acceleration from performance data
2. Acceleration
- Provides a suite of tools for automatic acceleration based on performance analysis.
- Offers necessary documentation to assist users in achieving manual acceleration.

Optional extra: Performance Engineering Services (Contact us for details)
Fixstars acceleration experts will improve your performance based on AIBooster analysis data, tailored to your environment and requirements.

Examples of Training Acceleration

Hyperparameter Tuning (ZenithTune)
Model Compression
Applying appropriate parallelization methods for AI models
Optimizing communication library parameters
Improving memory bandwidth efficiency through re-computation

Examples of Inference Acceleration

Fully Automated Inference Acceleration (AcuiRT)
Automatic Mixed Precision Quantization

Fully Automated Inference Acceleration Tool (AcuiRT)

Challenges in Accelerating Deep Learning Model Inference on NVIDIA GPUs

Complex Model Structures: The latest AI models have massive and intricate architectures.
Limitations of Manual Optimization: Manually converting every pattern is too time-consuming and impractical.
Need for Specialized Knowledge: Deep technical knowledge and experience with GPUs and TensorRT are required.

AcuiRT fully automates the conversion of AI models built with PyTorch into TensorRT. It dramatically reduces development time and boosts inference speed without requiring specialized expertise.


Automated Optimization Process

PyTorch Model
Complex multi-module structure
Automatic Structure Analysis
Automatically understands the module structure
Step-by-Step Optimization
Executes optimization completely automatically
Optimized Model
Immediately ready for use

Performance Engineering Cycle

Performance is not constant—it evolves due to new model adoption, parameter changes, and infrastructure updates. By continuously running the performance improvement cycle, you can prevent degradation and always achieve peak performance.

Factors Contributing to Performance Degradation

Adoption of New Models/Methods
Updates to Transformer architectures and multimodalization change computation patterns, disrupting the balance of GPU utilization and memory bandwidth.
Changes in Hardware Configuration/Cloud Plans
Changes in instance types, price revisions, and region migrations can make previously cost-optimized configurations obsolete, leading to over-provisioning or performance bottlenecks.
Library/Framework Updates
Version updates of CUDA, cuDNN, PyTorch, etc., can alter internal algorithms and memory management, causing unexpected increases in latency or deterioration of memory footprint.

By incorporating a continuous performance engineering cycle, you can consistently achieve optimal performance.

Proven Performance Improvements

Broadcasting Company - LLM 70B Continued Pre-training

Telecom Company - LLM 70B Continued Pre-training

LLM 7B Model Training

LLM Single-batch Inference

LLM Multi-batch Inference

Note: These results include both automatic acceleration by Fixstars AIBooster and additional manual acceleration based on collected performance data.


Software Configuration

AIBooster consists of two main components:

AIBooster Agent
The Agent is a Linux application that you install on the GPU compute nodes you manage. It collects performance data from each node and sends it to the Server. It doesn't matter whether the compute nodes are on the cloud or on-premises.

AIBooster Server
The Server stores the received data and provides a dashboard for easy data visualization. By simply accessing the dashboard from your browser, you can monitor the performance of each compute node.

There are two ways to use the AIBooster Server:

Cloud-based Server
You can use the AIBooster Server managed by us on the cloud.
No installation is required, so you can start using it immediately. We provide a dedicated login page for each customer.
On-premises Server
You can also install and run the AIBooster Server on a Linux server in your own on-premises environment.
Please contact us for more details.

AIBooster supports multi-cloud environments and server clusters distributed across multiple locations. From a single dashboard, you can view the status of your entire system, detailed information for each node, and even detailed information for each compute job.

Software Configuration Example

When using the AIBooster Server provided by Fixstars on the cloud:
This option is for those who want to get started easily and immediately, without needing to install a management server.

You will install the AIBooster Agent on each GPU compute node.
The management dashboard is provided as a web application on the cloud, managed by Fixstars.
To begin, you will create an account and enter your user information on the management screen. A dedicated URL will be issued for you to access the dashboard through your browser.

When using the AIBooster Server in your own on-premises environment:
This option is for those who need to build everything on-premises and cannot use external services for security reasons.

You will designate one management node and install the AIBooster Server on it, and then install the AIBooster Agent on each GPU compute node.
From your personal computer, you can view the dashboard provided by the management node through a browser via TCP port 3000.
This is the recommended configuration for most GPU cluster server systems.
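
Before opening the on-premises dashboard in a browser, it can help to confirm that TCP port 3000 on the management node is reachable from your machine. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the host you pass in is whatever address your management node actually has.

```python
import socket


def dashboard_reachable(host: str, port: int = 3000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `dashboard_reachable("<management-node-address>")` returns `False` when a firewall blocks the port or the Server is not running, which narrows down where to look before debugging the browser side.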

FAQ

Q. What's the overhead of Fixstars AIBooster?

The software runs as a Linux daemon, meaning it's always active with minimal overhead. We refer to it as having "near-zero overhead."

Q. What's the supported environment?

It runs on Debian-based Linux environments. We have verified operation on Ubuntu 22.04 LTS. It can also run without an NVIDIA GPU, but the available data and functionality will be limited.

Q. What features are free?

Fixstars AIBooster is free to use. However, the Performance Intelligence (PI) feature is available at no cost only for the first month after activation and becomes a paid feature thereafter. Please refer to the Fixstars AIBooster End User License Agreement for details.

Q. Does Fixstars collect any user-specific data?

Fixstars does not collect user-specific data (such as your application data or detailed analysis results). We only gather general usage statistics for product improvement purposes. Contact us for more details.

Q. How is it different from other APM tools?

Traditional tools (e.g., Datadog, New Relic) show hardware utilization, but Fixstars AIBooster additionally captures detailed AI workload data. It analyzes this data to identify and resolve performance bottlenecks.

Q. How does AIBooster improve performance?

It optimizes performance by analyzing data from Performance Observability (PO). The resulting optimizations include changing infrastructure configurations, tuning parameters, and optimizing source code to maximize GPU utilization.

Q. How is it different from other profiling tools?

Profiling tools (like NVIDIA Nsight) capture "snapshots" triggered by specific commands. In contrast, AIBooster continuously captures detailed performance data, enabling historical analysis and identification of performance degradation. AIBooster’s automatic acceleration suggestions and implementations are unique features.
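
As an illustration of why continuously captured history matters, the sketch below compares recent step times against an earlier baseline and flags a slowdown. The function, its thresholds, and the sample values are assumptions chosen for illustration and do not describe AIBooster's own analysis.

```python
from statistics import median


def detect_regression(history, baseline_size=5, recent_size=3, tolerance=0.10):
    """Flag a slowdown when the recent median exceeds the baseline median
    by more than `tolerance` (0.10 means 10 percent)."""
    if len(history) < baseline_size + recent_size:
        return False  # not enough data to compare
    baseline = median(history[:baseline_size])
    recent = median(history[-recent_size:])
    return recent > baseline * (1 + tolerance)


# Made-up step times in milliseconds, oldest first.
step_times_ms = [100, 102, 99, 101, 100, 100, 118, 121, 119]
print(detect_regression(step_times_ms))
```

A one-off snapshot has no baseline to compare against, so a check like this is only possible when measurements are kept over time.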

Q. Is AIBooster applicable beyond GenAI and LLMs?

Yes. Because the underlying technology is broadly applicable, other AI or GPU-accelerated workloads can also benefit. The exact improvements depend on your specific workload—please contact us for details.

Any other questions? Please contact us.
