
Performance Engineering Platform

Fixstars AI Booster

Install it on your GPU server and let it gather runtime data from your AI workloads. It identifies bottlenecks, automatically improves performance, and provides detailed insights for further manual optimization.

Download Free Now | Live Demo

You can download the Quick Start Guide, case studies, and other resources here.

What is Fixstars AI Booster?

Whether in the cloud or on-premises, simply install Fixstars AI Booster on your GPU servers to gather detailed performance data from active AI workloads, visualizing bottlenecks clearly.

Use these insights to drive performance improvements, creating a continuous cycle of monitoring and optimization—accelerating AI training and inference while significantly reducing infrastructure costs.

Free for permanent use
Performance Observability
  • Monitors and visualizes the performance of AI training and inference.
  • Identifies bottlenecks and performance issues.
Paid with free trial
Performance Intelligence
  • Provides a suite of tools for automatic acceleration based on collected performance data.
  • Lets users manually accelerate their AI workloads for further improvements, guided by Performance Observability data.

Free for permanent use

Performance Observability

Continuous Monitoring of Hardware Usage and AI Workloads
  • Efficiently collects hardware and AI workload metrics as time-series data.
  • Supports multiple platforms (AWS, Azure, GCP, and on-premises), seamlessly monitoring diverse system architectures in one place.
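To illustrate what time-series collection of this kind looks like, here is a minimal Python sketch of a polling sampler. The metric names and sampler functions are placeholders for illustration only, not AI Booster's actual collectors or data format.

```python
import time
from collections import defaultdict

def sample_metrics(samplers, interval_s, n_samples, store,
                   clock=time.time, sleep=time.sleep):
    """Poll each sampler every `interval_s` seconds and append
    (timestamp, value) pairs to the per-metric store."""
    for _ in range(n_samples):
        ts = clock()
        for name, fn in samplers.items():
            store[name].append((ts, fn()))
        sleep(interval_s)

# Hypothetical samplers standing in for real GPU/host counters.
store = defaultdict(list)
sample_metrics(
    {"gpu_util_pct": lambda: 87.5, "host_mem_used_gib": lambda: 42.0},
    interval_s=0.0, n_samples=3, store=store,
)
```

A real collector would plug hardware counters (e.g. GPU utilization, memory bandwidth) into the `samplers` dictionary and write the pairs to a time-series database instead of an in-memory list.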
Profiling of Running Applications
  • Continuously saves flame graphs, breaking down application processing time to visualize internal processing details.
  • Identifies which functions or libraries in the program are bottlenecks.
  • Analyzes differences in application configurations under varying hardware utilization conditions.
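Flame graphs are commonly rendered from "folded" stack samples: semicolon-joined call frames followed by a sample count. The sketch below shows how such samples can be aggregated to surface the hottest functions; it uses the generic folded-stack text format, not AI Booster's internal representation.

```python
from collections import Counter

def hot_frames(folded_samples):
    """Sum sample counts per leaf frame from folded stack lines
    ('frame1;frame2;leaf count'), the text format commonly used
    to render flame graphs."""
    totals = Counter()
    for line in folded_samples:
        stack, count = line.rsplit(" ", 1)
        leaf = stack.split(";")[-1]
        totals[leaf] += int(count)
    return totals.most_common()

samples = [
    "main;train_step;forward;matmul 70",
    "main;train_step;backward;matmul 50",
    "main;train_step;data_load 30",
]
# matmul accounts for 120 of 150 samples, marking it as the bottleneck
print(hot_frames(samples))
```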
Paid with free trial

Performance Intelligence

Workflow
  • 1
    Data Analysis
    • Calculates training efficiency (identifies potential for acceleration)
    • Identifies areas needing acceleration from performance data
  • 2
    Acceleration
    • Provides a suite of tools for automatic acceleration based on performance analysis.
    • Offers necessary documentation to assist users in achieving manual acceleration.
  • Performance Engineering Services (Contact us for details)
    Fixstars acceleration experts will improve your performance based on AI Booster analysis data, tailored to your environment and requirements.
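Step 1's training-efficiency calculation can be illustrated with the widely used Model FLOPs Utilization (MFU) metric for dense-transformer training. The formula (the ~6 × parameters FLOPs-per-token rule of thumb) and all numbers below are generic assumptions for illustration, not figures from an actual AI Booster analysis.

```python
def model_flops_utilization(params, tokens_per_s, peak_flops_per_s, n_gpus):
    """Estimate MFU: achieved training FLOP/s as a fraction of the
    cluster's peak, using the ~6 * params FLOPs-per-token rule."""
    achieved = 6.0 * params * tokens_per_s
    return achieved / (peak_flops_per_s * n_gpus)

# Illustrative numbers: a 70B-parameter model on 8 GPUs,
# each rated at a hypothetical 989 TFLOP/s peak (BF16).
mfu = model_flops_utilization(
    params=70e9, tokens_per_s=7_000,
    peak_flops_per_s=989e12, n_gpus=8,
)
print(f"MFU = {mfu:.1%}")
```

A low MFU indicates headroom for acceleration; the data-analysis step then narrows down whether compute, memory, communication, or I/O is the limiting factor.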
Example of acceleration methods
  • Model optimization
  • Hyperparameter tuning
  • Selecting optimal parallelization techniques
  • Kernel parameter optimization
  • Communication library optimization
  • File systems optimized for the workload
  • Improving memory efficiency
  • OS, system and driver optimizations

Performance Engineering Cycle

Performance is not constant—it evolves due to new model adoption, parameter changes, and infrastructure updates. By continuously running the performance improvement cycle, you can prevent degradation and always achieve peak performance.

Factors Contributing to Performance Degradation
  • Adoption of New Models/Methods
    Updates to Transformer architectures and multimodalization change computation patterns, disrupting the balance of GPU utilization and memory bandwidth.
  • Changes in Hardware Configuration/Cloud Plans
    Changes in instance types, price revisions, and region migrations can make previously cost-optimized configurations obsolete, leading to over-provisioning or performance bottlenecks.
  • Library/Framework Updates
    Version updates to CUDA, cuDNN, PyTorch, and similar components can alter internal algorithms and memory management, causing unexpected latency increases or a worse memory footprint.
By incorporating a continuous performance engineering cycle, you can consistently achieve optimal performance.

Proven Performance Improvements

Broadcasting Company - LLM 70B Continued Pre-training
Telecom Company - LLM 70B Continued Pre-training
LLM7B Model Training
LLM Single-batch Inference
LLM Multi-batch Inference

Note: These results include both automatic accelerations by Fixstars AI Booster and additional manual accelerations based on collected performance data.

Software Configuration

An example of a multi-node configuration

Fixstars AI Booster (AI Booster) consists of two main components:

  • AI Booster Agent: Collects performance telemetry data from individual nodes.
  • AI Booster Server: Stores data and provides clear visualizations via an intuitive dashboard.

Typically, one AI Booster Server is installed on a management node, while multiple AI Booster Agents run on compute nodes. This configuration enables comprehensive monitoring of multiple nodes from a single management point, visualizing performance across your entire infrastructure on one unified dashboard.

You can centrally visualize server groups distributed across multiple locations, whether in a multi-cloud environment spanning multiple cloud vendors or a hybrid environment combining on-premises and cloud.

For simpler setups, AI Booster also supports a local configuration, where both Server and Agent run together on a single node.
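Conceptually, the Agent-to-Server flow resembles agents pushing metric payloads to a central collector. The Python sketch below is a toy model of that pattern; the endpoint path, payload shape, and transport are invented for illustration and do not describe AI Booster's real protocol.

```python
import json, threading, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory list standing in for the server's time-series store.
received = []

class MetricsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.append(json.loads(body))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

# "Management node": bind an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def agent_push(node, metrics):
    """One agent report: POST a JSON metrics payload to the server."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/ingest",
        data=json.dumps({"node": node, "metrics": metrics}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Two "compute nodes" reporting to the one management node.
agent_push("gpu-node-01", {"gpu_util_pct": 91.0})
agent_push("gpu-node-02", {"gpu_util_pct": 78.5})
server.shutdown()
print(received)
```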

Software Configuration Example

Single Node Configuration 1 - Direct AI Booster Usage on a Local Workstation

Install both AI Booster Server and AI Booster Agent on a single GPU-equipped workstation or server. Connect a monitor and check performance information directly via the dashboard. This setup offers the quickest route when you want to "just try running it" on offline test machines or benchmarking systems. No network configuration is required.

Single Node Configuration 2 - Multi-User Performance Dashboard Viewing

Install both AI Booster Server and AI Booster Agent on a single GPU-equipped workstation or server.
Users can access the dashboard provided by the server through their personal PCs using a browser via TCP port 3000.
This configuration is ideal for small-scale proof-of-concept (PoC) projects requiring dashboard viewing by multiple users.

Multi-Node Configuration 1 - Centralized Performance Monitoring with Dedicated Management Node

Install AI Booster Server on a dedicated management node, and install AI Booster Agent on each GPU compute node.
Users can access the management node’s dashboard from their personal PCs via a browser through TCP port 3000.
This configuration is recommended for most GPU cluster server systems.

Multi-Node Configuration 2 - GPU Compute Node Serving as Performance Monitoring Node

If there is no dedicated management node, select one GPU-equipped node and install both AI Booster Server and its own AI Booster Agent. Install only the AI Booster Agent on the remaining GPU-equipped nodes.
Users can access the dashboard provided by the GPU node with AI Booster Server installed, via their personal PCs through a browser using TCP port 3000.

FAQ

Q. What's the overhead of Fixstars AI Booster?

The software runs as a Linux daemon, so it is always active, yet its overhead is minimal; we refer to it as having "near-zero overhead."

Q. What environments does Fixstars AI Booster run on?

It runs on Debian-based Linux environments. We have verified operation on Ubuntu 22.04 LTS. It can also run without an NVIDIA GPU, but the available data and functionality will be limited.

Q. Is Fixstars AI Booster free?

Fixstars AI Booster is free to use. However, the Performance Intelligence (PI) feature is available at no cost for the first month after activation and becomes a paid feature thereafter. Please refer to the Fixstars AI Booster End User License Agreement for details.

Q. Does Fixstars collect data from my environment?

Fixstars does not collect user-specific data (such as your application data or detailed analysis results). We only gather general usage statistics for product improvement purposes. Contact us for more details.

Q. How is it different from existing monitoring tools?

Traditional tools (e.g., DataDog, New Relic) show hardware utilization, but Fixstars AI Booster additionally captures detailed AI workload data. It analyzes this data to identify and resolve performance bottlenecks.

Q. What does the Performance Intelligence feature do?

It optimizes performance by analyzing data from Performance Observability (PO). This includes changing infrastructure configurations, tuning parameters, and optimizing source code to maximize GPU utilization.

Q. How is it different from existing profiling tools?

Profiling tools (like NVIDIA Nsight) capture "snapshots" triggered by specific commands. In contrast, AI Booster continuously captures detailed performance data, enabling historical analysis and identification of performance degradation. AI Booster's automatic acceleration suggestions and implementations are unique features.

Q. Can it accelerate workloads other than LLMs?

Yes. Because the underlying technology is broadly applicable, other AI or GPU-accelerated workloads can also benefit. The exact improvements depend on your specific workload; please contact us for details.

Any other questions? Please contact us.

Performance Engineering with
Fixstars AI Booster

Detect hidden bottlenecks and automatically accelerate your AI workloads.
Achieve further acceleration manually by utilizing acquired performance data.

Download Free Now