Upcoming Webinar: "Maximizing LLM Inference Throughput"
Strategies for Low Latency and High Efficiency
Webinar Overview
This session focuses on the critical shift from AI training to high-efficiency inference, featuring updated insights and live Q&A with Fixstars engineers.
As LLMs move into production, inference performance becomes the primary driver of both user experience and operational costs. In this webinar, Fixstars engineers will demonstrate how to transition from "standard" inference to "high-speed" execution by tackling the "Memory Wall" and compute bottlenecks. We will explore low-level optimizations including custom GPU kernels, efficient memory handling, and advanced decoding techniques.
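To see why decoding so often runs into the "Memory Wall," a back-of-envelope estimate like the sketch below can help; the parameter count, precision, and bandwidth figures here are illustrative assumptions, not numbers from the webinar or from AIBooster.

```python
# Back-of-envelope estimate of decode throughput for a memory-bandwidth-bound LLM.
# All figures below are illustrative assumptions, not benchmark results.

weight_bytes = 7e9 * 2          # 7B parameters in FP16 (2 bytes each)
hbm_bandwidth = 2.0e12          # 2 TB/s of GPU memory bandwidth (assumed)

# At batch size 1, every generated token must stream all weights from HBM once,
# so the bandwidth ceiling bounds tokens/second regardless of compute speed.
tokens_per_sec = hbm_bandwidth / weight_bytes
print(f"Upper bound: ~{tokens_per_sec:.0f} tokens/s per sequence")  # ~143 tokens/s

# Batching amortizes the weight traffic across sequences, which is why
# throughput-oriented serving pushes batch sizes up until compute saturates.
for batch in (1, 8, 32):
    print(f"batch={batch:3d}: ~{batch * tokens_per_sec:.0f} tokens/s total")
```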
The session will feature real-world optimization scenarios, demonstrating how to achieve significant reductions in TTFT (Time To First Token) and increases in total throughput. We will showcase the Fixstars AIBooster platform’s capabilities in visualizing inference bottlenecks and automating the path to optimal performance. A demo will guide you through the process of profiling an LLM workload, identifying latency spikes, and applying targeted optimizations to squeeze every bit of performance out of your hardware.
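To make the TTFT and throughput metrics concrete, a minimal measurement harness might look like the sketch below; `stream_tokens`/`fake_stream` is a hypothetical stand-in for whatever streaming iterator your serving stack's client library provides, not a Fixstars or AIBooster API.

```python
import time
from typing import Iterable

def measure_stream(tokens: Iterable[str]) -> None:
    """Measure TTFT and decode throughput over any token stream.

    `tokens` is any iterable that yields generated tokens as they arrive,
    e.g. the streaming iterator from your inference server's client library.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in tokens:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # timestamp of the first token (TTFT)
        count += 1
    end = time.perf_counter()

    if first_token_at is None:
        print("No tokens received")
        return
    print(f"TTFT: {(first_token_at - start) * 1000:.1f} ms")
    # Throughput counted over the decode phase (tokens after the first).
    decode_time = end - first_token_at
    if count > 1 and decode_time > 0:
        print(f"Decode throughput: {(count - 1) / decode_time:.1f} tokens/s")

# Example with a simulated stream; replace with a real streaming client.
def fake_stream():
    time.sleep(0.15)            # simulated prefill latency
    for _ in range(50):
        time.sleep(0.02)        # simulated per-token decode latency
        yield "tok"

measure_stream(fake_stream())
```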
Speaker
Miho Yoneda
Agenda
- Challenges in LLM Inference
- Optimization Techniques: Software & Hardware
- Case Study & Benchmark
- Live Q&A Session
Total: 45 minutes
* You can enter and leave the session at any time.
* Please note that the schedule and content may change without prior notice.
AIBooster is a performance engineering platform for continuously observing and improving the performance of AI workloads.
Through comprehensive dashboards, users can visualize the utilization efficiency of various hardware resources—including CPU, GPU, interconnect, and storage—as well as identify software bottlenecks to analyze AI workload performance characteristics. Furthermore, by applying optimization frameworks specifically designed for AI workloads, users can achieve efficient performance improvements.
Learn more about AIBooster here.
Date and time
Thursday, February 26, 2026
12:00 PM - 12:45 PM PST
Location
Zoom
Target Audience
- AI Service Architects: Engineers responsible for deploying LLMs into production environments.
- Inference Infrastructure Engineers: Professionals focused on scaling AI workloads and minimizing latency.
- Technical Decision Makers: Leaders looking to optimize GPU cloud costs and improve AI service profitability.
- MLOps Engineers: Those interested in integrating performance monitoring and optimization into the CI/CD pipeline.
Participation fee
Free