Rock-Solid-State Era is Coming

September 15, 2015 – Sunnyvale, CA – Fixstars Solutions Inc., an innovator in flash storage solutions, is launching its official tech blog, “Rock-Solid-State”.

As more and more businesses, from server manufacturers to HD/4K video recording and big data analytics, move their storage from traditional hard disk drives (HDDs) to flash-memory-based solid-state drives (SSDs), data storage and processing have become unprecedentedly reliable and fast. And with prices for both enterprise- and consumer-grade SSDs falling over the last few years, flash storage devices have grown into a reasonable choice for businesses and individuals alike.

However, the concept of flash storage is still new to much of the public, and SSDs remain a superior but somewhat mysterious storage option in many people’s eyes. In this context, Fixstars created “Rock-Solid-State” to answer the questions about flash memory and SSDs that readers frequently ask. Rock-Solid-State covers a wide variety of topics, ranging from the advantages of SSDs over HDDs and how to read an SSD’s specification sheet to more technical material such as understanding NAND flash memory. In this way, Team Fixstars hopes to explore the emerging era of flash storage together with its audience.


Lenovo Japan Inc. Starts Adopting “Fixstars SSD-3000M” and “Fixstars SSD-6000M” as Product Options

* This press release was published by Fixstars Corporation, Tokyo, Japan, and translated into English.

Fixstars Corporation (HQ: Tokyo, CEO: Satoshi Miki, Hereafter: Fixstars) today announced that Lenovo Japan Corporation (HQ: Tokyo, CEO: Masanobu Todome, Hereafter: Lenovo) started adopting “Fixstars SSD-3000M” and “Fixstars SSD-6000M” as their Vendor Logo Hardware (VLH).

Lenovo, which has extensive experience in providing servers focused on data center solutions, is now collaborating with Fixstars to offer the “Fixstars SSD-3000M” and “Fixstars SSD-6000M” as VLH products for Lenovo’s System x3550 M5 and System x3650 M5. The integration of Lenovo servers and Fixstars SSDs delivers higher storage density and power efficiency, an ideal solution for data centers.


“Lenovo Japan is pleased to work with Fixstars Corporation”, said Taiko Kobayashi, Executive Director of the Enterprise business group at Lenovo Japan. “The enterprise demand for flash storage is expected to increase significantly from now on. This option has great value for our clients who seek high-density, high-reliability, low-cost internal flash storage. Now we are able to integrate 6TB 2.5-inch SSDs, the highest-density flash storage in the industry, into the highly reliable Lenovo System x.”

“I am pleased that the Fixstars SSD-3000M and Fixstars SSD-6000M have been approved as VLH options for the System x3550 M5 and System x3650 M5, which are Lenovo’s main product lines”, said Satoshi Miki, CEO of Fixstars Corporation. “One of the unique characteristics of our SSDs is consistent high performance for sequential access, an important feature for data logging and general data warehouse applications. Our SSDs can help accelerate your enterprise business. Also, the Fixstars SSD-6000M can realize 156TB*1 in 2U of rack height, and 3PB in one server rack (42U). This highest-density, highest-performance solution is a real game changer.”

Fixstars continues to help accelerate its clients’ business with original solutions based on cutting-edge technology.

*1 With System x3650 M5 having front 24 slots and rear 2 slots.

About Fixstars

Fixstars Solutions is an innovator in flash storage solutions devoted to “Speed up your Business”. Combining expertise in multi-core processor programming with the use of next-generation memory technology, Fixstars provides the best-performing, highest-capacity storage solutions.

Learn more about how Fixstars Solutions can accelerate your business in life science, manufacturing, finance, and media & entertainment. For more information, visit

Contact information

Big data division

Director Masana Murase



Fixstars Solutions Inc. Achieves a Microsoft Silver Cloud Platform Competency


Fixstars earns distinction through demonstrated technology success and customer commitment.

Sunnyvale, CA, USA — Aug 3, 2015 — Fixstars Solutions Inc., a global leader in providing parallel processing technology, today announced it has achieved a Silver Cloud Platform competency, demonstrating its ability to meet Microsoft Corp. customers’ evolving needs in today’s dynamic business environment. To earn a Microsoft silver competency, partners must successfully demonstrate expertise through rigorous exams, culminating in Microsoft certifications. And to ensure the highest quality of services, Microsoft requires customer references for successful implementation and customer satisfaction.

Fixstars Solutions has achieved a track record of numerous successful projects with hyper-scale cluster server systems. For example, a life science application was ported to and optimized for Microsoft Azure running 64-bit Linux in IaaS, and its processing time was dramatically reduced by effectively utilizing over 5,000 CPU cores. With its extensive knowledge of parallel processing technology, Fixstars Solutions provides services to port and optimize locally running software for Microsoft Azure.

“This Microsoft Silver Cloud Platform competency showcases our expertise in today’s technology market and demonstrates our knowledge of Microsoft and its products,” said Akihiro Asahara, CEO of Fixstars Solutions Inc. “Our plan is to accelerate our customers’ success by serving as technology advisors for their business demands.”

“By achieving a silver competency, organizations have proven their expertise in specific technology areas, placing them among the top 5 percent of Microsoft partners worldwide,” said Phil Sorgen, corporate vice president, Worldwide Partner Group at Microsoft Corp. “When customers look for an IT partner to meet their business challenges, choosing a company that has attained Microsoft competencies is a smart move. These are highly qualified professionals with access to Microsoft technical support and product teams.”


About Cloud Platform Competency and the Microsoft Partner Network

Attaining the Microsoft Cloud Platform competency demonstrates partner expertise in building, integrating and/or extending Windows-based applications and infrastructure solutions in the cloud using the Microsoft Azure cloud platform. With more than 57 percent of the Fortune 500 using Microsoft Azure, the Microsoft Cloud Platform competency can help partners take advantage of the growing demand for infrastructure and software as a service solutions. Equipped with exclusive training, partners can help customers deploy solutions that increase customer productivity and profitability.

The Microsoft Partner Network helps partners strengthen their capabilities to showcase leadership in the marketplace on the latest technology, to better serve customers and to easily connect with one of the most active, diverse networks in the world.

About Fixstars

Fixstars Solutions is a technology company devoted to our philosophy “Speed up your Business”. Through its software parallelization/optimization expertise, its highly effective use of multi-core processors, and application acceleration for the next generation of flash memory technology that delivers high speed IO as well as power savings, Fixstars Solutions provides “Green IT”, while accelerating customers’ business in various fields.

Learn more about how Fixstars Solutions can accelerate your business in life science, manufacturing, finance, and media & entertainment. For more information, visit


Fixstars Will See You at the Embedded Systems Conference

Fixstars will be hosting Booth 56 at ESC 2015

July 20, 2015 – Sunnyvale, CA – A leading player in high-speed storage, Fixstars Solutions, Inc. will participate in this year’s Embedded Systems Conference (ESC), held from July 20th to 22nd; Fixstars’ booth will be open on the 21st and 22nd.

In early May, Fixstars Solutions released the SSD-6000M, the world’s first 6TB*1 Solid State Drive (SSD). Fixstars’ SSD series aims to provide a reliable data-storage solution to a wide variety of users in video recording, medical imaging, big data analysis, network infrastructure, and industrial applications. The SSD-6000M not only stands as the largest-capacity 2.5″ SSD ever, but also provides consistently high performance throughout its lifetime. With sustained read speeds of up to 540 MB/s and write speeds of up to 520 MB/s, the SSD-6000M will serve as an ideal partner for individuals and enterprises seeking sleek, high-capacity storage devices with great reliability.




During ESC, Fixstars will host a product exhibition and take customer inquiries at Booth 56; everyone is welcome to come by and visit Fixstars on site.

The ESC Silicon Valley 2015 Conference & Demo Hall will be held at:

Santa Clara Marriott, 2700 Mission College Blvd, Santa Clara, California 95054

*1 1TB=1,000,000,000,000 Bytes. The actual user space will be smaller.



Fixstars Releases 6 Terabyte SSD, the World’s Largest 2.5” SATA SSD

Speed Up your Application with Consistent, Fast Sequential Throughput

May 7, 2015 – Sunnyvale, CA – Fixstars Solutions, Inc. has released the world’s first 6TB*1 Solid State Drive (SSD), the Fixstars SSD-6000M, which is the largest-capacity SSD*2 in the 2.5″ x 9.5mm form factor. Fixstars is now accepting orders, which will ship to customers in the United States in late July.


Fixstars SSD-6000M

The world’s first 6TB drive is built on cutting-edge 15nm MLC flash memory packed into a dense 2.5″ form factor.

The Fixstars SSD-6000M supports SATA 6Gbps, providing read speeds of up to 540 MB/s and write speeds of up to 520 MB/s for sequential access. As with the Fixstars SSD-3000M, the proprietary SSD controller enables stable, high I/O performance for sequential access throughout the lifetime of the drive. This characteristic has proven to be highly effective in numerous areas such as video recording, medical imaging, big data analysis, network infrastructure, and industrial applications.

Satoshi Miki, the CEO of Fixstars Corporation, offered the following comment: “The unparalleled sequential I/O performance of our previous model (the SSD-3000M) helped propel our SSDs and garner a lot of attention. Since many of our customers desire even greater capacity, I am excited to offer a new solution and grow the product line with the inclusion of the larger SSD-6000M.” He continued, “Since our SSD’s capacity is now able to compete with high-end hard drives, we feel our product can draw the attention of data centers as well.”

For more detailed information regarding Fixstars’ SSD series, please visit

*1 1TB=1,000,000,000,000 Bytes. The actual user space will be smaller.
*2 As of May 7, 2015

Fixstars Announces World’s First Portable 12TB, 4-Bay, 2.5” SSD Storage Solution

April 13, 2015 – Sunnyvale, CA – Fixstars Solutions Inc., an innovator in flash storage solutions, today announced the release of the world’s highest density external storage solution, totaling 12TB of storage with high I/O speeds in an ultra-small footprint.

This solution combines Fixstars’ SSD-3000M and Akitio’s Thunder2 QUAD mini. The Fixstars SSD-3000M is a 3TB SATA SSD, the world’s highest-capacity 2.5” SATA SSD. The Akitio Thunder2 QUAD mini is an external storage enclosure with two Thunderbolt™ 2 ports and a footprint of just 3.75”x4.5”x7.5”. This product is designed for video professionals looking for a compact, high-capacity enterprise solution.

“This new product is a real game changer”, said Richard Wright, VP of Sales & Marketing for Akitio. “12 Terabytes of portable storage is truly unheard of in the current market. You would need to carry around a couple of 8-bay units to get 12 Terabytes of data, and due to the size and weight of those units, they certainly would not be very portable.”

Another great feature of this solution is its high bandwidth. The host server’s Thunderbolt 2 ports can reach speeds of up to 1.38GB per second. “Thunderbolt™ delivers unparalleled performance, flexibility and simplicity to personal computing”, said Jason Ziller, Intel’s director of Thunderbolt Marketing. “Products like the Akitio Thunder2 QUAD mini help highlight what Thunderbolt makes possible.”

“This solution is ideal for M&E professionals”, said Satoshi Miki, CEO & Co-Founder of Fixstars Corporation (Tokyo). “On most ordinary SSD devices, the peak advertised performance is not consistently attainable, and continuous writing will often suffer from fluctuating performance. The unique design of our SSD controller ensures consistent high throughput over the lifespan of the device.”

The Akitio Thunder2 QUAD mini with Fixstars SSD-3000M is now available, with the 12TB version retailing for $12,999.

About Fixstars
Fixstars Solutions is an innovator in flash storage solutions devoted to “Speed up your Business”. Combining expertise in multi-core processor programming with the use of next-generation memory technology, Fixstars provides the best-performing, highest-capacity storage solutions.

About Akitio
More information about the Thunder2 QUAD mini as well as the complete line of Akitio products can be found at or by contacting


* Thunderbolt and the Thunderbolt logo are trademarks of Intel Corporation in the US and other countries


[Tech Blog] FlashAir: Uploading to Google Drive with Lua



In this example, we’re going to show you how to upload an image to Google Drive using Lua and Google’s device API. The idea is for it to be run from something with limited input capabilities, such as Toshiba’s “FlashAir” SD card. Unfortunately, we’ll still need a device with full input capabilities to set everything up, but once the setup is complete you’re left with a headless system that can upload indefinitely.


1. Setting up the Project in Google

The first step is to create a project in Google’s “Developers Console”. Once this is done, enable the Google Drive API and SDK.

Next, go to the “Credentials” section (under “APIs & auth”) in your project and select “Create new client ID”. Choose “Installed application”, then “Other” for installed application type.

2. Authorizing Our Device

Now that that’s set up, we need to authorize our device with Google, which is a two-step process. First, we need to send a POST request to “”. Set the content type to “application/x-www-form-urlencoded”, with two fields: client_id (the Client ID we generated above, under “Client ID for native application”) and scope (which should be set to “”). You can use several tools to accomplish this, but the Chrome extension “Postman” makes it super easy.


POST /o/oauth2/device/code HTTP/1.1
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded

client_id={Your client ID}&scope={Scope URL}

NOTE: Google tells you to use “” for the scope, but this will return an “Invalid_scope: Not authorized to request the scope” error. Using /feeds/ instead will grant us the Google Drive authorization we need.

Now for part two! The response will contain a user_code and a verification_url. Navigate to that URL (it’s probably ), then enter the user_code. Hold on to the device_code too!

Example Response:

"device_code": {Device code},
    "user_code": {Your user code},
    "verification_url": "",
    "expires_in": 1800,
    "interval": 5

3. Getting a Permanent Refresh Token

Now that everything’s properly authorized by Google, we still need to get the refresh token that our app is going to use. We’ll use that to get yet another token, the temporary “auth” token, which will actually let us upload to Google Drive. There are a lot of tokens. To get the refresh token, you send a POST request like the following (line breaks added for readability):

POST /o/oauth2/token HTTP/1.1
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded

client_id={Your full client ID}&
client_secret={Your client secret}&
code={Your device code}&
grant_type={Grant type, URL-encoded}

The grant type is actually “”, but it has to be URL-encoded, which is why its special characters get switched to %-escapes.

You should receive something that looks like:

"access_token": {Your access token here},
    "token_type": "Bearer",
    "expires_in": 3600,
    "refresh_token": {Your refresh token here}

Finally, we can start scripting our headless upload! The main token we really need is the refresh_token. The access_token will work for a short time, but it will expire. With the refresh token, we can always get a fresh access token when we need to (which is pretty much constantly, as the authorization doesn’t last long).

4. Required Lua Imports

The only two Lua library imports we will use are ssl.https and JSON, a standalone file that can be found at

Example code:

local https = require 'ssl.https'
local JSON = require 'JSON'

We will follow these by creating local variables containing all the necessary information.

Example code:

-- Basic account info
local client_id = "{Your full client ID}"
local client_secret = "{Your client secret}"
local refresh_token = "{Your refresh token}"
-- For refresh
local scope = ""
local response_type = "code"
local access_type = "offline"

5. Using the Refresh Token to Re-authenticate

Before we can upload anything, chances are we’re going to need to re-authenticate with the server, so let’s put together a getAuth() method first. In our method we will set the message to be sent, find its length (as this is a required parameter), then make the HTTPS request. The response arrives as an “array” whose first and sole value is one big string, since Lua can’t natively decode JSON. Conveniently, we imported the JSON library earlier, so we can use it to parse the response into a table and retrieve our new access token.

Example function:

local function getAuth()

  -- Set our message: the standard OAuth refresh-token request body
  local mes = "client_id="..client_id..
    "&client_secret="..client_secret..
    "&refresh_token="..refresh_token..
    "&grant_type=refresh_token"

  local length = string.len(mes)
  print("Sending: ["..mes.."]")
  print "\n"
  b, c, h = fa.request{
    url = "",
    headers = {
        ["Content-Type"] = "application/x-www-form-urlencoded",
        ["Content-Length"] = length,
    },
    method = "POST",
    body = mes,
  }

  -- The response body is one big JSON string; decode it into a table
  local tempTable = JSON:decode(b)

  access_token = tempTable["access_token"]
end

6. The Lua Function that Does the Upload

Now that we have a new access token, we’re ready to upload! We’ll do this with another HTTPS POST request. We just need to give it an image file, set our authorization header, and we’re done.


local function uploadTest(token)
  --NOTE: You probably want to set your own file here,
  --or maybe even pass it as a parameter!
  local filePath = "/DCIM/100__TSB/IMG_0001.JPG" -- hypothetical example path
  local fileSize = lfs.attributes(filePath, "size") -- calculate file size
  b, c, h = fa.request{
    url = "",
    headers = {
      ["Content-Type"] = "image/jpeg",
      ["Content-Length"] = fileSize,
      ["authorization"] = "Bearer "..token,
    },
    method = "POST",
    -- FlashAir's fa.request sends the file named by 'file' in place of
    -- this special body marker
    body = "<!--WLANSDFILE-->",
    file = filePath,
  }
end

If you’ve been following our examples up to this point, combining them in the order presented and running the two functions with the following code at the bottom of the file will upload a file to your Google Drive (see

Final code to run the functions:
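(A minimal sketch, assuming the function and variable names used above.)

-- Refresh the access token, then upload the file
getAuth()
uploadTest(access_token)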




[Tech Blog] Acceleration of the Collatz Conjecture


The Wikipedia entry of the Collatz conjecture describes the simple algorithm used to generate so-called hailstone sequences:

Take any natural number n. If n is even, divide it by 2 to get n / 2. If n is odd, multiply it by 3 and add 1 to obtain 3n + 1. Repeat the process (which has been called “Half Or Triple Plus One”, or HOTPO) indefinitely. The conjecture is that no matter what number you start with, you will always eventually reach 1.

Starting from 6, for example, the sequence runs 6, 3, 10, 5, 16, 8, 4, 2, 1, reaching 1 after 8 steps.

In the examples section, the entry states:

Numbers with a total stopping time longer than any smaller starting value form a sequence beginning with:

1, 2, 3, 6, 7, 9, 18, 25, 27, 54, 73, 97, 129, 171, 231, 313, 327, 649, 703, 871, 1161, 2223, 2463, 2919, 3711, 6171, … (sequence A006877 in OEIS).

By looking up the A006877 sequence (“In the ‘3x+1’ problem, these values for the starting value set new records for number of steps to reach 1”) in OEIS, and following the link embedded in “T. D. Noe, Table of n, a(n) for n = 1..130 (from Eric Roosendaal’s data)” (under LINKS), one finds this list, which — supposedly — lists the numbers with a larger stopping count than that of any smaller number:


Here is the naive implementation of the hailstone sequence length generation algorithm:

static inline int hailstone(unsigned long n)
{
    int count = 0;

    while (n > 1) {
        if (n & 1)
            n = 3 * n + 1;
        else
            n >>= 1;
        ++count;
    }
    return count;
}

By mapping this function to the elements of the above list, one obtains the following list:

1: 0
2: 1
3: 7
6: 8
7: 16
9: 19
18: 20
25: 23
27: 111
54: 112
73: 115
97: 118
129: 121
171: 124
231: 127
313: 130
327: 143
649: 144
703: 170
871: 178
1161: 181
2223: 182
2463: 208
2919: 216
3711: 237
6171: 261
10971: 267
13255: 275
17647: 278
23529: 281
26623: 307
34239: 310
35655: 323
52527: 339
77031: 350
106239: 353
142587: 374
156159: 382
216367: 385
230631: 442
410011: 448
511935: 469
626331: 508
837799: 524
1117065: 527
1501353: 530
1723519: 556
2298025: 559
3064033: 562
3542887: 583
3732423: 596
5649499: 612
6649279: 664
8400511: 685
11200681: 688
14934241: 691
15733191: 704
31466382: 705
36791535: 744
63728127: 949
127456254: 950
169941673: 953
226588897: 956
268549803: 964
537099606: 965
670617279: 986
1341234558: 987
1412987847: 1000
1674652263: 1008
2610744987: 1050
4578853915: 1087
4890328815: 1131
9780657630: 1132
12212032815: 1153
12235060455: 1184
13371194527: 1210
17828259369: 1213
31694683323: 1219
63389366646: 1220
75128138247: 1228
133561134663: 1234
158294678119: 1242
166763117679: 1255
202485402111: 1307
404970804222: 1308
426635908975: 1321
568847878633: 1324
674190078379: 1332
881715740415: 1335
989345275647: 1348
1122382791663: 1356
1444338092271: 1408
1899148184679: 1411
2081751768559: 682
2775669024745: 685
3700892032993: 688
3743559068799: 794
7487118137598: 795
7887663552367: 808
10516884736489: 811
14022512981985: 814
19536224150271: 1585
26262557464201: 833
27667550250351: 846
38903934249727: 1617
48575069253735: 1638
51173735510107: 1651
60650353197163: 1659
80867137596217: 1662
100759293214567: 1820
134345724286089: 1823
223656998090055: 1847
397612441048987: 1853
530149921398649: 1856
706866561864865: 1859
942488749153153: 1862
1256651665537537: 1865
1675535554050049: 1868
2234047405400065: 1871
2978729873866753: 1874
3586720916237671: 1895
4320515538764287: 458
4861718551722727: 470
6482291402296969: 473
7579309213675935: 512
12769884180266527: 445
17026512240355369: 448
22702016320473825: 451
45404032640947650: 452
46785696846401151: 738

As is easily visible, the first list (which is the left column in the second list) is “broken”: up to and including 1899148184679, the sequence lengths (the right column in the second list) are indeed monotonically growing, but afterwards some numbers have shorter sequence lengths than previous ones.

The task thus becomes recalculating the sequence lengths for all odd numbers above 1899148184679 and checking whether the current sequence length is greater than all previous ones, as sketched below.
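A minimal driver for that search might look like this (a sketch: it assumes the hailstone() function above, and 1411 is the record length at 1899148184679 taken from the table):

#include <stdio.h>

/* Sketch: scan odd numbers above the last trusted record and report new
   record sequence lengths; intermediate values can grow large, and
   overflow handling is omitted here, as in the post */
int main(void)
{
    unsigned long n;
    int max_count = 1411;    /* the record length at 1899148184679 */

    for (n = 1899148184681UL; ; n += 2) {
        int count = hailstone(n);

        if (count > max_count) {
            max_count = count;
            printf("%lu: %d\n", n, count);
        }
    }
    return 0;
}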

Unless otherwise noted, tests were run on an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz with 4 GiB of RAM, running Ubuntu Linux 14.04.

Using the naive implementation, one obtains ~1.37 M numbers checked per second (NCPS).
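For reference, rates like these can be measured with a small harness along the following lines (a sketch; the batch size is arbitrary, and hailstone() is assumed to be defined above):

#include <stdio.h>
#include <time.h>

/* Sketch: time 'batch' hailstone() calls and report millions of
   numbers checked per second */
int main(void)
{
    const unsigned long batch = 10000000UL;
    unsigned long n;
    volatile int sink;    /* keeps the calls from being optimized away */
    clock_t t0 = clock();

    for (n = 1; n < 2 * batch; n += 2)    /* 'batch' odd numbers */
        sink = hailstone(n);

    printf("%.2f M NCPS\n",
           batch / ((double)(clock() - t0) / CLOCKS_PER_SEC) / 1e6);
    return 0;
}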

Using an inlined assembly language implementation of the function

static inline dword hailstone(qword n)
{
    dword retval;

    asm volatile("mov    %0,%%rax" : : "r"(n) : "rax");
    asm volatile(".intel_syntax noprefix");
    asm volatile("mov    rsi,rax" : : : "rsi");
    asm volatile("xor    r8d,r8d" : : : "r8");
    asm volatile(".align 16");
    asm volatile("top:");
    asm volatile("shr    rax,1" : : : "rax");
    asm volatile("test    rsi,1");
    asm volatile("lea    rsi,[rsi + 2 * rsi + 1]" : : : "rsi");
    asm volatile("cmovz    rsi,rax" : : : "rsi");
    asm volatile("inc    r8d" : : : "r8");
    asm volatile("mov    rax,rsi" : : : "rax");
    asm volatile("cmp    rsi,1");
    asm volatile("jnz    top");
    asm volatile(".att_syntax");
    asm volatile("mov    %%r8d,%0": "=r"(retval));

    return retval;
}

which uses conditional move instructions to avoid branch misprediction penalties improves the rate to ~2.02 M NCPS — a 48% improvement.

If we’re interested only in the sequence lengths of odd numbers, then the following function can be used to calculate them:

typedef unsigned int dword; 	/* 32 bit unsigned integer */
typedef unsigned long qword; 	/* 64 bit unsigned integer */

static inline dword bsfq(qword n)
{
    qword retval = -1;
    asm volatile("bsfq        %1,%0" : "=r"(retval) : "r"(n));
    return retval;
}

static inline int hailstone(unsigned long n)
{
    int count = 0;
    do {
        /* n is odd */
        dword shifter = bsfq(n = 3 * n + 1);
        /* accumulate the number of divisions by 2 and the initial
           tripling + 1 step */
        count += shifter + 1;
        /* n is even; do the required number of divisions by two */
        n >>= shifter;
    } while (n > 1);
    return count;
}

Using this function results in ~6.58 M NCPS — a 380% improvement over the naive implementation.

As all hailstone sequences eventually — presumably — end with 1, at least some of the numbers in the sequence are smaller than the initial number. Accordingly, it is possible to cache (precalculate and store) sequence lengths for some set of numbers (starting with 1) and once a particular sequence reaches a number which is in cache, accumulate the precalculated sequence length into the current count and terminate the calculation.

It is easily seen that numbers in hailstone sequences aren’t divisible by 3: a 3n + 1 step yields a value ≡ 1 (mod 3), and halving a number that isn’t divisible by 3 can never produce a multiple of 3. Accordingly, one stores in the cache the sequence lengths of only those numbers that aren’t divisible by 3. This can be done by placing n’s sequence length in the cache at index n / 3; since the lookups in the function above occur only at odd values of n, this indexing is collision-free (6k + 1 → 2k, 6k + 5 → 2k + 1).
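The post doesn’t show how the cache file is produced; a build pass consistent with this layout might look like the following sketch (build_cache() is hypothetical, and hailstone() here means the cacheless version shown earlier):

/* Hypothetical cache builder: the lookups only ever occur at odd values,
   so storing odd n (not divisible by 3) at index n / 3 is collision-free */
void build_cache(void)
{
    qword n;

    cache[0] = 0;    /* n = 1 is already at 1: zero steps */
    for (n = 5; n < sizeof(cache) / sizeof(word) * 3; n += 2)
        if (n % 3)
            cache[n / 3] = hailstone(n);
}

The cache is then loaded and consulted like this: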

typedef unsigned char byte;     /* 8 bit unsigned integer */
typedef unsigned short word;    /* 16 bit unsigned integer (assumed here) */

#define    CACHEFILE_SIZE    1012645888

word cache[CACHEFILE_SIZE / sizeof(word)];

void load_cache(void)
{
    FILE *fin = fopen("./hailstone-cache.bin", "rb");

    if (CACHEFILE_SIZE != fread(cache, sizeof(byte), CACHEFILE_SIZE, fin))
        fprintf(stderr, "Read error on cache file\n");
}

static inline dword bsfq(qword n)
{
    qword retval = -1;
    asm volatile("bsfq        %1,%0" : "=r"(retval) : "r"(n));
    return retval;
}

static inline int hailstone(unsigned long n)
{
    int count = 0;
    do {
        dword shifter = bsfq(n = 3 * n + 1);
        count += shifter + 1;
        n >>= shifter;
        /* here n is odd and not divisible by 3 */
        if (n < sizeof(cache) / sizeof(word) * 3) {
            count += cache[n / 3];
            n = 1;    /* cached tail; terminate the loop */
        }
    } while (n > 1);
    return count;
}

Using this ~1 GB cache (don’t ask why the size is odd) results in ~17.21 M NCPS — a 1156% improvement over the naive implementation.

The next step is to parallelize the sequence length calculations.

#define qCLKS_PER_SEC	3000000000UL	/* processor clock rate */

static inline qword rdtsc(void)
{
    register qword acc128lo asm ("rax");
    register qword acc128hi asm ("rdx");
    asm volatile ("rdtsc" : "=r"(acc128lo), "=r"(acc128hi));
    return (acc128hi << 32) | acc128lo;
}

/* max_cnt has to be initialized to a nonzero value so it gets put in the DATA
   section instead of the COMMON section */
volatile unsigned max_cnt = 1;
unsigned secs = 1;

word cache[<# of words in cache file>];

typedef struct {
    qword starting_n;
    qword current_n;
    qword cnts;
    qword cached_cnts;
    dword increment;
} hailstone_t;

void *hailstone(void *parm)
{
    qword i;

    ((hailstone_t *)parm)->cnts = 0;
    ((hailstone_t *)parm)->cached_cnts = 0;

    for (i = ((hailstone_t *)parm)->starting_n; 1;
     i += ((hailstone_t *)parm)->increment) {
        qword n = i;
        dword cnt = 0;

        ((hailstone_t *)parm)->current_n = n;
        do {
            qword shifter, idx;

            n += 2 * n + 1;
            shifter = bsfq(n);
            n >>= shifter;
            cnt += shifter;

            idx = n / 3;
            if (idx < sizeof(cache) / sizeof(*cache)) {
                cnt += cache[idx];
                ((hailstone_t *)parm)->cached_cnts += cache[idx];
                n = 1;
            }
        } while (n != 1);

        ((hailstone_t *)parm)->cnts += cnt;

        if (cnt > max_cnt) {
            max_cnt = cnt;
            printf("%lu: %u\n", i, cnt);
        }
    }
}

This can be invoked by:

int main(int argc, char *argv[])
{
    FILE *fin;
    dword nthreads = atoi(argv[1]);
    dword i;
    pthread_t *pThreads;
    hailstone_t *pParms;
    qword next_sec;
    struct timespec req = { 0, 50000000 }, rem;

    if (!nthreads)
        nthreads = 1;
    if (!(fin = fopen(CACHEFILE, "rb")))
        fprintf(stderr, "Couldn't open cache file '%s'\n", CACHEFILE);

    if (1 != fread(cache, sizeof(cache), 1, fin))
        fprintf(stderr, "Read error on cache file '%s'\n", CACHEFILE);

    if (!(pThreads = (pthread_t *)malloc(nthreads * sizeof(pthread_t))))
        fprintf(stderr, "Couldn't allocate %lu bytes\n",
         nthreads * sizeof(pthread_t));

    if (!(pParms = (hailstone_t *)malloc(nthreads * sizeof(hailstone_t))))
        fprintf(stderr, "Couldn't allocate %lu bytes\n",
         nthreads * sizeof(hailstone_t));

    next_sec = rdtsc() + qCLKS_PER_SEC;

    for (i = 0; i < nthreads; ++i) {
        int rc;

        pParms[i].starting_n = STARTING_N + i * 2;
        pParms[i].increment  = nthreads * 2;
        if ((rc = pthread_create(pThreads + i, NULL,
                  hailstone, (void *)(pParms + i))))
            fprintf(stderr, "Error - pthread_create() return code: %d\n", rc);
    }

    while (1) {
        nanosleep(&req, &rem);
        if (rdtsc() >= next_sec) {
            qword min_n = pParms[0].current_n, all_cnts = pParms[0].cnts,
              all_cached_cnts = pParms[0].cached_cnts;
            dword h, m, s;

            for (i = 1; i < nthreads; ++i) {
                if (pParms[i].current_n < min_n)
                    min_n = pParms[i].current_n;
                all_cnts += pParms[i].cnts;
                all_cached_cnts += pParms[i].cached_cnts;
            }

            next_sec += qCLKS_PER_SEC;
            s = secs++;
            m = s / 60;
            s %= 60;
            h = m / 60;
            m %= 60;
            fprintf(stderr, "\r%02u:%02u:%02u  %lu  %.0f/sec (%4.1f%%)",
             h, m, s, min_n, (min_n - STARTING_N + 1) / 2. / secs,
             100. * all_cached_cnts / all_cnts);
        }
    }

    /* Last thing that main() should do */
    return 0;
}

Synchronized access to max_cnt is explicitly avoided, as it imposes a very significant performance hit (approximately a one-third loss of performance) on parallel processing.
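For comparison, a synchronized version of the record update would look something like this sketch (this is not the post’s approach; it only illustrates what is being avoided):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t max_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical synchronized replacement for the record check inside the
   worker; max_cnt is the global defined above */
static void record_if_max(unsigned long i, unsigned cnt)
{
    pthread_mutex_lock(&max_lock);
    if (cnt > max_cnt) {
        max_cnt = cnt;
        printf("%lu: %u\n", i, cnt);
    }
    pthread_mutex_unlock(&max_lock);
}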

Using this (the final code) to run three threads results in ~29 M NCPS — an approx. 2000% improvement over the naive implementation.

On an Intel Core i7-4790K CPU, with 4 physical cores and 4 virtual (HyperThreading) cores running at a 4.6 GHz clock rate, and 32 GiB of Crucial Ballistix DDR3 RAM overclocked to 2 GHz, the code runs at 80 M NCPS. Unfortunately, a simple calculation shows that it would take about 10 years at that rate to check all odd numbers up to 46785696846401151 (the last — wrong — element of the original list).


Fixstars Launches the World’s Highest Density SSD, “SSD-3000M” For Media and Entertainment Professionals

The Highest Density and Performance Reliability for professionals.

Sunnyvale, CA – Feb 17, 2015 – Fixstars Solutions Inc., an innovator in flash storage solutions, today announced the start of sales for the 3TB SSD-3000M and the 1TB SSD-1000M in North America. The products feature enterprise-level reliability and unprecedented sequential read/write performance, aimed at professional content creation, advanced driver assistance systems (ADAS), HPC, and data centers.

The 3TB SSD-3000M has the world’s highest capacity*1 for a 2.5” SATA SSD. High-capacity SSDs help reduce the number of drives required in professional setups, reducing operational costs such as maintenance, energy, and chassis/rack infrastructure. More importantly, a more reliable workflow with minimal handling failures is of significant value. These disks integrate Fixstars’ proprietary NAND controller, which prevents latency spikes and performance deterioration, ensuring consistently high performance. Applications for which fast and stable disk writes are crucial, such as 4K video recording/editing and encrypted storage for film, will benefit the most from Fixstars solid-state disks.


“The SSD-3000M/1000M were released in Japan last November, and have been getting great feedback from our customers”, said Satoshi Miki, CEO & Co-Founder of Fixstars Corporation (Tokyo). “As an innovator in storage solutions, we are focused on providing high-performance, high-reliability SSD solutions to accelerate our customers’ business”.

For more information on the SSD-3000M/1000M, please visit our web site.

*1: As of Nov 18th 2014, according to a survey by Fixstars Corp.

About Fixstars

Fixstars Solutions is a technology company devoted to our philosophy “Speed up your Business”. Through its software parallelization/optimization expertise, its highly effective use of multi-core processors, and application acceleration for the next generation of flash memory technology that delivers high speed IO as well as power savings, Fixstars Solutions provides “Green IT”, while accelerating customers’ business in various fields. Learn more about how Fixstars Solutions can accelerate your business in life science, manufacturing, finance, and media & entertainment. For more information, visit



[Tech Blog] PCIe SSD for Genome Assembly



Genome assembly software takes a huge number of small pieces of DNA sequence (called “reads”) and tries to assemble them into a long DNA sequence that represents the original chromosomes. It generally consumes not only large computational power but also a large working memory space. The required memory depends on the input data size, but it is often in the terabytes. Although it is true that the price of DRAM has dramatically decreased, a workstation or server with a few TB of memory is still very expensive. For example, an IBM System x3690 X5 can hold 2TB of memory, but its list price is more than $300k.

On the other hand, PCI Express SSD boards are on the rise. Many hardware vendors, such as Intel, Fusion-io (acquired by SanDisk), and OCZ (acquired by Toshiba), release a variety of PCIe SSD boards. Generally, they have lower bandwidth than DRAM, but much higher bandwidth than a standard SATA/SAS SSD. The price is a little high compared with a standard SATA SSD, but the Fusion-io ioFX 2.0 offers 1.6TB at $5,418.95 on . Even if you insert this board into a high-end workstation, the total price is still below $10,000, which is much cheaper than a 2TB memory server.

In this blog post, I would like to explore whether using an SSD in place of DRAM yields a viable solution. We will use an open-source genome assembler called “velvet” as the benchmark software.


First, I downloaded, compiled and installed “velvet” from the velvet web site.

$ tar xvfz velvet_1.2.08.tgz
$ cd velvet_1.2.08
$ make 'MAXKMERLENGTH=51' 'OPENMP=1'
$ sudo cp velvetg velveth /usr/local/bin

In this procedure, ‘MAXKMERLENGTH=51’ sets the maximum k-mer length, and ‘OPENMP=1’ enables multi-threading via OpenMP. The k-mer length is a very important parameter in genome assembly, as it affects the quality of the output DNA sequence. For more detail on k-mers, please refer to the velvet user manual.

Velvet has two processes, velveth and velvetg. The velveth process creates a graph file to prepare for the genome assembly; the memory it requires is not so large. The velvetg process, which does the actual assembling, consumes much more memory and computation time.
Before we can start testing, we need input files. In this experiment, we will use two fastq files, SRR000021 and SRR000026, which were downloaded from this site . I processed these data (merged into a single file, SRR2126.fastq) with velveth as follows:

$ velveth SRR2126.k11  11 -short -fastq SRR2126.fastq

The first argument is the output directory name, the second argument is the length of the k-mer, the third argument specifies short read inputs, the 4th argument specifies the “fastq” file type, and the 5th argument is the input file.

The next step is assembly. The command to do so is as follows:

$ velvetg SRR2126.k11

The argument is the output directory generated by velveth. I measured the elapsed time of this command in several different hardware configurations:

1. Memory 4GB + Generic SATA HDD
2. Memory 4GB + Fusion IO ioFX
3. Memory 8GB

For configuration #2, I created a swap file on the ioFX like this:

# dd if=/dev/zero of=/mnt/iofx/swap0 bs=1024 count=12582912
# chmod 600 /mnt/iofx/swap0
# mkswap /mnt/iofx/swap0 12582912
# swapoff -a
# swapon /mnt/iofx/swap0


The velvetg process uses about 8GB of memory for this input data, so roughly half of the temporary data spills out to swap space in configurations 1 and 2. The figure below shows the elapsed time for each configuration. Using the generic SATA HDD, the process had not finished after 2 hours, so we decided to kill it.



So is using a PCIe SSD a viable solution? It is hard to say, as a 3x difference is not so small. In this particular experiment, only half of the memory space used by velvetg happened to be placed on the PCIe SSD. As mentioned earlier, the real-life data that bioinformaticians deal with can be a few terabytes; if almost all of the memory space is on the PCIe SSD, the performance is expected to be much worse.

However, considering that the HDD could not even complete the process in a reasonable amount of time, the PCIe SSD showed that it can vastly improve performance. The SSD can thus serve as a good compromise, as DRAM is much more expensive than an SSD.