The billion row challenge: do we have a bug?


3 hours ago · Tech · 0 comments

A couple of people contacted me with feedback about the SIMD implementation we used to find newlines in the billion-row file. One suggested a possible bug (shock!) and the other suggested an approach that might be more efficient. We'll take a look at both, starting, obviously, by writing a test that checks for the bug.

After the stream I made another attempt at using the information about all the newlines in a 64-byte chunk, instead of just the first one. I did it with no Vecs at all, unifying the two functions we worked with into a single one with nested loops. Surprisingly (to me) this was still slower than the original solution. Again, this seems to prove the power of simplicity! You can find this code at memmap_simd.rs#L152.

Follow me on Mastodon: @andybalaam@mastodon.social
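To illustrate the idea of using every newline in a 64-byte chunk rather than just the first, here is a minimal sketch. It is not the code from memmap_simd.rs: the names `newline_mask` and `for_each_line` are made up for this example, and the mask is built with a plain scalar loop standing in for the SIMD comparison. The point is only the shape: one bit per byte, then a nested loop that walks the set bits, with no Vec anywhere.

```rust
/// Build a bitmask with bit `i` set when `chunk[i]` is a newline.
/// A SIMD version would get this mask from a vector compare; this
/// scalar loop is a portable stand-in (an assumption for the sketch).
fn newline_mask(chunk: &[u8]) -> u64 {
    assert!(chunk.len() <= 64, "a u64 mask covers at most 64 bytes");
    let mut mask = 0u64;
    for (i, &byte) in chunk.iter().enumerate() {
        if byte == b'\n' {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Call `on_line` once per line in `data` (newline excluded),
/// without collecting newline positions into a Vec.
fn for_each_line<'a>(data: &'a [u8], mut on_line: impl FnMut(&'a [u8])) {
    let mut line_start = 0usize;
    // Outer loop: one chunk of up to 64 bytes at a time.
    for (chunk_index, chunk) in data.chunks(64).enumerate() {
        let offset = chunk_index * 64;
        let mut mask = newline_mask(chunk);
        // Inner loop: visit every newline found in this chunk.
        while mask != 0 {
            let pos = offset + mask.trailing_zeros() as usize;
            on_line(&data[line_start..pos]);
            line_start = pos + 1;
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    // A final line with no trailing newline.
    if line_start < data.len() {
        on_line(&data[line_start..]);
    }
}
```

Calling `for_each_line(bytes, |line| { /* parse station and temperature */ })` hands each line to the closure as a slice of the original buffer. Whether this beats stopping at the first newline depends on the real workload; as noted above, the simpler version won in my measurements.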
