thomas-shirley.com

Base64 and chunking files for upload

Base64 encodes binary data using only printable ASCII characters, which makes it safe to pass over text-based network channels. (Note that Base64 encoding increases file size by approximately 33%, because every 3 bytes of input become 4 characters of output.)
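
The size overhead can be worked out directly. This is a small sketch, assuming standard padded Base64; the function name base64Length is made up for illustration.

```javascript
// Length of the padded Base64 string produced from `byteCount` bytes.
// Every 3 input bytes become 4 output characters; a partial final
// group is padded out to 4 characters with '='.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

console.log(base64Length(5));    // 8
console.log(base64Length(3000)); // 4000, one third larger than the input
```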

A good way to upload data to a server is to 'chunk' the data into several pieces. Each chunk gets sent to your server and the whole file is rebuilt sequentially as each chunk is uploaded.
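
As a rough sketch of what a chunked upload might look like on the client, the code below wraps each chunk in a small payload and posts the payloads one at a time. The field names (fileId, index, total, data), both function names and the url parameter are assumptions for illustration; only fetch() is a standard browser API, and your server will likely expect a different shape.

```javascript
// Hypothetical payload shape: these field names are assumptions,
// not part of any standard.
function buildChunkPayloads(chunks, fileId) {
  return chunks.map((data, index) => ({
    fileId,
    index,
    total: chunks.length,
    data,
  }));
}

// Send the payloads one at a time so the server can append them in order.
// `url` is a hypothetical endpoint on your own server.
async function uploadSequentially(payloads, url) {
  for (const payload of payloads) {
    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    if (!response.ok) {
      throw new Error('Upload failed at chunk ' + payload.index);
    }
  }
}
```

Sending the index and total with each chunk lets the server notice a missing or out-of-order chunk before it rebuilds the file.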

'Chunking' means breaking a file up at a predefined number of bytes. Chunking Base64-encoded data has a quirk that few people have written about in plain terms.

The Quirk

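When chunking a Base64 string, you must ensure that each chunk boundary falls on a multiple of 4 characters. Each Base64 character carries 6 bits, so 4 characters carry 24 bits, which is exactly 3 whole bytes. A chunk cut anywhere else ends part-way through a byte, so it cannot be decoded on its own, which makes the file difficult to reassemble.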

How Base64 Works

Without going deep into the process, understand that Base64 takes binary data in 8-bit bytes and regroups it into 6-bit groups.

Next, each 6-bit group is replaced by an ASCII character from the Base64 table.

So the string 'Hello', which in binary is:

01001000 01100101 01101100 01101100 01101111

First becomes (regrouping at 6 bits):

010010 000110 010101 101100 011011 000110 1111

Then, using the Base64 encoding table, each group of 6 bits is replaced by the corresponding ASCII character from the table (the short final group, 1111, is first padded with two zero bits to make 111100):

SGVsbG8=

Note: the = symbol is the Base64 padding character. It is added when the number of input bytes is not a multiple of 3, so that the length of the encoded string is always a multiple of 4 characters.
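
The steps above can be reproduced by hand in a few lines. This is a teaching sketch only, assuming plain ASCII input (one byte per character); the names BASE64_TABLE and encodeBase64ByHand are made up for illustration, and in real code you would use the built-in btoa().

```javascript
const BASE64_TABLE =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode an ASCII string by hand, mirroring the steps described above.
function encodeBase64ByHand(text) {
  // 1. Write each byte as 8 bits and join them into one long bit string.
  let bits = '';
  for (const ch of text) {
    bits += ch.charCodeAt(0).toString(2).padStart(8, '0');
  }

  // 2. Regroup into 6-bit pieces, padding the last piece with zero bits.
  const groups = [];
  for (let i = 0; i < bits.length; i += 6) {
    groups.push(bits.slice(i, i + 6).padEnd(6, '0'));
  }

  // 3. Look each 6-bit value up in the table.
  let encoded = groups.map((g) => BASE64_TABLE[parseInt(g, 2)]).join('');

  // 4. Pad with '=' until the length is a multiple of 4 characters.
  while (encoded.length % 4 !== 0) {
    encoded += '=';
  }
  return { groups, encoded };
}

const result = encodeBase64ByHand('Hello');
console.log(result.groups.join(' '));
// 010010 000110 010101 101100 011011 000110 111100
console.log(result.encoded); // SGVsbG8=
```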

The key to chunking for effective network transfer and simple reassembly is to make sure that each of your chunks is created at the right point in the Base64 string. So, how do we do that in JavaScript?

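First, using the FileReader API, when you load in your files you should call the readAsDataURL() method on the FileReader object. This performs the binary to Base64 encoding for you. Note that the result is a data URL of the form data:<mime type>;base64,<data>, so strip everything up to and including the first comma before chunking; what remains is the Base64 string.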
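
Here is one way this loading step might look. It is a sketch under assumptions: the helper names stripDataUrlPrefix and readFileAsBase64 are made up for illustration, and readFileAsBase64 only works in a browser, where FileReader exists.

```javascript
// A data URL looks like "data:<mime type>;base64,<payload>".
// Only the part after the first comma is Base64.
function stripDataUrlPrefix(dataUrl) {
  const commaIndex = dataUrl.indexOf(',');
  return commaIndex === -1 ? dataUrl : dataUrl.slice(commaIndex + 1);
}

// Browser-only: wraps FileReader in a Promise and resolves with the
// Base64 payload. `file` is a File or Blob, for example one taken from
// an <input type="file"> element.
function readFileAsBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(stripDataUrlPrefix(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}
```

Wrapping the reader in a Promise is a design choice rather than a requirement; the plain onload callback works just as well.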

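Next, we need to write some code that will calculate where in the Base64 encoded string each cut should be made. This is to ensure we only cut at 4-character boundaries (4 characters of 6 bits each make 3 whole bytes).

Here's a function that does this:

function chunkFileForBase64(filedata) {
 // filedata is the Base64 string, without the "data:...;base64," prefix
 let chunkSize = 2000;
 // Round down to a multiple of 4 characters so every chunk holds whole bytes
 let chunkSizeBase64Adjusted = Math.max(4, chunkSize - (chunkSize % 4));

 // {1,n} avoids the empty trailing match that {0,n} would produce
 let chunkPattern = RegExp('.{1,' + chunkSizeBase64Adjusted + '}', 'g');
 let chunks = filedata.match(chunkPattern) || [];
 return chunks;
}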

There you have it. A function that splits a Base64 string into an array of chunks, each a multiple of 4 characters long (only the last chunk may be shorter, and it carries any = padding). Now you can upload each chunk to your server. I use PHP to sequentially reassemble the whole Base64 string server-side.
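
If you want to check your chunks before uploading, one option is to decode each chunk on its own and compare the joined result with the original. This is only a client-side sanity check in JavaScript, not the PHP reassembly mentioned above; the function name decodeChunksIndependently and the sample text are made up for illustration, while atob() and btoa() are the standard browser functions.

```javascript
// Decode each chunk on its own and join the decoded text.
function decodeChunksIndependently(chunks) {
  return chunks.map((chunk) => atob(chunk)).join('');
}

const encoded = btoa('Hello, chunked world!');

// Split every 8 characters (a multiple of 4), using slice for illustration.
const chunks = [];
for (let i = 0; i < encoded.length; i += 8) {
  chunks.push(encoded.slice(i, i + 8));
}

console.log(decodeChunksIndependently(chunks) === atob(encoded)); // true
```

Because every cut falls on a multiple of 4 characters, each chunk decodes to whole bytes and the pieces join back into the original data.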

Thomas - 13-06-2022