Base64 and chunking files for upload
Base64 encodes data so that it contains only printable, web-safe ASCII characters, which makes it well suited to passing binary data over a network. (Note that Base64 encoding increases the size of the data by roughly 33%, because every 3 bytes become 4 characters.)
A good way to upload data to a server is to 'chunk' the data into several pieces. Each chunk gets sent to your server and the whole file is rebuilt sequentially as each chunk is uploaded.
'Chunking' means breaking a file up at a predefined number of bytes. Chunking data that is encoded as Base64 has a quirk that few people have written about in plain terms.
The Quirk
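When chunking a Base64 string, you must ensure that every chunk boundary falls on a multiple of 4 characters. Each Base64 character carries 6 bits, so 4 characters carry 24 bits, which is exactly 3 whole bytes. A chunk cut anywhere else ends partway through a byte and cannot be decoded on its own, which makes the file difficult to reassemble.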
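To make the quirk concrete, here is a minimal sketch that decodes each chunk on its own and then joins the results. It assumes an environment with the global btoa() and atob() functions (browsers and recent Node.js versions); splitEvery and decodeChunksSeparately are hypothetical helper names used only for this illustration.

```javascript
// Base64 for the ASCII text "Hello, world" (12 bytes -> 16 characters).
const encoded = btoa('Hello, world'); // 'SGVsbG8sIHdvcmxk'

// Hypothetical helper: cut a string into fixed-size pieces.
function splitEvery(text, size) {
  const pieces = [];
  for (let i = 0; i < text.length; i += size) {
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

// Hypothetical helper: decode every chunk on its own, then join the text.
function decodeChunksSeparately(chunks) {
  return chunks.map((chunk) => atob(chunk)).join('');
}

// Chunks of 8 characters (a multiple of 4) each hold whole bytes.
const aligned = decodeChunksSeparately(splitEvery(encoded, 8));
// Chunks of 6 characters cut through the middle of a byte.
const misaligned = decodeChunksSeparately(splitEvery(encoded, 6));

console.log(aligned === 'Hello, world');    // true
console.log(misaligned === 'Hello, world'); // false
```

Only the split at a multiple of 4 characters survives chunk-by-chunk decoding in this example.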
How Base64 Works
Without going deep into the process, understand that Base64 takes binary data in 8-bit bytes and regroups the bits into 6-bit groups.
Next, each 6-bit group is replaced by an ASCII character from the Base64 table.
So the string 'Hello', which in binary is:
01001000 01100101 01101100 01101100 01101111
First becomes (regrouping at 6 bits):
010010 000110 010101 101100 011011 000110 1111
Then, using the Base64 encoding table, each group of 6 bits is exchanged for the corresponding ASCII character from the table (the final group, 1111, is first padded with zero bits to 111100):
SGVsbG8=
Note: the = symbol is the Base64 padding character. It is added when the input length is not a multiple of 3 bytes, so that the encoded output is always a multiple of 4 characters long.
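The regrouping described above can be reproduced by hand in a few lines. This is a sketch for illustration only, assuming ASCII input; encodeBase64ByHand is a hypothetical name, and in practice you would use a built-in encoder instead.

```javascript
// The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
const BASE64_TABLE =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper for illustration; handles ASCII text only.
function encodeBase64ByHand(text) {
  // 1. Write every character as an 8-bit binary string and join them.
  let bits = '';
  for (const ch of text) {
    bits += ch.charCodeAt(0).toString(2).padStart(8, '0');
  }
  // 2. Pad with zero bits so the total length is a multiple of 6.
  while (bits.length % 6 !== 0) bits += '0';
  // 3. Replace each 6-bit group with its character from the table.
  let encoded = '';
  for (let i = 0; i < bits.length; i += 6) {
    encoded += BASE64_TABLE[parseInt(bits.slice(i, i + 6), 2)];
  }
  // 4. Add '=' until the output length is a multiple of 4.
  while (encoded.length % 4 !== 0) encoded += '=';
  return encoded;
}

console.log(encodeBase64ByHand('Hello')); // SGVsbG8=
```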
The key to chunking for effective network transfer and simple reassembly is to make sure that each of your chunks is cut at the right point in the Base64 string. So, how do we do that in JavaScript?
First, when you load your files with the FileReader API, use the readAsDataURL() method on the FileReader object. This loads the file as a data URL, performing the binary-to-Base64 encoding for you. Note that the result begins with a prefix such as data:text/plain;base64, and the Base64 string itself starts after the comma.
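As a rough sketch of that loading step, the code below wraps FileReader in a Promise and strips the data URL prefix so that only the Base64 string is left. The names stripDataUrlPrefix and readFileAsBase64 are hypothetical, and readFileAsBase64 assumes a browser, where FileReader is available and file is a File or Blob (for example from a file input).

```javascript
// Hypothetical helper: drop everything up to and including the first comma,
// e.g. 'data:text/plain;base64,SGVsbG8=' becomes 'SGVsbG8='.
function stripDataUrlPrefix(dataUrl) {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}

// Hypothetical wrapper around FileReader (browser only).
function readFileAsBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    // reader.result holds the full data URL once loading has finished.
    reader.onload = () => resolve(stripDataUrlPrefix(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}
```

Stripping the prefix before chunking keeps the 4-character boundaries lined up with the start of the actual Base64 data.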
Next, we need to write some code that calculates where in the Base64-encoded string each chunk should be cut. This ensures we only create chunks at multiples of 4 characters.
Here's a function that will:
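- Define a chunk size in characters. You can adjust this according to your preferences or server limitations.
- Cap the chunk size at the length of your input string
- Round the chunk size down to a multiple of 4, so that every chunk boundary falls on a whole byte (4 Base64 characters are 3 bytes)
- Create a regular expression that matches runs of up to that many characters
- Use it with the match function (which returns an array containing all the chunks)
function chunkFileForBase64(filedata) {
  // Target chunk size in Base64 characters; adjust to suit your server limits.
  const chunkSize = 2000;
  // Never ask for a chunk longer than the input itself.
  let chunkSizeBase64Adjusted = Math.min(chunkSize, filedata.length);
  // Round down to a multiple of 4 characters (4 Base64 characters = 3 bytes).
  chunkSizeBase64Adjusted -= chunkSizeBase64Adjusted % 4;
  // Guard against inputs shorter than 4 characters.
  chunkSizeBase64Adjusted = Math.max(chunkSizeBase64Adjusted, 4);
  // {1,n} rather than {0,n} avoids a trailing empty match.
  const chunkPattern = new RegExp('.{1,' + chunkSizeBase64Adjusted + '}', 'g');
  // match() returns null for an empty string, so fall back to an empty array.
  return filedata.match(chunkPattern) || [];
}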
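For comparison, the same idea can be written without a regular expression. The following is an alternative sketch, not the function above: chunkBase64 and its maxChars parameter are hypothetical names, and the small sample is only there to show that the pieces rejoin into the original string.

```javascript
// Hypothetical slice-based variant, for illustration only.
function chunkBase64(base64, maxChars = 2000) {
  // Round down to a multiple of 4, with a floor of 4 characters.
  const size = Math.max(4, maxChars - (maxChars % 4));
  const chunks = [];
  for (let i = 0; i < base64.length; i += size) {
    chunks.push(base64.slice(i, i + size));
  }
  return chunks;
}

const sample = 'SGVsbG8sIHdvcmxk'; // Base64 for "Hello, world"
const pieces = chunkBase64(sample, 10); // 10 rounds down to 8

console.log(pieces);                     // [ 'SGVsbG8s', 'IHdvcmxk' ]
console.log(pieces.join('') === sample); // true
```

Every chunk except possibly the last has a length that is a multiple of 4, so each one can be decoded on its own or simply concatenated on the server.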
There you have it: a function that splits a Base64 string into an array of chunks, each cut at a multiple of 4 characters. Now you can upload the chunks to your server one at a time. I use PHP to sequentially reassemble the whole Base64 string server-side.
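On the client side, sequential upload might look something like the sketch below. Everything here is an assumption made for illustration: the payload field names (fileId, index, total, data), the function names, and the caller-supplied sendChunk function, which could for example wrap a fetch() POST to your own endpoint.

```javascript
// Hypothetical payload shape; the field names are assumptions, not a standard.
function buildChunkPayloads(fileId, chunks) {
  return chunks.map((data, index) => ({
    fileId,               // lets the server group chunks of the same file
    index,                // position of this chunk, starting at 0
    total: chunks.length, // lets the server know when the file is complete
    data,                 // the Base64 chunk itself
  }));
}

// Send one chunk at a time so the server can append them in order.
// `sendChunk` is a caller-supplied async function that performs the request.
async function uploadSequentially(fileId, chunks, sendChunk) {
  for (const payload of buildChunkPayloads(fileId, chunks)) {
    await sendChunk(payload); // wait for each chunk before sending the next
  }
}
```

Awaiting each request before starting the next keeps the chunks arriving in order, which suits a server that appends them as they come in.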
Thomas -