MANUAL FOR RLESTAT (BBC BASIC)

This file describes the program RLESTAT, which analyses a group of files then calculates by how much they could be compressed using a variation of RLE compression.
The manual and software are (C)2006 SPROW

INSTRUCTIONS-
At the BASIC prompt, type
  CHAIN"RLESTAT"
The program will then read in a set of 7 data files to be analysed.
The filenames can be changed by altering the DATA statements at the end of the program listing. The format is
  DATA "Description text"
  DATA Filename1
  DATA Filename2
  DATA Filename3
  DATA Filename4
  DATA Filename5
  DATA Filename6
  DATA Filename7
To save having to repeatedly enter the filenames, simply add additional groups of 7 filenames, then set the value of "choice%" at the start of the program to select which group to use.
Ideally, the data files should be around 100kbytes in total.

While reading in the files, a counter is updated which represents how much the data could be compressed. The RLE algorithm used is
* Assign an 'escape' byte
  This should be a byte which doesn't occur very often in the input data and can be changed by altering the variable "escape%"
* If the input byte is the 'escape' byte then output two bytes
  The escape code followed by the escape code itself
* If the input byte is the same as the last input byte
  Just count up how many times it has been repeated but don't output anything
* If the input byte is different to the last input byte
  Output that byte
  Should that also be the end of a repeated run, output the escape code plus the run length too

The maximum run length in this scheme is 255 characters, though the special case of the run length being the same as the value of the escape character is not permitted. Such a run would need to be split into two shorter runs to avoid accidentally outputting an 'escape/escape' byte pair.

The compression statistics are displayed after this step, expressed as the percentage by which the original size is reduced.
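The counting scheme described above can be sketched as follows. This is an illustrative Python sketch, not the BBC BASIC source of RLESTAT. The names rle_encoded_size and saving_percent are hypothetical. The sketch assumes that the byte after the escape code holds the total run length, that every run of 2 or more bytes is written as byte/escape/length, and that escape bytes in the input are never run-length encoded. Unlike RLESTAT itself, it also counts the split needed when a run length equals the escape value.

```python
def rle_encoded_size(data: bytes, escape: int) -> int:
    """Estimate how many bytes `data` would occupy after encoding.

    Assumptions (not confirmed by the manual): the count byte holds the
    total run length, and each escape byte in the input costs two bytes.
    """
    size = 0
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == escape:
            # A literal escape byte is written as escape, escape.
            size += 2
            i += 1
            continue
        # Measure the run of identical bytes, capped at 255.
        run = 1
        while i + run < n and data[i + run] == byte and run < 255:
            run += 1
        if run >= 2 and run == escape:
            # A count equal to the escape value would look like an
            # escape/escape pair, so shorten this run by one; the
            # leftover byte starts the next run.
            run -= 1
        # A single byte costs 1; a run costs byte + escape + length.
        size += 1 if run == 1 else 3
        i += run
    return size


def saving_percent(original: int, encoded: int) -> float:
    """Percentage by which the encoded size is smaller than the original."""
    return 100.0 * (original - encoded) / original
```

With an escape value of 255, the 12 input bytes b"A" * 10 + b"BC" are counted as 5 output bytes: 3 for the run of A and 1 each for B and C.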
For example, 40% compression would mean that for a 100k input file the result would occupy 60k.

As its name suggests, run length encoding works best with data that contains long runs of the same character - graphics for example, where large areas of the screen are one colour. BASIC programs, by contrast, don't RLE very well.

KNOWN PROBLEMS/FUTURE ENHANCEMENTS-
Run lengths with the same number of repeats as the value of the escape code aren't calculated. Hopefully these should be infrequent enough that the compression statistics aren't affected.
Should convert to use GBPB instead of multiple BGET commands.
No known problems

HISTORY-
V1.00 Original