This program allows you to decompose a set of audio files into chunks and use these chunks for building a new audio file that matches another given audio file. This is very similar to constructing an image from small images that are laid out in a rectangular grid.
The simplest way to use the program consists of the following two steps:
Step 1: Add chunks from an audio file to the pool:
sound-collage --chunksize=8192 decompose track00.wav pool/%06d
sound-collage --chunksize=8192 decompose track01.wav pool/%06d
Attention: The chunk size and the number of (stereo) channels must be the same for all added files. These parameters are not stored in the pool itself and thus consistency cannot be checked.
Adding the same set of audio files to the chunk pool again will fool the automatic chunk size determination in the composition step. You should not add an audio file twice anyway, since it increases disk usage and computation time and has no effect on the result.
Step 2: Compose an approximation of an audio file using chunks from the pool:
sound-collage auto pool/ music.wav collage.f32
It performs four steps:
1. Decompose the audio file into chunks.
2. Find the best matching chunk from the pool for every chunk in the audio file.
3. Check where it is better to take an originally adjacent chunk from the pool.
4. Compose the matching chunks to a single file.
You can run these steps manually in order to inspect the results, repeat individual steps or omit them (e.g. step 3). Here is an example for a stereo music file:
sound-collage --chunksize=8192 decompose music.wav music/%06d
sound-collage --chunksize=8192 --channels=2 associate pool/ music/ collage/%06d
sound-collage --chunksize=8192 --channels=2 adjacent pool/ music/ collage/
sound-collage --chunksize=8192 --channels=2 compose collage/ collage.f32
For the adjacent step there is the cohesion parameter.
It specifies how strongly adjacent chunks shall be preferred to the best matching chunk.
(Please note that the best matching chunk is not actually the best matching one,
but only an approximately best one. See below for details.)
If cohesion is 1, then an adjacent chunk is only preferred
if it matches better than the best matching chunk.
If cohesion is larger, then adjacent chunks are preferred
to the best matching one more often.
If cohesion is zero, then the best matching chunk is always chosen.
This is like skipping the adjacent step completely.
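The effect of the cohesion value can be illustrated with a small Python sketch. The decision rule used here (take the adjacent chunk when its distance is smaller than cohesion times the distance of the best matching chunk) is only an assumption that reproduces the three cases described above; it is not taken from the program's source, and the function name is hypothetical.

```python
def choose_chunk(best_dist, adjacent_dist, cohesion):
    """Decide which chunk to use at one position of the collage.

    best_dist:     distance of the (approximately) best matching chunk
    adjacent_dist: distance of the chunk that was originally adjacent
                   to the previously chosen chunk in the pool
    cohesion:      non-negative weight in favour of the adjacent chunk

    Assumed rule (hypothetical, for illustration only):
    the adjacent chunk wins if adjacent_dist < cohesion * best_dist.
    """
    if adjacent_dist < cohesion * best_dist:
        return "adjacent"
    return "best"


# cohesion = 1: adjacent chunk only wins if it matches better
print(choose_chunk(4.0, 3.0, 1))
print(choose_chunk(4.0, 5.0, 1))
# cohesion = 2: adjacent chunk wins although it matches slightly worse
print(choose_chunk(4.0, 5.0, 2))
# cohesion = 0: the best matching chunk is always chosen
print(choose_chunk(4.0, 0.0, 0))
```

Under this assumed rule a larger cohesion trades matching accuracy for longer runs of chunks that were neighbours in the original pool material.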
You can use any input format supported by SoX,
but output is always raw
Float format, i.e.
Spectra are computed and stored in
(single precision floating point)
and chunks in the pool are stored in
This is how it works: Since there is a lot of data to process, I have chosen the following optimization, which, however, influences the result. I group all chunks according to the index of their largest Fourier coefficient. All chunks with the same index are stored in one file. To search for matching chunks I traverse the Fourier indices. For example, for Fourier index 10 I load all chunks with that index from the pool and from the decomposed music and find the best matching chunks only within this group. This way I may miss the best matching chunk, but save a lot of computation (I hope so).
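The grouping idea can be sketched in a few lines of Python. This is a simplified illustration, not the program's implementation: it works on in-memory lists instead of files, uses a naive DFT, compares chunks in the time domain, and all function names are hypothetical.

```python
import cmath
from collections import defaultdict


def dominant_index(chunk):
    """Index of the largest-magnitude DFT coefficient (naive O(n^2) DFT).

    Only indices 0 .. n/2 are examined, because the upper half
    mirrors the lower half for real-valued input.
    """
    n = len(chunk)
    best_k, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(chunk))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


def l2_distance(a, b):
    """Euclidean (L2) distance between two chunks of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def group_by_index(chunks):
    """Map each dominant Fourier index to the positions of its chunks."""
    groups = defaultdict(list)
    for position, chunk in enumerate(chunks):
        groups[dominant_index(chunk)].append(position)
    return groups


def associate(pool, music):
    """For every music chunk, find the closest pool chunk in the same group.

    Returns a dict: music position -> pool position (or None if the pool
    has no chunk with the same dominant index).
    """
    pool_groups = group_by_index(pool)
    result = {}
    for index, music_positions in group_by_index(music).items():
        candidates = pool_groups.get(index, [])
        for m in music_positions:
            if candidates:
                result[m] = min(
                    candidates,
                    key=lambda p: l2_distance(pool[p], music[m]))
            else:
                result[m] = None
    return result
```

Because each music chunk is compared only with pool chunks from its own group, the number of distance computations drops considerably, at the price that a closer chunk in another group is never considered.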
By the way, if you also add music.wav to the pool, music.wav will not be restored by the collage algorithm, since the audio files are decomposed into overlapping chunks.
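To show what a decomposition into overlapping chunks looks like, here is a minimal Python sketch. The hop size (how far consecutive chunks are shifted against each other) is an assumption chosen for the example; the amount of overlap the program actually uses is not stated here, and the function name is hypothetical.

```python
def overlapping_chunks(samples, chunk_size, hop):
    """Split a sample list into chunks of chunk_size samples.

    Consecutive chunks start hop samples apart, so they overlap
    whenever hop < chunk_size. A trailing remainder that is shorter
    than chunk_size is dropped.
    """
    return [samples[start:start + chunk_size]
            for start in range(0, len(samples) - chunk_size + 1, hop)]


# Eight samples, chunks of four samples, shifted by two samples each
for chunk in overlapping_chunks(list(range(8)), 4, 2):
    print(chunk)
```

In this example every inner sample belongs to two chunks, so the chunks cannot simply be concatenated to get the input back.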
Approximation is done using a simple L2 norm. It is well known that this does not match human perception very well. Maybe it is a good idea to work with lossily compressed audio files where all non-audible waves are already eliminated. In this case the L2 norm might better match the human idea of similarity of audio chunks.