Dora
Originally this program was named Dora after a cat that sometimes has bugs (fleas), is fun to play with and look at, but otherwise not too useful. With time, I hope the program loses some of its bugs, remains fun to play with, and becomes increasingly useful.
Dora is intended to be a program that lets you do lots of things with sounds, sort of like those pocket-size multitools that open out into pliers, saws, and what have you and are advertised as the only implement you need to make or repair most anything in an emergency. Because Dora is free software (in the sense of the Free Software Foundation), it comes with the source code; you can easily modify the program to perform your favorite functions if they're not currently in Dora's repertoire.
There is already a very nice free software sound editor (Audacity), so in Dora I've concentrated on adding interesting things that most other sound manipulation programs don't have.
With the current version of Dora, several stand-alone applications are included. These are not intended for interactive use; rather they are supposed to perform long and boring jobs on their own. PhraseGrab reads sound data from an input device, recognizes phrases with a module like Dora's Phraser process, and saves the phrases to a file, along with data about when the phrase occurred. CorrSA takes a list of sound files and computes the correlation between all pairs of sounds. Monitor watches sound levels obtained from a sound device and reports useful statistics about them in a simple window.
I hope to learn lots of things by working on and with Dora. The program is intended to aid in experimentation and development. This document ends with a section describing a variety of activities that, over the years, I've found useful in thinking about sound.
I'd appreciate having your comments, suggestions, requests, and corrections about Dora and/or this document. And of course, I'd be happy to have any useful code you'd care to contribute.
To run this program you need a Java Virtual Machine installed on your computer. Once this is available and the binaries are in your path, switch to the directory where the file Dora.jar is stored and give the command
java -cp Dora.jar FrontEnd
If you have enough RAM to spare, you might allocate extra memory for Dora with a command like
java -Xms96m -Xmx150m -cp Dora.jar FrontEnd
Note that Sun Microsystems makes a JVM available at no cost. Also, many systems are case sensitive, so be sure to get the capitalization correct when issuing the above command.
When the program starts it creates one window that is filled with controls. This is the “main control window”. It contains a menu at the top and two or more panes below the menu; individual panes are exposed by clicking the tab at the top of the pane. Other windows that the program opens are independent of the main control window. Dora doesn't provide a desktop or main window that contains all its sub-windows as in a typical MDI (multiple document interface) application. Usually, it is pretty clear which windows are part of Dora's display.
There are three main ways to interact with the program and your audio data. First, you can select from a number of different procedures using the menu at the top of the main control window. Second, you can adjust parameters that determine the details of how these procedures work by fooling with controls in the main window's tabbed panels. These buttons, radio-buttons, text-fields, and sliders change the way the program does the things you ask it to do by picking menu items. Finally, you can use the mouse to point at and select parts of pictures of the sound.
To begin using Dora, use the File/Open menu and the file-chooser that it creates to select a sound file. Check-boxes indicate whether channels 1 and 2 are to be Shown and Active. For a while, you can leave these at their default setting which makes all the channels visible and active for editing.
Much of what Dora lets you do is based on graphical representations of sounds so the first thing to learn is how to generate and control the appearance of these graphical displays. The two most common types of display are a power time series and a sonogram. You can choose between these possibilities with a radio button on the first pane of the main control window. When you have set the values for the parameters describing the picture you tell Dora to draw the picture by clicking the Draw button on the first pane.
To draw a power time series the program operates on blocks of sound sample values. The size of the blocks it uses, i.e. the number of samples it works with at one time, is controlled by the BlockSize parameter. Starting with the first BlockSize samples the program computes a summary of the size of the sample values as the “root mean square” of the samples. It then shifts its attention StepSize samples to the right (forward in time) and repeats the process. The graphical display shows these estimates of “average size” or “average energy” or “average loudness” plotted on a vertical scale with the horizontal scale being time (measured as a number of steps of size StepSize).
If you set StepSize equal to BlockSize, each sound sample appears in only one average that is plotted. By making StepSize smaller than BlockSize you perform a sort of moving average, resulting in a fairly smooth curve. By setting StepSize larger than BlockSize, you skip some sound samples, but manage to display more of the sound file on the screen. You can experiment with different settings of the parameters, clicking the Draw button when you have modified their values.
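The block-and-step procedure described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Dora's actual source: the class and method names are hypothetical, samples are held in an int array, and a trailing partial block is simply ignored.

```java
/** Hypothetical sketch of a block-wise RMS ("power time series") computation. */
final class PowerSeriesSketch {

    /**
     * Returns one RMS value per block. Blocks contain blockSize samples and
     * successive blocks start stepSize samples apart. A trailing partial
     * block is ignored (an assumption, not necessarily what Dora does).
     */
    static double[] rmsSeries(int[] samples, int blockSize, int stepSize) {
        if (blockSize <= 0 || stepSize <= 0) {
            throw new IllegalArgumentException("blockSize and stepSize must be positive");
        }
        if (samples.length < blockSize) {
            return new double[0];
        }
        int count = (samples.length - blockSize) / stepSize + 1;
        double[] rms = new double[count];
        for (int k = 0; k < count; k++) {
            int start = k * stepSize;
            double sumSquares = 0.0;
            for (int i = start; i < start + blockSize; i++) {
                sumSquares += (double) samples[i] * samples[i];
            }
            rms[k] = Math.sqrt(sumSquares / blockSize);
        }
        return rms;
    }

    public static void main(String[] args) {
        int[] samples = {3, -3, 3, -3, 0, 0, 0, 0};
        // StepSize smaller than BlockSize: blocks overlap, giving a moving average.
        for (double value : rmsSeries(samples, 4, 2)) {
            System.out.println(value);
        }
    }
}
```

With StepSize equal to BlockSize the blocks tile the sound without overlap; with a larger StepSize some samples never enter any block.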
The vertical scale on the power time series plot runs from 0 to the largest possible sample magnitude. Typical “.wav” files represent each sample as a signed 16-bit integer, so the largest possible magnitude for a sound sample in such a file is 32,768. In general, the top of the scale is
$$2^{B-1}$$
where B is the number of bits in a sound sample (assuming samples are signed). Dora should understand both 16- and 24-bit samples, but hasn't yet been tested with 24-bit samples.
[Figure: A power time series plot]
You can cause Dora to impose coordinate markings on the display by turning on the ShowGrid radio button on the second tabbed pane of the main control window. You can also indicate the number of divisions of the vertical scale that you want marked and how frequently (in seconds) you want the horizontal (time) scale marked.
When Dora makes a sonogram, sound samples are again taken in chunks of size BlockSize, only they are Fourier transformed to obtain estimates of the power that appears at different frequencies in the sound. These are displayed as patterns of light (low power) and dark (high power) pixels arranged vertically in the display. Subsequent columns are obtained by shifting forward in time by StepSize samples and repeating the procedure.
As with the power time series, you can modify the BlockSize and StepSize for the sonogram display, although the BlockSize is restricted to being a power of 2 by the algorithm used for doing the discrete Fourier transform. You should be aware that when you select larger BlockSize values, the computation of the sonogram can take a while. Generally it is best to use the time series display to get close to the portion of a sound that interests you, then switch to the sonogram display with fairly small BlockSize (256, or 512 for example) and only really crank up the BlockSize after you have identified the section in which you're really interested. You can control the range of frequencies that appear in the sonogram using controls on the Display Panel.
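To make the idea of one sonogram column concrete, here is a small Java sketch that estimates the power at each frequency for a single block. It is a sketch under assumptions, not Dora's code: it uses a plain O(N²) discrete Fourier transform rather than the fast power-of-2 algorithm mentioned above, no window function is applied, and all names are hypothetical.

```java
/** Hypothetical sketch: power spectrum of one block (one sonogram column). */
final class SonogramColumnSketch {

    /**
     * Returns the power at bins 0 .. blockSize/2 for the block that starts at
     * index start. A naive DFT is used here for clarity; a real implementation
     * would normally use an FFT, which is why BlockSize must be a power of 2.
     */
    static double[] powerSpectrum(int[] samples, int start, int blockSize) {
        int bins = blockSize / 2 + 1;
        double[] power = new double[bins];
        for (int k = 0; k < bins; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int n = 0; n < blockSize; n++) {
                double angle = 2.0 * Math.PI * k * n / blockSize;
                re += samples[start + n] * Math.cos(angle);
                im -= samples[start + n] * Math.sin(angle);
            }
            power[k] = (re * re + im * im) / blockSize;
        }
        return power;
    }

    /** Centre frequency (in Hz) of bin k for a given sample rate. */
    static double binFrequency(int k, int blockSize, double sampleRate) {
        return k * sampleRate / blockSize;
    }

    public static void main(String[] args) {
        // Eight samples of a tone that completes two cycles per block.
        int[] tone = {1000, 0, -1000, 0, 1000, 0, -1000, 0};
        double[] power = powerSpectrum(tone, 0, tone.length);
        for (int k = 0; k < power.length; k++) {
            System.out.println(binFrequency(k, tone.length, 8000.0) + " Hz: " + power[k]);
        }
    }
}
```

The frequency resolution is the sample rate divided by BlockSize, which is why larger blocks give finer frequency detail at the cost of more computation.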
[Figure: A sonogram plot]
You can alter the appearance of a sonogram by modifying the relationship between the power estimates and the darkness or lightness of the pixels on display. For convenience, the total range of powers is rescaled to lie between 0 and 1. When Dora starts up, 1 minus the scaled value of the power is used as the value of the gray to represent it, so the maximum power value is shown as pure black and the minimum value is shown as pure white.
By adjusting the contrast coefficients (text-fields) on the second pane of the main control window, you can alter the relationship between power level and light/dark. The three (x,y) pairs that you can adjust represent coordinates on the graph of the scaling function that associates 1-power and grayness.
For example, many sounds have a background of microphone and/or preamplifier noise that appears at all frequencies with a level of about -40 dB (say). If you make a sonogram of such a sound, the entire picture will be drawn on a fairly dark gray background. To improve the appearance of the sonogram you could adjust the y values corresponding to the smaller x values down a little, so that the points are (0.25, 0.1), (0.5, 0.2), and (0.75, 0.75). This amounts to leaving the scaling of the parts of the signal of interest unchanged while reducing the values used to render the background noise.
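One way to picture such a scaling function is as a piecewise-linear curve through the three adjustable points. The Java sketch below assumes that the curve is also pinned at (0,0) and (1,1) and that the x values are strictly increasing between 0 and 1; neither assumption is confirmed by the program's documentation, and the names are hypothetical.

```java
/** Hypothetical sketch of a piecewise-linear contrast curve on [0, 1]. */
final class ContrastCurveSketch {

    /**
     * Maps x in [0, 1] to a gray level by linear interpolation through
     * (0,0), (xs[0],ys[0]), ..., (xs[n-1],ys[n-1]), (1,1).
     * The fixed end points are an assumption made for this sketch.
     */
    static double map(double x, double[] xs, double[] ys) {
        double clamped = Math.max(0.0, Math.min(1.0, x));
        double x0 = 0.0;
        double y0 = 0.0;
        for (int i = 0; i <= xs.length; i++) {
            double x1 = (i < xs.length) ? xs[i] : 1.0;
            double y1 = (i < ys.length) ? ys[i] : 1.0;
            if (clamped <= x1) {
                return y0 + (y1 - y0) * (clamped - x0) / (x1 - x0);
            }
            x0 = x1;
            y0 = y1;
        }
        return 1.0;
    }

    public static void main(String[] args) {
        // The example points from the text.
        double[] xs = {0.25, 0.5, 0.75};
        double[] ys = {0.1, 0.2, 0.75};
        for (double x = 0.0; x <= 1.0; x += 0.125) {
            System.out.println(x + " -> " + map(x, xs, ys));
        }
    }
}
```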
[Figure: A sonogram plot]
As with the power time series, you can choose to have coordinate values indicated with the ShowGrid radio button on the second panel.
A final adjustment you can make to the sonogram is to narrow the range of frequencies that are displayed. There's no point in giving screen space to sounds above 10kHz if your signal of interest lies under 4kHz. Use the spinners on the second pane to set the low and high frequencies that are displayed.
The overall height of the display, either sonogram or power series, is set internally to 512 but can be changed at runtime using the PictureHeight spinner on the main panel. If you are displaying a 2 channel file, the display of each channel will get half the available height. You can also increase the size of the picture by using a check-box to turn off the display of one of the channels.
On the Display Panel you will find radio buttons that let you select between LinearScale and LogScale for the sonogram. This refers to the vertical axis of the sonogram, the frequency scale.
The correlation plot requires that you have stored one sound in the Buffer and have another as the active sound. Dora computes the correlation between the active sound and all possible shifts of the Buffer sound. High correlations represent similarity of sounds, so this plot lets you see where in the active sound there are passages that are similar to the sound in the Buffer.
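The value plotted is the actual correlation coefficient of the active sound and the shifts (in time) of the Buffer sound, that is, the usual (Pearson) correlation coefficient
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$
where the x_i are the samples of the active sound that the shifted Buffer sound overlaps, the y_i are the samples of the Buffer sound, and the bars denote means.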
The number r always lies between -1 and 1, and this is the vertical scale used for the correlation plot. Rather than plotting all the r's, one for each shift of the Buffer sound, Dora considers the r's in groups of size BlockSize, and plots the maximum (in blue) and minimum (in red) of the r values computed for shifts in that block. As in the other displays, Dora then shifts attention by StepSize samples to the right (forward in time) and plots the max and min of the next BlockSize shifts. The Block and Step sizes that govern this display are those set in the “Power” section on the main panel.
Rather than computing the exact correlation between a sound and the shifts of another sound, Dora provides the option of first computing the rms power of the two sounds, taking samples in blocks of size CorrelationBlockSize and making shifts of size CorrelationStepSize. When these two parameters are set to 1, the exact correlations between the active sound and the shifts of the Buffer sound are computed and displayed (as the max and min of the values in each block). The size of the blocks on which rms power is computed before the correlations are found is controlled in the “Correlation” part of the main panel.
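As an illustration of the quantity r, the following Java sketch computes the ordinary (Pearson) correlation coefficient between a short sound and every shift of it along a longer sound. It is a sketch, not Dora's implementation: the names are hypothetical, the inputs could equally be raw samples or rms power values, and a constant segment (for which r is undefined) is reported as 0 by assumption. Grouping the r values into blocks and taking the max and min is left out.

```java
/** Hypothetical sketch: correlation of a buffer sound with all shifts along an active sound. */
final class ShiftCorrelationSketch {

    /** Pearson r between buffer and active[shift .. shift + buffer.length). */
    static double correlationAtShift(double[] active, double[] buffer, int shift) {
        int n = buffer.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += active[shift + i];
            meanB += buffer[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = active[shift + i] - meanA;
            double db = buffer[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        // r is undefined for a constant segment; 0 is returned here by assumption.
        return denom == 0.0 ? 0.0 : cov / denom;
    }

    /** One r value per shift at which the buffer fits entirely inside the active sound. */
    static double[] correlateAllShifts(double[] active, double[] buffer) {
        int shifts = active.length - buffer.length + 1;
        double[] r = new double[Math.max(shifts, 0)];
        for (int s = 0; s < r.length; s++) {
            r[s] = correlationAtShift(active, buffer, s);
        }
        return r;
    }

    public static void main(String[] args) {
        double[] active = {0, 0, 1, 2, 3, 0, 0};
        double[] buffer = {1, 2, 3};
        for (double r : correlateAllShifts(active, buffer)) {
            System.out.println(r);
        }
    }
}
```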
The menu item File/Save image lets you save a version of the current (active) graphical display as a file on your disk. The format used is PNG (Portable Network Graphics), understood by most modern browsers, word-processors, etc.
Often you will want to concentrate on just part of a sound, either to view it in detail or to perform editing procedures on it. You can select which track (channel) of a multi-channel sound is shown with check-boxes on the first pane of the main control window. Within a graphical display you can select which channel(s) is active (i.e. will be the subject of editing procedures) with check-buttons on the left side of the display.
In addition, you can further restrict attention to a subsegment of the active channels by setting the selection, which you do by clicking on the graphical display with the mouse. The start (left) selection point is shown as a green vertical line and the stop (right) selection point is shown as a red line; you set the start and stop points by making left and right mouse clicks on the picture display.
If you mark a selection and click the Zoom button, the display will be redrawn, but with just the selected segment shown. To magnify the display, you need to change the StepSize; smaller values of StepSize result in expanded time scales. The Unzoom button restores the display to its previous portion of the sound. You can do multiple Zoom and Unzoom commands as you mouse around and focus in on the part of a sound that interests you.
When you click the Play button on the first pane of the main control window, a separate thread of execution is started to play back the displayed portion of the sound. To hear just the selected region, you need to Zoom in first. If you click Play before drawing any picture, the entire sound file is played.
There are only two sounds that Dora is prepared to manipulate at any given time. One is the active sound and the other is called the Buffer. When you read in a sound file with the File/Open item it becomes the active sound. You can further restrict the portion of the active sound that will be modified by making a selection and by adjusting which channels of the sound are active. Many Dora sound modification procedures work only on the selected portion of the active channel of the active sound.
Other sound manipulations use two sounds, either combining two sounds into one, or producing two sounds from one. In these operations the first sound is the active sound and the second sound is always the Buffer.
Using the Sounds menu you can select which of the sounds is the active sound and which is the Buffer. If there are sounds that you no longer need, an item on this menu will allow you to close them so that the memory they occupy will eventually be freed up. (WARNING: Current versions don't prompt you to save sounds that have been modified as you close them or as you quit the program.)
Dora provides a variety of procedures to modify your sounds. Here's a list; many of the commands are probably self-explanatory.
Cut to Buffer
removes the selection from the active sound and places it in the Buffer.
Trim
keeps the section of sound between the start and stop markers and discards the portions of the sound before and after it.
Copy to Buffer
copies the selected portion of the active sound and places it in the Buffer.
Insert Buffer at selection start
behaves in one of two ways, depending on the number of channels in the active sound and in the Buffer. If the numbers of channels in the two sounds are equal, the command inserts the corresponding channel of the Buffer sound into each active channel of the active sound at the selection start marker. If the Buffer has just one channel, it is inserted into each of the active channels of the active sound. Dora editing procedures never (what never?) produce sounds with unequal channel lengths: short channels are always filled with 0's at their end.
Mix Buffer at selection start
The rules for what channels go where are the same as for the Insert Buffer command. In this operation the sounds are mixed (by averaging each of their samples). This produces a 50-50 mixing; to mix sounds in other proportions, apply a volume change to one of the sounds before mixing them. No check is made to avoid or report on the possibility of "clipping" but the 50-50 mixing operation won't result in clipping unless one of the sounds being mixed is already clipped.
Change Volume
multiplies the samples by a fixed factor, resulting in a change in the amplitude (loudness, strength) of the sound. There are two methods available. The "integer" method only allows you to select integer multipliers for the sound and the multiplication procedure is carried out in integer arithmetic, so no rounding or truncation is done. The "double" method allows you to select arbitrary multipliers, and during the process, sample values are converted to doubles (64 bit floating point numbers), multiplied, and converted back to integers again. The reduction to integers is accomplished by truncation, but new versions will perform a rounding procedure. VolumeChange uses sliders to let you select the multiplication factor, and only multipliers that will not result in clipping are offered. Of course, multiplying by a factor between 0 and 1 reduces the volume rather than increasing it, and no integer multiplier can accomplish this.
Filter
(currently high-pass, low-pass, band-pass, and band-stop second-order Butterworth filters) when you select the Edit/Filter item you are offered a new window in which you can choose the type of filter and specify the parameters for the filter.
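The arithmetic behind the 50-50 mix and the "double" volume change can be sketched as follows. This Java fragment is only an illustration with hypothetical names: it treats one channel as an int array, pads the shorter sound with zeros before averaging (an assumption about how unequal lengths are handled), and converts back to integers by truncation as the text describes.

```java
/** Hypothetical sketch of sample-wise mixing and volume change. */
final class MixAndVolumeSketch {

    /**
     * Mixes two channels by averaging corresponding samples. The shorter
     * channel is treated as if padded with zeros (an assumption). The average
     * of two in-range samples is always in range, so no clipping can occur.
     */
    static int[] mixAverage(int[] a, int[] b) {
        int n = Math.max(a.length, b.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            out[i] = (x + y) / 2;
        }
        return out;
    }

    /** Multiplies each sample by factor in double arithmetic, then truncates toward zero. */
    static int[] scale(int[] samples, double factor) {
        int[] out = new int[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = (int) (samples[i] * factor);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] mixed = mixAverage(new int[] {100, -100, 32767}, new int[] {300, 100});
        int[] quieter = scale(mixed, 0.5);
        System.out.println(java.util.Arrays.toString(mixed));
        System.out.println(java.util.Arrays.toString(quieter));
    }
}
```

To mix in other proportions, scale one sound before mixing, as suggested under Mix Buffer at selection start.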
By turning on data logging (under the Info menu item) you can capture the coordinates of your mouse clicks on a graphical display and later save them to a file. There's a variety of reasons you might want to do this, but since the goal is presumably to get fairly accurate readings of the coordinates of points on the display, ONLY DISPLAY ONE CHANNEL when you're logging data. In fact, if there's more than one channel displayed, the logging routine will report incorrect power and frequency measurements, although the time coordinates will still be correct. (This problem should be fixed in the next version.) Also, data logging does not work (yet) when you have selected LogScale for a sonogram display.
After drawing a picture of your sound you can read off the coordinates of points by clicking on the picture while holding the Shift key. This Shift-click command opens an additional window and shows the coordinates of the point on which you clicked. The format of the display is [t1,t2] db or [t1,t2] f, depending on whether the display is a power plot or a sonogram. Here, t1 and t2 are the start and stop time points for the pixel on which you clicked (measured in seconds from the start of the display), db is the sound strength (in decibels below saturation) and f is the frequency. Again, DISPLAY ONLY ONE CHANNEL when you're reading coordinates from the screen.
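A reading in "decibels below saturation" can be related to an rms value with the usual amplitude convention, 20 times the base-10 logarithm of the ratio to full scale. The Java sketch below assumes that convention and takes full scale to be 2^(B-1) for B-bit signed samples; whether Dora computes its db reading in exactly this way is an assumption, and the names are hypothetical.

```java
/** Hypothetical sketch: converting an rms value to decibels relative to full scale. */
final class DecibelSketch {

    /**
     * Returns 20 * log10(rms / fullScale), where fullScale = 2^(bitsPerSample - 1).
     * The result is 0 at saturation and negative below it; an rms of 0 gives
     * negative infinity.
     */
    static double dbBelowSaturation(double rms, int bitsPerSample) {
        double fullScale = Math.pow(2.0, bitsPerSample - 1);
        return 20.0 * Math.log10(rms / fullScale);
    }

    public static void main(String[] args) {
        // One hundredth of full scale for 16-bit samples.
        System.out.println(dbBelowSaturation(327.68, 16));
    }
}
```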
The Phraser module works on a sound file that you have selected and cuts it up into smaller sound files, each containing a single phrase. Or at least that's the theory. After opening the file you want to cut up, go to the Phrases panel and you will see three variables you can adjust: Threshold, MinToStart, and BlockSize. The initial settings are probably an okay starting place. Click the ComputePhrases button to have Dora determine the start and stop locations of the phrases as determined by the current parameter settings. To see these phrases, select the ShowPhrases radio button from the Display panel and then, using the Main panel, Draw a TimeDomain or Sonogram picture. The beginnings and ends of the phrases are marked with green (start) and red (stop) vertical lines.
You can adjust the values of the phraser control parameters to alter the phrases that Dora selects. The best way to understand what these do is to think in terms of the process that is implemented in the Phraser procedure. This process considers the sound in chunks of BlockSize samples and for each it computes the rms power of the block. The procedure recognizes the start of a phrase when MinToStart consecutive blocks have rms power at or above the Threshold value. The end of a phrase is recognized when MinToStart consecutive chunks have rms power below the Threshold value. The phrase includes all of the chunks since the Threshold was first exceeded, as well as the MinToStart consecutive blocks (at the end) that have rms power under the Threshold value. In addition, you can use the parameter PadAtStart to include extra blocks at the start of the phrase.
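The phrase-detection rule just described can be sketched as a small state machine over the per-block rms values. The Java below is an illustrative reading of that description, not the Phraser source: the names are hypothetical, phrases are reported as block index ranges (multiply by BlockSize to get sample positions), and a phrase still open at the end of the sound is closed there by assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of threshold-based phrase detection on block rms values. */
final class PhraserSketch {

    /**
     * Returns phrases as {firstBlock, endBlockExclusive} pairs. A phrase starts
     * at the first of minToStart consecutive blocks at or above threshold
     * (moved earlier by padAtStart blocks) and ends after minToStart
     * consecutive blocks below threshold, which are included in the phrase.
     */
    static List<int[]> findPhrases(double[] blockRms, double threshold,
                                   int minToStart, int padAtStart) {
        List<int[]> phrases = new ArrayList<>();
        boolean inPhrase = false;
        int run = 0;          // consecutive blocks that argue for a state change
        int phraseStart = 0;
        for (int b = 0; b < blockRms.length; b++) {
            boolean loud = blockRms[b] >= threshold;
            if (!inPhrase) {
                run = loud ? run + 1 : 0;
                if (run == minToStart) {
                    inPhrase = true;
                    phraseStart = Math.max(0, b - minToStart + 1 - padAtStart);
                    run = 0;
                }
            } else {
                run = loud ? 0 : run + 1;
                if (run == minToStart) {
                    phrases.add(new int[] {phraseStart, b + 1});
                    inPhrase = false;
                    run = 0;
                }
            }
        }
        if (inPhrase) {
            // Assumption: a phrase still open at the end of the sound is closed there.
            phrases.add(new int[] {phraseStart, blockRms.length});
        }
        return phrases;
    }

    public static void main(String[] args) {
        double[] rms = {0, 5, 5, 5, 0, 5, 0, 0, 0, 5, 0};
        for (int[] p : findPhrases(rms, 3.0, 2, 0)) {
            System.out.println("blocks " + p[0] + " to " + p[1]);
        }
    }
}
```

Note how a single quiet block inside a phrase, or a single loud block outside one, does not change the state when MinToStart is 2.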
When you have adjusted the Phraser parameters so that the phrases of interest are selected, you can use the ExtractPhrases button to cause Dora to create a sequence of .wav files containing the phrases. As a default, these files are in the working directory and are called Phrase0.wav, Phrase1.wav, etc.
Instead of using the total rms power in the sound, you can restrict the phrasing process to consider only the power between two frequencies. To do this, make the Frequency radio-button active and set the spinners for Flow and Fhigh to the desired values. You must select a BlockSize that is a power of 2 since this routine uses Fourier transforms.
Other improvements to implement include: running the phraser on a collection of files, letting the user specify the name and path of the files that are extracted, and a more flexible phrase selection algorithm. In addition, it might be nice to allow the user to modify the location of start/stop markers by dragging them with the mouse. But really, this is meant to be automated and to run on an entire list of files all at once.
If you have made a two channel recording with spaced microphones, you can use Dora's DeltaT module to estimate the bearings from the microphone location to the source of a sound. The setting in which this procedure is designed to be used is plotting the locations of a singing bird (for example). Your recording needs to be made, or cleaned up, edited, and filtered, so that aside from uniform background and system noise, the sound you want to locate stands out, representing a significant portion of the rms power in the sound sample. Be sure that, in cleaning the sound, you don't alter the relative timing of the two tracks since it is the difference in the times of arrival of the sound at the two microphones that permits an estimate of the bearing to the sound source.
Having selected the sound file (or made the sound the active one), adjust the Estimate Bearings parameters on the Phrases panel and then click the EstimateBearings button. The output is printed on standard out and gives the bearing in degrees to the sound source.
The parameters you need to supply are L, the distance between the two microphones used in making the recording, and V, the speed of sound in your planet's atmosphere. I tend to use values like V=345 and V=333 meters per second since they're easy to remember. Of course the distance units of L and V must be the same, and the time unit for velocity should be seconds.
Improvements to be made include allowing the procedure to be called on a collection of phrase files all at once, resulting in output going to a text file that can be used as input to a program that knows the locations and orientations of several microphone pairs, and how phrases from microphone pairs correspond and that can compute the locations of the sound sources. Additional improvements should allow more channels of sound and account for three dimensional displacements rather than assuming a planar world.
How it works. As explained here, the bearing estimation procedure relies on the difference in the times at which the sound arrives at the two microphones. To obtain a bearing, it assumes that the distance to the sound is quite large relative to the distance between the microphones.
Suppose we have located the microphones on the x-axis symmetrically with respect to the origin, i.e., at the points (-d/2,0) and (d/2,0), and suppose that the sound source is located at the point (A,B) in the plane. See the picture:

By estimating the difference in the times at which the sound arrives at the microphones we can estimate the difference L2-L1 of the distances from the two microphones to the location of the sound.
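Since the distance L from a point (A,B) to the origin satisfies
$$L^2 = A^2 + B^2$$
we have, taking derivatives, that
$$L\,dL = A\,dA + B\,dB.$$
Since we are only interested in changes in L that result from the displacement of the microphones along the horizontal axis we take
$$dB = 0$$
and
$$dA = d.$$
This tells us that
$$dL = \frac{A}{L}\,d.$$
Now (A/L) is the cosine of the angle that the line connecting (A,B) with the origin makes with the horizontal, or the sine of the angle it makes with the vertical axis. If we let $\theta$ be the angle to the left (or right) of the vertical, and equate the path difference dL with the distance $V\,\Delta t$ that sound travels in the time difference, our considerations can be summarized as:
$$\sin\theta = \frac{V\,\Delta t}{d}$$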
where V is the velocity of sound, Δt is the difference in the time of arrival of the sound at the two microphones, and d is the distance between the microphones.
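As a worked illustration, the far-field relation sin θ = V Δt / d can be turned into a bearing with a single arcsine. The Java sketch below is not the DeltaT module itself: the names are hypothetical, it assumes Δt has already been estimated, and it clamps the ratio to [-1, 1] so that measurement noise cannot push the arcsine out of its domain.

```java
/** Hypothetical sketch: bearing from a time-of-arrival difference. */
final class BearingSketch {

    /**
     * Returns the bearing in degrees from the perpendicular to the microphone
     * axis, using sin(theta) = V * deltaT / d. deltaT is in seconds;
     * micDistance and soundSpeed must use the same distance unit.
     */
    static double bearingDegrees(double deltaT, double micDistance, double soundSpeed) {
        double ratio = soundSpeed * deltaT / micDistance;
        // Clamp so that noise in deltaT cannot produce NaN from asin.
        ratio = Math.max(-1.0, Math.min(1.0, ratio));
        return Math.toDegrees(Math.asin(ratio));
    }

    public static void main(String[] args) {
        double d = 0.63;   // microphones 63 cm apart
        double v = 345.0;  // metres per second
        double deltaT = 0.5 * d / v;  // half of the largest possible delay
        System.out.println(bearingDegrees(deltaT, d, v));
    }
}
```

The largest possible delay is d/V, reached when the source lies along the microphone axis; for microphones 63 cm apart that is under 2 milliseconds, so Δt must be estimated to a small fraction of a millisecond.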
If we suppose that the segment connecting the microphones runs east-west and that channel number 0 is to the east (right if we're facing north), then a positive value of the bearing based on the time difference channel0 – channel1 corresponds to a bearing east of north. Of course there's a bit of ambiguity because it also corresponds to a bearing east of south.
Here are some examples that show how well this procedure for estimating bearings works.
First, I made an outdoor recording in which I mounted two microphones 63 cm apart on a meter stick, walked steps away from the stick and then used a clicker to make clicks while I walked 0, 2, 4, 6, 10, and 18 steps parallel to the meter stick. The recording was digitized, sections corresponding to the clicks at different locations were put in different files, and each file was processed with the DeltaT module. The results, along with the theoretically correct angle (assuming that my steps were equal sized and my path was parallel to the stick), are in the table:
Steps | Angle (deg.) | Estimate (deg.)
0 | 0 | 0
2 | 22 | 19
4 | 29 | 35
6 | 57 | 56
10 | 63 | 60
18 | 74 | 72
This is pretty good agreement. Since the recording was made outside and included background sounds (passing cars and airplanes, water dripping from trees, and wind), I was encouraged to pursue the approach further.
As a second experiment I used the same recording set up with the stick oriented east-west. While recording I voiced the compass-determined bearing of two birds that made noises. These bearings were approximate since I didn't actually see the location of the birds. I digitized the recording (using Audacity) and, using Dora, cut out small sections containing primarily the bird of interest. In the case of the junco, I also applied a band-pass filter to eliminate some noise.
Cut | Compass-estimate | Dora estimate
Grackle | 135 E of North | 150 E of North
Junco | 10 W of South | 8 W of South
Again, the agreement is pretty good, especially since the grackle was high in a tree and the junco was near the height of the microphones.
Altitude tends to make the estimate of the bearings lie closer to the perpendicular to the microphone-microphone axis, so the departure in the case of the grackle is consistent in direction with an altitude-induced error.

The figure shows (schematically) a hyperboloid that represents points whose distances from the two microphones differ by a fixed amount; the microphones are arranged along the AD axis with center point A. The point P lies on the hyperboloid but out of the plane containing AD and AB (AB is perpendicular to the microphone axis and parallel to the ground). The actual bearing (in the horizontal plane) to the point P is the angle BAE, but assuming the point P lies in the horizontal plane gives the bearing BAC, which always lies closer to the perpendicular to the microphone axis.
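The size of this altitude effect can be estimated with a little trigonometry. For a distant source at true horizontal bearing φ (measured from the perpendicular to the microphone axis) and elevation ε above the horizontal plane, the path difference is d·sin φ·cos ε, so the planar formula reports a bearing θ with sin θ = sin φ · cos ε. The Java sketch below evaluates that relation; the far-field assumption and the names are mine, not part of Dora.

```java
/** Hypothetical sketch: how source elevation biases a planar bearing estimate. */
final class AltitudeBiasSketch {

    /**
     * Returns the bearing (degrees) that a planar two-microphone estimate
     * would report for a distant source at the given true bearing and
     * elevation, using sin(theta) = sin(bearing) * cos(elevation).
     */
    static double apparentBearingDegrees(double trueBearingDeg, double elevationDeg) {
        double s = Math.sin(Math.toRadians(trueBearingDeg))
                 * Math.cos(Math.toRadians(elevationDeg));
        return Math.toDegrees(Math.asin(s));
    }

    public static void main(String[] args) {
        // A source 45 degrees off the perpendicular, at increasing elevations.
        for (double elevation = 0.0; elevation <= 60.0; elevation += 20.0) {
            System.out.println(elevation + " -> " + apparentBearingDegrees(45.0, elevation));
        }
    }
}
```

Because cos ε is at most 1, the reported bearing is never farther from the perpendicular than the true one.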
If we could record signals from 3 microphones at known locations in a way that kept them synchronized, we could get 3 dimensional locations for the sound sources (by intersecting three hyperboloids). An alternative is to use 3 pairs of (relatively) unsynchronized microphones at known locations.
Given a library of phrases (parts of bird songs, for example) you can use Dora's stand-alone LibrarySA module to compute the correlations between a list of sounds and the clips in the library, thus (partially) classifying the sound. In the end, it would be nice to have a more sophisticated pattern recognition module available for this procedure, or even a neural network that could be trained with a library of sounds.
To use LibrarySA, generate two text files subjects.txt and library.txt. In the first you list, one per line, the names of sound files you want to classify. In the second you list the names of files in your standardized clip collection. The command line is then
java -cp Dora.jar LibrarySA subjects.txt library.txt Blocksize
where Blocksize is the number of samples to use in computing the rms power. When Blocksize is small (like 1 or 2) this program can take a very long time to run, so do it in the background. Alternatively, run the program first with a larger value (e.g. 64) and then restrict the comparisons asked for by running with different subsets of the subjects.txt and library.txt files.
The output from the program is an HTML table, so you might want to redirect it to a file. An example of the output from this program is:
LibrarySA called with args: subj.txt libr.txt 1
Subject | 01-02-p1.wav | 01-05-p1.wav | 01-07-p1.wav | 01-07-p2.wav | 01-08-p1.wav | 01-08-p2.wav | 01-10a-p1.wav | 01-10b-p1.wav | 01-10b-p2.wav | md01-08-p1.wav | md01-09-p1.wav
01-02.wav | 0.522 | 0.271 | 0.243 | 0.513 | 0.358 | 0.342 | 0.396 | 0.385 | 0.44 | 0.493 | 0.456
01-05.wav | 0.382 | 0.315 | 0.239 | 0.529 | 0.376 | 0.421 | 0.414 | 0.415 | 0.491 | 0.495 | 0.52
01-07.wav | 0.464 | 0.534 | 1 | 1 | 0.49 | 0.479 | 0.624 | 0.568 | 0.688 | 0.689 | 0.639
01-08.wav | 0.498 | 0.408 | 0.341 | 0.6 | 1 | 1 | 0.504 | 0.555 | 0.578 | 0.541 | 0.568
01-10a.wav | 0.588 | 0.414 | 0.466 | 0.614 | 0.473 | 0.472 | 1 | 0.656 | 0.663 | 0.596 | 0.645
01-10b.wav | 0.654 | 0.407 | 0.443 | 0.668 | 0.49 | 0.507 | 0.63 | 1 | 1 | 0.654 | 0.668
01-12.wav | 0.29 | 0.281 | 0.222 | 0.366 | 0.242 | 0.31 | 0.393 | 0.349 | 0.344 | 0.398 | 0.321
01-13.wav | 0.656 | 0.3 | 0.419 | 0.633 | 0.442 | 0.512 | 0.604 | 0.774 | 0.787 | 0.608 | 0.671
md01-04.wav | 0.565 | 0.665 | 0.435 | 0.678 | 0.472 | 0.486 | 0.614 | 0.659 | 0.701 | 0.672 | 0.615
md01-08.wav | 0.472 | 0.523 | 0.628 | 0.698 | 0.423 | 0.482 | 0.59 | 0.505 | 0.684 | 1 | 0.624
md01-09.wav | 0.541 | 0.407 | 0.356 | 0.614 | 0.381 | 0.522 | 0.535 | 0.659 | 0.656 | 0.58 | 1
The files listed as row labels in this table are rather large sound files of songs from different Northern Cardinals. The files that label the columns are short sound files that were selected to represent certain kinds of song elements that appear in some of the Cardinal songs. (The -p<n> portion of the name gives the number of the element type, while the first part of the name gives the sound file the phrase was found in.) Note that the highest values in the row from 01-13.wav occur in columns with phrases from file 01-10b, suggesting that these files are of songs of the same basic type. (This is in fact the case.)
Phrases from 01-02.wav and 01-05.wav were high-pass filtered before being placed in the library. This example suggests that it wasn't a good idea to filter them!
The LibrarySA program also allows you to compute the correlation between sonograms. To do this, you call the program with a command line like:
java -cp Dora.jar LibrarySA subj.txt libs.txt Flow Fhigh BlockSize StepSize
Output from a command of this sort (using the same lists of sound files) is:
LibrarySA called with args: subj.txt libr.txt 1000 5000 512 256
Subject | 01-02-p1.wav | 01-05-p1.wav | 01-07-p1.wav | 01-07-p2.wav | 01-08-p1.wav | 01-08-p2.wav | 01-10a-p1.wav | 01-10b-p1.wav | 01-10b-p2.wav | md01-08-p1.wav | md01-09-p1.wav
01-02.wav | 0.99 | 0.55 | 0.47 | 0.58 | 0.59 | 0.63 | 0.62 | 0.42 | 0.49 | 0.61 | 0.42
01-05.wav | 0.41 | 1 | 0.37 | 0.65 | 0.65 | 0.69 | 0.56 | 0.56 | 0.56 | 0.73 | 0.24
01-07.wav | 0.31 | 0.48 | 1 | 1 | 0.56 | 0.55 | 0.48 | 0.38 | 0.46 | 0.8 | 0.11
01-08.wav | 0.41 | 0.56 | 0.31 | 0.62 | 1 | 1 | 0.55 | 0.45 | 0.52 | 0.69 | 0.17
01-10a.wav | 0.4 | 0.53 | 0.4 | 0.62 | 0.63 | 0.63 | 1 | 0.45 | 0.55 | 0.62 | 0.27
01-10b.wav | 0.49 | 0.65 | 0.38 | 0.58 | 0.64 | 0.69 | 0.6 | 1 | 1 | 0.7 | 0.38
01-12.wav | 0.34 | 0.52 | 0.32 | 0.65 | 0.74 | 0.66 | 0.53 | 0.43 | 0.51 | 0.68 | 0.08
01-13.wav | 0.48 | 0.62 | 0.31 | 0.53 | 0.61 | 0.63 | 0.54 | 0.64 | 0.76 | 0.58 | 0.42
md01-04.wav | 0.42 | 0.77 | 0.43 | 0.7 | 0.67 | 0.68 | 0.54 | 0.5 | 0.51 | 0.81 | 0.3
md01-08.wav | 0.32 | 0.53 | 0.58 | 0.76 | 0.57 | 0.61 | 0.46 | 0.34 | 0.38 | 1 | 0.19
md01-09.wav | 0.45 | 0.46 | 0.25 | 0.51 | 0.58 | 0.62 | 0.52 | 0.41 | 0.47 | 0.58 | 1
Note that in this case, the sound files 01-02.wav and 01-05.wav have very high correlation with the clips selected from them. This is because the clips were high-pass filtered and the correlations computed between sonograms in this run used only frequencies in the range 1-5 kHz.
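To make the Flow, Fhigh, BlockSize and StepSize arguments more concrete, here is a small sketch of how a frequency range maps onto FFT bins and how many blocks a sound yields. It is not taken from Dora's source: the class and method names are made up for illustration, and the 44.1 kHz sample rate is an assumption, since the recordings' rate is not stated here.

```java
/** Sketch only: maps LibrarySA-style arguments onto FFT bins and frames. */
class SonogramBand {

    /** Index of the first FFT bin whose center frequency is at or above fLow (Hz). */
    static int lowBin(double fLow, int blockSize, double sampleRate) {
        return (int) Math.ceil(fLow * blockSize / sampleRate);
    }

    /** Index of the last FFT bin whose center frequency is at or below fHigh (Hz). */
    static int highBin(double fHigh, int blockSize, double sampleRate) {
        return (int) Math.floor(fHigh * blockSize / sampleRate);
    }

    /** Number of complete blocks when a window of blockSize advances by stepSize samples. */
    static int frameCount(int nSamples, int blockSize, int stepSize) {
        if (nSamples < blockSize) {
            return 0;
        }
        return 1 + (nSamples - blockSize) / stepSize;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;          // assumed, not given in the text
        int blockSize = 512;
        int stepSize = 256;
        System.out.println("bins " + lowBin(1000, blockSize, sampleRate)
                + " to " + highBin(5000, blockSize, sampleRate));
        System.out.println("frames in one second: "
                + frameCount(44100, blockSize, stepSize));
    }
}
```

Under these assumptions the 1000 to 5000 Hz range covers bins 12 through 58 of a 512-point block, so only a small part of each spectrum enters the correlation.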
CorrSA: This is a standalone program that takes the name of a file on its command line. The file gives a list of .wav files, and CorrSA computes the correlation coefficient between channel 0 (the first channel, usually the left) of each pair of files, printing the results to standard output. A second command line argument can be used to specify the BlockSize over which rms power is computed before correlations are found.
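The following is a minimal sketch of the kind of computation described for CorrSA: rms power over blocks, then a correlation coefficient between the two power sequences. It is not Dora's code. The names are illustrative, and truncating to the shorter of the two sequences is an assumption about how sounds of different lengths might be handled.

```java
/** Sketch: correlate the block-wise rms power of two signals (not Dora's actual code). */
class BlockPowerCorrelation {

    /** Rms power of consecutive, non-overlapping blocks; a trailing partial block is dropped. */
    static double[] blockRms(double[] samples, int blockSize) {
        int nBlocks = samples.length / blockSize;
        double[] rms = new double[nBlocks];
        for (int b = 0; b < nBlocks; b++) {
            double sumSq = 0.0;
            for (int i = b * blockSize; i < (b + 1) * blockSize; i++) {
                sumSq += samples[i] * samples[i];
            }
            rms[b] = Math.sqrt(sumSq / blockSize);
        }
        return rms;
    }

    /** Pearson correlation over the first min(x.length, y.length) values (an assumption). */
    static double pearson(double[] x, double[] y) {
        int n = Math.min(x.length, y.length);
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            return 0.0;   // a constant sequence has no defined correlation; report 0
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        int n = 4096;
        int blockSize = 256;
        double[] a = new double[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            double envelope = (i < n / 2) ? 0.8 : 0.1;   // loud first half, quiet second half
            a[i] = envelope * Math.sin(2 * Math.PI * 440 * i / 44100.0);
            b[i] = envelope * Math.sin(2 * Math.PI * 1200 * i / 44100.0);
        }
        double r = pearson(blockRms(a, blockSize), blockRms(b, blockSize));
        System.out.printf("correlation of block power: %.3f%n", r);
    }
}
```

The two test tones differ in pitch but share a loudness envelope, which is the situation where a power-based correlation is high even though the sounds are not the same.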
LibrarySA: This program works like CorrSA but allows you to specify two sets of sound files to be compared. In addition, you can get this program to compute correlations of sonograms rather than just the power. The command lines are
java -cp Dora.jar LibrarySA subjects.txt library.txt BlockSize
java -cp Dora.jar LibrarySA subjects.txt library.txt Flow Fhigh BlockSize StepSize
Don't forget to redirect the output to a file if you ever want to look at it. The program lists all the filenames in the pairwise comparisons it makes before it prints the final results. Generally you can ignore this (erase it) but it can be handy for spotting problem files.
AudioTest: This standalone program is used for monitoring sound levels. It is interesting to run even if you have no inputs connected to the card: my cheapo sound card has a noise level of about -30 dB.
The source for this program is part of the Dora distribution, but the compiled classes are not part of Dora.jar.
The program reads audio data from your (default) sound card and displays the sound levels with right and left LED displays. In addition, it reports the current values in channel 0, as the max, mean, and minimum of the average of 50 consecutive samples. The Reset button causes the program to restart its calculation of the max and min. The volume sliders at the bottom of the panel allow you to change the sound levels in the two channels independently by +/- 30 dB.
The main purpose of this program is to monitor input and decide on appropriate settings to use in the automatic phrase selection program PhraseGrab.
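The decibel figures used by AudioTest can be related to sample values with two short formulas. The sketch below assumes samples scaled to the range -1 to 1 and levels measured relative to full scale. Neither assumption is confirmed by the text, and the names are illustrative rather than taken from the AudioTest source.

```java
/** Sketch: decibel arithmetic for a level meter (assumes samples in [-1, 1]). */
class LevelMath {

    /** Rms level of a block in dB relative to full scale; -Infinity for pure silence. */
    static double rmsDb(double[] samples) {
        double sumSq = 0.0;
        for (double s : samples) {
            sumSq += s * s;
        }
        double rms = Math.sqrt(sumSq / samples.length);
        return 20.0 * Math.log10(rms);
    }

    /** Amplitude factor for a gain in dB, e.g. the +/- 30 dB range of a volume slider. */
    static double gainFromDb(double db) {
        return Math.pow(10.0, db / 20.0);
    }

    public static void main(String[] args) {
        double[] quiet = new double[50];
        for (int i = 0; i < quiet.length; i++) {
            quiet[i] = 0.0316 * Math.sin(2 * Math.PI * i / 10.0);   // low-level test tone
        }
        System.out.printf("level: %.1f dB%n", rmsDb(quiet));
        System.out.printf("gain for +30 dB: %.2f%n", gainFromDb(30.0));
        System.out.printf("gain for -30 dB: %.4f%n", gainFromDb(-30.0));
    }
}
```

On this scale a slider setting of -30 dB multiplies the samples by roughly 0.03, and +30 dB multiplies them by roughly 32.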
PhraseGrab: This is a standalone program that monitors your (default) sound card and, under certain conditions, saves a portion of the input as a sound file. The program is controlled by the same parameters as the Phraser module of the main Dora program.
Sonogram: This is a stand-alone program that reads a sound file, generates a sonogram, and stores it in a .png file. The command line options let you specify the names of the files (in the short form) and also some of the parameters controlling the sonogram:
java -cp Dora.jar Sonogram filein.wav fileout.png
java -cp Dora.jar Sonogram filein.wav fileout.png BlockSize StepSize Flow Fhigh DrawAxes x1 y1 x2 y2 x3 y3
You can use Dora to experiment with sounds. Here, several experiments are suggested that should help you gain familiarity with Dora's controls and, perhaps, learn something about DSP or sound perception.
There are a number of ways in which animals determine the approximate location of a sound source. By creating a single stereo sound file, you can experiment with two of these: differential volume and time of arrival.
Begin by opening a two-channel sound file like Meow.wav. Remove the sounds in one of the tracks by setting it to be the only active channel, putting selection start and selection end markers at the beginning and end, and using the menu item Cut. Copy the other channel and put it in the channel just emptied. You can do this by making the other channel active, setting the selection start and end markers at the beginning and end of the sound, and choosing the menu item Copy to Buffer. Then make the emptied channel active, set the selection start at the beginning of the sound, and mix (or insert) from the buffer.
You now have a sound with two identical channels. If you listen carefully (with headphones, say) you will hear Dora (the cat) in the center of your audio field.
Set one of the channels to be active, select some of the sounds, and change their volume with the Change Volume menu item. Increase the volume of sounds in one channel by 20% or so. Now when you play the sound, Dora appears to meow towards one side; the channel with higher volume makes it appear that Dora has moved to that side.
We use volume cues to estimate the location of sound sources. If a sound is louder in one ear than the other, we tend to locate the sound on the side where it is loudest. The magnitude of this effect depends on the difference in the volumes.
Try increasing the volume of one of the channels by 40% instead of 20%. Can you hear that Dora is even farther to the side?
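The same experiment can be sketched in code. The example below assumes that "increasing the volume by 20%" means multiplying the sample amplitudes of one channel by 1.2, which may not be how Dora's Change Volume item defines it, and that the two channels are stored interleaved. All names are illustrative.

```java
/** Sketch: boost one channel of an interleaved buffer and express the change in dB. */
class LevelCue {

    /** Multiply every sample of one channel by (1 + percent / 100), in place. */
    static void boostChannel(double[] interleaved, int channels, int channel, double percent) {
        double gain = 1.0 + percent / 100.0;
        for (int i = channel; i < interleaved.length; i += channels) {
            interleaved[i] *= gain;
        }
    }

    /** Level difference in dB between the boosted and the unchanged channel. */
    static double levelDifferenceDb(double percent) {
        return 20.0 * Math.log10(1.0 + percent / 100.0);
    }

    public static void main(String[] args) {
        // Two identical channels, stored as left, right, left, right, ...
        double[] stereo = {0.5, 0.5, -0.25, -0.25, 0.1, 0.1};
        boostChannel(stereo, 2, 1, 20.0);   // make the second channel 20% louder
        System.out.println(java.util.Arrays.toString(stereo));
        System.out.printf("20%% louder is %.2f dB%n", levelDifferenceDb(20.0));
        System.out.printf("40%% louder is %.2f dB%n", levelDifferenceDb(40.0));
    }
}
```

Under this reading a 20% boost is a difference of about 1.6 dB between the ears and a 40% boost is about 2.9 dB, so both are fairly small level differences.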
Now return to the case of two identical channels and, at the start of one of them, insert about ½ millisecond of silence. You can do this by first making a New Sound with one channel and using Copy to Buffer. Then make one of your two identical channels active, set the selection start cursor at the beginning of the display, and use the menu item Insert at selection start. When you listen to this modified sound, you will perceive Dora to be displaced from the center towards the side with the channel not containing the short period of silence.
We use time of arrival of sounds to help estimate the location of a sound's source. If a sound arrives at one of our ears before it arrives at the other, we perceive the sound as being on the side with the earlier arrival.
The speed of sound in air is about 1,130 feet per second, or, as an even cruder but useful rule of thumb, about 1 foot per millisecond. The only differences in time of arrival that we perceive as resulting from a sound source displaced from the center are those that correspond to sound arriving at our two ears, about 1 foot apart; that is, delays of up to roughly 1 millisecond.
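The arithmetic behind the delay experiment can be sketched as follows. The 44.1 kHz sample rate is an assumption (use the rate of your own file), and the class and method names are illustrative; this is not how Dora inserts silence internally.

```java
/** Sketch: convert a time-of-arrival delay to samples and apply it to one channel. */
class ArrivalDelay {

    static final double SPEED_OF_SOUND_FT_PER_S = 1130.0;   // figure quoted in the text

    /** Number of samples closest to the requested delay. */
    static int delaySamples(double delayMs, double sampleRate) {
        return (int) Math.round(delayMs * sampleRate / 1000.0);
    }

    /** Copy of the channel with nSilent zero samples inserted at the start. */
    static double[] prependSilence(double[] channel, int nSilent) {
        double[] out = new double[channel.length + nSilent];   // new arrays are zero-filled
        System.arraycopy(channel, 0, out, nSilent, channel.length);
        return out;
    }

    /** Extra distance sound travels during the delay, in feet. */
    static double pathDifferenceFeet(double delayMs) {
        return SPEED_OF_SOUND_FT_PER_S * delayMs / 1000.0;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;   // assumed sample rate
        int n = delaySamples(0.5, sampleRate);
        double[] delayed = prependSilence(new double[]{0.3, -0.2, 0.1}, n);
        System.out.println("samples of silence: " + n);
        System.out.println("delayed channel length: " + delayed.length);
        System.out.printf("path difference: %.3f feet%n", pathDifferenceFeet(0.5));
    }
}
```

At 44.1 kHz, ½ millisecond is about 22 samples, and it corresponds to a path difference of a little over half a foot.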
Try inserting a long enough delay in one of the channels and you will hear an echo, not sound source displacement.
You might like to experiment with how time delay and volume trade off. How much volume change is needed to compensate for a short time delay? Does the compensation depend on the frequency of the sound?
In general, we get our intensity cues from sounds with middle and high frequencies since low frequency sounds, with large wavelengths, “wrap around” our heads easily and there is not much intensity difference at our two ears even if the sound originates far to one side of our head.
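To see why low frequencies wrap around the head, it helps to compare wavelengths with the roughly 1-foot ear spacing used above. This small sketch uses the 1,130 feet per second figure from the text; the chosen frequencies and the names are only illustrative.

```java
/** Sketch: wavelength in air for a few frequencies, using 1,130 feet per second. */
class Wavelength {

    static final double SPEED_OF_SOUND_FT_PER_S = 1130.0;   // figure quoted in the text

    /** Wavelength in feet of a tone at the given frequency in Hz. */
    static double wavelengthFeet(double frequencyHz) {
        return SPEED_OF_SOUND_FT_PER_S / frequencyHz;
    }

    public static void main(String[] args) {
        double[] frequencies = {100.0, 200.0, 1000.0, 4000.0};
        for (double f : frequencies) {
            System.out.printf("%.0f Hz: %.2f feet%n", f, wavelengthFeet(f));
        }
    }
}
```

A 200 Hz tone has a wavelength of about 5.65 feet, several times the ear spacing, while a 4000 Hz tone is only about 0.28 feet long, short enough for the head to cast a noticeable sound shadow.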
Volume change resulted in an index-out-of-bounds exception. This has been fixed.
When more than one channel is displayed, the frequency computed for data logging is incorrect. It seems to be correct when only one channel is displayed. Also, take care of the case where a log scale has been used for the frequency axis in a sonogram (done).
In Phraser, check to make sure that the BlockSize is a power of 2 when the restrict frequency radio button is selected. Right now, an exception or error is probably generated by the FFT routines if the BlockSize is bad.
Managing a phrase library. Procedures to create phrase clips, match against them (by various methods). Given a list of sounds, compute a matrix of pairwise similarities (or distances). This is partially implemented in CorrSA (stand alone).
Edit text data in wav files to include accession and collecting information. Or at least write a DTD file for XML that stores this data. The Phraser program should write this file when it extracts phrases.
Allow restriction of frequencies in computing phrases (done?)
Get streams working both to process large files and to record (with processing) from an input port or sound card. From a stream we could at least ask for a scrolling power display, to extract phrases, and to record to a file.
Implement a command line interface that lets Dora be called from scripts. Options might include p=powerplot, s=sonogram, a=axes, cp=compute phrases, sp=show phrases, rf=restrict frequencies, fl=freq. low, fh=freq. high, and settings for the Block and Step parameters. In part, the Sonogram class does this.
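For the Phraser item in the list above, the power-of-2 test on BlockSize is small enough to sketch here. This is one common way to write it, not code from Dora; the class and method names are hypothetical.

```java
/** Sketch: validate that a block size is usable by a radix-2 FFT. */
class BlockSizeCheck {

    /** True when n is a positive power of 2 (1, 2, 4, 8, ...). */
    static boolean isPowerOfTwo(int n) {
        // A power of 2 has exactly one bit set, so clearing the lowest set bit leaves 0.
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Throws a descriptive exception instead of letting the FFT fail later. */
    static void requirePowerOfTwo(int blockSize) {
        if (!isPowerOfTwo(blockSize)) {
            throw new IllegalArgumentException(
                    "BlockSize must be a power of 2, got " + blockSize);
        }
    }

    public static void main(String[] args) {
        int[] candidates = {256, 500, 512, 1000};
        for (int size : candidates) {
            System.out.println(size + " -> " + isPowerOfTwo(size));
        }
    }
}
```

Checking the value when the user enters it gives a clearer message than an error raised from inside the FFT routines.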