Contributions:AudioExtension

From BCI2000 Wiki

Revision as of 20:50, 11 June 2012

Synopsis

An environment extension which manages multichannel, low-latency audio I/O.

Location

http://www.bci2000.org/svn/trunk/src/contrib/Extensions/AudioExtension

Versioning

Authors

Griffin Milsap (griffin.milsap@gmail.com)

Version History

06/11/2012: Initial public release

Source Code Revisions

  • Initial development: 4095
  • Tested under: 4095
  • Known to compile under: 4095
  • Broken since: --

Todo

  • Fix Known Issues
  • Add per-sample resolution to envelopes

Known Issues

  • Leaving the module running for long periods in a halted state causes a long interval with no state logging before the signal goes to real time. This interval seems to be unrelated to how long the system was left running (~12-15 seconds) -- it is not clear whether this is an issue with the extension itself or with the bcievent interface.
  • Bandpass filtering in filterbanks doesn't appear to function

Functional Description

Experiments which require audio input or real-time audio synthesis based on system state are now possible with the AudioExtension. This extension is capable of recording multiple channels of audio input, synthesizing tones or noise, and reading encoded audio files. These channels are input to a mixing matrix which mixes those inputs to multiple channels of audio output. Both input and output are run through a simple filterbank, then they have their envelope extracted and logged into states via the bcievent interface. Audio input and output channels can be recorded into audio files losslessly and can be resynchronized offline. The mixing matrix is a matrix of expressions which can be used to dynamically change audio mixing based on the system state.

Integration into BCI2000

Compile the extension into your source module by enabling contributed extensions in your CMake configuration. You can do this by going into your root build folder and deleting CMakeCache.txt and re-running the project batch file, or by running cmake -i and enabling BUILD_AUDIOEXTENSION. Once the extension is built into the source module, enable it by starting the source module with the --EnableAudioExtension=1 command line argument.

Block Diagram

AudioExtensionBlockDiagram.png

Parameters

The AudioExtension is configured in the Source tab within the AudioExtension section. The configurable parameters are:

  • EnableAudioExtension - Enables/Disables the AudioExtension. This parameter performs double-duty as an audio host API selector. The following values of this parameter are valid. NOTE: Not all audio APIs are available on all platforms.
    • [0] - Disabled
    • [1] - DirectSound
    • [2] - MME
    • [3] - ASIO
    • [4] - SoundManager
    • [5] - CoreAudio
    • [6] - Disabled
    • [7] - OSS
    • [8] - ALSA
    • [9] - AL
    • [10] - BeOS
    • [11] - WDMKS
    • [12] - JACK
    • [13] - WASAPI
    • [14] - AudioScienceHPI
  • AudioMixer - This matrix of expressions mixes inputs (rows) to outputs (columns). It must be dimensioned with exactly n columns, where n is the number of outputs. Row labels define the input source; change a row label by double-clicking on the row. The following inputs are valid row labels.
    • X - This is automatically interpreted as INPUT[X]
    • INPUT[X] - This input will come from channel X on the sound card input.
    • FILE[X] - This input will come from channel X in the AudioInputFile.
    • TONE[X] - This input will be a synthesized sine wave with the frequency of X Hz.
    • NOISE[X] - This input will be white noise generated at X Hz. NOTE: NOISE[] is white noise at the audio sampling rate (which defaults to 44100 Hz)
  • AudioInputDevice - The index for the device to use as the audio input device on the current Host API. See the operator log after "Set Config" for valid device indices on the selected host API. A value of -1 for this parameter selects the default input device on this host API.
  • AudioOutputDevice - The index for the device to use as the audio output device on the current Host API. See the operator log after "Set Config" for valid device indices on the selected host API. A value of -1 for this parameter selects the default output device on this host API.
  • AudioInputFile - Audio file to use as audio input to AudioMixer. The selected file can have any non-zero number of channels and be encoded in almost any format (except MP3), but MUST be encoded at 44100 Hz.
  • AudioRecordInput - Enables/Disables recording of audio data to a file in the DataDirectory.
  • AudioRecordOutput - Enables/Disables recording of audio data to a file in the DataDirectory.
  • AudioRecordingFormat - Changes the file format and encoding options of the recorded output files. This parameter has the following three options:
    • Raw - Records to 16 bit Microsoft formatted WAV files with no compression. These files open directly in MATLAB if that's interesting to you.
    • Lossless - Records to FLAC formatted files. These files are slightly smaller than RAW files, but have no quality loss.
    • Lossy - Records to Ogg Vorbis files. These files are similar to MP3 but do not have the associated licensing issues. They are compressed using a lossy algorithm, so the resulting files are very small but sound slightly worse than lossless encoding. This format is good for long recordings where perfect quality is not necessary.
  • Audio[Input/Output]Filterbank - A filterbank which filters audio input and output before rectification/smoothing for envelope extraction. These Butterworth filters are not applied to the audible signal. The format of the filterbank is as follows:
    • Type - The characteristic of the filter. The following values are valid.
      • Lowpass - Creates a low pass filter
      • Highpass - Creates a high pass filter
      • Bandpass - Creates a band pass filter (see Known Issues)
      • Bandstop - Creates a band stop, or notch filter
    • Order - The order of the filter model. Higher order filters are more accurate but more expensive computationally.
    • Cutoff1 - The cutoff frequency for Lowpass and Highpass filters, and the cut-on frequency for Bandpass and Bandstop filters.
    • Cutoff2 - The cut-off frequency for Bandpass and Bandstop filters.

The filterbank matrix can have as many rows as necessary to filter the signal. The filters can be listed in any order; their transfer functions are multiplied, so the rows act as a single cascaded filter applied before envelope extraction.

  • AudioEnvelopeSmoothing - The cutoff frequency for the low pass filter which is applied to the filtered and full-wave rectified audio data. This should be set to the highest frequency you want to see in the resulting audio envelope.

State Variables

Unless otherwise specified, all states are prefixed with Eyetracker<Left/Right>Eye, which corresponds to each individual eye. The EyetrackerLogger extension does not support subjects with more than two eyes at the moment.

GazeX, GazeY

The eye gaze position (where - on the screen - the subject is looking) is returned from the Tobii SDK as 32 bit floating point numbers which (roughly) range from 0.0 to 1.0. They are multiplied by 65535 and stored as 16 bit integers in these states if the LogGazeData parameter is enabled. (0,0) corresponds to the top left of the screen, and (65535,65535) corresponds to the bottom right of the screen. -- See EyetrackerStatesOK.

PosX, PosY

The eye position relative to the camera in 2D space is returned if LogEyePos is enabled. Again, these are returned from the library as floating point numbers from 0.0 to 1.0 and are scaled to 16 bit integer values from 0 to 65535. (0,0) corresponds to the top left of the camera's view, and (65535,65535) corresponds to the bottom right of the camera's view.

PupilSize

The pupil size in mm is saved in this state if LogPupilSize is enabled. It corresponds to the length of the longest chord drawn from one side of the pupil to the other. The size will change depending on the eye position and distance from the screen. Although it is given in mm, it would be best to use this as a relative measurement.

EyeDist

The distance between the screen and the eyes in mm is saved in this state if LogEyeDist is enabled. This measurement is an approximation; the actual value will depend on whether the test subject is wearing glasses.

EyeValidity

This state is a number from 0 to 4 and is documented in the Tobii SDK manual. It is repeated here for convenience.

  • 0 - The eye tracker is certain that the data for this eye is right. There is no risk of confusing data from the other eye.
  • 1 - The eye tracker has only recorded one eye and made some assumptions and estimations regarding which is the left and which is the right eye. However, it is still very likely that the assumption made is correct. The validity code for the other eye is in this case always set to 3.
  • 2 - The eye tracker has only recorded one eye, and has no way of determining which one is the left eye and which one is the right eye. The validity code for both eyes is set to 2.
  • 3 - The eye tracker is fairly confident that the actual gaze data belongs to the other eye. The other eye will always have validity code 1.
  • 4 - The actual gaze data is missing or definitely belonging to the other eye.
Code (Right - Left)   Description
0 - 0                 Both eyes found. Data is valid for both eyes.
0 - 4 or 4 - 0        One eye found. Gaze data is the same for both eyes.
1 - 3 or 3 - 1        One eye found. Gaze data is the same for both eyes.
2 - 2                 One eye found. Gaze data is the same for both eyes.
4 - 4                 No eye found. Gaze data for both eyes is invalid.

It'd probably be wise to remove all data points with a validity state of 2 or higher while running your analysis.

EyetrackerStatesOK

Early versions of the extension didn't take into account that the library may return a number greater than 1.0 or less than 0.0. This resulted in "pac-man" style wraparound of gaze coordinates in BCI2000 2.0 and crashes in 3.0. If the output from the library is out of bounds, it is clamped to the boundaries and the "EyetrackerStatesOK" state is changed. A value of "1" corresponds to valid gaze data; a value of "0" corresponds to invalid, clamped gaze data. Use the "GazeOffset" and "GazeScale" parameters to avoid clamping: they scale and offset the data so that when it goes out of range, it can still fit into the 16 bit state.

See also

User Reference:Logging Input, Contributions:Extensions