Question about P3Speller Classification

victor_uva
Posts: 9
Joined: 13 Jan 2014, 07:13

Post by victor_uva » 22 Jul 2016, 04:39

Hi all,

I have long been unsure about the methodology BCI2000 uses for ERP classification, specifically in P3Speller tasks, and it has now become really important for me to understand how it works.

When I opened "Association.cpp" (the MostLikelyTarget function) to check how the system determines the selected target, I saw that it computes a score for each matrix item, and that these scores decide which item is selected. For example, for a 5x3 matrix, I get something like this (for one selected character):
i->first i->second
Item 1: -53.3382
Item 2: 850.717
Item 3: -313.426
Item 4: 45.4439
Item 5: 170.787
Item 6: 305.532
Item 7: 529.658
Item 8: -61.687
Item 9: 366.503
Item 10: 375.406
Item 11: -61.6867
Item 12: 1055.33 <- This would be the selected one
Item 13: 626.591
Item 14: 734.276
Item 15: -420.557
Thus, the selected item is the one with the maximum score, in this case the 12th item. But what are those scores, and how are they computed? I guess they are related to the "log-likelihood ratio" mentioned here: http://www.bci2000.org/wiki/index.php/P ... Task_Class
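
As a minimal sketch of the selection step described above (this is not the actual code in Association.cpp, and `most_likely_item` is a hypothetical name), picking the item with the maximum score could look like this in Python, using the scores from the listing:

```python
# Scores copied from the listing above (item number -> score).
scores = {
    1: -53.3382, 2: 850.717, 3: -313.426, 4: 45.4439, 5: 170.787,
    6: 305.532, 7: 529.658, 8: -61.687, 9: 366.503, 10: 375.406,
    11: -61.6867, 12: 1055.33, 13: 626.591, 14: 734.276, 15: -420.557,
}


def most_likely_item(item_scores):
    """Return the (item, score) pair with the largest score."""
    return max(item_scores.items(), key=lambda pair: pair[1])


best_item, best_score = most_likely_item(scores)
print(best_item, best_score)  # 12 1055.33
```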

As far as I know, P3Speller uses SWLDA to classify each stimulus. To do that, the system sets a window, typically from 0 to 800 ms after each intensification, and decimates the signal in that window to a lower sampling rate. For illustration, let's say the resulting signal for each stimulus has 16 samples x 8 channels. With 15 sequences and a 5x3 matrix, which has 8 rows and columns in total, we get a data matrix of 120 stimuli (8 rows and columns x 15 sequences) x 128 features (16 samples x 8 channels) for a single character.

What methodology does the system use? I would think that this matrix is fed to an SWLDA classifier. Obviously, the classifier would have been trained beforehand to compute its weight vector and to select up to 60 of the 128 features. In that case, SWLDA would return a 120x1 vector of scores, each indicating how strongly a stimulus belongs to class 1 (presence of an ERP), as a signed distance from the decision boundary. But if that is the case, how does the system turn the 120x1 LDA score vector into the final 15x1 log-likelihood score?
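
To make the dimensions concrete, here is a rough Python sketch of the step assumed in this paragraph: a linear classifier applied to a 120 x 128 feature matrix gives one score per stimulus. The data and weights are random placeholders, `linear_scores` is a hypothetical name, and any bias term is left out; this only illustrates the shapes, not what BCI2000 actually does.

```python
import random

N_SAMPLES, N_CHANNELS = 16, 8          # decimated samples x channels (from the post)
N_ROWS_COLS, N_SEQUENCES = 8, 15       # 5 rows + 3 columns, 15 sequences
N_FEATURES = N_SAMPLES * N_CHANNELS    # 128
N_STIMULI = N_ROWS_COLS * N_SEQUENCES  # 120

rng = random.Random(0)
# Placeholder data: one random feature vector per stimulus (shape only).
features = [[rng.gauss(0, 1) for _ in range(N_FEATURES)]
            for _ in range(N_STIMULI)]
# Placeholder weights: stepwise selection would leave at most 60 non-zero entries.
weights = [0.0] * N_FEATURES
for idx in rng.sample(range(N_FEATURES), 60):
    weights[idx] = rng.gauss(0, 1)


def linear_scores(x, w):
    """One score per stimulus: the dot product of its features with w."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in x]


scores = linear_scores(features, weights)
print(len(features), len(features[0]), len(scores))  # 120 128 120
```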

An intuitive approach would be to accumulate the evidence from the repeated stimuli over the sequences. Summing the LDA scores of all stimuli that belong to the same row or column would give an 8x1 vector containing the likelihood of an ERP in each row and column. But how can that 8x1 vector be turned into a 15x1 vector that gives the likelihood of each character? Is it just a combination of the row and column scores? For instance,

Image

I think that it is more complicated than that. With 15 sequences and LDA scores ranging from -1 to 1, the final item score under this approach would lie in [-30, 30], and I am getting values around 1000...
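
The row-and-column combination guessed at above can be sketched as follows. This is only the hypothesized approach from this post, not confirmed BCI2000 behavior; the function name `item_scores`, the row-major item order, and the -1 to 1 score range are all assumptions made for illustration.

```python
N_ROWS, N_COLS, N_SEQUENCES = 5, 3, 15


def item_scores(row_scores, col_scores):
    """Combine per-sequence row and column scores into one score per item.

    row_scores[s][r] and col_scores[s][c] hold the classifier score of
    row r / column c in sequence s. Items are returned in row-major order.
    """
    row_total = [sum(seq[r] for seq in row_scores) for r in range(N_ROWS)]
    col_total = [sum(seq[c] for seq in col_scores) for c in range(N_COLS)]
    return [row_total[r] + col_total[c]
            for r in range(N_ROWS) for c in range(N_COLS)]


# Extreme case: every flash scores +1, the top of the assumed range.
rows = [[1.0] * N_ROWS for _ in range(N_SEQUENCES)]
cols = [[1.0] * N_COLS for _ in range(N_SEQUENCES)]
totals = item_scores(rows, cols)
print(len(totals), max(totals))  # 15 30.0
```

With every score at -1 instead, the same code gives -30.0, which is where the [-30, 30] range comes from.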

Thanks in advance,
Víctor.

pbrunner
Posts: 344
Joined: 17 Sep 2010, 12:43

Re: Question about P3Speller Classification

Post by pbrunner » 02 Aug 2016, 15:43

Victor,

There is a simple explanation for the LDA scores that you see. When you analyze the data in the P300_GUI or P300Classifier apps, you actually decimate the data, e.g., to 20 Hz. This reduces the dimensionality of the feature space: for the 0 to 800 ms period you have only 16 features per channel, rather than the 160 you would have at a 200 Hz sampling rate. If you look at the linear classifier weights in BCI2000, however, you will see that the weights are repeated several times, which is basically a poor man's way of downsampling. What was forgotten when applying this is to divide each weight by the decimation factor (i.e., the number of times it is repeated). That is why you see these large LDA scores. In effect, this does not change the selected character, because the row and column with the largest LDA score are selected irrespective of the absolute value.
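
A small Python sketch of the scaling effect described above, under the assumption that decimation is a plain block average (the actual P300Classifier filter may differ). The signal and weights are made-up values and all function names are hypothetical. Applying the repeated weights to the undecimated samples gives a score that is larger by the decimation factor:

```python
FACTOR = 10  # 200 Hz data decimated to 20 Hz, as in the example above


def decimate_by_mean(signal, factor):
    """Average consecutive blocks of `factor` samples (assumed decimation)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def repeat_weights(weights, factor):
    """Repeat each decimated-rate weight `factor` times, without rescaling."""
    return [w for w in weights for _ in range(factor)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


signal = [float(i % 7) for i in range(160)]             # 800 ms at 200 Hz
weights = [0.5 if i % 2 else -0.25 for i in range(16)]  # 16 weights at 20 Hz

score_decimated = dot(decimate_by_mean(signal, FACTOR), weights)
score_repeated = dot(signal, repeat_weights(weights, FACTOR))

# The two printed values agree up to floating-point rounding.
print(score_repeated, FACTOR * score_decimated)
```

Dividing each repeated weight by `FACTOR` would bring the two scores back onto the same scale.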

I have not checked this, but it could also be that the LDA scores are summed rather than averaged. As before, this does not affect the selection of the character. You can easily correct this behavior by changing the code in P300Classifier or P300_GUI to write a MUD matrix with weights scaled by 1/(decimation factor), and possibly by averaging rather than summing the LDA scores across the repetitions.
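
To illustrate why neither correction changes the selected character: dividing every score by the same positive constant leaves the position of the maximum unchanged. A short Python sketch with made-up accumulated scores (the values and the `argmax` helper are illustrative, not taken from BCI2000):

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


N_REPETITIONS = 15
DECIMATION_FACTOR = 10

# Made-up summed scores for four candidates.
summed = [120.0, -45.5, 980.25, 310.0]
# Averaging over repetitions and rescaling the weights both divide
# every score by the same positive constant.
rescaled = [s / (N_REPETITIONS * DECIMATION_FACTOR) for s in summed]

print(argmax(summed), argmax(rescaled))  # 2 2
```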

Let me know if this answers your question.

Regards, Peter
