Why do I have to duplicate the labels used in classification process?

compubaby
Posts: 3
Joined: 07 Aug 2014, 16:23

Post by compubaby » 12 Aug 2015, 05:43

Dear All,

I'm using the BCI Competition IV dataset 2a, and I'm working on left- and right-hand movement. I extracted the data of three channels (Fz, Cz, Pz).
I band-pass filtered the data to 8-30 Hz, then normalized it.
I calculated the common spatial patterns of the training dataset using CSP.m from the Biosig toolbox.
I want to use train_sc to classify my data.

My question is: why do I need to duplicate the labels of each event?
For example:
for each right-hand event (750 samples), I must have 750 labels equal to 1
for each left-hand event (750 samples), I must have 750 labels equal to 2

labels = repmat(classlabel',n,1)'; % where n is the number of samples per event
CC = train_sc(features,labels,'LDA');

Does that mean that classification is done on samples rather than on events?
Is each sample classified independently?

The results I got are very disappointing. I've tried many classifiers, but the best accuracy I got was 0.5, which is chance level for two balanced classes.
I need to know what I'm doing wrong.

I've read a lot about CSP and the classification of EEG signals, but I still feel that I'm missing many things :(
Any help, please.

Best Regards,
Sahar Selim

pbrunner
Posts: 344
Joined: 17 Sep 2010, 12:43

Re: Why do I have to duplicate the labels used in classification process?

Post by pbrunner » 12 Aug 2015, 09:43

Compubaby,

Your question may be too specific to answer directly without debugging your code.

From your description, I see that you are treating the dataset as a black box with input and output variables, which makes it difficult to tell what goes wrong in your procedure. To overcome this, you could add a few steps to your procedure in which you extract and visualize the physiological effect. In other words, you would use the EEGLAB topoplot function to visualize the statistical difference (e.g., r, r², or z-score) between the two conditions. You would then calculate and visualize your CSP filter. Once you know what to expect from the CSP filter, it will be easier to judge whether it works properly.
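The statistical comparison described above could be sketched roughly as follows. This is only an illustration on synthetic stand-in data: the array layout [channels x trials x samples], the variable names, and the use of log-variance as the per-trial feature are all assumptions, not something taken from the dataset in question. It computes a signed r² per channel between the two conditions.

```matlab
% Sketch: signed r^2 per channel between two conditions (synthetic data).
% Assumed layout: [channels x trials x samples].
nch = 3; ntr = 72; npt = 750;
X1 = randn(nch, ntr, npt);            % condition 1 (stand-in for right hand)
X2 = randn(nch, ntr, npt);            % condition 2 (stand-in for left hand)
X2(2, :, :) = 1.5 * X2(2, :, :);      % inject a power difference on channel 2

% One value per trial and channel: log of the signal variance (band power).
P1 = log(var(X1, 0, 3));              % [nch x ntr]
P2 = log(var(X2, 0, 3));              % [nch x ntr]

y = [ones(ntr, 1); 2 * ones(ntr, 1)]; % one label per trial
r2 = zeros(nch, 1);
for ch = 1:nch
    f = [P1(ch, :), P2(ch, :)]';      % feature of this channel, all trials
    c = corrcoef([f, y]);             % 2 x 2 correlation matrix
    r2(ch) = sign(c(1, 2)) * c(1, 2)^2;   % signed r^2
end
disp(r2');
```

Only channel 2 should stand out here, because that is where the difference was injected. With real data, the r2 vector is the kind of per-channel value that could be passed to topoplot together with the channel locations.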

Regards, Peter

compubaby
Posts: 3
Joined: 07 Aug 2014, 16:23

Re: Why do I have to duplicate the labels used in classification process?

Post by compubaby » 12 Aug 2015, 09:56

Hello Peter,

This is my code:

Code:

% This dataset is filtered [8-30] & normalized
clear;
load 'G:\Matlab\sample\2015-06-12\20150708_FilteredNormalized';
nch = size(normTrain,1);
nof = 3;
npt = 750;

% Concatenate all epochs of each class along the sample axis:
% C1 holds every epoch of the first class, C2 every epoch of the second.
% Each array is [#samples x #channels].

%Class 1
A1 = normTrain(:, 1:72, :);%Extract data of Class1 (Right Hand)
B1 = permute(A1,[1 3 2]);
C1 = reshape(B1,nch,[],1);
C1 = permute(C1,[2 1]);

%Class 2
A2 = normTrain(:, 73:144, :);%Extract data of Class2 (Left Hand)
B2 = permute(A2,[1 3 2]);
C2 = reshape(B2,nch,[],1);
C2 = permute(C2,[2 1]);

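%Labels
% One label per sample, in the same order as the rows of [C1;C2]:
% all npt samples of trial 1, then all npt samples of trial 2, and so on.
% (The earlier repmat(...)' form cycles through the trials instead,
% so the labels no longer line up with the feature rows.)
lbl = sortedTrainLabel(1:144);
Trainlabels = reshape(repmat(lbl(:)', npt, 1), [], 1);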

%Test Data
T1 = normTest(:, 1:144, :);
T2 = permute(T1,[1 3 2]);
Test = reshape(T2,nch,[],1);
Test = permute(Test,[2 1]);

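%Labels
% Same expansion as for training: npt consecutive copies of each trial's label,
% matching the row order of Test.
lbl = sortedTestLabel(1:144);
Testlabels = reshape(repmat(lbl(:)', npt, 1), [], 1);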

%Calculate CSP for Feature Extraction
%[V] = csp(ECM, 'CSP3');

[V] = csp(C1, C2);

%^^^^^^^^^^^^^^   Biosig Classification ^^^^^^^^^^^^
%*****************************************
%                 Training
%*****************************************

X1 = C1*V;
X2 = C2*V;
features = [X1;X2];


MODE = 'LDA'; % see full list in train_sc.m
CC = train_sc(features,Trainlabels,MODE);

%*****************************************
%                 Testing
%*****************************************

features1 = Test*V;
[R] = test_sc(CC,features1,MODE,Testlabels);
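For comparison with the sample-wise features above, a common way to use CSP is to compute one feature vector per trial, typically the log-variance of each spatially filtered component. The sketch below is an illustration under assumptions: the data are random stand-ins with the layout [channels x trials x samples], and V is a placeholder identity matrix where the filters returned by csp would go (one filter per column is assumed).

```matlab
% Sketch: one log-variance feature vector per trial instead of one per sample.
% Stand-in data, assumed layout [channels x trials x samples].
nch = 3; ntr = 144; npt = 750;
data = randn(nch, ntr, npt);
V = eye(nch);        % placeholder; the real V would come from csp(C1, C2)

features = zeros(ntr, size(V, 2));
for k = 1:ntr
    trial = squeeze(data(:, k, :))';   % [samples x channels] for trial k
    Z = trial * V;                     % spatially filtered trial
    v = var(Z, 0, 1);                  % variance of each component
    features(k, :) = log(v / sum(v));  % normalized log-variance
end
disp(size(features));                  % one row per trial
```

A feature matrix built this way has one row per trial, so it would be paired with one label per trial and no label duplication would be needed.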

pbrunner
Posts: 344
Joined: 17 Sep 2010, 12:43

Re: Why do I have to duplicate the labels used in classification process?

Post by pbrunner » 14 Aug 2015, 10:54

Compubaby,

It is difficult to diagnose the problem in your code from a distance. What I would suggest, though, is that you verify your code step by step using synthetic data. For this, you can generate a two-dimensional dataset with two Gaussian distributions, one for each class. Create it such that no threshold on the x or y axis can separate the classes. In other words, generate two skewed Gaussian distributions that are offset along the x-y diagonal in your plot. The CSP should then transform this into an orthogonal feature space that is separable by a simple x or y threshold. Next, you can verify your classification and cross-validation by simply increasing the overlap between the two classes.
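A synthetic check along these lines could look like the sketch below. It is a variation on the setup described above, not the same thing: because CSP is driven by second-order statistics, the two classes here differ in variance along the two diagonals rather than in their means. The decomposition uses the generalized eigenvalue problem via eig as a stand-in for a CSP routine; all names and numbers are assumptions chosen for illustration.

```matlab
% Sketch: sanity check of a CSP-style decomposition on synthetic 2-D data.
% The classes differ in variance along the diagonals, not along x or y.
n = 5000;
R = [1 1; -1 1] / sqrt(2);             % rows are the two diagonal directions
X1 = randn(n, 2) * diag([3 1]) * R;    % class 1: elongated along (1, 1)
X2 = randn(n, 2) * diag([1 3]) * R;    % class 2: elongated along (-1, 1)

% On the raw axes both classes have about the same variance.
ratio_raw = var(X1) ./ var(X2);

% CSP-style filters from the generalized eigenvalue problem C1*w = lambda*(C1+C2)*w.
C1 = cov(X1);  C1 = (C1 + C1') / 2;    % enforce exact symmetry
C2 = cov(X2);  C2 = (C2 + C2') / 2;
[W, D] = eig(C1, C1 + C2);
[lambda, idx] = sort(diag(D));         % share of class-1 variance per component
W = W(:, idx);

% After filtering, each component is dominated by one class.
ratio_csp = var(X1 * W) ./ var(X2 * W);
disp(ratio_raw);
disp(ratio_csp);
```

If the decomposition works, ratio_raw stays close to 1 on both axes while ratio_csp moves far from 1 in opposite directions for the two components. Shrinking the difference between the 3 and the 1 in the diag calls is one way to increase the overlap between the classes.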

Regards, Peter
