2.1 Auditory Paradigm
In our design of an EEG-based BCI for persons with LIS, we use an auditory P300 “speller” in which the user responds to 5 different words. We use 2 types of questions: “yes/no” questions, and questions whose answers are names of persons, cities, etc. In “yes/no” questions, either “yes” or “no” is the desired (“target”) answer, while 3 additional “dummy” (“non-target”) words are added so that the sequence resembles an oddball paradigm. The words were chosen to be short and of similar duration. In questions with names, one name is the desired (“target”) name, while the others are “non-target” names. More than two answers were used because the P300 amplitude is inversely related to the relative probability of the evoking stimulus and directly related to its task relevance (Kachenoura et al., 2008).
During the experiment, the possible answers (stimuli) were normalized and played at a moderate sound level with an inter-stimulus interval of 1 second, and the stimuli were randomly shuffled within each segment. This inter-stimulus interval is much longer than the interval usually used in visual experiments (around 300 ms). This is an obvious disadvantage of the auditory paradigm, since a long inter-stimulus interval increases the time needed to obtain an answer. However, for persons with LIS, the primary purpose of the communicator is to establish elementary communication, or even to determine the state of consciousness, while the speed of communication is of secondary importance. The total time for playing back the answers for one question was around 3 minutes.
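The timing described above can be illustrated with a minimal sketch. It assumes that one segment contains each of the 5 words exactly once, that segments are repeated 30 times (the repetition count given in Section 2.2), and that the 1 s interval is measured from onset to onset. The function name and the placeholder words are hypothetical and are not taken from the experimental software.

```python
import random

def build_stimulus_sequence(words, n_repetitions=30, isi_s=1.0, seed=None):
    """Return a list of (onset_in_seconds, word) pairs.

    Each segment contains every word once, in random order (assumption).
    """
    rng = random.Random(seed)
    sequence = []
    onset = 0.0
    for _ in range(n_repetitions):
        segment = list(words)
        rng.shuffle(segment)          # random order inside one segment
        for word in segment:
            sequence.append((onset, word))
            onset += isi_s            # onset-to-onset interval (assumption)
    return sequence

# Placeholder answer set: one target, four non-targets.
words = ["yes", "no", "dummy1", "dummy2", "dummy3"]
isi_s = 1.0
seq = build_stimulus_sequence(words, n_repetitions=30, isi_s=isi_s, seed=0)

total_s = len(seq) * isi_s
target_probability = 1 / len(words)
print(len(seq), "stimuli,", total_s / 60, "minutes, target probability", target_probability)
```

Under these assumptions a question takes 150 s (2.5 minutes) of stimulation, which is consistent with the reported total of around 3 minutes, and each stimulus is a target with probability 0.2.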
“Yes/no” questions are the simplest ones, providing binary information only. Questions with names were added for two reasons. First, they allow multiple choice (e.g. questions like “Which part of your body hurts?” or “What would you like to drink/eat?” could be asked in the future). Second, when it is necessary to establish the state of consciousness of the person, familiar names can elicit a response more easily than unfamiliar names (Schnakers et al., 2008), and they can therefore serve in the initial phase of use of the communicator by a person with complete LIS. Several types of attention enhancement were also tried. In Schnakers et al. (2008) it was established that when persons were instructed to mentally count the occurrences of the “target”, the P300 response was more pronounced than when they only listened passively. In order to test other options, finger counting, tapping and mental tapping were tested as well. In the finger counting type of response, the subject was asked to count the correct answers on his fingers. In tapping, the subject was supposed to press a key (on a regular PC keyboard), using only one finger, every time he heard the correct answer. In the mental tapping type of response, the subject was asked to imagine himself tapping the key on a keyboard. Although the options involving movement of the hand or fingers are not available to persons with LIS, they were tested in order to establish whether these types of response have any potential to improve P300 detection.
2.2 Hardware and Software
The EEG signals were obtained with Emotiv’s Epoc EEG headset. This EEG device was selected because of its low cost and mobility, and also because its simple and fast electrode placement is suitable for everyday use while following the international 10–20 system of electrode placement. The data recorded using the headset are of good EEG quality and comparable to the data obtained with research-grade devices (Badcock et al., 2013). Nonetheless, a comparison between the results obtained with the Emotiv Epoc and with a research-grade device was conducted and is presented in Section 4.
The Emotiv Epoc has 14-bit resolution. EEG data are internally sampled at 2048 Hz and then down-sampled to 128 Hz, the signal bandwidth is 0.2–43 Hz, and digital notch filters at 50 Hz and 60 Hz are applied. The EEG headset consists of 14 gold-plated electrodes which make direct contact with the detachable electrode tips. These electrode tips have foam which is soaked in saline solution and sits on the scalp, adapting to the scalp topology. The saline solution acts as a good conductive medium for small voltage fluctuations. The electrodes are pre-attached to the headset with fixed arms that point at the locations AF3, F3, F7, FC5, T7, P7, O1, O2, P8, T8, FC6, F8, F4 and AF4, with Common Mode Sense (CMS) at the left mastoid M1 and Driven Right Leg (DRL) at the right mastoid M2. Since some electrode positions typically used in P300 detection (e.g. Cz and electrodes in the parietal region) are not available, tests were conducted in order to verify whether the present configuration of electrodes could produce reliable results.
Data acquisition and processing are done in the OpenViBE software platform (Renard et al., 2010). The OpenViBE acquisition client is used to obtain raw EEG data, and the algorithm for EEG data processing was built using the OpenViBE Designer. Apart from being free open-source software, OpenViBE allows us to run the classifier and the whole BCI software in real time as the data are acquired from the EEG headset. Data preprocessing includes additional filtering using a 4th order Butterworth band-pass filter with a lower cutoff frequency of 1 Hz and an upper cutoff frequency of 20 Hz, with a pass band ripple of 0.5 dB, and down-sampling by a factor of 4 (from 128 Hz to 32 Hz). An epoch, a slice of 1 second of data after stimulus onset, is considered in the following analyses. No artifact removal was done at this stage, although it can improve the overall performance of the system (Minguillon et al., 2017). Manual artifact removal was not done since the proposed communicator should work in everyday situations where data analysis experts are not available. Automatic artifact removal was not applied since all the data processing needs to be done in real time. Instead, the answers are repeated 30 times so that the result is less affected when some of the segments are corrupted by artifacts.
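The preprocessing chain described above can be sketched offline with SciPy. This is an illustration under stated assumptions, not the OpenViBE scenario used in the study: the filter is applied causally, down-sampling simply keeps every fourth sample, any additional anti-aliasing done by the actual pipeline is not modeled, and the 0.5 dB ripple setting is ignored because a Butterworth design has no ripple parameter. Note that `scipy.signal.butter` with `N=4` and a band-pass type yields order 4 per band edge, which may differ from the OpenViBE definition of filter order. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS_RAW = 128      # Hz, output rate of the headset
DECIM = 4         # down-sampling factor, giving 32 Hz
EPOCH_S = 1.0     # epoch length after stimulus onset, in seconds

def preprocess(raw, fs=FS_RAW):
    """Band-pass 1-20 Hz, then keep every DECIM-th sample.

    raw: array of shape (n_samples, n_channels).
    Returns the filtered, down-sampled data and the new sampling rate.
    """
    sos = butter(4, [1.0, 20.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfilt(sos, raw, axis=0)   # causal, usable sample by sample
    return filtered[::DECIM], fs // DECIM

def extract_epochs(data, fs, onsets_s, epoch_s=EPOCH_S):
    """Cut one epoch of epoch_s seconds after each stimulus onset."""
    n = int(round(epoch_s * fs))
    starts = [int(round(t * fs)) for t in onsets_s]
    return np.stack([data[s:s + n] for s in starts if s + n <= len(data)])

# Synthetic stand-in for 10 s of 14-channel EEG.
rng = np.random.default_rng(0)
raw = rng.standard_normal((FS_RAW * 10, 14))
data, fs = preprocess(raw)
epochs = extract_epochs(data, fs, onsets_s=[0.0, 1.0, 2.0])
print(fs, data.shape, epochs.shape)
```

With these settings each epoch holds 32 samples per channel, so a 14-channel epoch has 448 values before any spatial filtering.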
A Linear Discriminant Analysis (LDA) classifier (Hoffmann et al., 2008) is used to classify the responses of the user. The LDA classifier computes a discriminant vector that separates the “target” and “non-target” classes. It is simple to use because it has no hyperparameters that need to be tuned. It has also been established that the LDA classifier gives good results for P300 classification (Krusienski et al., 2008).
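The discriminant vector mentioned above can be illustrated with a minimal two-class LDA in NumPy. This is a sketch under assumptions, not the OpenViBE classifier: the threshold is placed midway between the class means (equal priors), a small ridge term is added to the covariance for numerical stability, and the data are synthetic. The function names are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class LDA.

    X: (n_epochs, n_features) feature matrix, y: 1 for target, 0 for non-target.
    Returns the discriminant vector w and the bias b.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance.
    Sw = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    Sw += 1e-6 * np.eye(Sw.shape[0])     # small ridge (assumption)
    w = np.linalg.solve(Sw, mu1 - mu0)   # discriminant vector
    b = -0.5 * w @ (mu0 + mu1)           # midpoint threshold, equal priors (assumption)
    return w, b

def lda_score(X, w, b):
    """Positive scores indicate the target class."""
    return X @ w + b

# Synthetic data with a 4:1 non-target to target ratio, as in a 5-word oddball.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((400, 4))
X1 = rng.standard_normal((100, 4)) + 2.0
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(400, dtype=int), np.ones(100, dtype=int)])

w, b = fit_lda(X, y)
accuracy = np.mean((lda_score(X, w, b) > 0) == (y == 1))
print(accuracy)
```

In a speller setting the scores of all epochs belonging to the same word could be summed over the repetitions and the word with the highest total selected; that decision rule is one common choice and is stated here as an assumption.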
However, this BCI communicator uses the auditory paradigm, which is less reliable than the visual speller paradigm (Lopez-Gordo et al., 2012; Klobassa et al., 2009), and the headset used has a reduced number of electrodes. Therefore, in order to obtain an acceptable classification accuracy, the data need to be further enhanced and preprocessed prior to classification. There are numerous contributions dealing with spatial filtering (Rivet et al., 2009; Rivet and Souloumiac, 2013) and blind source separation (Wolpaw et al., 2002; Albera et al., 2008; Xu et al., 2004; Bayliss and Ballard, 1999; Hill et al., 2004; Wang and James, 2006; James and Hesse, 2005) for P300 speller designs. In this case, spatial filtering based on the xDAWN algorithm was used (Rivet and Souloumiac, 2013). The xDAWN algorithm is specially designed for the P300 speller paradigm.
Only the basic idea behind the algorithm is explained here; more detailed explanations can be found in Rivet et al. (2009) and Rivet and Souloumiac (2013). The basic idea is to automatically estimate the P300 subspace from raw EEG signals. P300 evoked potentials are then enhanced by projecting the raw EEG onto the estimated P300 subspace. The algorithm derives its name from the following model for the recorded data $X$:
$$X=DA+N \tag{1}$$
where
$X\in {\mathcal{R}^{({N_{t}}\times {N_{s}})}}$ is the matrix of recorded EEG signals,
${N_{s}}$ is the number of sensors,
${N_{t}}$ is the number of temporal samples,
$D\in {\mathcal{R}^{({N_{t}}\times {N_{e}})}}$ is the Toeplitz matrix defined from the set of stimuli onsets and estimated durations of the ERP,
${N_{e}}$ is the number of temporal samples of the ERP (1 s in our case), and matrix
$N$ is the on-going activity of the user’s brain as well as artifacts.
$DA$ represents the response synchronous with the target stimuli, with $A\in {\mathcal{R}^{({N_{e}}\times {N_{s}})}}$ the matrix of ERP signals. The least-squares estimate of the response $A$, $\hat{A}={({D^{T}}D)^{-1}}{D^{T}}X$, differs from classical epoching (averaging) of the matrix $X$ when the synchronous response $A$ extends over several consecutive stimuli. That is generally the case for visual P300 spellers, but not in our case, where the inter-stimulus interval is 1 s, which excludes the possibility of overlapping responses. The xDAWN algorithm estimates ${N_{f}}$ spatial filters (with ${N_{f}}<{N_{s}}$) in order to enhance the synchronous response:
$$XU=DAU+NU \tag{2}$$
where
$U\in {\mathcal{R}^{({N_{s}}\times {N_{f}})}}$ is the spatial filter matrix. The idea is similar to principal component analysis (PCA) of $\hat{A}$, where the recorded signals are projected onto the ${N_{f}}$ main components associated with the ${N_{f}}$ largest principal values. In the xDAWN algorithm, the spatial filters $U$ are designed to maximize the signal-to-signal-plus-noise ratio. Finally, the model (1) can be rewritten as:
$$X=D{A^{\prime }}{W^{T}}+{N^{\prime }} \tag{3}$$
where
${A^{\prime }}$ is the synchronous response of reduced dimension, $W$ is its spatial distribution over sensors, and ${N^{\prime }}$ is the noise term. The enhanced signals are then computed by:
$$\hat{S}=XU=D{A^{\prime }}{W^{T}}U+{N^{\prime\prime }} \tag{4}$$
where ${N^{\prime\prime }}$ is the matrix that includes noise and artifacts, transformed by spatial filtering. Spatial filters can be computed by two QR factorizations and one singular value decomposition, or by using the generalized eigenvalue decomposition of a pair of covariance matrices (Rivet and Souloumiac, 2013).
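The generalized eigenvalue route can be sketched in NumPy as follows. This is a simplified illustration on synthetic data, not the OpenViBE implementation. It assumes that the filters maximize the ratio of the evoked energy ${U^{T}}{\hat{A}^{T}}{D^{T}}D\hat{A}U$ to the total energy ${U^{T}}{X^{T}}XU$, and it solves the generalized eigenproblem by whitening with a Cholesky factor. The function names and the synthetic ERP are hypothetical.

```python
import numpy as np

def toeplitz_onsets(n_t, onsets, n_e):
    """Build D (n_t x n_e) with D[t0 + k, k] = 1 for every target onset t0."""
    D = np.zeros((n_t, n_e))
    for t0 in onsets:
        k = np.arange(min(n_e, n_t - t0))
        D[t0 + k, k] = 1.0
    return D

def xdawn_filters(X, D, n_f):
    """Return U (n_s x n_f) and the least-squares ERP estimate A_hat."""
    A_hat, *_ = np.linalg.lstsq(D, X, rcond=None)   # (D^T D)^{-1} D^T X
    S_evoked = A_hat.T @ (D.T @ D) @ A_hat          # energy of the evoked part
    S_total = X.T @ X                               # signal plus noise energy
    # Solve S_evoked u = lambda S_total u by whitening with S_total = L L^T.
    L = np.linalg.cholesky(S_total)
    L_inv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(L_inv @ S_evoked @ L_inv.T)
    order = np.argsort(vals)[::-1][:n_f]            # largest ratios first
    return L_inv.T @ vecs[:, order], A_hat

# Synthetic recording: 60 s at 32 Hz, 14 sensors, one target every 5 s.
rng = np.random.default_rng(0)
n_t, n_s, n_e = 32 * 60, 14, 32
onsets = np.arange(0, n_t - n_e, 160)
erp = np.exp(-0.5 * ((np.arange(n_e) - 10) / 3.0) ** 2)   # bump near 300 ms
A_true = np.outer(erp, rng.standard_normal(n_s))
D = toeplitz_onsets(n_t, onsets, n_e)
X = D @ A_true + rng.standard_normal((n_t, n_s))

U, A_hat = xdawn_filters(X, D, n_f=2)
S_hat = X @ U                                       # enhanced signals
epoch_mean = np.mean([X[t:t + n_e] for t in onsets], axis=0)
print(U.shape, S_hat.shape, np.allclose(A_hat, epoch_mean))
```

Because the target onsets in this example never overlap, ${D^{T}}D$ is a multiple of the identity and $\hat{A}$ coincides with the plain average of the target epochs, which matches the remark above about the 1 s inter-stimulus interval.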