Brain Computer Interface Based Communicator for Persons in Locked-in State

Brain Computer Interfaces (BCI) are devices that use brain signals for control or communication. Since they do not require movement of any part of the body, BCIs are the natural choice for assisted communication when a person is unable to move. In this article, a BCI based communicator for persons in locked-in state is described. It is based on the P300 brain response of the user and thus does not require prior training, movement or imagination of movement. An auditory paradigm is selected so that the communicator can be applied in cases where visual ability is also impaired. The communicator was also designed to test whether low cost hardware with a reduced electrode set could be used efficiently in an everyday environment, without the need for expert personnel. The design of the communicator is described first, followed by detailed analyses of its performance when used by healthy and disabled subjects. It is shown that the auditory paradigm is the primary factor that limits the accuracy of communication. Hardware characteristics and the reduced electrode set also reduce the accuracy, while different question and answer types produce no major differences.


Introduction
Locked-in syndrome (LIS) is the state in which a person cannot consciously control his or her own body, nor communicate in any way with others (Wolpaw et al., 2002; Birbaumer and Cohen, 2007). This state could be the consequence of a progressive disease, such as amyotrophic lateral sclerosis (ALS), or the consequence of a stroke or a brain injury due to an accident. For some of these patients it is hard to establish the state of consciousness because they are unable to show others that they are conscious and that they understand what others are saying. Therefore, the rates of misdiagnosis, where such patients' states have been confused with either a coma or a semi-coma, are very high (Schnakers et al., 2008; Guger et al., 2018). In these states, complete immobility is a big problem, but an even bigger problem is the inability to communicate with others (Sellers and Donchin, 2006).
Brain Computer Interfaces (BCI) use neurological signals originating in the brain to control external devices or computers (Wolpaw et al., 2002; Birbaumer and Cohen, 2007) and are therefore the natural choice for assisted communication when a person is unable to move. Neurological electrical signals can be recorded in different ways: from the scalp using electroencephalography (EEG), or from the dura mater or cortical surface using electrocorticography (ECoG). Other types of signals and recording devices, such as magnetoencephalography (MEG), positron emission tomography (PET), functional magnetic resonance imaging (fMRI), electromyogram (EMG), and optical imaging such as near infrared spectroscopy (NIRS), are also used. The approaches using EMG have the advantage of using rather small equipment while being very sensitive and able to detect the tiniest muscle movements. In Lesenfants et al. (2016) the authors proposed an EMG-based method which enabled them to assess the consciousness of a patient. However, if no voluntary motor control is possible, then an EMG-based approach cannot be used. For patients with preserved eye movements, eye tracking devices are often a very successful approach to establishing communication. In recent years eye tracking devices have become increasingly advanced and allow the system to determine precisely which part of the computer screen the person is looking at. These devices, in combination with augmentative and alternative communication (AAC) software, such as Tobii Dynavox Communicator 5 or Grid 3, enable a person to communicate efficiently by using their eyes only. Among all the above-mentioned technologies, EEG is the primary choice for BCIs for persons with LIS that are to be used at home or on a daily basis, due to its non-invasive nature and small size.
Research on BCIs based on EEG signals primarily uses slow cortical potentials (SCPs) (Birbaumer et al., 2000), periodic signals in the 8-30 Hz frequency band (mu rhythm or sensorimotor rhythm, SMR) (Kevric and Subasi, 2017; Nguyen et al., 2015) generated in the sensorimotor cortex before movement or during imagination of movement, and event-related brain potentials (ERPs), primarily P300 (Birbaumer and Cohen, 2007; Shen et al., 2015; Sellers et al., 2014). P300 is a positive change in EEG signals originating over the parietal cortex of the brain, and occurs around 300 ms after the expected event or stimulus takes place. P300 is the optimal choice for EEG based BCIs for persons with LIS since it requires neither training nor imagination of movement. The latter feature is important since it is probable that EEG signals related to movement deteriorate and vanish during prolonged immobility (Birbaumer and Cohen, 2007). Farwell and Donchin (1988) first used the "oddball" paradigm to elicit the P300 response: an "oddball" is a relatively rare but "targeted" stimulus presented within a series of frequently occurring "non-target" stimuli (Sellers and Donchin, 2006; Sellers et al., 2014; Farwell and Donchin, 1988). The most frequently used experimental set-up is a 6 × 6 matrix of letters where rows and columns are randomly illuminated. The user focuses on a desired letter, and when the row or column with the desired letter is illuminated, a P300 response occurs. Various designs exist, and recently a four-choice display was proposed, where the four letters flash and the participant attends to the letter corresponding to the answer to the question: "Y" (yes), "N" (no), "P" (pass), or "E" (end) (Sellers and Donchin, 2006; Sellers et al., 2014). In this type of visual P300 based system, the user must still be able to control eye movements and focus.
When the visual system is not preserved, auditory (Hill et al., 2005), vibro-tactile (Guger et al., 2018), somatosensory P300 based systems and even hybrid systems (Ortner et al., 2017) could be used. The latter systems are slower and their effectiveness has not been thoroughly tested (Birbaumer and Cohen, 2007; Halder et al., 2010).
In this paper, a BCI based communicator for persons with LIS is described. It is an auditory based P300 design and therefore it can be used in the most severe cases of LIS or complete LIS. The design of the communicator is described in the next section, while an analysis of system performance is provided in Section 3. Finally, conclusions are given, along with directions for future research and improvements.

Auditory Paradigm
In our design of an EEG based BCI for persons with LIS, we use an auditory based P300 "speller", where the user responds to 5 different words. We have 2 types of questions: "yes/no" questions, and questions with names of persons, cities, etc. In "yes/no" questions, "yes" or "no" could be the desired ("target") answer, while 3 more "dummy" ("non-target") words are added in order to resemble the oddball paradigm. The words were chosen with the intention of keeping their duration short and similar. In questions with names, one name is the desired or "target" name, while the others are "non-target" names. More than two answers were used since it was established that P300 amplitude is inversely related to the relative probability of the evoking stimulus, and directly related to its task relevance (Kachenoura et al., 2008).
During the experiment the possible answers or stimuli were normalized and played at a moderate sound level, with an inter stimulus interval of 1 second, and the stimuli were randomly shuffled within a segment. This inter stimulus interval is much longer than the interval usually used in visual experiments (around 300 ms). This is an obvious disadvantage of the auditory paradigm, since a long inter stimulus interval increases the time needed to obtain the answer. However, in the case of persons with LIS, the primary scope of the communicator is the establishment of elementary communication or even the determination of the state of consciousness, while the speed of communication is of secondary importance. Total time for reproducing answers for one question was around 3 minutes.
"Yes/no" questions are obviously the simplest ones, providing binary information only. Questions with names are added for two reasons. First, they allow multiple choice (e.g. questions like "Which part of your body hurts?" or "What would you like to drink/eat?" could be asked in the future). Second, in cases when it is necessary to establish the state of consciousness of the person, familiar names could elicit a response more easily than non-familiar names (Schnakers et al., 2008), and therefore they can serve in the initial phase of using the communicator by a person with complete LIS. Also, several types of attention enhancement were tried. In Schnakers et al. (2008) it was established that when persons were instructed to mentally count the number of occurrences of the "target", the P300 response was more pronounced than when subjects were instructed only to pay "mental attention" to it. In order to test other options, finger counting, tapping and mental tapping were tested as well. In the finger counting type of response the subject was asked to count the correct answers on his or her fingers, and in tapping the subject was supposed to press a key (on a regular PC keyboard), using only one finger, every time he or she heard the correct answer. In the mental tapping type of response, the subject was asked to imagine tapping the key on a keyboard. Although the options involving movement of the hand or fingers are not available to persons with LIS, testing was done in order to establish whether these types of response have any potential for improving P300 detection.

Hardware and Software
The EEG signals were obtained with Emotiv's Epoc EEG headset. This EEG device was selected due to its low cost and mobility, but also because of its simple and fast electrode placement suitable for everyday use, while maintaining the 10-20 international standard of electrode placement. The data recorded using the headset are of good EEG quality and comparable to the data obtained with research grade devices (Badcock et al., 2013). Nonetheless, a comparison between the results obtained with the Emotiv Epoc and a research grade device was conducted and is provided in Section 4. The Emotiv Epoc has 14-bit resolution; EEG data are internally sampled at 2048 Hz and then down-sampled to 128 Hz, the signal bandwidth is 0.2-43 Hz, and digital notch filters at 50 Hz and 60 Hz are used. The EEG headset consists of 14 gold plated electrodes which make direct contact with the detachable electrode tips. These electrode tips have foam which is soaked in saline solution and sits on the scalp, adapted to the scalp topology. The saline solution acts as a good conductive medium for small voltage fluctuations. The electrodes are pre-attached to the headset with fixed arms that point at the locations AF3, F3, F7, FC5, T7, P7, O1, O2, P8, T8, FC6, F8, F4 and AF4, with Common Mode Sense (CMS) at the left mastoid M1 and Driven Right Leg (DRL) at the right mastoid M2. Since some electrode positions typically used in P300 detection (e.g. Cz, electrodes in the parietal region) are not available, tests were conducted in order to verify whether the present configuration of electrodes could produce reliable results.
Data acquisition and processing are done in the OpenViBE software platform (Renard et al., 2010). The OpenViBE acquisition client is used to obtain raw EEG data, and the algorithm for EEG data processing was built using the OpenViBE Designer. Apart from being free open source software, OpenViBE allows us to run the classifier and the whole BCI software in real time as the data are acquired from the EEG headset. Data preprocessing includes additional filtering using a 4th order Butterworth band-pass filter with a lower cutoff frequency of 1 Hz and a higher cutoff frequency of 20 Hz, with a pass band ripple of 0.5 dB, and down-sampling by a factor of 4. An epoch, a slice of 1 second of data after stimulus onset, is considered in the following analyses. No artifact removal was done at this stage, although it can improve the overall performance of the system (Minguillon et al., 2017). Manual artifact removal was not done since the proposed communicator should work in everyday situations where data analysis experts are not available. Automatic artifact removal was not applied since all the data processing needs to be done in real time. Also, answers are repeated 30 times so that the result is less affected by discarding some of the segments due to artifacts.
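The epoching step can be illustrated with a minimal sketch in Python with NumPy. It is not the OpenViBE implementation: the function name `extract_epochs` and the array layout (samples × channels) are assumptions made for illustration, and the signal is assumed to be band-pass filtered already. With the 128 Hz output rate and a down-sampling factor of 4, a 1 s epoch contains 32 samples per channel.

```python
import numpy as np

FS_RAW = 128          # headset output rate in Hz (from the text)
DECIMATION = 4        # down-sampling factor (from the text)
EPOCH_SECONDS = 1.0   # epoch length after stimulus onset (from the text)


def extract_epochs(signal, onsets, fs=FS_RAW, decimation=DECIMATION,
                   epoch_seconds=EPOCH_SECONDS):
    """Cut one epoch per stimulus onset and keep every `decimation`-th sample.

    signal : array (n_samples, n_channels), assumed already band-pass filtered
    onsets : iterable of onset sample indices at the raw sampling rate
    Returns an array of shape (n_epochs, samples_per_epoch, n_channels).
    """
    signal = np.asarray(signal, dtype=float)
    length = int(round(epoch_seconds * fs))
    samples_per_epoch = len(range(0, length, decimation))
    epochs = []
    for onset in onsets:
        # Skip epochs that would start before or run past the recording.
        if onset < 0 or onset + length > signal.shape[0]:
            continue
        epochs.append(signal[onset:onset + length:decimation])
    if not epochs:
        return np.empty((0, samples_per_epoch, signal.shape[1]))
    return np.stack(epochs)
```

Each returned epoch can then be passed on to spatial filtering and classification; incomplete epochs at the end of a recording are simply dropped in this sketch.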
A Linear Discriminant Analysis (LDA) classifier (Hoffmann et al., 2008) is used to classify the responses of the user. The LDA classifier computes a discriminant vector that separates the "target" and "non-target" classes. The LDA classifier is simple to use because it has no free parameters to tune. Also, it was established that the LDA classifier gives good results for P300 classification (Krusienski et al., 2008).
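The discriminant vector mentioned above can be sketched for the two-class case as follows. This is a minimal illustration of Fisher's LDA in NumPy, not the classifier shipped with OpenViBE; the function names, the small ridge term added to the covariance and the decision threshold placed midway between the class means (equal priors) are assumptions of this sketch.

```python
import numpy as np


def fit_lda(features, labels):
    """Two-class LDA: labels are 1 for "target" and 0 for "non-target".

    features : array (n_epochs, n_features)
    Returns the discriminant vector w and the bias b.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    x1 = features[labels == 1]
    x0 = features[labels == 0]
    mu1, mu0 = x1.mean(axis=0), x0.mean(axis=0)
    # Pooled within-class covariance of both classes.
    centered = np.vstack([x1 - mu1, x0 - mu0])
    cov = centered.T @ centered / (len(features) - 2)
    # Assumption: a small ridge term keeps the covariance invertible.
    cov += 1e-6 * np.eye(cov.shape[0])
    w = np.linalg.solve(cov, mu1 - mu0)
    # Assumption: threshold midway between the projected class means.
    b = -0.5 * float(w @ (mu1 + mu0))
    return w, b


def predict_lda(features, w, b):
    """Return 1 where the projection falls on the "target" side, else 0."""
    return (np.asarray(features, dtype=float) @ w + b > 0).astype(int)
```

In a P300 setting the feature vector of an epoch would typically be the concatenated, spatially filtered samples of that epoch.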
However, this BCI communicator uses the auditory paradigm, which is less reliable than the visual speller paradigm (Lopez-Gordo et al., 2012; Klobassa et al., 2009), and the headset used has a reduced number of electrodes. Therefore, in order to obtain an acceptable classification accuracy, the data must be further enhanced and preprocessed prior to classification. There are numerous contributions dealing with spatial filtering (Rivet et al., 2009; Rivet and Souloumiac, 2013) and blind source separation (Wolpaw et al., 2002; Albera et al., 2008; Xu et al., 2004; Bayliss and Ballard, 1999; Hill et al., 2004; Wang and James, 2006; James and Hesse, 2005) for P300 speller designs, and in this case spatial filtering based on the xDAWN algorithm was used (Rivet and Souloumiac, 2013). The xDAWN algorithm is specially designed for the P300 speller paradigm.
Only the basic idea behind the algorithm is explained, while more detailed explanations can be found in Rivet et al. (2009), Rivet and Souloumiac (2013). The basic idea is to automatically estimate the P300 subspace from raw EEG signals. P300 evoked potentials are then enhanced by projecting raw EEG on the estimated P300 subspace. The algorithm derives its name from the following model for the recorded data $X$:

$$X = DA + N \qquad (1)$$

where $X \in \mathbb{R}^{N_t \times N_s}$ is the matrix of recorded EEG signals, $N_s$ is the number of sensors, $N_t$ is the number of temporal samples, $D \in \mathbb{R}^{N_t \times N_e}$ is the Toeplitz matrix defined from the set of stimuli onsets and estimated durations of the ERP, $N_e$ is the number of temporal samples of the ERP (1 s in our case), and matrix $N$ is the on-going activity of the user's brain as well as artifacts. $DA$ represents the synchronous response with target stimuli. The least squares estimate of the response $A$, $\hat{A} = (D^T D)^{-1} D^T X$, is different from classical epoching of matrix $X$ when the synchronous response $A$ extends over several consecutive stimuli. That is generally the case for visual P300 spellers, but not in our case, where the inter stimulus interval is 1 s, which excludes the possibility of overlapping responses. The xDAWN algorithm estimates $N_f$ spatial filters (with $N_f < N_s$) in order to enhance the synchronous response:

$$XU = DAU + NU \qquad (2)$$

where $U \in \mathbb{R}^{N_s \times N_f}$ is the spatial filter matrix. The idea is similar to principal component analysis (PCA) of $\hat{A}$, where recorded signals are projected on the $N_f$ main components associated with the $N_f$ largest principal values. In the xDAWN algorithm, the spatial filters $U$ are designed in order to maximize the signal to signal plus noise ratio. Finally, the model (1) can be rewritten as:

$$X = DA'W^T + N' \qquad (3)$$

where $A'$ is the synchronous response of reduced dimension, $W$ is its spatial distribution over sensors, and $N'$ is the noise term. The enhanced signals are then computed by:

$$\hat{S} = XU = DA'W^T U + N'' \qquad (4)$$

where $N''$ is the matrix that includes noise and artifacts, transformed by spatial filtering.
Spatial filters can be computed by two QR factorizations and one singular value decomposition, or by using generalized eigenvalue decomposition of pair of covariance matrices (Rivet and Souloumiac, 2013).
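The generalized eigenvalue route can be sketched in NumPy as below, assuming the model X = DA + N with D built from the target stimulus onsets. This is an illustrative sketch rather than the OpenViBE implementation: the function names are hypothetical, and the generalized eigenproblem for the pair (Â^T D^T D Â, X^T X) is solved through a Cholesky whitening of X^T X, which is one of several equivalent ways to do it.

```python
import numpy as np


def toeplitz_design(onsets, n_t, n_e):
    """Build D (n_t x n_e) with D[t0 + j, j] = 1 for every stimulus onset t0."""
    D = np.zeros((n_t, n_e))
    for t0 in onsets:
        for j in range(n_e):
            if 0 <= t0 + j < n_t:
                D[t0 + j, j] = 1.0
    return D


def xdawn_filters(X, onsets, n_e, n_f):
    """Estimate n_f spatial filters maximizing signal to signal-plus-noise ratio.

    X      : array (n_t, n_s) of recorded EEG (samples x sensors)
    onsets : sample indices of the target stimulus onsets
    n_e    : number of temporal samples of the ERP
    Returns U (n_s, n_f) and the least squares ERP estimate A_hat (n_e, n_s).
    """
    X = np.asarray(X, dtype=float)
    n_t, _ = X.shape
    D = toeplitz_design(onsets, n_t, n_e)
    # Least squares estimate of the synchronous response: (D^T D)^-1 D^T X.
    A_hat = np.linalg.lstsq(D, X, rcond=None)[0]
    S_evoked = A_hat.T @ (D.T @ D) @ A_hat   # covariance of the evoked part
    S_x = X.T @ X                            # covariance of the recorded data
    # Solve S_evoked u = lambda S_x u by whitening with S_x = L L^T.
    L = np.linalg.cholesky(S_x)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_evoked @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(eigvals)[::-1][:n_f]  # keep the largest ratios
    U = L_inv.T @ eigvecs[:, order]
    return U, A_hat
```

The enhanced signals are then obtained as `X @ U`. When the responses do not overlap, as with a 1 s inter stimulus interval, `A_hat` coincides with the plain average of the target epochs.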

Experimental Procedure
The subjects were briefed about the experiment and asked to keep movements to a minimum to avoid artifacts. They were asked to focus on a fixed point in order to minimize eye movement artifacts. The experiment has a training phase and an operational phase. During the training phase, the subject was asked a question whose answer was known a priori, and such data were used to train both the spatial filter and the classifier. The training phase was conducted only once at the beginning of the session. During the operational phase the responses were classified in real time. The subject was asked questions and the classifier would classify the data based on the subject's selection. At the end of each trial healthy subjects were asked to verify the answer, while the disabled subjects were asked only those questions that had a priori known answers.
A series of 5 sessions was conducted with the same subject on different days, and each session comprised 5 questions. Once a question was asked, the 5 answers were played by the system in the form of 30 random segments, in such a way that the last stimulus of the previous segment was never the same as the first stimulus of the next segment. This, together with the shuffling of stimuli within each segment, ensured that the subject was not able to predict the played stimuli. As there were 30 audio events for each kind of answer, there were a total of 150 stimuli per question, which means that 150 epochs were considered for classification.
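The stimulus ordering described above can be sketched as follows. The function name, the seed handling and the placeholder answer words are hypothetical; only the constraints (30 shuffled segments of 5 answers, no immediate repetition across segment boundaries) come from the text.

```python
import random


def make_stimulus_sequence(answers, n_segments=30, seed=None):
    """Return a play order made of n_segments shuffled copies of `answers`.

    The first stimulus of a segment never equals the last stimulus of the
    previous segment, so the same answer is never played twice in a row.
    """
    if len(set(answers)) < 2:
        raise ValueError("at least two distinct answers are required")
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_segments):
        segment = list(answers)
        rng.shuffle(segment)
        # Reshuffle until the boundary constraint is satisfied.
        while sequence and segment[0] == sequence[-1]:
            rng.shuffle(segment)
        sequence.extend(segment)
    return sequence


if __name__ == "__main__":
    # Placeholder words: "yes"/"no" plus three hypothetical dummy words.
    order = make_stimulus_sequence(["yes", "no", "dummy1", "dummy2", "dummy3"])
    print(len(order))  # 30 segments x 5 answers = 150 stimuli
```

With a 1 s inter stimulus interval, the 150 stimuli produced for one question take 150 s to play.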

Analyses of Performance
The system was tested on 39 participants, 7 females and 32 males, aged between 22 and 29. Twenty-five participants were healthy subjects who had not suffered any past physical, neurological or psychiatric disorder, while 14 (12 male and 2 female) participants had some form of disability, varying from highly disabled with no motor control, to locked in (multiple sclerosis, ALS, traumatic brain injury, dystrophy). On average 6 trials (questions) were recorded per subject (Table 1). The different test conditions are explained later in the text.

Table 1. Tests performed for each subject. Number of tests performed with different modes of attention "enhancers" (mental counting (mc), mental attention (ma), finger counting (fc), tapping (tt) and mental tapping (mt)), type of EEG device used and type of questions used are given. Conditions: Traumatic Brain Injury (TBI), Amyotrophic Lateral Sclerosis (ALS), Neuronal Ceroid Lipofuscinosis 2 (CLN2), Muscular Dystrophy (MD), Locked-In Syndrome (LIS).

Disabled subjects had various types of motor impairments. Not all of them were in LIS, but our primary goal was to test the proposed communicator in real world, non-ideal situations (e.g. subjects unable to control movements, head positions that are not fully accessible, lying positions, etc.).
In the following, True Positive rate (TP), False Positive rate (FP) and Accuracy are given. The results are obtained from the OpenViBE classifier. TP rate is calculated as the ratio of the number of positive answers correctly classified to the total number of positive answers. FP rate is similarly calculated as the ratio of the number of negative answers incorrectly classified as positive to the total number of negative answers. Accuracy is calculated as the ratio of the number of correctly classified answers (True Positive and True Negative) to the total number of answers (Positive and Negative).
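These three definitions can be written out as a short sketch. The function name and the 0/1 label encoding (1 for "target", 0 for "non-target") are assumptions made for illustration.

```python
def classification_rates(true_labels, predicted_labels):
    """Return (TP rate, FP rate, accuracy) for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1          # positive answer classified as positive
        elif truth and not predicted:
            fn += 1          # positive answer missed
        elif not truth and predicted:
            fp += 1          # negative answer classified as positive
        else:
            tn += 1          # negative answer classified as negative
    positives, negatives = tp + fn, fp + tn
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    tp_rate = tp / positives
    fp_rate = fp / negatives
    accuracy = (tp + tn) / (positives + negatives)
    return tp_rate, fp_rate, accuracy
```

For example, with 2 positive and 8 negative answers, one correctly detected positive and one false alarm give a TP rate of 0.5, an FP rate of 0.125 and an accuracy of 0.8.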
Overall average accuracy of classification during the operational phase was 75%. This average accuracy is lower than that reported for the visual P300 speller (Sellers and Donchin, 2006; Sellers et al., 2014; Krusienski et al., 2008), which was expected for the auditory paradigm (Halder et al., 2010; Simon et al., 2015). The minimum accuracy obtained was 65%, while the maximum accuracy achieved was 90%. Minimum, average, and maximum accuracy for each subject are given in Table 2. The usual accuracy is around 70-80%, and trained subjects (subjects who attended more than just a few sessions of the experiment) tend to perform better. Table 2 also gives the information transfer rate (ITR) calculated as:

$$\mathrm{ITR} = \log_2 N + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{N - 1}$$

where ITR is given in bits/question, $N$ is the number of possible targets (5 in our case) and $P$ is the classifier accuracy (Wolpaw et al., 1998; Yuan et al., 2013). Low ITR values are obtained, but that was expected since our primary goal is to develop a BCI system that can be used to establish basic communication in cases where more efficient communicator types cannot be used.
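As a worked illustration, the sketch below evaluates the ITR definition of Wolpaw et al. (1998), log2 N + P log2 P + (1 − P) log2((1 − P)/(N − 1)), which is assumed here to be the formula used for Table 2. The function name is hypothetical.

```python
import math


def itr_bits_per_selection(n_targets, accuracy):
    """Wolpaw information transfer rate in bits per selection (question)."""
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_targets)
    # The terms below vanish in the limits P -> 0 and P -> 1.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits


if __name__ == "__main__":
    # N = 5 possible answers and the overall average accuracy of 75%.
    print(round(itr_bits_per_selection(5, 0.75), 2))
```

For N = 5 and P = 0.75 this gives about 1.01 bits per question; at chance level (P = 0.2) the rate is zero, and a perfect classifier would reach log2 5 ≈ 2.32 bits per question.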

Hardware Comparison
As already mentioned in Section 2.2, the Emotiv Epoc was selected due to its low cost, portability and ease of use, while its major disadvantage is the reduced number of electrodes. Therefore, the same system was tested with different hardware and a different electrode set: a QuickAmp amplifier with a set of 32 gel electrodes and a 24-bit A/D converter. In this case, there were more electrodes in the central and parietal regions, which could be beneficial to P300 detection. Table 3 gives the classification accuracy for the two devices, as well as the percentage of true positives (TP) and false positives (FP). Both mean values and standard deviations are given for TP and FP.
As can be seen, the QuickAmp device, with more electrodes, especially in the central and parietal regions, has higher accuracy by more than 4%. More detailed results of the comparison between the two hardware amplifiers and electrode sets are given in Fig. 1. It shows the false positive versus true positive scores of measurements acquired during classifier training in the auditory P300 experiment with the Emotiv Epoc and BrainProducts QuickAmp devices. For each trial or question, 5-fold cross validation was done, and the average values of all cross validation results for each question are given in Fig. 1. Ideally, the classification should give results located in the upper left corner, corresponding to a TP of 1 (or 100%) and an FP of 0 (0%).
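The 5-fold cross validation procedure can be sketched generically as follows. The helper names and the `fit`/`predict` callables are hypothetical placeholders for whatever classifier is trained, and the sketch averages plain accuracy per fold; the same loop could equally collect TP and FP rates.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the scores.

    fit(train_features, train_labels) -> model
    predict(model, test_features)     -> predicted labels
    Returns (mean accuracy, list of per-fold accuracies).
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(features[train_idx], labels[train_idx])
        predicted = predict(model, features[test_idx])
        scores.append(float(np.mean(predicted == labels[test_idx])))
    return float(np.mean(scores)), scores
```

With 150 epochs per question, each of the 5 folds holds 30 epochs for testing while the remaining 120 are used for training.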

Audio vs Visual Stimuli Comparison
In order to compare the performance of audio and visual stimuli, seven additional healthy subjects were tested with the visual P300 speller and the EPOC device. Table 4 gives the classification accuracy for the auditory paradigm and the visual P300 speller, as well as the percentage of true positives (TP) and false positives (FP). Both mean values and standard deviations are given for TP and FP. For the visual P300 speller, a 6 × 6 matrix was used in the standard set-up: a single row or column was randomly illuminated for 200 ms, and the inter stimulus interval was 300 ms. The visual P300 speller is much faster than the proposed auditory paradigm and therefore the number of epochs is greater. All other parameters for preprocessing, the xDAWN algorithm and the classifier were left unchanged. The visual P300 speller usually has higher accuracy than the one obtained here, but to obtain higher accuracy a greater number of electrodes has to be employed. As expected, the visual paradigm has better accuracy, by more than 9%. There is, however, an interesting characteristic of auditory stimuli: the percentage of true positives is significantly greater than the same measure for visual stimuli. This means that in the auditory case the system is better at correctly classifying P300 when it occurs. On the other hand, with visual stimuli, misclassification of P300 in its absence occurs on average in only 2.4% of cases. This is more clearly visible in Fig. 2. This was expected, since the examined persons reported that subjectively it was much easier to spatially focus attention in the visual experiment than in the auditory experiment. Fig. 3 gives Receiver Operating Characteristic (ROC) curves for targets, for the auditory (left) and visual (right) P300 experiments. The upper left corner represents a perfect classifier with no false positives (100% specificity) and no false negatives (100% sensitivity).
Again, it is visible that ROC curve for visual experiment is closer to upper left corner of the ROC space, thus indicating better performance for the visual experiment.
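How a ROC curve is obtained from classifier outputs can be sketched as follows. The function names are hypothetical, the scores are assumed to be continuous classifier outputs (for example, LDA projections) without ties, and the area under the curve is computed with the trapezoidal rule as one common summary of closeness to the upper left corner.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (FP rates, TP rates) obtained by sweeping the decision threshold.

    scores : classifier outputs, larger means more "target"-like
    labels : 1 for target epochs, 0 for non-target epochs
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    order = np.argsort(-scores, kind="stable")   # descending score
    sorted_labels = labels[order]
    true_pos = np.cumsum(sorted_labels)
    false_pos = np.cumsum(~sorted_labels)
    # Prepend the (0, 0) point reached with an infinitely high threshold.
    tpr = np.concatenate([[0.0], true_pos / labels.sum()])
    fpr = np.concatenate([[0.0], false_pos / (~labels).sum()])
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve (1.0 for a perfect classifier)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A curve that passes through the upper left corner has an area of 1, while a classifier that assigns scores at random has an expected area of 0.5.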

Healthy vs Disabled Comparison
The BCI communicator was first tested on healthy subjects, and later the system was tested on 14 disabled subjects. Since it was primarily designed to help disabled persons communicate with their caregivers, it was important to test its applicability in realistic situations. Table 5 gives the comparison between healthy and disabled subjects. The values presented in Table 5 were obtained in the same way as in the previous comparisons. The accuracy was slightly lower for disabled subjects, but the testing conditions often could not be arranged as optimally as for healthy subjects. These results therefore suggest that the approach taken in the presented system is suitable for disabled subjects. This conclusion is confirmed in Fig. 4, which shows a clear overlap of the results from healthy and disabled persons.

Question Type Comparison
In Section 2.1 the reasons were given why two types of questions were used: "yes/no" questions and "names" questions. It was important to see whether the different types of questions have different classification accuracy. Since the training "names" questions include familiar names, it was possible that this type of question results in a more pronounced and easier to detect P300 response. This may be desirable in the training phase, although it might have provided overoptimistic results. Generally, a longer word could also have caused more variability and more difficulty in the classification. Once again, the comparison of "yes/no" questions and "names" questions, as provided in Table 6 and Fig. 5, shows that the system performs in a similar manner for different types of questions.

Fig. 4. False positives versus true positives of the measurements acquired during classifier training in auditory (x) and visual (⋄) P300 experiment. One trial in auditory experiment is data acquired for one question, and one trial in visual experiment is data acquired for 10 letters.

Table 6. Mean values and standard deviations for false positives (FP), true positives (TP), and classification accuracy for "yes/no" and "names" questions.

               "yes/no"      "names"
FP (%)         11.9 ± 3.3    12.4 ± 3.1
TP (%)         26.3 ± 7.2    25.6 ± 5.5
Accuracy (%)   75.7          75.2

Selection Type Comparison
Finally, different modes of attention "enhancers" were tested in order to gain more insight into possible improvements of system performance. Mental counting (mc), mental attention (ma), finger counting (fc), tapping (tt) and mental tapping (mt) were tested. The results given in Table 7 and Fig. 6 were tested with an analysis of variance (ANOVA) test, which confirmed that there were no significant differences between the different forms of attention enhancers (p value of 0.92). Mental counting was easy to perform and an effective attention enhancer.

Fig. 5. False positives versus true positives of the measurements acquired during classifier training in auditory P300 experiment for "yes/no" (x) and "names" (⋄) questions.

Table 7. Mean values and standard deviations for false positives (FP), true positives (TP), and classification accuracy for different modes of attention "enhancers" (mental counting (mc), mental attention (ma), finger counting (fc), tapping (tt) and mental tapping (mt)).

           ma           mc           fc           tt           mt
FP (%)     12.3 ± 2.6   12.0 ± 3.5   12.0 ± 3.8   12.5 ± 3.2   12.0 ± 1.9
TP (%)     25.4 ± 5.4   25.9 ± 6.9   27.
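The statistic behind a one-way ANOVA of this kind can be sketched as follows. The function name is hypothetical and the sketch returns only the F statistic and its degrees of freedom; the p value would follow from the survival function of the F distribution with those degrees of freedom (for instance `scipy.stats.f.sf`, or directly `scipy.stats.f_oneway`), which is left out here to keep the sketch dependent on NumPy only.

```python
import numpy as np


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples.

    groups : one sequence of per-trial accuracies for each attention mode
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within
```

An F statistic close to or below 1 indicates that the differences between group means are no larger than the variability within the groups, which is the situation a large p value reflects.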

Conclusions
In this paper, a BCI communicator for persons with LIS is described. It is based on P300 ERPs and the auditory paradigm, which are suitable for the most severe cases of LIS, when even the visual system cannot be used. The system is also based on low cost, portable and easy to mount hardware, in order to obtain a communicator for everyday use. With the present design choices, the system has several drawbacks, the major ones being the use of the auditory paradigm, which is slower and less reliable than the visual one, a smaller number of electrodes and the lack of electrodes in standard positions. In order to test whether the performance of the system is satisfactory, several comparisons were performed and their results were given. It was shown that the auditory paradigm and the selected hardware give lower scores when compared with the visual experiment and with better hardware, but the system can nonetheless still be used. There is no major difference between the results obtained with healthy and disabled persons. Finally, different choices of question and answer types give similar results, and therefore either choice can be used.
Future improvements include the use of other signal processing techniques for blind source separation and possibly the inclusion of other types of EEG signals, in order to enhance the classification accuracy.