Real-time feedback improves imagined 3D primitive object classification from EEG

Brain-computer interfaces (BCI) enable movement-independent information transfer from humans to computers. Decoding imagined 3D objects from electroencephalography (EEG) may improve design ideation in engineering design or image reconstruction from EEG for application in brain-computer interfaces, neuro-prosthetics, and cognitive neuroscience research. Object-imagery decoding studies, to date, predominantly employ functional magnetic resonance imaging (fMRI) and do not provide real-time feedback. We present a series of four linked studies investigating: (1) whether five imagined 3D primitive objects (sphere, cone, pyramid, cylinder, and cube) could be decoded from EEG; and (2) the influence of real-time feedback on decoding accuracy. Studies 1 (N = 10) and 2 (N = 3) involved a single-session and a multi-session design, respectively, without real-time feedback. Studies 3 (N = 2) and 4 (N = 4) involved multiple sessions, without and with real-time feedback. The four studies involved 69 sessions in total, of which 26 sessions were online with real-time feedback (15,480 trials for offline and at least 6,840 trials for online sessions in total). We demonstrate that decoding accuracy over multiple sessions improves significantly with biased feedback (p = 0.004), compared to performance without feedback. This is the first study to show the effect of real-time feedback on the performance of primitive object-imagery BCI.


Introduction
Brain-computer interface (BCI) research aims to develop systems that enable movement-independent communication between the user and a computer or device, using information encoded in neural signals [1]. BCIs have been investigated across a variety of application areas such as classifying the semantic and emotional content of imagined representations [2], monitoring cognitive state for lie detection [3], written communication using BCI spellers [4], wheelchair control [5], controlling objects in real-world situations [6,7] or virtual spaces [8][9][10], neurogaming [11], enabling assessment in prolonged disorders of consciousness (PDoC) [12] and enhancing recovery following stroke [13], to name just a few. However, only a few studies have investigated the detection of visual imagery and working memory [14], the classification of mentally imagined real-world objects [15,16], the shape of imagined 3D primitive objects [17][18][19] or different image categories [20]. Imagined object classification could, for example, be a precursor to BCI applications in computer-aided design (CAD), computer-aided manufacturing (CAM) [21] and computer-aided engineering design (CAED) [22,23], along with augmented virtual reality (AVR) [24,25], to inform alternative, neurally informed design ideation and visual creativity [25,26]. In this work, we focus on the state of the art in decoding shape/object imagery from electroencephalography (EEG).
A noninvasive BCI system commonly uses voluntary modulation of EEG signals for controlling an electronic device. However, to date, most studies investigating the relationship between brain activity and visual object imagery tasks rely on functional magnetic resonance imaging (fMRI), which has a lower temporal resolution than EEG neuroimaging, making it less suitable for a BCI [27]. fMRI does, however, enable measurement of activity from deep brain structures and provides better spatial resolution than EEG. Therefore, existing fMRI studies underpin the rationale for investigating various neuroimaging modalities to understand neural modulations in object imagery tasks and are thus reviewed here.

fMRI studies
Visual object imagery is related to several brain functions, such as working memory [28][29][30][31], shape-specific processing in the visual cortex [32], imagined and perceptual scene-specific brain activity [33], mental imagery during dreaming [34], visual search [35], and the relationship between mental imagery and emotions [36]. Visual perception and mental imagery activate similar brain patterns [37][38][39][40][41][42]. Although the primary visual cortex has an important role in mental imagery and perception [43,44], the occipitotemporal cortex is shown to encode sensory, semantic, and emotional properties, which are important for both [2]. The relationship between working memory and long-term memory is reviewed by Bradly and colleagues [45], highlighting that connectivity between short-term memory and long-term memory is important for a better understanding of the mechanism underlying mental imagery and perceptual processes. Furthermore, the similarity of fMRI patterns obtained during the perception of objects and their equivalent word representation has been demonstrated [46]. Mitchell and Cusack (2008) found that the limited capacity of visual short-term memory for attended objects is correlated with neural activity in the posterior parietal cortex [47]. Moreover, the occipitotemporal cortex is not only important in mental imagery and visual perception but also in object-related identification [48]. Furthermore, the hippocampus may also affect these processes [49], as well as the frontal and parietal cortex [50]. These findings suggest that different spatio-temporal patterns, at various levels of abstraction in terms of neural signaling, should be evaluated to determine if BCIs can exploit the associated features to enable direct movement-independent interaction between the user and a computer or device.
Color, size, and rotation of perceived or imagined 3D objects may also prove useful for developing a BCI that aims to decode imagined 3D objects. It has been suggested that the physical size of visualized objects might link with the occipitotemporal cortex and is represented in the ventral stream [51,52]. A recent study [53] demonstrated that object size-related neural responses are organized in bilateral topographic maps, with similar cortical extents responding to large and small objects. The importance of the visual cortex in color representation is highlighted in several papers [54][55][56]. Bird et al., 2014 [57] showed that the visual cortex responded only to the size of the color differences while color categories, such as blue and green, are encoded by regions in the frontal lobe. One other important property of mental imagery and visual perception is mental rotation. The rotation of imagined objects (object-rotation) and rotation of the viewpoint of the subjects (self-rotation) have been studied [58]. The results show that the primary motor cortex (M1) has an important role in an object-rotation imagery task. At the same time, the sensorimotor area (SMA) is important for the self-rotation imagery task.
Charest et al. 2014 [59] indicated that individual differences in the early visual cortex and human inferior temporal cortex were involved in the visual detection of particular objects. With this observation, they emphasized that the individual-specific sensation of the environment might be reflected in an individually unique neural pattern in visual cortical areas. Another fMRI study [34] demonstrated that perceived or visualized objects could be classified using hierarchical visual features. This method demonstrated that objects could be categorized based on the sameness of the objects' properties and the properties of an object that had been viewed in a previous training session. As shown, several properties are involved in mental imagery or visual perception [34], which might relate to different types of brain activity such as shape-specific visual memory [32] or object size-specific information processing [51,53]. Due to the variety of properties involved in the mental imagery of real-world objects, a comprehensive feature selection strategy is likely required to enable accurate detection, or decoding, of 3D object imagery from noninvasive neural recordings in practical end-user BCI applications.

EEG studies
As discussed above, the majority of mental imagery studies employ fMRI techniques [28-59] and only a very limited number of studies have focused on decoding mentally imagined real-world objects [15,16], the shape of primitive objects [17][18][19], or different image categories [20] from electroencephalography (EEG).
Kosmyna et al. [15] used twenty-six participants for the offline classification of visual observation and imagery involving two real-world objects (flower and hammer), reporting a decoding accuracy (DA) of 61.7 ± 10.5% (M±SD) for visual observation and 55.7 ± 6.8% (M±SD) for visual imagery (theoretical chance level 50.0%). Llorella et al. [16] reported a DA of 60.5 ± 13.3% (M±SD) for four participants in offline classification of four real-world objects (tree, house, plane, and dog) plus the relaxation state from EEG (theoretical chance level 20.0%). In [16], the offline decoder involved a convolutional neural network (CNN) to obtain the reconstruction of the images of the imagined real-world object and a genetic algorithm (GA) to find the optimal hyperparameters of the CNN. Regarding shape classification, Esfahani and Sundararajan [17] focused on the offline classification of five primitive objects (sphere, cone, pyramid, cylinder, and cube) from EEG, using an Emotiv 14-channel EEG neuroheadset [60]. They achieved an offline DA of 44.6 ± 6.6% (M±SD) for ten participants (theoretical chance level 20.0%). Bang et al. [18], with four participants, achieved a DA of 32.6 ± 7.1% (M±SD) for offline classification of six colored primitive geometric symbols (red 'O', white 'X', yellow '-', blue 'Δ', light blue '+' and green '|'; theoretical chance level 16.7%) using a CNN. Llorella et al. [19], using a CNN and the black hole search algorithm for the classification of two simple 2D geometric objects with eighteen participants, obtained an offline DA of 69.6 ± 8.4% (M±SD) (theoretical chance level 50.0%), and classification of seven simple 2D geometric objects with seven participants obtained an offline DA of 35.1 ± 7.0% (M±SD) (theoretical chance level 14.3%). Lee et al. [20] investigated the classification accuracy during visual perception and visual imagination in three image categories using three different images per class (i.e. real-world objects: airplane, cup, tree; numeric digits: monochrome one, three, five; colored 2D shapes: red heart, yellow star, white triangle). They compared the following five classifiers: EEGNet, convolutional neural network (CNN), MultiRocket, MobileNet, and support vector machine (SVM). The highest DA was obtained with the MultiRocket framework. With seven participants, they achieved a classification DA of 57.0% for perception in three categories, and a DA of 46.4% for visual imagery (theoretical chance level 33.3%). A shape imagery detection application, for example, a BCI-controlled CAD or CAED application, requires the brain response to be classified online in real time, whereas all the studies reviewed above involve a single-session offline assessment without providing real-time feedback to the participant (and/or a controlled BCI application) regarding the actual decoded object.
To address this shortcoming in our understanding of the effect of online classification and feedback when decoding shape imagery, as an extension of our pilot study [61], we developed an online EEG-based BCI to investigate decoding five imagined 3D primitive objects (sphere, cone, pyramid, cylinder and cube) from EEG to determine if the separability of shape-specific EEG modulations is enhanced by real-time feedback to participants. We carried out our research using a four-study series wherein the paradigm was improved between each study in the series. The offline pilot paradigm was tested and evaluated in studies 1 and 2 involving a single-session and a multi-session scenario, respectively, in which no feedback was applied. The pilot version of the online paradigm was introduced in study 3 and, based on the experience gained, it was refined and gamified in study 4. In addition to presenting an investigation involving the classification of imagined objects online in real time using BCI, we provide a comprehensive analysis for the identification of frequency bands and cortical areas engaged in the visual imagery of primitive objects. The results serve as a basis for enabling further investigation into the decoding of imagined objects for applications in CAD, CAM, CAED, or AVR systems.

Participants
Ten volunteers (male (n = 7) and female (n = 3), aged 26-44 years) participated in the first offline study (study 1), three male volunteers (aged 30-44 years) participated in the second offline study (study 2), two volunteers (one male aged 21, and one female aged 20) participated in the first online study (study 3), and four male volunteers (aged 23-34 years) participated in the second online study (study 4). There were sixteen participants in total, of which three participated in more than one study (Supplementary Table 1). The experiments were conducted in the Spatial Computing and Neurotechnology Innovation Hub (SCANi-hub) at the Intelligent Systems Research Centre (ISRC), Ulster University, United Kingdom. Before the beginning of the first session, participants were presented with information about the experimental protocol which they were asked to read. Those who wished to participate gave consent by signing an informed consent form that had been approved by the Ulster University research ethics committee (UREC). All participants were healthy and had normal, or corrected to normal, vision. Participants were recruited for each study separately. They were informed about the number of sessions and the time requirements of each session. Based on discussions with participants, we believe that each participant was motivated to provide the best performance during each session. Supplementary Table 1 provides information about the dominant hand, gender, age and BCI experiences of the participants in the study series.

Experimental paradigm
Study 1 (N = 10) comprised one offline session. Study 2 (N = 3) comprised three offline sessions. Study 3 (N = 2) comprised eight offline sessions and seven online sessions. Finally, study 4 (N = 4) comprised two offline sessions and three online sessions. Table 1 summarizes the duration of sessions performed in studies 1-4.
In each study, each session lasted approximately two hours, including EEG preparation time. Before the beginning of the experiments, participants were asked to look forward and maintain a constant head position, avoid teeth grinding, minimize unnecessary movements during task performance, focus with eyes on the middle of the screen (indicated with a fixation cross before the task during the resting period) and to avoid eye blinks during object imagery tasks. Participants were asked to blink after the task end indicator cue if possible. In each session, the participant was seated in an armchair positioned 1.5 m in front of a Fujitsu Siemens B22W-5 ECO 22" LCD monitor. For task performance, the participant was asked to perform visual mental imagery of the actual target object in 3D (i.e. to mentally project the 3D shape of the target object on the middle of the screen, as it would be seen there). Participants who reported difficulty visualizing the object in 3D were asked to imagine the object in 2D. The offline datasets recorded in studies 3 and 4 were used to prepare an initial calibration of the BCI setup for the online sessions in the associated study. The impact of feedback on subjects' performance across multiple sessions was a central research focus for the current study.
The structure of the paradigm was similar for studies 1-4. However, some elements of the paradigm evolved from study to study. In the following section, we describe the experimental paradigm that was applied to the final study (study 4). The differences between study 4 and the previous three studies are summarized in Section 2.2.2.

Timing of the experimental paradigm for study 4
The experimental paradigm comprised three runs, each run comprising four blocks, and was presented in a gamified format as described below. Ten seconds before commencing each block, a white fixation cross was presented in the center of the screen, and a voice message played to inform the subject the block was about to begin. Each block comprised the following sub-blocks: one block initialization (involving a trial triplet, i.e. three trials) and ten further sub-blocks (involving ten trial triplets, i.e. thirty trials). In the block initialization sub-block, three of the five 3D primitive objects (sphere, cone, pyramid, cylinder, and cube; Figure 1) were used as target objects in randomized order. The paradigm was designed using eleven trial triplets to maintain the participant's attention using a gamified scenario to enhance engagement and motivation [62], rather than presenting a monotonous series of thirty-three single trials. The ten trial triplets, comprising six repetitions of the five 3D primitive objects in randomized order, were used for the main analysis. The block initialization trials were not used in the main analysis because at the beginning of the block, after a long resting period, the subjects' task-related EEG pattern may differ from the patterns generated during continuous object imagery task performance.
The timing of a trial, and an example of how the screen content varied during the trial, are presented for the offline and online paradigms in Figures 2 and 3, respectively. At the beginning of each sub-block, a white fixation cross (in the middle) and three gray-colored 3D primitive objects (on the left side) were displayed on the screen. The gray-colored objects illustrated the target triplet for the current sub-block. After a 2s pause, the fixation cross was replaced in the middle with a blue replicate of the first (bottommost) target object for a duration of 1s, indicating the next target for the oncoming task, and then disappeared, indicating the beginning of the object imagery task. During the imagery task, the middle of the screen was left empty for the 3s task period. The end of the task period was indicated with a 200 ms auditory tone (6 kHz beep). In parallel with the onset of the auditory tone, for the offline paradigm, the target object was displayed once again in the middle of the screen for 1s. This second appearance of the target object was replaced in the online paradigm with the decoded object to provide visual feedback. After a 1s delay, the target object was replaced with the fixation cross, and the color of the corresponding target on the left side of the screen, for the offline paradigm, changed to blue, indicating the trial had been completed. For the online paradigm, the color of the corresponding target changed to blue only if the actual task was successful. Otherwise, the corresponding target changed to yellow and the incorrectly decoded object was moved from the middle to the right side of the screen. Gamification was achieved through this stacking of correctly identified objects with the same color. All trials in each trial triplet were executed in the same way as described above. Each 23s sub-block (Figure 4a) comprises a sub-block initialization pause and three trials. Each 260s block (Figure 4b) comprises a block initialization voice message, a block initialization sub-block, and ten sub-blocks for the analysis. Each 20-minute run (Figure 4c) comprises four blocks and three inter-block resting periods (IBR: 50s each). During IBR, the participants were asked to relax and not to move or talk. A session comprised three runs, which were separated by inter-run resting (IRR) periods (Figure 4d). The length of IRRs was determined by the participant (typically 5 minutes). Thus, the total duration of an offline session, comprising three runs and two inter-run resting periods, was around 70 minutes, involving 72 trials for each class (i.e. 360 trials in total) (Figure 4).
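The run and session figures quoted above follow from the block-level numbers. As a minimal sketch (the constant names are our own, and the 5-minute inter-run rest is the "typical" value, since participants chose its length), the arithmetic can be reproduced as follows:

```python
# Durations quoted in the text, in seconds. Sub-block internals are not modelled.
BLOCK_S = 260
INTER_BLOCK_REST_S = 50
BLOCKS_PER_RUN = 4
RUNS_PER_SESSION = 3
INTER_RUN_REST_S = 5 * 60  # assumption: "typically 5 minutes", participant-determined
TRIALS_PER_BLOCK = 30      # analysed trials only (initialization triplet excluded)
N_CLASSES = 5

# A run is four blocks separated by three inter-block rests.
run_s = BLOCKS_PER_RUN * BLOCK_S + (BLOCKS_PER_RUN - 1) * INTER_BLOCK_REST_S
# A session is three runs separated by two inter-run rests.
session_s = RUNS_PER_SESSION * run_s + (RUNS_PER_SESSION - 1) * INTER_RUN_REST_S

trials_per_session = RUNS_PER_SESSION * BLOCKS_PER_RUN * TRIALS_PER_BLOCK
trials_per_class = trials_per_session // N_CLASSES

print(f"run: {run_s / 60:.1f} min, session: {session_s / 60:.1f} min")
print(f"trials per session: {trials_per_session}, per class: {trials_per_class}")
```

Under these assumptions a run lasts just under 20 minutes and a session just under 70 minutes, in line with the rounded values given in the text.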

Differences in the experimental paradigms used for studies 1-4
Although each of the paradigms was mainly consistent, certain elements evolved during the research from study 1 to study 4, as described below and summarized in Table 2.
• The appearance of the objects displayed on the screen was refined after study 2. The 3D primitive objects for studies 1-2 and studies 3-4 are presented in Figure 1.
• In studies 1 and 2, the thirty trials were presented as a continuous series (i.e. the trial triplet structure was not used). Therefore, the target triplet (presented in Figure 2a on the left side of the screen) and the 2s trial triplet initialization pause were not applied to studies 1 and 2. The triplet structure was introduced in studies 3 and 4 to engage participants through gamification of the task.
• The block initialization sub-block (i.e. the extra trial triplet) was added to the paradigm only in studies 3 and 4.
• When a participant failed a task in the online sessions of study 3, the task was repeated once to give the participant a second attempt to achieve the correct response. As repeated tasks increased the duration of the blocks significantly, the number of trials in the online sessions of study 3 was reduced in each block from thirty to fifteen. To avoid the reduction in trials, the repetition of failed tasks was not applied to study 4.
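The session and trial totals reported in the abstract can be cross-checked against the per-study design. The sketch below is a bookkeeping aid only. It assumes every offline session (including those of studies 1 and 2) contained 12 blocks of 30 analysed trials, and that each online session of study 3 contained 12 blocks of 15 first-attempt trials, with repeated attempts not counted (which is why the online total is a lower bound). The `Study` class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Study:
    participants: int
    offline_sessions: int           # per participant
    online_sessions: int            # per participant
    online_trials_per_session: int  # first-attempt trials only

FULL_SESSION_TRIALS = 3 * 4 * 30  # 3 runs x 4 blocks x 30 trials = 360

STUDIES = {
    1: Study(10, 1, 0, 0),
    2: Study(3, 3, 0, 0),
    3: Study(2, 8, 7, 3 * 4 * 15),  # blocks shortened to 15 trials online
    4: Study(4, 2, 3, FULL_SESSION_TRIALS),
}

offline_sessions = sum(s.participants * s.offline_sessions for s in STUDIES.values())
online_sessions = sum(s.participants * s.online_sessions for s in STUDIES.values())
offline_trials = offline_sessions * FULL_SESSION_TRIALS
online_trials = sum(
    s.participants * s.online_sessions * s.online_trials_per_session
    for s in STUDIES.values()
)

print(offline_sessions + online_sessions, online_sessions)
print(offline_trials, online_trials)
```

With these assumptions the totals agree with the figures stated in the abstract.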

Data acquisition
EEG was recorded from 30 channels, and electrooculography (EOG) was recorded from two channels using 32 active EEG sensors (gLadybird) with two cross-linked 16-channel g.BSamp bipolar EEG amplifiers and two AC type g.GAMMboxes. The EEG reference electrode was positioned on the left earlobe. The EEG was amplified (gain: 20000), filtered (Butterworth, 0.5-100 Hz, eighth order), and sampled (A/D resolution: 24 Bits, sampling rate: 250 samples/s). The ground electrode was positioned at AFz according to the international 10/20 EEG standard. The EEG montage is illustrated in Figure 5.
Communication between the Simulink [63] module used for EEG data acquisition and online signal processing and the experimental protocol application in the Unity 3D Game Engine [64] was handled using the user datagram protocol (UDP).

Multi-class classification using FBCSP
EEG signal processing and trial validation: the quality of the recorded EEG was inspected manually, and EEG channels with high-level noise (>200 mV) were removed from further processing. Recorded signals were band-pass filtered in six non-overlapping EEG bands (0.5-4 Hz (delta), 4-8 Hz (theta), 8-12 Hz (mu), 12-18 Hz (low beta), 18-28 Hz (high beta), and 28-40 Hz (low gamma)) with Simulink [63] using high-pass and low-pass finite impulse response (FIR) filters (band-pass attenuation 0 dB, band-stop attenuation 60 dB). To reduce the size of the EEG dataset, the preprocessed EEG dataset was downsampled from 250 Hz to 125 Hz. Reference (baseline) and task-related time intervals between −4s (prior) and 5s (after) the onset of the object imagery task were extracted as epochs from the frequency-filtered EEG dataset for each EEG channel and stored. The quality of the EEG was inspected manually for each trial, and trials containing visually obvious artifacts overlapping the task period (i.e. between −2s (prior) and 3s (after) the onset of the object imagery task) were removed.
Spatial filtering: EEG decoding was performed using filter-bank common spatial patterns (FBCSP) [65], a well-established classification technique that enables discrimination between different types of imagined movements [66]. FBCSP was used to create spatial filters that maximize the discriminability of two classes by maximizing the variance of band-pass filtered EEG signals from one class while minimizing their variance for the other classes [67,68]. A maximum of three CSP filter pairs for each 2-class classifier for each frequency band was used.
Feature extraction: for studies 1 and 2, the time-varying log-variance of the CSP filtered EEG was calculated, in three separate analyses, using a 500 ms, 1s, or 2s width sliding window over the epochs with a 200 ms time lag between two windows. Based on experiences gained from studies 1 and 2, the 500 ms option was omitted in studies 3 and 4.
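The feature extraction step can be illustrated with a short sketch. It assumes a single CSP-filtered channel sampled at 125 Hz (the rate after downsampling) and interprets the 200 ms "time lag" as the step between consecutive window starts. The function name and the small `eps` guard against a zero-variance window are our own additions, not part of the described pipeline.

```python
import math

def log_variance_features(signal, fs=125, window_s=1.0, step_s=0.2, eps=1e-12):
    """Log-variance of `signal` in sliding windows.

    signal   : sequence of samples from one CSP-filtered channel
    fs       : sampling rate in Hz (assumed 125 Hz after downsampling)
    window_s : window width in seconds (0.5, 1.0 or 2.0 in the text)
    step_s   : shift between consecutive windows in seconds (200 ms in the text)
    eps      : guard so that a flat window does not produce log(0)
    """
    win = round(window_s * fs)
    step = round(step_s * fs)
    features = []
    for start in range(0, len(signal) - win + 1, step):
        segment = signal[start:start + win]
        mean = sum(segment) / win
        variance = sum((x - mean) ** 2 for x in segment) / win
        features.append(math.log(variance + eps))
    return features

# Usage: a synthetic 10 Hz sine over a 9 s epoch (-4 s to 5 s around task onset).
fs = 125
epoch = [math.sin(2 * math.pi * 10 * n / fs) for n in range(9 * fs)]
feats = log_variance_features(epoch, fs=fs, window_s=1.0, step_s=0.2)
print(len(feats))
```

Each feature vector at a given time point would then collect one such value per CSP filter and frequency band.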
Feature selection: the mutual information (MI) between features and the associated target class was estimated using a quantized feature space [69] to identify a subset of features that maximize classification accuracy.
2-class classification: a regularized LDA (RLDA) algorithm using the RCSP toolbox [68] was used to create a linear hyperplane to separate data from two classes, where the class assigned to an unseen feature vector depends on the polarity of the classifier output, i.e. on which side of the hyperplane the feature vector lies [70].
Multi-class classification: the multi-class classification module involves multiple 2-class classifiers (target vs non-target classes) to separate each target class from the other (non-target) classes. Thus, the number of 2-class classifiers equaled the number of classes. The class label was determined by the classifier that produced the largest signed distance on the target-class side of its hyperplane. A general overview of the applied FBCSP method is presented in Figure 6.
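The feature selection and multi-class decision steps can be sketched as follows. This is an illustration under stated assumptions rather than the method of [69]: the feature is quantized with equal-width bins (the actual quantization scheme is not specified here), MI is estimated from the resulting joint histogram, and the one-vs-rest decision simply takes the class with the largest signed classifier output. All function names are hypothetical.

```python
import math
from collections import Counter

def quantize(values, levels):
    """Map each value to an equal-width bin index in [0, levels - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / levels
    return [min(int((v - lo) / width), levels - 1) for v in values]

def mutual_information(feature, labels, levels=3):
    """Estimate I(feature; label) in bits from a quantized feature."""
    bins = quantize(feature, levels)
    n = len(labels)
    joint = Counter(zip(bins, labels))
    bin_counts = Counter(bins)
    label_counts = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        # p(b, y) * log2( p(b, y) / (p(b) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (bin_counts[b] * label_counts[y]))
    return mi

def one_vs_rest_label(signed_distances):
    """Return the class whose target-vs-rest classifier output is largest."""
    return max(signed_distances, key=signed_distances.get)

# Usage: rank two toy features by MI, then resolve a five-class decision.
labels = [0, 0, 1, 1]
print(mutual_information([0.0, 0.0, 1.0, 1.0], labels, levels=2))  # informative
print(mutual_information([0.0, 1.0, 0.0, 1.0], labels, levels=2))  # uninformative
print(one_vs_rest_label(
    {"sphere": -0.4, "cone": 0.7, "pyramid": -0.1, "cylinder": 0.2, "cube": -0.9}
))
```

In a full pipeline the features with the highest MI would be retained and passed to the 2-class classifiers.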

Decoding accuracy calculation with crossvalidation for the offline studies
DA for the offline studies (studies 1 and 2) was calculated using an inner-outer (nested) cross-validation (CV) (Supplementary Figure 1). The inner-outer CV guarantees that the test data used for the outer level CV were not used in the inner level for hyperparameter optimization. Further details of the inner-outer CV are described in [71]. All DA values were compared to the real (empirical) chance level [72] which was calculated using a significance level of p < 0.01.
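One common way to obtain a chance level that accounts for the finite number of trials is a one-sided binomial test against pure guessing. The sketch below illustrates that idea only. It is an assumption that this resembles the procedure of [72], and the function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for numerical stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)

def chance_threshold(n_trials, n_classes, alpha=0.01):
    """Smallest accuracy k/n with P(X >= k) < alpha when guessing at 1/n_classes.

    Returns None if no achievable accuracy reaches significance.
    """
    p = 1.0 / n_classes
    tail = 1.0  # P(X >= 0)
    for k in range(n_trials + 1):
        if tail < alpha:
            return k / n_trials
        tail -= binom_pmf(k, n_trials, p)  # now P(X >= k + 1)
    return None

# Usage: five classes, one full session of 360 trials versus a smaller set.
full = chance_threshold(360, 5)
small = chance_threshold(60, 5)
print(full, small)
```

The threshold sits above the theoretical 20% and rises as the number of trials falls, which is why accuracies from small datasets need a stricter reference than the theoretical chance level.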
For studies 1 and 2, six outer folds and five inner folds were assigned. During the inner fold CV, the optimal architecture (resulting in the highest DA) was selected over the number of CSP filter pairs (2, 3, or 4), the number of quantization levels for the mutual information (MI) feature selection module (2, 3, or 6), the number of selected features at the output of the MI module (6, 10, 14, or 18), and the width of the classification window (500 ms, 1s, or 2s).
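The structure of such a nested search can be sketched as below. The grid values are those listed in the text. The interleaved fold split, the `score` callback (which would train and evaluate an FBCSP classifier), and all names are illustrative assumptions. A real analysis would typically shuffle and stratify trials by class before splitting.

```python
from itertools import product

GRID = {
    "csp_filter_pairs": (2, 3, 4),
    "mi_quantization_levels": (2, 3, 6),
    "n_selected_features": (6, 10, 14, 18),
    "window_s": (0.5, 1.0, 2.0),
}

def configurations(grid):
    """Enumerate every hyperparameter combination as a dict."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]

def k_folds(indices, k):
    """Split indices into k interleaved folds and return (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        pairs.append((train, folds[i]))
    return pairs

def nested_cv(n_trials, score, outer_k=6, inner_k=5):
    """Mean outer-fold accuracy, with hyperparameters chosen on inner folds only.

    score(config, train_idx, test_idx) -> accuracy is a hypothetical callback.
    """
    outer_scores = []
    for outer_train, outer_test in k_folds(list(range(n_trials)), outer_k):
        inner_pairs = k_folds(outer_train, inner_k)

        def inner_mean(config):
            return sum(score(config, tr, va) for tr, va in inner_pairs) / inner_k

        best = max(configurations(GRID), key=inner_mean)
        # The outer test fold is touched only once, after selection is finished.
        outer_scores.append(score(best, outer_train, outer_test))
    return sum(outer_scores) / outer_k

print(len(configurations(GRID)))  # number of candidate architectures
```

The key property is that the outer test indices never appear in any inner training or validation fold, so the reported accuracy is not biased by the hyperparameter search.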
The cross-participant/session averaged time-varying DA for both offline studies was calculated and plotted using outer-level test results obtained from multiple single-session analyses.
A Wilcoxon non-parametric test was performed to assess whether the DA peaks obtained in the task period differed significantly from those obtained in the reference (baseline) period, i.e. the pause period before the target object was displayed on the screen.

BCI calibration for the online studies
The BCI configurations used in the online sessions (studies 3 and 4) were calibrated using a multi-session dataset recorded in sessions conducted before the calibration. Results from studies 1 and 2 showed that the four lower (delta, theta, mu, and low-beta) EEG bands made a greater contribution to the DA compared to the high-beta and low-gamma bands. Therefore, in studies 3 and 4, the EEG bands used in the FBCSP module were limited to the four lower (delta, theta, mu, and low-beta) bands. Moreover, using the experience of studies 1 and 2 and knowledge gained around which hyperparameters produced maximum DA, in studies 3 and 4 the number of selected CSP filter pairs was set to 2 and the number of quantization levels was set to 3. The number of features that could be selected by the MI module was selected from 6, 8, and 10 and the width of the classification window was selected from 1s and 2s.
To improve the cross-session stability of the calibrated BCI, the single-session-based FBCSP calibration (used in studies 1 and 2) was replaced with a cross-session-test-based FBCSP calibration. In the first step, the BCI was calibrated based on a single-session dataset using each combination of denoted hyperparameter options, separately, with the six-fold CV (Supplementary Figure 2), which is equivalent to a simple outer-level CV. The time-varying DA graphs resulting from the single-session six-fold CV were plotted for each BCI configuration and compared by visual inspection. The BCI configurations resulting in a reasonably high DA peak in the task interval (compared to the DA peak obtained from other BCI configurations) were noted for the cross-session test. In the cross-session test, the DA was calculated for each session that was not used for calibrating the tested BCI configuration. Thus, in studies 3 and 4, the six-fold CV-based BCI calibration formed the inner level of the CV, and the cross-session test formed the outer level of the CV. BCI configurations were ranked by visual inspection for each participant separately, comparing DA peaks obtained from multiple sessions with the cross-session test. The best-ranked BCI configuration was used in the first online session of the participant. In the online BCI, the delay between the onset of the task and the classification time was set to the time between the onset of the task and the DA peak obtained in the cross-session test.
Table 3 summarizes the sessions used to calibrate and test the classifiers.

Online signal processing
The online multi-class classification was performed in Simulink [63] using the calibrated BCI. Studies investigating the impact of unbiased real-time feedback show that negative feedback has a significant impact on accuracy during online task performance. The influences of positive and negative visual feedback on motor imagery task performance using EEG and electrocardiography (ECG) have been studied [73]. The findings suggest that over-biased negative feedback causes mental stress that is detected in the form of significantly higher heart rate variability compared to sessions where over-biased positive feedback was presented, and that accuracies correlate with the polarity (-/+) of the biased feedback. Alimardani et al. [74] studied EEG-based BCI-operated human-like robotic hands using imagined grasp or squeeze motions. They evaluated participants' performance under different presentations of feedback including: (1) non-biased direct feedback, (2) biased feedback corrected to fake positive 90% accuracy, and (3) biased feedback corrected to fake negative 20% accuracy. Participants achieved better accuracy when they received fake positive feedback, while fake negative feedback resulted in a decreased performance. These results were taken into account when designing the online visual feedback for this study. When the classification was 'successful', the decoded (correct) object was displayed during the feedback period. However, if the classification was incorrect, there was a 33% chance (biased-positive feedback) that the correct object would be displayed rather than the decoded object. It is important to note that DA values presented in this paper were calculated based on 'successful' classifications rather than the displayed (biased) result, which may be positively biased.
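The feedback rule described above can be expressed as a short sketch. The function names and the injectable random source are our own, and the actual implementation in the Simulink and Unity applications is not shown in the text.

```python
import random

P_OVERRIDE = 0.33  # chance that an incorrect decode is displayed as correct

def displayed_object(target, decoded, rng=random):
    """Object shown during the feedback period under biased-positive feedback."""
    if decoded == target:
        return decoded  # a true hit is always shown as decoded
    # A miss is replaced by the correct object with probability P_OVERRIDE.
    return target if rng.random() < P_OVERRIDE else decoded

def decoding_accuracy(targets, decoded):
    """DA counts true classifier hits only, never the biased display."""
    hits = sum(t == d for t, d in zip(targets, decoded))
    return hits / len(targets)

def displayed_success_rate(true_accuracy):
    """Expected fraction of trials in which the participant sees the target."""
    return true_accuracy + (1.0 - true_accuracy) * P_OVERRIDE

# Usage: one simulated trial and the expected displayed rate at 30% true DA.
rng = random.Random(0)
print(displayed_object("cube", "cone", rng))
print(displayed_success_rate(0.30))
```

Under this rule a participant whose true accuracy is 30% would see the correct object on roughly half of the trials, while the reported DA remains 30%.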

Temporospatial spectral analysis
To identify frequency bands and cortical areas that provided the most separable features, an analysis was performed on the multi-session datasets using the CSP filters and MI weights of the calibrated FBCSP classifiers. This analysis was performed separately for every session and participant. For the time-varying frequency analysis, the mean values of the MI weights (which weight the DA contribution of the features of the 2-class classifiers) were calculated separately for each analyzed frequency band and time point, and were plotted in the form of participant-specific topographical maps. For the topographical analysis, all transformation values in each CSP filter were multiplied by the MI weights obtained for the corresponding CSP filter at the time matching the maximal DA.

Cross-study statistical analysis
A cross-study analysis was performed to examine differences in the DA values achieved in each session for studies 1 and 2, compared to those achieved for studies 3 and 4. This analysis was performed to establish whether there was a statistically significant improvement in DA scores when feedback was included in the paradigm. The Mann-Whitney U test was chosen to compare mean ranks, due to the small and unequal data samples. Only sessions where the maximum DA obtained in the task period differed significantly from DA in the reference (baseline) period were included in the analysis. Furthermore, to ensure the independence of observations, one participant's dataset was excluded from the cross-study analysis as the participant had completed study 2 and study 4, both of which were in separate independent groups for the analyses.
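For reference, the U statistic underlying this test can be computed directly from pairwise comparisons between the two groups. The sketch below uses toy numbers that are purely illustrative and are not data from the studies. It returns the statistic only, and a p-value would normally come from a statistics package such as SciPy's `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (a, b) in which a > b, and each tie between the
    groups contributes 0.5. U_a + U_b always equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Toy, made-up values purely to exercise the function (not study data).
no_feedback = [1, 2, 3]
with_feedback = [4, 5, 6, 7]
print(mann_whitney_u(no_feedback, with_feedback))
```

Because the statistic depends only on the ordering of the pooled values, it is suited to small groups of unequal size, which matches the rationale given above.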

Methods summary
An overview of the calibration, cross-validation, and evaluation methods applied to studies 1-4 is presented in Figure 7.

Results of studies 1-2
Figure 8a-d provides an overview of participants' performance by presenting time-varying DA plots and significant peak DA values obtained for study 1 in a single offline session, and for study 2 in three offline sessions.
The cross-participants/session averaged time-varying DA for both studies are presented in Figures 8a and 8c, respectively, while participant/session-specific differences in time-varying DA plots for study 2 are presented in Figure 8d. As indicated in Figure 8b, seven of ten participants in study 1 and each of the three participants in study 2 achieved DA peaks during the task period which were significantly higher than the DA peak obtained during the corresponding pause period (Wilcoxon non-parametric test, p < 0.05). The maximum peak DA in study 1 was achieved by participant 6 (33 ± 4%), and in study 2 by participant 3 (37 ± 3%). Cross-participants/session averaged frequency maps and object-specific topographical maps (Figure 8e) indicate that for participants who achieved a DA > 30% (empirical chance level = 20 ± 6%), the 1-4 Hz (delta) and 4-8 Hz (theta) oscillations in frontal, posterior parietal and occipitotemporal cortical areas provided the highest contribution for offline classification of five imagined 3D primitive objects.

Results of study 3
Figure 9a provides an overview of significant DA peak values obtained using datasets acquired for two participants in: (1) seven offline sessions (used for BCI calibration), (2) one additional offline session recorded with a two-week gap after the seventh offline session (used for offline DA stability check), and (3) seven online sessions (of which the first five were used for BCI recalibration) (Table 3). Figure 9a1 presents the mean values and standard deviations of cross-session DA peak values obtained for BCI configurations calibrated using datasets acquired in offline sessions 1-7 (initial calibration) and online sessions 1-5 (recalibration). Furthermore, Figure 9a2 presents DA peak values obtained from the cross-session stability test using the initial and recalibrated online BCI configuration selected from the single-session-based calibration presented in Figure 9a1.
The cross-session stability test, using datasets acquired in offline sessions 1-7, indicates an increasing trend of DA peak values obtained over sessions 1 to 7, ranging from 25% to 34% for participant 1 and from 25% to 35% for participant 2. The long-term cross-session stability test, using data acquired in offline session 8, shows a slightly decreased DA peak (30% for participant 1 and 33% for participant 2) compared to that achieved two weeks earlier in offline session 7. However, DA in session 8 is higher than that achieved in the first two offline sessions (≈26%).
During the first online session, both participants failed to achieve above chance level performance (DA peak in the task period was similar to the DA peak obtained in the pause period; Wilcoxon non-parametric test, p > 0.05).However, during the last two online sessions, using the recalibrated BCI, the participants reached a personal online DA maximum of 29% and 32% (participants 1 and 2, respectively) (empirical chance level 20 ± 6%).
The participant-specific frequency maps of CSP-MI weights at the corresponding DA peak (Figures 9b and 9d) for both participants indicate that the 1-4 Hz (delta) band provided a maximal contribution for encoding the imagined objects. It is worth noting that, as expected, the highest values of CSP-MI weights were obtained at times which correspond with the peak DA.
The participant-specific topographical maps of MI-weighted CSP patterns (Figure 9b2 and Figure 9d2) indicate that frontal, posterior parietal, and occipito-temporal cortical areas provided the highest contribution for both offline and online trials. As expected, the MI-weighted CSP patterns show higher DA contributions in task-related cortical areas compared to patterns obtained during pause periods.
Finally, the participant-specific time-varying cross-session DA plots obtained from (1) long-term stability tests (Figure 9c1), (2) the averaged curves and standard deviation of time-varying DA graphs obtained from online sessions 1-5 (Figure 9c2), and (3) participant/session-specific time-varying DA obtained from online sessions 6 and 7 (Figure 9e) indicate that the maximal DA (peak DA) for both participants was achieved with a latency that matches the latency observed during BCI calibration in cross-session CV.

Results of study 4
Results obtained in study 4 (Figures 10 and 11) are similar to those obtained in the pilot online study (study 3), even though the number of both offline and online sessions in study 4 was only half of those completed in study 3.
Figure 10a presents an overview of significant DA peak values obtained for four participants in offline sessions 1-2 (used for initial calibration of the BCI), online sessions 1-2 (based on the BCI that was calibrated based on offline sessions 1-2), and in online session 3 (using the BCI that was recalibrated based on online sessions 1-2).
Single-session CV results presented in Figure 10a1 provide a summary of DA rates from the sessions that were selected for calibrating/recalibrating the BCI (based on cross-session CV results). Cross-session CV results of the calibrated/recalibrated BCIs are presented in Figure 10a2. DA values presented in Figure 10a3 indicate that the online DA for each participant increased over the three online sessions. For each participant, DA in online session 1 fell within the range of the empirical chance level (20 ± 6%), while in online session 3 it reached 28%, 32%, 23%, and 32% for subjects 1-4, respectively. Participant/session-specific time-varying DA plots (presented in Figure 10b) indicate, for the online sessions, that the DA peak for each participant was achieved near the time where it was expected following the onset of the task. However, the DA peak was slightly higher (30%, 33%, 25%, 34% for subjects 1-4, respectively) compared to that obtained at the denoted time of the online classification.
[Figure 8 caption, partially recovered] ... (green columns) for each subject of studies 1 and 2. Subjects/sessions that achieved a DA peak in a similar range to the DA during the pause (Wilcoxon non-parametric test, p > 0.05) are indicated with 'N/A'. The bottom panel of (B) displays DA values for each subject and session of study 2, separately. (C): grand average (thick curve) and cross-subject STD (shaded area) of time-varying DA calculated in studies 1 and 2. (D): subject-specific time-varying DA plots from each session of study 2. (E): cross-subject averaged frequency and topographical maps indicating frequency bands and cortical areas providing the highest contribution to DA. The cross-subject averaged frequency and topographical maps are derived using CSP filters and MI weights of subject-specifically calibrated BCIs using subjects/sessions which provided a DA peak above 30% (i.e. using only those subject/session combinations for which the thick black lines in green columns of (B) indicate a DA above 30%).
[Figure 9 caption, partially recovered] ... The session ID that was selected for calibrating the final BCI is marked with a rectangle below the DA chart. (a2): detailed results of the offline cross-session stability test (i.e. DA rates obtained in test sessions of the best performing BCI configuration selected based on (a1)); furthermore, DA obtained in offline session 8, and online sessions 1-7 using the BCI configuration selected based on offline sessions 1-7. (b) and (d): results of an analysis investigating the subject-specifically calibrated BCI, calibrated based on offline sessions 1-7 and online sessions 1-5, respectively. (b1) and (d1): time-varying DA plots (an averaged curve (thick blue curve) and STD (shaded area)) resulting from cross-session CV during BCI calibration. (b2) and (d2): frequency bands and cortical areas with the highest DA contribution based on CSP filters and MI weights of the calibrated BCI. (c) and (e): subject-specific time-varying DA plots obtained using the BCI, which was calibrated based on offline sessions 1-7 and online sessions 1-5, respectively (c1: long-term stability test results from offline session 8, c2: the average and standard deviation of time-varying DA from online sessions 1-5, e1 and e2: time-varying DA from online sessions 6 and 7). The expected position of peak DA is indicated with a black vertical solid line in the task interval of the time-varying DA plots and frequency maps.
The participant/session-specific frequency maps of CSP-MI weights (Figure 11) were calculated based on the results of single-session analyses for each online session. The results confirm that the 1-4 Hz (delta) band (in some cases along with the 4-8 Hz (theta) band) provides the highest contribution to DA of the imagined object classification.
The participant/session-specific topographical maps of MI-weighted CSP patterns (Figure 11) confirm that frontal, posterior parietal and occipitotemporal cortical areas provided the greatest contribution to the online classification of the five imagined 3D primitive objects from EEG.

Results of the cross-study statistical analysis and offline vs. online scenarios
Feedback was not provided in studies 1 and 2, while initial sessions without feedback in studies 3 and 4 were followed by sessions that provided online feedback. A preliminary comparison was made between the combined first offline (no-feedback) sessions from studies 3 and 4 and the combined first sessions from studies 1 and 2, to determine if initial differences in performance existed that could be attributed to variations in participant performance as a function of group assignment. The analysis did not yield a significant difference, indicating homogeneity in initial performance ability across studies (U = 18, Z = −0.29, p = 0.77, Figure 12). Subsequently, to determine the impact of an increased number of sessions with and without feedback, DA scores for studies 1 and 2 combined were compared against DA scores for studies 3 and 4 (with and without feedback) combined. The mean rank of DA values for studies 3 and 4 was found to be significantly greater than that for studies 1 and 2 (U = 121, Z = −2.73, p = 0.006, Figure 12). Given that the difference in DA was found to be significant, a follow-up analysis was run to compare DA scores for studies 1 and 2 against DA scores for combined sessions from studies 3 and 4 in two ways: (1) without feedback only, and (2) with feedback only, to determine whether the provision of feedback significantly improved performance. The alpha level was adjusted to 0.025 (Bonferroni correction) to control the family-wise error rate given these two post hoc comparisons. For the comparison of runs without feedback, the mean rank of DA scores achieved in studies 3 and 4 was not found to be significantly greater compared to those achieved in studies 1 and 2 (U = 68, Z = −2.02, p = 0.043 (>0.025), Figure 12). However, the results for the comparison of DA scores when feedback was provided indicated that the mean rank of DA values in feedback sessions in studies 3 and 4 was significantly greater than those achieved in studies 1 and 2 (U = 53, Z = −2.85, p = 0.004 (<0.025), Figure 12).
A comparison of DA values which were used for the cross-session analysis is presented in Figure 12.
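The adjusted threshold used for the two post hoc comparisons follows from dividing the nominal alpha level by the number of tests. The short Python sketch below reproduces this arithmetic with the p-values reported above; the function names, the dictionary keys and the nominal alpha of 0.05 are assumptions for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each named p-value as significant at the Bonferroni-adjusted level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# p-values reported in the text for the two post hoc comparisons
reported = {"without_feedback": 0.043, "with_feedback": 0.004}
decisions = significant(reported)
```

With two tests the threshold is 0.05 / 2 = 0.025, so the no-feedback comparison (p = 0.043) does not reach significance while the feedback comparison (p = 0.004) does.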

Discussion
To date, only a limited number of offline studies have focused on decoding mentally imagined real-world objects [16] or the shape of primitive objects [17][18][19] from EEG. However, none of these studies used an online scenario providing real-time feedback from the actual DA.
The studies presented in this paper were intended not only to evaluate the separability of five imagined 3D primitive objects (sphere, cone, pyramid, cylinder, and cube) from EEG using an offline scenario (studies 1-2), but also to evaluate whether closed-loop BCI training could improve separability using a multi-session experimental paradigm with gamified feedback (studies 3-4), and to identify frequency bands and cortical areas providing a maximal contribution for decoding imagined objects from EEG. Our results show that:
• Significant DA, above empirical chance level performance, is feasible.
• The addition of feedback to the experimental paradigm, over multiple sessions, enhances performance.
• Prominent frequency bands are primarily the 0-4 Hz (delta) and secondarily the 4-8 Hz (theta) oscillations.
• Prominent activations during shape imagery are observed in the frontal, posterior parietal, and occipitotemporal cortex.

Decoding accuracy and multi-session learning process
In our offline pilot studies (studies 1 and 2), ten of thirteen participants achieved a DA peak during the task period which was significantly higher than the DA peak obtained in the pause period (Wilcoxon non-parametric test, p < 0.05). The significant DA peak for these ten participants ranged between 27.1% and 37.1% (Figure 8b). In study 3, an increasing trend of cross-session DA values was detected in a comparison between the BCI when calibrated using data acquired from an early session vs a later session for both participants (see offline sessions 1-7 in Figure 9a1). It is important to note that the peak DA during online sessions was not only significantly higher than the empirical chance level (20 ± 6%), but also occurred with the same latency following the onset of the task as observed in cross-session CV tests performed during the BCI calibration process (see the relation of the DA peak and the denoted classification time, indicated with a black vertical solid line during the task period, in the time-varying DA plots in Figures 9 and 10). The DA for both offline and online session groups increased over sessions. Despite the stability of the BCI (confirmed in the long-term stability test, Figure 9a2) and the fact that the offline and online paradigms followed the same scenario, the DA for all participants in studies 3 and 4 dropped significantly in the first online session compared to the DA obtained in the last offline session, which took place some days prior to the first online session (Figure 9a2 and Figure 10a2-a3). This observation may relate to an initial adaptation to the feedback and/or frustration caused by misclassification.
The DA trends achieved in the online sessions of studies 3 and 4 (Figure 9a2 and Figure 10a3) show a positive learning process for all participants, during which they learned to use a participant-specifically calibrated BCI more effectively. Despite the overall performance being relatively low compared to other types of imagery (e.g. motor imagery), the results indicate that a multi-session learning process using a closed-loop scenario provides an opportunity for the user to improve performance.
It is important to note that the distribution of the data and features commonly changes significantly over the participant's learning process, for the following reasons. First, the user's strategy to attempt to control the BCI commonly changes during a multi-session learning period. Second, the task-specific neural activity pattern changes significantly when the participant learns to control the BCI [74]. The results of the online experiments also call attention to the importance of an adequately scheduled recalibration of the BCI: in both online studies (studies 3 and 4), after the BCI recalibration, the online DA increased significantly for most participants (see the increase in DA for study 3 in Figure 9a2 between online sessions 5 and 6, and for study 4 in Figure 10a3 between online sessions 2 and 3).
It should be noted that the highest single-session CV accuracy (DA = 51 ± 7%) presented in this paper was achieved by participant 2 in study 4, i.e. where the subject had completed the most sessions with biased feedback. However, as this accuracy was achieved during offline recalibration of the BCI using a dataset recorded in this participant's final online session, the online performance of this BCI configuration was not tested. Nevertheless, this result is an example of the possible improvement over time by the user mutually learning with the BCI, as well as the potential of the paradigm to enable primitive 3D object decoding from EEG.

Cross-study statistical analysis and offline vs online scenarios
The cross-study statistical analysis first established that initial performance ability was homogeneous, as indicated by the results of the comparison between the combined first offline (no-feedback) sessions from studies 3 and 4 and the combined first sessions from studies 1 and 2 (all offline), which was not significant (p = 0.77). Following this, we determined that performance in studies 3 and 4 was generally improved compared to performance in studies 1 and 2 (p = 0.006). Considering that studies 3 and 4 involved several sessions for each participant, both offline (without feedback) and online (with feedback), as opposed to studies 1 and 2 (which involved one and three sessions, respectively), it was important to analyze the impact of (1) increased sessions and (2) feedback sessions separately, to determine the effect of feedback sessions alone.
Regarding the former effect, the mean rank of DA scores achieved in studies 3 and 4 offline (no-feedback) sessions was not found to be significantly greater compared to those achieved in studies 1 and 2, at the Bonferroni-adjusted alpha level of 0.025 for two post hoc tests (U = 68, Z = −2.02, p = 0.043). Therefore, despite an increase in the number of offline sessions, performance did not improve significantly. In contrast, the results for the comparison of study 3 and 4 feedback sessions with study 1 and 2 sessions (without feedback) revealed that the mean rank of DA values achieved in feedback sessions in studies 3 and 4 was significantly greater compared to those achieved in the no-feedback studies 1 and 2 (U = 53, Z = −2.85, p = 0.004, given the Bonferroni-adjusted alpha level of 0.025). This significant improvement attributable to the influence of feedback is a strong indicator that real-time feedback during shape imagery improves separability of neural modulations and enhances decoding accuracy, and that participants can learn to improve performance in shape imagery to modulate brain activity.

Visual perception vs mental imagery
fMRI studies show that visual perception and mental imagery are associated with similar patterns [37][38][39][40][41][42]. In our experimental paradigms, the target object is presented on the screen prior to the object imagery task. Therefore, it is important to investigate whether the results of the object classification were linked to the neural activity involving perception (prior to the imagery task) or related to the object imagery task. Time-varying DA plots with a reasonably high DA sometimes indicate two DA peaks: a smaller (non-dominant) DA peak at the end of the 1 s period when the target object was displayed on the screen, and a significantly higher (dominant) DA peak matching the time interval of the task period (indicated with VP and MI labels, respectively, in Figure 10b). Assuming that visual perception and mental imagery rely on similar patterns, it is logical to suppose that the smaller peak at the end of the display period may rely on the visual perception of the displayed target object, while the dominant peak in the task interval is a result of the mental imagery task. The delay between the onset of perception and the VP peak, and between the onset of the mental imagery task and the MI peak, originates not only from biological factors such as the reaction time but also from the size of the classification window, which was optimized participant-specifically (i.e. 1 s or 2 s).

Frequency and topographical analysis
The frequency and topographical analyses aimed to identify frequency band(s) and cortical area(s) involved primarily in mental imagery (imagined visual representation) of 3D primitive objects.
The frequency analysis performed for studies 1-4 showed clear evidence that the 0-4 Hz (delta) oscillations (for some participants along with the 4-8 Hz (theta) oscillations) provided the highest contribution to the classification of the five primitive objects from the EEG recorded during both offline and online sessions. Furthermore, the topographical analysis indicated that the frontal, posterior parietal and occipitotemporal cortical areas have an important role in object imagery (Figures 8e, 9b2, 9d2 and 11). It is important to highlight that BCI configurations which provided the highest accuracy in single-session CV (DA > 30%, empirical chance level 20 ± 6%) also provided a sharper separation of cortical areas involved in object imagery task performance (panels highlighted with bold frame in Figure 11) compared to BCI configurations with which a lower level of DA was achieved (panels without bold frame in Figure 11). The object-specific similarities/differences of topographical maps were analyzed using a dataset from studies 1-2 (Figure 8e), indicating that imagery of the five analyzed objects generates similar, or overlapping, cortical activity patterns. As individual brain activity has a wide range of variability [75], an analysis studying participant-specific variability of object-specific cortical activity patterns generated during imagery of different 3D primitive objects may be an objective in future work. The results obtained from topographical analyses are in line with fMRI studies. For example, Stokes et al. [32] show an important role of the visual cortex in shape-specific mental imagery. Furthermore, in line with our results, the object-related mental imagery contribution of the occipitotemporal cortex [48] as well as the frontal and parietal cortex [50] has been reported.
Our result regarding the importance of delta oscillations in the decoding of the shape of imagined objects is supported by a recent study by Sburlea et al., 2021 [76], the results of which indicate that low-frequency EEG encodes information not only about properties of grasping movements but also about the shape and size of the grasped objects. Regarding object imagery EEG studies, Chew et al. [77] report a maximal decoding accuracy of 80% (theoretical chance level 50%) for a binary classification based on whether the user aesthetically liked or disliked the presented object. The features for their KNN-based classifier were extracted from the 1-4 Hz (delta), 4-8 Hz (theta), and 8-13 Hz (alpha) bands, but features from the 13-30 Hz (beta) and 30-49 Hz (low gamma) bands were omitted. Although this result supports our findings showing that low-frequency EEG oscillations (from the delta and in some cases the theta band) encode maximal information about the shape of an imagined 3D primitive object, aesthetic perception might not rely on the same neural circuits as the shape of imagined objects.

Limitations
Our research aimed to develop an online BCI to decode five imagined 3D primitive objects and show that real-time feedback enhanced decoding accuracy. The online DA in the final session reached an average of 35%, which is significantly above the empirical chance level (20 ± 6%), and the performance of participants who received multiple feedback sessions was significantly higher (p = 0.004) than the performance of those who participated in only one session and had no feedback. However, the decoding accuracy values are not sufficiently high to enable reliable real-time intended shape selection using a BCI. Nevertheless, it can be seen that in studies 3 and 4 performance improved with feedback for all participants. Moreover, the highest offline result (DA = 51 ± 7%) across the study series was achieved in study 4 by subject 2 in the final session. This observation again suggests that the gamified paradigm and feedback have impacted performance. Further training and gamification to enhance training may, therefore, improve the results and produce a BCI which could be used functionally with shape imagery alone. The results also suggest that hybridizing imagery strategies to include, for example, motor and shape imagery may increase the potential for shape imagery to be used.
For the first time, our results show that DA can be enhanced with real-time feedback in a multi-session scenario and a gamified paradigm. We can report that multiple feedback sessions enhance performance. However, we cannot conclude that positively biased feedback and/or gamification improved performance any more than unbiased and/or non-gamified feedback would have, as we do not have control groups for the latter. Future work should consider controlling for these measures to gain a better understanding of the effects of various types of feedback on shape-imagery BCI performance.
In an offline study, Llorella et al. [19] classified seven simple 2D geometric objects, achieving an average offline DA of 35.1 ± 7.0% (theoretical chance level 14.3%) using a convolutional neural network for feature selection, which would indicate slightly better average performance with the CNN (7 shapes in [19] vs 5 in this study). However, in [19] the stimuli are different (sample line drawing of shape in [19] vs 3D shape imagery in this study), which could account for observed differences in accuracy. Additionally, in [19] a 2-class analysis shows that maximal DA is achieved with line vs parallelogram, which are two shapes with maximum appearance distinction, suggesting that the types of stimuli/cues for shape imagery significantly impact results. This observation is further supported in another study by Llorella et al. [16] showing that offline classification of four real-world objects (tree, house, plane, and dog) plus the relaxation state obtained a DA of 60.5% (theoretical chance level 20.0%), again using a CNN. Further extensive research is needed to determine optimal combinations of shapes and indeed the influence of shape and signal processing strategy. A global search of the parameter space using advanced data-driven deep learning approaches may indeed find optimal features for shape imagery classification, as suggested by the results in Llorella et al. [16]. The machine learning approach applied in this study is constrained in terms of the search space (optimal frequency band and number of spatial filters) and uses a relatively simple classifier.
We recently demonstrated in Cooney et al. [78] that classification of six imagined words (theoretical chance level 16.7%) and five imagined vowels (theoretical chance level 20.0%) was enhanced by CNN frameworks (Shallow, Deep, EEGNet), achieving significantly higher DA (p < 0.0001) compared to an FBCSP-RLDA framework (words: 21 ± 2%, vowels: 26 ± 2%) similar to that applied in this study. These results were further improved using EEG and fNIRS fusion or alternative words and word pairing arrangement. Therefore, in future work, we will investigate replacing our FBCSP-based classifier with a CNN-based framework for imagined object classification and optimizing the type of images.
Here, we also note that although the number of participants for almost every study presented in this paper was relatively low (study 1 (N = 10), study 2 (N = 3), study 3 (N = 2), and study 4 (N = 4)), the overall number of participants involved in the four interdependent studies was sixteen (of whom three participated in more than one study, Supplementary Table 1). There were 69 sessions in total, of which 26 sessions were online with real-time feedback (involving 15,480 trials for offline and at least 6,840 trials for online sessions in total), which is a relatively comprehensive assessment of the paradigm and sufficient to demonstrate statistically the impact of feedback on BCI performance. This approach of adapting the study design and evaluating each new study with a limited number of participants was efficient and effective in testing our hypothesis. However, in future studies, we shall undertake a single experimental protocol with many participants rather than mixing participation across multiple interdependent studies. Here, we note that the number of trials in online sessions of study 3 was not fixed because participants were permitted a second attempt to make the correct response to the failed tasks (more details in Section 2.2.2). Topographical and frequency maps obtained in studies 1-4 demonstrated similar brain activity patterns across participants during object imagery task performance, suggesting the combined results obtained from the four interdependent studies in the series can be taken as a whole. With the various observations enabled by modifications across the study series, significant progress has been made toward designing a larger trial with optimal stimuli, gamification and signal processing strategies to determine if participants can learn to modulate brain activity through shape/object imagery sufficiently to achieve accuracies that are possible in other imagery paradigms, as shown in our recent study in 2022 [79] and by Bigitiomana et al., 2020 [80].
Notably, the DA in the final experimental paradigm (study 4) for each of the four participants showed an incremental increase over three online sessions using visual feedback, reaching the highest online accuracy (DA = 35%) during the last session. However, as these results were obtained only for four participants using three online sessions, the increasing trend in the DA which was detected over three online sessions should be confirmed in future work with more participants using a longitudinal multi-session scenario, as has similarly been demonstrated in a longitudinal study based on Cybathlon results in our recent publication [79]. This work builds on work reported by Pidgeon et al., 2016 [25], Hay et al., 2019 [26] and Duffy et al., 2019 [81] investigating design ideation and the potential for future BCI technologies to support design ideation, for example, by providing neurofeedback to allow designers to moderate their thought processes or by allowing them to realize their imagination seamlessly in digital environments. Recent results by Campbell et al., 2020 [82] involving designers ideating on complex design tasks during fMRI show that various brain regions are activated and may be associated with memory access, visual and motor imagery. For example: (1) a region of interest in the para-hippocampal gyrus (−27, −34, −13) revealed significant design ideation-related coactivations with the left fusiform gyrus, lingual gyrus, inferior temporal gyrus and right cerebellum during design ideation; and (2) the left lingual gyrus ROI (−15, −43, −10) was found to have significant ideation-related functional connectivity with clusters in the right lingual gyrus, as well as in the left superior frontal gyrus and bilateral cerebellum, indicating a significant connectivity with visual processing regions (lingual gyrus and fusiform gyrus). The results possibly reflect the interplay between long-term memory processes, visual and motor imagery during design ideation. Our work provides evidence that we can classify shape imagery when weighting CSP features across a number of those regions. Ongoing work is focused on undertaking a detailed functional connectivity analysis to determine more specifically the regions of activation and connectivity, but this is limited by the spatial resolution provided by our EEG montage.
Finally, we should note that BCI calibration trials that contained artifacts (identified via visual inspection) were removed during the offline calibration method. However, the online frameworks applied in the present studies did not involve online artifact removal and, therefore, the results of the online sessions are representative of what would occur in an online setting.
Automated artifact removal may indeed provide further enhancements to online object/shape imagery classification.

Conclusion
The research presented in this paper, involving ten participants in a single offline session (study 1), three participants in three offline sessions (study 2), two participants in eight offline and seven online sessions (study 3), and four participants in two offline and three online sessions (study 4) -provides evidence that distinguishing imagined sphere, cone, pyramid, cylinder, and cube-based neural correlates in EEG is feasible and participants can improve shape imagery to modulate brain activity to enhance BCI performance when real-time feedback is provided.
Thirteen of sixteen participants achieved a DA of 30 ± 5% during the mental imagery task, significantly higher than the DA obtained during the corresponding pause period (Wilcoxon non-parametric test p < 0.05, empirical chance level 20 ± 6%). The performance of all participants improved with online feedback. To the best of the authors' knowledge, this is the first study to provide real-time feedback across multiple sessions involving mental imagery of five 3D primitive objects. The best single-session CV test accuracy was achieved by participant 2 of study 4, when the classifier was trained and tested using a dataset recorded in the last (third) online session (DA = 51 ± 7%, empirical chance level 20 ± 6%). This result suggests that it may be feasible to reach accuracy levels that would enable functional use with this type of BCI and extensive training. The evolution of the paradigm involving gamification and biased feedback may have also influenced engagement over sessions. We also showed that the features are stable through inter-session tests, where peak accuracy levels and time point of peak accuracy were consistent when classifiers were trained on one session and applied to later sessions. Recalibrating the BCI within the session may enhance the results. Improvement in online DA over sessions indicates mutual learning capability between the BCI user and the BCI. An appropriately scheduled BCI recalibration regime and a more advanced signal processing pipeline, together with a longitudinal multi-session scenario, may lead to improved accuracy.
Results of the frequency and topographical analysis indicate that delta (0-4 Hz) oscillations, and for some participants also theta (4-8 Hz) oscillations, in the frontal, posterior parietal and occipitotemporal cortex have an important role in the mental imagery of 3D primitive objects.
In conclusion, although the performance of this BCI involving 3D object classification from EEG is likely to be too low to give users a feeling of reliable control or interaction, the results are a positive indication that, with learning and real-time feedback, these mental tasks, or a combination of these and other mental tasks, could be used for performing mental-task-based operations in virtual spaces or cognitive aided engineering design using an online BCI. The low number of participants does, however, prevent us from assessing how generalizable these results are, and further work is required to confirm this preliminary evidence.

Figure 1. Illustration of the five 3D primitive objects displayed in the studies presented in this paper.

Figure 2. The offline experimental paradigm. (a) An example of the screen layout during offline task performance. (b) The timing of an offline trial. (c) An example of how the screen content varied during the second offline trial of a sub-block.

Figure 3. The online experimental paradigm. (a) An example of the screen layout during online task performance. (b) The timing of an online trial. (c) An example of how the screen content changed during the second online trial of a sub-block. In this example, the first trial was successful, as the color of the bottom-most object (cube) on the left side of the screenshots is blue. The second trial was unsuccessful, as the object (pyramid) is different from the target object (cylinder), and the color of the middle object on the left side (c) (cylinder) changed to pale yellow.

Figure 4. The timing of the experiment in a session. (a) Timing of a sub-block. (b) Timing of a block. (c) Timing of a run. (d) Timing of the experiment.

Figure 5. Placement of the EEG and ground electrodes (the reference electrode was placed on the right earlobe).

Figure 6. Filter-bank common spatial patterns (FBCSP) based multi-class classification method. The block diagram illustrates the structure of the FBCSP-based multi-class classification method using mutual information (MI) selection and a linear discriminant analysis (LDA) based classifier. The number of bands and selected features were different in offline studies 1-2 and online studies 3-4 (described in the text body).

Figure 9. Results of study 3. (a): significant DA values from study 3. (a1): cross-session CV results. The mean value (thick black lines in green columns) and STD (green columns) of peak DA rates obtained from cross-session CV are presented for each session, separately. The session ID that was selected for calibrating the final BCI is marked with a rectangle below the DA chart. (a2): detailed results of the

Figure 10. Overview of decoding accuracy achieved in study 4. (A): significant DA values from study 4. (A1): single-session CV results of subject-specifically calibrated BCIs providing the highest DA in cross-session CV. The mean value (thick black lines in green columns) and STD (green columns) of peak DA rates obtained from the single-session CV are presented for each subject, separately. (A2): cross-session stability test results of subject-specifically calibrated BCIs providing the highest DA in cross-session CV. (A3): DA rates achieved by subjects 1-4 in online sessions 1-3. (B): subject-specific time-varying DA plots from online sessions 1-3. The expected position of peak DA is indicated with a black vertical solid line in the task interval of the time-varying DA plots. DA peaks obtained in a time interval matching visual perception and mental imagery periods are indicated with VP and MI labels in the time-varying DA plots, respectively. (DA values plotted in (B) are calculated using a 1 s classification window prior to the plotted time point; the peak DA is therefore shifted by around +500 ms compared to the mid-point of the classification window.)

Figure 11. Results of subject-specific frequency and topographical analyses for study 4. The frequency and topographical maps, using CSP filters and MI weights of the calibrated BCIs, indicate frequency bands and cortical areas providing the highest DA contribution. DA from the single-session CV of the analyzed BCI configuration is indicated below the topographical maps. Panels presenting results of a BCI configuration that provided DA > 30% in single-session CV (figure 10A1) are highlighted with a bold frame.

Figure 12. Comparison of significant DA values for the cross-session analysis. Colored dots displayed in the boxplots indicate DA peaks in the task period which were significantly higher compared to DA obtained in the corresponding reference (baseline) period. Sessions without feedback are indicated as offline sessions. Sessions with feedback are indicated as online sessions. The box extends from the lower to upper quartile values, with a line at the median. The whiskers extend from the box to show the range of displayed DA values. p values obtained from the Mann-Whitney U tests are also presented.
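
As an illustration of the comparison summarized in figure 12, the following is a minimal sketch of a two-sided Mann-Whitney U test between two independent groups of DA values. It is not the analysis code used in the studies: the function names (`average_ranks`, `mann_whitney_u`) are hypothetical, the p value uses the normal approximation with tie correction and no continuity correction (an assumption; small samples may call for an exact test), and the DA values in the usage lines are invented placeholders, not results from this paper.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction term: sum of t^3 - t over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p


# Hypothetical peak DA values (%), for illustration only.
offline_da = [24.0, 26.5, 25.0, 27.0, 23.5]
online_da = [29.0, 31.5, 30.0, 33.0, 35.0]
u_stat, p_value = mann_whitney_u(offline_da, online_da)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

A rank-based test of this kind makes no normality assumption about the DA distributions, which suits the small number of sessions per group shown in the boxplots.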

Table 1. The number and duration of sessions performed in studies 1-4.

Table 2. Differences in the experimental paradigms used for studies 1-4.

Table 3. Sessions used for calibration, stability test, application, and re-calibration of the online BCI.