Siying Xie, Daniel Kaiser, Radoslaw M. Cichy
(A) Stimuli were a diverse set of twelve object images and twelve spoken words denoting these objects.
(B) In the perception task, participants viewed the object images in random order.
(C) In the mental imagery task, participants were cued to imagine an object by hearing the spoken word denoting the object.
(D) EEG data recorded from 64 electrodes during both tasks were epoched into trials and subjected to time-frequency decomposition using Morlet wavelets. This was done separately for each trial and each electrode, yielding a trial-wise representation of induced oscillatory power. We aggregated these time-frequency data into three frequency bands (theta: 5–7 Hz; alpha: 8–13 Hz; beta: 14–31 Hz). Averaging across all frequencies within each band yielded, for each trial, a time- and frequency-band-resolved response vector across EEG sensors. These response vectors were entered into multivariate pattern analyses.
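The decomposition in (D) can be illustrated with a minimal numpy sketch. It is not the authors' pipeline: the sampling rate (`SFREQ`), the number of wavelet cycles (`N_CYCLES`), the 1-Hz frequency steps, and all function names are hypothetical. Only the band limits come from the caption. Whether the trial-averaged evoked response was removed before decomposition is not stated here, so the sketch simply computes single-trial power for every trial and electrode and then averages across the frequencies of each band.

```python
import numpy as np

# Hypothetical analysis parameters: the caption does not report them.
SFREQ = 250.0      # sampling rate in Hz (assumed)
N_CYCLES = 7       # cycles per Morlet wavelet (assumed)
# Band limits from the caption; 1-Hz frequency steps are assumed.
BANDS = {"theta": (5, 7), "alpha": (8, 13), "beta": (14, 31)}


def morlet_wavelet(freq, sfreq=SFREQ, n_cycles=N_CYCLES):
    """Complex Morlet wavelet centred on `freq` Hz, scaled to unit energy."""
    sigma_t = n_cycles / (2 * np.pi * freq)           # s.d. of the Gaussian envelope (s)
    t = np.arange(-3.5 * sigma_t, 3.5 * sigma_t, 1 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    return wavelet / np.sqrt(np.sum(np.abs(wavelet) ** 2))


def wavelet_power(epochs, freq, sfreq=SFREQ):
    """Power at `freq` for every trial and electrode separately.

    epochs: (n_trials, n_electrodes, n_times). The convolution is computed via
    the FFT along the time axis and cropped back to the epoch length.
    """
    w = morlet_wavelet(freq, sfreq)
    n_times = epochs.shape[-1]
    n_fft = n_times + len(w) - 1                      # length of the full linear convolution
    conv = np.fft.ifft(np.fft.fft(epochs, n_fft, axis=-1) * np.fft.fft(w, n_fft), axis=-1)
    start = (len(w) - 1) // 2                         # centre the output on the input samples
    return np.abs(conv[..., start:start + n_times]) ** 2


def band_power(epochs, bands=BANDS, sfreq=SFREQ):
    """Average single-trial power over all frequencies within each band.

    Returns {band: (n_trials, n_electrodes, n_times)}; at each time point the
    electrode axis is the response vector used for pattern analysis.
    """
    return {
        band: np.mean([wavelet_power(epochs, f, sfreq) for f in range(lo, hi + 1)], axis=0)
        for band, (lo, hi) in bands.items()
    }
```

Feeding in epochs that contain a 10-Hz oscillation should concentrate power in the `"alpha"` entry, which is a quick sanity check of the band assignment.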
(E) Multivariate pattern classification was performed separately for each frequency band. Because perception and imagery need not unfold with similar temporal dynamics, we performed a time-generalization analysis that treated timing in the perception and imagery tasks independently. For every combination of a time point during perception (0–800 ms relative to image onset) and a time point during imagery (0–2,500 ms relative to word onset), we conducted a pairwise cross-classification analysis: we trained support vector machine (SVM) classifiers to discriminate between response patterns for two different objects (here: car versus apple) when they were imagined, and tested these classifiers on response patterns for the same two objects when they were perceived (and vice versa). We averaged classification accuracies across all pairwise object comparisons, yielding a single time-generalization matrix for each frequency band. These matrices depict the temporal dynamics of representations shared between imagery and perception.
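The cross-classification scheme in (E) can be sketched as follows, under several assumptions. The linear SVM here is a small soft-margin implementation (hinge loss plus L2 penalty, fitted by subgradient descent) standing in for whatever SVM software the authors used; its hyperparameters (`lam`, `lr`, `n_iter`) and all function names are hypothetical. Classifiers are trained on all trials of one task and tested on all trials of the other, and details the caption does not specify (trial averaging, feature scaling) are omitted.

```python
import itertools

import numpy as np


def fit_linear_svm(X, y, lam=1e-2, lr=0.1, n_iter=300):
    """Soft-margin linear SVM fitted by full-batch subgradient descent.

    y must contain -1/+1 labels. Hypothetical stand-in for a library SVM.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        active = y * (X @ w + b) < 1             # samples violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def train_test_accuracy(X_train, y_train, X_test, y_test):
    """Train on one task's patterns and report accuracy on the other task's."""
    w, b = fit_linear_svm(X_train, y_train)
    return np.mean(np.sign(X_test @ w + b) == y_test)


def cross_decoding_matrix(perc, perc_labels, imag, imag_labels):
    """Time-generalization cross-classification between perception and imagery.

    perc: (n_trials, n_sensors, n_perc_times) band-power patterns (perception)
    imag: (n_trials, n_sensors, n_imag_times) band-power patterns (imagery)
    Returns accuracies (n_perc_times, n_imag_times), averaged over all object
    pairs and over both train/test directions.
    """
    pairs = list(itertools.combinations(np.unique(perc_labels), 2))
    acc = np.zeros((perc.shape[2], imag.shape[2]))
    for a, b in pairs:
        pm, im = np.isin(perc_labels, [a, b]), np.isin(imag_labels, [a, b])
        yp = np.where(perc_labels[pm] == a, 1.0, -1.0)
        yi = np.where(imag_labels[im] == a, 1.0, -1.0)
        for tp in range(perc.shape[2]):
            Xp = perc[pm, :, tp]
            for ti in range(imag.shape[2]):
                Xi = imag[im, :, ti]
                imag_to_perc = train_test_accuracy(Xi, yi, Xp, yp)  # train imagery, test perception
                perc_to_imag = train_test_accuracy(Xp, yp, Xi, yi)  # and vice versa
                acc[tp, ti] += (imag_to_perc + perc_to_imag) / 2
    return acc / len(pairs)
```

Averaging both train/test directions makes each cell symmetric with respect to which task supplied the training data; with synthetic data in which the same sensor pattern appears at one perception time point and one imagery time point, that single cell of the matrix should stand out.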
(F) We found significant cross-classification in the alpha frequency band, ranging from 200 to 660 ms in perception and from 600 to 2,280 ms in imagery. Peak decoding latency was at 480 ms (95% confidence interval: 479–485 ms) in perception and 1,340 ms (95% confidence interval: 1,324–1,346 ms) in imagery.
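The caption reports peak latencies with 95% confidence intervals but does not say here how the intervals were obtained. The sketch below assumes a percentile bootstrap over participants and assumes the peak is the location of the maximum in the participant-averaged time-generalization matrix; `peak_latency_ci` and its parameters are hypothetical.

```python
import numpy as np


def peak_latency_ci(acc, times_perc, times_imag, n_boot=1000, alpha=0.05, seed=0):
    """Peak latencies of a time-generalization matrix with bootstrap CIs.

    acc: (n_subjects, n_perc_times, n_imag_times) decoding accuracies.
    Assumed method: resample participants with replacement, average, and
    locate the matrix maximum; CIs are percentiles of the bootstrap peaks.
    """
    rng = np.random.default_rng(seed)
    n_subj = acc.shape[0]

    def peak(mat):
        tp, ti = np.unravel_index(np.argmax(mat), mat.shape)
        return times_perc[tp], times_imag[ti]

    peak_perc, peak_imag = peak(acc.mean(axis=0))
    boots = np.array([peak(acc[rng.integers(0, n_subj, n_subj)].mean(axis=0))
                      for _ in range(n_boot)])
    bounds = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    ci_perc = np.percentile(boots[:, 0], bounds)      # perception-time CI (ms)
    ci_imag = np.percentile(boots[:, 1], bounds)      # imagery-time CI (ms)
    return (peak_perc, ci_perc), (peak_imag, ci_imag)
```

Because the peak is a single matrix location, the perception and imagery latencies are bootstrapped jointly rather than from separate one-dimensional time courses.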
(G) To spatially localize these shared representations, we performed separate time-generalization analyses for anterior and posterior electrodes in our EEG setup. This analysis revealed significant cross-classification in the alpha band for posterior electrodes (from 20 to 800 ms during perception and from 660 to 2,500 ms during imagery), but not for anterior electrodes. This suggests that parieto-occipital alpha sources mediate the shared representations between perception and imagery. Black outlines indicate time point combinations with above-chance classification (N = 38; non-parametric sign permutation tests; cluster-definition threshold p < 0.05; cluster threshold p < 0.05; Bonferroni-corrected for the three frequency bands tested). Dec. acc., decoding accuracy.
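The statistics in the legend can be sketched as a cluster-corrected sign permutation test. Several details are assumptions rather than the authors' documented choices: inputs are accuracies minus 50% chance, the cluster-definition threshold is the cell-wise permutation quantile, clusters are 4-connected, cluster mass is the summed mean accuracy above chance, and the Bonferroni factor of 3 is applied to the cluster threshold. Function names and parameters are hypothetical.

```python
from collections import deque

import numpy as np


def label_clusters(mask):
    """Label 4-connected clusters of True cells in a 2-D boolean mask (0 = none)."""
    labels = np.zeros(mask.shape, dtype=int)
    n_clusters = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                                   # already part of a cluster
        n_clusters += 1
        labels[start] = n_clusters
        queue = deque([start])
        while queue:                                   # breadth-first flood fill
            i, j = queue.popleft()
            for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                inside = 0 <= a < mask.shape[0] and 0 <= b < mask.shape[1]
                if inside and mask[a, b] and not labels[a, b]:
                    labels[a, b] = n_clusters
                    queue.append((a, b))
    return labels, n_clusters


def cluster_sign_permutation(data, n_perm=1000, cluster_p=0.05, alpha=0.05,
                             n_bands=3, seed=0):
    """Cluster-corrected sign permutation test on a time-generalization matrix.

    data: (n_subjects, n_perc_times, n_imag_times), accuracy minus 0.5 chance.
    Returns a boolean mask of cells in significant above-chance clusters.
    """
    rng = np.random.default_rng(seed)
    n_subj = data.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n_perm, n_subj))
    signs[0] = 1.0                                     # row 0 = observed (unflipped) data
    perm_means = np.einsum("ps,sij->pij", signs, data) / n_subj
    # Cluster-definition threshold: cell-wise (1 - cluster_p) quantile of the
    # permutation distribution (one-sided, above chance).
    crit = np.quantile(perm_means, 1 - cluster_p, axis=0)
    masks = perm_means > crit

    def clusters(p):
        labels, n = label_clusters(masks[p])
        mass = np.array([perm_means[p][labels == k].sum() for k in range(1, n + 1)])
        return labels, mass

    # Null distribution of the largest cluster mass across permutations.
    max_mass = np.array([m.max() if m.size else 0.0
                         for _, m in map(clusters, range(n_perm))])
    labels, obs_mass = clusters(0)
    significant = np.zeros(data.shape[1:], dtype=bool)
    for k, mass in enumerate(obs_mass, start=1):
        # Bonferroni across frequency bands, applied to the cluster threshold (assumed).
        if np.mean(max_mass >= mass) < alpha / n_bands:
            significant |= labels == k
    return significant
```

The returned mask corresponds to the black outlines; the same function would be run once per frequency band and, for panel (G), once per electrode group.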