Introduction

Mental imagery refers to representations and the accompanying experience of sensory information in the absence of appropriate sensory inputs (Pearson et al., 2015). It enables us to remember, plan for the future, experience fantasy, and make decisions. Mental imagery may affect children’s lives in an obvious manner as they engage in imaginative and pretend play for long periods (Lillard et al., 2011; Moriguchi et al., 2015; Taylor, 1999; Walker and Gopnik, 2013). It has been shown that preschool children possess four aspects of mental imagery, namely image generation, maintenance, scanning, and rotation, although their performance in some aspects is worse than that of older children and adults (Frick et al., 2009; Kosslyn et al., 1990; Wimmer et al., 2016).

However, little is known about children’s social imagery, imagery about an agent. Social imagery can be distinguished from imagery about an object. Recent brain imaging studies have shown that the activated regions for social imagery (e.g., face) were different from those in imagery about an object (e.g., place) (O’Craven and Kanwisher 2000; Reddy et al., 2010). In this regard, although few studies have examined children’s social imagery, it is well known that young children often verbally report that they enjoy interacting, playing, and talking with an invisible agent or imaginary companion (IC) (Fernyhough et al., 2007; Gleason, 2002; Moriguchi and Shinohara, 2012; Moriguchi and Todo, 2017, 2018; Tahiroglu et al., 2011; Taylor, 1999), which is “an invisible character, named and referred to in conversation with other persons or played with directly for a period of time, at least several months, having an air of reality for the child but no apparent objective basis” (Svendsen, 1934). ICs may exemplify that young children may have a different quality of social imagery than adults. Indeed, Gleason (2002) argued that children involve ICs directly in their activities, and their relationship with ICs was similar to their relationship with real friends, whereas adults imagine conversations with real others or daydream about imaginary others. The underlying assumption is that children who spontaneously report ICs are not a unique category although the spectrum of individual differences in a given age group may exist. Instead, children’s social imagery may qualitatively differ from that of adults by involving more perceptual characteristics. These characteristics can be objectively measured through the behavioral consequences of children’s imaginative experiences.

Developmental research has not directly addressed the issue, but has focused on the ways children confuse the distinction between imagination and reality. Previous research has consistently shown that children begin to discriminate imagination from reality around the second year of life, and continue to develop during the preschool years (Goldstein and Bloom, 2015; Ma and Lillard, 2006; Wellman and Estes, 1986; Woolley 1997). The inability to separate imagination from reality may persist with age in some circumstances. For example, children were reluctant to approach a box after they were asked to pretend that the box contained a monster (Bouldin and Pratt, 2001; Harris et al., 1991; C. N. Johnson and Harris, 1994). However, this response may not necessarily reflect confusion between imagination and reality; rather, it may be attributable to curiosity, negative emotion, and cognitive availability (Golomb and Galasso 1995; Harris, 2000; Weisberg and Taylor, 2013). The dominant view is that preschool children believe that imagined entities do not intrude on real life.

Previous research suggests that children know that imaginary entities are not real, but the associated social imagery may still be extremely realistic. Here, we define “reality” as follows. Although imagination is clearly distinguished from hallucination, and children’s experience of social imagery is different from a psychiatric disorder (Taylor, 1999), we decided to use the definition of reality from psychotic studies. “Sense of reality” has been studied in relation to psychotic episodes in adults particularly in research on the perceptual experience of hallucinations, such as seeing someone’s face, even in the absence of appropriate stimuli (Blom, 2010; Jardri et al., 2012). Aggernæs (1972) proposed a definition of reality using seven qualities including sensation (perceived in an external sensory modality), behavioral relevance (relevance for one’s emotions, needs, and actions), publicness (can be experienced by others), objectivity (perceived in more than one sense), existence (exist even when no one experiences it), involuntarity (outside one’s control), and independence (independent of unusual states of mind).

Children’s descriptions about social imagery seem to be consistent with most of Aggernæs’ (1972) qualities. For example, children often report that they can “see” and “hear” an imaginary agent (sensation) (Taylor, 1999), and children who reported that they “saw” an invisible agent could easily generate visual and auditory imagery in verbal tasks (Tahiroglu et al., 2011). Regardless, verbal reports cannot be used to conclude that children’s perceptual and cognitive experiences of social imagery are identical to their perception of real humans.

In this study, we examined the behavioral significance of sensation and investigated whether young children have realistic perceptual—rather than conceptual—experiences of social imagery. We attempted to examine qualitative differences in imagery—a closely related process—of similarly aged children, relative to that of adults.

In this study, we assumed that children would be considered to have realistic perceptual experiences when presented with an imaginary item if they produced qualitatively identical involuntary behaviors when presented with a real item. Because of its salience to human perception, involuntary fixation to a face was used as the index. Even newborns are biased towards face-like stimuli (Johnson et al., 1991). Both children and adults are more likely to attend to faces rather than no-face objects, and the processing of a face interferes with other perceptual processing (Doherty-Sneddon et al., 2001; Langton et al., 2008; Ro et al., 2001). Further, children often report seeing the face of an imaginary person (Harriman, 1942; Taylor, 1999). If individuals have realistic perceptual experiences of social imagery, they would involuntarily fixate on the imaginary face, and this would interfere with other perceptual processing. Specifically, if children’s experiences of social imagery behaviorally and phenomenologically differ from that of adults, children’s perceptual processing, but not adults’, would result in interference by the imaginary face. This study aimed to test this hypothesis.

We used a predictive eye-movement paradigm to assess children’s and adults’ perception of another person’s goal-directed actions (Cannon and Woodward, 2008; Falck-Ytter et al., 2006; Kanakogi and Itakura, 2011). Previous studies have used videos where an individual—often without the face shown—performs goal-directed actions, such as grasping and placing a ball into a bucket. Participants demonstrated predictive (anticipatory) gazes at the goal position before the arrival of the hand or ball. However, face recognition research uses similar stimuli but with the face shown (Falck-Ytter et al., 2010). Moreover, although there is some evidence that infants may attend to the parents’ hands and objects rather than the parents’ faces in an interactive situation (Yoshida and Smith, 2008), Myowa-Yamakoshi et al. (2012) reported that 8-month-old and 12-month-old infants referred to an actor’s face while observing the actor’s goal-directed actions in an experimental situation similar to the predictive eye-movement paradigm. This previous evidence may indicate that the presence of a face would significantly interfere with the predictive gazes of goal-directed actions. That is, participants’ predictive gaze would be delayed when they perceive a face.

We conducted three experiments as part of this study. Experiment 1 was not directly related to experience of social imagery; rather, Experiment 1 was a precondition for Experiment 2 and Experiment 3. In Experiment 1, we investigated whether children’s and adults’ predictive eye movements were disrupted by the presence of a real person’s face. We predicted that participants’ gazes would be slower in the presence of a face than in the absence of a face, possibly because participants gazed more at the face region during the stimuli.

In Experiments 2 and 3, we evaluated our hypothesis that young children’s experience of social imagery would be more realistic—more perceptual than imaginative—relative to adults. Specifically, we examined whether adults’ (Experiment 2) and children’s (Experiment 3) predictive eye movements were affected by imagining a person, and whether the effects were different in children and adults. We conducted separate experiments for adults and children because the procedure for children was not appropriate for adult participants (See Experiment 2).

In Experiment 2, we examined whether the disruptive effects were absent when adult participants were asked to imagine an invisible person (in a video where the actor was absent). In Experiment 3, we explored whether disruptions in children’s predictive eye movements when they were asked to imagine an invisible person or object were identical to disruptions resulting from the presence of a real face. Importantly, in Experiment 3, participants were given verbal instructions to induce their imagery of an invisible agent. This was based on previous studies that used verbal induction to elicit children’s imagery (e.g., Moriguchi and Shinohara, 2012; Tahiroglu et al., 2011). Accordingly, this study tested whether children’s perceptual experience would be realistic in situations where they imagined an invisible person induced by verbal cues. In other words, this study cannot speak to whether children’s experiences would be entirely perceptual and unaffected by verbal cues.

Experiment 1

Participants

Twenty healthy adults (29.9 ± 6.7 years (mean ± SD); range 23–44; 10 men and 10 women) and 20 preschool children (45.4 ± 20.3 months; range 19–74; 6 boys and 14 girls) participated in Experiment 1. The minimum sample size of 15 per group (with 80% power to detect an effect, alpha = 0.05) was calculated with an a priori power analysis, using the effect sizes reported in a previous study that used similar procedures, materials, and dependent variables (Ambrosini et al., 2013). Our choice of a sample size of 20 per group is consistent with previous studies on predictive gaze (Cannon and Woodward, 2008).

Children were recruited from a registry of families maintained in the Child Development Lab at a university in Japan. Written informed consent was obtained from the parents of children prior to the children’s involvement. We obtained written informed consent from adult participants. The study was conducted in accordance with the principles of the Declaration of Helsinki and the study design was approved by the ethics review board of Joetsu University of Education (2012-1).

Stimuli and procedure

Gaze was measured with the Tobii T120 near-infrared gaze tracker system (Tobii Technology Inc., Stockholm, Sweden) that records near infrared reflections of both eyes at a 120 Hz sampling frequency as participants view a 17-inch screen (accuracy = 0.5°, spatial resolution < 0.3°). The monitor was placed ~60 cm away from the participants’ eyes. Participants were shown two videos of an adult woman sitting behind a table with three balls (yellow, blue, and pink) placed to the right of the screen and a red bucket to the left (Fig. 1a, b). The entire visual display extended to 22.3° × 14.6° of the visual angle, with each object covering 1.1° × 1.4°, and the bucket covering 4.6° × 5.1°. The angular distance between the objects and bucket was 11.5° and the vertical extent of the woman in the video was 11.4°.
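The visual angles reported above can be related to physical on-screen sizes with the standard formula size = 2 × distance × tan(angle / 2). The following is a minimal sketch, assuming a flat screen, a stimulus centered on the line of sight, and a viewing distance of exactly 60 cm (the text gives ~60 cm); the function names are illustrative and are not taken from the study.

```python
import math

VIEWING_DISTANCE_CM = 60.0  # assumption: the text reports "~60 cm"


def visual_angle_to_size_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical extent (cm) subtended by a visual angle for a centered stimulus."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def size_cm_to_visual_angle(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Visual angle (degrees) subtended by a centered stimulus of the given extent."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


if __name__ == "__main__":
    # Extents reported in the text, in degrees of visual angle.
    for label, angle in [("display width", 22.3), ("display height", 14.6),
                         ("bucket width", 4.6), ("object-bucket distance", 11.5)]:
        print(f"{label}: {angle} deg -> {visual_angle_to_size_cm(angle):.2f} cm")
```

Under these assumptions, the 22.3° display width corresponds to roughly 23.7 cm on the screen; a different true viewing distance would scale all values proportionally.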

Fig. 1

Experiment 1: The agent’s goal-directed actions in the (a) No-Face video and (b) Face video, (c) the areas of interest (AOIs), and (d) gaze arrival at the goal, relative to the arrival of balls. Error bars denote standard error. Written informed consent was obtained from the individual for the publication of panel b.

Each Face video, lasting 20 s, depicted a woman who moved three balls to a bucket, one at a time, with her right hand. When each ball entered the bucket, 60 decibel chimes rang. It took 1.2, 1.6, and 1.2 s, to move the three objects to the bucket, respectively. The No-Face video was identical to the Face video, except that the region from the woman’s shoulders upward was blacked out.

Adult participants were seated on a chair and child participants were seated on the lap of a parent. A five-point calibration was conducted before recording performance. The order of presentation of the Face and No-Face videos was counterbalanced. Each participant was exposed to six identical consecutive trials for each condition. Between trials, animation videos were shown until the participant refocused on the monitor. Both adults and children were instructed to only watch the Face and No-Face videos.

Three areas of interest (AOIs) were defined (depicted by rectangles in Fig. 1c): the starting position of the three balls (Object AOI), the area around the bucket (Goal AOI), and the area around the face (Face AOI). Participants’ gaze shifts for each movement from the Object to Goal AOI were analyzed only if participants fixated on the Object AOI for 200 ms (Kanakogi and Itakura 2011), shifted their gaze to the Goal AOI, and fixated on the Goal AOI before the next object was moved. Data were included in the analysis only if the participants fulfilled the criteria for at least two trials for each condition. The first trial of each condition was excluded from analyses because gaze shifts on the first trial are not predictive (Kanakogi and Itakura, 2011).
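The inclusion criteria above can be expressed as a small filter over fixation records. This is a minimal sketch rather than the authors' analysis code: the `Fixation` record, the AOI labels, and the reading of "for 200 ms" as "at least 200 ms" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_OBJECT_FIXATION_MS = 200.0  # threshold stated in the text
MIN_VALID_TRIALS = 2            # per condition, as stated in the text


@dataclass
class Fixation:
    aoi: str         # hypothetical labels: "object", "goal", "face", "none"
    start_ms: float
    end_ms: float


def is_valid_gaze_shift(fixations: List[Fixation], next_move_onset_ms: float) -> bool:
    """True if a sufficiently long Object-AOI fixation is later followed by a
    Goal-AOI fixation that begins before the next object starts to move."""
    object_fixated = False
    for fix in sorted(fixations, key=lambda f: f.start_ms):
        if fix.aoi == "object" and fix.end_ms - fix.start_ms >= MIN_OBJECT_FIXATION_MS:
            object_fixated = True
        elif object_fixated and fix.aoi == "goal" and fix.start_ms < next_move_onset_ms:
            return True
    return False


def participant_included(valid_trials_per_condition: Dict[str, int]) -> bool:
    """True if every condition has at least two trials meeting the criteria
    (the first trial of each condition is assumed to be dropped beforehand)."""
    return all(n >= MIN_VALID_TRIALS for n in valid_trials_per_condition.values())
```

For example, a participant with three valid Face trials but only one valid No-Face trial would be excluded under this reading.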

The dependent variable was the timing of gaze arrival at the Goal AOI, relative to the arrival of the moving balls (hereafter, the timing of gaze arrival). If the participant gazed at the Goal AOI before the balls arrived, the trial was considered predictive (positive values). Reactive gaze shifts were registered if the participant gazed at the Goal AOI after the balls reached the bucket (negative values). A low positive value meant that participants looked at the Goal AOI shortly before the ball arrived, whereas a high positive value meant that they looked at the Goal AOI long before the ball got there. The latter was more predictive.
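The sign convention of the dependent variable can be made explicit in a few lines. This is a minimal sketch with illustrative function names; how a value of exactly zero was classified is not stated in the text, so treating it as reactive here is an assumption.

```python
def gaze_arrival_timing_ms(ball_arrival_ms: float, gaze_arrival_ms: float) -> float:
    """Timing of gaze arrival relative to ball arrival.

    Positive: gaze reached the Goal AOI before the ball (predictive).
    Negative: gaze reached the Goal AOI after the ball (reactive).
    """
    return ball_arrival_ms - gaze_arrival_ms


def classify_gaze_shift(timing_ms: float) -> str:
    """Label a gaze shift; zero is treated as reactive (assumption, not from the text)."""
    return "predictive" if timing_ms > 0 else "reactive"


if __name__ == "__main__":
    # Hypothetical timestamps (ms from movement onset), for illustration only.
    timing = gaze_arrival_timing_ms(ball_arrival_ms=1200.0, gaze_arrival_ms=1000.0)
    print(timing, classify_gaze_shift(timing))
```

With these hypothetical timestamps, the gaze arrives 200 ms ahead of the ball, so the shift counts as predictive.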

To assess whether participants’ gazes were influenced by the presence of a face, we examined whether participants fixated on the Face AOI by comparing the number of gaze fixations on the Face AOI and their durations between the Face and No-Face videos; medians were compared because the data were not normally distributed. We also assessed whether participants’ predictive gazes in the Face video were delayed by the presence of a face by calculating the median number of triangular trajectories, in which gaze moved from the Object AOI to the Goal AOI via the Face AOI, in both the Face and No-Face videos.
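One way to count triangular trajectories is to scan the ordered sequence of AOIs visited in a trial and count each Object → Face → Goal path. The text does not describe how the trajectories were operationalized, so the sketch below is only one plausible reading; the AOI labels and the rule that a return to the Object AOI restarts a trajectory are assumptions.

```python
from typing import Iterable


def count_triangular_trajectories(aoi_sequence: Iterable[str]) -> int:
    """Count gaze paths that go from the Object AOI to the Goal AOI via the Face AOI.

    `aoi_sequence` is the time-ordered list of AOI labels for successive fixations
    (hypothetical labels: "object", "face", "goal"; anything else is ignored).
    """
    count = 0
    state = "idle"  # idle -> at_object -> via_face -> (goal reached: count += 1)
    for aoi in aoi_sequence:
        if aoi == "object":
            state = "at_object"          # a visit to the object (re)starts a trajectory
        elif aoi == "face" and state in ("at_object", "via_face"):
            state = "via_face"           # detour through the face region
        elif aoi == "goal":
            if state == "via_face":
                count += 1               # Object -> Face -> Goal completed
            state = "idle"               # a direct Object -> Goal shift is not counted
    return count
```

A direct shift from the Object AOI to the Goal AOI contributes nothing to the count, whereas a shift that passes through the Face AOI contributes one.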

Results and discussion

The mean timing of gaze arrival was 148.0 ms (95% Confidence Interval (CI) = [21.0, 275.1]) in the No-Face video and 21.9 ms (95% CI = [−118.7, 162.4]) in the Face video for children, and 256.9 ms (95% CI = [110.3, 403.3]) in the No-Face video and 180.3 ms (95% CI = [82.1, 278.6]) in the Face video for adults. We conducted a 2 (participant: adults vs. children) × 2 (video: Face vs. No-Face) mixed ANOVA on gaze arrival at the Goal AOI, relative to the moving balls’ arrivals, and found a significant main effect of having a face in the video (F (1, 38) = 4.887, p < 0.033, η2 = 0.11). No other statistically significant effects were found. Our hypothesis that the timing of gaze arrival at the Goal AOI would be shorter (i.e., less predictive) in the Face video than in the No-Face video was supported for both children and adults (Fig. 1d).

In addition, we examined whether children’s and adults’ gazes were predictive. A significant difference between gaze arrival and the moving ball’s arrival reflected anticipation of the goal, and larger differences indicated greater anticipation. Children shifted their gaze to the Goal AOI before the moving ball arrived in the No-Face video (t (19) = −2.349, p < 0.025, d = 0.69, two-tailed one sample t-test), but not in the Face video (t (19) = −0.326, p > 0.250, d = 0.07). Adults shifted their gaze to the Goal AOI before the moving ball arrived both in the No-Face video (t (19) = −3.670, p < 0.002, d = 1.16) and in the Face video (t (19) = −3.844, p < 0.001, d = 1.27).

We conducted two additional analyses to assess whether participants’ gazes were indeed influenced by the presence of a face. One child was excluded from the analyses owing to the experimenter’s error. We first examined whether participants fixated on the Face AOI during the presentation of both videos by comparing the medians of the number of gaze fixations and durations. The results of the Wilcoxon signed-rank test showed that both the number of gaze fixations and average durations were greater for the Face rather than the No-Face video for adults and children (ps < 0.001, rs = 0.88, for both the number of gaze fixations and durations for adults, ps < 0.001, rs = 0.86, for both the number of gaze fixations and durations for children; Table 1). Moreover, if participants’ predictive gazes for the Face video were delayed owing to the presence of a face, the participants’ gazes would have moved from the Object AOI to the Goal AOI via the Face AOI, forming a triangular trajectory. We compared the median of triangular trajectories for videos in both adults and children. The Face video medians were 0.84 for adults and 0.67 for children, and the No-Face video medians were 0 for both adults and children (Wilcoxon signed-rank test, p < 0.021, r = 0.52 for adults; p < 0.001, r = 0.81 for children).

Table 1 Comparison of the median number of gaze fixations and average durations (s) at the Face area of interest in Experiments 1 and 3

As expected, participants’ gazes were significantly less predictive for the Face video than for the No-Face video, irrespective of participants’ age. Thus, the presence of a face interfered with predictive gaze in a qualitatively identical manner for both children and adults.

Experiment 2

Our hypothesis was that young children’s experience of social imagery would be more realistic—more perceptual than imaginative—relative to adults. To assess this, we created a Ghost video (Fig. 2a) that was identical to the Face video except that there were no persons in the video. Thus, in the Ghost video, the balls moved automatically, as if someone moved them. We presented the Face video before the Ghost video to facilitate the process of imagining the person moving the ball.

Fig. 2

Experiment 2: (a) The Ghost video, in which the balls moved automatically, as if someone moved them. (b) The results of Experiment 2. The data were approximately normally distributed. Error bars denote standard errors.

In Experiment 2, we examined whether adults’ predictive eye movements (N = 40) were affected by imagining a person. Two experimental conditions were involved in Experiment 2: the Imagination and Ball conditions. Participants in the former were instructed to imagine that a person moved the ball from the starting point to the bucket, immediately followed by the Ghost video. The Ball condition, which we assumed was the baseline, was identical to the Imagination condition except that participants were instructed to only watch the movement of the balls. We predicted that the results in the Imagination condition would not be qualitatively different from those in the Ball condition.

Method

Participants

The sample size was estimated using the same method as Experiment 1 with a between subjects design (Cannon and Woodward 2008; Falck-Ytter et al., 2010; Henrichs et al., 2014; Kanakogi and Itakura, 2011). A minimum of 17 participants per group was needed to detect an effect at 80% power (alpha = 0.05). Participants included 40 healthy adults (27.0 ± 7.0 years; range 19–44; 19 men and 21 women) who were randomly assigned to either the Imagination or Ball conditions.

Stimuli and procedure

The apparatus was identical to Experiment 1. We used the Face video in Experiment 1 and a 20 s Ghost video that was identical to the Face video, except that the agent was absent (Fig. 2a). In the Ghost video, the balls moved automatically, as if someone moved them. It took 1.3, 1.8, and 1.4 s for the 3 objects to reach the bucket, respectively.

In the Imagination condition, participants were first asked to decide on a person they would imagine during the experiment, and report the name of the person to the experimenter. We did this because having a named person activates more precise imagining. Then, the participants were shown the Face video in three identical trials, followed by the Ghost video for six identical trials. Before presenting the Ghost video, participants were instructed to imagine that the person moved the ball from the starting point to the bucket. The Face videos were shown before the Ghost videos to facilitate imagination. The Ball condition was identical to the Imagination condition except that participants were instructed to only watch the movement of the ball. The predictive gaze analyses were identical to Experiment 1.

Results and discussion

The mean timing of gaze arrival was 156.4 ms (95% CI = [88.77, 223.97]) in the Imagination condition, and 195.8 ms (95% CI = [100.12, 290.87]) in the Ball condition. We conducted a Student’s t-test to examine whether adults’ predictive eye movement was affected by imagining a person’s actions (Fig. 2b). We found no significant differences in predictive eye movement across the two Ghost video conditions (t (38) = 0.706, p > 0.48, d = 0.27). We also examined whether participants’ gazes were predictive. The results revealed that participants’ gazes were predictive in both the Imagination (t (19) = −4.484, p < 0.001, d = 1.53, two-tailed one sample t-test) and Ball conditions (t (19) = −4.300, p < 0.001, d = 1.37). The results supported our prediction that adults’ predictive eye movements would not be affected by imagining a person.

Experiment 3

Next, we explored whether an invisible person affected young children’s (N = 60) predictive eye movements, considering that it did not for adults (Experiment 2). The procedure was similar to Experiment 2 with two important differences, the introduction of the task and the Fan condition. In a previous study, children were introduced to an experimenter’s invisible friend—about whom the experimenters asked questions—after which the children spontaneously interacted with the invisible agent as though it were real (Moriguchi and Shinohara, 2012). Based on this, we first presented a cover story about the experimenter’s invisible friend “Hikaru” to the children in all conditions. Participants in the Invisible condition were then shown the Face video and instructed that Hikaru would move the ball from the starting point to the bucket in the following video; the Ghost video was then presented. The procedure for the Ball condition was the same as in Experiment 2. Second, in the Fan condition, children were shown a picture of a fan before the Ghost video and were asked to imagine that the fan was present during the video (Fig. 3a). The Fan condition was designed to exclude the possibility that imagination itself (rather than specifically imagining faces) may increase cognitive load, resulting in interference with children’s predictive eye movement (which would be considered unrelated to the realistic experience of the invisible agent).

Fig. 3

Experiment 3: (a) The picture used in the Fan condition. (b) The placement of the fan was equivalent to that of the agent in the Face/No-Face video. (c) The results of Experiment 3. Error bars denote standard errors.

In Experiment 3, we explored whether imagining an invisible person interfered with children’s predictive eye movements. If children have realistic perceptual experiences of social imagery (even though they are affected by verbal cues), children in the Invisible condition would show less predictive gaze compared to those in the Fan and the Ball conditions, possibly because children gazed more at the face regions in the Invisible condition compared to the same regions in the Fan and the Ball conditions.

Method

Participants

Sample size estimation was identical to Experiment 2, with 60 preschool children (50.2 ± 16.4 months; range 20–77; 30 boys and 30 girls) participating in the study. Six additional children were tested but excluded from the final sample because of failure to collect data, such as when children’s movements prevented the eye tracker from detecting their gaze, yielding no usable data. Children were randomly assigned to the Invisible, Ball, or Fan condition. There were no differences in age across conditions.

Stimuli and procedure

The apparatus and videos in the Invisible and Ball conditions corresponded to those in Experiment 2. However, at the beginning of the experiment, a cover story regarding the experimenter’s invisible friend, “Hikaru,” was verbally presented to the children in all three conditions. We confirmed that no children had a friend named “Hikaru”. Children did not know “Hikaru”, and they may not have been able to precisely imagine the person.

For the Invisible condition, children were shown the Face video in three identical trials, followed by the Ghost video in six identical trials. Before presenting the Ghost video, the children were verbally instructed that Hikaru would move the ball from the starting point to the bucket (“Here is my friend, Hikaru. Hikaru is moving a ball from the starting point to the bucket. Can you see it?”). The experimenter named “Hikaru” only once at the beginning of the experiment. The Ball condition corresponded to the same condition in Experiment 2. In the Fan condition, children were shown the picture of a fan before the Ghost video and were instructed to imagine that the fan was present during the Ghost video. We used a picture of a white fan with an equivalent size and position to that of the human in the Face video (Fig. 3a). No additional instructions were given during the observation of the videos. We carefully controlled the use of verbal instructions.

The predictive gaze analyses were identical to those in Experiment 1, except that we enlarged the Face/Fan AOI horizontally in the analyses of the triangular trajectories (Supplemental Fig. S1).

Results and discussion

The mean timing of gaze arrival was 95.7 ms (95% CI = [3.1, 188.3]) in the Invisible condition, 233.8 ms (95% CI = [130.9, 336.7]) in the Ball condition, and 275.0 ms (95% CI = [194.0, 355.9]) in the Fan condition. A one-way ANOVA yielded statistically significant differences in children’s predictive eye movement across conditions (Fig. 3c) during the Ghost video (F(2, 57) = 4.506, p < 0.015, η2 = 0.14). Shaffer’s post-hoc test revealed that children’s gazes in the Invisible condition were less predictive compared to those in the Ball (p < 0.032) and Fan conditions (p < 0.001), while no statistically significant differences were found between the latter two (p > 0.250). Given that the age range was relatively large (20–77 months) in this experiment, we conducted a regression analysis to assess whether children’s age in months, sex, and condition predicted children’s gaze behavior. Results revealed that condition [b = 87.497, SE b = 31.830, b* = 0.344, p = 0.008], but not age in months [b = −0.577, SE b = 1.606, b* = −0.0454, p > 0.250] or sex [b = −69.702, SE b = 51.434, b* = −0.168, p = 0.181], predicted children’s gaze. Thus, neither children’s age in months nor sex was a statistically significant predictor, at least in this study. In addition, we examined whether participants’ gazes were predictive, and found that children shifted their gaze to the Goal AOI before the moving balls arrived in the Invisible (t(19) = −2.164, p < 0.043, d = 0.48, two-tailed one sample t-test), Ball (t(19) = −4.754, p < 0.001, d = 1.06), and Fan conditions (t(19) = −7.110, p < 0.001, d = 1.60).

We conducted three additional analyses. The first examined the possibility that children in the Invisible condition may show less predictive eye movement, not because imagination affected their gaze, but because they may search around the screen, resulting in a diffused gaze. Therefore, we examined whether children’s gaze in the Invisible condition was more broadly distributed than in the other conditions, implying lower total gaze fixation time (sum of Object, Goal, and Agent AOI fixation). The analyses revealed that there were no statistically significant differences in the total gaze fixation time between conditions (F(2, 57) = 0.079, p > 0.250, η2 = 0.00; Supplemental Table S1).

The second analysis examined whether participants fixated on the Face or Fan AOI (identical to the Face AOI) during the Ghost videos. We calculated the number of gaze fixations and the average durations at the Face or Fan AOI in all three conditions. We expected both indices to be smaller in the Ball condition than in the Invisible condition. The planned comparisons showed that the number of gaze fixations (Mann–Whitney U test, p < 0.067, r = 0.42) and the average durations (p < 0.087, r = 0.38) at the Face AOI were marginally higher in the Invisible condition than in the Ball condition; however, no statistically significant differences were found between the Invisible and Fan conditions (p > 0.250, r = 0.07 for number of gaze fixations; p > 0.250, r = 0.11 for duration; Table 1). Moreover, the number of children who fixated on the Face AOI at least once differed across conditions: more children in the Invisible condition (55%) fixated on the Face AOI than in the Ball condition (20%; χ2 (1, 40) = 5.227, p < 0.023). No statistically significant differences were observed between the Invisible and Fan conditions (50%; χ2 (1, 40) = 0.110, p > 0.250). The results indicated that children in the Invisible and Fan conditions fixated on the Face or Fan AOI, but those in the Ball condition did not.
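The Invisible versus Ball comparison can be retraced from the reported percentages. The sketch below assumes 20 children per condition (60 children across three conditions, consistent with the 19 degrees of freedom in the one-sample t-tests), so that 55% and 20% correspond to 11 and 4 children, and it assumes an uncorrected Pearson chi-square test; neither the cell counts nor the choice of test is stated explicitly in the text, and the function name is illustrative.

```python
def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Assumed counts: rows = condition, columns = (fixated Face AOI, did not fixate).
    invisible = (11, 9)  # 55% of an assumed 20 children
    ball = (4, 16)       # 20% of an assumed 20 children
    print(round(pearson_chi2_2x2(*invisible, *ball), 3))  # approximately 5.227
```

Under these assumptions the statistic comes out at about 5.227 with 1 degree of freedom and N = 40, in line with the value reported for this comparison.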

Children in the Ball condition did not gaze at the Face AOI, so we compared triangular trajectories in the Invisible and Fan conditions. As there were very few triangular trajectories, we enlarged the Face/Fan AOI in Experiment 3 horizontally (Supplemental Fig. S1). The results showed that the median number of triangular trajectories did not differ significantly across conditions, but the average rank was relatively high in the Invisible condition (22.6) compared to the Fan condition (18.4), with a medium effect size (Mann–Whitney U test, p > 0.158, r = 0.32).

The results partially supported the prediction. In Experiment 1, children’s and adults’ gazes were less predictive when they perceived a person with a face than without. Similarly, in Experiment 3, children’s gazes were less predictive when they were introduced to an invisible person than when they were asked to only watch the video or to imagine an object. This interference effect on predictive eye movements was qualitatively similar to that of the Face video in Experiment 1.

One might argue that children in the Fan condition may fail to comply with the instruction, and may not imagine a fan during the experiment. However, our AOI analyses showed that children in the Invisible and Fan conditions equally fixated on the Face or Fan AOI. The results indicated that children in both the Invisible and Fan conditions complied with our instructions and imagined a face or a fan during the experiment. Nevertheless, the predictive gaze in the Invisible condition was significantly slower than the gaze in the Fan condition.

The fixations on the Face/Fan AOI and the number of triangular trajectories did not differ significantly between the Invisible and Fan conditions, unlike the results in Experiment 1. This may be because the position of the imagined face/fan differed across participants in Experiment 3. In Experiment 1, the face of the agent was presented in the Face video during trials, and the position of the face was the same across participants. By contrast, in Experiment 3, participants imagined the face of the agent or the fan; therefore, the position of the face/fan may have differed across participants. Indeed, compared to Experiment 1, children’s fixations on the Face AOI were significantly reduced in Experiment 3. Thus, children may have fixated on the imagined face in Experiment 3, but our analyses may have failed to capture the trajectory.

However, given the inconclusive evidence from the additional analyses (i.e., fixations on the Face/Fan AOI and the number of triangular trajectories), the delay in the children’s predictive gaze in the Invisible condition compared to the Ball and Fan conditions may not necessarily have been due to the perception of an imaginary face. We need to consider other possible interpretations of our results. One potential factor may be the detection of human-like agency when imagining an agent. That is, children may detect human-like agency from the imaginary agent’s body movements rather than from a human-like face in the Invisible condition. In sum, both perception of an imaginary face and detection of agency from an imaginary body may affect a child’s predictive gaze. This possibility should be assessed in future studies.

General discussion

Experiment 1 showed that the presence of a face interfered with children’s and adults’ predictive gazes when perceiving a human’s goal-directed actions. Experiments 2 and 3 found that children’s gazes, but not adults’ gazes, were less predictive when they imagined or were introduced to an invisible person than when they only watched the video. Previous explanations regarding the fantasy/reality distinction do not explain our results (Harris, 2000; Weisberg and Taylor, 2013). One might argue that our findings could be attributed to emotion; however, the stimuli did not include emotional valence. The cognitive availability hypothesis proposes that children fail to make a fantasy/reality distinction because the pretend event’s salience is heightened through imagination (Harris, 2000). However, this does not apply to this study, as only children’s gazes in the Invisible condition were affected, even though they were required to imagine events in both the Invisible and Fan conditions.

In this study, we tested whether children would have a realistic perceptual experience when imagining an invisible person. If children produced qualitatively identical involuntary behaviors when imagining an invisible person as they did when presented with a real item, children’s experiences would be perceptual. If not, this would suggest that children did not have a realistic perceptual experience. Our results generally supported the former hypothesis. However, we should note that although children’s experiences may be perceptual, this finding was specific to situations where children imagined an invisible person induced by verbal cues. It remains unclear whether children’s experiences would be entirely perceptual and not intrinsically affected by verbal cues. In Experiment 3, children were given a verbal cue “Hikaru” (agent name) at the beginning of the experiment. Although no verbal instructions were given while children observed the stimuli, this study does not entirely refute the possibility that the children’s experiences were strongly affected by verbal cues. To test whether the children’s experiences were affected by verbal cues, we would need to devise an experimental procedure without verbal cues. For example, we could show children a video in which an invisible person seems to exist and perform some activities (e.g., walking and leaving footprints), which may make the children believe that the invisible agent exists. Then, the children would be given the stimuli used in this study. This type of experiment does not use verbal cues about the agent, which would allow us to assess whether the children’s experiences depend on verbal cues.

Moreover, as we argued in Experiment 3, we need to consider potential factors to explain the condition differences in predictive gaze in Experiment 3. Specifically, we need to test whether both perception of an imaginary face and detection of agency from an imaginary body affect a child’s predictive gaze. One possibility is that perception of an imaginary face is specifically important for the interference effect. Previous studies have shown that the face is a salient stimulus and induces involuntary fixation. Indeed, even newborns are biased towards face-like stimuli (Johnson et al., 1991). Individuals are more likely to attend to objects with faces than to objects without faces, and the processing of a face interferes with other perceptual processing (Langton et al., 2008). Alternatively, as discussed earlier, an agent’s body/action may induce the interference effect. Moreover, a face coupled with an agent’s body/action may be critical. Myowa-Yamakoshi et al. (2012) reported that 8-month-old and 12-month-old infants referred to an actor’s face while observing the actor’s goal-directed actions in an experimental situation similar to the predictive eye-movement paradigm. In future work, we would like to modify the Invisible condition in Experiment 3 to test these possibilities. In this study, in the Invisible condition, children may have imagined an imaginary face as well as an imaginary body, each of which may have affected children’s predictive gaze. In a future study, we would like to separate the effect of an imaginary body from the effect of an imaginary face. To do this, children’s predictive gaze will be examined when children imagine an agent with a face (Invisible Face) and without a face (Invisible No Face).

There are some limitations in this study. First, the age range of the children was relatively wide. We selected this age range because previous studies have shown that children often report an imaginary agent between 2 and 6 years of age (Taylor, 1999). The age range was matched between conditions in Experiment 3, so our method should be valid. Moreover, age in months did not affect the results in Experiment 3. Nevertheless, collecting more samples for each age may help address concerns about developmental changes in children’s responses. Second, due to differences in the methodology, we did not directly compare the results of children to those of adults. In Experiment 3, children were introduced to “Hikaru”, whereas adult participants in Experiment 2 were instructed to imagine a person they knew. We need to develop paradigms to directly compare children’s responses to adults’ responses. Third, there are likely to be substantial individual differences in children’s experiences of perceptual reality. Although age in months and sex did not affect children’s predictive gaze, other factors may affect the children’s experiences. For example, children’s IC status and fantasy orientations may affect children’s perceptual experiences. Future research should develop experimental procedures that can address these limitations.