Visual perception is critical in helping people understand and infer states of the world in which they live. The Gestalt law of common region describes how elements belonging in the same region tend to be perceptually grouped together pre-attentively. An important feat in vision is the ability to organize an unstructured world into more coherent groups of objects. Recent evidence suggests that perceptual grouping occurs not only from visual similarity, but also from associative similarity. More specifically, visually dissimilar shapes are perceptually grouped if they have a learned association.
This study investigated how effectively perceptual grouping can form through visual statistical learning. Participants performed a repetition discrimination task (RDT), a type of visual task, on the computer and responded with keyboard presses.
Standard perceptual grouping was found in the training phase: participants had significantly faster discrimination times when the target shapes in the visual search row were grouped by common region (within-group condition) than when they came from equidistant but different regions (between-group condition). The perceptual grouping effect did not replicate in a transfer phase, where common region grouping was removed but shape associations remained: there were no significant differences in discrimination time between within-group and between-group conditions. This study showed that grouping of objects through visual cues did not lead to strong learning of associative similarity.
A seemingly simple task such as searching for lost keys requires the coordination of multiple attentional processes that work together simultaneously without conscious awareness. Research into visual cognition has begun to elucidate how prioritizing one’s attention to a specific location, or to groups of features that share a common region, affects our perception of the objects in the visual environment. Broadly, visual processes can be categorized as either bottom-up or top-down. Bottom-up processes refer to the automatic capture of attention by salient information in the visual environment. Top-down processes refer to how attention is preferentially guided to certain visual information based on prior experiences, goals, and knowledge.
Gestalt psychologists tried to explain how we process visual information by proposing that only bottom-up processing matters in perception. They elaborated that our perception of objects and scenes is governed by the inherent visual properties of the scenes themselves, such as the enhanced attention to objects in a common region. According to them, perception of objects was mostly independent of our previous learning experience (top-down processing) with our visual environment. However, accumulating evidence suggests that prior non-visual experience with visual structures affects our perception of objects, which is not explained by Gestalt laws. Indeed, non-visual factors such as hand position can influence perceptual grouping.
Studies by Beck and Palmer (2002) showed that perception of groups is influenced by top-down cognitive states. They demonstrated this by varying the probability that the target pair appeared in a common region within an oval (within-group). When the likelihood of the target appearing in a grouped region was indicated before each condition, participants became progressively faster at target discrimination in within-group conditions as the probability of the target appearing within a common region increased. The degree to which perceptual grouping and object-based attention occurred was therefore influenced by top-down processes rather than solely by bottom-up features of Gestalt visual elements.
Similarly, studies by Vickery and Jiang (2009) provided more evidence that perception can be influenced by prior experience and statistical visual learning. A repetition discrimination task (RDT) was used to demonstrate that participants can unintentionally group distinct shape pairs together due to their history of having appeared in a common region, even when this associative grouping is not helpful for the main task of locating a color repetition. In a new context (transfer phase) without the common-region grouping of the same shape pairs, participants still showed a within-group advantage, presumably based solely on the co-occurrence of shape pairs. The RDT was previously used by Palmer and Beck (2007) and Vickery (2008), who showed a within-group advantage due to common region and maintenance of the pair associations in the absence of Gestalt features [5,6]. Interestingly, young infants did not show transfer of the grouping effect due to common region. The authors suggested this was because grouping through common region is a weak cue for association. However, the presence of associative learning through common region in older adults in other studies indicates that perceptual grouping is not solely due to the intrinsic properties of objects, but is also learned through experience.
More recent studies suggest that it is possible to learn associations between novel shapes through passive visual statistical learning, without common region cues to support the associations in the training phase. Zhao et al. (2014) demonstrated that the co-occurrence of novel shapes, learned only through passive viewing of scenes containing those shapes, was enough to induce perceptual grouping. This indicates that Gestalt features are not required even for initial learning of associations between objects. Together, these studies show that cognitive factors, like associative memory and top-down sets, can determine what constitutes a perceptual group.
While a number of studies have hinted at the possibility of association-based perceptual grouping, the critical factors controlling this ability remain poorly understood. This study investigated how prior experience of Gestalt grouping and statistical co-occurrence alters perceptual grouping. The strength of the learned perceptual grouping, if any, was tested in a new scene without common region, similar to the study by Vickery and Jiang (2009). Participants were exposed to pairs of shapes, some of which were always grouped by common region marked by rectangular outlines. They performed an RDT to detect the repetition of a feature that was different from the grouping features. Unlike the task of locating a single color repetition overlaid on a scene with shape pairs in Vickery and Jiang (2009), this study used letter repetition as the search target while exposing participants to shape pairs using common region. Successful replication of perceptual grouping effects in a scene without grouping cues would further reveal that consistent and strong associations can produce grouping in the absence of Gestalt cues. This would strengthen the conclusion that distinct shapes are perceptually treated as one whole object due to previous experience.
Participants completed this experiment on a computer in a darkened room. The monitor was located 20 in away from the chin rest and displayed the experiment trials on a light grey background. The experiment was programmed using MATLAB. Participants responded using a keyboard for all trials.
Twenty students from the University of Toronto volunteered to complete this experiment for course credit. All participants reported normal or corrected-to-normal vision and were naïve to the purpose of the study. Participants were briefed prior to the experiment on the key corresponding to each response.
This experiment had two phases, each with 360 trials. In the first phase, or the training phase, eight black shapes drawn from a pool of six different novel shapes were displayed in one row on the screen. Pairs of shapes were enclosed in rectangles that defined the common regions. All eight shapes were evenly spaced on the screen. Each trial displayed either three or four pairs, and the number of pairs was counterbalanced across trials. In trials with three pairs, the shape at each end of the row was unpaired and was not contained in a rectangle (Figure 1). In trials with four pairs, all eight shapes were contained in four rectangles, with two shapes in each rectangle. For instance, if each letter represents a distinct shape and the square brackets represent the rectangles, the pairs would be arranged similarly to the following sequence: [A B][C D][E F]. Pairs were selected randomly with the constraint that the same pair could not appear twice in a row (e.g., [A B][A B] was not possible). In the second phase, or the transfer phase, the display and the task were identical to the training phase but without the rectangles bordering the pairs of shapes. The shape pairs from the training phase still co-occurred in the transfer phase: A B C D E F (Figure 2).
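The pair-selection constraint described above can be illustrated with a short Python sketch. It is not the original MATLAB program. The shape labels A to F, the three fixed pairs, and the function names are assumptions taken from the bracket notation in the text. Only the four-pair layout is shown, because the text does not specify how the unpaired end shapes in three-pair trials were chosen.

```python
import random

# Assumed labels: the six novel shapes are written A-F, and the three fixed
# pairs follow the bracket example in the text. These names are hypothetical.
PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def sample_pair_sequence(n_pairs, rng):
    """Draw n_pairs pairs at random, never placing the same pair twice in a row."""
    sequence = []
    for _ in range(n_pairs):
        # Exclude the pair that was just placed, if there is one.
        candidates = [p for p in PAIRS if not sequence or p != sequence[-1]]
        sequence.append(rng.choice(candidates))
    return sequence


def render_row(pairs, boxed=True):
    """Format a row as text; brackets stand in for the rectangles."""
    if boxed:
        return "".join(f"[{a} {b}]" for a, b in pairs)
    return " ".join(f"{a} {b}" for a, b in pairs)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
row = sample_pair_sequence(4, rng)
print(render_row(row, boxed=True))   # training-style display with rectangles
print(render_row(row, boxed=False))  # transfer-style display, same pairs
```

Rendering the same pair sequence with and without brackets mirrors the difference between the two phases: the co-occurrence of shapes is preserved while the common-region cue is removed.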
In both phases, each shape contained the letter “X” or the letter “O” at its center. In a given trial, there was exactly one instance where two adjacent shapes had the same letter. This single letter repetition could occur on a pair inside a rectangle (within-group condition) or between rectangle groups (between-group condition). The repeated letter could be either “X” or “O”, and the repetition could appear at any position in the row. The identity of the target letter and the type of grouping (within- or between-group condition) were counterbalanced.
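Because only two letters are used and exactly one adjacent repetition is allowed, the letters on either side of the repeated pair must alternate. The Python sketch below illustrates that logic. It is an illustration rather than the original MATLAB code, and the function names and 0-based indexing are assumptions. The condition label is given only for the four-pair layout, where rectangles cover positions (0,1), (2,3), (4,5) and (6,7).

```python
def assign_letters(n_shapes, repeat_start, target):
    """Return one letter per shape with exactly one adjacent repetition.

    repeat_start is the 0-based index of the left shape of the repeated pair;
    target is the repeated letter, "X" or "O".
    """
    if target not in ("X", "O"):
        raise ValueError("target must be 'X' or 'O'")
    if not 0 <= repeat_start < n_shapes - 1:
        raise ValueError("repeat_start must leave room for a neighbour")
    other = "O" if target == "X" else "X"
    letters = []
    for i in range(n_shapes):
        # Distance from the repeated pair; letters alternate moving outward.
        if i <= repeat_start:
            distance = repeat_start - i
        else:
            distance = i - (repeat_start + 1)
        letters.append(target if distance % 2 == 0 else other)
    return letters


def condition_four_pairs(repeat_start):
    """Label a repetition in the four-pair layout (pairs start at even indices)."""
    return "within" if repeat_start % 2 == 0 else "between"


print(assign_letters(8, 3, "X"), condition_four_pairs(3))
```

In this layout, a repetition starting at an even index falls inside a rectangle, and one starting at an odd index straddles two rectangles.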
The participant’s task was to select the “z” key if the adjacent repeating letter in a given trial was “X”, or the “/” key if the repeating letter was “O”. The display screen stayed on until the participants made their selection for each trial. The monitor displayed the message “Correct” if participants made the right selection, and “Incorrect” if they did not. A new trial began after the message. The MATLAB program recorded the position of the repetition, participants’ accuracy and their reaction times.
The second experiment used the same set-up and apparatus as the first. It replicated the main procedures of the first experiment, but displayed eleven shapes on the screen instead of eight, drawn from the same pool of six different shapes. There were always five rectangle groups in each trial, and one unpaired shape at the left or right end of the row was outside a rectangle. In a given trial, only four different shapes (two possible pairs) were used. The pairs alternated in position such that every other rectangle contained the same shape pair. For instance, if each of the following letters is a distinct shape and the square brackets are rectangles, the display was arranged in this order: D [A B][C D][A B][C D][A B] (Figure 3). As in the first experiment, the transfer phase was identical to the training phase except that the rectangles were removed. The shape pairs from the training phase still co-occurred in the transfer phase: D A B C D A B C D A B. The task and the key selection were identical to the first experiment.
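The alternating layout can be sketched in Python as follows. This is an illustration, not the original MATLAB code. The function name is hypothetical, and the identity of the extra shape when it appears on the right end (here, the first member of the second pair) is an assumption, since the text only gives the left-end example.

```python
def experiment2_row(pair_a, pair_b, extra_on_left=True, boxed=True):
    """Build the eleven-shape row: five alternating pairs plus one end shape."""
    pairs = [pair_a, pair_b, pair_a, pair_b, pair_a]
    if boxed:
        # Brackets stand in for the rectangles shown during training.
        body = "".join(f"[{x} {y}]" for x, y in pairs)
    else:
        body = " ".join(f"{x} {y}" for x, y in pairs)
    if extra_on_left:
        # Matches the example in the text: the lone shape continues the
        # alternation, so it is the second member of the other pair.
        return f"{pair_b[1]} {body}"
    # Assumption: on the right end, the lone shape is the first member
    # of the other pair.
    return f"{body} {pair_b[0]}"


print(experiment2_row(("A", "B"), ("C", "D"), boxed=True))
print(experiment2_row(("A", "B"), ("C", "D"), boxed=False))
```

With the extra shape on the left, the two calls reproduce the training and transfer sequences given above.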
In the training phase of the first experiment, when discriminating and responding to the correct repeating letter, the participants had significantly faster mean reaction times (t(19)=2.962, p<0.008) when the repetition occurred in shapes within the rectangle group pairs compared to when the repetition was between groups (Figure 4A). However, in the transfer phase, the within-group advantage disappeared (t(19)=0.898, p=0.38) and reaction times to repetitions within groups were not significantly different from the reaction times to repetitions between groups. Therefore, even though the letter repetition occurred on shapes that were previously grouped in a common rectangular region during training, the participants did not show perceptual grouping of those shapes.
In the training phase of the second experiment, when discriminating and responding to the correct repeating letter, the participants had significantly faster mean reaction times (t(19)=3.273, p<0.004) when the repetition occurred in shapes within the rectangle group pairs compared to when the repetition was between groups (Figure 4B). Similar to the first experiment, in the transfer phase the within-group advantage disappeared (t(19)=1.786, p=0.09) and reaction times to repetitions within groups were not significantly different from the reaction times to repetitions between groups.
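The reported statistics, with 19 degrees of freedom for 20 participants, are consistent with a paired-samples t-test on each participant's mean reaction time per condition, although the text does not name the test. Under that assumption, the statistic can be computed as in the minimal Python sketch below. The function name and the sign convention (between-group minus within-group) are illustrative choices, and no study data are included.

```python
import math
import statistics


def paired_t(within_rt, between_rt):
    """Return (t, df) for a paired-samples t-test on per-participant means."""
    if len(within_rt) != len(between_rt):
        raise ValueError("each participant needs one mean per condition")
    # Positive differences mean slower responses in the between-group condition.
    diffs = [b - w for w, b in zip(within_rt, between_rt)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1
```

With one pair of means from each of 20 participants, the function returns 19 degrees of freedom, matching the values reported above. A positive t under this sign convention indicates a within-group advantage.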
In order to understand how we perceive the world, it is critical to understand what influences automatic identification of patterns in a scene and how these perceptual associations can help interpret future scenes. Several studies challenge the assumptions made by Gestalt laws stating that our perception depends on the pre-existing intrinsic characteristics of our visual scenes. However, the results from this study suggest that bottom-up features of objects during grouping induce a stronger effect in the training phase than top-down features in the transfer phase.
The first experiment did not show learned perceptual grouping after participants were passively exposed to shape pairs through common region. As expected in the training phase, the letter repetition was detected faster when it was on the shapes within a rectangle. However, the purpose of the experiment was to investigate whether this effect can be transferred and replicated without visual grouping cues. Since the participants did not discriminate the within-group shapes differently from the between-group shapes in the absence of visual cues, the distinct objects were apparently not strongly associated with each other. This might be because common region is a relatively weak visual cue for association compared to other Gestalt grouping features, such as grouping through similarity.
However, this experiment was also different from Vickery and Jiang (2009) in several other ways. For example, they had 16 shapes on display, and used only two shape pairs on any given trial, potentially making the associations more salient. The second experiment was adjusted to better match Vickery and Jiang (2009) along these lines.
The second experiment investigated the effect of increasing the set size and the regularity of grouping patterns on the ability to make associations between shapes. Similar to Vickery and Jiang (2009), the grouped pairs alternated position and there were only two different pairs in a trial at a time. It was predicted that due to these changes, the co-occurrence of shape pairs would be more apparent than it was in the first experiment. However, these changes to the second experiment did not appear to produce learned grouping by association. The difference in reaction time between within- and between-group conditions in the training phase was also unexpectedly trending towards a between-group advantage, as shown in Figure 3. Therefore, having a predictable display organization along with a larger set size did not produce learned grouping.
The lack of grouping by common region in the transfer phase could be due to the type of task-irrelevant stimuli presented during the RDT. Task-irrelevant stimuli are features of objects that are distractors or are unnecessary to complete the task during target identification. Vickery and Jiang (2009) used color repetition of the shapes as a target feature, while this study used “X” and “O” letter repetition placed inside of the shapes. It has been suggested that processing multiple components of visual objects can be limited based on how relevant they are. If the letter and the shape were treated as separate objects, the participants might not have attended to the shape during the discrimination of the letters, hindering the learning of shape features critical to grouping association.
Moreover, the sample size was also larger than that of Vickery and Jiang (2009), such that any strong effect of grouping would be more easily seen. Their monitor display was wider than the monitor used for this study, which helped them fit in fifteen shapes during training and eleven shapes during the transfer phase. Using four more shapes during the training trials could have affected the participants’ ability to detect the repetition of the same pairs and thus consolidate information about the pairs more easily. Changing the set size to remove four shapes during transfer, thereby adjusting the display to be less cognitively demanding, could have affected their capacity to detect grouping more easily.
However, it can also be argued that since this study used the same number of shapes for both training and transfer trials, the effect observed in training would hypothetically be more likely to carry over to the transfer phase, since the participants would not need to adjust to a new display. This suggests that very specific conditions need to be replicated from Vickery (2008) for learning of Gestalt features to be maintained in a repetition discrimination task. It is possible that grouping by common region is not a strong visual cue for associative learning unless at least fifteen shapes are displayed on a wider monitor. If it were a strong effect, the grouping should have been replicated using the same task in this study.
Some features of this study would theoretically make learned grouping more apparent. Vickery (2008) used color repetition targets, and color is by itself a strong and salient feature of an object that could override learning of form pairings. This study used letter repetition, a type of shape feature that should not impede the learning of shape pairs. However, if features of objects are not separated depending on what is attended to, but rather analyzed more holistically, this also poses a challenge to learning of groups in this study. If a certain triangle shape with “O” is treated as a whole object, then a triangle with “X” might be perceptually seen as an entirely different object. This could make it harder to learn the different shape pairings, as the pattern of letters is not consistent with the shape pairs. Thus, learning of shape patterns alone might be inhibited by unconscious processing of the letter pattern in conjunction with the shape pattern.
Future studies will be necessary to investigate ideal conditions for learned perceptual grouping. It is important to investigate the conditions in the environment that help reveal the automatic perceptual processes we often take for granted. Understanding how we perceive the world can help us navigate our environment more effectively, so that the next time someone is searching for those lost keys, they can use a faster and more efficient way to locate them.
This study was conducted at the Department of Psychology at the University of Toronto as part of an independent project course, PSY405Y1. The work was done under the supervision of Dr. Jay Pratt and the mentorship of Dr. Jason Rajsic. Thank you both for your support and contributions to this project.