Running head: MUSICALITY AND VOCAL EMOTION RECOGNITION 1 Enhanced Recognition of Vocal Emotions in Individuals with Naturally Good Musical Abilities

Music training is widely assumed to enhance several nonmusical abilities, including speech perception, executive abilities, reading, and emotion recognition. This assumption is based primarily on cross-sectional comparisons between musicians and nonmusicians. It remains unclear, however, whether training itself is necessary to explain the musician advantages, or whether factors such as innate predispositions and informal musical experience could produce similar effects. Here, we sought to clarify this issue by examining the association between music and vocal emotion recognition. The sample (N = 169) comprised musically trained and untrained listeners who varied widely in their music perception abilities, as assessed through self-report and performance-based measures. The emotion recognition tasks required listeners to categorize emotions in nonverbal vocalizations (e.g., laughter, crying) and in speech prosody. Music training was associated positively with emotion recognition across tasks, but the effect was small. We also found a positive association between music perception abilities and emotion recognition in the entire sample, even with music training held constant. In fact, untrained participants with good musical abilities were as good as highly trained musicians at recognizing vocal emotions. Moreover, the association of music training with emotion recognition was fully mediated by auditory and music perception skills. Thus, in the absence of formal music training, individuals who were ‘naturally’ musical showed musician-like performance at recognizing vocal emotions. These findings highlight an important role for predispositions and informal musical experience in associations between music and nonmusical domains.

other domains at behavioral, cognitive, and brain levels (e.g., Patel, 2014; Peretz & Coltheart, 2003). The primary focus on music training reflects a narrow view of musicality, however, because musical skills are diverse and determined by multiple factors other than formal lessons. For example, sophisticated musical abilities can be seen in individuals without any training, and such abilities must be a consequence of informal engagement with music or musical predispositions (Bigand & Poulin-Charronnat, 2006; Mankel & Bidelman, 2018; Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). Indeed, recent perspectives on musicality consider a broad range of musical behaviors and skills beyond playing an instrument or taking classes (e.g., informal listening experience; functional uses of music in everyday life; singing along with tunes; Honing, 2019; Krishnan et al., 2018; Müllensiefen, Gingras, Musil, & Stewart, 2014).
Factors other than formal instruction could therefore account for the musician advantages reported in the literature. Enhanced capacities of trained individuals might be induced by training, but they could also reflect genetic variables, early informal engagement with music, or facets of musical experience unrelated to formal training per se (as well as more general cognitive, socio-economic, or personality variables). To distinguish between training itself and the potential effects of these factors, it is important to study the musical abilities of nonmusicians, and to identify individuals with good abilities despite not being trained. Recent evidence indicates that good music perception skills are associated with good performance in nonmusical domains, regardless of training. For example, such individuals exhibit enhanced phoneme perception in a foreign language (Swaminathan & Schellenberg, 2017) and more efficient neural encoding of speech (Mankel & Bidelman, 2018), mirroring the benefits observed in trained musicians.
In short, formal training might not be necessary, or at least not the only factor accounting for the musician advantages in nonmusical domains.
In the present study, we focused on the association between music and one aspect of socio-emotional processing, namely the recognition of emotions in vocal expressions. Some evidence indicates that trained musicians outperform untrained individuals in their ability to recognize emotions in speech prosody, that is, emotional states expressed through a speaker's use of pitch, loudness, timing, and timbre cues in speech (Lima & Castro, 2011; Thompson et al., 2004). Other evidence documents that music training predicts efficient low-level neural encoding (auditory brainstem responses) of purely nonverbal vocalizations such as crying (Strait, Kraus, Skoe, & Ashley, 2009). Neurocognitive pathways for processing music and vocal emotions may overlap, such that formal training in music could improve vocal emotional communication, in typical and in clinical samples (Good et al., 2017). One possible mechanism is that music training fine-tunes auditory-perceptual abilities that are useful for sensory aspects of voice perception (e.g., pitch and timing processing). Another possibility is that because social-emotional interactions are a central component of many musical activities, higher-order aspects of vocal emotional processing are improved by training because the code for music and vocal emotions is at least partly shared (Juslin & Laukka, 2003; see also Clark, Downey, & Warren, 2014; Koelsch, 2015; Pinheiro et al., 2015). Nevertheless, a musician advantage in emotion recognition is not always evident (Park et al., 2015; Trimmer & Cuddy, 2008), and this question is typically asked in cross-sectional studies that do not take into account individual differences in musical abilities, particularly in nonmusicians. It therefore remains unclear whether training itself is necessary to drive the putative advantage, or whether musical predispositions and informal engagement with music could produce similar effects.
Our sample of listeners included highly trained musicians and a large number of individuals with minimal or no music training, who were assessed in detail about their music perception abilities, behaviors, and experiences. Our goals were to determine if the advantage for musicians in vocal emotion recognition could be replicated, and to examine the potential role of 'natural' individual differences in musical abilities. Specifically, we asked whether having good listening skills, as identified in musical and non-musical tasks, could also predict the ability to recognize vocal emotions, regardless of music training. In other words, could musically adept individuals with no training approach musician-like performance?
Musical skills, behaviors, and experience were assessed using the Goldsmiths Musical Sophistication Index, Gold-MSI (Müllensiefen et al., 2014; Portuguese version, Lima, Correia, Müllensiefen, & Castro, 2018). The Gold-MSI is a self-report tool designed to evaluate music training, music perception abilities, active engagement with music, singing abilities, and emotional responses to music in the general population. Performance-based auditory and music perception tasks were also included, which indexed pitch discrimination, duration discrimination, beat perception, and melodic memory. Our outcome measures focussed on two sources of nonverbal emotional information in the human voice (e.g., Brück, Kreifelts, & Wildgruber, 2011; Scott, Sauter, & McGettigan, 2009). One was the ability to decode emotions conveyed through prosody in actual speech; the other was the ability to decode emotions conveyed by nonverbal vocalizations (e.g., laughter, crying).
We predicted that music training would be associated with enhanced vocal emotion recognition, both for prosody and nonverbal vocalizations, which would represent a replication and extension of previous findings (Lima & Castro, 2011; Thompson et al., 2004).
We also expected that auditory and music perception skills would be positively correlated with the ability to recognize vocal emotions, even after accounting for music training. This hypothesis was based on evidence of musician-like enhancements in phoneme perception and speech processing in untrained individuals with good music perception skills (Mankel & Bidelman, 2018; Swaminathan & Schellenberg, 2017). Because domain-general cognitive abilities predict both music training and music perception skills (Swaminathan & Schellenberg, 2018), a digit-span task was included to examine whether observed associations were simply a by-product of general factors.
More exploratory questions asked whether the link between music and emotion recognition is specific to audition or extends to vision. Previous work identified that individuals with congenital amusia (i.e., a music disorder present throughout development) have deficits in identifying emotions expressed vocally and through facial expressions.
However, the role of individual differences in musical abilities among typically developing individuals remains unknown. We also asked whether other aspects of musical expertise and experience (i.e., active engagement with music, singing abilities, emotions) are associated with vocal emotional processing. Finally, we examined whether any association between music training and vocal emotions would be mediated by perceptual skills (music training → perceptual skills → vocal emotion recognition). Complete mediation would imply that the association depends primarily on relatively low-level listening skills, which music training may enhance. By contrast, partial or no mediation would imply that the association between music and vocal emotions is also driven by non-perceptual processes, possibly at higher-order cognitive or social levels (e.g., emotional and social components of music activities).

Method
Ethical approval for the study protocol was obtained from the Departmental Ethics Committee, Faculty of Psychology and Education Sciences, University of Porto (reference 3-1/2017). Written informed consent was collected from all participants, who were either paid or given partial course credit.

Participants
A total of 172 participants were recruited from research participant pools or in response to advertisements on campus or on social media. Three were excluded for not completing the Gold-MSI, which resulted in a final sample of 169 participants (116 female).
They were 23.49 years of age on average (SD = 8.27, range = 18-72). Self-reports indicated that all had normal hearing and no history of neurological or psychiatric disorders, and all were native speakers of European Portuguese. They varied widely in formal music training, as indicated by responses on the Gold-MSI item asking for years of formal instrumental training, which are illustrated in a histogram (Figure 1). The mode was no training (n = 69), but 100 had some training ranging from 0.5 to 10 or more years. Duration of music training was not associated with age (r = -.01, p = .87, BF10 = 0.10) or sex (r = -.12, p = .11, BF10 = 0.35), and had only a weak association with education (r = .18, p = .02, BF10 = 1.36). We considered duration of training as an ordinal variable in most analyses, but we also undertook group comparisons between highly trained participants and those with no training, which is the norm in this line of research (for a review, see Schellenberg, 2019). Participants with 6 or more years of instrumental training were considered as highly trained (n = 30), consistent with the typically used criterion for the definition of a 'musician' in the literature (Zhang, Susino, McPherson, & Schubert, 2018).
Power analysis (with G*Power 3.1; Faul, Erdfelder, Buchner, & Lang, 2009) indicated that for our main analyses, a sample of at least 134 participants was required to be 95% certain of detecting partial associations of r = .30 or larger between each predictor variable and emotion recognition accuracy. This was estimated for a regression model that included three predictors (music training, music perception abilities, and digit span).
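As a rough cross-check of the reported sample size, the partial correlation of r = .30 can be converted to Cohen's f² and plugged into a normal-approximation formula for a single regression coefficient. This is only a sketch: the function names are hypothetical, a two-tailed alpha of .05 is assumed (the text does not state it), and the calculation is not a reimplementation of G*Power, which works with the exact noncentral distribution.

```python
from math import ceil
from statistics import NormalDist


def cohen_f2_from_partial_r(r):
    """Convert a partial correlation to Cohen's f^2 = r^2 / (1 - r^2)."""
    return r ** 2 / (1 - r ** 2)


def approx_n_for_coefficient(r, alpha=0.05, power=0.95):
    """Normal-approximation sample size for detecting one regression coefficient.

    Assumes a two-tailed test (alpha is an assumption, not taken from the text)
    and a noncentrality parameter of f^2 * N.
    """
    z = NormalDist()
    ncp_needed = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return ceil(ncp_needed / cohen_f2_from_partial_r(r))


f2 = cohen_f2_from_partial_r(0.30)      # about 0.099
n_approx = approx_n_for_coefficient(0.30)
print(round(f2, 4), n_approx)
```

Under these assumptions the approximation lands a few participants below the reported minimum of 134. An exact computation that accounts for the residual degrees of freedom with three predictors would be expected to require slightly more participants than the normal approximation, which is in line with the reported figure.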

Materials
Self-reported musical abilities. The Gold-MSI includes 38 items that cover a wide variety of music skills, expertise, and behaviors (Müllensiefen et al., 2014). It is suited for measuring individual differences among performing musicians as well as among members of the general population who vary in musical skills and interest in music. Scale items are grouped into five subscales, each of them corresponding to a different facet of musicality: active engagement (9 items; e.g., I spend a lot of my free time doing music-related activities), perceptual abilities (9 items, e.g., I can tell when people sing or play out of tune), music training (7 items, e.g., I have had formal training in music theory for __ years), singing abilities (7 items, e.g., I am able to hit the right notes when I sing along with a recording), and emotion (6 items, e.g., I am able to talk about the emotions that a piece of music evokes for me). For the first 31 items, participants indicate their level of agreement with each statement using a seven-point Likert scale (from 1 = completely disagree to 7 = completely agree). For the remaining items, participants use ordinal scales with seven response alternatives (e.g., I can play [number from 0 to '6 or more'] instruments). Thus, for each participant, each original item is scored with an integer that ranges from 1 to 7.
The Gold-MSI and the Portuguese translation have good psychometric properties (Müllensiefen et al., 2014; Lima et al., 2018). Construct validity has been documented with associations between index scores and performance-based music perception tasks (i.e., beat alignment and melody memory, Müllensiefen et al., 2014; discrimination of pitch and duration, Dawson, Aalto, Šimko, Vainio, & Tervaniemi, 2017).
Performance-based auditory and musical abilities. Four tasks were used to measure musical beat perception, melodic memory, pitch discrimination, and duration discrimination. The musical beat and melodic memory tasks were optimised versions of the ones used by Müllensiefen et al. (2014). For the beat alignment test, stimuli were 17 short music excerpts (10-16 s), which were overlaid with a beep track similar to a metronome. The beep track coincided with the implied beat of the music excerpt on four trials. On the other 13, the beep was phase shifted by 10% or 17.5%, or changed in tempo by 2%. On each trial, participants indicated whether the beep track was on or off the beat, as in the Beat Alignment Test (Iversen & Patel, 2008). The order of trials was randomized across participants. For the melodic memory task (Harrison, Musil, & Müllensiefen, 2016), participants listened to 13 pairs of short tunes (10-17 notes) and determined whether each pair was the same or different. The second tune was always transposed by 1 or 7 semitones. Thus, the task required listeners to determine whether both melodies had the same structure of consecutive musical intervals. Five pairs had a different structure, in which 1-3 notes were changed (as in Bartlett & Dowling, 1980; Cuddy & Lyons, 1981) to alter the contour and intervals, or maintain the contour but change the intervals. Both the musical beat and melodic memory tasks were implemented in PsychoPy Experiment Builder v1.85.4 (http://www.psychopy.org/) by Estela Puig-Waldmüller and Bruno Gingras (University of Vienna), with Portuguese instructions. Each task took approximately 7 minutes to complete.
For the pitch and duration tasks, discrimination thresholds were obtained from a two-down-one-up adaptive staircase procedure, which tracked good but not perfect performance (70.7% correct) on the psychometric function (Soranzo & Grassi, 2014). For pitch discrimination, participants were presented with trials consisting of three consecutive 250 ms pure tones: two of them had the same frequency (always 1000 Hz) and one was higher. The difference was 100 Hz at the beginning but then varied adaptively from 2 to 256 Hz based on the listener's performance. Correct identification of the higher tone led to progressively smaller pitch differences until participants stopped responding correctly, whereas incorrect responses led to progressively larger differences until they responded correctly. For duration discrimination, listeners heard three pure tones on each trial, and judged which was the longest. Two of the tones were always 250 ms and one was longer by 100 ms at the beginning, but then varied adaptively between 8 and 256 ms. For both tasks, the procedure
without verbal content such as laughs, screams, or sobs, as produced by two female and two male speakers. Finally, facial expressions consisted of color photographs of male and female actors with no beards, moustaches, earrings, eyeglasses, or visible make-up. Each photograph was presented for 2 s. The three tasks were similarly difficult (based on validation data from the different corpora, average recognition accuracy was 75.60% for speech prosody, 80.69% for nonverbal vocalizations, and 79.43% for facial expressions).
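The two-down-one-up rule used for the pitch and duration thresholds can be illustrated with a minimal sketch. The rule makes the task harder after two consecutive correct responses and easier after one error, so it settles where the probability p of a correct response satisfies p² = 0.5, that is, p ≈ 0.707. The function name, the halving/doubling step size, the number of reversals, and the deterministic simulated listener are all assumptions for illustration; the text specifies only the starting difference (100 Hz) and the 2 to 256 Hz range.

```python
def two_down_one_up(respond, start=100.0, floor=2.0, ceiling=256.0,
                    factor=2.0, n_reversals=6, max_trials=500):
    """Run a two-down-one-up staircase and return the levels at each reversal.

    respond(level) -> bool is a hypothetical callback that reports whether the
    listener answered correctly at the given stimulus difference.
    """
    level = start
    streak = 0            # consecutive correct responses at the current level
    last_direction = 0    # -1 = last change made the task harder, +1 = easier
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            streak += 1
            if streak < 2:
                continue          # one correct answer is not enough to step down
            streak = 0
            direction = -1        # two correct in a row: smaller difference
        else:
            streak = 0
            direction = 1         # one error: larger difference
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        step = level / factor if direction == -1 else level * factor
        level = min(ceiling, max(floor, step))
    return reversals


# Simulated listener who is always correct when the difference is at least 16 Hz.
reversal_levels = two_down_one_up(lambda diff: diff >= 16)
threshold = sum(reversal_levels) / len(reversal_levels)
print(reversal_levels, threshold)
```

Averaging the reversal levels is one common way to summarize a staircase as a single threshold; whether the study used this summary is not stated in the text.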
Participants made an eight-alternative forced-choice judgment for each stimulus in each task, selecting the emotion that was being expressed from a list that included neutrality, anger, disgust, fear, happiness, pleasure, sadness, and none of the above. Each of the three tasks started with four practice trials. The 84 experimental trials that followed were randomized separately for each participant. Each stimulus was presented once and no feedback was given. The tasks were implemented in E-Prime 2.0 Professional (Version 2.0.10.356), and each took approximately 10 min to complete.
General cognitive ability. To index domain-general cognitive abilities, participants completed the forward and backward portions of the Digit Span subtest from the Wechsler Adult Intelligence Scale III (WAIS-III; Wechsler, 2008). A summary score was computed, corresponding to the sum of the forward and backward raw scores.

Procedure
Participants were tested individually in a quiet room at the Speech Laboratory (Department of Psychology, University of Porto) or at LAPSO (Social and Organizational Psychology Lab, ISCTE-IUL). They completed a questionnaire that asked for demographic information, and then the remaining questionnaires, the experimental tasks, and the digit span test. The order of the tasks was randomized across participants, and the testing session lasted about 1.5 hours. Short breaks were allowed between tasks. The auditory stimuli were presented via high-quality headphones (Sennheiser HD 280 Professional), with the volume adjusted to a comfortable level for each participant.
The same participants also completed a task that required them to compare the emotional features of pairs of musical excerpts (MacGregor & Müllensiefen, 2019), and a series of questionnaires that indexed emotion- and health-related variables. These results will be reported in a separate publication.

Data Preparation and Analysis
Because we had four performance-based music perception tasks (musical beat perception, melodic memory, pitch and duration discrimination), we asked whether an aggregate variable could be formed and used as an index of musical ability to reduce collinearity and the contribution of measure-specific error variance. A principal component analysis (varimax rotation) revealed that a two-factor solution accounted for 73% of the variance in the original data. Three of the tasks loaded highly on the first component (beat perception, pitch and duration discrimination, rs = -.76, .79, and .81, respectively), and melodic memory almost perfectly correlated with the second component (r = .98). In the analyses that follow we therefore used the original melodic memory accuracy scores, and an aggregate music perception variable that represented the principal component extracted from the other three variables, which was almost perfectly correlated with the first component from the original analysis (r = .98). Lower scores on this measure indicate better performance.
Accuracy rates for emotion recognition tasks were arcsine square-root transformed and corrected for possible response biases using unbiased hit rates, or Hu (Wagner, 1993; for a discussion of biases in forced-choice tasks, see, e.g., Isaacowitz et al., 2007). Hu values vary between 0 and 1. When all the stimuli from a category are correctly identified, and the corresponding response category is always correctly used, Hu = 1; when no stimulus from a category (e.g., happy) is correctly identified, Hu = 0. Hu scores were computed separately for each emotion and task, and we also computed average scores for each task.
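The unbiased hit rate can be sketched from a confusion matrix as follows. For each emotion, Hu is the squared number of correct identifications divided by the product of the number of stimuli presented from that category and the number of times the corresponding response was used (Wagner, 1993). The function names and the toy two-category data are hypothetical, and applying the arcsine square-root transform to the Hu values afterwards is an assumption about the order of the two steps.

```python
import math


def unbiased_hit_rates(confusion):
    """Compute Hu per category from confusion[stimulus][response] = count."""
    hu = {}
    for stim, row in confusion.items():
        hits = row.get(stim, 0)
        n_stimuli = sum(row.values())                                   # row total
        n_responses = sum(r.get(stim, 0) for r in confusion.values())   # column total
        # If there are no hits, Hu is 0 (this also avoids dividing by zero
        # when the response label was never used).
        hu[stim] = 0.0 if hits == 0 else hits ** 2 / (n_stimuli * n_responses)
    return hu


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


# Toy data: 10 happy and 10 sad stimuli, with a bias toward answering "happy".
confusion = {
    "happy": {"happy": 8, "sad": 2},
    "sad": {"happy": 4, "sad": 6},
}
hu = unbiased_hit_rates(confusion)
transformed = {emotion: arcsine_sqrt(value) for emotion, value in hu.items()}
print(hu, transformed)
```

In the toy data the raw hit rate for happy is 80%, but because "happy" was also chosen for 4 sad stimuli, Hu is lower (64 / 120), which shows how the measure penalizes overuse of a response category.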
The data were statistically evaluated based on standard frequentist and Bayesian approaches (e.g., Jarosz & Wiley, 2014). In each analysis, a Bayes Factor (BF10) statistic was estimated, which considers the likelihood of the observed data given the alternative and null hypotheses (Wagenmakers et al., 2018a, 2018b; Wagenmakers, Verhagen, & Ly, 2016). BF10 values were interpreted following Jeffreys' guidelines (Jarosz & Wiley, 2014; Jeffreys, 1961), such that values between 1 and 3 correspond to weak/anecdotal evidence for the alternative hypothesis, between 3 and 10 to substantial evidence, between 10 and 30 to strong evidence, between 30 and 100 to very strong evidence, and over 100 to decisive evidence. A BF10 < 1 corresponds to evidence in favor of the null hypothesis (values below 0.33 indicate substantial evidence and below 0.10 suggest strong evidence for the null hypothesis).

Results
Table 1 shows summary statistics for the full sample, for highly trained individuals only (n = 30), and for participants with no training (n = 69). Table 2 provides correlations with duration of music training across the full sample. As in previous studies, music training was associated robustly with enhanced musical abilities on both self-report and performance-based tasks. Associations with general cognitive abilities were evident but weak.
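As a reading aid for the BF10 values reported in these analyses, the Jeffreys thresholds described in the analysis plan can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, the handling of values that fall exactly on a boundary is an assumption, and the "weak/anecdotal evidence for the null" label for values between 0.33 and 1 mirrors the alternative-hypothesis scale rather than quoting the text.

```python
def interpret_bf10(bf10):
    """Map a BF10 value to the verbal evidence labels used in the text."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    # Evidence for the alternative hypothesis (BF10 > 1).
    if bf10 > 100:
        return "decisive evidence for the alternative"
    if bf10 > 30:
        return "very strong evidence for the alternative"
    if bf10 > 10:
        return "strong evidence for the alternative"
    if bf10 > 3:
        return "substantial evidence for the alternative"
    if bf10 > 1:
        return "weak/anecdotal evidence for the alternative"
    if bf10 == 1:
        return "no evidence either way"
    # Evidence for the null hypothesis (BF10 < 1).
    if bf10 < 0.10:
        return "strong evidence for the null"
    if bf10 < 0.33:
        return "substantial evidence for the null"
    return "weak/anecdotal evidence for the null"  # assumed label, see lead-in


for value in (1.36, 0.35, 0.08, 150):
    print(value, "->", interpret_bf10(value))
```

For example, the BF10 of 1.36 reported for the association between training and education falls in the weak/anecdotal band, consistent with the description of that association as weak.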

Formal music training
With respect to associations with emotion recognition, as predicted, duration of music training correlated positively with average emotion recognition scores across the full sample, both for speech prosody and for nonverbal vocalizations (see Table 2). The effect was small (r = .21 in both cases), but Bayesian analyses indicated that the level of evidence was substantial. For facial expressions, in contrast, there was substantial evidence for a null effect.
We then conducted group comparisons between highly trained participants and those without any training. Mixed-design Analyses of Variance (ANOVAs) were conducted for each task, with the different emotions as repeated-measures factor, and music training as between-subjects factor (highly trained vs. untrained). Greenhouse-Geisser corrections were applied when necessary (Mauchly's sphericity test). For speech prosody, we found a .001, ηp² = .05, BF10 > 100, but the advantage for trained participants disappeared, F(1, 93) = 2.31, p = .13, BF10 = 0.60, and there was no interaction between training and emotion, p = .23, BF10 = 0.08.
In short, we found evidence for an association between music training and the recognition of emotion in voices but not faces. The effect was small, however, and in the case of prosody it was partly related to individual differences in digit span.

Self-reported musical abilities
We then tested for associations between emotion recognition and facets of musical abilities other than music training, as assessed by the subscales from the Gold-MSI. Zero-order correlations are provided in the upper part of Table 3. As predicted, we found decisive evidence that higher music perception abilities correlated with higher emotion recognition accuracy. This was observed for prosody and nonverbal vocalizations, but not for facial expressions. Exploratory analyses also revealed an unpredicted association between singing abilities and emotion recognition performance, but only for speech prosody.
An important question was whether the association between music perception and vocal emotion recognition would remain evident when music training and general cognitive abilities were held constant. Using multiple regression, we modelled average accuracy on the speech prosody task as a function of music perception abilities, duration of music training,
We also confirmed that self-reported music perception abilities predicted unique variance in vocal emotion recognition even when age, sex, and education were also included in the regression models (in addition to music training and digit span), ps ≤ .02, BF10 > 3.17.

Performance-based musical abilities
In the next set of analyses we asked whether similar findings could be observed for objective measures of music perception abilities. As shown in the lower part of Table 3, no associations were found for melodic memory, but we found decisive evidence for a positive association in the case of the aggregate measure. Participants with higher music perception abilities also had improved emotion recognition for prosody and nonverbal vocalizations, but not for facial expressions.
Multiple regressions showed that these associations remained evident when music training and digit span were held constant. For speech prosody, a model with three predictor variables (aggregate measure of music perception, duration of music training, and digit span) accounted for 18.8% of the variance, R = .43, F(3, 148) = 11.44, p < .001, BF10 > 100. The aggregate measure of music perception abilities predicted unique variance in vocal emotion recognition even when age, sex, and education were also included in the regression models, ps < .001, BF10 > 100.

Nonmusicians with good musical abilities vs. highly trained participants
The previous analyses established that individuals with higher music perception abilities are better at recognizing vocal emotions, regardless of music training. An interesting question is whether untrained participants with good musical abilities show emotion recognition performance comparable to that of trained musicians. To address this, we divided untrained participants into high and low musical abilities groups, based on median splits of their scores on music perception measures (separate analyses were conducted based on self-reported music perception scores and on the aggregate measure of music perception). We then compared those with high musical abilities with trained musicians. For speech prosody, there was no advantage for trained participants: self-reports, F(1, 63) = 2.06, p = .16, BF10 = 0.51; performance-based skills, F(1, 58) = 3.79, p = .06, BF10 = 1.17. Similarly, for nonverbal vocalizations, highly trained participants did not differ from untrained ones with good musical abilities: self-reports, F(1, 63) = 3.10, p = .08, BF10 = 0.65; performance-based skills, F(1, 58) = 2.65, p = .11, BF10 = 0.57. In short, musician-like enhancements in vocal emotion recognition were evident in participants without any formal music training, provided that they had good musical abilities.
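The median split used to form the high and low musical ability groups can be sketched as below. The function name, the participant labels, and the scores are hypothetical, and assigning participants who fall exactly on the median to the low group is an assumption, since the text does not say how ties were handled. The `higher_is_better` flag is included because lower scores indicate better performance on the aggregate music perception measure.

```python
from statistics import median


def median_split(scores, higher_is_better=True):
    """Split participants into (high_ability, low_ability) lists around the median.

    scores maps a participant label to a music perception score. Participants
    exactly at the median go to the low group (an assumption for illustration).
    """
    cut = median(scores.values())
    high, low = [], []
    for participant, score in scores.items():
        better = score > cut if higher_is_better else score < cut
        (high if better else low).append(participant)
    return high, low


# Hypothetical untrained participants and scores.
self_report = {"p1": 3, "p2": 5, "p3": 7, "p4": 9}
print(median_split(self_report))                          # self-report: higher is better
print(median_split(self_report, higher_is_better=False))  # aggregate: lower is better
```

The high-ability group returned by such a split is the one that would then be compared with the highly trained participants.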

Mediation analyses
A final analysis determined whether the association between music training and emotion recognition was mediated by music perception skills, which are enhanced in trained individuals. These analyses were conducted using the PROCESS macro for SPSS (Version The mediation models are depicted in Figure 4. For speech prosody, the indirect effect of music training on emotion recognition scores -through self-reported music perception skills -was significant. The direct effect was not, however, indicating that there was no association between training and emotion recognition performance when music perception skills were held constant. Identical results emerged when the objective music perception measure (aggregate measure) was substituted for the self-reported one, as well as in similar analyses for nonverbal vocalizations. In short, duration of music training was positively associated with enhanced emotion recognition simply because trained individuals had enhanced music perception skills.
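The logic of these mediation models can be summarized with the standard single-mediator regression equations. This is a generic formulation given as an assumption about the model structure, not a transcription of the PROCESS output: X denotes duration of music training, M the music perception measure, and Y the emotion recognition score, and the coefficient symbols are the conventional ones rather than the authors' notation.

```latex
\begin{align}
M &= i_1 + a\,X + e_1          && \text{(training predicts perception skills)} \\
Y &= i_2 + c'\,X + b\,M + e_2  && \text{(direct effect } c' \text{, with } M \text{ held constant)} \\
Y &= i_3 + c\,X + e_3          && \text{(total effect of training)} \\
c &= c' + a\,b                 && \text{(total = direct + indirect)}
\end{align}
```

In this notation, the pattern described above corresponds to an indirect effect ab that is reliably different from zero together with a direct effect c' that is not, which is what is conventionally meant by full mediation.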

Discussion
The present study examined the association between musical expertise and the ability to recognize emotions in vocal expressions. We determined the effect of formal music training and, crucially, we also investigated whether, in the absence of training, having good musical abilities related to enhancements in emotion recognition similar to the ones seen in musicians. The analyses had four main findings. First, music training was associated with better emotion recognition in speech prosody and nonverbal vocalizations. The advantage was small, though, and restricted to the auditory domain (i.e., not observed for facial emotion recognition). Second, we found a robust association between music perception skills and enhanced vocal emotion recognition, which remained significant even when music training and general cognitive abilities were held constant. Importantly, untrained participants with good musical abilities showed vocal emotion recognition performance comparable to that of trained musicians. Third, in exploratory analyses, singing abilities related to emotional prosody recognition. Fourth, mediation analyses showed that the effect of music training on vocal emotion recognition was fully mediated by music perception skills.
An advantage for musicians in emotional prosody recognition emerged in some previous studies (Lima & Castro, 2011; Thompson et al., 2004), but not in others (Mualem & Lavidor, 2015; Park et al., 2015; Trimmer & Cuddy, 2008), and it was unknown whether musicianship predicts the recognition of emotions in other types of vocal expressions. Our results corroborated the association with speech prosody and extended it to nonverbal vocalizations, indicating that it might stem from a general benefit in decoding vocal emotional cues. The advantage for vocalizations was consistent with a previous study showing a more efficient subcortical encoding of crying sounds (Strait et al., 2009). The failure of some previous studies to replicate the musicians' advantage could be because the association appears to be small and relatively weak, as suggested by our Bayesian analyses.
Thus, a relatively large sample of highly trained participants might be required for such an association to emerge. Some studies that failed to find a clear advantage included fewer than 15 musicians (Park et al., 2015; Pinheiro et al., 2015), or participants with only a modicum of training.
We also found that, in the case of speech prosody (but not in the case of vocalizations), the effect of music training became non-significant after accounting for individual differences in digit span. This finding suggests that the association could be partly due to domain-general cognitive abilities, even though Lima and Castro (2011) documented that expertise effects in prosody were independent of such general abilities. This discrepancy might stem from differences in samples, or the particular way domain-general abilities are measured. Our digit-span task indexed auditory-perceptual processes, which could, arguably, be a consequence of the training itself rather than a proper confounding variable. By contrast, Lima and Castro (2011) had several cognitive control tasks including purely nonverbal ones such as Raven's Advanced Progressive Matrices. The precise role of distinct domain-general processes could be addressed in future studies.
A novel but null finding of the current study was that music training had no association with emotion recognition in the visual domain. Thus, musicians' advantage in emotion recognition may be domain-specific (auditory only) rather than domain-general. Our decision to include a facial emotion recognition task was motivated by evidence of domain-general socio-emotional processing difficulties in congenital amusia.
There is also evidence that musicians show stronger responses to emotional prosody in brain regions involved in modality-independent inferences about mental states, including the medial prefrontal and anterior cingulate cortices (Park et al., 2015). In everyday life, socioemotional stimuli are typically multimodal, such that emotional impairments/benefits initially related to basic auditory skills could have cascading effects that extend to higher-order aspects of socio-emotional cognition. Perhaps the enhancements associated with higher musical abilities are relatively small, and thus incapable of generating behaviorally observable across domains, particularly when compared to the severe pitch deficits that are markers of congenital amusia, and likely to affect early stages of socio-emotional development. Null results in the visual domain are also consistent with recent meta-analyses that raise doubts about far transfer in general, and as a consequence of music training in particular (Sala & Gobet, 2017a, 2017b. Indeed, in some instances, there is no association between music training and performance on auditory tasks such as perceiving speech in noise (Boebinger et al., 2015;Madsen, Marschall, Dau, & Oxenham, 2019).
By assessing basic auditory and music perception skills in addition to music training, including in participants who lack formal training, we were able to provide robust evidence that being a musician is not a necessary condition for the music-related advantages in vocal emotion recognition. Converging data from self-report and performance-based measures indicate that auditory and musical skills are broadly associated with enhanced emotional processing of speech prosody and nonverbal vocalizations, even after accounting for training.
These findings suggest overlap of neurocognitive pathways for music and vocal emotions that stem from aspects of musical expertise other than formal training. Crucially, they establish a role for factors other than formal training in associations between music and nonmusical abilities, specifically musical abilities that are driven by innate predispositions and informal engagement with music.
Twin studies confirm that genetic variation influences both music proficiency and the propensity to pursue music training (Mosing et al., 2014). Our finding cannot be addressed in typical cross-sectional comparisons between musicians and nonmusicians, because these designs conflate training with enhanced musical abilities. In other words, the enhanced capacities seen in trained individuals cannot be teased apart from the training itself (as we could do here by studying untrained individuals), and we do not know if they are truly experience-dependent or, rather, a factor that motivated individuals to pursue music lessons. Our findings also align well with recent evidence of associations between music and speech and language processing. For example, Schellenberg (2017, 2019) found that for adults and for children, better rhythm perception abilities predicted phoneme discrimination performance in a foreign language, even after controlling for music training and domain-general cognitive abilities. Music training, on the other hand, did not play a significant role.
In other words, rhythm perception abilities were a better predictor of speech perception than music training. Mankel and Bidelman (2018) examined neuroelectric brain responses to clear and noise-degraded speech sounds in untrained participants who differed in musical abilities.
They found that participants with higher music perception skills had better frequency-following responses to speech and were more resilient to degradative noise effects. Although the authors proposed that music training provides an additional boost, on top of pre-existing skills, to the neural processing of speech, we did not find evidence for such a boost in the current study. Rather, our highly adept nonmusicians showed similar vocal emotion recognition performance to that of highly trained musicians. This discrepancy may be the consequence of domain-specific effects (speech perception vs. vocal emotion recognition), but it could also stem from different levels of analysis. Mankel and Bidelman (2018) emphasized neural measures, whereas our evidence was behavioral. Perhaps the putative additional boost induced by training is not sufficient to translate into behavioral advantages.
In fact, Mankel and Bidelman (2018) also found different results for neural and behavioral measures: although neural measures were sensitive to fine differences in musical skills, a behavioral measure was not. Considered jointly, these results suggest that the advantage related to musical abilities (in the absence of training) is not discernible from the advantage putatively related to training at a behavioral level.
Although we documented an important role of predispositions and informal experience in the association between music and vocal emotion recognition, we have no doubt that music training drives neuroplasticity. Carefully designed longitudinal studies provide evidence for plasticity across a range of tasks, at behavioral and neural levels (Bangert & Altenmüller, 2003; Chobert, Francois, Velay, & Besson, 2012; Francois, Chobert, Besson, & Schon, 2012; Frey et al., 2019; Moreno et al., 2009; Seither-Preisler, Parncutt, & Schneider, 2014), yet more research is needed to clarify the robustness and scope of such effects. One important point is that group differences evident in cross-sectional studies need to be interpreted cautiously. These studies provide ecologically valid opportunities to measure correlates of long-term music training, and to test hypotheses regarding links between music and other domains. Such associations are often causally attributed to training (Schellenberg, 2019), however, and the current study shows that similar advantages can be seen in individuals without any training. Another important point is that psychologists and neuroscientists often equate musical expertise with classical music training, which reflects a limited perspective of the factors that shape musical abilities (both genetic and environmental), and the richness and diversity of musical behaviors and experience. A complete understanding of musicality and its role in cognition requires a complex and multifaceted exploration of musical skills and experience.
In exploratory analyses, we found that participants reporting higher singing abilities were also better at recognizing emotions in speech prosody. This finding implies that other facets of musical expertise are involved in associations with vocal emotions, and is consistent with well-documented behavioral and neural links between production, imagery, and perceptual mechanisms in voice processing (Correia et al., 2019; Lima et al., 2015; Lima, Krishnan, & Scott, 2016; McGettigan et al., 2015; Pfordresher & Halpern, 2013; Warren et al., 2006). On the one hand, vocal production (such as singing) involves not only implementing movements but also planning and anticipating outcomes (which might rely on imagery), and using auditory feedback (perceptual processes) to detect and correct errors. On the other hand, listening to sounds, namely vocal emotional sounds, recruits auditory areas in the temporal lobes and also motor system areas involved in motor planning and control. A stronger motor system involvement is also positively correlated with enhanced vocal emotional processing (Correia et al., 2019; McGettigan et al., 2015), which suggests that more efficient activation of sound-related motor representations optimizes perceptual processes (for review see Lima, Krishnan, & Scott, 2016). Such tight production-perception links in voice processing could plausibly account for the positive association between singing and speech prosody perception, a prediction that could be tested systematically in future studies. For example, singing abilities could be assessed not only via self-report but also with performance-based tasks (e.g., Pfordresher & Halpern, 2013). Previous studies provided suggestive evidence that the primary locus for transfer effects from music training to vocal emotions is at a basic auditory-perceptual level of processing.
Music training could relate to more efficient auditory-perceptual processing (Kraus & Chandrasekaran, 2010; Herholz & Zatorre, 2012), which in turn could facilitate vocal emotion recognition, given that sensory processing is central to vocal communication (Schirmer & Kotz, 2006). Our mediation analysis provided results consistent with this view, in the sense that no other component of musical expertise played an important role. Indeed, the effect of training on vocal emotion recognition was accounted for entirely by advantages in music perception skills. Because auditory-perceptual skills did not extend to visual emotion processing, the overlap between musical skill and emotion recognition does not appear to extend to higher-order levels of processing. Future studies using techniques such as EEG or fMRI could be useful to address these questions more directly, because they can tell us when cross-domain interactions occur (early vs. later stages of processing) and if they occur primarily in auditory areas or extend to regions involved in supramodal socio-emotional processing.
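The logic of full mediation can be sketched with a simple product-of-coefficients decomposition on synthetic data. The example below is only an illustration under stated assumptions: the data are simulated so that training affects emotion recognition solely through perception skills, the path coefficients (0.5) and sample size are arbitrary, and the variable and function names are hypothetical. It does not reproduce the mediation model or the data of the current study.

```python
import random


def cross_products(u, v):
    """Sum of cross-products of two variables after mean-centering."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def mediation(x, m, y):
    """Ordinary least squares estimates for a single-mediator model.

    total    : slope of y on x alone (path c)
    a        : slope of m on x
    b        : slope of y on m, controlling for x
    direct   : slope of y on x, controlling for m (path c')
    indirect : a * b
    """
    sxx, smm, sxm = cross_products(x, x), cross_products(m, m), cross_products(x, m)
    sxy, smy = cross_products(x, y), cross_products(m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / det
    direct = (sxy * smm - smy * sxm) / det
    return {"total": sxy / sxx, "a": a, "b": b,
            "direct": direct, "indirect": a * b}


# Synthetic data (assumption): training -> perception -> emotion recognition,
# with no direct path from training to emotion recognition.
rng = random.Random(0)
n = 2000
training = [rng.gauss(0, 1) for _ in range(n)]
perception = [0.5 * t + rng.gauss(0, 1) for t in training]
emotion = [0.5 * p + rng.gauss(0, 1) for p in perception]

result = mediation(training, perception, emotion)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this decomposition the total effect equals the sum of the direct and indirect effects, so a pattern of full mediation corresponds to a direct effect near zero once the mediator is included in the model.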
To conclude, the present study represents the first demonstration that better music perception skills are associated robustly with enhancements in vocal emotion recognition, even in the absence of any formal music training. Untrained individuals who are naturally musical can be as good as highly trained musicians at recognizing emotions in speech prosody and nonverbal vocalizations. Our findings do not rule out the possibility that music training induces experience-dependent plasticity, but they affirm an important role of pre-existing factors in associations between music and nonmusical domains that have been neglected in the literature. Collectively, the results reported here emphasize the need to interpret cross-sectional music training effects with caution. They also confirm that there are
Note. For the Gold-MSI and music perception tasks, p values correspond to the statistic of independent samples t-tests (two-tailed). For the emotion recognition tasks, p values correspond to the main effect of group in mixed-design ANOVAs, including music training as between-subject factor and emotion as repeated-measures factor.