Making the cut: How panel reviewers use evaluation devices to select applications at the European Research Council

Lucas Brunet, Ruth Müller, Making the cut: How panel reviewers use evaluation devices to select applications at the European Research Council, Research Evaluation, Volume 31, Issue 4, October 2022, Pages 486–497, https://doi.org/10.1093/reseval/rvac040


Abstract

The European Research Council (ERC) receives many high-quality applications, but funds only a few. We analyze how members of ERC review panels assess applications in the first, highly competitive step of evaluations for ERC Starting and Consolidator Grants. Drawing on interviews with ERC panel members in different fields, we show that they adopt a set of evaluation devices that offer pragmatic and standardized ways of evaluating in a time-constrained and highly competitive setting. Through the use of evaluation devices, panel reviewers enact and generate a distinct reviewing expertise that encompasses subject-specific knowledge and knowledge about how to accomplish evaluation within a situated setting. We find that ERC panel reviewers employ four evaluation devices during the first step of ERC reviews: first, reviewers base judgments on applicants’ prior achievements (delegation devices); second, they adjust their evaluations of individual applications to the quality of a given set of applications (calibration devices); third, they combine multiple elements to assess the feasibility of proposals (articulation devices); and finally, they consider the impact of the proposed research on science and society (contribution devices). We show that the current use of these devices generates what we have termed evaluative pragmatism: a mode of reviewing that is shaped by and accommodated to the need to review many high-quality proposals in a short time period with possibly limited expert knowledge. In conclusion, we discuss how the prevalence of evaluative pragmatism in the first step of ERC panel reviews shapes candidate selection, particularly regarding human and epistemic diversity in European research.

1. Introduction

Since its creation in 2007, the European Research Council (ERC) has attracted numerous applicants in a climate of fierce competition for highly selective but generous research grants. Many researchers have criticized the fact that the ERC ranks only a small percentage of proposals as outstanding and that an even smaller percentage receives funding. In 2017, Jean-Pierre Bourguignon, president of the ERC, declared: ‘Presently, a string of excellent candidates with exceptional ideas cannot be funded—purely for budgetary reasons’ (ERC Newsletter 2017). In this article, we explore how ERC panel reviewers conduct their work and make reviewing decisions when proposal numbers are high and only those ranked as the best of the best can receive funding. To this end, we draw on 22 interviews with ERC panel members, complemented by the analysis of various institutional documents, and examine how panel reviewers for the ERC navigate peer review processes under conditions of chronic hypercompetition.

Often seen as the gold standard for assessing the quality of scientific work, peer review has become increasingly pervasive over the last 20 years due to shifts from block funding to competitive funding and, consequently, consumes a growing amount of researchers’ time (Müller 2020). Despite its importance and omnipresence, peer review remains a relatively under-explored topic in Science & Technology Studies (STS) and related fields. Specifically, few studies have empirically investigated peer review processes in funding and hiring contexts (Lamont 2009; Hirschauer 2010; Huutoniemi 2012; van Arensbergen et al. 2014; Derrick 2018). Peer review is a somewhat secretive aspect of academic work, and attempts to study its processes can raise suspicions (Lamont 2009; Derrick 2018; Roumbanis 2019). At the ERC, as at most other funding organisations, peer review takes place behind the closed doors of reviewing panels, and studies of the ERC have been limited to the history of the institution (Flink 2016; König 2017) and the implementation of different evaluation criteria during review (Luukkonen 2012, 2014).

In this article, we analyze reviewing processes at the ERC, specifically focusing on the first step of the ERC’s two-step evaluation process, during which panel reviewers need to evaluate a large number of high-quality applications over a short period, often without possessing detailed knowledge about the research topics presented in the proposals. Building on scholarship in STS, economic sociology and valuation studies, we argue that peer review is an activity in which knowledge is both drawn upon and actively generated during the evaluation process. These knowledge-intensive practices are conditioned and facilitated by evaluation devices: structured assessment practices that emerge within and in relation to the specific evaluative situations each peer review context presents.

The evaluation devices we present in this article are thus specific to ERC peer reviews for Starting and Consolidator Grants. We focus on the first step of ERC peer review, in which panel members assess and grade a large number of proposals, first individually and then collectively, and afterwards meet to decide which proposals should move to the second step of the evaluation process. The first step of ERC peer review is almost archetypal of hypercompetitive situations in which a high number of proposals (which are, on average, of high quality) are submitted, but only a few can be funded. In this setting, our analysis shows that reviewers regularly use four types of evaluation devices, which are detailed in the empirical section of the article. We have labeled these types of evaluation devices as delegation devices (Section 4.1), calibration devices (Section 4.2), articulation devices (Section 4.3) and contribution devices (Section 4.4). Based on our analysis of these four evaluation devices, we conclude the article by discussing the concept of reviewing expertise, a type of expertise that emerges from the practices of peer review, and the concept of evaluative pragmatism, a mode of performing peer review under the conditions of hypercompetition and time constraints. While these conditions are characteristic of the first step of ERC Starting and Consolidator Grant evaluations, we believe that they also shape a range of other evaluative settings in contemporary European research. We close the article by discussing the implications of this mode of evaluation for human and epistemic diversity in European research.

2. Literature review

2.1 Peer review in competitive conditions

Over the last decades, changing research policies have progressively pushed researchers to apply for competitive, third-party funding more regularly because of a decrease in block funding to research institutions (Whitley and Gläser 2007; Fochler, Felt and Müller 2016; Brunet, Arpin and Peltola 2019). As acquiring third-party funding has become ever more pivotal for a career in research, scholars from various fields have argued that this has led to hypercompetition for research funds, affecting how researchers value and conduct their research (Whitley and Gläser 2007; Fochler, Felt and Müller 2016). Historically and presently, the selection of researchers for funding has primarily been carried out through evaluation by their peers. Dating back to the development of the first English and French scientific societies in the 17th century, peer review offers a system of institutionalized mutual vigilance and self-governance of the scientific community (Derrick 2018; Müller 2020). In grant and hiring panels, peer review legitimizes decisions by relying on the judgment of other researchers rather than that of research managers. Nevertheless, in recent years, the use of standardized indicators in evaluation (e.g., performance metrics) has made peer review processes more accessible and legible to non-researchers, such as policy-makers (Gläser and Laudel 2007) and university managers (Musselin 2013).

The development of a hypercompetitive system for grant funding regulated through peer review generates significant challenges. Several authors have criticized the time and resources wasted on applications for external funding with low success rates. Applicants spend months writing proposals that are unlikely to receive funding. Concurrently, reviewers must evaluate proposals on top of their other scientific activities, in most cases without any reduction of their other duties. In a situation where there are ‘too many good proposals that cannot be funded’ (Roumbanis 2019: 5), it has been argued that reviewers tend to recommend less risky and more conservative projects for funding (Langfeldt 2006). To counter this conservative bias, a range of scholars, such as Roumbanis (2019), have suggested allocating some funds on a random basis. Alternatively, rather than accepting conservative bias as a fact of peer review, particularly under conditions of hypercompetition, Langfeldt (2001) has argued that peer review outcomes depend on the social organization of the review process. For example, she shows that if each panel member gets to choose one preferred candidate, the selection is more diverse and the chance increases that risky projects are funded. Conversely, if panel members have to agree on all proposals for selection to occur, the process favors mainstream projects from established research fields (Langfeldt 2001: 835).

While we know to some degree how early-stage researchers adjust their research practices to hypercompetitive conditions (e.g. Müller 2014; Fochler, Felt and Müller 2016; Sigl 2016; Müller and de Rijcke 2017), few studies to date have explored how reviewers react to these conditions. In her in-depth study of grant review panels, Lamont (2009) has shown that reviewers use various criteria to assess scientific excellence and interpret given criteria in situated ways in their practices (see also Luukkonen 2012). Analyzing conflicts surrounding peer review criteria and competing definitions of excellence among disciplines (see also Huutoniemi 2012), Lamont describes the important role of an ‘evaluative culture’ that is shared between reviewers and enables them to reach consensus in hypercompetitive evaluative situations (see also Brunet and Müller, in prep.). In their study of review processes at the Dutch Research Council, van Arensbergen et al. (2014) demonstrate that situated evaluative cultures can be characterized by a somewhat sequential use of evaluation criteria among reviewers. The authors show that reviewers tend to start their evaluations by focusing on criteria perceived to be more objective, such as an applicant’s number of publications, and move progressively to more subjective criteria (independence, originality and personality of applicants). Van Arensbergen et al. argue that this sequential use of criteria emerges not only from epistemic considerations but also as a response to the reviewers’ high workload and their limited knowledge about the research fields of many of the proposals. In our article, we build upon this corpus of work in order to examine how panel reviewers accomplish reviewing tasks in the specific context of ERC Starting and Consolidator Grants. For this purpose, we develop the notion of evaluation devices, which we outline below.

2.2 Evaluation devices and reviewing expertise

To understand how reviewers assess and select research proposals, we draw on literature from the recently founded field of valuation studies ( Lamont 2012). With a range of researchers from this field, we share an interest in understanding how reviewers evaluate scientific activities and how their assessments shape what kind of research is considered worthy of funding. Previous works have used the concept of the judgment device to analyze how reviewers make choices between different candidates with unique biographies and proposals ( Musselin 2013; Kaltenbrunner and de Rijcke 2019; Müller 2020). Karpik (2007, 2012) defines judgment devices as mechanisms that facilitate the comparison of unique goods in competitive markets by reducing the complexity of the evaluation process. Karpik uses the Michelin Guide, a guidebook that provides a comparison of fine dining restaurants for gourmets, as an example of a judgment device. Instead of evaluating the intrinsic quality of each good, judgment devices delegate judgment to former evaluations, such as reviews, rankings and guides: in the case of the Michelin Guide, instead of visiting each restaurant themselves and weighing its different unique qualities, the guidebook allows gourmets to defer their judgment to the ratings restaurants have received by restaurant critics. In academia, judgment devices can be the number of published articles, the number of citations of these articles, the reputation of the journals where they are published and the prestige and rankings of academic institutions.

However, the concept of the judgment device is limited when analyzing how actors with knowledge about the intrinsic quality of goods evaluate them, and this concept can thus only be applied with limitations in the context of research evaluation practices ( Dubuisson-Quellier 2003; Müller 2020). Unlike customers in markets of singular products, who tend to compare these products by delegating their judgment completely to more competent actors, grant reviewers are expected to be experts in their research fields as well as in academic career trajectories ( Kaltenbrunner and de Rijcke 2019). While there have only been a few explorations of the relationship between expert knowledge and the use of judgment devices in research evaluation, authors in the field of economic sociology have begun to analyze the role of expert knowledge in relation to judgment devices in the context of markets (see, e.g., Dubuisson-Quellier 2003).

In this article, we propose the concept of evaluation devices to describe the heterogeneous material-discursive assemblages that reviewers create and use to qualify proposals and applicants. Instead of focusing only on one element of the evaluation (e.g. criteria of evaluation), the notion of evaluation devices draws our attention to how evaluations are achieved in practice by combining various elements, such as techniques, instruments, methods, knowledges and criteria. In this processual understanding of evaluation, each device organizes singular relations between various elements and guides evaluative actions to realize specific goals ( Muniesa, Millo and Callon 2007; Doganova 2019). Furthermore, unlike the concept of judgment devices, evaluation devices emphasize reviewers’ ability to make judgments by utilizing their expertise and producing knowledge in the situation of evaluation, instead of relying exclusively on knowledge that has been created by other actors in previous evaluations.

By examining reviewers’ use of evaluation devices in practice, we want to avoid a determinist approach that assumes the devices operate by themselves and are imposed on social actors. Instead, these devices offer qualification possibilities and selection modes that influence how reviewers make judgments without predetermining them. Inspired by the dual meaning of the expression ‘making the cut’, one meaning indicating that applicants succeed in getting funding and the other that reviewers eliminate a number of proposals, we study how reviewers can use different evaluation devices in a coordinated manner to select applications. ‘Making a cut’ implicitly references Karen Barad’s discussion of how ‘apparatuses [of knowledge production] enact agential cuts that produce determinate boundaries and properties of entities within phenomena’ (2007: 148). With Barad, we understand reviewing practices as distinct and situated apparatuses of knowledge production that produce specific epistemic outcomes dependent on their unique elements, including but not limited to: reviewers, guidelines, evaluation devices and the space, place and timing of evaluations. Evaluation devices are part of the complex apparatuses of peer review and both shape and are shaped by their other elements.

The concept of evaluation devices also allows for an analysis of peer review as an epistemic practice that employs and generates expert knowledge for the selection of applications. In their classification of different forms of expertise, Collins and Evans (2007) differentiate interactional expertise, which they define as the ability to master the language of a specialist domain, from contributory expertise, which is the ability to contribute to this domain. According to Collins and Evans (2007), this distinction lies at the core of peer review because ‘reviewers are only sometimes contributors to the narrow specialism being evaluated’ (p. 32). They argue that, in most cases, reviewers have little contributory expertise related to the field of study being evaluated and must draw on interactional expertise to form their judgments. In their account, Collins and Evans conceptualize expertise as a form of specialized knowledge possessed (or not possessed) by reviewers, which can be divided into pre-defined categories (such as interactional and contributory expertise). Conversely, in our account, we consider that reviewers also create expertise during the evaluation through ‘devices which produce, reproduce and disseminate expert statements and performances’, as stated by Eyal (2013: 872) in another context. Following Eyal (2013), we focus on ‘expertise in the making’ and study how reviewers both use pre-existing forms of expertise and create novel forms of expertise as part of their reviewing practices.

Our analysis of reviewers’ expertise as enacted through evaluation devices enables us to symmetrically consider knowledge about the evaluation process (grades, publication metrics of fields of study, usual level of proposals) and knowledge about research fields (state of the art, novelty, methods) while remaining situated in the organizational and material conditions of reviewers’ work (see also Hirschauer 2010). In that sense, we complement existing studies on the role of criteria of evaluation ( van Arensbergen et al. 2014) and disciplinary backgrounds ( Lamont 2009; Huutoniemi 2012) in the peer review process. We view expertise as reviewers’ capacity to accomplish evaluation in competitive settings with different degrees of competence concerning the evaluation process and knowledge of the topics under evaluation. We further propose that this expertise cannot be dissociated from the evaluation devices which materialize and are materialized by specific relations between criteria, instruments and evaluation practices and which invite specific sequences of evaluative actions to achieve particular goals.

3. Case study and methodology

3.1 The evaluation process of the ERC

Set up by the European Commission under the Seventh Framework Programme for Research in 2007, the ERC has become one of the most important research funding organizations in Europe (see Luukkonen 2014; Flink 2016; König 2017; Gengnagel 2021). It aims to ‘enhance the dynamic character, creativity and excellence of European research at the frontiers of knowledge’ by selecting ‘excellent’ proposals and researchers ‘through peer-reviewed competitions’ (ERC Website 2021). With an annual budget of 2 billion euros in 2022, the ERC offers generous grants (€1.5–2.5 million) to scientists conducting their research in EU institutions for up to 5 years. Among the various ERC grants, we focus on those designed for individual researchers at two different career stages: the Starting Grant (2–7 years after PhD) and the Consolidator Grant (7–12 years after PhD). Highly prestigious, these grants play a key role in researchers’ career advancement, for instance by facilitating the acquisition of tenured positions for early to mid-career researchers. Since only 10–15% of the proposals submitted are funded, universities often pre-screen and assist applicants, resulting in a significant number of high-quality proposals for review.

The review process is divided into 27 thematic panels covering the three main research areas addressed by the ERC: Physical Sciences and Engineering, Life Sciences, and Social Sciences and Humanities. Each panel is composed of 11–18 internationally recognized scientists who decide which proposals should receive funding. There are two groups of panel members for every panel, each coming together for review in alternating years. Reviewers are relatively free in their evaluations to assess the ‘excellence’ of the proposals submitted, regardless of other considerations concerning disciplines or nationalities (ERC 2019). The ERC’s approach to peer review can be summarized as ‘excellent researchers will recognize excellent researchers and research’ (Müller 2020: 200). Reviewers are expected to be impartial, and the selection process relies on trusting their abilities ‘to enact, in a Mertonian sense, an ethos of organized scepticism’ (Müller 2020: 200).

The review process itself is organized into two steps. In the first step, panel reviewers assess candidates’ CVs, lists of publications, and a short, five-page version of the research proposal called the ‘B1’ section. Each panel reviewer is assigned about 30–50 proposals, some of which may connect to the reviewer’s personal scientific expertise and some of which do not. Each proposal is reviewed by three panel reviewers, who have about a month to complete their reviews. During this period, ERC officers send various instructions to panel reviewers, such as the reviewers’ guidelines (ERC 2019), and collect their comments and grades on an online platform. Panel reviewers then meet in person in Brussels to discuss and decide together which proposals should be selected for further evaluation in step 2. In this second step, panel reviewers receive the long versions of the research proposals that passed the first round (the B2 section), and the complete application materials are sent out to subject- or topic-specific external experts for additional review (usually three to five per proposal). After the external reviews have been collected, the panel reviewers meet again in Brussels to interview the pre-selected applicants, make their final assessments and decide which proposals will be recommended for funding.

The analysis we present in this article focuses specifically on how panel reviewers navigate the first step of evaluation. This step in particular is characterized by conditions of hypercompetition and subject-specific knowledge deficits. Here, panel reviewers have to assess a high number of proposals, at times without much expertise about the subject of the proposal and with no external reviews at their disposal, in a relatively short amount of time. Our analysis presents findings about how, under these difficult conditions, panel reviewers still accomplish review and selection and arrive at decisions together.

3.2 A method to study peer review panels

Studying peer review panels presents a methodological challenge because the work of these panels often cannot be observed directly ( Lamont 2009; Derrick 2018). According to Derrick (2018: 21), secrecy and confidentiality maintain the ‘illusion of expertise and objectivity that drives the legitimacy of peer review as the ‘gold standard’ evaluation tool of choice’. Yet, as demonstrated by a number of studies available on peer review, semi-structured interviews with panel members provide a relevant source of information for understanding how reviewers assess proposals ( Lamont 2009; Musselin 2013; Derrick 2018; Kaltenbrunner and de Rijcke 2019). Interviews also enable researchers to study aspects of peer review that take place outside of panel meetings, such as their individual review work, and provide reviewers space to reflect on both individual and collective aspects of peer review practices as well as to compare review experiences across different settings (e.g., for different funders).

In our case, we compensated for the impossibility of observing ERC panels directly by conducting ‘reflexive peer-to-peer interviews’, a version of the semi-structured active interview, with panel reviewers (Müller and Kenney 2014; Fochler, Felt and Müller 2016). This method assumes that reflexivity is not only possessed by the social scientists conducting the inquiry but also by the interviewees discussing their working practices. Interviews took place in an academic setting as an exchange between peers, which fostered mutual recognition and trust in the discussion. We paid attention to how reviewers narrated their ‘individual experiences and interpretations of contemporary academic work practices and culture’ (Müller 2014: 336). In line with symbolic interactionist principles, we then reconstructed how reviewers made sense of their reviewing practices and how their meaning-making practices and beliefs regarding the ERC reviewing system influenced their understanding of their decisions and actions. Furthermore, we complemented the interviews with an analysis of different documents provided to the reviewers and related our interview findings to the specificities of the ERC’s organizational process.

Between 2018 and 2019, we conducted interviews with 22 reviewers belonging to different thematic panels of the ERC (seven in Physical Sciences and Engineering (PE), seven in Life Sciences (LS) and eight in Social Sciences and Humanities (SH)) for Starting and Consolidator Grants. Panel reviewers are highly recognized scientists employed at European and international universities and research institutions. During the second round of evaluation, external reviewers are also consulted and send written reviews that panel reviewers can use to inform their final decisions. However, since the external reviewers are involved only in the second round of evaluation, we focus here solely on panel members, referred to as ‘panel reviewers’ in this article.

Panels are relatively stable in their composition and change only gradually, allowing certain socialization and standardization processes to occur within panels. When possible, we interviewed several reviewers from the same panel to assess panel dynamics. This enabled us to compare and contrast perspectives and experiences within one panel. Informed consent was obtained before all interviews, and all details that could enable the identification of informants were removed from published excerpts. We also asked informants to avoid providing the names of other panel members to respect their confidentiality and to avoid mentioning specific details of proposals and applicants. Protecting applicants’ names and proposal details did not prevent reviewers from reflecting on their experience of the evaluation process; they simply spoke in more general or abstract terms.

The overarching goal of the interviews was to invite reviewers to reflect on their evaluation practices and how they manage to come to individual and collective judgments in the specific ERC setting. We asked them questions about individual and collective ERC evaluation processes, invited them to describe specific moments from panel discussions and asked how they assessed scientific excellence. Interviews were recorded, transcribed, and analyzed alongside additional ERC documents (evaluation guidelines, newsletters and internal reports).

The material was coded inductively using an approach inspired by grounded theory (Charmaz 2006). Following the general empirical interests of the project, we coded for how reviewers describe their practices of evaluation. After several rounds of open coding, we noted different practices that were common across interviews. These practices represented ways of enacting abstract guiding principles such as ‘excellence’ within the specific setting of the first step of ERC peer review, with its distinct characteristics (high number of proposals, high quality, frequent lack of expert knowledge, limited time frame). These practices constituted ways of relating aspects of projects, CVs and panel reviewers’ experiences and expertise, and appeared as recognized and established ways of performing peer review among ERC panel reviewers. Because of their widespread character, we termed these practices ‘evaluation devices’. In subsequent rounds of focused coding, the notion that such evaluation devices appear to exist among the community of ERC panel reviewers guided our further analysis as a sensitizing concept. This led us to distinguish between four different evaluation devices ERC panel reviewers draw on to navigate the first step of the ERC review process.

Thus, following the principles of grounded theory, we did not arrive at the four specific evaluation devices by deriving them from existing evaluation theory but by naming their uses and effects during evaluation. We described how reviewers delegate their judgments to the previous achievements of the applicants as presented in their CVs (4.1; delegation device), how they calibrate their judgments according to the quality of the set of research proposals they have been tasked to assess (4.2; calibration device), how they articulate different elements from the CV and the research project in relation to each other to determine a proposal’s feasibility (4.3; articulation device) and how they judge a research project’s contribution to science and society (4.4; contribution device).

Our presentation follows a sequence, which reflects that most reviewers narrated their engagement with ERC applications as characterized by a sequential use of evaluation devices. However, we do not claim that all reviewers follow this sequence, or that they do so for all the proposals they evaluate; nor do we claim that all reviewers use the four devices we describe. When reviewers reported using these devices in a different order, we noted this explicitly, as we did when reviewers reported practices aimed at subverting how evaluation devices are commonly used.

To understand the types of evaluation devices panel reviewers use and the sequence in which they use them, it is important to keep in mind that the ERC stresses not only the need for excellent project applications but also puts a premium on assessing the excellence of the project’s Principal Investigator (PI). The merits of the applicant’s biography and future potential are as much a focus of review as the merits of the project itself. This is illustrated by the four main questions ERC reviewers have to answer for each application, two of which pertain to the outstanding character of the research project and two of which focus on the outstanding nature of the applicant. Thus, unlike other application processes, where the project might be at the center of the evaluation and the PIs are mainly assessed with regard to their capability to conduct the project successfully, assessing the excellence of the candidate plays an important role in the ERC evaluation process. This is palpable in the types of devices panel reviewers use to navigate step one of the evaluation process.

4. Results

4.1 Delegation devices: Deferring the evaluation of applicants to their previous achievements

At the beginning of the first evaluation step, ERC panel reviewers receive between 30 and 50 applications consisting of CV documents and research projects, which they must assess in about one month. From the start, they are aware that only a few of the proposals will make it to the second round of evaluations and even fewer will be funded. Consequently, the first step of peer review focuses on dividing applications into two categories: applications with no chance and applications with a great chance to advance. To make a first cut through the heap of proposals in front of them, many panel reviewers begin this categorization by assessing whether the applicant’s CV is ‘competitive’ and examining the applicant’s previous achievements. They do this at the beginning of their evaluation because assessing CVs is quicker than assessing entire proposals. We identify these first evaluation devices as delegation devices to designate how panel reviewers delegate the evaluation of the applicants’ singular qualities to their previous successes. We first present different delegation devices that mobilize publications, citations and other metrics to rapidly assess applicants; we then explain how these devices can be contextualized to reflect experience and disciplinary differences in order to strategically save some applicants from the cut.

A majority of panel reviewers, without clear distinction according to research field or experience in ERC reviewing, considered that the CV revealed whether applicants were ‘in the game’ (ERC10, SH). These reviewers used quantified indicators of previous successes, such as publication numbers, impact factors and citation metrics, to determine whether applicants had conducted previous research projects successfully. They used these indicators as ‘quality proxies’ to help identify the best applicants in their assigned application set ‘without dedicated knowledge of the research topics’ (ERC3, SH). For instance, they used publication numbers and venues as ‘checkpoints’ of ‘pass and non-pass’ (ERC3, SH) to decide whether applicants conformed to the ‘requirements’ of the ERC (ERC9, LS) and qualified for further evaluation, or whether they should be eliminated directly. Panel reviewers assumed that the quality of publications could be approximated by assessing the journals in which they were published and other related metrics, such as citation numbers, instead of reading them. Through these practices, panel reviewers deferred their own judgment to the judgments of peers who had previously reviewed or referenced these articles. A reliance on metrics beyond the publication venue was especially prominent among reviewers in the natural sciences (LS and PE), who used the Hirsch index (h-index) to judge the importance of the publications listed in the CV, drawing on citation counts to gauge the applicant’s impact on their community.
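To make the role of such citation metrics concrete, the sketch below computes an h-index from a list of citation counts, following Hirsch’s standard definition (the largest h such that h publications each have at least h citations). The citation counts are hypothetical and the snippet is purely illustrative; it is not a tool used by the ERC or by our interviewees.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for two applicants' publication lists
print(h_index([120, 48, 33, 17, 9, 4, 1]))  # -> 5
print(h_index([15, 12, 3, 2, 1]))           # -> 3
```

Reduced to a single number in this way, very different publication records become directly comparable, which is precisely what makes such metrics attractive as delegation devices and what the contextualizing practices described below push back against.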

Some panel reviewers also focused on other CV elements beyond publications, such as conferences chaired and organized, funding obtained, involvement in research committees or international research stays, to gauge the quality of candidates. Combined, these indicators created a standardized way of transforming unique biographies into comparable entities (Kaltenbrunner and de Rijcke 2019). One panel reviewer described his particularly systematic process of rapidly assessing CVs using a set of standardized grades:

I built an Excel sheet to have an objective measure for the CV. To save time, I had different criteria: international publications, level of publications, number of publications, citations, project managements, supervision. I put grades: 2 was very good, 1 was medium—and 0 was bad. Then I added all these criteria. I also had a column with the age of the person. When I looked at my file, some people were 37, and others were 52. I thought that it might not be fair to appreciate them with the same criteria (ERC8, SH).
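The kind of scoring sheet this reviewer describes can be sketched in a few lines of code. The criterion names, grade values and example numbers below are illustrative assumptions rather than a reconstruction of his actual spreadsheet.

```python
# A minimal sketch of an ad hoc CV scoring sheet of the kind described above.
# Criteria and example grades are assumptions for illustration only.

CRITERIA = ["international_publications", "publication_level",
            "publication_number", "citations",
            "project_management", "supervision"]

def cv_score(grades: dict[str, int]) -> int:
    """Sum per-criterion grades, where 2 = very good, 1 = medium, 0 = bad."""
    return sum(grades[c] for c in CRITERIA)

applicant = {
    "age": 37,  # kept alongside the score to contextualize it, not added to it
    "grades": {"international_publications": 2, "publication_level": 2,
               "publication_number": 1, "citations": 1,
               "project_management": 0, "supervision": 2},
}

print(cv_score(applicant["grades"]))  # -> 8 out of a possible 12
```

Keeping the applicant’s age alongside the total, rather than folding it into the score, mirrors the reviewer’s concern that the same criteria may not be fair across career stages.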

In the context of interdisciplinary panels evaluating applicants with different research experiences, some reviewers emphasized, however, the need to consider the diversity of scientific careers and disciplinary publication practices. According to them, delegation devices were not applied in a mechanical fashion, but were adapted to specific situations. As exemplified by the interviewee quoted above (ERC8, SH), these panel reviewers tried to contextualize publications according to the age and field of the applicant.

In 2013, the ERC began separating Starting and Consolidator Grants to facilitate the evaluation of applicants with differing years of experience. Despite this division, applicants’ publication metrics could still differ strongly within the 5-year window covered by each grant, and some of our interviewees took into account another metric: the research age of applicants. For instance, in the Starting Grant, panel reviewers might consider younger applicants with only a few publications but eliminate older applicants who had not published extensively. Similarly, other panel reviewers reported contextualizing publication metrics according to the applicant’s field of study. Some panels took differing author-order conventions into account, such as the alphabetical ordering used in informatics. Other panels differentiated citation metrics according to the applicant’s field of study, noting, for example, that spectroscopy researchers had fewer citations than researchers in materials science applying to the same panel (ERC21, PE). These reviewers also recognized that the ability to contextualize publication metrics was often the outcome of discussions with other panel reviewers, demonstrating that delegation devices could produce knowledge about the use of different metrics within the panel setting. Thus, the first cut that reviewers made on their own remained preliminary.

Furthermore, some panel reviewers also discussed situations where different delegation devices were used to save applicants from the first cut when their academic biographies did not correspond to the panel’s established norms for publication output and other metrics. For example, some considered applicants with fewer publications or publications in less prestigious journals if the applicants came from Eastern or Southern European countries or small universities, which might have limited access to research infrastructures and international research networks. Other panel reviewers valued applicants’ experience in teaching, administrative responsibilities or other social achievements, stating that they valued researchers who were not ‘egoists’ interested only in publications (ERC14, PE). In these two examples, the delegation devices were used to save applicants whose proposals contained highly original ideas but whose biographies did not appear competitive through the lens of standard delegation devices. Yet, their use was mostly confined to panel reviewers who were quickly able to identify the scientific contribution of the proposal, i.e., who were experts in the field and could skillfully use contribution devices (Section 4.4). It is further important to note that a few panel reviewers in our sample (n = 3) emphasized that they explicitly refrained from reviewing the CVs first, although they understood this to be the standard practice in ERC panels, because they wanted to read the proposal without being influenced by the perceived competitiveness of the CV.

4.2 Calibration devices: Adjusting judgments across a pool of proposals

Whether panel reviewers had first assessed the CVs or not, when reading the proposals, they engaged in a practice of calibrating their judgment in relation to the pool of applications they had received. Here, panel reviewers reported the use of another evaluation device, which we call a calibration device, to adjust their judgment of a specific proposal by assessing the quality of the whole set of proposals being evaluated and ranking individual proposals within the sample. In the interviews, these panel reviewers themselves employed the term ‘calibration’ to describe their practice of assessing proposals relatively by comparing them to each other instead of evaluating them individually. We identify three types of calibration devices used by panel reviewers: adapting evaluation time to the quality of proposals, adjusting standards of judgments to the set of proposals and using grading to limit the number of proposals that are discussed in detail in the panel meetings.

First, some panel reviewers calibrated their judgments by adjusting their review time according to the proposal’s quality. They stated that they spent more time on proposals they found compelling. They called this a ‘comparative evaluation’ (ERC2, LS). If some basic elements were not present (clear research question, proper state of the art or sufficient number of pages), they considered these proposals as ‘non-competitive’ and eliminated them directly (ERC1, SH). For instance, one panel reviewer (ERC1, SH) explained that she did not spend more than 30 minutes on proposals that did not strike her as high quality or were missing some basic elements, but she might take more than 2 hours for proposals she found compelling.

Second, other panel reviewers explained that they quickly read the whole set of proposals and identified which proposals were better or worse than the others before further assessing the proposals individually. In that case, panel reviewers did not assess any specific aspect of each proposal but rather looked to establish a preliminary, almost embodied, impression of the whole set of proposals. Ultimately, panel reviewers need to rank proposals and only the top proposals would have the chance to make it to step 2. Each new proposal that was evaluated could change the previous ranking—both during their individual evaluations and in the group deliberations—and thereby modify the outcome of the selection process, demanding that reviewers take into account the consequences of new evaluations for former decisions. Even though this calibration device was used by a diversity of reviewers without clear disciplinary distinction, it tended to be especially applied by younger reviewers who had less experience in panel reviewing or who participated in the ERC review process for the first time. ‘Lacking confidence’ to evaluate proposals from various research fields, these reviewers attempted to get a ‘feel for the sample’ and ‘calibrate’ themselves accordingly (ERC4, SH). In that case, they adjusted their standards of judgment by conducting the evaluation progressively, going back to proposals they had already evaluated and adapting the previous grades to make them fit with the pool of proposals.

After I start reading more, I take notes in a provisional document. I read, adjust, and go back, not to make evaluation relativist but to recognize that I’m starting to see perhaps a better way of looking across candidates. It helps me score in a way that reflects the pool of proposals rather than just one on its own. (ERC4, SH)

However, divergence existed concerning the role of calibration devices to ensure disciplinary diversity. In cases where two proposals came from the same sub-discipline or were written on the same topic, reviewers disagreed on whether proposals should be evaluated individually, for their own qualities, or relatively, in comparison to the set of proposals. While some reviewers reported selecting only the ‘best’ projects regardless of topic or field, other reviewers asked for more representation of the fields covered by the panel, especially when they felt that their fields of research were not correctly evaluated. These reviewers used calibration devices to diversify the evaluation results by considering the representation of fields of research. However, other reviewers rejected representativity, such as in the earth science panel, where reviewers seemed to agree to select only the ‘best’ proposals regardless of whether that might lead to some fields receiving funding more regularly than others.

Third, panel reviewers used calibration devices to rank proposals and discuss only some of them. After the individual evaluation of the proposals, the secretariat of the ERC averages the individual grades given by the three panel reviewers and divides the proposals into specific groups (A, B and C) ( ERC 2019). In the panel meeting, reviewers can debate and change their grades. Most reviewers reported that grades were used to calibrate their judgments rapidly and discuss only some of the proposals in detail in the panel meeting. They explained that proposals with average high grades (A) from the assigned reviewers or with average low grades (C) were usually not discussed and were either selected or eliminated directly, while proposals that had received medium average grades (B)—either through receiving multiple Bs or As and Cs that averaged to a B—were further discussed.
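A minimal sketch of this averaging-and-grouping step is given below. The numeric coding of the letter grades and the cut-off values are assumptions made for illustration; the interviews only indicate that the ERC secretariat averages the three panel reviewers’ grades and that the result places a proposal in group A, B or C.

```python
from statistics import mean

# Illustrative sketch of the averaging-and-grouping step described above.
# The grade coding and thresholds are assumptions, not the ERC's actual rules.
GRADE_VALUES = {"A": 3, "B": 2, "C": 1}

def group_proposal(grades: list[str]) -> str:
    """Average three reviewers' letter grades and return the proposal's group."""
    avg = mean(GRADE_VALUES[g] for g in grades)
    if avg >= 2.5:
        return "A"   # usually selected without detailed discussion
    if avg <= 1.5:
        return "C"   # usually eliminated without detailed discussion
    return "B"       # discussed in detail at the panel meeting

print(group_proposal(["A", "A", "B"]))  # -> 'A'
print(group_proposal(["A", "C", "B"]))  # an A and a C average out to a 'B'
print(group_proposal(["C", "C", "B"]))  # -> 'C'
```

The middle example illustrates the situation reviewers described in which divergent grades average out to a B, pushing a contested proposal into the pool that is discussed in detail.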

Panel reviewers were aware that the lack of discussion of some proposals might eliminate promising proposals. Still, they considered that discussing all proposals was impossible due to the high number of proposals and the limited amount of time. As a panel chair explains in the quote below, using grades to make a cut between proposals during the meeting allows reviewers to complete their difficult task on time. In this view, the goal of the step 1 evaluation is primarily to ensure that no unsuitable proposals pass to step 2. The interjection ‘pfft’ in the quote below designates the action of making this cut.

A colleague who has never done that before was terrified that we would be unfair to someone every time we had to say: ‘okay, now all the ones that we want to look at are above the line, and the ones below the line *pfft*’. He had a really hard time taking that step. But I used a lot of time with him and tried to convince him that we will never be able to guarantee that there aren’t people below the line that deserve to come above the line. The important thing is that we have to ensure that there is nobody who gets an award who didn’t deserve an award. (ERC17, PE)

In a limited number of cases, panel reviewers reported discussing a few proposals located below the line before eliminating them. After conducting a statistical analysis to understand the grading practices of the reviewers in their panel, a panel reviewer and the panel chair quoted above (ERC17) found that the nationality of reviewers had an effect on the grades they gave, with reviewers of some nationalities grading more harshly than others. Other reviewers made similar statements concerning proposals graded by reviewers specialized in the topic (see Section 4.4). This would lead to some proposals being graded more strictly than others and hence, unjustly, not being discussed in detail in the panel meeting. In response, the panel chair mentioned above (ERC17) asked the ERC secretariat which proposals had received both very high and very low grades. He proposed discussing some ‘below the line’ proposals in more detail to create an opportunity to save them from the cut. Offering this possibility could give a chance to promising proposals that may not have been assessed correctly while still limiting the number of proposals to be discussed in detail collectively. However, as this example shows, whether such practices exist within panels depends on individual initiatives, in this case by a panel chair.

Like the delegation devices used to evaluate CVs, calibration devices are mobilized to quickly eliminate some proposals when the number of proposals is very high. The conjoined use of delegation and calibration devices is intended to ensure that, ultimately, only the best CVs and projects will be evaluated in more detail (individually and collectively) with the following evaluation devices.

4.3 Articulation devices: Coordinating the evaluation of the research project and the CV

After assessing the individual CVs and the pool of proposals, most panel reviewers reported evaluating the remaining applications by relating different elements of the project and the applicant to each other. These reviewers assessed how various elements fitted together to determine whether proposals were ‘feasible’ and ‘doable’. Drawing on Fujimura (1987), we propose to name this device an articulation device, as it hinges on combining different elements to assess the research projects’ coherence and their compatibility with the applicants’ profiles. As Fujimura (1987) proposes, articulation concerns the coordination of miscellaneous elements relating conjointly to the experiment, the laboratory and the social world. We present two types of articulation devices used to evaluate the internal coherence of the proposal and the compatibility of the project and the applicant.

In reading a proposal, many panel reviewers explained that they analyzed the articulation between different elements to see if the proposal was realistic and could be implemented successfully. Most of them reported examining how the proposed literature and methods could be used to answer the research question. Panel reviewers in the social sciences tended to emphasize the feasibility of accessing the field or data, while reviewers in the natural sciences focused more on the institutions where the research would be conducted, especially on the available instruments and staff. In both cases, panel reviewers explained that they assessed the general organization of the project (time frame, schedule of activities, combination of work packages, and scientific collaborators). In research fields where project funding was usually smaller than the average budget of the ERC projects (1.5 million euros), such as in the social sciences and mathematics, they verified that the project’s scope justified the amount of money requested.

Most of the panel reviewers we interviewed also articulated the research projects with the applicants’ backgrounds and skills. To ensure that applicants could conduct their projects, these panel reviewers used the CV to assess how the projects built upon the applicants’ previous work. As one panel reviewer explained: ‘The project needs to be good, but it also needs to be done by the right person’ (ERC20, PE). In the Starting Grant panels, many panel reviewers examined former publications to assess the applicants’ capacity to be independent and to ensure that the projects had been written independently by the applicants. However, panel reviewers did not agree on how to assess independence in publications. Some panel reviewers in the social sciences emphasized the importance of publications written without PhD and/or postdoc supervisors, while panel reviewers in the natural sciences tended to prefer research conducted on subjects different from those of the supervisors. For both Starting and Consolidator Grants, some panel reviewers also looked for preliminary results to avoid selecting applicants who simply ‘jumped on a train’ and followed a research trend instead of developing their own ideas (ERC5, LS).

When the articulation of different elements to each other was unusual, or the proposal’s feasibility was uncertain, many panel reviewers used the notion of risk, specified in the reviewer guidelines (ERC 2019; see also Luukkonen 2012). These panel reviewers assessed risk by articulating the evaluation of applicants’ potential with an evaluation of the potential of the project. For instance, they examined the proposed ideas and methods and compared them with the research plan and the applicant’s previous achievements. By analyzing risk, these panel reviewers implemented a strategy to minimize potential failures of the research projects. However, following the motto ‘high-risk/high-gain’ (ERC 2019), these panel reviewers did not always view risky research as something negative that could lead to failure, but as a kind of research that could only be carried out by the best researchers. According to them, a loose articulation between different elements could lead to new results and more important advances in research fields. For instance, if further project stages were based on the results of uncertain experiments, or if a method used an approach from a disparate research field, panel reviewers might view the project as risky but valuable.

There was no consensus between panel reviewers about how strong or loose the articulation between different elements should be; preferences were largely individual. Some panel reviewers preferred well-articulated and safe projects with a proposed plan B in case of failure, while others supported more risky proposals with a more original combination of different elements. In the case of the former, when reviewers emphasized the need for a strong articulation between different elements of the project as well as between the project and applicant’s profile, articulation devices could come to be in tension with the contribution devices introduced in the next section.

4.4 Contribution devices: Adapting the evaluation of the proposals to their consequences for science and society

The fourth evaluation device used for proposal selection focuses on the proposal’s possible contributions to science and society. In the instructions provided by the ERC, relevant contributions are framed as ‘ground-breaking’ discoveries, which are expected to be published in top scientific journals and to lead to the development of new research fields (Luukkonen 2012; Laudel and Gläser 2014). Although ‘ground-breaking’ is a specific evaluation criterion defined in the ERC guidelines (ERC 2019), which designates fundamental novelty in opposition to ‘incremental research’ (ERC13, SH), reviewers also applied simpler criteria such as ‘novelty’, ‘originality’ and ‘social contribution’. Inspired by Collins and Evans’ (2007) concept of contributory expertise, we have named these evaluation devices contribution devices, as they assess the value of scientific ideas by comparing them to existing knowledge and by analyzing potential contributions to science and society. After explaining that contribution devices create tensions between panel reviewers concerning the level of knowledge needed to use them, we present different contribution devices that draw on the evaluation of novel ideas, method development and social contribution.

To assess how proposed projects might contribute to existing research, most interviewees reported that they needed to have pre-existing knowledge of the subject matter of a proposal. Yet, panel reviewers often received proposals that were not in their research field because ERC panels are interdisciplinary, and each proposal must be assessed by three panel members in the first step of evaluations. Although our interviewees considered contribution devices easy and quick to use when they had some knowledge of a proposal’s topic, many found them hard to use if they did not possess specialist knowledge. Some recounted that they would try to read the literature referenced in the proposals to gain some knowledge of the topic and enable themselves to pass appropriate judgments. Others recounted situations in which they had refused to assess proposals because those proposals were too disconnected from their expertise. However, panel reviewers reported that, overall, they found ways to assess the contribution of proposals even if they were not experts in the field.

At times, contribution devices and the question of how much expert knowledge is required to invoke them created tensions between reviewers during panel meetings. Panel reviewers with high levels of field-specific expertise could use these devices to convince their colleagues during panel discussions, while reviewers with less knowledge about the topic felt they had to accept the judgments of their peers, even if they disagreed. For instance, a reviewer in STS who supported a proposal in anthropology had to accept their anthropology colleagues’ opinion that the proposal was not ‘cutting edge’ (ERC7, SH), and the anthropologists’ assessment swayed the overall evaluation. Based on similar experiences, some interviewees argued that, ultimately, panel reviewers with field-specific expertise should not be the only ones to judge contributions. According to them, panel reviewers whose own work was very close to the proposal would sometimes lack the necessary distance to judge contributions. While some situations were (dis-)qualified as plain ‘political engagement’ (ERC6, SH) aimed at shaping the field, or as ‘over-criticism’ (ERC19, LS) based on how the reviewers themselves would approach the problem, panel reviewers argued that there was also a more general value in allowing panel members with less knowledge about the research topics to judge proposals’ contributions to science and society. They considered that proposals could then be approached more openly, without defending only one vision of the research. Such statements contradict the customary rules of ‘deferring to expertise’ and ‘respecting disciplinary sovereignty’ presented by Lamont (2009) and might be related to the extremely high levels of reputation and esteem that all reviewers on ERC panels hold.

Although all interviewees reported using a range of contribution devices, they mobilized them differently according to their disciplines. In the natural sciences, panel reviewers often looked for research that had never been conducted before, framing contributions as unique discoveries in line with the criterion of ground-breaking science. Some of them checked existing publications in internet databases (ERC2, LS) or verified that the topic was not already being researched (ERC16, PE). Other panel reviewers disagreed on the methods needed to make contributions, especially concerning the design and use of novel technological instruments and the need to produce new data. For instance, some reviewers in the earth sciences considered that novel contributions could only be made with new technologies such as satellites (ERC18, PE), while others from similar fields argued that available data could be used to conduct novel research and that ‘there is no need to produce new data all the time’ (ERC14, PE). In the social sciences, our interviewees placed a higher value on the contexts in which research was conducted and argued that significant contributions could be made by introducing only some novel elements to a project, such as conducting research in a new context but with existing methods. These panel reviewers considered that a modification of previous studies, or the testing of new concepts, could be a sufficient contribution without requiring the development of a completely new research approach (ERC4, SH). Consequently, most of these panel reviewers criticized the notion of ground-breaking science, which, they argued, does not reflect the diversity of possible important contributions in the social sciences but is rather based on natural and life sciences models of what constitutes novelty and originality (cf. König 2017).

Many of the panel reviewers we interviewed also evaluated how research projects could bring positive changes to society, even if this is, arguably, not part of the ERC evaluation criteria. In our material, statements concerning contributions to society were mostly made by panel reviewers in research fields directly concerned with societal issues, such as earth sciences (climate change), ecology (biodiversity crisis), molecular biology and physiology (health and medicine) and social studies of science (science in society). Yet, these panel reviewers also agreed that contributions to society were less important than contributions to science itself. They mainly discussed these additional benefits when there were disagreements about proposals or when the benefits were directly associated with research contributions. For instance, a panel reviewer (ERC2, LS) opted to support an applicant who did not publish in high-impact journals but who had carried out a difficult experiment whose results could ultimately provide the basis for novel medical applications.

In sum, contribution devices seem to have a somewhat precarious role in making the cut during the first step of evaluation. First, not all reviewers can appropriately use contribution devices due to a lack of field-specific expertise. Second, the optimal epistemic distance to a proposal is sometimes disputed when using contribution devices to form one’s judgment. Third, reviewers recounted that particularly original or risky proposals were usually challenged by at least one reviewer and were often ultimately rejected due to the resulting mediocre average score and the pressure to select proposals with high scores from multiple reviewers. Fourth, even if high scientific contributions are detected in a proposal, it can only pass if certain CV criteria are also fulfilled, such as a continuous high-impact publication track record. While some original applications that do not fulfill these requirements can be salvaged by particularly engaged panel reviewers, this is not the norm. Thus, while one might assume that contribution devices dominate selection in the first round of evaluations, our empirical study shows that their status is, in fact, more precarious and unclear.

5. Discussion and conclusions

Our study complements previous work on panel reviews by describing not only the influence of evaluation criteria (Luukkonen 2012; van Arensbergen et al. 2014), disciplinary traditions (Lamont 2009) and operating rules (Huutoniemi 2012), but also the role of hypercompetition for external funding that has emerged in response to changing research conditions (Whitley and Gläser 2007; Fochler, Felt and Müller 2016). This hypercompetition is particularly palpable in the first step of ERC evaluations, where a large number of proposals must be reduced to a smaller sample that will then pass on to step 2. Our analysis shows that, constrained by time and often lacking the expert knowledge needed to select only the best proposals, ERC panel members employ a range of evaluation devices to actively produce knowledge about applicants and projects that helps guide their assessment and, ultimately, allows them to compare and rank a range of singular applications. Importantly, our work does not aim to find fault with the panel reviewers and their practices. Rather, we are interested in better understanding, naming and constructively discussing these practices and their implications—practices that panel reviewers develop and use to accomplish the complex task of peer review under difficult conditions, characterized by high competition and time constraints on both individual and collective levels (cf. Müller 2020).

We presented four types of evaluation devices used by panel reviewers to navigate the first step of ERC evaluations: (1) delegation devices, with which reviewers delegate their own judgments to candidates’ past achievements and the related evaluations; (2) calibration devices, with which they adjust the assessment of individual proposals to the quality of a given set of proposals; (3) articulation devices, with which they combine multiple elements to assess the coherence of projects and the compatibility of projects and applicants; and (4) contribution devices, with which they consider the impact of the proposed research on science and society. These devices assemble a set of practices (grading, ranking, comparing and standardizing), evaluation criteria (achievements, risks and novelty) and objects (CVs, proposals, scientific literature and laboratories). By using these devices, reviewers compare applicants to idealized scientific career trajectories (Section 4.1), proposals against other proposals in a given pool of applications (Section 4.2), specific elements of the proposal against each other and against the applicants’ capacities (Section 4.3) and projects against research that has already been or should be conducted (Section 4.4).

Our analysis indicates that ERC panel members employ these four evaluation devices for different purposes and that their complementary use enables a synthetic assessment of the various qualities of proposals and applicants. While delegation and calibration devices are used to eliminate a number of applications and to discuss only the most important ones, articulation and contribution devices ensure that projects are feasible, original and significant. Delegation and calibration devices order applicants and proposals by drawing on quantified evaluations based on CV metrics or on grades and ranking positions, thus objectifying the selection processes by standardizing applicants (number of publications) and proposals (list of fundamental elements). Articulation and contribution devices, on the other hand, allow for individual judgments that are not based on standardized or quantified criteria and therefore generate more disagreement between reviewers about how they should be used. From our analysis of this use of evaluation devices, we draw three conclusions.

First, our analysis of the first step of ERC review processes reveals a tendency to use the four evaluation devices in a sequential order. This specific use is incentivized by the high levels of competition and workload during the first step, and by the need to make a cut that successfully distinguishes proposals that should be retained from those that should not. Usually, panel reviewers start their evaluation with delegation and calibration devices because they are more generic and quicker to use, unlike articulation and contribution devices, which are more personal, less consensual and generally used at a more advanced stage of the evaluation process. Delegation and calibration devices are primarily used to determine which proposals are definitely not competitive because they lack certain attributes that are considered essential. The eliminating role of these first two devices, and their position within the sequence of all four devices, may explain, to a certain degree, why peer review tends to become conservative under highly competitive conditions (cf. Langfeldt 2001).

By using evaluation devices to make a cut among the proposals, panel reviewers seem less concerned about mistakenly eliminating applications with valuable qualities (e.g., original contributions) and more concerned about ensuring that weak applications are not selected for the second step of evaluation. To use the words of an ERC panel chair whom we have quoted before: ‘We will never be able to guarantee that there aren’t people below the line that deserve to come above the line. The important thing is that we have to ensure that there is nobody who gets an award who didn’t deserve an award’. We use the term evaluative pragmatism to capture this specific approach to peer review so eloquently outlined in this quote. This idiom seeks to describe how reviewers adjust their practices to the pragmatic needs of specific evaluative settings, in this case, to the need to eliminate a high number of proposals quickly, which reduces, for example, the social acceptance of long, drawn-out debates. The panel reviewers know that the practices we subsume under the idiom of evaluative pragmatism have drawbacks and that some excellent proposals might go undetected because of the way they currently employ evaluation devices. However, they believe that this approach at least ensures that the quality of grant recipients and their proposals is generally high.

Second, our article offers two new vantage points from which to discuss the role of expertise in evaluation practices. Firstly, we use the term reviewing expertise to describe how a specific type of expertise is created by navigating the review processes and developing the skills required to utilize different evaluation devices. While panel reviewers have varying levels of knowledge and competency to start with (concerning, e.g., methods, familiarity with the literature in different research fields or prior experience in peer review), they also create new knowledge by using specific evaluation devices in the distinct context of ERC step 1 peer review and its affordances. Reviewing expertise, in this sense, is context-specific; it is an epistemic practice enacted by reviewers during distinct evaluation activities and encompasses both knowledge brought into and knowledge produced by the evaluation practices.

Furthermore, it follows that the expertise of panel reviewers cannot be neatly divided into the categories of interactional and contributory expertise proposed by Collins and Evans (2007). In the case of the calibration device (Section 4.2), reviewers aim to develop an almost embodied skill of classifying proposals quickly in relation to each other and along a ranking scale, but they can also opt to evaluate proposals independently where they see fit. In the case of the contribution device (Section 4.4), panel reviewers debate the right level of knowledge needed to assess contributions, reflecting a more complex picture of what constitutes a contribution, how to evaluate it and what role field-specific expertise should play in this context. Consequently, reviewing expertise can neither be reduced to the ability to use quantified indicators (cf. van Arensbergen et al. 2014) nor to knowledge about specific research fields (Lamont 2009; Huutoniemi 2012). Instead, reviewing expertise is a distinct epistemic practice that unfolds within the specific settings of a peer review process. While reviewers can bring some experiences and forms of expertise from one review setting to another, reviewing expertise is never fully transferable from one evaluation context to the next and must always be adapted and refigured.

Finally, we discuss how the use of specific evaluation devices in the first step of ERC peer review might influence what kinds of research and researchers are selected for funding. We have shown that the dominance of non-experts in proposal assessment makes the status of the contribution devices precarious. Panel reviewers who are quite close to the research field of the respective candidate and who hold significant renown have been known to ‘save’ certain applicants from elimination, even if other devices, such as the delegation devices focused on CVs, would have disqualified them. Our analysis suggests that current practices favor proposals that are easily understood across various fields and knowledge levels. Additionally, CVs exhibiting characteristics that are increasingly accepted as markers of excellence across fields, such as a continuous and high publication output and a strong record in third-party funding, are favored.

In settings of high competition, panel reviewers must set personal preferences aside because they can usually only champion applications that will be backed by others, and advocating for outliers too often could damage their credibility. This might be another reason why peer review in highly competitive settings exhibits a conservative bias. This bias can easily lead to the exclusion of researchers who cannot perform on these registers due to particular institutional constraints, research fields, subspecialties, genders, national research systems or teaching and service duties. A question to ask in this context is how reviewing processes and outcomes would change if external expert reviews, i.e., reviews from subject-specific reviewers, were already commissioned in step one of the ERC review process. This could possibly shift the balance between the different evaluation devices in favor of contribution devices.

Some might argue, and rightfully so, that this would increase the number of hours spent on peer review, as more people would be involved in assessing a higher number of proposals. Both the ERC and the ERC panel members whose experiences and practices we have discussed in this article are highly committed to high-quality review. Yet, they are asked to perform very difficult tasks for which they, at times, lack fundamental resources, such as sufficient time in their busy schedules (see endnote 2) and field-specific expertise. Currently, the solution appears to be to coach ERC applicants to write proposals in a standardized format and language that can be read in haste by non-experts (see James and Müller, in prep., for an analysis of advice and coaching offered to ERC applicants). However, this response might diminish the value of the contribution device in the review process, as proposals increasingly appear as if they can be judged by non-experts and experts alike. We do not propose that the current approach necessarily negatively impacts the quality of the proposals selected. In fact, it might even favor some types of high-quality work that would otherwise not have been selected if contribution devices were at the forefront of the review process. However, we propose that there is a need for an open and empirically grounded discussion of when and how field-specific expertise should be involved in peer review processes, and for analysis that seeks to understand how this involvement influences the selection of research and researchers for prestigious funding such as ERC grants.

Ultimately, we understand this explorative study as a call for more empirical research on peer review processes. We firmly believe that researchers, reviewers and research funders have a lot to gain by opening the closed doors of peer review and by facilitating empirical investigations of these essential, often difficult and under-researched practices of academic life.

Funding

Research for this article was supported by the Gender & Diversity Incentive Fund of the Technical University of Munich. The research was conducted in conversation with, and intellectually supported by, the Deutsche Forschungsgemeinschaft Research Unit 2448 ‘Practicing Evidence—Evidencing Practice’.

Conflict of interest statement. None declared.

Acknowledgements

We are grateful to our informants, who gave us their time and discussed their evaluation practices with us. We want to thank Julian Hamann, Frerk Blome and Anna Kosmützky, as well as the participants of the workshop ‘Devices of Evaluation’, for their feedback on preliminary versions of this article. We acknowledge and are grateful for the important work of Kay Felder, who conducted many of the interviews on which this article is based. We further want to thank the anonymous reviewers and the editors for their thoughtful remarks and contributions to improving the manuscript.

Endnotes

1. At the time of the empirical study, the ERC still had 25 panels. Panel numbers and configurations are regularly reviewed and occasionally revised by the ERC.

2. For a discussion of the need to dedicate more time explicitly to peer review in the academic system and to offer reviewers temporal compensation, see Müller (2020).