Self-citation patterns among researchers
What's normal? What's unusual?
The landscape of academic publishing is complex, and self-citation is one of its more intriguing aspects. A degree of self-citation is expected as researchers build upon their existing body of work, but excessive self-referencing can obscure the true impact of research, making the work seem more broadly important than it is. On the other hand, a complete absence of self-referencing in a researcher’s publication record can also raise concerns, particularly when an established body of work exists. Such an absence may be a symptom of paper mill involvement.

Understanding the prevalence and patterns of self-citation across different scientific disciplines is important for assessing research health, informing funding strategies, and guiding academic evaluation. This article presents an analysis of self-citation behaviors among active researchers, shedding light on how these dynamics vary across diverse Fields of Research (FoRs) and researcher publication profiles. By examining various self-citation thresholds, we can uncover disciplinary norms and identify areas where self-citation patterns might warrant further investigation. (See the methodology at the end of the post for more details.)
Self-Citation Distribution
Our analysis includes 5.4 million eligible researchers, and the foundational data reveal that the majority of active researchers exhibit high rates of self-citation.
The table illustrates the distribution of these researchers across five defined thresholds based on the percentage of their publications that include self-citations.
The single largest group, representing 63.8% of the total 5.4 million, falls into the >75% threshold. This indicates that for almost two-thirds of active researchers, self-citation is present in over three-quarters of their eligible publications. Within this defined context, the practice appears widespread and normal rather than a warning sign.
The population drops off quickly at the lower thresholds, confirming that high rates of self-citation are typical: the 25%-50% range accounts for 9.3% of researchers, while the <25% range accounts for just 6%.
The 0% self-citation threshold accounts for the smallest fraction of the population at only 2%. While zero self-citation might initially seem ideal, a value this low can also be problematic. In fields where research is highly cumulative and specialized, a 0% self-citation rate may indicate a few possibilities. It could be that the researcher is new and has no work upon which to build. Not self-citing could also signal a missed opportunity to fully contextualize new findings within an established research thread. More problematically, it could signal that the paper was not written by the author: it could have been purchased or plagiarized.
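The five thresholds described above can be expressed as a small bucketing function. This is only a sketch: the article names the 0%, <25%, 25%-50% and >75% groups, so the "50%-75%" label and the handling of values that fall exactly on 25%, 50% or 75% are assumptions made here for illustration, and the function and parameter names are hypothetical.

```python
def threshold_bucket(papers_with_self_citation: int, eligible_papers: int) -> str:
    """Assign a researcher to one of five self-citation thresholds.

    The share is the percentage of eligible publications that contain
    at least one self-citation.
    """
    if eligible_papers <= 0:
        raise ValueError("eligible_papers must be positive")

    share = 100.0 * papers_with_self_citation / eligible_papers

    if share == 0:
        return "0%"
    if share < 25:
        return "<25%"
    # Assumption: exact boundary values (50, 75) go to the lower bucket.
    if share <= 50:
        return "25%-50%"
    if share <= 75:
        return "50%-75%"  # label assumed; not named in the article
    return ">75%"
```

For example, a researcher with self-citations in 4 of 5 eligible publications has a share of 80% and lands in the >75% group.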
Field of Research
Another way to look at self-citations is through Field of Research (FoR), which provides a granular view of how self-citation thresholds are distributed across disciplines. Here, each researcher is categorized by their primary FoR, derived from the most common FoR associated with their body of work. This categorization reveals a clear separation between humanities/social sciences and the physical/life sciences.
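As a rough illustration of how a primary FoR could be derived from "the most common FoR associated with their body of work", the sketch below counts FoR labels across a researcher's publications. The input shape (one list of FoR names per publication) and the alphabetical tie-break are assumptions; the article does not say how ties are resolved.

```python
from collections import Counter
from typing import List, Optional


def primary_for(publication_fors: List[List[str]]) -> Optional[str]:
    """Return the FoR that appears most often across a researcher's publications.

    publication_fors holds one list of FoR names per publication.
    Returns None when no FoR is recorded at all.
    """
    counts = Counter(name for fors in publication_fors for name in fors)
    if not counts:
        return None
    top = max(counts.values())
    # Assumption: ties are broken alphabetically to keep the result deterministic.
    return min(name for name, n in counts.items() if n == top)
```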
The highest prevalence of researchers with no self-citations in their eligible publications is found in Creative Arts and Writing; Law and Legal Studies; and Commerce, Management, Tourism and Services, each with values between 5% and 6%. This suggests a preference for citing external work.
On the other hand, the >75% threshold reveals the fields where self-citation is an overwhelmingly dominant practice. The share of researchers who include self-citations in over three-quarters of their publications is 92% in Physical Sciences, 84% in both Chemical Sciences and Biological Sciences, and 83% in Earth Sciences. These figures demonstrate an extreme concentration of self-citing behavior in the STEM fields and reinforce the finding that in specialized, methodology-heavy fields, self-citation is the expected, defining publication pattern.
The detailed FoR analysis confirms that disciplinary norms govern self-citation behavior, and deviations from these norms, at either extreme, merit closer inspection.
While low self-citation is the norm for fields like Law and Arts, its near-absence in other disciplines can signal potential issues. In STEM fields, a lack of self-citation is unusual. For these methodology-heavy, specialized fields, a 0% self-citation rate may be problematic and could point to paper mill activity.
Academic Age
Analyzing the self-citation patterns across different academic age ranges, from early-career researchers to highly experienced scholars, reveals which groups are driving the extreme behaviors.
As researchers gain more experience, their academic output grows, leading to a natural increase in the percentage of those who frequently cite their own work.
Early-career researchers (with an academic age below 1) show a surprisingly high share of researchers with self-citations in over three-quarters of their publications: 49%. A rate of over 75% suggests that their very limited output is heavily self-referential, potentially reflecting citations to their co-authors' earlier work, which count as self-citations under the paper-level definition used here.
The >20 years bracket shows that 1.2% of these highly experienced researchers have no self-citations. For a researcher with a career spanning over two decades and a minimum of five eligible publications, a 0% self-citation rate is anomalous and worth investigating.
Neither 0% nor 100% self-citation is inherently ‘good’ or ‘bad’; rather, the issue lies in deviation from disciplinary expectations. A near-zero self-citation rate in STEM fields suggests a failure to build upon established foundational work, and the same applies to researchers with more than two decades of experience.
What this all reinforces is that research is nuanced, yet there are ways to understand normal and unusual patterns.
Conclusions
The analysis of self-citation patterns across 5.4 million active researchers highlights that self-citation behavior is shaped by disciplinary norms and career stage. The overall distribution confirms that high self-citation (>75% of publications containing a self-citation) is the dominant behavior for the majority of active scholars.
High self-citation is the standard in the Physical, Chemical, and Biological Sciences, reflecting highly specialized, cumulative research that relies heavily on citing one’s own complex methodologies. Low self-citation is the norm in the Humanities and Law. A complete absence of self-citation (0%) is rare (2% of researchers), but it could be an indication of paper mill activity.
About the Author
Mihaela-Alina Coste is a Senior Data Analyst at Digital Science. Her work involves creating embedded Looker applications and conducting exploratory data analysis on various topics, such as research integrity. She holds a PhD in Communication Sciences.
Disclaimer
This analysis was conducted using data sourced from the Dimensions platform, a product of Digital Science. The author is employed by Digital Science and currently works on the Dimensions Author Check application, which assesses research integrity and uses metrics like those discussed in this article to identify and flag potential anomalies, such as unusual self-citation rates.
Methodology
The data for this analysis were sourced from Dimensions. Our approach focused on isolating a population of active and impactful researchers and then defining and quantifying self-citation at the paper level.
The following criteria were applied to select the eligible research output and researchers:
Document Types: Only research outputs classified as ‘REVIEW_ARTICLE’, ‘RESEARCH_CHAPTER’, ‘RESEARCH_ARTICLE’, or ‘CONFERENCE_PAPER’ were included to focus on core academic contributions.
Reference Threshold: Each included publication was required to have a minimum of 9 references to ensure a substantive engagement with existing literature.
Recent Activity: To identify active researchers, we exclusively considered individuals whose last publication year was 2023 or later.
Publication Volume: We focused on researchers with a minimum of 5 eligible publications within the dataset.
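The four criteria above could be applied along the following lines. This is a minimal sketch, not the query actually run against Dimensions: the `Publication` record and its field names are hypothetical, and it assumes that "last publication year" is taken over all of a researcher's publications rather than only the eligible ones.

```python
from dataclasses import dataclass
from typing import List

ELIGIBLE_TYPES = {"REVIEW_ARTICLE", "RESEARCH_CHAPTER", "RESEARCH_ARTICLE", "CONFERENCE_PAPER"}
MIN_REFERENCES = 9
MIN_LAST_YEAR = 2023
MIN_ELIGIBLE_PUBLICATIONS = 5


@dataclass
class Publication:
    # Hypothetical record shape, used only for this illustration.
    doc_type: str
    year: int
    reference_count: int


def is_eligible_publication(pub: Publication) -> bool:
    """Document type and reference threshold criteria."""
    return pub.doc_type in ELIGIBLE_TYPES and pub.reference_count >= MIN_REFERENCES


def is_eligible_researcher(pubs: List[Publication]) -> bool:
    """Recent activity and publication volume criteria."""
    if not pubs:
        return False
    # Assumption: the last publication year considers every publication.
    last_year = max(p.year for p in pubs)
    eligible_count = sum(1 for p in pubs if is_eligible_publication(p))
    return last_year >= MIN_LAST_YEAR and eligible_count >= MIN_ELIGIBLE_PUBLICATIONS
```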
A self-citation, at the paper level, was defined as any cited paper that includes at least one author who is also an author of the citing paper. To illustrate this definition, consider the following examples for a main researcher ‘A’ whose publication has authors ‘A, B’:
Example 1 (Direct Self-Citation): If researcher A’s publication references R1 (with authors ‘A, X’) and R2 (with authors ‘Y, Z’), this is counted as a self-citation because author ‘A’ from the main publication is also an author of R1.
Example 2 (Collaborative Self-Citation): If researcher A’s publication references R1 (with authors ‘B, X’) and R2 (with authors ‘Y, Z’), this is also counted as a self-citation because author ‘B’ from the main publication is an author of R1, even though ‘A’ (the first researcher) is not an author of R1.
Example 3 (Not a Self-Citation): If researcher A’s publication references R1 (with authors ‘Y, X’) and R2 (with authors ‘Y, Z’), this is not a self-citation because neither ‘A’ nor ‘B’ (the authors of the main publication) is listed as an author on any of the referenced papers.
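The paper-level definition and the three examples above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes authors are represented by disambiguated identifiers (plain strings here), so that a match on the identifier means the same person.

```python
from typing import Iterable, List, Set, Tuple


def has_self_citation(citing_authors: Iterable[str], references: List[Set[str]]) -> bool:
    """True if any referenced paper shares at least one author with the citing paper."""
    citing = set(citing_authors)
    return any(not citing.isdisjoint(ref_authors) for ref_authors in references)


def self_citation_share(papers: List[Tuple[Set[str], List[Set[str]]]]) -> float:
    """Percentage of papers that contain at least one self-citation.

    Each paper is a pair: (authors of the paper, author sets of its references).
    """
    if not papers:
        raise ValueError("at least one paper is required")
    flagged = sum(1 for authors, refs in papers if has_self_citation(authors, refs))
    return 100.0 * flagged / len(papers)


main_authors = {"A", "B"}
example_1 = [{"A", "X"}, {"Y", "Z"}]  # direct self-citation via A
example_2 = [{"B", "X"}, {"Y", "Z"}]  # collaborative self-citation via B
example_3 = [{"Y", "X"}, {"Y", "Z"}]  # no shared author
```

With these inputs, `has_self_citation` is true for Examples 1 and 2 and false for Example 3. The share returned by `self_citation_share` is the per-researcher percentage that the thresholds in the article are based on.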







