Written Evaluation guidelines

Please aim to write a report up to the standards of a high-quality referee report for a traditional journal. Consider standard guidelines as well as The Unjournal's emphases. Remember to address any specific considerations mentioned by the evaluation manager, including those in our bespoke evaluation notes.

Please provide a concise summary of your evaluation below. For the full evaluation, please write it here, provide a link to it, or let us know you have emailed it.

If you are linking or sending a file, we prefer the native format (Word, LaTeX, Markdown, BibTeX, etc.) rather than a PDF, though we can handle a PDF if necessary.


Accepted links: Google Docs, PubPub, Notion, Dropbox, or any public URL.


Accepted file types: .pdf, .docx, .txt, .md, .html

Potential Evaluation Tools / Guidelines (AI and analysis tools; AI use policy)
Unjournal AI Policy: AI tools may assist with specific tasks (literature search, methodology checks, statistical verification) but should not be used to generate overall evaluations or ratings. You must independently verify AI suggestions and disclose all AI usage in the AI Disclosure section below. We expect ~8+ hours of human work per evaluation. Full policy →

As of January 2025, this is an incomplete list of tools we've tried or are considering. We aim to make this a more carefully curated and vetted list.
View full tools guide →

RegCheck: Automated regression and statistical analysis checking for research papers.
RoastMyPost: AI agent-based tool including fact checking and reasoning checking.
ChatGPT Pro / Deep Research: Pro model shows substantial insight and potential; see our piloting here.
NotebookLM: Deep dive into a paper with AI; upload the PDF for guided analysis.
PaperWizard: AI research paper analysis and structured feedback.
refine.ink: AI-powered feedback on research papers: finding issues and limitations.
Statcheck: Check statistical reporting consistency (p-values, test statistics) in papers.
Elicit / Consensus: Find related work and research consensus on key questions.
Claude: Anthropic's AI assistant; Opus model useful for deep analysis and evaluation assistance.
Perplexity: AI-powered search with cited sources; useful for quick literature checks and fact verification.
Semantic Scholar: AI-powered academic search engine for finding and exploring research papers and citations.
Scite.ai: Smart citations showing whether papers support or contradict cited claims.
Scry: Vector-space approach for finding related concepts and research; exploring integration with Unjournal and RePEc data.
COS TOP Guidelines: Transparency and Openness standards for evaluating open science practices.

Percentile Metrics (0-100 scale) guidelines

These are percentile ratings relative to a reference group: serious research in the same area that you have encountered in the last few years. A rating of 50 means the paper is at the median of this reference group; 80 means it is in the top 20%; 20 means only 20% of comparable work is worse. See the guidelines on quantitative metrics for details.
Tip: Use the Calibrate button above to practice rating sample papers and check your calibration.
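
For intuition, here is a minimal sketch of the arithmetic behind a percentile rating (a hypothetical Python helper, not an official Unjournal tool); the function name and the counts are illustrative, and it assumes you can roughly rank the paper against the comparable work you have encountered.

# Minimal sketch of the percentile arithmetic (hypothetical helper).
# Assumes you can roughly rank this paper against serious comparable
# research you have encountered in the last few years.

def percentile_rating(n_worse: int, n_reference: int) -> float:
    """Return a 0-100 rating: the share of the reference group this paper beats."""
    return 100 * n_worse / n_reference

# Better than 25 of 50 comparable papers -> 50.0 (at the median)
print(percentile_rating(25, 50))
# Better than 40 of 50 -> 80.0 (top 20%)
print(percentile_rating(40, 50))
# Better than only 10 of 50 -> 20.0 (only 20% of comparable work is worse)
print(percentile_rating(10, 50))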

1. Overall Assessment

Guidance

Judge the quality of the research heuristically. Consider all aspects of quality: credibility, importance to future impactful applied research, practical relevance and usefulness, importance to knowledge production, and importance to practice.

Benchmark: serious research in the same area encountered in the last three years.

2. Claims, Strength & Characterization of Evidence

Guidance

Do the authors do a good job of:

  • Stating their main questions and claims clearly?
  • Providing strong evidence and powerful approaches to inform these?
  • Correctly characterizing the nature of their evidence?

3. Methods: Justification, Reasonableness, Validity, Robustness

Guidance

Consider the following:

  • Are the methods well-justified and explained?
  • Are they a reasonable approach to answering the question(s) in this context?
  • Are the underlying assumptions reasonable?
  • Are the results and methods likely to be robust to reasonable changes in assumptions?
  • Does the author demonstrate robustness?
  • Did the authors take steps to reduce bias from opportunistic reporting and questionable research practices?

4. Advancing Knowledge and Practice

Guidance

To what extent does the project contribute to the field or to practice, particularly in ways relevant to global priorities and impactful interventions?

  • Focus on "improvements that are actually helpful" (applied stream)
  • Originality and cleverness should be weighted less than in typical journals; we focus on impact
  • More weight on "contribution to global priorities" than "contribution to academic field"
  • Do the paper's insights inform beliefs about important parameters and intervention effectiveness?
  • Does the project add useful value to other impactful research?
  • Sound, well-presented null results can also be valuable

5. Logic and Communication

Guidance
  • Are goals and questions clearly expressed?
  • Are concepts clearly defined and referenced?
  • Is the reasoning transparent? Assumptions explicit?
  • Are all logical steps clear and correct?
  • Does the writing make arguments easy to follow?
  • Are conclusions consistent with the evidence presented?
  • Do authors accurately characterize evidence and its support for main claims?
  • Are data and analysis relevant to the arguments?
  • Are tables, graphs, diagrams easy to understand (no major labeling errors)?

6. Open, Collaborative, Replicable Research

Guidance

This covers several considerations:

Replicability, reproducibility, data integrity: Would another researcher be able to perform the same analysis and get the same results? Are methods explained clearly enough for credible replication? Is code provided? Is data source clear and as available as reasonably possible?

Consistency: Do numbers in the paper and code output make sense? Are they internally consistent throughout?

Useful building blocks: Do authors provide tools, resources, data, and outputs that might enable future work and meta-analysis?

Reference: COS TOP Guidelines — a framework for evaluating transparency across 8 dimensions including data, code, materials, and preregistration.

7. Relevance to Global Priorities

Guidance
  • Is the topic and approach useful to global priorities, cause prioritization, and high-impact interventions?
  • Does the paper consider real-world relevance, policy, and implementation questions?
  • Are the setup, assumptions, and focus realistic?
  • Do authors report results relevant to practitioners?
  • Do they provide useful quantified estimates (costs, benefits) for impact quantification?
  • Do they communicate in ways policymakers can understand without misleading oversimplification?

Journal Tier Ratings (0.0-5.0 scale) guidelines

Tier "Should" — Normative Merit

Guidance

Where should this paper be published based on merit alone? Imagine a journal process that is fair, unbiased, and free of noise — where status, connections, and lobbying don't matter.

0: Won't publish / little to no value
1: OK / Somewhat valuable journal
2: Marginal B-journal / Decent field journal
3: Top B-journal / Strong field journal
4: Marginal A-journal / Top field journal
5: A-journal / Top journal

Non-integer scores encouraged (e.g., 4.6, 2.2).


Tier "Will" — Prediction

Guidance

Where will this research actually be published? If already published and you know where, report the prediction you would have given absent that knowledge.

0: Won't publish / little to no value
1: OK / Somewhat valuable journal
2: Marginal B-journal / Decent field journal
3: Top B-journal / Strong field journal
4: Marginal A-journal / Top field journal
5: A-journal / Top journal

Non-integer scores encouraged (e.g., 4.6, 2.2).


Claim Identification, Assessment, & Implications (optional but rewarded) guidelines

This section is meant to help practitioners use this research to inform their funding, policymaking, and other decisions. It is not intended as a metric to judge the research quality per se.

This is mainly relevant for empirical research. If 'claim assessment' does not make sense for this paper, please consult the evaluation manager, or skip this section.

Overall Summary

We generally incorporate this into the 'abstract' of your evaluation (see examples at unjournal.pubpub.org).


Confidential Comments

Your comments here will not be public or seen by authors. Please use this section only for comments that are personal/sensitive in nature. Please place most of your evaluation in the public section.

AI/LLM Tool Usage policy

Please disclose your use of AI/LLM tools in this evaluation. AI tools may be used for specific tasks (literature search, methodology checks, writing assistance) but not for generating overall evaluations or ratings. You must independently verify any AI-generated suggestions. See recommended tools →

Survey Questions guidelines

Responses to these will be public unless you mention in your response that you want us to keep them private.

Feedback (responses below will not be public or seen by authors)

If you are interested in discussing this research with other evaluators, see bit.ly/UJevalcollab; this will come with some additional compensation. If you and other evaluators are interested, we may follow up to arrange an (anonymous) discussion space or synchronous meetings.

Let us know if you would like to be contacted for compensated evaluation work when research comes up in your area. To expedite this, fill out the expression-of-interest (EOI) form at Join the Unjournal.