Evaluation Tools Guide

Potential evaluation tools and guidelines for Unjournal evaluators

As of January 2025, this is an incomplete list of tools we've tried or are considering; we aim to curate and vet it more carefully over time.

Unjournal Policy on AI in Evaluations

The Unjournal encourages the responsible use of AI tools to enhance evaluation quality.

For the key principles and the full policy, see the Unjournal's AI/LLM policy proposal.

Statistical Checking

RegCheck

Automated checking of regression results and other statistical analyses reported in research papers.

Open RegCheck →

Statcheck

Checks statistical reporting consistency: verifies that p-values, test statistics, and degrees of freedom in papers are internally consistent.

Tip: Upload the paper PDF to check for common reporting errors (e.g., inconsistent degrees of freedom, impossible p-values).
Open Statcheck →
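
To see the kind of arithmetic Statcheck automates, here is a minimal Python sketch (an illustration of the idea, not Statcheck's own code; Statcheck itself is an R package and web tool) that recomputes a two-sided p-value from a reported t statistic and degrees of freedom and flags a mismatch:

    # Illustrative consistency check in the spirit of Statcheck (not its actual implementation).
    # Recompute the two-sided p-value from a reported t statistic and degrees of freedom,
    # then compare it with the p-value reported in the paper.
    from scipy import stats

    def check_t_test(t_value, df, reported_p, tol=0.005):
        """Return True if the reported p-value matches the recomputed one within `tol`."""
        recomputed_p = 2 * stats.t.sf(abs(t_value), df)  # two-sided p-value
        print(f"t({df}) = {t_value}: reported p = {reported_p}, recomputed p = {recomputed_p:.4f}")
        return abs(recomputed_p - reported_p) <= tol

    # "t(28) = 2.20, p = .04" recomputes to p ~= .036, so it passes;
    # "t(28) = 2.20, p = .004" would be flagged as a likely reporting error.
    check_t_test(t_value=2.20, df=28, reported_p=0.04)

The same idea extends to F, chi-squared, correlation, and z tests, which is roughly what Statcheck does across a whole manuscript.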

AI-Powered Evaluation

RoastMyPost

An LLM- and agent-based tool that includes fact checking and checks of a paper's reasoning.

Tip: Use for a quick "second opinion" before writing your evaluation. Compare its critiques to yours.
Open RoastMyPost →

ChatGPT Pro / Deep Research

The Pro model shows substantial insight and potential; see our piloting here.

Open ChatGPT →

NotebookLM (Google)

Deep-dive analysis of individual papers. Upload a PDF and ask targeted questions about methodology, findings, and limitations.

Tip: Upload the paper along with the Unjournal evaluation rubric. Ask NotebookLM to identify potential issues for each metric category.
Open NotebookLM →

PaperWizard

AI-powered analysis of research papers, providing structured feedback and summaries.

Open Paper-Wizard →

refine.ink

AI-powered feedback on research papers, focused on identifying issues and limitations.

Open refine.ink →

Claude (Anthropic)

General-purpose AI assistant; the Opus model is useful for deep analysis of research papers and for evaluation assistance.

Open Claude →
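
If you want a repeatable "second opinion" workflow, Claude can also be scripted through Anthropic's Python SDK. The sketch below asks for a structured critique of an abstract; the model name and prompt are illustrative assumptions rather than an Unjournal-recommended setup, and it assumes an ANTHROPIC_API_KEY is set in your environment.

    # Illustrative sketch: request a structured critique via Anthropic's Python SDK.
    # The model id and prompt are assumptions; substitute the current Opus-class model.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    prompt = (
        "You are assisting a research evaluator. For the abstract below, list:\n"
        "1) the key claims, 2) methodological concerns, 3) claims needing verification.\n\n"
        "ABSTRACT:\n<paste the abstract here>"
    )

    message = client.messages.create(
        model="claude-opus-4-1",  # assumption: replace with the current model id
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    print(message.content[0].text)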

Perplexity

AI-powered search with cited sources; useful for quick literature checks and fact verification.

Open Perplexity →

Literature Search

Elicit

AI-powered research assistant for finding related papers, extracting key findings, and answering research questions.

Tip: Ask "What papers use similar methods?" or "What's the evidence for [key claim]?" to check the paper's positioning in the literature.
Open Elicit →

Consensus

Search engine for research findings. Quickly see what the scientific consensus is on specific claims.

Tip: Use to verify key claims in the paper against the broader literature.
Open Consensus →

Semantic Scholar

AI-powered academic search engine for finding and exploring research papers and citation networks.

Tip: Use the "Highly Influential Citations" feature to find the most impactful related work.
Open Semantic Scholar →
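
Semantic Scholar also exposes a public Graph API if you prefer scripted searches. The sketch below queries it for papers matching a topic and prints years, citation counts, and titles; the endpoint and field names follow the public documentation at the time of writing, so check the current docs if anything has changed (no API key is needed for light use, though rate limits apply).

    # Minimal sketch: search the Semantic Scholar Graph API for related papers.
    import requests

    def search_papers(query, limit=5):
        url = "https://api.semanticscholar.org/graph/v1/paper/search"
        params = {"query": query, "limit": limit, "fields": "title,year,citationCount"}
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json().get("data", [])

    for paper in search_papers("cash transfers child health randomized"):
        print(f'{paper.get("year")}  {paper.get("citationCount", 0):>5}  {paper["title"]}')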

Scite.ai

Smart citations showing whether papers support or contradict cited claims. Helps verify how a paper's references and citing works relate to its key claims.

Tip: Search for the paper to see how its claims have been supported or contradicted by subsequent research.
Open Scite.ai →

Scry

Uses a vector-space approach to find related concepts and research. We are exploring integration with Unjournal evaluation data and RePEc; a conceptual sketch of the vector-space idea appears below.

Tip: Use to discover conceptually related research that keyword searches might miss.
Open Scry →
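
As a rough illustration of the vector-space idea (not Scry's own implementation), the sketch below ranks items by cosine similarity to a query vector. In practice the vectors would come from a text-embedding model; random vectors are used here only so the snippet runs on its own.

    # Illustrative vector-space retrieval: rank documents by cosine similarity to a query.
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def rank_by_similarity(query_vec, doc_vecs):
        scores = {title: cosine_similarity(query_vec, vec) for title, vec in doc_vecs.items()}
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    # Stand-in vectors; real embeddings place conceptually similar papers close together
    # even when they share few keywords, which is what keyword search tends to miss.
    rng = np.random.default_rng(0)
    papers = {f"paper {i}": rng.normal(size=8) for i in range(3)}
    query = rng.normal(size=8)
    for title, score in rank_by_similarity(query, papers):
        print(f"{score:+.3f}  {title}")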

Open Science Standards

COS TOP Guidelines

The Transparency and Openness Promotion (TOP) guidelines from the Center for Open Science. A framework for evaluating transparency standards across 8 dimensions: citation standards, data transparency, analytic methods, research materials, design and analysis, study preregistration, analysis plan preregistration, and replication.

Tip: Reference these when rating the "Open, Collaborative, Replicable Research" metric (#6). Check which TOP levels the paper meets.
View TOP Guidelines →
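
One simple way to apply this during an evaluation is a checklist that scores each of the eight TOP standards on the guidelines' 0-3 levels (0 = not addressed, 3 = strongest). The sketch below is an illustrative structure for your own notes, not an official Unjournal or COS instrument.

    # Illustrative TOP checklist: rate each standard on the 0-3 levels used by the guidelines.
    TOP_STANDARDS = [
        "Citation standards",
        "Data transparency",
        "Analytic methods transparency",
        "Research materials transparency",
        "Design and analysis transparency",
        "Study preregistration",
        "Analysis plan preregistration",
        "Replication",
    ]

    def summarize_top(ratings):
        """Print the level assigned to each standard, marking any left unrated with '-'."""
        for standard in TOP_STANDARDS:
            print(f"{standard:<40} level {ratings.get(standard, '-')}")

    summarize_top({"Data transparency": 2, "Study preregistration": 3})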