
Consensus

AI search engine for evidence-backed answers from peer-reviewed papers


Consensus is an AI-powered search engine built specifically to answer questions using peer-reviewed scientific literature. You type a question in plain English, and Consensus searches across more than 200 million academic papers, surfaces the most relevant studies, and shows you a visual Consensus Meter indicating how much of the research agrees or disagrees with a given claim. Unlike general AI search tools, every answer links directly to the source paper with a DOI. Consensus is particularly strong for health, nutrition, psychology, and behavioral science questions where the research base is large and publicly accessible. Premium at $11.99 per month adds unlimited searches, GPT-powered summaries, and the Pro Analysis feature. The tool is designed for curious non-academics who want evidence-backed answers without needing a university library subscription or expertise in reading scientific papers.

Most people, when they want to know whether something is true, type their question into a search engine and accept what comes back. That works for practical questions. It fails badly for questions that depend on scientific evidence, because search engines optimize for traffic, not for truth. Ask whether creatine supplementation helps cognitive performance and the top results might be a supplement company’s landing page, a blog post, or an AI-generated article that never saw a lab. Consensus was built to answer a different version of that question: not what’s been written about the topic, but what does the actual research say. It searches more than 200 million peer-reviewed papers and surfaces scientific consensus, or the lack of it, in a format that doesn’t require a PhD to use.

Quick verdict

Consensus is the most accessible scientific literature search tool available for non-academic users. The Consensus Meter is a genuinely useful innovation, the citation model is honest, and the coverage of health and social science research is strong. Its limits are real: it’s not a substitute for a proper systematic review, the meter can mislead on under-studied questions, and Elicit beats it on depth. But for anyone who wants to know what peer-reviewed science actually says about a topic, Consensus is the fastest, clearest entry point.

What is Consensus, exactly?

Consensus launched publicly in August 2022, founded out of Boston with a specific premise: there’s an enormous body of peer-reviewed scientific evidence that most people can’t access or interpret, and AI could bridge that gap. Not by generating plausible-sounding claims about science, but by actually searching the indexed literature and surfacing what researchers found.

It occupies a distinct niche. It’s not a database tool like PubMed, which requires knowing how to construct a search query. It’s not a general AI search engine like Perplexity, which searches the open web and pulls from whatever ranks well. Consensus searches peer-reviewed papers only. Ask a question in plain English and it runs against 200+ million papers from PubMed, Semantic Scholar, arXiv, and similar repositories. Results appear as cards with AI-generated summaries of the findings and DOI links to the originals.

It won’t search news, generate essays from training data, or help you write code. Every feature points at a single task: finding what peer-reviewed science says. That focus is both its biggest strength and its ceiling.

As of 2026, the index has grown considerably and synthesis features have been layered on top of the core search. The user base skews toward health-conscious consumers, students, journalists, and professionals in evidence-based fields who need literature grounding without institutional database access.

The features that justify the evidence-search positioning

The Consensus Meter

The Consensus Meter is what separates Consensus from every other scientific search tool. When you submit a question, Consensus classifies each retrieved paper based on whether its findings support the claim, partially support it, or contradict it, then displays those classifications as a percentage breakdown at the top of the results page. “73% of papers support this claim” is a sentence with real informational value, especially for people who don’t have time to read seven abstracts.

The classification is AI-driven: language models read each abstract and determine the direction of findings relative to your question. This works well when the question is clear and the literature is substantial. For well-studied health questions with hundreds of relevant papers, the Meter is a reliable signal.

It works less well on thin or heterogeneous literatures. A question returning eight papers, four of which are small RCTs from the 1990s, generates a Consensus Meter that looks precise but isn't. Consensus shows you the underlying paper count, so you can calibrate accordingly. The Meter is a directional indicator, not a meta-analysis, and it can project false confidence on poorly studied topics. Used with that understanding, it's a genuinely useful innovation.
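
To make the mechanism concrete, here's a minimal sketch of the classify-then-aggregate logic the Meter implies. All names are hypothetical, and the keyword heuristic is a toy stand-in: the real classifier is a language model reading each abstract against your question.

```python
from collections import Counter

STANCES = ("supports", "partially_supports", "contradicts")

def classify_stance(abstract: str) -> str:
    """Toy stand-in for the model-driven classifier. A real system
    judges the direction of findings, not keyword matches."""
    text = abstract.lower()
    if "no significant" in text or "no effect" in text:
        return "contradicts"
    if "mixed" in text or "some evidence" in text:
        return "partially_supports"
    return "supports"

def consensus_meter(abstracts: list[str]) -> dict[str, float]:
    """Aggregate per-paper stances into a percentage breakdown.
    Percentages are over the retrieved papers only, not over all
    literature on the topic."""
    counts = Counter(classify_stance(a) for a in abstracts)
    total = sum(counts.values()) or 1
    return {s: round(100 * counts[s] / total, 1) for s in STANCES}

papers = [
    "Supplementation improved recall in a randomized trial.",
    "We found no significant effect on working memory.",
    "Mixed results across age groups; some evidence of benefit.",
]
print(consensus_meter(papers))
# {'supports': 33.3, 'partially_supports': 33.3, 'contradicts': 33.3}
```

The structural point is the denominator: the percentage is computed over whatever the query retrieved, which is why the paper count matters as much as the headline number.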

Citation-grounded answers

Every paper card includes a direct link to the original paper’s DOI. The AI-generated summary is drawn from the paper’s abstract, not generated from scratch, so summaries are constrained by what the paper actually measured and found. The failure mode of most AI tools on scientific questions is confabulation: generating something that sounds like a research finding without corresponding to any specific study. Consensus avoids this structurally. It retrieves real papers first, then summarizes what they contain. If the paper isn’t in the index, it doesn’t appear.

That makes Consensus results verifiable in a way that a ChatGPT answer about a health topic usually isn’t. You can open the paper, check the sample size, and decide whether you trust the finding. You’re using the summary as a pointer to primary evidence, not accepting it on faith.
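
The retrieval-first architecture is worth spelling out, because it's the property that makes the summaries verifiable. Below is a minimal sketch of the pattern, with hypothetical names and toy substring retrieval standing in for semantic search over the real index:

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    doi: str
    abstract: str

def retrieve(query: str, index: list[Paper]) -> list[Paper]:
    """Toy retrieval; the real system runs semantic search over
    200M+ indexed papers rather than substring matching."""
    terms = query.lower().split()
    return [p for p in index if any(t in p.abstract.lower() for t in terms)]

def summarize(paper: Paper) -> str:
    """The summary is constrained to the abstract and always carries
    the DOI, so the reader can check the claim against the source."""
    return f"{paper.abstract[:120]}... (doi:{paper.doi})"

def answer(query: str, index: list[Paper]) -> list[str]:
    papers = retrieve(query, index)
    if not papers:
        return []  # nothing retrieved, nothing generated: no confabulation path
    return [summarize(p) for p in papers]

index = [Paper("Creatine and cognition", "10.1000/xyz123",
               "A randomized trial of creatine supplementation and memory.")]
print(answer("creatine memory", index))
```

Generation happens only downstream of retrieval, which is why a paper that isn't in the index can never appear in an answer.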

Search across peer-reviewed papers

The index draws from PubMed, Semantic Scholar, arXiv, and related academic repositories. Coverage is strongest in biomedicine and social sciences, which happen to be where most people’s practical questions live: health, nutrition, psychology, education, climate. The gaps are in fields where research is published primarily in non-English languages, in books rather than journals, or in less systematically indexed venues.

Retrieval quality has improved over the product’s life. Early Consensus sometimes surfaced tangentially related papers. The matching between natural-language questions and research terminology is tighter now, though framing your question in academic language still helps. “Aerobic exercise and depression” retrieves more cleanly than “does working out help if you feel sad.”

Topic summaries and Pro Analysis

Premium subscribers get two synthesis layers on top of the raw paper results. Topic summaries use GPT-powered synthesis to generate a short overview of what the research says, identifying the main finding, flagging qualifications, and noting where evidence is contested. Read the summary first to understand the shape of the literature, then check individual papers for claims that matter most.

Pro Analysis goes further. It’s a structured breakdown organized by study type, sample characteristics, and direction of findings. For “does intermittent fasting improve metabolic markers,” Pro Analysis separates RCT findings from observational studies, notes typical sample sizes, and surfaces whether positive findings cluster in specific subpopulations. It’s the kind of breakdown a research assistant would compile from hours of manual review. These are synthesis aids, not formal systematic reviews, but they cover most of what a professional actually needs to answer “what does the literature say about X.”
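
As a rough illustration of the shape of that output, here's a sketch of grouping papers by study type and direction of findings. The fields and labels are hypothetical, inferred from the description above, not Consensus's actual data model:

```python
from collections import defaultdict

# Hypothetical paper records: study type, sample size, direction of findings.
papers = [
    {"study_type": "RCT", "n": 48, "direction": "positive"},
    {"study_type": "RCT", "n": 120, "direction": "null"},
    {"study_type": "observational", "n": 2100, "direction": "positive"},
]

breakdown = defaultdict(lambda: defaultdict(list))
for p in papers:
    breakdown[p["study_type"]][p["direction"]].append(p["n"])

for study_type, directions in breakdown.items():
    for direction, sizes in directions.items():
        median_n = sorted(sizes)[len(sizes) // 2]
        print(f"{study_type:>14} | {direction:>8} | {len(sizes)} papers, median n={median_n}")
```

Separating RCTs from observational studies in this way is exactly the move that keeps a large observational sample from drowning out a null result in a controlled trial.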

ChatGPT plugin and integrations

Consensus has a ChatGPT plugin for embedding paper search directly in a ChatGPT session. Invoke it from ChatGPT Plus and you get paper-grounded answers without switching applications. The expected plugin limitations apply: it requires manual invocation and the output is constrained by how ChatGPT integrates third-party results. Useful for people already in the ChatGPT workflow, not a standalone reason to choose Consensus.

The iOS and Android apps are clean implementations of the web experience. The Consensus Meter and paper cards translate well to mobile, which makes sense for quick evidence checks while you're not at a desk.

Pricing

Consensus uses a freemium model with a functional free tier and a single paid tier.

The free tier includes core paper search, individual paper cards with AI summaries, and a limited number of searches per day. That daily cap is the main friction. For occasional use or evaluation it’s fine. For regular research work, you’ll hit it fast.

Premium costs $11.99 per month, or roughly $8.99 per month billed annually. It removes the daily search cap, adds the full Consensus Meter on all results, opens up Pro Analysis, gives you more polished GPT-4-powered summaries, and provides priority access to new features. Under $12 per month puts it well below what institutional database access costs, and those databases lack the plain-English interface and AI synthesis that make Consensus useful to non-researchers.

Enterprise pricing is custom and available on request for teams needing higher rate limits or internal tool integration. The free-to-Premium upgrade path is clean. There’s no dark pattern beyond the search cap, which is a real functional limit rather than an artificial nudge.

Where Consensus wins and where it doesn’t

The sweet spot is specific, empirically testable health and science questions with a substantial literature behind them. Does magnesium supplementation improve sleep quality? Does mindfulness-based stress reduction reduce anxiety in clinical populations? Consensus will give you a grounded, paper-backed answer faster than any other tool.

It also wins on accessibility over Elicit. Elicit is built for researchers who know how to work with scientific literature. Consensus is built for the curious non-academic who hasn’t spent time in PubMed. That’s a much larger audience, and Consensus is well-matched to it.

The honest limits: the Consensus Meter is only as reliable as the paper set it’s calculated from. “67% support” on nine papers, two of which are 1998 rat studies, deserves much less weight than the same reading on 340 RCTs. Consensus shows you the paper count, but it takes deliberate effort to use that number rather than anchor on the percentage.
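
One way to make that intuition precise is to put a confidence interval around the supporting share. Consensus doesn't display anything like this; the Wilson score interval below is an outside framing to show how much wider the uncertainty is at small paper counts:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    center = (p + z * z / (2 * n)) / (1 + z * z / n)
    margin = (z / (1 + z * z / n)) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - margin, center + margin

print(wilson_interval(6, 9))     # ~(0.35, 0.88): "67%" of 9 papers is nearly uninformative
print(wilson_interval(228, 340)) # ~(0.62, 0.72): the same 67% of 340 is a real signal
```

Treating papers as identical coin flips is itself a simplification, of course; the interval understates the problem when the nine papers also vary wildly in quality.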

The tool also can’t do the interpretive work that separates a literature review from a literature list. It tells you what papers found; it can’t tell you whether the positive studies share a methodological flaw, whether effect sizes are consistently small despite statistical significance, or whether the field has a replication problem. That judgment requires a researcher.

Who Consensus is built for

The most natural user is someone who makes decisions that should be evidence-grounded but doesn’t have academic training or database access. Health-conscious consumers checking whether a supplement has real research behind it. Parents who want to know what controlled studies say about screen time. Journalists who need to cite a primary source rather than a secondary article. Policy professionals checking whether an intervention has evidence behind it.

Researchers are a secondary audience. Consensus isn’t a replacement for systematic review databases, but it’s a fast orientation layer for a topic outside your specialty, and a good starting point for a student building their first literature review.

What Consensus isn’t built for: software development questions (use Phind), broad web research (use Perplexity), or formal systematic review workflows (use Elicit). The product doesn’t try to be any of those things.

Consensus vs the alternatives

Consensus vs Elicit

Both tools search academic literature, use AI to synthesize findings, and link to real papers. Elicit is built for researchers who need structured data extraction, systematic review workflows, and formal research outputs. Consensus is built for directional answers a non-researcher can act on immediately. The Consensus Meter is the differentiating feature Elicit doesn’t have. If you need to know “what does the science say” quickly, Consensus wins. If you’re building a formal evidence base, Elicit is more capable. They’re complementary more than competing.

Consensus vs Perplexity

Perplexity searches the live web; Consensus searches indexed peer-reviewed papers. Perplexity is broader and faster for most factual questions. For scientific questions requiring grounding in controlled research, Consensus is more reliable. Perplexity can surface a good scientific summary if one exists; it can’t guarantee the underlying science is peer-reviewed. Consensus can. Start most questions with Perplexity; switch to Consensus when you need to know what controlled research has found.

Consensus vs general AI chatbots

GPT-5, Gemini 3, and Claude have a documented failure mode on scientific questions: they cite papers that don’t exist or don’t say what the model claims. This is structural. Language models generate from training distributions and can construct a plausible research summary regardless of whether underlying papers support it.

Consensus retrieves real papers before generating anything. If the paper isn’t in the index, it doesn’t appear. That architecture is more trustworthy for scientific claims. Pair it with a broader AI research platform when you need cross-format depth, but for checking whether a health claim is supported by evidence, Consensus is the cleaner tool.

Getting started

Go to consensus.app, type a question in natural language, and review the results. No account is needed for the first few searches; a free account unlocks the limited daily search allowance and lets you save papers.

Questions that work best are specific and empirically framed. “Does X affect Y” outperforms “tell me about X.” “Is intermittent fasting effective for weight loss” returns cleaner results than “what should I know about intermittent fasting.” If a query returns few papers, try more technical terminology: “cognitive function” retrieves more than “brain performance” because the index skews toward academic language.

On Premium, try Pro Analysis on a question you genuinely care about. If the structured summary would have taken you hours to compile manually, the plan pays for itself quickly.

The bottom line

Consensus fills a gap that general AI search tools leave wide open. Most AI products will answer a question about health or science with confident fluency and variable accuracy. Consensus answers the same questions with real papers, a transparent source list, and a visual indicator of whether the field agrees or disagrees. That’s not the same as being infallible, but it’s a fundamentally more honest relationship with scientific evidence than most tools offer.

The Consensus Meter isn’t a perfect instrument. It’s accurate when the underlying literature is large and consistent, and it needs to be read carefully when it isn’t. Used with that understanding, it’s one of the better tools for bridging the gap between “what I heard” and “what the research actually shows.” For the curious non-academic who wants to hold health claims to a higher standard, Consensus is worth the $11.99 per month without much deliberation.

Key features

  • Consensus Meter: visual indicator showing the percentage of papers that support, partially support, or oppose a claim
  • GPT-powered paper summaries that synthesize findings without requiring you to read the abstract
  • Direct search across 200+ million peer-reviewed papers indexed from sources including PubMed, Semantic Scholar, and arXiv
  • Pro Analysis mode that generates structured, study-type breakdowns of evidence on a topic
  • Cited responses that link every claim to the original paper with DOI
  • ChatGPT plugin integration for embedding paper search inside a chat session
  • Topic pages that aggregate evidence on frequently asked health, nutrition, and science questions

Pros and cons

Pros

  • + The Consensus Meter gives an immediate, visual read on how much of the science agrees with a claim
  • + Every answer links to real papers with DOIs, so claims are independently verifiable
  • + Genuinely accessible to non-academics without sacrificing scientific rigor in sourcing
  • + Strong coverage of health, nutrition, psychology, and social science research
  • + Free tier is functional enough to evaluate the product before paying
  • + Pro Analysis provides structured breakdowns that save hours of manual literature review

Cons

  • − Coverage is uneven outside well-indexed fields like biomedicine and social science
  • − The Consensus Meter can mislead when a question maps to a small or low-quality body of literature
  • − Less powerful than Elicit for systematic reviews or structured data extraction across papers
  • − English-only, which excludes a large portion of international scientific literature
  • − The free tier's daily search cap limits practical use, so regular users effectively need Premium

Who is Consensus for?

  • Health-conscious consumers who want to know what the research actually says about a supplement, diet, or intervention
  • Students and journalists needing quick, cited evidence on science and health topics
  • Professionals in evidence-based fields checking whether a claim is well-supported before acting on it
  • Non-academic researchers who need a starting point for a literature review without institutional database access

Alternatives to Consensus

If Consensus isn't quite the right fit, the closest alternatives are Elicit, Perplexity, and scite. See our full Consensus alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is Consensus?
Consensus is an AI search engine that answers questions by searching across more than 200 million peer-reviewed scientific papers. You ask a question in plain English, and Consensus finds the most relevant studies, extracts their key findings, and displays a Consensus Meter showing how much of the identified research supports, partially supports, or contradicts the claim. Every result links to the original paper. It was built for people who want evidence-backed answers without needing to be a trained researcher or have access to academic journal subscriptions. The company was founded in 2021 and launched publicly in 2022.
Is Consensus free?
Yes, Consensus has a free tier that allows a limited number of searches per day with access to paper results and basic summaries. The Premium plan at $11.99 per month (or roughly $8.99 per month billed annually) removes the daily search limit and adds GPT-powered synthesis, the full Consensus Meter, Pro Analysis mode, and priority access to new features. Enterprise plans are available for organizations and teams, with custom pricing on request.
How does Consensus compare to Elicit?
Both tools search academic literature, but they're built for different users. Consensus prioritizes accessibility and quick directional answers for non-academics. Its Consensus Meter gives an instant visual read on scientific agreement. Elicit is designed more for researchers who need to extract structured data across a set of papers, manage a literature review systematically, or compare methodologies across studies. Elicit gives you more control and depth; Consensus gives you faster, more digestible answers. If you want to know whether the evidence supports a health claim, Consensus is faster. If you're building an evidence table for a systematic review, Elicit is the better fit.
What is the Consensus Meter?
The Consensus Meter is a visual indicator that appears at the top of a Consensus search result. It shows what percentage of the papers identified for a given question support the claim, partially support it, or find evidence against it. The meter is calculated based on the papers Consensus surfaces for the query, not across all scientific literature on the topic. It's a useful orientation tool, especially for broad questions with a substantial research base. It becomes less reliable when the underlying paper set is small, methodologically mixed, or when the question is framed in a way that doesn't match how researchers have studied the topic.
Can I trust Consensus answers?
With reasonable skepticism, yes. Consensus links every claim to a real paper, which means you can verify any finding directly. The tool doesn't fabricate citations, which puts it ahead of general-purpose AI chatbots for factual research. The main limitations are in how it interprets and aggregates findings: a paper marked as "supporting" a claim may actually be measuring something slightly different, or may be a small, low-powered study. The Consensus Meter is a directional signal, not a meta-analysis. Treat it as a well-organized literature scout, not a clinical authority. For high-stakes decisions, verify the individual papers rather than relying solely on the meter reading.
Does Consensus work for non-medical questions?
Yes, though with varying quality depending on the field. Consensus indexes papers from arXiv and Semantic Scholar alongside biomedical databases, so it covers computer science, economics, psychology, environmental science, education research, and other fields with large peer-reviewed publication bases. Coverage is strongest where literature is most digitized and openly indexed. Questions about literary criticism, niche historical topics, or fields where most research isn't published in English or in indexed journals will return thinner results. For general science and social science questions, Consensus is broadly useful beyond its health and medicine reputation.
