Aug 11, 2025

How We Score Research for Commercialization Potential

At the core of the Applied Research Platform (ARP) is a system for evaluating whether new scientific work might have commercial value. To do this, we use a large language model (LLM) to act as a neutral, repeatable evaluator of research papers. This allows us to generate consistent, structured assessments at scale.

Why Use an LLM?

There is growing interest in using LLMs for evaluation tasks such as peer review, grading, and expert synthesis. The benefits are clear: time savings, scalability, new insights, and better-informed decisions. Several studies have shown that, with the right framing and constraints, LLMs can produce outputs that align reasonably well with expert judgment in technical fields. While open questions remain about bias, calibration, and domain-specific accuracy, LLMs offer a promising approach for first-pass evaluations. Our use case focuses on the commercialization potential of new research: the likelihood that a piece of research could form the basis of, or contribute to, a viable technology, product, or company. We prompt the model to apply a set of evaluation criteria drawn from commonly accepted questions in technology commercialization and early-stage venture screening.

How the Evaluation Works

In previous versions of ARP, our score evaluation drew only on information from the abstract. Going forward, for each paper, the system intelligently extracts relevant information from the entire paper and passes it to an LLM as input for the score evaluation. The model is prompted to take on a structured role and apply a fixed assessment framework. The evaluation is divided into two major components:
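
The step above can be sketched in a few lines. This is an illustrative example only: the framework wording, criterion names, and function name are assumptions for the sketch, not ARP's actual prompt or code.

```python
# Hypothetical sketch of combining a fixed assessment framework with
# text extracted from the full paper. Names are illustrative.

EVALUATION_FRAMEWORK = """\
You are a technology commercialization analyst. Score the research below
on each criterion from 1 to 10 and estimate its Technology Readiness
Level (1 to 9). Respond in JSON with keys: problem_fit, maturity,
novelty, market_fit, investor_interest, trl."""

def build_evaluation_prompt(extracted_text: str) -> str:
    """Attach the fixed framework to content extracted from the
    entire paper (not just the abstract)."""
    return f"{EVALUATION_FRAMEWORK}\n\nPAPER CONTENT:\n{extracted_text}"
```

Keeping the framework fixed while only the extracted paper content varies is what makes the evaluator repeatable across papers.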

  • Commercialization Assessment: This section seeks to answer whether the research solves a real-world problem, how practical or mature the idea is, how novel it is, how well it fits a market, and whether it is likely to attract interest from partners, licensees, or investors. Multiple factors feed into this rubric-based assessment; each factor is scored individually, then combined into an overall commercialization score.
  • Technology Readiness Level (TRL): In addition, the model estimates the research’s current stage of development on the TRL scale (1 to 9), a metric widely used in industry and government R&D to describe how close a technology is to real-world use.
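
The rubric step can be sketched as a weighted combination of factor scores. The factor names and weights below are assumptions for illustration; ARP's actual rubric and weighting are not spelled out here.

```python
# Illustrative sketch only: factor names and weights are assumed,
# not ARP's actual rubric.

FACTOR_WEIGHTS = {
    "problem_fit": 0.3,
    "maturity": 0.2,
    "novelty": 0.2,
    "market_fit": 0.2,
    "investor_interest": 0.1,
}

def commercialization_score(factor_scores: dict[str, float]) -> float:
    """Combine individually scored rubric factors (each 1-10) into
    a single weighted commercialization score."""
    return sum(FACTOR_WEIGHTS[f] * s for f, s in factor_scores.items())
```

Scoring each factor separately before combining makes the assessment easier to audit than a single holistic number.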

These two components are combined into a total score, which is used to sort and rank papers across the platform. For example, a paper describing a new technology in an in-demand industry that has already been deployed and tested in a live, real-world environment is likely to receive a high score such as an 8 or 9. Meanwhile, a paper describing a highly theoretical concept with no foreseeable application and no methods transferable to adjacent domains might receive a low score of only 1 or 2. This helps users quickly identify which research is immediately actionable for their project.
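
One way to picture the combination and ranking step is below. The equal weighting and TRL rescaling are assumptions made for the sketch; ARP's actual combination formula may differ.

```python
# Hedged sketch: the 50/50 weighting and TRL rescaling are assumed,
# not ARP's published formula.

def total_score(commercialization: float, trl: int) -> float:
    """Blend the commercialization score (1-10) with the TRL (1-9),
    rescaling TRL onto the same 10-point range before averaging."""
    trl_scaled = trl / 9 * 10
    return round(0.5 * commercialization + 0.5 * trl_scaled, 1)

def rank_papers(papers: list[dict]) -> list[dict]:
    """Sort papers by total score, highest first, for display."""
    return sorted(papers, key=lambda p: p["total"], reverse=True)
```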

What You See in the Report

Each ARP report presents the result of this process in a well-defined, structured format. The report includes a brief explanation of the research topic, the real-world problems it addresses, where it stands in terms of market interest and readiness, and what next steps (toward patenting, startup formation, or integration into an existing product) might follow. This approach is designed to surface early-stage research with commercial relevance even when that potential isn’t immediately obvious. ARP is a tool to support exploration, not a final verdict.

Ongoing Development

As LLMs continue to improve rapidly, new research is emerging on their use as reviewers, graders, and expert proxies, including "LLM-as-a-judge" frameworks. We are actively following developments in this area and plan to incorporate validated improvements into our system as they become available. We are also exploring additional signals for the score evaluation, such as related patents or funding raised by start-ups built on similar research.