Investing in AI companies? Think Data first, AI second

By now, with ChatGPT and the doomsday media hype around it, almost everybody has gotten the memo that AI has the potential to revolutionize industries, reshape business models, and possibly destroy humankind in the process (e.g. Chaos-GPT).

As an investor in AI (seems like these days everybody is), it’s crucial to understand the key factors that contribute to the success of AI companies. In this blog post, we will delve into Recursive Venture’s underlying investment thesis in the future of AI – the importance of having proprietary data that sets a business apart and creates a robust moat around it. We call this the “AI Moat”.

Without deviating too much from the main topic (data!), having a moat is crucial for generating significant startup returns for investors. A moat establishes a sustainable competitive advantage and protects against competition. Data from a study conducted by CB Insights revealed that startups with a moat in place, such as proprietary data, were 2.2 times more likely to achieve successful exits.

Back to AI. In AI, data is the fuel that powers the various models. In a crowded AI landscape, where algorithms can be replicated and foundation models are becoming a commodity, having proprietary data becomes a game-changer (a leaked internal Google memo argued that neither Google nor OpenAI has a moat).

The availability of quality and relevant data is crucial for training AI models, but access to vast amounts of data alone is not enough to gain a competitive edge in the AI market. The real differentiator lies in possessing proprietary data, which is either unique, exclusive, or not easily replicable by competitors (naturally, having all of the above is ideal). Proprietary data can come from various sources, such as customers, partnerships, user-generated data, or specialized data collection processes.

Exclusive data creates a long-term moat by enabling:

  1. Enhanced Accuracy and Performance
    One of the biggest issues today with AI (and even more so with Generative AI) is accuracy and reliability.

    Having access to proprietary data enables AI models to be more accurate and perform better than those relying solely on public or generic data sources. By training algorithms on unique datasets, companies can fine-tune their models to specific use cases and improve predictive capabilities. This heightened accuracy translates into better outcomes, increased customer satisfaction, and stronger model performance.

  2. Custom Solutions for Customers at Scale
    In today’s era of hyper-personalization (for consumer solutions) and customization (for B2B solutions), startups can tailor their AI solutions to individual customer needs.

    Proprietary customer data allows AI companies to create customized experiences, recommendations, and solutions that resonate with the needs of the business or with individuals. This personalized approach enhances customer loyalty, drives adoption, and fortifies the company’s market position.

  3. Barrier to Entry
    Proprietary data acts as a formidable barrier to entry for potential competitors. Building a comprehensive and unique dataset takes time, resources, and domain expertise.

    As AI companies amass and refine their proprietary data, it becomes increasingly hard for new entrants to catch up. Since obtaining similar datasets is difficult or even impossible, rivals struggle to replicate the offering. This helps companies establish market dominance and defend their position.

Back to investing in AI. Our thesis is that to identify promising AI investments, investors should evaluate the depth, uniqueness, and relevance of a company’s proprietary data. In short, assess the company’s “AI Moat”. Multiple companies in the Recursive portfolio, including Wevo and CultureScience, harness this unfair advantage and deliver higher quality models and services due to their access to proprietary data.

Depth and uniqueness are fairly easy to investigate, but that isn’t enough. The proprietary data also needs to be something the company can actually use to improve its AI models. Specifically, investors should assess the company’s ability to leverage the proprietary data for continuous improvements in model quality and performance. Often the data needs significant work, such as labeling or other techniques, before it is effective in creating an “AI Moat”.

The AI revolution is driven by data, and the companies with the most valuable and exclusive data will be tomorrow’s winners, as long as they can leverage the data to create a virtuous cycle and continuously improve their models and services.
