The Power of Proprietary Data and creating an “AI Moat”

In the fast-evolving landscape of AI, data has emerged as the new currency (alongside access to Nvidia H100 GPUs). Data serves as the fuel that drives AI.

AI systems solving complex problems require an immense amount of data to deliver high quality services. This is especially true in use cases that don’t have a human-in-the-loop (e.g. Level 5 autonomous driving), use cases delivering partial or full automation with a high degree of trust and accuracy in a consumer-facing scenario (e.g. tier 1 customer support chatbots), or systems automatically executing transactional API calls to other services.

Proprietary data is not a technical topic but a business one. Proprietary data serves as a moat that helps companies differentiate and justify the (often significant) investments associated with building products based on AI models. By training AI models on proprietary data, companies can develop unique capabilities that others can’t (simply because others don’t have the data), deliver high quality predictions (typically measured by performance metrics like recall – the percentage of data samples correctly identified as belonging to a class of interest out of the total samples for that class), or do a better job of fine-tuning a foundation AI model for a given set of use cases and verticals.
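To make the recall metric mentioned above concrete, here is a minimal sketch in Python. The function name, the sample labels, and the "fraud" class are illustrative assumptions, not taken from any particular product.

```python
def recall(y_true, y_pred, positive):
    """Recall for one class: correctly identified samples of that class
    divided by all samples that actually belong to it."""
    # Predictions made for samples whose true label is the class of interest.
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_class:
        raise ValueError("no samples of the class of interest")
    true_positives = sum(1 for p in preds_for_class if p == positive)
    return true_positives / len(preds_for_class)


# Hypothetical labels: three samples are actually "fraud".
y_true = ["fraud", "fraud", "ok", "fraud", "ok"]
y_pred = ["fraud", "ok", "ok", "fraud", "fraud"]

# The model caught 2 of the 3 actual "fraud" samples.
print(recall(y_true, y_pred, "fraud"))
```

Note that the false alarm on the last sample does not lower recall; it would show up in precision instead.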

Most people think about proprietary data simply as unique, exclusive information that a company collects or generates. Often that is indeed the case, but there are other types of “proprietary” advantages and data strategies that can deliver a significant moat. Here are a few more examples to consider:

  • Leveraging customers’ data sources – Some companies excel at accessing their customers’ proprietary datasets and obtaining rights from those customers to leverage data derivatives for machine learning purposes. This helps both the vendor and the customer by delivering higher quality services. One example is Cherre, which helps customers connect all their real estate data (1st party and 3rd party) and better understand data quality.
  • Partnerships and data consortiums – Business development partnerships can aid with obtaining and scaling proprietary data sources. This is a method that has been used extensively in online advertising, transactional data, and location datasets. Other companies deploy data consortiums in which every additional partner benefits from a network effect. Deduce is one example of a data consortium that helps derive more signals from a network of participants, benefiting all participants. Another great example is Placer, which has an exclusive data acquisition agreement with Life360, locking out a significant part of the market.
  • Customer-led labeling – Many AI solutions sit at the intersection of the human-machine interface. Collecting customer feedback through the actual use of the system in continuous and smart ways can help generate data to “debug” models and better understand underdamping, data distribution issues, and mislabeling. Designing the right user experience can lead to customers (including experts in those companies) doing quite a bit of the labeling heavy lifting, in turn resulting in higher quality labeled data.
  • Intelligent expert labeling – Having raw data is the first step, but labeling data for training purposes can range from a simple repetitive task to a herculean one requiring specialists and experts. Some companies build tools to leverage experts very efficiently, or have tools that combine limited expert-labeled data with various deep learning and transfer learning methods to build models. Watchful.io is an example of a company that helps other companies with expert labeling techniques.
  • Unique data mapping – Products built to serve specific verticals (e.g. law, cybersecurity) can benefit from mapping data inputs and model outputs to specially built data models (typically built and maintained by humans) or leveraging knowledge graphs as a way to transform and include relevant tokens in a prompt to an LLM. In specific verticals, this can help minimize model hallucination by adding context and producing model outputs that are more in line with customer expectations.
  • Data collection through devices and hardware – Some companies deploy hardware devices to collect real-world data, or are given access to such datasets derived from devices others deploy. Any connected device can help facilitate “real-world” data that would be proprietary, including IoT devices, sensors, smartphones, etc.
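As a rough illustration of the “unique data mapping” idea in the list above, the sketch below looks up facts from a small, hand-maintained knowledge graph and places them in the prompt before the user’s question. Everything here is an assumption made for illustration: the triple format, the function names, and the naive substring matching. A real system would use a proper graph store and entity linking, and the resulting string would then be sent to whichever LLM the product uses.

```python
# Hypothetical knowledge graph, stored as (subject, relation, object) triples
# that a domain expert would maintain by hand.
KNOWLEDGE_GRAPH = [
    ("CVE-2021-44228", "is known as", "Log4Shell"),
    ("CVE-2021-44228", "affects", "Apache Log4j 2"),
    ("Apache Log4j 2", "is a", "Java logging library"),
]


def relevant_facts(question, graph):
    """Return the triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [
        f"{s} {r} {o}."
        for s, r, o in graph
        if s.lower() in q or o.lower() in q
    ]


def build_prompt(question, graph):
    """Prepend matching facts to the question so the model has vertical context."""
    facts = relevant_facts(question, graph)
    context = "\n".join(f"- {fact}" for fact in facts) or "- (no matching facts)"
    return (
        "Use the facts below when they apply.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt("Is our service exposed to Log4Shell?", KNOWLEDGE_GRAPH))
```

Only the triple that mentions “Log4Shell” is included here; the design choice is to keep the added context short and specific rather than dumping the whole graph into the prompt.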

To summarize, possessing proprietary data serves as a business moat, offering protection against rivals and fostering long-term sustainability. Proprietary data and proprietary labeled datasets can come in various shapes and forms.

A key question to consider is whether a company has a hard-to-replicate approach to obtaining data at scale, or to labeling it, that would make it harder for a new entrant (or even an incumbent that has existing data) to enter the market and deliver AI systems that perform as well. At Recursive Ventures we call this the “AI Moat”, and it’s inherent to how we think about long-term value creation in the budding AI ecosystem.

AI winners and the race for the ultimate prompt UI/UX

In the rapidly evolving world of AI, prompt engineering has become a critical discipline. Learning and adopting prompt engineering has already been recognized as the future of jobs in the age of ChatGPT.

But first, what is prompt engineering? Prompt engineering, a concept in natural language processing, involves embedding the task description in the input itself. Prompt engineering enables precise instructions or queries to guide AI models towards desired outputs. It allows humans to effectively interact with AI systems, leveraging their capabilities to accomplish complex tasks with accuracy.

Learning prompt engineering might help you unlock future job opportunities, but helping users succeed with prompt engineering is a key differentiator for the success of AI-based products.

The success of prompt engineering relies not only on algorithms and models but also on the user interface (UI) and user experience (UX) that enable seamless interaction with AI systems. At Recursive Ventures, we believe that prompt UI/UX excellence is a key pillar for AI startup success.

Similar to the web and mobile eras, companies in the AI era that develop the right set of UI/UX paradigms to help their end-users leverage AI systems will emerge as winners. Creating a product with accessible and usable UI/UX enhances its value to customers, facilitates word-of-mouth, increases willingness to pay, and fosters user stickiness.

How can AI products help customers with a better UI/UX? Here are a few ideas:

Streamlined and contextual guidance

Next-generation UI/UX for prompt engineering should provide a clear and concise interface for formulating prompts, offering smart suggestions and providing real-time feedback on the expected outputs. Instead of having the user put in a prompt, wait a few seconds (or frustratingly, minutes) to get a response, and then move on to the next prompt, streamlining the prompt design in real time can save the user time and overhead.

Effective UI/UX should assist users in composing prompts by offering contextual guidance. This can include features such as auto-completion, natural language suggestions, or interactive tooltips that provide insights into the capabilities and limitations of the AI model. It can help users get to their desired output faster and deliver a higher quality (more accurate, on point) response.

One pretty impressive example is the work that Adobe has done with various tools and toggles in the Adobe Firefly product, seamlessly integrating text and tooltips to help users accomplish the designs they envision.

Iterative Refinement

UI/UX tools for prompt engineering should enable iterative refinement of prompts and facilitate experimentation with different inputs. This allows users to fine-tune queries, evaluate generated outputs, and iteratively improve the performance of AI systems. A well-designed UI/UX supports this iterative process, making it easier for users to iterate, learn, and adapt their prompt engineering strategies.

Naturally, having a prompt that enables iterative motions and builds on the context from previous prompts (similar to ChatGPT) is a prerequisite for iterative refinement. Having the ability to also walk back to better understand the iteration path that led to a certain output can also be valuable. One rough analogue would be a breadcrumb trail in web browsing. It helps users understand how the model got to a certain result and would be valuable as users increasingly demand model explainability.
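The breadcrumb analogy can be sketched as a small data structure that records each prompt and output and lets the user walk back along the path. This is only a minimal illustration under assumed names (Step, PromptTrail, record, walk_back are all hypothetical), not a description of how any specific product implements it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    prompt: str
    output: str


@dataclass
class PromptTrail:
    """Ordered history of prompt/output pairs, oldest first."""
    steps: List[Step] = field(default_factory=list)

    def record(self, prompt: str, output: str) -> None:
        self.steps.append(Step(prompt, output))

    def breadcrumbs(self) -> str:
        """Render the iteration path the way a website renders breadcrumbs."""
        return " > ".join(step.prompt for step in self.steps)

    def walk_back(self, n: int = 1) -> Optional[Step]:
        """Drop the last n steps and return the step that is now current."""
        if n < 0:
            raise ValueError("n must be non-negative")
        keep = max(len(self.steps) - n, 0)
        del self.steps[keep:]
        return self.steps[-1] if self.steps else None


# Hypothetical session; the outputs are placeholders for model responses.
trail = PromptTrail()
trail.record("logo for a bakery", "<image 1>")
trail.record("make it warmer", "<image 2>")
trail.record("add a wheat icon", "<image 3>")
print(trail.breadcrumbs())
trail.walk_back()  # return to the state after "make it warmer"
print(trail.breadcrumbs())
```

Keeping the outputs alongside the prompts is what makes the trail useful for explainability: the user can see which refinement produced which change.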

Collaboration and Community

UI/UX platforms can foster collaboration among prompt engineers by providing features for sharing, discussing, and co-creating prompts. Creating a vibrant community of prompt engineers encourages knowledge exchange and collective improvement. This collaborative aspect of UI/UX enhances the effectiveness and efficiency of prompt engineering efforts.

One of Recursive’s portfolio companies, Storytell.ai, has done essentially that with their prompt marketplace. It’s a great way to help users get up and running with powerful prompt templates and accelerate their path to getting effective responses out of AI systems.

To summarize, the next set of winners in AI will likely master prompt UI/UX. By offering streamlined interaction, contextual guidance, iterative refinement, and collaboration features, AI-first companies can help customers adopt prompt engineering to effectively utilize AI models. Prioritizing innovative UI/UX solutions gives startups a competitive edge, enabling them to stand out in the rapidly evolving AI landscape and fend off competitors.

Investing in AI companies? Think Data first, AI second

By now, with ChatGPT and the doomsday media hype around it, almost everybody has gotten the memo that AI has the potential to revolutionize industries, reshape business models, and potentially destroy humankind in the process (e.g. Chaos-GPT).

As an investor in AI (seems like these days everybody is), it’s crucial to understand the key factors that contribute to the success of AI companies. In this blog post, we will delve into Recursive Ventures’ underlying investment thesis in the future of AI – the importance of having proprietary data that sets a business apart and creates a robust moat around it. We call this the “AI Moat”.

Without deviating too much from the main topic (data!), having a moat is crucial for generating significant startup returns for investors. A moat establishes a sustainable competitive advantage and protects against competition. Data from a study conducted by CB Insights revealed that startups with a moat in place, such as proprietary data, were 2.2 times more likely to achieve successful exits.

Back to AI. In AI, data is the fuel that powers the various models. In a crowded AI landscape, where algorithms can be replicated and foundation models are becoming a commodity, having proprietary data becomes a game-changer (Google says that both Google and OpenAI have no moat).

The availability of quality and relevant data is crucial for training AI models, but access to vast amounts of data alone is not enough to gain a competitive edge in the AI market. The real differentiator lies in possessing proprietary data, which is either unique, exclusive, or not easily replicable by competitors (naturally, having all of the above is ideal). Proprietary data can come from various sources, such as customers, partnerships, user-generated data, or specialized data collection processes.

Exclusive data creates a long-term moat by enabling:

  1. Enhanced Accuracy and Performance
    One of the biggest issues today with AI (and even more so with Generative AI) is accuracy and reliability.

    Having access to proprietary data enables AI models to be more accurate and perform better than those relying solely on public or generic data sources. By training algorithms on unique datasets, companies can fine-tune their models to specific use cases and improve predictive capabilities. This heightened accuracy translates into better outcomes, increased customer satisfaction, and stronger model performance.

  2. Custom Solutions for Customers at Scale
    In today’s era of hyper-personalization (for consumer solutions) and customization (for B2B solutions), startups can tailor their AI solutions to individual customer needs.

    Proprietary customer data allows AI companies to create customized experiences, recommendations, and solutions that resonate with the needs of the business or with individuals. This personalized approach enhances customer loyalty, drives adoption, and fortifies the company’s market position.

  3. Barrier to Entry
    Proprietary data acts as a formidable barrier to entry for potential competitors. Building a comprehensive and unique dataset takes time, resources, and domain expertise.

    As AI companies amass and refine their proprietary data, it becomes increasingly challenging for new entrants to replicate their success. Since obtaining similar datasets is challenging or even impossible, it becomes difficult for rivals to replicate the offering. This helps companies establish market dominance and defend against new entrants.

Back to investing in AI. Our thesis is that to identify promising AI investments, investors should evaluate the depth, uniqueness, and relevance of a company’s proprietary data. In other words, assess the company’s “AI Moat”. Multiple companies in the Recursive portfolio, such as Placer.ai, Cherre.ai, Tomato.ai, Wevo, and CultureScience, harness this unfair advantage and deliver higher quality models and services due to their access to proprietary data.

Depth and uniqueness are fairly easy to investigate, but that isn’t enough. The proprietary data also needs to be data that the company can use to improve its AI models. Specifically, investors should assess the company’s ability to leverage the proprietary data for continuous model quality and performance improvements. Often the data needs significant work, labeling, or other techniques to actually be effective in creating an “AI Moat”.

The AI revolution is driven by data, and the companies with the most valuable and exclusive data will be tomorrow’s winners, as long as they can leverage the data to create a virtuous cycle and continuously improve their models and services.