DeepMark.AI

Deepmark AI empowers organizations to make informed decisions when navigating through the most important performance metrics of Large Language Models.

Introduction

Artificial Intelligence (AI) is expected to contribute approximately $15.7 trillion to the global economy by 2030, according to a study by PwC. As AI plays an increasingly important role across domains, Generative AI and Large Language Models (LLMs) have emerged as key building blocks for AI-powered applications capable of generating substantial business value.

Why are We Doing This? - Problem Statement

AI sparked a revolution over the last decade, and AI subject matter experts at MIT (https://horizon.mit.edu/about-us) believe that Generative AI will further transform domains such as code development, chatbots, and audio and video, among many others. As Generative AI companies such as OpenAI advance products such as ChatGPT, legal, ethical, and trust issues have emerged. These challenges call for rigorous assessment of these products, including metrics that can be used to improve and rank the algorithms and models that drive the technology. The lack of such assessment is also a roadblock to GenAI adoption in many companies today, and it raises questions about the trustworthiness of these platforms. For example, ChatGPT was banned at Samsung after an employee inadvertently pasted proprietary material into the platform, which could potentially be used by competitors, since information flows back to OpenAI. Recently, Hollywood workers went on strike, and several lawsuits have alleged copyright misuse by Generative AI solutions that can imitate real humans with high precision.

According to a recent HBR report, Generative AI cannot operate on a set-it-and-forget-it basis: the tools need constant oversight.

Although assessment metrics are clearly defined, and intrinsic metrics are normally assessed almost immediately when an LLM is released, there are no available tools (open-source or proprietary) that enable developers to seamlessly make task-specific (extrinsic) assessments. The closest solution is LangChain's LangSmith, but it is a low-code library that is still in closed beta and is not yet mature enough to provide the comprehensive metrics that are essential for adoption.

In summary, organizations need to be able to assess AI models on their own data to deliver verifiable results that balance accuracy, precision, recall (the model's ability to correctly identify the positive cases within a given dataset), and reliability. Reliability matters because models can produce different answers to the same prompt, which impedes the user's ability to assess the accuracy of outputs.

Our Solution

To address these challenges of trustworthiness and reliability, IngestAI Labs has developed Deepmark AI, a benchmarking solution based on proprietary machine learning (ML) models that can rank several of the most popular large language models on various intrinsic and task-specific metrics.

Current GenAI (LLM) Assessment Metrics

When it comes to assessing the performance of LLMs, there are two main types of metrics that can be used: intrinsic and extrinsic.

Examples of intrinsic metrics include, but are not limited to:

  • Entropy,
  • Perplexity,
  • Coherence, etc.
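Of these, perplexity and entropy have standard closed-form definitions. Perplexity is the exponential of the average negative log-likelihood that a model assigns to the tokens of a text (lower means the text was less surprising to the model), and entropy measures how spread out a model's next-token distribution is. The sketch below assumes the token probabilities have already been obtained from a model; the numbers are illustrative only, and this is not a description of how Deepmark AI computes these metrics.

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

def entropy(distribution: list[float]) -> float:
    """Shannon entropy, in nats, of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in distribution if p > 0)

# Illustrative probabilities the model assigned to each actual token of a text.
observed = [0.5, 0.25, 0.125]
print(round(perplexity(observed), 3))  # 4.0
# A uniform distribution over 4 candidate tokens has entropy ln(4).
print(round(entropy([0.25] * 4), 3))   # 1.386
```

A perplexity of 4 means the model was, on average, as uncertain as if it were choosing uniformly among four tokens.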

Extrinsic metrics, also called task-specific metrics, may include:

  • Accuracy,
  • Latency,
  • Cost.

These assessment metrics are not exhaustive, and specific applications may have additional or alternative metrics depending on the context and requirements; however, task-specific metrics such as latency, accuracy, and cost are among the most commonly used.
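When several models are benchmarked on these three metrics, one simple way to turn them into a single ranking is to min-max normalize each metric so that 1 is best and combine the results with weights that reflect the use case. This is a minimal sketch with assumed weights and made-up model names and numbers; it is not Deepmark AI's ranking method.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    name: str
    accuracy: float   # fraction of correct answers (higher is better)
    latency_s: float  # mean seconds per request (lower is better)
    cost_usd: float   # cost per 1,000 requests in USD (lower is better)

def normalize(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def rank(results: list[ModelResult], w_acc: float = 0.5,
         w_lat: float = 0.25, w_cost: float = 0.25) -> list[tuple[float, str]]:
    """Return (weighted score, model name) pairs, best first. Weights are assumptions."""
    acc = normalize([r.accuracy for r in results], higher_is_better=True)
    lat = normalize([r.latency_s for r in results], higher_is_better=False)
    cost = normalize([r.cost_usd for r in results], higher_is_better=False)
    scored = [(w_acc * a + w_lat * l + w_cost * c, r.name)
              for r, a, l, c in zip(results, acc, lat, cost)]
    return sorted(scored, reverse=True)

# Hypothetical benchmark results for three unnamed models.
results = [
    ModelResult("model-a", accuracy=0.92, latency_s=1.8, cost_usd=4.00),
    ModelResult("model-b", accuracy=0.88, latency_s=0.6, cost_usd=0.50),
    ModelResult("model-c", accuracy=0.75, latency_s=0.9, cost_usd=0.30),
]
for score, name in rank(results):
    print(f"{name}: {score:.2f}")
```

Normalization is relative to the models being compared, so adding or removing a model can change every score; a Pareto comparison is an alternative when fixed weights are hard to justify.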

Deepmark AI provides a unique testing environment for large language models (LLMs), allowing AI developers to easily diagnose inaccuracies and performance issues. With Deepmark AI, AI application developers can run LLMs for hundreds or thousands of iterations over specific tasks and get assessment results in seconds.

DeepMark AI is a tool specifically designed for AI builders. It focuses on real-time, iterative assessment of extrinsic and some intrinsic metrics to identify predictable, reliable, and cost-effective Generative AI models based on the unique needs of a particular use case. DeepMark.AI offers cutting-edge capabilities and comprehensive assessments of important performance metrics, such as:

Extrinsic (task-specific) metrics

  • Question answering accuracy
  • Text classification accuracy
  • PII recognition accuracy
  • Named entity recognition (NER) accuracy
  • Summarization quality (Relevance)
  • Sentiment analysis accuracy
  • Cost analysis
  • Failure rate
  • Fake data
  • Accuracy
  • Latency
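Accuracy for NER and PII recognition is commonly reported as entity-level precision, recall, and F1. The sketch below uses exact matching, where a predicted entity counts only if its character span and label both match a gold entity; the (start, end, label) tuple format and the sample sentence are assumptions for illustration, not Deepmark AI's scoring scheme.

```python
def entity_prf(gold: set[tuple[int, int, str]],
               predicted: set[tuple[int, int, str]]) -> tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1.

    Each entity is (start_char, end_char, label).
    """
    true_pos = len(gold & predicted)  # predictions whose span and label both match
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

text = "Jane Doe emailed jane@example.com from Berlin."
gold = {(0, 8, "PERSON"), (17, 33, "EMAIL"), (39, 45, "LOCATION")}
# Hypothetical model output: "Berlin" is found but mislabeled as an organization.
predicted = {(0, 8, "PERSON"), (17, 33, "EMAIL"), (39, 45, "ORG")}
print(tuple(round(x, 3) for x in entity_prf(gold, predicted)))  # (0.667, 0.667, 0.667)
```

Because the mislabeled span counts as both a false positive and a false negative, all three scores are 2/3; partial-overlap matching is a more lenient alternative.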

User Adoption

Since its launch in February 2023, IngestAI has quickly gained popularity as a community-driven platform for rapid exploration, experimentation, and prototyping of various AI use cases.

The platform has gained significant industry recognition:

  • Accepted to the StartX AI Series program,
  • Named ProductHunt Product of the Day,
  • Selected for the PLUGandPLAY Silicon Valley acceleration program, and
  • Backed by the Cohere Acceleration Program.

In less than one year, IngestAI has amassed a user base of over 40,000 individuals, with nearly 15,000 monthly active users and a few NASDAQ-traded companies among its customers and in its pipeline. This level of traction speaks to the platform's ability to attract and engage users and to generate business value.

Key Features of Deepmark AI

Reliability Assessment

Reliability is a critical factor in determining the effectiveness of Generative AI models. DeepMark.AI offers comprehensive reliability assessments by evaluating model performance under various conditions and capturing potential failure points. This enables developers to identify areas for improvement and enhance the overall reliability of their AI applications.
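Because models can produce different answers to the same prompt, one way to quantify reliability is to send the same prompt many times and measure how often the most common answer appears. This is a minimal sketch: `fake_model` is a hypothetical stand-in for a real LLM call, and the lowercase/whitespace normalization is an assumption that suits short factual answers.

```python
import random
from collections import Counter
from typing import Callable

def consistency(generate: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    """Share of runs that return the most common answer (after light normalization)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [generate(prompt).strip().lower() for _ in range(runs)]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / runs

# Hypothetical stand-in for an LLM: answers "Paris" roughly 90% of the time.
rng = random.Random(42)

def fake_model(prompt: str) -> str:
    return "Paris" if rng.random() < 0.9 else "Lyon"

print(consistency(fake_model, "What is the capital of France?", runs=50))
```

A score of 1.0 means every run agreed. For free-form outputs, exact string matching is too strict, and a semantic-similarity comparison would be needed instead.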

Accuracy Evaluation

Ensuring the accuracy of Generative AI models is essential for generating high-quality outputs. DeepMark.AI provides developers with tools to rigorously evaluate the accuracy of their models through extensive testing and validation procedures. By leveraging advanced statistical techniques and comparison methodologies, developers can derive meaningful insights into the accuracy of their Generative AI applications.
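The section does not specify which statistical techniques are used. As one standard example, reporting accuracy with a confidence interval shows whether a difference between two models is larger than the noise from a finite test set. This sketch uses the Wilson score interval; the function name and the 86-of-100 figure are illustrative assumptions.

```python
import math

def accuracy_with_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (accuracy, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximately 95% confidence level.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, max(0.0, center - margin), min(1.0, center + margin)

acc, lower, upper = accuracy_with_ci(86, 100)
print(f"accuracy={acc:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

For 86 correct answers out of 100, this gives an accuracy of 0.86 with a 95% interval of roughly 0.78 to 0.91, so a gap of a few points between two models on a test set this small should be interpreted with caution.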

Cost Analysis

Understanding the cost implications before deploying Generative AI models is vital for optimizing resource allocation and maximizing return on investment. DeepMark.AI incorporates cost analysis, enabling developers to make precise estimations of the financial requirements associated with running their AI applications on different GenAI models. By providing cost projections, DeepMark.AI helps developers make informed decisions to achieve cost-effective solutions.
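Most hosted LLM APIs bill per token, typically with separate prices for input (prompt) and output (completion) tokens, so a cost projection reduces to average token counts times prices times volume. In this sketch, the model names, prices, and traffic figures are hypothetical placeholders; real token counts would come from running a tokenizer over your own data, and real prices from each provider's current price list.

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    input_per_1k: float   # USD per 1,000 input (prompt) tokens
    output_per_1k: float  # USD per 1,000 output (completion) tokens

def monthly_cost(pricing: Pricing, requests_per_day: int,
                 avg_input_tokens: float, avg_output_tokens: float,
                 days: int = 30) -> float:
    """Projected spend in USD for a given traffic level and average token usage."""
    per_request = (avg_input_tokens / 1000 * pricing.input_per_1k
                   + avg_output_tokens / 1000 * pricing.output_per_1k)
    return per_request * requests_per_day * days

# Hypothetical prices for illustration only.
models = {
    "model-a": Pricing(input_per_1k=0.01, output_per_1k=0.03),
    "model-b": Pricing(input_per_1k=0.0005, output_per_1k=0.0015),
}
for name, price in models.items():
    cost = monthly_cost(price, requests_per_day=10_000,
                        avg_input_tokens=800, avg_output_tokens=200)
    print(f"{name}: ${cost:,.2f} per month")
```

With these placeholder numbers, model-a comes to $4,200 per month and model-b to $210, which shows how wide the spread between models can be at the same traffic level.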

Relevance Assessment

Ensuring the relevance of generated outputs is critical, especially in applications where Generative AI is employed to address specific use cases. DeepMark.AI facilitates relevance assessment by providing developers with tools to compare generated outputs against desired criteria. This allows developers to fine-tune their models and ensure the generated content aligns with the intended goals and requirements.
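The section does not say how relevance is scored. A simple, widely used lexical proxy for comparing a generated output with a reference answer or summary is ROUGE-1, the F1 of overlapping words. The sketch below is a simplified version (lowercased alphanumeric tokens, no stemming) for illustration only; it is not Deepmark AI's relevance method, and the example sentences are made up.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 (ROUGE-1 style) between a generated text and a reference."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # shared words, clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "The invoice is due on March 3"
generated = "Payment for the invoice is due March 3"
print(round(rouge1_f1(generated, reference), 2))  # 0.8
```

Lexical overlap rewards shared wording and can miss paraphrases, so embedding-based similarity or a model-graded rubric is often used alongside it.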

Latency Assessment

Assessing the latency of Generative AI model APIs is critical to delivering high-quality, efficient AI-powered applications. Latency is the time taken to receive a response after a request is made, and it is a key indicator of performance. By evaluating latency, AI developers can identify inefficiencies and ensure that AI applications perform at optimal speed. This contributes to overall user satisfaction and affects the reliability and credibility of AI applications.
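Because a few slow responses can dominate user experience, latency is usually summarized with percentiles such as p50 (the median) and p95 rather than only the mean. In this minimal sketch, `call` is any zero-argument function that performs one request, the sleep-based stand-in is hypothetical, and the nearest-rank percentile is one of several common definitions.

```python
import math
import time
from typing import Callable

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def measure_latency(call: Callable[[], object], n: int = 100) -> dict[str, float]:
    """Time n sequential calls and summarize wall-clock latency in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }

# Hypothetical stand-in for an API request that takes about 10 ms.
stats = measure_latency(lambda: time.sleep(0.01), n=20)
print({k: round(v * 1000, 1) for k, v in stats.items()})  # values in milliseconds
```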

Failure Rate Assessment

Assessing and monitoring failure rates across hundreds or thousands of requests is essential for evaluating the robustness of Generative AI applications. DeepMark.AI offers failure rate assessment capabilities, allowing developers to track failure rates at various scales, from hundreds to thousands of requests per second. By providing insights into potential failure patterns, DeepMark.AI enables developers to proactively address issues and maintain optimal performance.
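A basic failure-rate check can be sketched as sending a batch of requests concurrently and counting the ones that raise an error, grouped by error type so that failure patterns are visible. The `flaky_call` function is a hypothetical stand-in that fails on every 20th request; in practice it would wrap a real API call, and failures might also include responses that fail validation. This sketch measures failures over a fixed batch rather than controlling the number of requests per second.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def failure_rate(call: Callable[[int], object], n: int = 1000,
                 workers: int = 16) -> tuple[float, Counter]:
    """Run n requests on a thread pool; return the failure rate and error counts by type."""
    if n < 1:
        raise ValueError("n must be at least 1")

    def attempt(i: int) -> Optional[str]:
        try:
            call(i)
            return None
        except Exception as exc:  # any raised error (timeout, HTTP error, ...) counts as a failure
            return type(exc).__name__

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(attempt, range(n)))
    errors = Counter(o for o in outcomes if o is not None)
    return sum(errors.values()) / n, errors

def flaky_call(i: int) -> str:
    """Hypothetical endpoint: times out on every 20th request, succeeds otherwise."""
    if i % 20 == 0:
        raise TimeoutError("simulated timeout")
    return "ok"

rate, errors = failure_rate(flaky_call, n=1000)
print(rate, dict(errors))  # 0.05 {'TimeoutError': 50}
```

A thread pool suits this because API calls spend most of their time waiting on the network rather than using the CPU.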

Key Benefits of Deepmark AI

Incorporating the DeepMark.AI technology developed by IngestAI Labs into an AI development workflow can yield numerous advantages, including:

Predictability and Cost-effectiveness

DeepMark.AI prioritizes predictability and cost-effectiveness by providing developers with reliable assessment metrics, cost estimations, and optimization recommendations. This empowers developers to make informed decisions, reducing the risks associated with designing and deploying Generative AI applications.

Data-driven Decision-making

By leveraging data and rigor, DeepMark.AI enables organizations to move away from relying solely on intuition when assessing Generative AI models. This data-driven approach instills confidence in the decision-making process, allowing for greater precision and accuracy in AI application development.

Enhanced Application Quality

The ability of DeepMark.AI to comprehensively assess reliability, accuracy, relevance, and cost-efficiency contributes to enhancing the overall quality of AI applications. Through continuous monitoring or periodic assessment, developers can iteratively improve their models' performance (e.g., by improving meta-prompts or fine-tuning), ensuring optimal performance and user satisfaction.

Path Forward

IngestAI is building its own safety and bias detection models based on a proprietary comparative dataset of more than 7.5 million varied requests to and responses from different large language models. This data is being labeled and used to train, test, and refine models that identify bias-related contexts and detect and resolve biases and unsafe prompts or responses in real time.

Deepmark AI is a tool for AI application developers, built on top of proprietary ML models, that provides reliable assessments of predictability, accuracy, cost-efficiency, and other benchmark metrics. By prioritizing safety, truthfulness, predictability, and cost-effectiveness, while leveraging data and rigor, Deepmark AI empowers developers to build high-quality, reliable Generative AI-powered applications. With its comprehensive features and benefits, Deepmark AI opens up new possibilities for organizations seeking to harness the true potential of Generative AI.
