Citadel AI (HQ: Shibuya, Tokyo; CEO: Hironori “Rick” Kobayashi), a leader in trustworthy AI technology, is announcing the release of LangCheck, an open source library available on GitHub. LangCheck is a multilingual toolkit for evaluating and improving the reliability of LLM applications.
LangCheck provides building blocks for developers and organizations to easily create evaluations, guardrails, and monitoring for their LLM applications. It is multilingual and, as of the launch date, supports both English and Japanese text.
Additionally, we have started a series of blog posts focusing on the engineering challenges of developing LLM applications. The first article describes our own experiences developing LLM-powered features at an internal hackathon (Japanese only).
Capabilities and Risks of Large Language Models
The LLMs released by OpenAI, Google, Meta, and others have made incredible technological progress, and have the potential to revolutionize how we work and live.
However, these models are primarily trained with data sourced from the internet, and sometimes pick up incorrect or unsuitable information. This issue is particularly pronounced in specialized domains or non-English languages, and leads to inaccurate outputs.
When organizations integrate these LLM applications into their business processes, a major challenge is ensuring the quality and reliability of LLM-generated outputs. Inappropriate outputs not only inconvenience users, but also pose risks to a company’s brand and compliance requirements.
Therefore, it’s important for organizations not only to establish a governance framework for the use of generative language models, but also to implement technical controls through LLM validation and monitoring systems.
Reliability of LLM Applications
When companies try to utilize LLMs, it’s often impractical to train foundation models such as GPT-3.5 (the model underlying ChatGPT) from scratch. Instead, the common approach is to develop LLM applications that wrap existing, third-party LLMs for specific business purposes, as depicted in the diagram below:
In these cases, it’s difficult for companies to directly control the quality and reliability of the third-party foundation model. Therefore, there are two key strategies for LLM application developers:
- Directly try to improve the quality of outputs (the “offensive” approach)
- Validate and suppress problematic outputs (the “defensive” approach)
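The “defensive” approach can be sketched as a guardrail that scores each candidate output and suppresses it when the score crosses a threshold. The sketch below is purely illustrative: the `toxicity_score` function and its threshold are hypothetical stand-ins for whatever real metric an application would use, not part of any library’s API.

```python
# Illustrative "defensive" guardrail: suppress an LLM output when a
# quality metric flags it. `toxicity_score` is a hypothetical stand-in
# for a real metric (e.g., a trained classifier); here it just counts
# a few flagged keywords so the sketch is self-contained.
FALLBACK = "Sorry, I can't provide a response to that request."

def toxicity_score(text: str) -> float:
    """Toy scorer: fraction of flagged words in the text (placeholder)."""
    flagged = {"stupid", "hate"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in flagged for w in words) / len(words)

def guard_output(llm_output: str, threshold: float = 0.1) -> str:
    """Return the LLM output if it passes the check, else a safe fallback."""
    if toxicity_score(llm_output) > threshold:
        return FALLBACK
    return llm_output
```

In a production system, the toy scorer would be replaced with a real metric (such as those provided by LangCheck), and the threshold tuned on evaluation data.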
Compared to traditional ML models, the inputs and outputs of LLMs are free-form text with no standardized format, which makes it difficult to define metrics that evaluate the quality of LLM-generated text. Therefore, new evaluation methods specific to LLMs are necessary. LangCheck contains a suite of state-of-the-art tools to enable developers to easily evaluate the quality of their LLM applications.
Citadel AI’s Open Source LangCheck Library
To solve these LLM reliability problems, Citadel AI has released LangCheck to enable developers and businesses to create and operate high-quality LLM applications with confidence.
LangCheck is a comprehensive toolkit of metrics and visualizations to evaluate LLM applications. Since it’s designed as a library of building blocks, you can easily adapt it to various use cases: evaluating LLM-generated text, monitoring anomalies in production, and providing guardrails on outputs. We plan to expand support to multiple languages, starting from evaluating Japanese and English text. Here are some example metrics included in LangCheck:
- Similarity checks against ground truth text in your evaluation dataset
- Factual consistency checks to measure the degree of consistency with factual information
- Schema checks for structured input and output text
- Toxicity checks for harmful or discriminatory output
- Fluency checks for errors in grammar, vocabulary, etc.
- Sentiment checks for positive and negative expressions
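To illustrate the first metric type, a similarity check compares generated text against a reference answer from an evaluation dataset. The sketch below uses Python’s standard-library `difflib` as a conceptual stand-in; LangCheck’s own similarity metrics are model-based and more robust, so treat this function as an assumption-laden illustration, not LangCheck’s API.

```python
import difflib

def similarity(generated: str, reference: str) -> float:
    """Character-level similarity ratio in [0, 1], where 1.0 means identical.

    A simple stand-in for model-based semantic similarity metrics: it
    rewards shared character sequences, not shared meaning.
    """
    return difflib.SequenceMatcher(None, generated, reference).ratio()

# Compare a generated answer against ground truth from an eval dataset.
score = similarity(
    "Tokyo is the capital of Japan.",
    "The capital of Japan is Tokyo.",
)
```

A character-level ratio would score a correct paraphrase poorly, which is exactly why libraries like LangCheck use learned, semantic metrics for this kind of check.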
We hope you’ll find LangCheck useful, and we welcome your requests, contributions, and feedback.
For more information on LangCheck, including our blog posts on the engineering challenges of LLMs and LLM applications, please follow the link below (Japanese only).