LatticeFlow's LLM Framework Benchmarks Big AI's Compliance with the EU AI Act

While most countries’ lawmakers are still discussing how to put guardrails around artificial intelligence, the European Union is ahead of the pack, having passed a risk-based framework for regulating AI apps earlier this year. The law came into force in August, although full details of the pan-EU AI governance regime are still being worked out; Codes of Practice are still being devised, for example. But over the coming months and years, the law’s tiered provisions will start to apply to AI app and model makers, so the compliance countdown is already ticking.

Evaluating whether and how AI models are meeting their legal obligations is the next challenge. Large language models (LLMs), and other so-called foundation or general purpose AIs, will underpin most AI apps. So focusing assessment efforts at this layer of the AI stack seems important.

Step forward LatticeFlow AI, a spin out from public research university ETH Zurich, which is focused on AI risk management and compliance.

On Wednesday, it published what it’s touting as the first technical interpretation of the EU AI Act, meaning it’s sought to map regulatory requirements to technical ones, alongside an open-source LLM validation framework that draws on this work — which it’s calling Compl-AI (‘compl-ai’… see what they did there!).

The AI model evaluation initiative — which they also dub “the first regulation-oriented LLM benchmarking suite” — is the result of a long-term collaboration between the Swiss Federal Institute of Technology and Bulgaria’s Institute for Computer Science, Artificial Intelligence and Technology (INSAIT), per LatticeFlow.

AI model makers can use the Compl-AI site to request an evaluation of their technology’s compliance with the requirements of the EU AI Act.

LatticeFlow has also published model evaluations of several mainstream LLMs, such as different versions/sizes of Meta’s Llama models and OpenAI’s GPT, along with an EU AI Act compliance leaderboard for Big AI.

The latter ranks the performance of models from the likes of Anthropic, Google, OpenAI, Meta and Mistral against the law’s requirements — on a scale of 0 (i.e. no compliance) to 1 (full compliance).

Other evaluations are marked as N/A where there’s a lack of data, or if the model maker doesn’t make the capability available. (NB: At the time of writing there were also some minus scores recorded but we’re told that was down to a bug in the Hugging Face interface.)

LatticeFlow’s framework evaluates LLM responses across 27 benchmarks, including “toxic completions of benign text”, “prejudiced answers”, “following harmful instructions”, “truthfulness” and “common sense reasoning”, to name a few of the categories. Each model therefore gets a score in each benchmark column (or else N/A).
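To make the scoring scheme concrete, here is a minimal Python sketch of how one model's scorecard could be represented: each benchmark maps to a score between 0 and 1, or to `None` for N/A. The benchmark keys and values are invented placeholders, not results from the Compl-AI leaderboard, and the helper functions are hypothetical illustrations rather than part of LatticeFlow's framework.

```python
from typing import Optional

# Hypothetical scorecard: benchmark name -> score in [0, 1], or None for N/A.
# All names and numbers here are placeholders, not real leaderboard data.
Scorecard = dict[str, Optional[float]]


def validate_scorecard(scores: Scorecard) -> list[str]:
    """Return the benchmarks whose scores fall outside the 0-to-1 scale."""
    return [
        name
        for name, value in scores.items()
        if value is not None and not 0.0 <= value <= 1.0
    ]


def evaluated_benchmarks(scores: Scorecard) -> list[str]:
    """Return the benchmarks that have a score (i.e. are not N/A)."""
    return [name for name, value in scores.items() if value is not None]


example_model: Scorecard = {
    "toxic_completions_of_benign_text": 0.9,
    "prejudiced_answers": 0.8,
    "truthfulness": 0.5,
    "watermark_reliability": None,   # N/A: not evaluated
    "common_sense_reasoning": -0.1,  # out of range, e.g. a display bug
}

print(validate_scorecard(example_model))
print(evaluated_benchmarks(example_model))
```

A range check like this would surface the kind of minus scores mentioned above, while N/A entries are simply skipped rather than treated as zero.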

AI Compliance: A Mixed Bag

So how did major LLMs do? There is no overall model score, so performance varies depending on exactly what’s being evaluated, but there are some notable highs and lows across the various benchmarks.

For example, all the models perform strongly on not following harmful instructions, and relatively strongly on not producing prejudiced answers, whereas reasoning and general knowledge scores were a much more mixed bag.

Elsewhere, recommendation consistency, which the framework is using as a measure of fairness, was particularly poor for all models — with none scoring above the halfway mark (and most scoring well below).

Other areas, such as training data suitability and watermark reliability and robustness, appear essentially unevaluated on account of how many results are marked N/A.

LatticeFlow does note there are certain areas where models’ compliance is more challenging to evaluate, such as hot button issues like copyright and privacy. So it’s not pretending it has all the answers.

In a paper detailing work on the framework, the scientists involved in the project highlight how most of the smaller models they evaluated (≤ 13B parameters) “scored poorly on technical robustness and safety”.

They also found that “almost all examined models struggle to achieve high levels of diversity, non-discrimination, and fairness”.

“We believe that these shortcomings are primarily due to model providers disproportionally focusing on improving model capabilities, at the expense of other important aspects highlighted by the EU AI Act’s regulatory requirements,” they add, suggesting that as compliance deadlines start to bite, LLM makers will be forced to shift their focus onto areas of concern, “leading to a more balanced development of LLMs”.

Given no one yet knows exactly what will be required to comply with the EU AI Act, LatticeFlow’s framework is necessarily a work in progress. It is also only one interpretation of how the law’s requirements could be translated into technical outputs that can be benchmarked and compared. But it’s an interesting start on what will need to be an ongoing effort to probe powerful automation technologies and try to steer their developers towards safer utility.

“The framework is a first step towards a full compliance-centered evaluation of the EU AI Act — but is designed in a way to be easily updated to move in lock-step as the Act gets updated and the various working groups make progress,” LatticeFlow CEO Petar Tsankov told TechCrunch. “The EU Commission supports this. We expect the community and industry to continue to develop the framework towards a full and comprehensive AI Act assessment platform.”

Summarizing the main takeaways so far, Tsankov said it’s clear that AI models have “predominantly been optimized for capabilities rather than compliance”. He also flagged “notable performance gaps” — pointing out that some high capability models can be on a par with weaker models when it comes to compliance.

Cyberattack resilience (at the model level) and fairness are areas of particular concern, per Tsankov, with many models scoring below 50% for the former area.
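As a rough illustration of what "scoring below 50%" means on the leaderboard's 0-to-1 scale, the sketch below flags models under a threshold for a single benchmark category. The model names and scores are placeholders invented for this example, and `below_threshold` is a hypothetical helper, not something taken from Compl-AI.

```python
from typing import Optional

THRESHOLD = 0.5  # the "50%" mark, expressed on the 0-to-1 scale


def below_threshold(
    scores_by_model: dict[str, Optional[float]],
    threshold: float = THRESHOLD,
) -> list[str]:
    """Return models scoring below the threshold, skipping N/A (None) entries."""
    return sorted(
        model
        for model, score in scores_by_model.items()
        if score is not None and score < threshold
    )


# Placeholder values for one benchmark category; not real results.
cyberattack_resilience = {
    "model-a": 0.75,
    "model-b": 0.25,
    "model-c": None,  # N/A
    "model-d": 0.5,
}

print(below_threshold(cyberattack_resilience))  # ['model-b']
```

Models marked N/A are left out rather than counted as failures, since a missing evaluation says nothing about compliance either way.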

“While Anthropic and OpenAI have successfully aligned their (closed) models to score against jailbreaks and prompt injections, open-source vendors like Mistral have put less emphasis on this,” he said.

And with “most models” performing equally poorly on fairness benchmarks he suggested this should be a priority for future work.

On the challenges of benchmarking LLM performance in areas like copyright and privacy, Tsankov explained: “For copyright the challenge is that current benchmarks only check for copyright books. This approach has two major limitations: (i) it does not account for potential copyright violations involving materials other than these specific books, and (ii) it relies on quantifying model memorization, which is notoriously difficult. 

“For privacy the challenge is similar: the benchmark only attempts to determine whether the model has memorized specific personal information.”
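To give a sense of what a memorization check can look like, here is a deliberately simple sketch: prompt a model with the start of a known passage and measure how much of the held-back remainder it reproduces verbatim. This is an assumption-laden illustration, not how Compl-AI's copyright or privacy benchmarks are implemented; `generate` stands in for any text-completion function, and the "model" below is a stub.

```python
from typing import Callable


def memorized_fraction(
    generate: Callable[[str], str],
    passage: str,
    prefix_len: int,
) -> float:
    """Fraction of the held-back text that the model reproduces verbatim.

    Prompts with the first `prefix_len` characters of `passage` and counts
    how many leading characters of the completion match the true continuation.
    """
    prefix, continuation = passage[:prefix_len], passage[prefix_len:]
    if not continuation:
        return 0.0
    completion = generate(prefix)
    matched = 0
    for expected, actual in zip(continuation, completion):
        if expected != actual:
            break
        matched += 1
    return matched / len(continuation)


# Stand-in passage and stub "model" that has memorized it; purely illustrative.
KNOWN = "the quick brown fox jumps over the lazy dog"


def parrot(prompt: str) -> str:
    """Hypothetical model: completes KNOWN verbatim, otherwise returns nothing."""
    return KNOWN[len(prompt):] if KNOWN.startswith(prompt) else ""


print(memorized_fraction(parrot, KNOWN, prefix_len=10))  # 1.0
```

An exact-match test like this only covers the specific passages it is given and misses paraphrased or partial reproduction, which echoes the limitations Tsankov describes.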

LatticeFlow is keen for the free and open source framework to be adopted and improved by the wider AI research community.

“We invite AI researchers, developers, and regulators to join us in advancing this evolving project,” said Professor Martin Vechev of ETH Zurich, who is also founder and scientific director at INSAIT and is involved in the work, in a statement. “We encourage other research groups and practitioners to contribute by refining the AI Act mapping, adding new benchmarks, and expanding this open-source framework.

“The methodology can also be extended to evaluate AI models against future regulatory acts beyond the EU AI Act, making it a valuable tool for organizations working across different jurisdictions.”

Summary:

  • LatticeFlow has developed a framework to benchmark the compliance of LLMs with the EU AI Act.
  • The framework evaluates LLMs across 27 benchmarks, including toxic completions, prejudiced answers, following harmful instructions, truthfulness, and common sense reasoning.
  • The results show that LLMs have been primarily optimized for capabilities rather than compliance, with notable performance gaps across different benchmarks.
  • Areas of concern include cyberattack resilience, fairness, and copyright and privacy.
  • LatticeFlow is encouraging the AI community to contribute to the open-source framework.
