
How Keymakr helped Evinced develop AI solutions for web accessibility

AI-powered tools, websites, and mobile apps

Intro

Evinced develops software solutions that help organizations build and maintain accessible web and mobile applications. The company focuses on identifying, monitoring, and fixing digital accessibility issues at both the user interface and source code levels, enabling teams to meet standards such as the Web Content Accessibility Guidelines (WCAG).

Digital accessibility aims to improve access for people with diverse abilities and assistive technologies, while helping organizations reduce legal risk and deliver inclusive user experiences. Maintaining accessibility on the web requires continuous validation and enforcement, because accessibility issues can be introduced at any stage of development and can change over time.

Evinced’s products are designed for engineering teams and integrate directly into existing development workflows, including IDEs, automated testing environments, and CI/CD pipelines.

As part of its product offering, Evinced provides automated and AI-assisted capabilities that analyze interfaces and code, surface accessibility issues, and support developers in resolving them during everyday development work. The effectiveness of these capabilities depends on accurate labeling and reliable ground truth data. In this context, high-quality datasets are a critical component of Evinced’s technology and the foundation for its collaboration with Keymakr.

The challenge

Keymakr was faced with two fundamentally different, yet closely connected projects:

1. Labeling high-quality datasets for understanding the structure of web and mobile interfaces, including all interactive elements on a page.

Because the project was long-term, expectations for delivery speed and required capacity varied over time. The annotation team had to scale flexibly, from a small group of specialists to several dozen, without interrupting or restarting established processes.

The large number of object types introduced another level of challenge. To achieve consistent, reliable results, specialists needed a deep understanding of web layouts and the logic behind interface construction.

2. Providing expert validation and analysis of code generated by AI assistants for developers against web accessibility requirements.

This part of the project required a highly specialized web accessibility expert with hands-on experience in validating interface solutions and working directly with accessibility standards. The role demanded a deep understanding of standards, real-world use cases, and complex edge cases. In addition, detailed knowledge of common errors and patterns was needed for successful RLHF (reinforcement learning from human feedback) loops.

The solution
Project 1. Annotation of web and mobile interfaces

The goal of this project was to annotate datasets that enable models to correctly “understand” the structure of digital interfaces and interpret them in the same way users do, especially people with different accessibility needs.

Within the project, annotations covered entire web pages. Every interface element (buttons, icons, links, content blocks, navigation panels, and pop-ups) was annotated with a bounding box and assigned a semantic description.

Multi-level annotation logic was established to achieve accuracy. The same area of the screen could contain multiple bounding boxes representing different levels of hierarchy. For example, a large bounding box could represent a navigation panel as a functional block, with groups of buttons annotated separately and, within those, individual icons or links.
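The multi-level logic described above can be illustrated with a minimal Python sketch. It is not the schema used in the project: the class name, field names, label strings, and coordinates are all assumptions chosen for illustration. It shows one way nested bounding boxes could be stored as a tree, along with a simple check that each child box lies inside its parent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Annotation:
    """One labeled region (hypothetical schema); pixels, origin at top-left."""
    label: str  # semantic category, e.g. "navigation_panel" (illustrative name)
    x: float
    y: float
    width: float
    height: float
    children: List["Annotation"] = field(default_factory=list)

    def contains(self, other: "Annotation") -> bool:
        """Return True if `other` lies fully inside this box."""
        return (
            self.x <= other.x
            and self.y <= other.y
            and other.x + other.width <= self.x + self.width
            and other.y + other.height <= self.y + self.height
        )


def walk(node: Annotation, depth: int = 0) -> Iterator[Tuple[int, Annotation]]:
    """Yield (depth, annotation) pairs in depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def nesting_errors(node: Annotation) -> List[str]:
    """List every child box that extends outside its parent box."""
    errors = []
    for child in node.children:
        if not node.contains(child):
            errors.append(f"{child.label} extends outside {node.label}")
        errors.extend(nesting_errors(child))
    return errors


# A navigation panel containing a button group, which contains an icon and a link.
nav = Annotation("navigation_panel", 0, 0, 1200, 80, [
    Annotation("button_group", 900, 10, 280, 60, [
        Annotation("icon", 910, 20, 40, 40),
        Annotation("link", 960, 20, 100, 40),
    ]),
])

for depth, node in walk(nav):
    print("  " * depth + node.label)
print(nesting_errors(nav))
```

A containment check of this kind is only one possible consistency rule; real annotation guidelines may allow children that overflow their parent, for example dropdown menus.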

The project covered approximately 5,500 website images, with a total of around 290,000 interface elements labeled. Pages contained 50–60 categories of object types. These included both standard elements (buttons, links, input fields) and more complex composite blocks such as footers, security notifications, modal windows, and interactive panels. Depending on page structure and interface version (web or mobile), the visual representation of elements could vary, but all annotations were normalized to a unified logic and data structure.
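To give a sense of annotation density, the two approximate totals above can be combined into an average number of labeled elements per image. The short calculation below uses only the rounded figures quoted in the text, so the result is a rough estimate rather than a reported project metric.

```python
# Approximate totals quoted in the case study (rounded figures).
images = 5_500
elements = 290_000

# Average number of labeled interface elements per image.
average_per_image = elements / images
print(round(average_per_image, 1))  # 52.7
```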

Over time, the project's scope and scale changed, so workflows needed to be adaptable. At different stages, delivery speed requirements shifted, and the annotation team scaled flexibly from 5 to 20 specialists without stopping or restarting the workflow.

As a result, a series of iterative datasets was labeled, with each new version undergoing coordinated review and calibration. This process was built on close collaboration with the Evinced team, who were deeply involved at every stage, and on a continuous, tightly coupled feedback loop. Ongoing feedback was used to refine annotation rules, align interpretations of complex and edge cases, and adapt to evolving requirements as the project progressed. This approach kept the process stable and yielded an annotation quality of 98.5%, providing a reliable foundation for training and further developing Evinced models.

Project 2. Code analysis and validation (web accessibility)

The second part of the collaboration focused on how LLM-based models analyze and validate source code in real-world development scenarios. This project addressed the practical review and validation of web interface solutions designed for users with disabilities. A key principle of the work was manual expert review: a highly specialized web accessibility expert from Keymakr validated outputs from AI assistants and coding systems to ensure the proposed code met accessibility requirements.

The expert’s role went far beyond formal syntax checks or high-level recommendations. Each request required contextual analysis, an understanding of interface logic, possible interaction scenarios, and potential corner cases.

Two primary expert workflows emerged as a result:

  • Analysis for accessible design functionality.

Here, the task was to structure requirements: distinguish between critical, desirable (nice-to-have), and optional elements, and define the correct implementation sequence. The value in these cases lies in outlining the solution's structure, as the order of actions and the implementation logic are often critical for accessibility-related tasks. The Keymakr expert therefore had to provide actionable recommendations for the model to learn from.

  • Evaluation of existing code fragments.

Here, the expert provided a detailed breakdown of which parts were implemented correctly, where accessibility standards were violated, and how those violations should be remediated. Typical requests concerned interactive elements and components with complex behavior. The analysis accounted for formal criteria as well as corner cases, content type, website domain, underlying technologies, and realistic interaction scenarios.

The expert could not rely on superficial feedback such as “this is incorrect - change required.” Each response required a detailed conclusion explaining the nature of the issue, the functionality it affected, its impact, which requirements were critical, and why the solution needed to be implemented in a specific way. Where relevant, the expert provided references to external standards and guidelines, such as those published by WebAIM.
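The elements of a detailed conclusion listed above could be captured in a structured record, as in the following Python sketch. The record layout, field names, and example finding are hypothetical and not taken from the project; the sketch only shows how a response missing the required explanation could be flagged automatically.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ReviewFinding:
    """One expert conclusion about a code fragment (hypothetical structure)."""
    issue: str                   # nature of the problem
    affected_functionality: str  # what the user can no longer do
    impact: str                  # who is affected and how
    critical: bool               # whether the requirement is critical
    remediation: str             # how and why to fix it in a specific way
    references: List[str]        # external standards or guidelines


def missing_parts(finding: ReviewFinding) -> List[str]:
    """Return names of fields left empty; a False boolean counts as filled in."""
    missing = []
    for f in fields(finding):
        value = getattr(finding, f.name)
        if not isinstance(value, bool) and not value:
            missing.append(f.name)
    return missing


# Superficial feedback of the kind the text rules out.
superficial = ReviewFinding("this is incorrect", "", "", False, "change required", [])

# Illustrative detailed finding (the example itself is an assumption).
detailed = ReviewFinding(
    issue="Clickable <div> has no role and no keyboard handler",
    affected_functionality="Opening the navigation menu",
    impact="Keyboard and screen reader users cannot activate the control",
    critical=True,
    remediation="Use a native <button> element, which is focusable by default",
    references=["WCAG 2.1.1 Keyboard"],
)

print(missing_parts(superficial))
print(missing_parts(detailed))
```

A completeness check like this only confirms that each part of the conclusion is present; judging whether the explanation is correct still requires the expert review described in the text.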

In terms of effort, requests varied significantly. Some cases required around 20 minutes, while others took several hours, depending on task complexity, the volume of supporting materials, and the depth of corner-case analysis. The overall pool included approximately 800 unique cases, each contributing to a validated expert knowledge base.

The core value of this project lies in building a durable repository of expert knowledge. Verified, well-structured answers were used both to support development workflows and to facilitate subsequent validation of web resources, particularly government and quasi-government websites that are legally required to comply with accessibility regulations. In this way, the system served a dual purpose: supporting the creation of accessible interfaces and acting as a compliance validation mechanism aligned with a common baseline standard and country-specific regulatory requirements.

Results

Keymakr helped Evinced establish a robust and repeatable system for data preparation and code validation. The outcomes of the collaboration directly impacted model training quality and the reliability of AI-driven tools for developers.

Scalable annotation system
A stable pipeline was built to annotate complex web and mobile interfaces, enabling flexible workload scaling without compromising quality.
High-quality datasets
The resulting datasets enabled an accurate understanding of interface structure and provided a reliable foundation for training AI in accessibility-focused scenarios.
Validated reference knowledge base for AI assistants
Expert code validation produced a durable set of verified recommendations used to train and improve AI coding systems that address accessibility challenges.
98.5% consistent annotation quality
An iterative, controlled process maintained a high, reproducible level of quality throughout the project.
Unified system combining CV, expert evaluation, and AI validation
The project became a unique and strategically important case in which data annotation, deep software expertise, and AI validation operated as a single ecosystem, strengthening Evinced’s product offering.

“This collaboration supported the continued development of our accessibility solutions. Throughout the engagement, we maintained a strong focus on quality and consistency while adapting to evolving requirements. The partnership contributed to refining our AI-driven processes and strengthening overall reliability. We appreciate the professionalism and commitment demonstrated throughout the project.”

Dudi Mazia, Evinced Data Operations Manager

"It was a pleasure to collaborate over the course of a year-long partnership. The project vision was clear from the very beginning, while the requirements evolved over time. We iteratively improved the workflows and consistently stayed within the planned boundaries, finding a balanced compromise between project goals and implementation effort. Together, we arrived at an excellent result by the end of the year, and look forward to even more improvements.

We are grateful to the Evinced team for their deep involvement and continuous feedback - this level of engagement was a key factor in achieving strong outcomes".

Gleb Zakharov, Keymakr PM
