Keymakr is the official annotation partner for RUKOPYS — the first open dataset of Ukrainian handwritten text
In April 2026, RUKOPYS, the first comprehensive open dataset of Ukrainian handwritten text, was officially released. As its official data annotation partner, Keymakr contributed hundreds of hours of ground truth labeling to the dataset.
The dataset was initiated by the Ministry of Economy, Environment, and Agriculture of Ukraine, with the support of the Ministry of Digital Transformation, in collaboration with the Ukrainian Catholic University and the Ukrainian non-profit organization AI HOUSE, which builds the largest AI community in Ukraine.
RUKOPYS serves as the foundation for the Handwritten to Data AI Challenge, an open competition where participants build computer vision solutions to recognize applications, certificates, logs, signatures, stamps, and archival documents. The challenge is designed to identify the best-performing solution and integrate it into the ePermit system, helping optimize Ukrainian ministry workflows and streamline administrative processes.
RUKOPYS covers more than a century of handwritten materials, from archival documents of the 1920s to modern notes, bringing together a wide range of writing styles, formats, and historical contexts.
Why Ukrainian handwriting matters for AI
Ukrainian is one of the most widely spoken Slavic languages, yet it has historically lacked large-scale open datasets for handwriting recognition. At the same time, Ukraine holds vast amounts of textual information across archives, libraries, universities, and government registries. Much of this data remains inaccessible because it exists only on paper, and a significant portion is handwritten, which makes large-scale digitization particularly complex.
According to Dmytro Voitekh, AI advisor to the Ministry of Economy of Ukraine, AI/ML Lead at Mriya, and an expert in the AI HOUSE community, this gap directly affects the development of modern AI systems:
“The Ukrainian language remains significantly underrepresented in the digital space. This directly affects the quality with which modern foundation models (GPT, Gemini, Claude, and others) handle Ukrainian text. Converting paper-based collections into structured datasets and incorporating them into publicly available sources and benchmarks is, in essence, a strategic objective. It is not only about training our own models, but also about providing existing flagship models with additional context, which, alone, will improve the quality and volume of Ukrainian-language content online without substantial additional costs.
“However, there is a serious obstacle on this path. Adapted models capable of recognizing Ukrainian handwritten text at scale and within feasible budgets do not yet exist. The available point datasets are small and do not capture the full complexity of the task: the diversity of handwriting styles, historical periods, and document types.
“At the same time, the need for such solutions extends far beyond archives. Government services still handle legacy documents that are absent from electronic registries, and processing them requires significant manual labor. In education, according to various studies, teachers spend up to 10% of their working time grading homework, and even a first level of automation, such as preliminary highlighting of potential errors, could substantially ease this burden.
“That is why the Ukrainian non-profit organization AI HOUSE, in collaboration with the Ministry of Economy, Environment, and Agriculture of Ukraine and the Ukrainian Catholic University, with support from the Ministry of Digital Transformation of Ukraine, decided to create the first comprehensive open dataset of Ukrainian handwritten text, which is called RUKOPYS.”
Processing handwritten archives
RUKOPYS combines materials collected in collaboration with national institutions, universities, and archival organizations. It is fully anonymized and released under an open license for research and education, making it accessible to developers, researchers, and institutions worldwide. Keymakr supported the creation of RUKOPYS with manual annotation performed by its in-house experts.
The project required careful handling of complex handwritten data, including variations in scripts, degradation of archival materials, and differences in formatting across decades.
“Our work was focused on structuring handwritten data at the line level with high precision,” said Zoya Boyko, PM at Keymakr. “We performed detailed validation for each text line, followed by accurate transcription and normalization, accounting for inconsistencies in handwriting, spacing, and document quality. Special attention was given to low-legibility cases and historical variations, with multi-stage quality control ensuring consistency across the dataset. This allowed us to convert handwritten content into a format suitable for training OCR and HTR models under real-world conditions.
“Throughout the project, we closely collaborated with AI HOUSE, discussing edge cases and refining the workflow. We also proposed process optimizations that helped improve overall efficiency and data quality. Initial expectations were exceeded, resulting in a treasure trove of valuable data for Ukrainian model training.”
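The line-level normalization and consistency checks described above can be illustrated with a small Python sketch. It is not Keymakr's actual workflow: the function names, the apostrophe mapping, and the 5% disagreement threshold are all assumptions chosen for illustration. The sketch normalizes a transcribed line (Unicode composition, apostrophe variants, spacing) and flags a line for another review pass when two independent transcriptions differ by more than the threshold.

```python
import re
import unicodedata

# Apostrophe look-alikes mapped to one character (U+2019). This mapping is an
# illustrative assumption, not a documented RUKOPYS convention.
APOSTROPHE_MAP = str.maketrans({"\u0027": "\u2019", "\u0060": "\u2019", "\u02bc": "\u2019"})


def normalize_line(text: str) -> str:
    """Compose Unicode characters, unify apostrophes, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(APOSTROPHE_MAP)
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def needs_review(first: str, second: str, threshold: float = 0.05) -> bool:
    """Flag a line when two normalized transcriptions disagree too much.

    The threshold is a hypothetical value; a real project would tune it.
    """
    a, b = normalize_line(first), normalize_line(second)
    if not a and not b:
        return False
    return edit_distance(a, b) / max(len(a), len(b)) > threshold
```

With this sketch, two transcriptions that differ only in spacing or apostrophe style are treated as identical, while a single changed letter in a short word is enough to send the line back for review.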
Foundation for an open competition
The dataset serves as the basis for the Handwritten to Data AI Challenge, an open competition where participants build computer vision solutions to recognize applications, certificates, logs, signatures, stamps, and archival documents, helping optimize government workflows.
AI/ML engineers, data scientists, researchers, engineering students, startups, R&D teams, and university labs gained access to RUKOPYS, the ability to train models on Amazon Web Services, and the chance to test their solutions in practical scenarios. The prize pool totals $7,000, and selected solutions may be integrated into government systems.
The goal of the Handwritten to Data AI Challenge is to accelerate document processing, improve access to archival data, and support the development of digital public services.
A long-term initiative
RUKOPYS is only the first iteration of the dataset. The AI HOUSE team and partners plan to expand it further, with new data sources and partners already being added.
The broader objective is to create a sustainable pipeline for digitizing handwritten content and making it usable for both public and private sector applications.
For Keymakr, participation in the project reflects a wider focus on building reliable data pipelines for complex and challenging AI use cases, from document processing to multimodal systems, while also contributing to initiatives with public impact.
As Inna Nomerovska, Chief Marketing Officer at Keymakr, noted:
“Ukraine currently ranks 5th in the world in the development of digital public services and aims to become one of the three world leaders in the use of artificial intelligence in the public sector by 2030. We’re just doing our small part by helping with the digitization of documents. Even without focusing too much on future applications, it’s already a powerful initiative for preserving and supporting the Ukrainian language, culture, and historical records.”