Keymakr launches egocentric and robotics training data solutions for Physical AI systems

Keymakr has officially launched a new suite of tools and services focused on egocentric data for Physical AI and robotics systems. It combines complex multimodal data collection "in the field" and the updated infrastructure of its own SaaS platform, Keylabs. This step confirms that the industry is now moving towards "embodied" AI, in which models should interact with the real world in real time, control objects, and interpret their surroundings.

Why standard datasets are no longer enough

The modern AI industry has moved beyond pure text models or basic computer vision. When companies start building smart glasses, autonomous warehouse robots, or medical manipulators, they often lack the specific sensor data needed.

Training such models requires a large number of demonstrations filmed in real commercial and industrial environments, with detailed tracking of device movements and physical parameters. To meet this need, Keymakr is launching a new Physical AI direction that offers turnkey custom production and annotation of complex multimodal data for commercial use and for training next-generation models.

“Egocentric data helps AI systems learn how humans approach and manipulate objects, how tools are used, how tasks unfold step by step, and how hand-eye coordination operates in dynamic environments,” said Inna Nomerovska, CMO at Keymakr. “To support downstream model training, we offer fully enriched datasets along with a process of creating unique data from scratch with our scalable process. At this point, it’s practically Hollywood for AI.”

New directions

The new suite is divided into two directions for training AI systems that interact with space and objects:

1. Egocentric multimodal data collection and annotation

This service focuses on the development of first-person-view (FPV) computer vision systems. Such data is necessary for building augmented reality (AR) headsets and humanoid robots that mimic human movements.

Keymakr organizes filming in real target environments using body-worn and head-mounted cameras that capture people's natural behavior in unpredictable settings. Shoots cover different lighting scenarios, viewing angles, and interactions with objects, so that the data is not sterile and the model is prepared for real-world use.

2. Solutions for robotic arm teleoperation

The second direction is the creation of a specialized ecosystem for collecting and structuring demonstration data obtained during the control of robotic arms.

To teach AI models to generalize across tasks and interact correctly with objects, engineers need a variety of environments and task types. The key step is building infrastructure that connects real-world workflows with AI training cycles. For this reason, Keymakr has implemented a large-scale update to its Keylabs platform, which handles the structuring of such complex data.

Capturing actions, intent, and interaction

One of the main problems engineers face with purchased standard datasets is an unsuitable data structure. Attempts to segment complex processes automatically often cause problems during model training: activity annotations become overly granular and nearly identical to individual actions, whereas engineers typically expect broader activity categories that encompass dozens of related actions.

Keymakr therefore updated its Keylabs platform, introducing a hierarchical annotation system and a set of tools designed specifically for physical and embodied AI tasks.

Overall, the new Keymakr Physical AI suite produces datasets that include:

  1. Activity annotations. The team marks up long processes (e.g., "cooking a meal" or "sorting a package") as continuous activities that correspond to the model's long-term planning of its actions.
  2. Temporal/action annotations. Within each large activity, clear time labels are created for micro-events, each linked to a specific frame.
  3. Action labels. Each micro-action (pick up a knife, lift a box, turn a valve) receives its own unique text and categorical label.
  4. Multimodal egocentric RGB video streams. The Keylabs platform seamlessly integrates first-person view (FPV) video with synchronized data from complex sensors (IMU, LiDAR, RGB-D).
  5. Hand-joint tracking. Keylabs tracks the movement of each joint of a human hand in video with pixel-level accuracy, providing the data robots need to learn precise manipulation.
  6. Semantic text description. Each scene is accompanied by a natural-language description of its context. These datasets are used to train next-generation multimodal vision-language-action (VLA) models, which connect vision, language, and action.
  7. Object interaction and manipulation tracking. Specialized tools capture each phase of interaction with objects (touch moments, grip states, and movement trajectories).
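The hierarchy described in items 1 to 3 can be pictured as activities that contain frame-bounded action segments. The following is only a minimal sketch of that idea, not the Keylabs data model: the class names `Activity` and `ActionSegment`, the field names, the frame numbers, and the "slice a tomato" label are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionSegment:
    """A micro-action tied to an inclusive frame range (hypothetical schema)."""
    label: str
    start_frame: int
    end_frame: int


@dataclass
class Activity:
    """A long, continuous activity that owns its action segments."""
    name: str
    start_frame: int
    end_frame: int
    actions: List[ActionSegment] = field(default_factory=list)

    def add_action(self, action: ActionSegment) -> None:
        # Reject segments that are inverted or fall outside the parent activity.
        if action.start_frame > action.end_frame:
            raise ValueError("start_frame must not exceed end_frame")
        if action.start_frame < self.start_frame or action.end_frame > self.end_frame:
            raise ValueError("action must lie inside its parent activity")
        self.actions.append(action)

    def labels_at(self, frame: int) -> List[str]:
        # Return the labels of every action segment covering the given frame.
        return [a.label for a in self.actions
                if a.start_frame <= frame <= a.end_frame]


def build_example() -> Activity:
    # Illustrative values only; not taken from any real dataset.
    activity = Activity("cooking a meal", start_frame=0, end_frame=900)
    activity.add_action(ActionSegment("pick up a knife", 30, 75))
    activity.add_action(ActionSegment("slice a tomato", 76, 410))
    return activity
```

Keeping the activity as the parent and the actions as children means a broad category such as "cooking a meal" stays a single label, while the fine-grained actions remain queryable per frame.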

To maintain quality, Keymakr engineers implemented synchronization control. The temporal alignment function ensures that data from LiDAR units, motion sensors, and video cameras is aligned to within milliseconds. A frame tolerance parameter has also been introduced to prevent labels from being accidentally displaced at frame boundaries, keeping annotations consistent for AI training.
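One common way to align sensor streams with video is nearest-timestamp matching under a tolerance. The sketch below illustrates that general technique and is not a description of how Keylabs implements it: the function names, the millisecond units, and the 5 ms default tolerance are assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Optional


def nearest_sample(sample_ts_ms: List[float], frame_ts_ms: float,
                   tolerance_ms: float) -> Optional[int]:
    """Index of the sample closest to a frame, or None if outside tolerance.

    sample_ts_ms must be sorted in ascending order.
    """
    if not sample_ts_ms:
        return None
    pos = bisect_left(sample_ts_ms, frame_ts_ms)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sample_ts_ms)]
    best = min(candidates, key=lambda i: abs(sample_ts_ms[i] - frame_ts_ms))
    if abs(sample_ts_ms[best] - frame_ts_ms) <= tolerance_ms:
        return best
    return None


def align_frames(frame_ts_ms: List[float], sample_ts_ms: List[float],
                 tolerance_ms: float = 5.0) -> List[Optional[int]]:
    """Match every video frame to a sensor sample (assumed 5 ms default)."""
    return [nearest_sample(sample_ts_ms, t, tolerance_ms) for t in frame_ts_ms]
```

Frames that come back as `None` have no sensor reading within the tolerance, so a pipeline could flag them for review instead of silently pairing them with a distant sample.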

Since the data is intended for building off-the-shelf (OTS) commercial products, legal security and intellectual property are primary concerns. Enterprise customers impose strict licensing requirements. Accordingly, Keymakr builds its processes on the principles of ethical data sourcing and guarantees full compliance with standards.

The company provides transparent ownership of IP (intellectual property) or transferred rights, allowing for further sublicensing. Keymakr completely excludes the use of scraped content from the Internet, as well as illegal or legally dubious content.

Special attention is paid to the filming of complex egocentric video: each participant goes through a carefully documented consent procedure. The commercial rights Keymakr provides cover all stages of the AI development lifecycle, protecting Keymakr clients from legal and reputational risks.

Keymakr's strategic plans

The launch of a new suite of tools and services is just the first step in the company's journey to building a large-scale infrastructure for embodied AI. Keymakr's next strategic step will be to expand into robotic-arm teleoperation data collection, enabling the capture of demonstration datasets on physical robotic systems (minimum 6 DoF). These datasets include joint states, control signals, and action labels, often synchronized with wrist-mounted and third-person camera feeds. 
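A teleoperation demonstration of this kind can be thought of as a sequence of timestamped steps. The following is a rough sketch of such a record with two basic sanity checks; the `TeleopStep` class, its field names, and the `validate_episode` helper are hypothetical and do not reflect a published Keymakr format.

```python
from dataclasses import dataclass
from typing import List, Sequence

MIN_DOF = 6  # minimum degrees of freedom mentioned in the text


@dataclass(frozen=True)
class TeleopStep:
    """One timestep of a robotic-arm demonstration (hypothetical schema)."""
    timestamp_ms: float
    joint_states: Sequence[float]     # one value per joint
    control_signals: Sequence[float]  # commands sent by the operator
    action_label: str


def validate_episode(steps: List[TeleopStep], min_dof: int = MIN_DOF) -> List[str]:
    """Return a list of human-readable problems; empty means the episode passed."""
    problems = []
    previous_ts = None
    for index, step in enumerate(steps):
        if len(step.joint_states) < min_dof:
            problems.append(f"step {index}: fewer than {min_dof} joint values")
        if previous_ts is not None and step.timestamp_ms <= previous_ts:
            problems.append(f"step {index}: timestamp does not increase")
        previous_ts = step.timestamp_ms
    return problems
```

Camera feeds are left out of the sketch; in practice each step would also reference the synchronized wrist-mounted and third-person frames.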

Keymakr believes that the success of the next generation of robotics depends on accurately reproducing real, non-sterile conditions of human labor, which requires data providers to be flexible and have a variety of scenarios.

“We emphasize diversity across robotic platforms, environments, and task types to improve generalization of trained models,” added Inna Nomerovska. “Now companies are focusing on specific, repeatable tasks performed in real environments. Every dataset should be tied to a real-world workflow and can be adapted to a specific business need. This is where Physical AI is heading, and data is the key enabler.”