How Keymakr helped SoundAware with user-generated content and copyright protection

MusicTech, copyright & media monitoring

Intro

SoundAware, a leading MusicTech company, specializes in Music Recognition Technology (MRT) that helps collective management organizations, record labels, publishers, and media platforms monitor and protect intellectual property across video and audio content.

This technology combines advanced audio recognition algorithms, large-scale data processing, and content analysis to detect original works, remixes, mashups, and other modified media. SoundAware covers various use cases, including music, movies, TV, trends, memes, and emerging user-generated content on platforms like TikTok.

SoundAware’s approach demonstrates how technology and human expertise protect creators’ rights and offer a deeper understanding of trends and emerging media patterns, making it a powerful tool for platforms and creators alike.

The challenge

SoundAware needed to process large volumes of user-generated video content. The initial goals were to:

  • Detect the presence of original commercial content or its modifications.
  • Classify types of content (music, movies, TV, trends, ASMR, etc.).
  • Analyze the degree of alteration of original works (remixes, mash-ups, cuts, parodies, and more).
  • Summarize subjective operator assessments with consensus-based validation (majority agreement).

Tight deadlines

10,000 videos had to be processed within 1 month.

High cost of errors

Any mismanagement of the workflow could lead to expensive reworks.

Scalability and optimization

Finding the right balance between speed, accuracy, and cost was a major consideration.

The solution
1. Consensus method & structured questionnaire

Copyrighted material can appear in different forms: some videos reuse entire works without changes, while others transform them through parody, remix, or pastiche. Parody typically involves exaggeration, distortion, or humorous imitation, while pastiche is closer to homage, as in memes, GIFs, mashups, fan art, or fan fiction.

To capture these nuances, operators worked with a structured questionnaire. For each video in the batch of 10,000, they answered a set of 11 questions. A majority value was then calculated, so the final answer was based on consensus.
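The majority calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Keymakr's actual tooling: the question names, the number of operators per video, and the rule for answers without a strict majority (returning `None` so the item can be escalated) are all assumptions.

```python
from collections import Counter

def consensus(answers):
    """Return the majority answer, or None when no answer has a strict majority."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    # Assumption: "majority" means more than half of the operators agree.
    return value if count > len(answers) / 2 else None

# Hypothetical responses from three operators for a single video.
responses = {
    "contains_copyrighted_work": ["yes", "yes", "no"],
    "presentation": ["modified", "original", "modified"],
    "quality": ["amateur", "professional", "unclear"],
}

result = {question: consensus(answers) for question, answers in responses.items()}
print(result)
```

In this sketch, a three-way split such as the `quality` answers produces no consensus value, which is one possible way to surface videos that need another review.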

Through simple yes/no and multiple-choice formats, the operators were asked to decide whether the video contained commercial elements, copyrighted works (music, movies, TV, brands), and whether these works were presented in their original form or modified. Modifications could take many forms: alteration of audio (pitch, tempo, remix), visual changes (filters, animations, overlays), combining multiple works (compilations, mashups), replacing original elements (swapping visuals or music), or even creating derivative works like fan art and fan fiction.

The questionnaire also covered the technical and stylistic aspects of the videos, such as whether the quality was amateur or professional, and whether the content fell into additional categories like ASMR, trends, memes, or adult content.

The client received both the consensus results and all individual responses, allowing them to track polarized answers and discrepancies across the dataset.
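One way to use the individual responses for this kind of analysis is to compute an agreement score per question and flag the low-agreement items. The sketch below is illustrative only: the data layout, the video IDs, the question names, and the 0.75 threshold are assumptions, not details from the project.

```python
from collections import Counter

def agreement(answers):
    """Share of operators who gave the most common answer (1.0 means unanimous)."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def flag_discrepancies(videos, threshold=0.75):
    """Return (video_id, question, agreement) for every answer set below the threshold."""
    flagged = []
    for video_id, questions in videos.items():
        for question, answers in questions.items():
            score = agreement(answers)
            if score < threshold:
                flagged.append((video_id, question, score))
    return flagged

# Hypothetical individual responses from four operators per video.
videos = {
    "video_001": {
        "is_modified": ["yes", "yes", "yes", "yes"],
        "category": ["music", "music", "meme", "trend"],
    },
    "video_002": {
        "is_modified": ["yes", "no", "yes", "no"],
        "category": ["tv", "tv", "tv", "tv"],
    },
}

flagged = flag_discrepancies(videos)
for video_id, question, score in flagged:
    print(f"{video_id} / {question}: agreement {score:.2f}")
```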

2. Step-by-step data delivery

To minimize risks, the data was delivered in stages. First, 5% of the dataset was annotated and shared with SoundAware for review. Once the client approved this initial portion, an additional 45% was processed with commentary. The remaining 50% of the dataset was annotated with all feedback in mind. This staged approach ensured that both the format and the quality fully met expectations before scaling up to the final dataset.
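For a sense of scale, the 5% / 45% / 50% split can be worked through for the 10,000-video batch. The helper below is a hypothetical sketch; the function name, the video ID format, and the choice to split the list in order are assumptions made for illustration.

```python
def staged_batches(items, shares=(0.05, 0.45, 0.50)):
    """Split items into consecutive batches sized by the given shares."""
    batches, start = [], 0
    for i, share in enumerate(shares):
        # The last batch takes whatever remains, so rounding never drops items.
        is_last = i == len(shares) - 1
        end = len(items) if is_last else start + round(len(items) * share)
        batches.append(items[start:end])
        start = end
    return batches

# Hypothetical identifiers for the 10,000 videos.
video_ids = [f"video_{n:05d}" for n in range(10_000)]

pilot, second, final = staged_batches(video_ids)
print(len(pilot), len(second), len(final))  # 500 4500 5000
```

Under these assumptions, the pilot stage covers 500 videos, so format or quality problems would surface before the remaining 9,500 are annotated.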

Results

Ahead of schedule: 10k+ videos were processed in 3 weeks, ahead of the 1-month deadline.

Time savings: Only about 78% of the project’s budget was used, thanks to the optimized approach. The client saved 22% of paid hours for use in future projects.

Accuracy & reliability: The consensus-based system reduced risks associated with subjective interpretation.

Data versatility: The client received both aggregated consensus results and individual operator responses, offering flexibility for further data use.

“Keymakr has proven that human expertise remains indispensable in large-scale data projects. Thanks to their dedication and collaborative approach, we were able to deliver the required analyses and reports on time.”

Jeroen Kerkvliet, SoundAware CMO

“I truly enjoyed working with the SoundAware team. Their collaborative approach was a huge asset; they were always quick to respond and open to new ideas, which made the project feel like a genuine partnership. This level of cooperation was key to our rapid progress, and I’m very much looking forward to our next project together.”

Roman Gron, Keymakr PM

This project demonstrated that subjective user-generated content can be annotated at an industrial scale without compromising quality. The applied consensus method, structured questionnaire, and flexible step-by-step data delivery enabled SoundAware to quickly obtain reliable datasets for copyright protection and monetization.
