How to Annotate Reports for Automated ESG Scoring
Investors and the public increasingly expect companies to be transparent not only about profits but also about their impact on the environment, society, and corporate governance. These three dimensions are grouped under the term ESG (Environmental, Social, and Governance).
The problem is that companies publish ESG information as long, unstructured text, such as multi-page PDF sustainability reports or sections of annual reports. Software cannot simply "read" this text and immediately conclude: "This company reduced emissions by 15%." This creates the need for AI systems that can automatically process thousands of pages and extract clear indicators for ESG scoring.
Annotation, in this context, is the process in which specialists manually label the relevant facts, entities, and relationships in the raw text. This turns the narrative of the reports into structured data. That data becomes the "textbook" for training NLP models, which can then analyze new, similar reports on their own.
Quick Take
- To teach AI to "understand" long ESG reports, specialists must manually label facts, entities, and relationships, transforming unstructured text into structured data.
- The core annotation process includes extracting key metrics, evaluating tone, and normalizing data to international standards.
- For speed and scalability, AI-assisted annotation is used, where human experts only verify and correct the labels generated by the model.
- The ultimate goal is to build automated ESG scoring models that can eventually support real-time monitoring.

Core Annotation Tasks for ESG
High-quality annotation for automated ESG scoring turns vague text into an accurate, standardized, machine-readable structure. It must be consistent and complete so that AI can unambiguously learn what information to look for and how to connect it. This is achieved through a set of specific annotation tasks.
Key Entity Extraction
In essence, the annotator uses digital "markers" to highlight and classify all essential words and phrases in the text. Instead of treating the text as a stream of characters, AI learns to recognize the specific entities that matter for ESG scoring:
- The word "Tesla" is marked as [Company].
- The phrase "Scope 1 emissions" becomes [Metric].
- The number "15%" is marked as [Value].
- "The year 2030" is marked as [Target Year].
This helps AI identify key sustainability KPIs. If AI cannot reliably find these entities, it cannot move on to the more complex stages of analysis that follow.
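A common way to store entity labels is as character-offset spans over the source text. The sketch below assumes that representation; the names `EntitySpan`, `annotate`, and `check_spans`, and the sentence about the fictional "ExampleCorp", are hypothetical illustrations rather than any specific annotation tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """One labeled span: character offsets into the source text plus a label."""
    start: int
    end: int  # exclusive, like Python slicing
    label: str


def annotate(text: str, phrase: str, label: str) -> EntitySpan:
    """Locate the first occurrence of `phrase` and return it as a labeled span."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return EntitySpan(start, start + len(phrase), label)


def check_spans(text: str, spans: list[EntitySpan]) -> list[EntitySpan]:
    """Sort spans and reject any that overlap or fall outside the text."""
    ordered = sorted(spans, key=lambda s: s.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError(f"overlapping spans: {prev} and {cur}")
    for s in ordered:
        if not (0 <= s.start < s.end <= len(text)):
            raise ValueError(f"span out of range: {s}")
    return ordered


# Hypothetical sentence about a fictional company.
sentence = "ExampleCorp cut Scope 1 emissions by 15% and targets net zero by 2030."
spans = check_spans(sentence, [
    annotate(sentence, "ExampleCorp", "Company"),
    annotate(sentence, "Scope 1 emissions", "Metric"),
    annotate(sentence, "15%", "Value"),
    annotate(sentence, "2030", "Target Year"),
])
for s in spans:
    print(f"[{s.label}] {sentence[s.start:s.end]}")
```

Storing offsets instead of copied strings keeps each label tied to its exact position, which matters when the same number or phrase appears several times in a report.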
Fact and Relationship Extraction
After we have extracted the key entities, we must understand how they interact. The goal of this stage is to establish clear, logical relationships between the highlighted entities to obtain a complete, meaningful fact.
The annotator defines the type of relationship between two or more entities. This lets AI see complete facts rather than isolated words. For example, [Scope 1 emissions] and [15%] are linked by a "reduced by" relationship. This approach produces structured (subject, relation, object) "triplets", which are ideal raw material for building knowledge bases.
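As a sketch of how annotated relations could become triplets, the example below lets the annotator choose a relation type and checks it against a small schema of allowed entity-label pairs. The relation names (`reports`, `reduced_by`, `target_year`), the schema, and the fictional "ExampleCorp" are hypothetical; a real project would define its relation types in its annotation guidelines.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    label: str


class Triplet(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical schema: each relation type allows one (head label, tail label) pair.
SCHEMA = {
    "reports": ("Company", "Metric"),
    "reduced_by": ("Metric", "Value"),
    "target_year": ("Metric", "Target Year"),
}


def make_triplet(head: Entity, relation: str, tail: Entity) -> Triplet:
    """Validate an annotator-chosen relation against the schema and build a triplet."""
    expected = SCHEMA.get(relation)
    if expected is None:
        raise ValueError(f"unknown relation: {relation!r}")
    if (head.label, tail.label) != expected:
        raise ValueError(f"{relation!r} expects {expected}, got {(head.label, tail.label)}")
    return Triplet(head.text, relation, tail.text)


company = Entity("ExampleCorp", "Company")
metric = Entity("Scope 1 emissions", "Metric")
value = Entity("15%", "Value")
year = Entity("2030", "Target Year")

facts = [
    make_triplet(company, "reports", metric),
    make_triplet(metric, "reduced_by", value),
    make_triplet(metric, "target_year", year),
]
for fact in facts:
    print(fact)
```

Checking label pairs at annotation time catches inconsistent relations early, before they end up in a knowledge base.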
Sentiment and Tone Classification
ESG reports often contain vague wording and statements of intent. The goal is to evaluate the tone of each statement to clearly distinguish a fact from a risk or a promise. Fragments that describe risks usually contain risk keywords, and these should be marked.
- [Tone: Negative/Risk]. Sentences describing supply chain violations, possible lawsuits, or environmental risks.
- [Tone: Positive/Action]. Description of implementing a new grievance mechanism or successfully achieving goals.
- [Tone: Neutral/Statement]. Simple statement of fact without emotional coloring.
Tone classification allows AI not only to collect numbers but also to assess the quality of the company's commitments and potential threats.
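Risk keywords can also drive a simple pre-labeling step that suggests a tone for the annotator to confirm. The sketch below is a keyword baseline, not a trained classifier; the word stems and the rule that risk cues take precedence are illustrative assumptions.

```python
import re

# Hypothetical word stems, for illustration only; real cue lists belong in the annotation guidelines.
RISK_STEMS = ["violat", "lawsuit", "litigat", "spill", "risk"]
ACTION_STEMS = ["implement", "launch", "achiev", "reduc", "introduc"]


def has_cue(words: list[str], stems: list[str]) -> bool:
    """True if any word starts with any of the given stems (crude stemming)."""
    return any(w.startswith(s) for w in words for s in stems)


def suggest_tone(sentence: str) -> str:
    """Suggest a tone label; a human annotator confirms or overrides it."""
    words = re.findall(r"[a-z]+", sentence.lower())
    # Risk cues take precedence so that potential threats are not masked by positive wording.
    if has_cue(words, RISK_STEMS):
        return "Negative/Risk"
    if has_cue(words, ACTION_STEMS):
        return "Positive/Action"
    return "Neutral/Statement"


for s in [
    "Two suppliers face lawsuits over labor violations.",
    "We implemented a new grievance mechanism.",
    "The company operates twelve plants in Europe.",
]:
    print(suggest_tone(s), "|", s)
```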
Normalization and Categorization
Different companies use different terms and standards. The goal is to align the findings with a single, internationally recognized ESG framework. This ensures that data from various companies is comparable.
Every fact extracted in the previous stages is mapped to a universal taxonomy:
- Category. [Category: Environment], [Category: Social], [Category: Governance].
- Subcategory. The phrase "Increased use of solar energy" is annotated as [Subcategory: Renewable Energy].
- Goal Alignment. Facts may also be mapped to the UN Sustainable Development Goals (SDGs).
This transformation makes the raw data usable for calculating specific indicators within a unified evaluation framework, which is the final goal of the entire annotation process.
Annotation Methods and Tools
The technical process of labeling ESG reports requires combining human expertise and effective technological solutions. Annotation is mainly performed using one of two primary methods, supported by specialized software.
Manual Annotation
This is the highest quality but also the most resource-intensive method. Annotation is performed by specialists who have a deep understanding of ESG standards and experience with NLP annotation.
This ensures the highest accuracy and creates the so-called gold standard. Such a dataset is important for the initial training of AI models. However, the process is slow and expensive because it requires highly skilled labor.
AI-Assisted Annotation
This approach is the most promising for scaling, as it combines the speed of AI with human accuracy. Instead of starting from blank text, the annotator receives text that a pretrained model has already pre-labeled.
The NLP model automatically extracts most entities and relationships. The human annotator then acts as a corrector, checking and adjusting labels, correcting AI errors, and adding labels in complex or ambiguous cases. This significantly speeds up the annotation process, allowing for the rapid creation of large datasets for further refinement of the AI model.
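The review step can be sketched as overlaying human corrections on the model's pre-labels. The model itself is not shown; the dictionary-of-spans format, the `merge` and `correction_rate` helpers, and the example offsets (computed for the hypothetical sentence in the comment) are illustrative assumptions.

```python
from typing import Optional

Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def merge(prelabels: dict[Span, str], corrections: dict[Span, Optional[str]]) -> dict[Span, str]:
    """Overlay human corrections on model pre-labels.

    A label in `corrections` replaces or adds a span; None deletes a wrong model label.
    Model labels the human did not touch are kept as confirmed.
    """
    final = dict(prelabels)
    for span, label in corrections.items():
        if label is None:
            final.pop(span, None)
        else:
            final[span] = label
    return final


def correction_rate(prelabels: dict[Span, str], final: dict[Span, str]) -> float:
    """Share of model pre-labels that the human changed or removed."""
    if not prelabels:
        return 0.0
    changed = sum(1 for span, label in prelabels.items() if final.get(span) != label)
    return changed / len(prelabels)


# Hypothetical sentence: "ExampleCorp cut Scope 1 emissions by 15% by 2030."
prelabels = {(0, 11): "Company", (16, 33): "Metric", (37, 40): "Target Year"}  # model mislabels "15%"
corrections = {(37, 40): "Value", (44, 48): "Target Year"}  # fix "15%", add the missed "2030"
final = merge(prelabels, corrections)
print(final)
print(f"correction rate: {correction_rate(prelabels, final):.2f}")
```

Tracking the correction rate over time gives a rough signal of how much the pre-labeling model improves as it is retrained on the reviewed data.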

Practical Value and Future of ESG Annotation
Quality annotation of reports is the foundation for creating fair, fast, and automated intelligent systems in the field of sustainability.
Automated ESG Scoring
The principal practical value of annotation is that it enables the building of automated ESG scoring models. Thanks to normalization and categorization, AI models can evaluate thousands of companies against clear and unified criteria. This greatly reduces the subjectivity of manual analysis. Instead of months of analyst work, AI can perform an initial assessment of a company's ESG profile in minutes.
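To illustrate how normalized facts could feed a score, the toy aggregation below averages indicator scores per ESG pillar and combines the pillars with fixed weights. The 0 to 100 indicator scale and the 0.4/0.3/0.3 weights are hypothetical; real scoring methodologies define their own indicators, scales, and weights.

```python
from collections import defaultdict

# Hypothetical pillar weights; real methodologies define their own.
WEIGHTS = {"Environment": 0.4, "Social": 0.3, "Governance": 0.3}


def esg_score(facts: list[tuple[str, float]]) -> tuple[dict[str, float], float]:
    """Aggregate (category, indicator_score) pairs, scores assumed on a 0-100 scale."""
    by_pillar: dict[str, list[float]] = defaultdict(list)
    for category, score in facts:
        if category not in WEIGHTS:
            raise ValueError(f"unknown category: {category!r}")
        by_pillar[category].append(score)
    if not by_pillar:
        raise ValueError("no facts to score")
    pillar_scores = {p: sum(v) / len(v) for p, v in by_pillar.items()}
    # Re-normalize weights over pillars that have data, so a missing pillar does not count as zero.
    total_weight = sum(WEIGHTS[p] for p in pillar_scores)
    overall = sum(WEIGHTS[p] * s for p, s in pillar_scores.items()) / total_weight
    return pillar_scores, overall


pillars, overall = esg_score([
    ("Environment", 80), ("Environment", 60), ("Social", 50), ("Governance", 90),
])
print(pillars, round(overall, 1))
```

Because every company's facts pass through the same taxonomy and the same formula, the resulting scores are comparable across companies, which is what normalization makes possible.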
Real-Time Monitoring
In the future, well-trained AI agents will be able to take on continuous monitoring. They will track not only annual reports but also company publications, news, press releases, and social media. When an agent detects a new fact, it will automatically update the company's ESG rating, giving investors up-to-date information.
Transition to Multimodal Data
The future of ESG analysis extends beyond text. A more comprehensive understanding of company activities requires integrating different types of data, so organizations need to prepare the infrastructure and expert teams for annotating multimodal data.
This involves analyzing images, video, and audio to extract ESG facts. Annotation is therefore not just a data processing step but a strategic investment in the reliability and speed of future ESG analysis.
FAQ
Why can't AI simply "read" an ESG report without annotation?
AI works well with structured data, but ESG reports are unstructured text. For AI to understand what the phrase "We reduced emissions by 15%" means, it must be trained on labeled examples. Annotation creates this "textbook" for AI.
What is normalization, and why is it critically important for ESG scoring?
Normalization is the process of aligning all extracted facts with a single international framework, such as GRI, SASB, or the UN SDGs. This is important because different companies use different terminology. Normalization ensures data comparability: AI can compare one company's "use of solar energy" with another's "renewable energy" because both phrases are linked to a single category.
How will ESG annotation expand from text to multimodal data?
Annotation will expand beyond text to include images, video, and audio. This means that experts will not only annotate text snippets, but also objects in photos, events in videos, and important moments in audio recordings. This approach allows AI to understand more context and work with a larger set of ESG signals.
Why is manual expert annotation considered the "gold standard" but not a scalable solution?
Manual annotation provides the highest accuracy because it is performed by experts who understand the ESG context. This creates the "gold standard" dataset. However, it is too slow and expensive to process the thousands of new reports released every year. Therefore, AI-assisted annotation is used for scaling.
What is the main practical benefit of the entire annotation process?
It enables the automation of ESG assessments. By providing structured data instead of subjective evaluations, it allows many companies to be assessed quickly and consistently.
