What is metadata?

(in non-technical terms)

And why is RIXML metadata important?

Metadata means “data about data”; it refers to the structured data that is used to describe content in the machine-readable way that databases, search engines, and AI-powered tools need. Metadata is usually unseen by the end user, but it is critical for organizing, searching for, and managing content.

Metadata: critical for humans, critical for AI

Metadata is structured data in which individual information elements (e.g., title, subtitle, publication date, author, ticker, key topics) are wrapped in explicitly labeled fields defined by a formal schema. When publishers distribute this metadata alongside their research content, these labeled tags map directly to database fields, enabling the data to be ingested, validated, indexed, stored, and queried in a consistent, machine-readable manner.
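As a rough illustration of how labeled fields can map to database fields, the sketch below parses a small XML record into a dictionary. The element names (`Title`, `Author`, `Ticker`, and so on) and the sample values are invented for this example; they are not the actual RIXML tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are illustrative only,
# not taken from the RIXML schema.
SAMPLE_RECORD = """
<Report>
  <Title>Acme Company: Raising Estimates</Title>
  <Author>Jane Smith</Author>
  <PublicationDate>2024-05-01</PublicationDate>
  <Ticker>ACME</Ticker>
  <Topic>Earnings</Topic>
  <Topic>Estimate revisions</Topic>
</Report>
"""

def record_to_row(xml_text):
    """Turn each labeled element into a named field, ready for a database row."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("Title"),
        "author": root.findtext("Author"),
        "publication_date": root.findtext("PublicationDate"),
        "ticker": root.findtext("Ticker"),
        # A field can repeat, so topics are collected into a list.
        "topics": [t.text for t in root.findall("Topic")],
    }

row = record_to_row(SAMPLE_RECORD)
print(row["ticker"], row["topics"])
```

Because every value arrives with its label attached, the receiving system does not have to guess which string is the author and which is the ticker.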

You've already met metadata

An example of metadata that you are almost certainly familiar with is the information in a library catalog. The catalog record in a library database doesn’t replace the book, nor does it contain the full text of the book. It simply describes the book with information that will make it easier to find.

Because each piece of information in the catalog record sits in its own labeled field, the search engine can be more accurate: it can tell whether a search term refers to the author’s last name, a word in the title, or part of the company name. It can also be far more efficient, as it knows where to look when you want to do an author search or a title search – it doesn’t need to search all fields, just the relevant one.
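The difference between searching everything and searching one labeled field can be sketched in a few lines. The catalog records below are invented for illustration, and the field names are assumptions rather than any real catalog format.

```python
# Hypothetical catalog records; titles, authors, and publishers are invented.
CATALOG = [
    {"title": "The Baker Street Mysteries", "author": "Ann Lee", "publisher": "North Press"},
    {"title": "Bread at Home", "author": "Tom Baker", "publisher": "Hearth Books"},
    {"title": "Garden Design", "author": "Sue Park", "publisher": "Baker & Sons"},
]

def search_all_fields(term):
    """Unfielded search: match the term anywhere in the record."""
    term = term.lower()
    return [r for r in CATALOG if any(term in v.lower() for v in r.values())]

def search_field(field, term):
    """Fielded search: look only at the one labeled field the user asked about."""
    term = term.lower()
    return [r for r in CATALOG if term in r[field].lower()]

all_hits = search_all_fields("baker")
author_hits = search_field("author", "baker")

print(len(all_hits))                       # 3: title, author, and publisher all match
print([r["title"] for r in author_hits])   # ['Bread at Home']
```

The unfielded search returns every record that mentions "baker" anywhere, while the author search returns only the book actually written by someone named Baker.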


RIXML: the metadata for investment research

RIXML is the metadata designed specifically to describe investment research and interactions. At a very high level, you can think of RIXML as the custom cataloging for investment research and interaction records. Just as libraries have developed a standardized way to describe the books, CDs, audiobooks and other print and electronic materials in their collections, the member firms of RIXML have worked together to determine how best to describe a wide variety of investment research, as well as the details of inter-firm interactions, to ensure that this content can be found by the investment professionals and systems looking for it.

For over 25 years, the RIXML Research Standard has been used across the industry to power the databases that investment professionals use to search for the research they need, create alerts, set up email subscriptions, and perform other tasks.


What about artificial intelligence?

The metadata in a RIXML record is designed to meet the needs of both the humans and the systems that will be using it. It describes the report in the format required by the databases that power both traditional and generative search engines.

Artificial intelligence-powered tools have already brought significant changes to the ways that investment research is created, described, distributed, consumed, and analyzed, and the speed of additional changes is likely to increase. One thing that hasn't changed is the importance of structured data.

In fact, the reason that artificial intelligence-powered content analysis tools need high-quality metadata is similar to the reason that the tools that power traditional searching, filtering, and alerting need it: accuracy and efficiency!

How?

The example below compares how two generative search tools, one that leverages RIXML metadata and one that does not, would identify the relevant research needed to answer the prompt “Summarize the key reasons for the recent upgrades to Acme Company’s earnings estimates”:

As you can see, both options will result in an answer. However, using RIXML metadata will:

  • improve accuracy by ensuring that the right input content is used
  • improve efficiency by speeding up the process for finding the content needed to answer the requester's question
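The retrieval step behind these two benefits can be sketched as follows. This is a simplified illustration, not a description of how any particular tool works: the records, the field names (`company`, `action`, `published`), and the value `estimate_upgrade` are all hypothetical and are not actual RIXML tags or enumeration values.

```python
from datetime import date

# Hypothetical research records. Field names and values are invented for
# illustration and are not the actual RIXML tags or enumeration values.
REPORTS = [
    {"id": 1, "company": "Acme Company", "action": "estimate_upgrade",
     "published": date(2024, 5, 1),
     "text": "We raise our Acme EPS forecast on stronger margins."},
    {"id": 2, "company": "Acme Company", "action": "initiation",
     "published": date(2021, 3, 9),
     "text": "Initiating coverage of Acme Company."},
    {"id": 3, "company": "Bolt Industries", "action": "estimate_upgrade",
     "published": date(2024, 5, 3),
     "text": "Bolt, a supplier to Acme Company, sees earnings estimates upgraded."},
]

def select_by_text(reports, keywords):
    """Without metadata: keep any report whose full text mentions every keyword."""
    return [r["id"] for r in reports
            if all(k.lower() in r["text"].lower() for k in keywords)]

def select_by_metadata(reports, company, action, since):
    """With metadata: filter on labeled fields before any text is read."""
    return [r["id"] for r in reports
            if r["company"] == company
            and r["action"] == action
            and r["published"] >= since]

text_hits = select_by_text(REPORTS, ["Acme"])
meta_hits = select_by_metadata(REPORTS, "Acme Company", "estimate_upgrade", date(2024, 1, 1))

print(text_hits)  # [1, 2, 3]: every report that mentions Acme, relevant or not
print(meta_hits)  # [1]: only the recent estimate upgrade for Acme itself
```

In this sketch the text-only approach hands the summarizer an old initiation report and a report about a different company, while the metadata filter passes along only the report that matches the question.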


What metadata is in the RIXML standards?

The metadata in a RIXML record is designed to meet the needs of both the humans and the systems that will be using it. It includes tags that identify the content of the report, tags that carry copyright and other administrative information, and workflow tags that facilitate content management and recordkeeping.

Descriptive tags
  • Research Standard: authorship data; tickers & other identifiers; sector & industry; country and region; key topics
  • Interactions Standard: interaction host, speakers, and participants; key topics

Administrative tags
  • Research Standard: creation date, publication date, revision date; file type; entitlement data
  • Interactions Standard: event date, registration deadline

Structural tags
  • Research Standard: chapters; chart and graph lists
  • Interactions Standard: conference breakout groups; agendas and related materials


Predefined lists ensure consistency

One of the key purposes of metadata is to improve findability. To facilitate this, it is helpful for some metadata tags to be constrained by a predefined list of terms. In the RIXML standards, these enumeration lists provide a common vocabulary that ensures that similar content is tagged in a consistent manner, despite regional spelling differences, firm-specific terminology, and author word choice. Firms continue to use their preferred terminology on user-facing interfaces; mapping to the standardized RIXML taxonomy happens behind the scenes. This allows the machine-readable metadata to be standardized without affecting the word choice in the research content itself.
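A minimal sketch of this behind-the-scenes mapping is shown below. The standard terms and the firm-specific phrases are invented for illustration; they are not the actual RIXML enumeration lists.

```python
# Hypothetical controlled vocabulary and firm-specific wording; these are
# illustrative values, not the actual RIXML enumeration lists.
STANDARD_TERMS = {"Upgrade", "Downgrade", "Initiate"}

FIRM_TO_STANDARD = {
    "raise rating": "Upgrade",
    "rating raised": "Upgrade",
    "lower rating": "Downgrade",
    "initiation of coverage": "Initiate",
    "initialise coverage": "Initiate",  # regional spelling variant
}

def to_standard_term(firm_term):
    """Map a firm's own wording to the shared vocabulary, or fail loudly."""
    key = firm_term.strip().lower()
    if key not in FIRM_TO_STANDARD:
        raise ValueError(f"No standard term mapped for {firm_term!r}")
    return FIRM_TO_STANDARD[key]

print(to_standard_term("Rating Raised"))        # Upgrade
print(to_standard_term("Initialise coverage"))  # Initiate
```

Raising an error for unmapped wording, rather than passing it through, is one way to keep free-text variants from leaking into a field that is supposed to hold only terms from the predefined list.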

Whenever possible (for example, when identifying countries, currencies, tickers and other company identifiers, or date/time information), we use ISO standards or similar external sources. But since the RIXML standards are designed to meet the specific needs of describing investment research content and interaction data, many of our defined lists are custom lists developed by our member firms. Working together, we have developed a rich set of enumeration lists to describe coverage action, intended audience, publishing action, and other terms relevant to describing investment research and interactions.