This lesson is still being designed and assembled (Pre-Alpha version)

Metadata

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • What are metadata?

  • What is an ontology or controlled vocabulary?

  • What are metadata standards?

Objectives
  • Understand metadata and their role in FAIR data

  • Recognize different types of metadata

  • Choose an appropriate metadata standard for your data

Metadata

Metadata are data about data. In other words, metadata define or describe the underlying data. Author name, date created, date modified and file size are examples of very basic metadata for a file.
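As a small illustration of the basic file metadata mentioned above, the following sketch reads the size and modification date of a file with Python's standard library. The function name `basic_file_metadata` and the dictionary keys are made up for this example, and the author is left out because the file system call used here does not return an author name.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return a small dictionary of file-system metadata for `path`."""
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last modification time as an ISO 8601 string in UTC.
        "date_modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("station,temperature\nA,12.5\n")
    example_path = handle.name

metadata = basic_file_metadata(example_path)
print(metadata)

# Clean up the temporary file.
os.remove(example_path)
```

The printed values depend on your system and on when the file was written, which is why no expected output is shown.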

Metadata make finding and working with the data easier. Therefore, they are essential components in making your data FAIR. One could argue that metadata are more important than your data. Without metadata, the data would be just numbers. Without the original data, however, the metadata can still be useful to track people, institutions or publications associated with the original research. From a FAIR perspective, metadata should always be openly available.

Metadata can be created manually or generated automatically by the software or equipment used, preferably according to a disciplinary standard. While data documentation is meant to be read and understood by humans, metadata are primarily meant to be processed by machines. There is no FAIR data without machine-actionable metadata.
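To make the difference between human-readable documentation and machine-actionable metadata concrete, here is a minimal sketch. The sentence, the field names and the values are all invented for illustration and do not follow any particular standard.

```python
import json

# Free-text documentation: easy for a person to read,
# but a program cannot reliably pick out the unit or the date.
documentation = (
    "Daily mean air temperature, measured in kelvin, created on 2021-03-01."
)

# The same information as a structured record with named fields
# (hypothetical field names, chosen only for this example).
record = {
    "title": "Daily mean air temperature",
    "units": "K",
    "date_created": "2021-03-01",
}

# A structured record can be written out and read back without loss.
serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)

# A program can now ask a precise question of the metadata.
print(restored["units"])
```

A program can look up `units` directly in the structured record, whereas extracting the same fact from the free-text sentence would require guessing at its wording.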

FAIR principles about metadata

Let’s have a look at FAIR principles.

  1. Which principles focus on metadata?

Solution

Because metadata are data about data, all of the principles (Findable, Accessible, Interoperable and Reusable) apply to metadata.

Three types of metadata

We focus on three main types of metadata:

Keep your metadata up-to-date!

  • Descriptive and structural metadata should be added continuously throughout the project.
  • Different types of metadata apply not only to a database, but also to individual sets of data, e.g. images/plots, tables, files, etc.

Types of metadata for geospatial data in your community/research team

Here are some questions about the use case you chose in the introduction of this tutorial.

  1. What is the type of metadata in your use case?
  2. What information is described by those metadata?

Ontology

An ontology (or controlled vocabulary) is a standard definition of key concepts in your community/research team and focuses on how those concepts are related to one another. A controlled vocabulary is a set of terms that you have to pick from. Using an ontology:

  1. helps others to understand the structure and content of your data,
  2. makes your data findable, interoperable and reusable.

A controlled vocabulary for climate and forecast data

In climate-related domains, many variables depend on the type of surface. How can you specify the surface type in the metadata?

Solution

The Climate and Forecast metadata conventions (CF conventions) maintain a vocabulary specifically for surface and area types. The vocabulary is available on the CF site as the Area Type Table.
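The idea of a vocabulary "that you have to pick from" can be sketched as a simple membership check. The terms below are only a small illustrative subset; the authoritative and complete list is the Area Type Table itself. The function name `validate_area_type` is hypothetical.

```python
# Illustrative subset only (an assumption for this sketch);
# consult the CF Area Type Table for the authoritative list.
AREA_TYPES = frozenset({"land", "sea", "sea_ice", "land_ice"})


def validate_area_type(term):
    """Return `term` if it is in the vocabulary, otherwise raise ValueError."""
    if term not in AREA_TYPES:
        allowed = ", ".join(sorted(AREA_TYPES))
        raise ValueError(f"{term!r} is not in the vocabulary ({allowed})")
    return term


# A term from the vocabulary is accepted unchanged.
print(validate_area_type("sea_ice"))

# A free-form synonym is rejected, which is the point of a controlled vocabulary.
try:
    validate_area_type("ocean")
except ValueError as error:
    print("rejected:", error)
```

Rejecting near-synonyms such as "ocean" is what keeps metadata from different sources comparable by machine.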

Metadata standards

A metadata standard (or convention) is a subject-specific guide to your data. It sets rules on what content must be included and which syntax should be used, and it may include a controlled vocabulary. The quality of your metadata has a huge impact on the reusability of your research data. It is best practice to use a metadata standard and/or an ontology commonly used in your community/research team.
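One kind of rule a standard can impose, "what content must be included", can be checked automatically. The following is a minimal sketch under stated assumptions: the required field names and the function `missing_fields` are hypothetical and are not taken from any specific standard.

```python
# Hypothetical rule set for this sketch: fields every record must provide.
REQUIRED_FIELDS = ("title", "institution", "source")


def missing_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or empty in `metadata`."""
    missing = []
    for name in required:
        value = metadata.get(name)
        # Treat a missing key, None, or a blank string as "not provided".
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


incomplete = {"title": "Daily mean air temperature", "institution": " "}
complete = {
    "title": "Daily mean air temperature",
    "institution": "Example Institute",
    "source": "Weather station network",
}

print(missing_fields(incomplete))
print(missing_fields(complete))
```

A real standard usually also constrains syntax and permitted values, so a check like this covers only the first of the rules described above.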

Some of the recognized metadata standards for climate-related domains are:

A recognized metadata convention for climate data

Let’s do a search for the keyword climate in FAIR standards.

  1. Which metadata convention did you find?
  2. Which file formats allow you to include the metadata?
  3. What are other domains that use this convention?

Solution

  1. Climate and Forecast metadata (CF conventions)
  2. NetCDF
  3. Atmospheric science, earth science, natural science, and oceanography

No/incomplete metadata standards in your research team

Imagine you are working in a lab, and you want to use metadata describing processes that produce data. However, the available standards in your research team are not specifically suited for that purpose.

How can you define a relevant metadata scheme? What would you do if there was no standard in your research team?

Sensitive data

You cannot openly publish sensitive data. However, you can always publish rich metadata about your data. Publishing metadata helps you to make clear under which conditions the data can be accessed and how they may be reused.
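As a rough sketch of this idea, a record can keep the sensitive values private while the descriptive fields and the access conditions are published. All field names and values here are invented for illustration, and `public_metadata` is a hypothetical helper.

```python
import json


def public_metadata(record, private_keys=("data",)):
    """Return a copy of `record` without the keys listed in `private_keys`."""
    return {key: value for key, value in record.items() if key not in private_keys}


# Hypothetical record: the "data" entry stands in for the sensitive values.
record = {
    "title": "Household survey on heat exposure",
    "access_rights": "restricted",
    "access_conditions": "Available on request after ethical review.",
    "data": [{"participant": 1, "answer": "..."}],
}

published = public_metadata(record)
print(json.dumps(published, indent=2))
```

The published part still tells a reader that the data exist, what they describe, and how access can be requested.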

Key Points

  • From a FAIR perspective, metadata are more important than your data.

  • Metadata are preferably created according to a disciplinary standard.

  • To be FAIR, metadata must have a findable persistent identifier.