Metadata
Overview
Teaching: 15 min
Exercises: 15 min
Questions
What are metadata?
What is an ontology or controlled vocabulary?
What are metadata standards?
Objectives
Understand metadata and their role in FAIR data
Recognize different types of metadata
Choose an appropriate metadata standard for your data
Metadata
Metadata are data about data. In other words, metadata are the underlying definition or description of data.
For example, author name, date created, date modified and file size are very basic metadata for a file.
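As a minimal sketch, assuming Python and a throwaway file created only for this illustration, the standard library can read some of these basic properties from the file system. The helper name `basic_file_metadata` and the file content are hypothetical. The author name is left out because the file system generally does not record it.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return a few basic metadata fields that the file system keeps for a file."""
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime is the time of last modification, in seconds since the epoch
        "date_modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Hypothetical content, written to a temporary file so the sketch is self-contained.
content = b"station,temperature\nA,281.5\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as handle:
    handle.write(content)
    example_path = handle.name

example_metadata = basic_file_metadata(example_path)
print(example_metadata)

# Clean up the temporary file again.
os.remove(example_path)
```

Note that none of these fields say anything about what the numbers in the file mean, which is why richer metadata are needed.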
Metadata make finding and working with the data easier. Therefore, they are essential components in making your data FAIR. One could argue that metadata are more important than your data. Without metadata, the data would be just numbers. Without the original data, however, the metadata can still be useful to track people, institutions or publications associated with the original research. From a FAIR perspective, metadata should always be openly available.
Metadata can be created manually or generated automatically by the software or equipment used, preferably according to a disciplinary standard. While data documentation is meant to be read and understood by humans, metadata are primarily meant to be processed by machines. There is no FAIR data without machine-actionable metadata.
FAIR principles about metadata
Let’s have a look at the FAIR principles.
- Which principles focus on metadata?
Solution
Because metadata are data about data, all of the principles (Findable, Accessible, Interoperable and Reusable) apply to metadata.
Three types of metadata
We focus on three main types of metadata:
- Administrative metadata help manage a resource or a project and indicate when and how the data were created. For example, the project/resource owner, principal investigator, project collaborators, funder, project period, permissions, etc. They are usually assigned to the data before you collect or create them.
- Descriptive or citation metadata help to discover and identify data. Keywords are a very good example: they are often added to data or publications with the sole purpose of making them more findable, for instance by a search engine. Other examples are the authors, title, abstract, persistent identifier, related publications, etc.
- Structural metadata describe how a dataset or resource came about, but also how it is internally structured. They address the ‘I’ and ‘R’ in FAIR. For example, measurement units, data collection method, sampling procedure, sample size, categories, variables, etc. Structural metadata have to be created according to best practices in a research community and are published together with the data.
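The three types can be sketched as one record with a section per type. This is only an illustration in Python: the dataset, the field names and every value are invented, and a real project would take its field names from a community standard rather than from this example.

```python
import json

# Hypothetical record for an imaginary temperature dataset; all values are invented.
metadata_record = {
    "administrative": {
        "principal_investigator": "Jane Doe",
        "funder": "Example Research Council",
        "project_period": "2021-01-01/2023-12-31",
    },
    "descriptive": {
        "title": "Hourly air temperature at an example station",
        "authors": ["Jane Doe"],
        "keywords": ["air temperature", "climate"],
    },
    "structural": {
        "variables": ["time", "air_temperature"],
        "units": {"air_temperature": "K"},
        "sample_size": 8760,
    },
}

# Serializing to JSON gives a plain-text form that other software can parse.
serialized = json.dumps(metadata_record, indent=2)
print(serialized)
```

Keeping the record in a structured format such as JSON, rather than in free text, is what allows a machine to process it.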
Keep your metadata up-to-date!
- Descriptive and structural metadata should be added continuously throughout the project.
- Different types of metadata apply not only to a database, but also to individual sets of data, e.g. images/plots, tables, files, etc.
Types of metadata for geospatial data in your community/research team
Here are some questions about the use case you chose in the introduction of this tutorial.
- What is the type of metadata in your use case?
- What information is described by those metadata?
Ontology
An ontology (or controlled vocabulary) is a standard definition of key concepts in your community/research team and focuses on how those concepts are related to one another. A controlled vocabulary is a set of terms that you have to pick from. Using an ontology:
- helps others to understand the structure and content of your data,
- makes your data findable, interoperable and reusable.
A controlled vocabulary for climate and forecast data
In climate-related domains, many variables depend on the type of surface. How can you specify the surface type in the metadata?
Solution
The Climate and Forecast metadata conventions (CF conventions) maintain a vocabulary specifically for specifying surface and area types. The vocabulary is available on the CF site as the Area Type Table.
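The idea of a controlled vocabulary as "a set of terms that you have to pick from" can be sketched in a few lines of Python. The terms below are an illustrative subset chosen for this example and should be treated as an assumption; check the published table for the authoritative list. The helper `check_term` is hypothetical.

```python
# Illustrative vocabulary of surface types. A real project would load the terms
# from the vocabulary published by its community instead of hard-coding them.
SURFACE_TYPES = frozenset({"land", "sea", "sea_ice", "vegetation"})


def check_term(term, vocabulary=SURFACE_TYPES):
    """Return the term unchanged if it is in the vocabulary, else raise ValueError."""
    if term not in vocabulary:
        allowed = ", ".join(sorted(vocabulary))
        raise ValueError(
            f"'{term}' is not in the vocabulary; choose one of: {allowed}"
        )
    return term


print(check_term("sea_ice"))

# A free-text synonym is rejected, which is what keeps the metadata consistent.
try:
    check_term("ocean")
except ValueError as error:
    print(error)
```

Rejecting synonyms such as "ocean" may look strict, but it is what lets two datasets that use the same term be matched automatically.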
Metadata standards
A metadata standard (or convention) is a subject-specific guide to your data. A metadata standard includes rules on what content must be included, which syntax should be used, or which controlled vocabulary applies. The quality of your metadata has a huge impact on the reusability of your research data. It is best practice to use a metadata standard and/or an ontology commonly used in your community/research team.
Some of the recognized metadata standards for climate-related domains are:
- Climate and Forecast metadata (CF conventions)
- World Meteorological Organization Core Metadata Profile (WMO-CMP)
- Generic Earth Observation Metadata Standard (GEOMS)
- Cooperative Ocean-Atmosphere Research Data Service Conventions (COARDS)
- Water Markup Language (WaterML)
- Shoreline Metadata Profile of the Content Standards for Digital Geospatial Metadata (SMP-CSDGM)
A recognized metadata convention for climate data
Let’s do a search for the keyword climate in FAIR standards.
- Which metadata convention did you find?
- Which file formats allow you to include the metadata?
- What are other domains that use this convention?
Solution
- Climate and Forecast metadata (CF conventions)
- NetCDF
- Atmospheric science, earth science, natural science, and oceanography
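To make the solution more concrete, here is a sketch of the kind of attributes the CF conventions attach to a variable in a NetCDF file. It uses a plain Python dictionary instead of a NetCDF library so that it runs without extra dependencies. `standard_name`, `long_name` and `units` are attribute names used by CF; the variable, its values, the helper `missing_attributes` and the choice of which attributes to check are assumptions made for this illustration.

```python
# Attributes for a hypothetical air temperature variable, written the way they
# might appear on a variable in a NetCDF file that follows the CF conventions.
air_temperature_attrs = {
    "standard_name": "air_temperature",
    "long_name": "Air temperature at 2 m above the surface",
    "units": "K",
}

# Attributes this sketch chooses to check for. This is an assumption made for
# illustration and not the complete set of CF rules.
EXPECTED_ATTRS = ("standard_name", "units")


def missing_attributes(attrs, expected=EXPECTED_ATTRS):
    """Return the expected attribute names that are absent or empty."""
    return [name for name in expected if not attrs.get(name)]


print(missing_attributes(air_temperature_attrs))
print(missing_attributes({"long_name": "Unlabelled variable"}))
```

A check like this only tests that attributes are present; it does not verify that their values are valid CF terms.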
No/incomplete metadata standards in your research team
Imagine you are working in a lab, and you want to use metadata describing processes that produce data. However, the available standards in your research team are not specifically suited for that purpose.
How can you define a relevant metadata scheme? What would you do if there was no standard in your research team?
Sensitive data
You cannot openly publish sensitive data. However, you can always publish rich metadata about your data. Publishing metadata helps you to make clear under which conditions the data can be accessed and how they may be reused.
Key Points
From a FAIR perspective, metadata are more important than your data.
Metadata are preferably created according to a disciplinary standard.
To be FAIR, metadata must have a findable persistent identifier.