External data often remains an untapped resource
With increasing amounts of data available via the Internet or obtained from specialized data providers, external data is becoming more and more relevant. External data complements internal data and helps companies improve advanced analytics, optimize business processes (e.g., with geolocation, weather, or traffic data), reduce internal data maintenance efforts, and create new services. However, despite its growing relevance, external data remains an untapped resource for most companies.
What is external data?
Although most companies have an intuitive understanding of external data, there is no commonly accepted definition. In practice, external data is often associated with specific debates such as "open data", "linked open data", or "data marketplaces". The following definition, developed by the CC CDQ, reflects the understanding of most companies.
External data refers to any type of data that has been captured, processed, and provided from outside the company.
(Krasikov, Eurich and Legner, 2020)
The four types of external data
Based on a review of current practices, we distinguish four relevant types of external data: open data, paid data, shared data, and web data. While all four types originate from external sources, they differ in provenance, access, costs, structure, and further dimensions.
- Open data can be defined as data that is freely available and can be used and republished by anyone without restrictions from copyright or patents.
- Paid data is commercially available data that is acquired directly, at a cost, from specialized data providers (or brokers) and data marketplaces.
- Shared data refers to data that is shared between companies within business ecosystems (for instance, within the CDQ Data Sharing Community).
- Web data comprises any kind of unstructured and semi-structured data publicly available on the Internet. It includes social media data (e.g., Facebook, LinkedIn, Twitter) and the related metadata (e.g., location, time, language).
How to use external data
External data can be useful in the following situations:
- Providing data-driven insights: Data analytics can be enhanced with external data in operational areas such as customer relationship management, HR, supply chain, and warehousing. For example, a grocer who wants to improve its demand forecast can draw on weather data, data from suppliers, and economic data.
- Improving business processes: Many companies already use geolocation, weather, and traffic data to plan and manage their deliveries. Additional information about exceptional events, such as disasters, can help them avoid disruptions in the supply chain.
- Enhancing data management capabilities: Sourcing external data reduces data maintenance efforts. It can also be used to enrich internal data and improve data quality.
- Enabling new services: External data is also used to develop and introduce new products and services that match consumers' needs.
Reference Process
CC CDQ developed a Reference Process for Sourcing and Managing External Data that comprises six core phases: (1) initiate, (2) screen, (3) assess, (4) integrate, (5) manage and use, and (6) retire.