What discipline does data sourcing belong to?

Here is ChatGPT’s answer on March 28, 2023 (ChatGPT Mar 14 version, https://chat.openai.com/chat):

Data sourcing is typically considered part of the broader field of data management. Data management encompasses all aspects of the data lifecycle, including data acquisition, storage, processing, analysis, and visualization.

Data sourcing specifically involves the identification, collection, and integration of data from various sources. This process often requires a deep understanding of the business or research objectives and the types of data needed to achieve them.

Data sourcing may also involve data cleansing, normalization, and transformation to ensure that the data is suitable for analysis. These activities may involve expertise in fields such as data engineering, data governance, and data quality management.

In summary, while data sourcing is not a discipline on its own, it is a key component of data management and requires expertise across various related fields.

I strongly agree. Great job, ChatGPT!

I particularly like the summary sentence. I would even say that data sourcing is a pillar of data management: without data fed in from outside the organization, opportunities to perform data analyses or to feed business processes are limited.

What do YOU think? Leave a comment below or write to eric@databefore.fr.

What are the challenges of sourcing data?

I asked ChatGPT on March 28, 2023 (ChatGPT Mar 14 version, https://chat.openai.com/chat):

Hello! Sourcing data can be a complex process, and there are several challenges that one may encounter. Here are some of the common challenges associated with sourcing data:

  1. Data quality: Data quality is one of the biggest challenges in sourcing data. Data may be incomplete, outdated, inaccurate, or inconsistent, which can lead to incorrect analysis and insights. Ensuring the quality of the data is critical to the success of any analysis.
  2. Data availability: Some data may not be available or may be difficult to obtain. This could be due to data ownership, licensing, privacy concerns, or technical limitations. Data may also be scattered across different sources and formats, making it difficult to integrate and analyze.
  3. Data relevance: The relevance of data to the research or analysis being performed is essential. Sourcing data that is not relevant to the analysis can lead to incorrect conclusions or wasted resources.
  4. Data bias: Data can be biased in various ways, including selection bias, measurement bias, and reporting bias. It’s essential to be aware of any biases in the data and address them appropriately.
  5. Data security: Data security is a critical concern, particularly when dealing with sensitive information. Protecting data from unauthorized access and ensuring its confidentiality is essential.

Overall, sourcing data can be a complex process that requires careful consideration of these challenges and appropriate strategies to address them.

Sounds good to me. Well done, ChatGPT!

I would put even more emphasis on data availability, because the rapid development of data exchange platforms and of options to monetize data strongly encourages data producers to make their data easier to consume. Consequently, data sourcing is not only about finding the most suitable data but also about grabbing the best deals, ranging from free open data to off-the-shelf data products to customized third-party data solutions. This means that data sourcing must have a strong data procurement dimension in addition to data acquisition.
