According to Donoho, which of the following is NOT true?

1) Doing data science requires that the data you're working with are 'big data.'
2) Being a data scientist requires coping skills for dealing with the pains of large-scale computing.
3) Many statisticians argue that they already are data scientists and question the need for a new field of data science.
4) Data science should be a true science that uses scientific rigor to answer interesting questions.
5) Often the technologies taught in university are different than those used by the data scientist in their job.

1 Answer

Final answer:

David Donoho clarifies that data science does not exclusively require big data, and that the field should practice scientific rigor. As part of a data scientist's role, it's important to justify data selections and communicate findings effectively, which shows that data science skills apply to a range of dataset sizes and scenarios.

Step-by-step explanation:

According to David Donoho, the statement that doing data science requires 'big data' is not true. He emphasizes that while data science often deals with large datasets, the core competencies of a data scientist apply to both large and small datasets. Moreover, the skills of a data scientist involve not just handling big data, but also analyzing, interpreting, and drawing conclusions from various data sources, often using statistical methods. In addition, Donoho acknowledges that many statisticians see themselves as data scientists and may question the emergence of data science as a separate field. He also agrees that the technologies taught at universities may differ from those used in industry, pointing to an evolving landscape where practical, on-the-job skills are highly valued. Finally, Donoho argues that data science should be grounded in scientific rigor and approached as a true science for answering meaningful questions.

It's essential for data scientists to communicate their findings effectively, as highlighted in educational objectives like '4.1: The student can justify the selection of the kind of data needed to answer a particular scientific question'. This exercise of justifying data selection reinforces the principle that correlation does not imply causation and emphasizes the importance of scientific communication.

Indeed, data science includes coping with the challenges of large-scale computing, but it is a misconception that it revolves solely around big data. Any dataset that provides the evidence needed to support a hypothesis can be valuable, regardless of its size.

User Maarten Wolfsen
by
7.9k points