The data preparation phase of CRISP-DM may include:

A. visualizing the data
B. creating new records/rows
C. aggregating data to a higher level (e.g. daily sales instead of individual transactions)
D. calculating bivariate statistics to validate the visualizations

1 Answer

Final answer:

During the data preparation phase of CRISP-DM, tasks may include visualizing the data, creating new records, aggregating data to a higher level, and calculating bivariate statistics to confirm what the visualizations suggest.

Step-by-step explanation:

CRISP-DM stands for Cross-Industry Standard Process for Data Mining, a process model that describes the approaches data mining practitioners commonly use. It has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The data preparation phase covers the work of turning raw data into the dataset used for modeling. Its tasks may include the following:

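  • Visualizing the data: To detect patterns, outliers, and the general distribution of the data. CRISP-DM formally lists exploration under the data understanding phase, but the process is iterative and plots are often revisited while preparing data.
  • Creating new records/rows: Generating records that do not exist in the raw data, for example adding a zero-sales row for a day with no transactions. This differs from imputation, which fills in missing values within existing rows.
  • Aggregating data to a higher level: For example, summarizing individual transactions into daily sales to simplify analysis and reduce the volume of data.
  • Calculating bivariate statistics to validate the visualizations: Measuring the relationship between two variables, for example with a correlation coefficient, to confirm the patterns and trends seen in the plots. Like visualization, this is formally part of data understanding but is often repeated during preparation.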
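As a rough illustration of the aggregation and record-creation tasks above, here is a minimal Python sketch using only the standard library. The transaction data and the function names `aggregate_daily` and `fill_missing_days` are made up for this example; they are not part of CRISP-DM or of any particular tool.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transaction-level data: (transaction date, amount)
transactions = [
    (date(2024, 1, 1), 20.0),
    (date(2024, 1, 1), 35.5),
    (date(2024, 1, 3), 12.0),
]

def aggregate_daily(rows):
    """Aggregate to a higher level: sum transaction amounts per day."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day] += amount
    return dict(totals)

def fill_missing_days(daily):
    """Create new records: add a zero-sales row for each day with no transactions."""
    day, end = min(daily), max(daily)
    filled = {}
    while day <= end:
        filled[day] = daily.get(day, 0.0)
        day += timedelta(days=1)
    return filled

daily_sales = fill_missing_days(aggregate_daily(transactions))
for day, total in daily_sales.items():
    print(day.isoformat(), total)
```

Whether a day without transactions should become a zero-sales row or stay absent is a modeling decision; the sketch assumes zero is the right value.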

Descriptive statistics summarize and organize the prepared data, using measures of central tendency such as the mean or median and measures of variation such as the standard deviation or range. Inferential statistics may then be used to draw conclusions from the data, using probability to assess how reliable those conclusions are.
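The descriptive measures mentioned above can be computed with Python's built-in `statistics` module. This is only a sketch, and the daily sales figures are invented for illustration.

```python
import statistics

# Hypothetical daily sales totals (invented values)
daily_sales = [55.5, 0.0, 12.0, 40.0, 27.5]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    # Measures of variation
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
    "range": max(daily_sales) - min(daily_sales),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that `statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would be the choice if the values are treated as the whole population.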

by User Miguel Guardo (8.2k points)