Answer:
1. collation
2. reducing data
3. True
4. central tendency
5. none of the above
6. purging
7. sampling
8. inconsistent data
9. attribute reduction
10. operators
11. Select Attributes
12. Filter Examples
13. True
14. False
15. True
16. data frame
17. columns 4 through 9 for the first 100 employee records
18. sample
19. attach()
Step-by-step explanation:
Combining data from two or more relational database tables is an example of:
- collation (This is the closest of the options provided. In relational database terms, combining data from multiple tables is more commonly called "joining" the tables.)
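As a rough illustration of combining two tables on a shared key, here is a minimal sketch using base R's merge(). The data frames employees and departments, and all of their columns, are made up for this example.

```r
# Hypothetical tables that share the key column dept_id
employees <- data.frame(
  emp_id  = c(1, 2, 3),
  name    = c("Ana", "Ben", "Cy"),
  dept_id = c(10, 20, 10)
)
departments <- data.frame(
  dept_id   = c(10, 20),
  dept_name = c("Sales", "IT")
)

# Inner join: keep rows whose dept_id appears in both tables
joined <- merge(employees, departments, by = "dept_id")
print(joined)
```

Each employee row picks up the dept_name that matches its dept_id.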
The following statement is NOT an example of data scrubbing as described in the text:
- reducing data (Data scrubbing typically refers to the process of cleaning and correcting inconsistent or erroneous data, handling missing values, and resolving data quality issues. Reducing data is not specifically a part of data scrubbing.)
True or false: Having zeroes in the data is not the same as having missing data.
- True (Zero is a valid recorded value, while missing data refers to the absence of a value for a particular observation.)
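The distinction can be seen in R, where a missing value is represented by NA rather than 0. The sketch below uses a made-up vector called purchases.

```r
# Hypothetical purchase counts: 0 is a recorded value, NA is a missing one
purchases <- c(3, 0, NA, 5)

n_zero    <- sum(purchases == 0, na.rm = TRUE)  # observations recorded as zero
n_missing <- sum(is.na(purchases))              # observations with no value

# The zero counts toward the mean; the NA has to be excluded explicitly
avg <- mean(purchases, na.rm = TRUE)
```

Without na.rm = TRUE, mean(purchases) would return NA, which is one practical reason the two cases need to be told apart.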
Descriptive statistics that give insight into norms in a data set are called measures of:
- central tendency
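The usual measures of central tendency (mean, median, and mode) can be computed in R as sketched below. The ages vector is made up, and the mode calculation is one possible approach, since base R has no built-in function for the statistical mode.

```r
ages <- c(23, 35, 35, 41, 58)  # made-up values

avg_age    <- mean(ages)    # arithmetic mean
median_age <- median(ages)  # middle value of the sorted data

# Base R has no statistical mode function; take the most frequent value
counts   <- table(ages)
mode_age <- as.numeric(names(counts)[which.max(counts)])
```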
Missing values in a data set mean that:
- none of the above (Missing values in a data set simply indicate that certain observations or attributes do not have a recorded value. This does not necessarily imply errors, unusability, or the presence of outliers.)
Removing records that contain missing or inconsistent data from a data set before analysis is an example of:
- purging
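One way to sketch this kind of purging in R is with na.omit(), which drops every row that contains a missing value. The survey data frame and its columns are invented for the example.

```r
# Hypothetical survey data with gaps in age and income
survey <- data.frame(
  id     = 1:4,
  age    = c(34, NA, 51, 29),
  income = c(52000, 48000, NA, 61000)
)

# Keep only the rows with no missing values in any column
purged <- na.omit(survey)
print(purged)
```

Rows 2 and 3 are removed here because each has one NA.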
Selecting some subset of records from a data set is called:
- sampling
A value of "middle-aged" in an attribute that otherwise contains people's ages in number of years would be an example of:
- inconsistent data
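A simple way to spot this kind of inconsistency in R is to try converting the column to numbers and see which entries fail. This is only a sketch; age_raw is a made-up column that was read in as text.

```r
age_raw <- c("34", "middle-aged", "51", "29")  # made-up values stored as text

# Non-numeric entries become NA on conversion (R warns; silenced here)
age_num <- suppressWarnings(as.numeric(age_raw))

# Entries that were present but could not be read as a number
inconsistent <- age_raw[is.na(age_num) & !is.na(age_raw)]
print(inconsistent)
```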
Removing columns from a data set because they are not useful for a certain type of data analysis is an example of:
- attribute reduction
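In R, attribute reduction can be sketched as selecting only the columns needed. The customers data frame and the choice of columns to keep are assumptions made for illustration.

```r
# Hypothetical customer data with four attributes
customers <- data.frame(
  id    = 1:3,
  name  = c("Ana", "Ben", "Cy"),
  age   = c(34, 51, 29),
  spend = c(120.5, 80, 43.2)
)

# Keep only the attributes needed for the analysis
reduced <- customers[, c("age", "spend")]
print(names(reduced))
```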
Data analysis processes in RapidMiner are built using rectangular building blocks called:
- operators
To remove unwanted attributes from a data set in RapidMiner, use the Select Attributes operator.
To remove unwanted observations from a data set in RapidMiner, use the Filter Examples operator.
True or false: In RapidMiner, data can be either imported or read into the software from CSV, text, and spreadsheet files.
- True
True or false: RapidMiner requires all data attributes to have either a data type or a role, but not both.
- False (In RapidMiner, attributes can have both a data type and a role assigned to them.)
True or false: In R, it is possible to read a data set directly from a text file located on disk, on a file server, or on a web server.
- True
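As a minimal sketch, read.csv() can load a text file into R. The example writes a small made-up CSV to a temporary file first so that it is self-contained; the URL in the final comment is hypothetical.

```r
# Write a small made-up CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("id,age", "1,34", "2,51"), path)

# Read the text file from disk into a data frame
people <- read.csv(path)
print(people)

# The same call accepts a URL in place of a local path, for example:
# people <- read.csv("https://example.com/people.csv")  # hypothetical URL
```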
When a data set is imported into R, it is stored in an object called a:
- data frame
In R, assuming the existence of a data frame called Employees, the command Employees[1:100, 4:9] would show:
- columns 4 through 9 for the first 100 employee records
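The row and column indexing can be checked with a small sketch. The Employees data frame built here is a stand-in filled with made-up numbers, since the real data set is not shown.

```r
# Stand-in Employees data frame: 150 rows, 10 columns of made-up values
Employees <- as.data.frame(matrix(seq_len(150 * 10), nrow = 150, ncol = 10))

# The part before the comma selects rows, the part after selects columns
subset_view <- Employees[1:100, 4:9]
dim(subset_view)
```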
The R command for randomly retrieving some number of rows from an imported data set is:
- sample
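One common pattern, sketched below, is to use sample() to draw random row numbers and then index the data frame with them. The Employees data frame, the seed, and the sample size of 10 are all assumptions for the example.

```r
set.seed(42)  # fixed seed so the draw is repeatable

# Stand-in data frame with made-up salaries
Employees <- data.frame(
  id     = 1:150,
  salary = seq(30000, by = 100, length.out = 150)
)

# sample() draws 10 distinct row numbers; indexing retrieves those rows
picked_rows   <- sample(nrow(Employees), 10)
random_subset <- Employees[picked_rows, ]
```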
To avoid having to retype the name of a data frame when referring to an attribute in a data set in R, use the attach() command.
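A short sketch of attach() in use follows. The Employees data frame and its salary column are made up for the example.

```r
Employees <- data.frame(salary = c(52000, 48000, 61000))  # made-up values

attach(Employees)            # make the columns visible by name
avg_salary <- mean(salary)   # same result as mean(Employees$salary)
detach(Employees)            # remove it from the search path when done
```

Calling detach() afterward is a common precaution, since attached column names can mask other objects that share the same name.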