9.1k views
1 vote
Students can form groups of three and send their names to the instructor before 3rd January. Each group selects one dataset from those provided at the link below.

https://www.coursera.org/articles/data-analytics-projects-for-beginners
"10 free public datasets for EDA"
Use only one dataset and analyze it in Microsoft Excel to discover the structure of the data and any trends, patterns, or anomalies, guided by your own hypothesis. Perform the following tasks, using visualization to support your answers.
The final project report must cover all 5 of the following tasks and be written using the provided template (14 marks, distributed among the tasks below).
==========================================================
Task 1: Understand and describe the nature and structure of the selected dataset. (3 marks)
Give a brief description of the dataset.
Identify the features of the dataset.
Propose a hypothesis or assumption (between 2 variables) to validate.
Task 2: Reduce the dimensionality of the dataset to support the hypothesis validation. If necessary, preprocess the data to handle missing values, duplicate values, etc. You can also generate a new feature from any of the provided features if it may support your hypothesis. Because some devices have limited processing power, you can reduce your dataset to 1000 tuples. (3 marks)
Task 3: Provide descriptive statistics for some features, using statistical methods to understand the dataset better, and answer the following analysis questions: (4 marks)
Compare different attributes (features). What trend did you find?
Include any of the measures of central tendency, such as the mean, median, and mode.
Describe the spread of your data. This may include measures such as variance, standard deviation, skewness, and kurtosis.
(You are encouraged to pose other analysis questions based on any trend you notice in the dataset.)
Task 4: Validate the hypothesis in Task 3 by investigating the relationship between the two quantitative variables you have chosen, using correlation, regression, and R-squared, and state any conclusions you can draw. (3 marks)
Task 5: Show visual representation of your analysis (hint: use the right chart/graph for your data analysis). (1 mark)

User Chauncy
by
7.8k points

2 Answers

2 votes

Final answer:

This project involves data analysis using Microsoft Excel and includes tasks such as understanding dataset structure, reducing dimension, providing descriptive statistics, validating hypotheses, and creating visual representations.

Step-by-step explanation:

This project is focused on data analysis using Microsoft Excel and involves various tasks such as understanding the structure of the selected dataset, reducing its dimension, providing descriptive statistics, validating hypotheses, and showing visual representation of the analysis.

Task 1 requires a brief description of the dataset, identifying its features, and proposing hypotheses to validate. Task 2 involves reducing the dimension of the dataset and preprocessing the data. Task 3 requires providing descriptive statistics and answering analysis questions. Task 4 involves investigating the relationship between two quantitative variables using correlation, regression, and R-squared. Task 5 requires visual representation of the analysis using appropriate charts/graphs.
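The preprocessing in Task 2 can be cross-checked outside Excel. The sketch below is only an illustration in Python (the assignment itself asks for Excel, where the Remove Duplicates command and filters cover the same ground). The function name `preprocess`, the `required` column list, and the fixed random seed are assumptions made for this example, not part of the assignment.

```python
import random

def preprocess(rows, required, limit=1000, seed=0):
    """Drop incomplete and duplicate rows, then keep at most `limit` rows.

    rows     -- list of dicts, one per record (e.g. from csv.DictReader)
    required -- column names that must not be missing (hypothetical choice)
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Skip records with a missing value in any required column.
        if any(row.get(col) in (None, "") for col in required):
            continue
        # Treat two records as duplicates when every field matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    # Reduce to `limit` tuples by random sampling, as Task 2 permits.
    if len(cleaned) > limit:
        cleaned = random.Random(seed).sample(cleaned, limit)
    return cleaned
```

Sampling at random rather than taking the first 1000 rows is a design choice here: it avoids bias if the file happens to be sorted by one of the variables.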

User RootTwo
by
7.8k points
3 votes

Final answer:

The project requires a detailed exploratory data analysis using Microsoft Excel. Key tasks include understanding the dataset, hypothesis testing, descriptive statistics, and data visualization. The goal is to uncover and communicate insights from the data through both statistical evidence and graphical representation.

Step-by-step explanation:

This project involves using Microsoft Excel to conduct exploratory data analysis (EDA) on a selected dataset. With a strong focus on descriptive statistics and data visualization, you'll need to understand the dataset's structure, create hypotheses, reduce dataset dimensions if necessary, and provide descriptive statistics to identify trends, patterns, or anomalies. Evaluating the evidence provided by data sets in relation to specific hypotheses falls under the domain of both descriptive and inferential statistics. Finally, validating the hypothesis using methods like correlation and regression will form the basis of this analysis.
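For the correlation and regression step, the quantities involved are Pearson's r, the least-squares slope and intercept, and R-squared (which equals r squared when there is a single predictor). In Excel these correspond to CORREL, SLOPE, INTERCEPT, and RSQ. The Python sketch below shows the same arithmetic so the spreadsheet results can be checked by hand; the function name `simple_regression` and the sample numbers are made up for illustration.

```python
from math import sqrt

def simple_regression(x, y):
    """Pearson correlation and least-squares line for two numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"r": r, "slope": slope, "intercept": intercept, "r_squared": r * r}

# Illustrative data only, not taken from any of the course datasets.
result = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

For these illustrative numbers the fitted line is y = 2.2 + 0.6x with R-squared of 0.6, meaning about 60% of the variation in y is explained by x.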

Key Steps and Tips for Your Analysis

  • Task 1: Get acquainted with the dataset. Describe it and identify key features. Formulate a hypothesis about the relationship between two variables, based on an initial review of the variables present.
  • Task 2: Reduce the dataset's dimensionality if needed. Handle preprocessing such as missing values or duplications, and potentially create new features that support your hypotheses.
  • Task 3: Utilize descriptive statistical methods to understand your data better. This includes calculating measures of central tendency (mean, median, mode) and spread (variance, standard deviation, skewness, kurtosis). Develop new analysis questions as trends are recognized.
  • Task 4: Validate your hypothesis through statistical tests like correlation and regression analysis, being sure to report R-squared values and any conclusions you can draw.
  • Task 5: Choose appropriate charts or graphs to visually represent your analysis and support the conclusions drawn from the data.

Remember, the goals are to understand data trends, validate hypotheses, and clearly communicate findings through both statistical evidence and visual representation.

User Jennifer Zouak
by
8.8k points