
Project: Big Data Programming - Section 2
Finding and Analyzing Your Data


A temperature map of the US (Courtesy of the National Weather Service)

You need a large data set. If you are interested in weather data, try these search prompts. By adding “site:.gov” to your search, you are more likely to find government websites. Be careful in your search to use a trusted and reliable website. You do not want to download a virus along with your data!

climate at a glance site:.gov
statewide time series site:.gov
Examine Your Data
Once you have downloaded data, you will probably need to delete some of the top lines before you read the file. For instance, the following are the top few lines from a file that holds the average February temperature for 126 years. The data lines have three entries: the date, the average February temperature in degrees Fahrenheit, and the departure from the mean February temperature of 33.82 °F. The date is a string composed of the year and month. Since every month is February, all the date strings end in “02.”

Think about what will happen when your program reads the file. Most of the rows share the same three-entry structure, but the first five rows hold identifying information that cannot be read as data. Be sure you remove such rows from your data file before you process it.

Contiguous U.S., Average Temperature, February
Units: Degrees Fahrenheit
Base Period: 1901-2000
Missing: -99
Date,Value,Anomaly
189502,26.60,-7.22
189602,35.04,1.22
189702,33.39,-0.43
After you delete the leading lines, the file should start like this.

189502,26.60,-7.22
189602,35.04,1.22
189702,33.39,-0.43
Be sure to check your file for the leading lines you need to delete.
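
If you would rather not delete the leading lines by hand, one option is to filter them out in code. The sketch below is only an illustration and not part of the assignment: the helper names `is_data_line` and `strip_header_lines` are made up for this example, and it assumes every data line has the three-entry form shown above (date, value, anomaly).

```python
def is_data_line(line):
    """Return True if the line looks like 'YYYYMM,value,anomaly'."""
    parts = line.strip().split(",")
    # A data line has exactly three entries and starts with an all-digit date.
    if len(parts) != 3 or not parts[0].isdigit():
        return False
    try:
        float(parts[1])
        float(parts[2])
    except ValueError:
        return False
    return True


def strip_header_lines(lines):
    """Keep only the lines that look like data, with whitespace trimmed."""
    return [line.strip() for line in lines if is_data_line(line)]


# The top of the sample file from the assignment text.
raw_lines = [
    "Contiguous U.S., Average Temperature, February",
    "Units: Degrees Fahrenheit",
    "Base Period: 1901-2000",
    "Missing: -99",
    "Date,Value,Anomaly",
    "189502,26.60,-7.22",
    "189602,35.04,1.22",
    "189702,33.39,-0.43",
]

data_lines = strip_header_lines(raw_lines)
print(data_lines)
```

Filtering by shape rather than dropping a fixed number of lines means the same code still works if a different download has more or fewer header lines.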

Your Task
Now that you have your file holding your data, you need to analyze the data in three different ways that answer questions you pose. How you analyze is up to you, since what you analyze depends on what kind of data you have. As an example, with this data file, you can look for weather trends. You could find the average temperature of each decade, the decade with the lowest average temperature, and the decade with the highest average temperature. It is a shame that the data table does not go back further. The Mount Tambora volcano in Indonesia had a major eruption in 1815. It had such an epic effect on the climate that 1816 was known as the year without a summer.

You need your data file saved in the same folder as your program.

Open your data file with Notepad or Wordpad.
Open a new file in your Python editor.
Copy and paste the contents from Notepad to the new file.
Save the new file with a .txt extension in the same folder where you save your program.
Analyzing Your Data
Your program will read your data file, perform the analysis, and write the results to a separate file with a .txt extension.

Write a pseudocode plan for your program. Show your plan to a partner. Ask the partner for any suggestions to improve your plan.

When done, show your results to a partner. Ask your partner what parts they found interesting.

Answer:

It seems that you are looking for guidance on how to analyze a data set in Python. Here are some steps that you can follow to begin analyzing your data:

Import any necessary libraries or modules in Python. For example, you may want to use the pandas library to help you manipulate and analyze your data.

Read in your data file using a function like `pandas.read_csv()`. This will create a pandas DataFrame containing your data.
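
As a rough sketch of this step, the example below reads the sample lines from the question with `pandas.read_csv()`. It uses an in-memory string in place of a real file so that it runs on its own. The column names `date`, `value`, and `anomaly`, and the choice to skip five leading lines, are assumptions based on the sample shown in the question, so adjust them for your own file.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: the sample lines from the question.
sample_text = """Contiguous U.S., Average Temperature, February
Units: Degrees Fahrenheit
Base Period: 1901-2000
Missing: -99
Date,Value,Anomaly
189502,26.60,-7.22
189602,35.04,1.22
189702,33.39,-0.43
"""

df = pd.read_csv(
    io.StringIO(sample_text),          # pass a file name here for a real file
    skiprows=5,                        # skip the five identifying lines
    names=["date", "value", "anomaly"],
    dtype={"date": str},               # keep "189502" as a string
    na_values=[-99],                   # the header says -99 marks missing data
)

# The first four characters of the date string are the year.
df["year"] = df["date"].str[:4].astype(int)
print(df)
```

If you have already deleted the leading lines from your file, drop the `skiprows` argument.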

Use functions and methods provided by the pandas library (or any other libraries you are using) to perform your analysis. For example, you could group the rows by decade and call the `mean()` method to calculate the average temperature for each decade, then call `max()` on those averages to find the highest decade average, or `idxmax()` to find which decade it belongs to.
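
One possible way to do the decade analysis is sketched below. The helper name `decade_averages` and the column names `year` and `value` are assumptions made for this example. The sample rows are the three from the question, which all fall in the 1890s, so a full data file is needed to see more than one decade.

```python
import pandas as pd


def decade_averages(df):
    """Return the mean temperature for each decade, indexed by the decade's first year."""
    decade = (df["year"] // 10) * 10   # 1895 -> 1890, 1903 -> 1900
    return df.groupby(decade)["value"].mean()


# The three sample rows from the question.
df = pd.DataFrame(
    {"year": [1895, 1896, 1897], "value": [26.60, 35.04, 33.39]}
)

averages = decade_averages(df)
print(averages)
print("Decade with the lowest average:", averages.idxmin())
print("Decade with the highest average:", averages.idxmax())
```

Integer division by 10 drops the last digit of the year, which is what places every year into its decade.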

Open a new text file with the built-in `open()` function and use the file object's `write()` method to write the results of your analysis to it.
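
A minimal sketch of the writing step follows. The helper name `write_results`, the file name `results.txt`, and the output layout are all assumptions made for this example. The only result written is the average of the three sample values from the question. The sketch saves into the system's temporary folder so that it can run anywhere; in your own program you would use a path in your project folder.

```python
import os
import tempfile


def write_results(path, decade_means):
    """Write one 'decade,average' line per decade to a text file."""
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write("Decade,AverageTemperatureF\n")
        for decade in sorted(decade_means):
            out_file.write(f"{decade}s,{decade_means[decade]:.2f}\n")


# Average of the three sample February values from the question.
february_values = [26.60, 35.04, 33.39]
results = {1890: sum(february_values) / len(february_values)}

# Hypothetical output location for this example.
output_path = os.path.join(tempfile.gettempdir(), "results.txt")
write_results(output_path, results)
print("Wrote", output_path)
```

The `with` statement closes the file automatically, even if an error occurs while writing.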

If necessary, you can also use visualization libraries like Matplotlib or Seaborn to create graphs or plots to help you visualize your data and better understand the trends and patterns in your data.

by User Jon Gilkison (7.3k points)