Final answer:
To answer these questions, you can use Apache Spark, a popular big data processing framework. Write a Spark program that finds the location in New York with the most crime and the total number of crimes reported there, the top 3 crimes reported in July, and the number of DANGEROUS WEAPONS crimes reported in July.
Step-by-step explanation:
Here's how you can write a Spark program to answer each question:
a) Where is most of the crime happening in New York? And what is the total number of crimes reported in that location?
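You need a crime dataset that records where each crime happened in New York. Load it into a Spark DataFrame, group the rows by a location column (for example, borough or precinct), and count the rows in each group. Sort the counts in descending order: the first group is the location with the most crime, and its count is the total number of crimes reported there, so no separate summing step is needed.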
b) What are the top 3 crimes (use OFNS_DESC) that were reported in the month of July (use RPT_DT)?
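Filter the dataset to the crimes whose report date (RPT_DT) falls in July. Group the remaining rows by offense description (OFNS_DESC), count the rows in each group, sort the counts in descending order, and take the first 3 offense descriptions. If the data covers several years, this combines July from every year; add a filter on the year to restrict it to a single July.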
c) How many crimes of type DANGEROUS WEAPONS were reported in the month of July?
Filter the dataset to the crimes reported in July (RPT_DT) whose offense description (OFNS_DESC) is DANGEROUS WEAPONS, then count the remaining rows.