Please work on the Hortonworks sandbox.
Implement a simple Spark application using Spark Core (not Spark SQL) that computes <state,total_sales> at four granularities: year, month, day, and hour.
Two data sets in HDFS are given below.
Customer information <customer_id,name,street,city,state,zip>
Sales information <timestamp,customer_id,sales_price>
The input/output data sets may be plain text with any delimiter of your choice.
Timestamps are given in epoch seconds (for example, 1500674713).
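Since every sale must be counted at all four granularities, each record's epoch timestamp has to be expanded into four bucket keys. A minimal sketch of that key derivation (plain Python, UTC assumed; the function name is illustrative, not required):

```python
import time

def granularity_keys(epoch_seconds):
    """Derive (year, month, day, hour) bucket keys from an epoch timestamp.

    Each sale contributes to four aggregation keys, one per granularity.
    """
    t = time.gmtime(epoch_seconds)  # UTC; switch to time.localtime if local time is required
    return {
        "year":  (t.tm_year,),
        "month": (t.tm_year, t.tm_mon),
        "day":   (t.tm_year, t.tm_mon, t.tm_mday),
        "hour":  (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour),
    }
```

For the example timestamp 1500674713, the hour-level key is (2017, 7, 21, 22). In the Spark job, this expansion is a natural fit for flatMap, emitting one keyed record per granularity.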
Consider both possible cases for the data: the number of distinct states may be small (a finitely known set) or huge (an unknown set). Propose an appropriate solution for each case.
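One reading of this requirement, stated here as an assumption: when the state set is small and known, the per-partition partial sums fit in a tiny in-memory map and the final result can safely be collected to the driver (collectAsMap); when it is huge and unknown, the same aggregation must stay distributed (reduceByKey, writing results back to HDFS). Either way, Spark's reduceByKey performs a map-side combine followed by a shuffle-side merge, which can be sketched in plain Python as:

```python
from collections import defaultdict

def local_combine(partition):
    """Map-side combine: sum sales per (state, granularity) key within one partition."""
    acc = defaultdict(float)
    for key, amount in partition:
        acc[key] += amount
    return dict(acc)

def merge(a, b):
    """Shuffle-side merge of two partial sums (what reduceByKey does across partitions)."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0.0) + v
    return out

# Two simulated partitions of ((state, year), sales_price) records
p1 = local_combine([(("CA", 2017), 10.0), (("NY", 2017), 5.0)])
p2 = local_combine([(("CA", 2017), 2.5)])
totals = merge(p1, p2)
```

The small-state case keeps the combined maps tiny regardless of input size; the huge-state case relies on the merge being partitioned by key across the cluster.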
Use the Python, Scala, or Java API, whichever you prefer.
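The overall pipeline the task calls for is: parse both files, join sales to customers on customer_id to attach the state, expand each sale into one record per granularity, and sum by key. A plain-Python sketch structured the way the Spark Core chain would be (textFile → map → join → flatMap → reduceByKey); the comma delimiter, field order, and function names are assumptions for illustration:

```python
import time
from collections import defaultdict

def parse_customer(line, delim=","):
    # <customer_id,name,street,city,state,zip>
    f = line.split(delim)
    return f[0], f[4]                      # (customer_id, state)

def parse_sale(line, delim=","):
    # <timestamp,customer_id,sales_price>
    ts, cid, price = line.split(delim)
    return cid, (int(ts), float(price))    # keyed by customer_id for the join

def sales_by_state(customer_lines, sales_lines):
    """Mirror of: customers.join(sales).flatMap(per-granularity keys).reduceByKey(add)."""
    state_of = dict(parse_customer(l) for l in customer_lines)
    totals = defaultdict(float)            # reduceByKey equivalent
    for line in sales_lines:
        cid, (ts, price) = parse_sale(line)
        state = state_of[cid]              # inner join on customer_id
        t = time.gmtime(ts)                # epoch seconds, UTC assumed
        # one output key per granularity: year, month, day, hour
        for key in ((t.tm_year,),
                    (t.tm_year, t.tm_mon),
                    (t.tm_year, t.tm_mon, t.tm_mday),
                    (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour)):
            totals[(state,) + key] += price
    return dict(totals)
```

In the Spark version, the state_of lookup becomes either a broadcast variable (small customer table) or an rdd.join (large), and the totals dict becomes reduceByKey on a pair RDD.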