From this data set: a transactions data set with 2.5 million records.
Task 1: Load the data into HDFS using Sqoop (already done).
Task 2: From HDFS, use PySpark to create 4 dimension tables and 1 fact table.
Task 3: Copy those tables to a Redshift cluster.
Task 4: Analyse the data using Redshift queries (8 queries).
At every step there is a hint document against which we need to compare results.
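The dimension/fact split in Task 2 can be sketched as follows. This is a minimal plain-Python illustration of the dimensional-modeling logic only, not the actual PySpark job; the column names (`customer`, `product`, `amount`) are hypothetical placeholders for whatever the real transactions schema contains.

```python
# Minimal sketch of a star-schema split: each dimension holds the
# distinct values of one attribute keyed by a surrogate key, and the
# fact table keeps the measures plus foreign keys into the dimensions.
# Column names here are hypothetical, not from the real data set.

transactions = [
    {"txn_id": 1, "customer": "alice", "product": "book", "amount": 12.5},
    {"txn_id": 2, "customer": "bob",   "product": "pen",  "amount": 1.2},
    {"txn_id": 3, "customer": "alice", "product": "pen",  "amount": 1.2},
]

def build_dimension(rows, attr):
    """Map each distinct attribute value to a surrogate key (1, 2, ...)."""
    values = sorted({r[attr] for r in rows})
    return {v: i + 1 for i, v in enumerate(values)}

dim_customer = build_dimension(transactions, "customer")
dim_product = build_dimension(transactions, "product")

# Fact table: measures plus foreign keys into the dimension tables.
fact_sales = [
    {
        "txn_id": r["txn_id"],
        "customer_key": dim_customer[r["customer"]],
        "product_key": dim_product[r["product"]],
        "amount": r["amount"],
    }
    for r in transactions
]
```

In PySpark the same idea would typically be expressed with DataFrame operations, e.g. selecting the distinct attribute columns for each dimension and joining the surrogate keys back onto the transactions to produce the fact table.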
ETL project using PySpark and Redshift