I currently have a script that parses the iTunes API and puts the data into Elasticsearch and Cassandra databases. It crawls the podcasts' RSS feeds twice per day, and it also checks iTunes for new podcasts every day.
A podcast is an audio show, and each podcast has multiple episodes. In other words, each podcast has one RSS feed, and that feed lists the podcast's episodes sorted by release date, newest first.
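Because each feed is RSS 2.0 XML with one `<item>` per episode, the episode list can be read with Python's standard library alone. The sketch below is not the existing script's parser, which we have not seen. It assumes the feed text has already been downloaded, and the names `Episode`, `parse_episodes` and `_parse_pub_date` are hypothetical. It shows one way to keep some common problems from failing a whole feed: a leading byte-order mark or whitespace is stripped, and a missing or malformed `pubDate` becomes `None` instead of raising, so that episode sorts last.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Optional


@dataclass
class Episode:
    title: str
    published: Optional[datetime]  # None when pubDate is missing or unparseable


def _parse_pub_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse an RSS 2.0 pubDate (RFC 822 style); return None instead of raising."""
    if not raw:
        return None
    try:
        dt = parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):  # TypeError on older Pythons, ValueError on 3.10+
        return None
    if dt.tzinfo is None:  # a "-0000" offset yields a naive datetime; treat it as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def parse_episodes(xml_text: str) -> List[Episode]:
    """Return the feed's episodes, newest first; undated episodes go last."""
    # Tolerate a byte-order mark or whitespace before the XML declaration.
    root = ET.fromstring(xml_text.lstrip("\ufeff").strip())
    episodes = [
        Episode(
            title=(item.findtext("title") or "").strip(),
            published=_parse_pub_date(item.findtext("pubDate")),
        )
        for item in root.iter("item")
    ]
    oldest = datetime.min.replace(tzinfo=timezone.utc)  # sort key for undated episodes
    episodes.sort(key=lambda e: e.published or oldest, reverse=True)
    return episodes
```

Truly malformed XML still raises `xml.etree.ElementTree.ParseError`; catching that per feed and logging the feed URL would keep one bad feed from stopping the rest of the crawl.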
The current developer of the script is not very responsive to change requests, so your job is to:
1 - Fix the parse errors that occur with some of the podcast RSS feeds.
2 - Fill in missing podcasts. We are missing a lot of podcasts from iTunes, and we can get some of them from another website's API.
3 - Set up data for each podcast on how often it releases new episodes. We can determine the frequency just by looking at the RSS feed and then store it in the database. For example, podcasts that release once or several times per day should be crawled every hour, while those that release once per week might be crawled 4 times per day, and so on.
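The frequency rule in item 3 can be sketched in Python as two steps: estimate a podcast's typical gap between episodes from the publication dates in its feed, then map that gap to a number of crawls per day. Only the daily tier (every hour, i.e. 24 crawls) and the weekly tier (4 crawls) come from this posting. The tolerance windows, the handling of feeds that fall between or below those tiers, the fallback to the current twice-per-day schedule, and all function and constant names are assumptions. The median gap is used rather than the mean so that one long hiatus does not push an active show into a slower tier.

```python
from datetime import datetime, timedelta, timezone
from statistics import median
from typing import Iterable, Optional

# Hypothetical tier thresholds; the posting only fixes the daily and weekly cases.
DAILY_CUTOFF = timedelta(days=1, hours=12)  # slack for feeds that drift around 24 h
WEEKLY_CUTOFF = timedelta(days=8)           # slack for feeds that drift around 7 days
DEFAULT_CRAWLS_PER_DAY = 2                  # assumed fallback: the current twice-per-day crawl


def median_release_interval(published: Iterable[datetime]) -> Optional[timedelta]:
    """Median gap between consecutive episodes, or None with fewer than 2 dates."""
    dates = sorted(published, reverse=True)  # newest first, like the feed itself
    gaps = [newer - older for newer, older in zip(dates, dates[1:])]
    return median(gaps) if gaps else None


def crawls_per_day(interval: Optional[timedelta]) -> int:
    """Map a podcast's typical release interval to how many times per day to crawl it."""
    if interval is None:  # not enough dated episodes to estimate a frequency
        return DEFAULT_CRAWLS_PER_DAY
    if interval <= DAILY_CUTOFF:  # once a day or more often: crawl every hour
        return 24
    if interval <= WEEKLY_CUTOFF:  # about weekly (or in between): crawl 4 times per day
        return 4
    return DEFAULT_CRAWLS_PER_DAY  # slower than weekly
```

The resulting value could be stored as a field on each podcast's record in Elasticsearch or Cassandra and recomputed after every crawl, so a show that changes its release schedule moves to the matching tier on its own.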
I will give you the code so you can understand it, and you can also talk with one of my other engineers who knows how it works.
The code is written in Python. You must also demonstrate expertise in Elasticsearch.