Task:
You get N links to different sites (mostly news sites and blogs).
You need to create a scraper that fetches the title and the first 100 words from each given URL.
Output: an RSS feed in which each item has the extracted blog post title as <title>, and the first 100 words plus the source URL as <description>.
Notice: the blog posts/pages do not share a common post delimiter.
Other: use PHP.
I'll choose the script that has the best success rate, i.e. how many items it fetches successfully out of, say, every 100 items.
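
A minimal sketch of the extraction step, assuming PHP's bundled DOM extension is available. The function names (firstWords, extractPost, fetchPost) are hypothetical, and treating the concatenated <p> text as the post body is only one possible heuristic for pages without a common delimiter; it is not a guaranteed way to reach a high success rate.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from the task itself.

/** Return the first $limit whitespace-separated words of $text. */
function firstWords(string $text, int $limit = 100): string
{
    // preg_split returns false on invalid UTF-8, so fall back to an empty list.
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return implode(' ', array_slice($words, 0, $limit));
}

/** Extract the page title and the first 100 words from an HTML string. */
function extractPost(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $title = trim($xpath->evaluate('string(//title)'));

    // Assumption: with no common post delimiter, use the text of all <p> elements as the body.
    $paragraphs = [];
    foreach ($xpath->query('//body//p') as $p) {
        $paragraphs[] = $p->textContent;
    }

    return [
        'title'       => $title,
        'description' => firstWords(implode(' ', $paragraphs)),
    ];
}

/** Fetch a URL and extract it; returns null on failure (assumes allow_url_fopen is enabled). */
function fetchPost(string $url): ?array
{
    $html = @file_get_contents($url);
    return $html === false ? null : extractPost($html);
}
```

Counting how many of the N URLs make fetchPost return a non-empty title and description gives a rough success rate to compare scripts by. Character-encoding detection is left out of this sketch and would need handling for non-ASCII pages.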
Example: if you get this URL: [login to view URL]
this is what I need as output:
TITLE: For Primaries in Two States, a Variety of Scenarios
DESCRIPTION: New York Times: "It’s almost [login to view URL], not quite. But the Democratic presidential primaries taking place on Tuesday in North Carolina and Indiana have more delegates up for grabs than any of the remaining contests. For political, demographic and mathematical reasons, those states have"...
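
The TITLE/DESCRIPTION pair above could be turned into RSS 2.0 items roughly as sketched here, assuming the DOM extension. The function name buildFeed, the item array shape (title, description, url), the channel fields, and the "words... (URL)" layout of the description are all assumptions made for illustration.

```php
<?php
// Hypothetical helper; the item array keys and the description layout are assumptions.

/** Append a child element with text content; DOM escapes &, < and > when serializing. */
function appendTextElement(DOMDocument $doc, DOMElement $parent, string $name, string $text): void
{
    $el = $doc->createElement($name);
    $el->appendChild($doc->createTextNode($text));
    $parent->appendChild($el);
}

/** Build an RSS 2.0 document from a list of ['title', 'description', 'url'] items. */
function buildFeed(array $items, string $channelTitle, string $channelLink): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);

    $channel = $doc->createElement('channel');
    $rss->appendChild($channel);

    // RSS 2.0 requires title, link and description on the channel.
    appendTextElement($doc, $channel, 'title', $channelTitle);
    appendTextElement($doc, $channel, 'link', $channelLink);
    appendTextElement($doc, $channel, 'description', $channelTitle);

    foreach ($items as $item) {
        $node = $doc->createElement('item');
        $channel->appendChild($node);
        appendTextElement($doc, $node, 'title', $item['title']);
        // First 100 words followed by the source URL, as the task asks for in <description>.
        appendTextElement($doc, $node, 'description', $item['description'] . '... (' . $item['url'] . ')');
    }

    return $doc->saveXML();
}
```

Building the feed through DOM rather than string concatenation keeps the output well-formed even when titles or URLs contain characters such as & or <.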