Question

What's the best way to extract data from a website and save it to a spreadsheet?

mapi
almost 3 years ago

Google Sheets' import functions definitely needed a revamp, so I made ImportFromWeb and ImportJSON.
ImportFromWeb is a powerful scraper packaged as a Google Sheets function. It even loads JS-rendered pages:
https://nodatanobusiness.com/importfromweb/

4 points
@maguay (replying to @mapi )
almost 3 years ago

Neat, thanks for sharing!

1 point
maguay
almost 3 years ago

I don't have a perfect suggestion to share here, but a few options that could work:

  • Google Sheets has some built-in scraping tools, including the importXML and importHTML functions, where you specify which elements of the page you want to import. That works with public data, but not with anything behind an account or paywall.
  • Data Scraper (data-miner.io) is a Chrome extension, so you can run it on the specific pages you want to parse, including pages that require a login first.
  • Apify is a bit more advanced: you can turn any page into an API that automatically pushes new results when the page updates. That's likely beyond the scope needed here, but the pre-made scrapers there can be useful for copying data from popular websites.
  • Diffbot's Analyze API can also extract data from sites automatically, and the testing tool on their site lets you normalize data from an individual page (public pages only, nothing behind an account or paywall), then copy the resulting table and paste it into your spreadsheet.
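
For anyone curious what "pull a table off a page and drop it in a spreadsheet" looks like in code, here is a minimal Python sketch using only the standard library. It is not how any of the tools above work internally. The class name, function name, and sample HTML are made up for illustration, and it assumes a simple, well-formed table with no nesting:

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page content, standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

The CSV text can be saved to a file and opened in any spreadsheet app. Fetching the page, handling logins, and coping with JS-rendered content are all left out here, which is exactly where the tools above earn their keep.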
3 points
Slicemetrics
almost 3 years ago

My fav is Instant Data Scraper. It's a Chrome extension that can automatically parse any tabular data and export it as CSV. Link: https://chrome.google.com/webstore/detail/instant-data-scraper/ofaokhiedipichpaobibbnahnkdoiiah

3 points
@Sergioorlz (replying to @Slicemetrics )
2 years ago

I don't know you, you don't know me, but I love ya~
You saved me hours and hours of work and tears.
Thanks a lot <3

1 point
peterahn
almost 3 years ago

I would give Clay.run a shot! I've enjoyed working with their team and am happy to provide an intro.

2 points
@maguay (replying to @peterahn )
almost 3 years ago

Interesting, I didn't realize Clay had a data extraction tool! Can you have it watch for text changes on any site? If so, that would essentially let you turn any site into an API for automation...
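
To make "watch for text changes" concrete, here is a rough Python sketch of one common approach: store a fingerprint of the page text and compare it on the next visit. This is only an assumption about how such a watcher could work, not a description of Clay or any other tool, and the function names are hypothetical. Fetching the page and scheduling the checks are left out:

```python
import hashlib


def text_fingerprint(text):
    """Hash the text after collapsing whitespace, so layout-only changes are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, current_text):
    """Return True if the current text no longer matches the stored fingerprint."""
    return text_fingerprint(current_text) != previous_fingerprint


# Made-up example text standing in for content pulled from a page.
stored = text_fingerprint("Price: 10")
print(has_changed(stored, "Price:   10\n"))  # whitespace-only difference
print(has_changed(stored, "Price: 11"))      # real text difference
```

Whenever `has_changed` returns True, an automation could fire and the stored fingerprint could be replaced with the new one.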

1 point
@peterahn (replying to @maguay )
almost 3 years ago

Yes, they have a Chrome extension that makes it easy to scrape data from any website and import it into Clay bases. I haven't seen a way for them to watch for text changes on any site, but perhaps we should get their CEO, Kareem, in here to let us know if that's on the roadmap :)

1 point
@maguay (replying to @peterahn )
almost 3 years ago

Ahhh gotcha, that'd work for one-time processes, but wouldn't work for automatically parsing sites and checking back to find changes over time.

You should invite them!

1 point
AnujAdhiya
almost 3 years ago

Haven't done scraping for a bit but used Octoparse in the past and it worked like a champ.

1 point
navpar
2 years ago

Some ideas are listed here: https://buildastack.com/product-category/personal-productivity/web-scrapper/ (Disclaimer: Build a Stack is my website)

Personally, I have used Listly and found it does the job well.

1 point