Question

What's the best way to extract data from a website and save it to a spreadsheet?

Mentioned
#Clay
mapi's avatar
2 years ago

Google Sheets' import functions definitely needed a revamp, so I made ImportFromWeb and ImportJSON.
ImportFromWeb is a powerful scraper packaged as a Google Sheets function. It even loads JS-rendered pages:
https://nodatanobusiness.com/importfromweb/

4 points
maguay's avatar
@maguay (replying to @mapi )
2 years ago

Neat, thanks for sharing!

1 point
maguay's avatar
2 years ago

I don't have a perfect suggestion to share here, but a few options that could work:

  • Google Sheets has some built-in scraping tools, including the IMPORTXML and IMPORTHTML functions, where you specify which elements of the page you want to import. That works with public data, but not with anything behind an account or paywall.
  • Data Scraper from data-miner.io is a Chrome extension, so you can run it on the specific pages you want to parse, including ones that require a login first.
  • Apify is a bit more advanced: you can turn any page into an API that automatically pushes new results when the page updates. That's likely beyond the scope needed here, but the pre-made scrapers there can be useful for copying data from popular websites.
  • Diffbot's Analyze API can also extract data from sites automatically, and the testing tool on their site lets you normalize data from an individual page (as long as it's public, not behind an account or paywall), then copy the resulting table and paste it into your spreadsheet.
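
If you'd rather script it, here's a rough Python sketch of the same idea as the table-import approach in the first option: pull one table out of a page's HTML and turn it into CSV that any spreadsheet can open. It only uses the standard library. The names `TableExtractor`, `table_to_csv`, and `SAMPLE_HTML` are made up for illustration, the sample data is invented, and it assumes simple, non-nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the cell text of every <table> in an HTML document.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(html, index=0):
    """Return the table at position `index` (0-based) as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.tables[index])
    return buffer.getvalue()


# Invented sample page, standing in for HTML you would download yourself.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

For a live page you would first download the HTML (for example with `urllib.request.urlopen`) and pass the decoded text to `table_to_csv`. Like the Sheets functions, this only sees what the server returns to an anonymous request, so it won't help with pages behind a login or content rendered by JavaScript.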
3 points
Slicemetrics's avatar
2 years ago

My fav is Instant Data Scraper. It's a Chrome extension that can automatically parse tabular data on a page and export it as CSV. Link: https://chrome.google.com/webstore/detail/instant-data-scraper/ofaokhiedipichpaobibbnahnkdoiiah

3 points
Sergioorlz's avatar
@Sergioorlz (replying to @Slicemetrics )
almost 2 years ago

I don't know you, you don't know me, but I love ya~
You saved me hours and hours of work and tears.
Thanks a lot <3

1 point
peterahn's avatar
2 years ago

I would give Clay.run a shot! I've enjoyed working with their team and am happy to provide an intro.

2 points
maguay's avatar
@maguay (replying to @peterahn )
2 years ago

Interesting, I didn't realize Clay had a data extraction tool! Can you have it watch for text changes on any site? If so, that would essentially let you turn any site into an API for automation...
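
For what it's worth, the "watch for text changes" part can be sketched in a few lines of Python, independent of any particular tool. The idea is to store a hash of the text you care about and compare it on the next visit. This is only an illustration under my own assumptions: `fingerprint` and `ChangeWatcher` are hypothetical names, the state lives in memory, and you would still need something to fetch the page text on a schedule.

```python
import hashlib


def fingerprint(text):
    """Hash the text after collapsing whitespace, so reflowed markup doesn't count as a change."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChangeWatcher:
    """Remembers the last fingerprint seen for each URL (in memory only)."""

    def __init__(self):
        self._seen = {}  # URL -> fingerprint from the previous check

    def check(self, url, text):
        """Return True if `text` differs from the previous check of `url`.

        The first check of a URL returns False because there is nothing to compare against.
        """
        new = fingerprint(text)
        old = self._seen.get(url)
        self._seen[url] = new
        return old is not None and old != new


watcher = ChangeWatcher()
print(watcher.check("https://example.com/pricing", "Price: 10"))
print(watcher.check("https://example.com/pricing", "Price: 12"))
```

When `check` returns True, that's the point where you would trigger the automation, such as appending a row to a spreadsheet or calling a webhook.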

1 point
peterahn's avatar
@peterahn (replying to @maguay )
2 years ago

Yes, they have a Chrome extension that lets you easily scrape data from any website and import it into Clay bases. I haven't seen a way for them to watch for text changes on any site, but perhaps we should get their CEO, Kareem, in here to let us know if that's on the roadmap :)

1 point
maguay's avatar
@maguay (replying to @peterahn )
2 years ago

Ahhh gotcha, that'd work for one-time processes, but wouldn't work for automatically parsing sites and checking back to find changes over time.

You should invite them!

1 point
AnujAdhiya's avatar
2 years ago

Haven't done scraping for a bit but used Octoparse in the past and it worked like a champ.

1 point
navpar's avatar
almost 2 years ago

Some ideas are listed here: https://buildastack.com/product-category/personal-productivity/web-scrapper/ (Disclaimer: Build a Stack is my website)

Personally, I have used Listly and found it does the job well.

1 point