Frequently Asked Questions
Here are answers to some commonly asked questions.
Common questions
If you need any help, just get in touch.
- Dave approaches us through the contact form on our website. He would like to crawl a competitor's eCommerce store, which has roughly 20,000 products, and needs the data updated each fortnight for competitive research. The following information is to be recorded for each product:
- Category
- Name
- URL
- Price
- We clarify a few of Dave's queries.
- We send Dave a quote, which he accepts.
- Dave signs up to our billing management system. This system is used for billing, support and secure data access. He pays the deposit.
- We start work on Dave's project, and finish within three days.
- Dave receives an email with a secure link that allows him to download the data in CSV format.
- We set up his custom software solution to run on our servers each fortnight, delivering him a new email link each time.
Most of our scrapers are written in Go (golang), our preferred language for web scraping. It is a compiled language, which typically gives a significant performance benefit over scraping tools built on interpreted languages (such as Python's BeautifulSoup). We use existing packages as well as our own to build an efficient scraper that fits your requirements.
Once we've unit tested the Go code, we deploy cloud server instances to run the scraping work. The number of instances we deploy depends on the amount of work required (quantity of data to be collected, complexity of the websites to be crawled, etc.). We often rotate through proxies, and use concurrency (goroutines) to control the rate at which we process data.
The scraped data is then stored in a database, usually PostgreSQL or MySQL.
Once the crawl is complete, the data is analyzed and processed into the format the customer requires. Often this is a simple CSV file, which may be shared via FTP, AWS S3 or Google Cloud, or uploaded directly to the client's server via rsync, scp or similar.
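To give a rough idea of the CSV step, the sketch below writes product records with Go's standard `encoding/csv` package. The `Product` type, the column names, and storing the price as a whole number of cents are assumptions made for illustration. The fields simply mirror the category, name, URL and price from the example above, and a real export would read its rows from the database.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// Product is a hypothetical record with the four fields from the example.
// Storing the price in cents (assumed non-negative) avoids float rounding.
type Product struct {
	Category   string
	Name       string
	URL        string
	PriceCents int64
}

// writeCSV writes a header row followed by one row per product.
// encoding/csv quotes fields containing commas, quotes or newlines.
func writeCSV(w io.Writer, products []Product) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"category", "name", "url", "price"}); err != nil {
		return err
	}
	for _, p := range products {
		price := fmt.Sprintf("%d.%02d", p.PriceCents/100, p.PriceCents%100)
		if err := cw.Write([]string{p.Category, p.Name, p.URL, price}); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error() // reports any error hit while writing or flushing
}

// toCSV is a convenience wrapper that returns the CSV as a string.
func toCSV(products []Product) (string, error) {
	var sb strings.Builder
	if err := writeCSV(&sb, products); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	products := []Product{
		{Category: "Shoes", Name: "Trail Runner, Blue", URL: "https://example.com/p/1", PriceCents: 12999},
		{Category: "Hats", Name: "Sun Hat", URL: "https://example.com/p/2", PriceCents: 1950},
	}
	out, err := toCSV(products)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because `writeCSV` accepts any `io.Writer`, the same function can write to a local file or to a buffer that is then uploaded to wherever the customer wants the data delivered.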