SocSciBot

Basic information

How to use this tool for use/reuse assessment

SocSciBot is a free Windows application that crawls a website specified by the practitioner and extracts its hyperlinks. It reports results in several forms, such as a list of page and link counts, a list of all external links, and lists of interlinking between directories, domains, and files. Practitioners can review the output as a text file, as an Excel spreadsheet, or as a network visualization. The tool can help practitioners identify and quantify the hyperlinks contained on websites, including links to digital objects.
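To make the idea of link extraction concrete, here is a minimal Python sketch of the general technique: parse one page, resolve each link against the page address, and separate internal links from external ones. It is an illustration only and does not reproduce SocSciBot's code or output formats. The sample HTML, the example.edu and example.org addresses, and the names `LinkExtractor` and `summarize_links` are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def summarize_links(html, page_url):
    """Return (internal_links, external_links, external_domain_counts)."""
    parser = LinkExtractor()
    parser.feed(html)
    site = urlparse(page_url).netloc
    internal, external = [], []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        (internal if parts.netloc == site else external).append(absolute)
    domains = Counter(urlparse(url).netloc for url in external)
    return internal, external, domains


# Hypothetical page content, used instead of a live crawl.
SAMPLE_HTML = """
<html><body>
<a href="/collections/maps.html">Maps</a>
<a href="https://digital.example.org/objects/123">Object 123</a>
<a href="https://digital.example.org/objects/456">Object 456</a>
<a href="mailto:curator@example.edu">Contact</a>
</body></html>
"""

internal, external, domains = summarize_links(
    SAMPLE_HTML, "https://library.example.edu/index.html"
)
print(len(internal), "internal;", len(external), "external")
for domain, count in domains.items():
    print(domain, count)
```

A per-domain count of external links like this one is roughly the kind of figure a practitioner would look for when checking whether a site links to digital objects hosted elsewhere.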

Ethical guidelines

Practitioners should follow the practices laid out in the “Ethical considerations and guidelines for the assessment of use and reuse of digital content.” The Guidelines are meant both to inform practitioners in their decision-making, and to model for users what they can expect from those who steward digital collections.

Additional guidelines for responsible practice

It is the responsibility of the practitioner not to overload target web servers by crawling them too frequently. SocSciBot asks practitioners not to crawl the websites of organizations that may be unable to afford the additional bandwidth, taking particular account of the limited bandwidth and funding in poorer nations. SocSciBot also requests that practitioners read the article “Web crawling ethics revisited: Cost, privacy and denial of service” before using the product, in order to better understand the ethical issues raised by crawling.
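As a general illustration of crawling without overloading a server, the following Python sketch reads a site's robots.txt rules, keeps only the URLs a crawler is permitted to fetch, and looks up the delay the site requests between requests. It is not a description of how SocSciBot schedules its own requests. The robots.txt content, the user agent name `example-bot`, the URLs, and the function `crawl_policy` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this
# file from the root of the target site before requesting any page.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_policy(robots_txt, user_agent, urls, default_delay=1.0):
    """Return (allowed_urls, delay_in_seconds) for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        # The site states no preference, so fall back to our own pause.
        delay = default_delay
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return allowed, float(delay)


candidate_urls = [
    "https://library.example.edu/collections/maps.html",
    "https://library.example.edu/private/staff.html",
]
allowed, delay = crawl_policy(ROBOTS_TXT, "example-bot", candidate_urls)
print("Allowed:", allowed)
print("Seconds to wait between requests:", delay)
```

A crawler following this approach would pause for `delay` seconds (for example with `time.sleep(delay)`) between requests to the same site.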

Practitioners must enter their email address into the program when requested and may receive emails with complaints about the crawling they carry out with the tool. SocSciBot emails the webmasters of each site that a practitioner crawls to make them aware of the activity, and the organizations may contact the practitioner and request that they cease crawling. SocSciBot asks that practitioners honor any such requests.

Strengths

  • SocSciBot is designed to extract hyperlinks from specific web pages. Practitioners benefit from using the tool when they want to collect URLs, including digital object URLs, from specific web pages. Practitioners who are interested in searching more broadly for instances of linked digital objects, or who are unsure which specific web pages to crawl, should use Webometric Analyst instead.

  • Like Webometric Analyst, this application is free.

Weaknesses

  • SocSciBot can be time-intensive. It is capable of crawling sites with up to 15,000 pages but has speed restrictions. SocSciBot documentation recommends that practitioners use Webometric Analyst if more than 10 websites need to be queried. The documentation also suggests that practitioners start with Webometric Analyst and then use SocSciBot to get granular hyperlink data on a specified set of websites.

  • There are documented issues when crawling certain types of web pages, including pages whose links are embedded in JavaScript, Java, or Flash, and pages that use non-ASCII URLs.

  • SocSciBot developers do not provide technical support, and existing documentation may not be updated regularly.

Used for these methods

Alternative tools
