SocSciBot is a free Windows application that crawls a website specified by a practitioner and extracts the hyperlinks it contains. The program reports its results in several forms, including page and link counts, a list of all external links, and summaries of interlinking at the directory, domain, and file levels. Practitioners can review the output as a text file, as an Excel spreadsheet, or as a network visualization. The tool can help practitioners identify and quantify the variety of hyperlinks on a website, including links to digital objects.
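SocSciBot's own code is not described here, but the kind of per-page processing it performs can be sketched. The minimal Python example below uses only the standard library. It extracts the links from one HTML page, sorts them into internal and external links, and flags links to digital objects by file extension. The extension list and the `summarize_links` function are hypothetical illustrations. Unlike SocSciBot, the sketch does not crawl a site or produce interlinking reports.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of file extensions treated as "digital objects" in this sketch.
DIGITAL_OBJECT_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".mp3", ".mp4", ".zip")


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def summarize_links(page_url: str, html: str) -> Counter:
    """Counts internal, external, and digital-object links on a single page."""
    extractor = LinkExtractor()
    extractor.feed(html)
    site = urlparse(page_url).netloc.lower()
    counts: Counter = Counter()
    for href in extractor.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page URL
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar non-web links
        counts["internal" if parsed.netloc.lower() == site else "external"] += 1
        if parsed.path.lower().endswith(DIGITAL_OBJECT_EXTENSIONS):
            counts["digital_object"] += 1  # counted in addition to internal/external
    return counts


sample_html = (
    '<a href="/about">About</a>'
    '<a href="https://example.org/partner">Partner</a>'
    '<a href="/files/report.pdf">Annual report</a>'
    '<a href="mailto:archivist@library.example.edu">Contact</a>'
)
print(summarize_links("https://library.example.edu/index.html", sample_html))
```

Here the two relative links count as internal, the partner site counts as external, the PDF is also tallied as a digital object, and the mailto link is ignored. A real crawler would repeat this for every page it visits on the site.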
Practitioners should follow the practices laid out in the “Ethical considerations and guidelines for the assessment of use and reuse of digital content.” The Guidelines are meant both to inform practitioners' decision-making and to model for users what they can expect from those who steward digital collections.
It is the practitioner's responsibility not to overload the target web servers by crawling them too frequently. SocSciBot asks practitioners not to crawl the websites of organizations that may not be able to afford the additional bandwidth, paying particular attention to the limited bandwidth and funding available in poorer nations. SocSciBot also asks practitioners to read the article “Web crawling ethics revisited: Cost, privacy and denial of service” before using the tool, to better understand the ethical issues involved in crawling.
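The same obligation applies to practitioners who write their own crawler instead of, or alongside, SocSciBot. The minimal Python sketch below shows two common courtesy measures: checking a site's robots.txt rules before fetching a page, and enforcing a minimum pause between requests. It is an illustration under stated assumptions and is not how SocSciBot works internally. The `PoliteFetchPolicy` class, the user-agent string, and the 5-second default delay are all hypothetical choices.

```python
import time
from typing import Optional
from urllib.robotparser import RobotFileParser


class PoliteFetchPolicy:
    """Hypothetical helper: decides whether a URL may be fetched and how long to wait first."""

    def __init__(self, robots_lines: list[str], user_agent: str, min_delay: float = 5.0) -> None:
        # Parse robots.txt content that has already been downloaded (no network access here).
        self.parser = RobotFileParser()
        self.parser.parse(robots_lines)
        self.user_agent = user_agent
        # Honor a Crawl-delay directive when it asks for a longer pause than our minimum.
        declared = self.parser.crawl_delay(user_agent)
        self.delay = max(min_delay, float(declared)) if declared is not None else min_delay
        self._last_request: Optional[float] = None

    def allowed(self, url: str) -> bool:
        """True if the robots.txt rules permit this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_time(self, now: Optional[float] = None) -> float:
        """Seconds still to wait before the next request to this site."""
        now = time.monotonic() if now is None else now
        if self._last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self._last_request))

    def record_request(self, now: Optional[float] = None) -> None:
        """Remember when the most recent request was sent."""
        self._last_request = time.monotonic() if now is None else now


robots_txt = ["User-agent: *", "Disallow: /private/", "Crawl-delay: 10"]
policy = PoliteFetchPolicy(robots_txt, "ExampleResearchBot/0.1")
print(policy.allowed("https://library.example.edu/private/data.html"))  # False
print(policy.delay)  # 10.0
```

In a real crawler you would call `time.sleep(policy.wait_time())` before each request and `policy.record_request()` immediately after it. Choosing a longer minimum delay is kinder to servers with limited bandwidth.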
Practitioners must enter their email address when the program requests it, and may receive emails at that address about complaints concerning crawls they run with the tool. SocSciBot emails the webmaster of each site a practitioner crawls to make them aware of the activity, and site owners may then contact the practitioner and ask them to stop crawling. SocSciBot asks practitioners to honor any such requests.