Web analytics is the collection and analysis of website usage data, gathered through proprietary or open-source tracking software or through tools that parse web server logs. Organizations use analytics tools to help assess the impact of their resources on the web and to make informed decisions about their content or user interfaces. GLAMR institutions commonly use web analytics to track and report quantitative usage data such as digital object views and downloads. Beyond these simple metrics, analytics tools can be used to glean further insights about interactions with objects that may constitute reuse. Referral reports, described below, can reveal valuable clues about how resources have been used on external sites. The use of features for sharing, exporting, or embedding digital objects can be tracked separately from general pageviews. Finally, some characteristics of users’ technology can be tracked and parsed for hints about the context of their visits.
Web tracking software can provide data on referral traffic (i.e., the external sites or webpages from which users click links that lead to your site). When users follow links to digital resources from external websites, analytics tools record these interactions as “referrals.” Practitioners can use a list of referrers to help determine the context of use or reuse. Strategies range from using URL patterns to segment different kinds of incoming traffic to visiting the referring pages to analyze the links in context.
A web analytics referral report thus shares some characteristics with the link analysis data collection method, which uses search indexes or web crawlers to reveal inlinks. A referrer list has a few distinct advantages; it will:
Practitioners might segment top referring URLs into categories based on matching patterns, as sketched below. Meyer et al. (2009, 113-119) demonstrated a referrer analysis technique to distinguish traffic coming from academic websites and to mark specific evidence of use within teaching and learning platforms or academic library resources. Tanner (2012) also noted that referral analysis “may prove useful for identifying use of a web site for academic research or in a taught course.” Aery (2015) explored measuring scholarly use of digital collections by isolating referring URLs that match strings like “syllabus,” “course,” or the names of popular Learning Management Systems (LMSs).
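As a rough illustration of this pattern-matching approach, the sketch below segments a list of referring URLs into categories using regular expressions. The category names, patterns, and sample URLs are illustrative assumptions for the example, not part of any particular platform’s reports or API.

```typescript
// A rough sketch of pattern-based segmentation of an exported referrer list.
type Category = "LMS / courseware" | "Academic site" | "Social media" | "Other";

const RULES: Array<[Category, RegExp]> = [
  // Hypothetical patterns in the spirit of Aery (2015): syllabi, courses, common LMSs
  ["LMS / courseware", /syllabus|course|canvas|blackboard|moodle|brightspace/i],
  ["Academic site", /\.edu(\/|$)|\.ac\.[a-z]{2}(\/|$)/i],
  ["Social media", /facebook\.com|twitter\.com|pinterest\.com/i],
];

function categorize(referrer: string): Category {
  for (const [category, pattern] of RULES) {
    if (pattern.test(referrer)) return category;
  }
  return "Other";
}

// Tally referrals per category. Recent browsers often send only an origin
// (e.g., "https://example.edu/"), so path-based patterns such as "syllabus"
// will match less often than they once did (see the referrer-policy note below).
function segmentReferrers(referrers: string[]): Map<Category, number> {
  const counts = new Map<Category, number>();
  for (const referrer of referrers) {
    const category = categorize(referrer);
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return counts;
}

console.log(segmentReferrers([
  "https://canvas.example.edu/courses/101",  // hypothetical LMS referrer
  "https://www.pinterest.com/",
  "https://example.org/blog/a-found-photo",
]));
```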
For more advanced referral analysis, one could visit the publicly accessible URLs in the referrer list for closer inspection in order to determine where each case falls on the spectrum of use and reuse. For instance, was the object enhanced, recontextualized, or transformed on the referring page? Does the page show a high level of engagement with the source material? Is the context of use notable, and would reporting on it help demonstrate the value of the digital resources?
Between 2019 and 2021, the Chrome, Firefox, and Safari browsers all implemented new default HTTP referrer policies in the interest of user privacy, setting Referrer-Policy to strict-origin-when-cross-origin. This change constrains referral analysis: a report of referrers will now rarely list the full URL of a referring page, only the referring site’s origin (e.g., https://example.edu). Practitioners may still be able to identify specific referring pages for context analysis by searching or browsing the referring sites, though this can be time-consuming.
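Because cross-origin referrers now usually arrive as a bare origin, one option is to normalize every referrer (older full-URL values and newer origin-only values alike) to a hostname before grouping. The helper and URLs below are a minimal, hypothetical sketch of that normalization step.

```typescript
// Normalize a referrer string (full URL or origin) to a hostname for grouping.
function referrerHost(referrer: string): string {
  try {
    return new URL(referrer).hostname;
  } catch {
    return referrer; // leave empty or malformed referrer strings unchanged
  }
}

console.log(referrerHost("https://lms.example.edu/course/123/syllabus")); // "lms.example.edu"
console.log(referrerHost("https://example.edu"));                         // "example.edu"
```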
Web analytics platforms can help isolate traffic from social media platforms like Facebook, Twitter, and Pinterest. Links that are shared in, and clicked from, social media will often generate referrer data that can be analyzed much like referrals from other websites.
While social media links might signal a distinct type of sharing worth measuring, there are some additional limitations that make social referral analysis challenging. Traffic from mobile or desktop social media apps (as opposed to websites) does not send a referrer, and thus will not be tallied as social media traffic.
Pageview statistics are limited in that they do not indicate how visitors use a site or what elements they click on during their visit. For more granular, targeted tracking of specific interactions, web analytics software packages support Event Tracking. A practitioner can choose any number of interaction events to track, and define a few custom properties (e.g., a category, action, and label) to record along with each event to keep the data organized.
For assessment purposes, practitioners may identify the elements of their web interface whose interactions signal potential reuse rather than simple use, and collect that event data consistently for reports. For instance, one might track clicks on Share buttons, as in the sketch below. An interface could also include trackable buttons or menu links to download files directly, or to export files on demand into another format (e.g., a .zip or .pdf file).
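As a minimal sketch of what such event tracking can look like on the client side, the snippet below assumes a Matomo JavaScript tracker is already loaded on the page; the element IDs and the category, action, and label values are hypothetical placeholders chosen for this example.

```typescript
// Minimal sketch of client-side event tracking, assuming Matomo's tracker
// (the _paq command queue) is already present on the page.
declare const _paq: Array<Array<string | number>>;

// Record a click on a "Share" button as a potential reuse signal.
document.getElementById("share-button")?.addEventListener("click", () => {
  // trackEvent records a category, an action, and an optional name/label.
  _paq.push(["trackEvent", "Reuse", "Share", window.location.pathname]);
});

// Record on-demand exports (e.g., a .pdf version of the item).
document.getElementById("export-pdf")?.addEventListener("click", () => {
  _paq.push(["trackEvent", "Reuse", "Export", "pdf"]);
});
```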
Digital asset management software may include an “embed” feature that enables a user to copy a snippet of HTML from a digital object page, and paste it into any external website of their choice. This sort of embedding likely recontextualizes an object (e.g., an image, document, or video) by placing an interactive version of it somewhere outside of its host platform.
“Embed code” markup is typically an <iframe> element that references a URL for the embedded view of the object, distinct from the main item URL. If possible, practitioners should create a separate web analytics account or property for these URLs to track embed views, as sketched below. This separation helps keep usage data for the main site more accurate, and it also helps distinguish use from reuse during analysis.
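One way to implement this separation, sketched below under the assumption of a Matomo-style tracker, is to have embed-view pages report to a different site ID than the main collection site. The site IDs, tracker URL, and the iframe-detection check are illustrative assumptions, not a prescription for any particular platform.

```typescript
// Minimal sketch: embedded (iframe) views report to a separate site ID so
// embed usage stays apart from traffic to the main site.
declare const _paq: Array<Array<string | number>>;

const MAIN_SITE_ID = "1";   // hypothetical: the main digital collections site
const EMBED_SITE_ID = "2";  // hypothetical: embedded-object views only

// One simple way to detect the embedded view is to check whether the page is
// being rendered inside another site's iframe.
const isEmbedded = window.self !== window.top;

_paq.push(["setTrackerUrl", "https://analytics.example.edu/matomo.php"]);
_paq.push(["setSiteId", isEmbedded ? EMBED_SITE_ID : MAIN_SITE_ID]);
_paq.push(["trackPageView"]);
```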
Web analytics platforms use HTTP referrer information to capture the external website on which an iframe was embedded. As with links sending referrals, this data will most often include only the referring site’s origin; the individual page where embedding occurred will now rarely be captured by analytics software.
Visitors’ IP addresses are captured by all web analytics platforms, with various configurable options to anonymize or mask bytes in the address to prevent collection of personally identifiable information (PII). Some platforms do not anonymize IPs by default, so practitioners may need to take specific actions to ensure that they adhere to ethical guidelines for collecting IP and IP-derived statistics.
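To make concrete what masking bytes of an address means, the sketch below zeroes out the final octet of an IPv4 address, which is similar in spirit to the “anonymize IP” options in common analytics platforms; the exact masking behavior and configuration differ by platform, and this function is purely illustrative.

```typescript
// Illustration only: zero out the final octet of an IPv4 address.
function maskIpv4(ip: string): string {
  const octets = ip.split(".");
  if (octets.length !== 4) return ip; // not IPv4; left unchanged in this sketch
  octets[3] = "0";
  return octets.join(".");
}

console.log(maskIpv4("192.0.2.57")); // "192.0.2.0"
```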
From the IP address, analytics software packages derive values for other fields, with the caveat that the more anonymized the IP addresses are, the less accurate the derived data will be. One common example is users’ geolocation, viewable in tables by country or city or in interactive map visualizations. Some platforms also use the IP address to derive information about Internet Service Providers (ISPs).
Hughes et al. (2015) conducted an analysis of the top ISPs for users of a digital collection as part of an impact assessment. The researchers were able to use this data to distinguish and quantify use of the materials from within academic institutions, as well as within local and national government organizations.
Tools covered in this toolkit include:
Practitioners should follow the practices laid out in the “Ethical considerations and guidelines for the assessment of use and reuse of digital content.” The Guidelines are meant both to inform practitioners in their decision-making, and to model for users what they can expect from those who steward digital collections.
Almost all web analytics applications have the capability to capture and collect personal and identifying information, such as IP addresses, locations, query terms, and users’ search patterns. Practitioners must take proactive steps to protect user privacy. More detailed notes can be found in the “additional guidelines for responsible practice” sections of each Web Analytics tool in this toolkit. Several additional resources include:
Hughes, L. M., Ell, P. S., Knight, G. A., & Dobreva, M. (2015). Assessing and measuring impact of a digital collection in the humanities: An analysis of the SPHERE (Stormont Parliamentary Hansards: Embedded in Research and Education) Project. Digital Scholarship in the Humanities, 30(2), 183-198.
Robinson, M. (2009). Promoting the Visibility of Educational Research through an Institutional Repository. Serials Review, 35(3), 133–137.
Aery, S. (2015, June 26). The Elastic Ruler: Measuring Scholarly Use of Digital Collections. Bitstreams: The Digital Collections Blog.
Baughman, S., Roebuck, G., & Arlitsch, K. (2018). Reporting practices of institutional repositories: analysis of responses from two surveys. Journal of Library Administration, 58(1), 65-80.
Chew, B., Rode, J. A., & Sellen, A. (2010, October). Understanding the everyday use of images on the web. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (pp. 102-111).
Ferrini, A., & Mohr, J. J. (2009). Uses, limitations, and trends in web analytics. In Handbook of research on Web log analysis (pp. 124-142). IGI Global.
Fralinger, L., & Bull, J. (2013). Measuring the international usage of US institutional repositories. OCLC Systems & Services: International digital library perspectives.
Hughes, L. M., Ell, P. S., Knight, G. A., & Dobreva, M. (2015). Assessing and measuring impact of a digital collection in the humanities: An analysis of the SPHERE (Stormont Parliamentary Hansards: Embedded in Research and Education) Project. Digital Scholarship in the Humanities, 30(2), 183-198.
Kelly, E. J. (2014). Assessment of digitized library and archives materials: A literature review. Journal of Web Librarianship, 8(4), 384-403.
Meyer, E., Eccles, K., Thelwall, M., & Madsen, C. (2009). Final report to JISC on the usage and impact study of JISC-funded phase 1 digitisation projects and the toolkit for the impact of digitised scholarly resources (TIDSR). Oxford Internet Institute.
Obrien, P., Arlitsch, K., Sterman, L., Mixter, J., Wheeler, J., & Borda, S. (2016). Undercounting file downloads from institutional repositories. Journal of Library Administration, 56(7), 854-874.
Rieger, O. Y. (2009). Search engine use behavior of students and faculty: User perceptions and implications for future research. First Monday.
Szajewski, M. (2013). Using Google Analytics data to expand discovery and use of digital archival content. Practical Technology for Archives, 1(1), 2.
Tanner, S. (2012). Measuring the impact of digital resources: The balanced value impact model (Great Britain) [Report]. King’s College London.
Thelwall, M. (2019). Online indicators for non-standard academic outputs. Springer Handbook of Science and Technology Indicators, 835-856.
Waugh, L., Hamner, J., Klein, J., & Brannon, S. (2015). Evaluating the University of North Texas’ digital collections and institutional repository: An exploratory assessment of stakeholder perceptions and use. The Journal of Academic Librarianship, 41(6), 744-750.
Contributors to this page include Ali Shiri, Joyce Chapman, and Sean Aery.
Aery, S., Chapman, J., Shiri, A. (2023). Web Analytics. Digital Content Reuse Assessment Framework Toolkit (D-CRAFT); Council on Library & Information Resources. https://reuse.diglib.org/toolkit/web-analytics/