Web Analytics


Web analytics is the collection and analysis of website usage data, facilitated through proprietary or open-source tracking software or tools that parse web server logs. Organizations use analytics tools to help assess the impact of their resources on the web and to make informed decisions about their content or user interfaces. GLAMR institutions commonly use web analytics to track and report quantitative usage data such as digital object views and downloads. In addition to these simple metrics, analytics tools can be used to glean further insights about interactions with objects that may constitute reuse. Referral reports, described below, can reveal valuable clues about how resources have been used in external sites. The use of features for sharing, exporting, or embedding digital objects can be tracked separately from other interactions. Finally, some characteristics of users’ technology can be tracked and parsed for hints about the context of their visits.

Applications for assessing digital content use/reuse


Web tracking software can provide data on referral traffic (i.e., the external sites or webpages from which users click links that lead to your site). When users follow links to digital resources from external websites, analytics tools record these interactions as “referrals.” Practitioners can use a list of referrers to help determine the context of use or reuse. Strategies range from using URL patterns as the basis for segmenting different kinds of incoming traffic to actually visiting the referring pages in order to analyze the links in context.

A web analytics referral report thus shares some characteristics with the link analysis data collection method, which uses search indexes or web crawlers to reveal inlinks. A referrer list has a few distinct advantages; it will:

  • include referring sites that are not publicly accessible
  • indicate how much relative traffic each of the sites has generated
  • only include external sites linking to digital objects if those links have indeed been clicked

Practitioners might segment top referring URLs into categories based on matching patterns. Meyer et al. (2009, 113-119) demonstrated a referrer analysis technique to distinguish traffic coming from academic websites, and further to mark specific evidence of use within teaching and learning platforms or in academic library resources. Tanner (2012) also noted that referral analysis “may prove useful for identifying use of a web site for academic research or in a taught course.” Aery (2015) explored measuring scholarly use of digital collections by isolating any referring URLs that match strings like “syllabus,” “course,” or the names of popular Learning Management Systems (LMSs).
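As a minimal sketch of this pattern-matching approach, the snippet below segments a referrer list into rough categories. The category names and regular expressions are illustrative assumptions, not any project’s actual rules; in practice they would be tuned to the traffic you actually see.

```python
import re

# Hypothetical category patterns, loosely modeled on the string-matching
# strategies described above. Order matters: the first match wins.
CATEGORIES = {
    "teaching": re.compile(r"syllabus|course|canvas|blackboard|moodle", re.I),
    "academic": re.compile(r"\.edu(/|$)|\.ac\.[a-z]{2}(/|$)", re.I),
    "social":   re.compile(r"facebook|twitter|pinterest", re.I),
}

def classify_referrer(url: str) -> str:
    """Return the first category whose pattern matches the referring URL."""
    for name, pattern in CATEGORIES.items():
        if pattern.search(url):
            return name
    return "other"

# Hypothetical referrer list, as it might appear in an analytics export.
referrers = [
    "https://example.edu/courses/hist101/syllabus.html",
    "https://www.pinterest.com/pin/12345/",
    "https://news.example.com/story",
]
counts = {}
for url in referrers:
    category = classify_referrer(url)
    counts[category] = counts.get(category, 0) + 1
```

The tallies in `counts` can then feed a simple report of how much traffic each category of referrer generated.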

For more advanced referral analysis, one could visit the publicly accessible URLs in the referrer list for closer inspection, in order to determine where on the spectrum of use and reuse each case falls. For instance, was the object enhanced, recontextualized, or transformed on the referring page? Does the page show a high level of engagement with the source material? Is the context of use notable, and would reporting on it help demonstrate the value of the digital resources?

Between 2019 and 2021, the Chrome, Firefox, and Safari browsers all implemented new default HTTP referrer policies in the interest of user privacy protection, setting Referrer-Policy to strict-origin-when-cross-origin. This change constrains referral analysis: a report of referrers will now rarely list the full URL of a referring page; it will usually show only the site’s origin (e.g., https://example.edu). Practitioners may still be able to identify specific referring pages for context analysis by searching or browsing the referring sites, though this can be a time-consuming effort.
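The effect of this policy on referrer data can be approximated in a few lines. For a cross-origin, HTTPS-to-HTTPS navigation, only the scheme and host of the referring page survive; the URL below is hypothetical.

```python
from urllib.parse import urlsplit

def referrer_after_policy(referring_url: str) -> str:
    """Approximate strict-origin-when-cross-origin for a cross-origin,
    HTTPS-to-HTTPS navigation: the path and query string are stripped,
    leaving only the origin (scheme + host)."""
    parts = urlsplit(referring_url)
    return f"{parts.scheme}://{parts.netloc}"

# A full referring URL collapses to its origin in the referral report:
truncated = referrer_after_policy("https://example.edu/courses/hist101/syllabus.html")
# → "https://example.edu"
```

This is why referrer lists now point at sites rather than pages: the `/courses/hist101/syllabus.html` portion, which is exactly what pattern-matching on “syllabus” or “course” relies on, never reaches the analytics platform.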

Social Media

Web analytics platforms help isolate traffic from social media platforms like Facebook, Twitter, and Pinterest. Links that are shared in, and clicked from, social media will often generate referrer data that can be analyzed much like referrals from other websites.

While social media links might signal a distinct type of sharing worth measuring, some additional limitations make social referral analysis challenging. Traffic from mobile or desktop social media apps (as opposed to websites) will not send a referrer, and thus will not be tallied as social media traffic.

Event Tracking

Pageview statistics are limited in that they do not indicate how visitors use a site or what elements they click on during their visit. For more granular, targeted tracking of specific interactions, web analytics software packages support Event Tracking. A practitioner can choose any number of interaction events to track, and define a few custom properties (e.g., a category, action, and label) to record along with each event to keep the data organized.

For assessment purposes, practitioners may consider the elements of their web interface whose interactions signal potential reuse as opposed to use, and collect that event data consistently for reports. For instance, one might track clicks on Share buttons. An interface could also include trackable buttons or menu links to download files directly, or export files on-demand into another format (e.g., .zip or .pdf file).
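As an illustration of how such event data might be organized for reporting, this sketch tallies hypothetical exported event rows by their custom properties. The field names and values follow the category/action/label convention described above but are assumptions, not any specific platform’s schema.

```python
from collections import Counter

# Hypothetical event rows, as they might appear in an analytics export.
# Here "Reuse" is an assumed category used to flag reuse-signaling clicks.
events = [
    {"category": "Reuse", "action": "share", "label": "twitter"},
    {"category": "Reuse", "action": "download", "label": "pdf"},
    {"category": "Reuse", "action": "share", "label": "facebook"},
    {"category": "Navigation", "action": "click", "label": "next-page"},
]

# Count only the interactions flagged as potential reuse signals.
reuse_counts = Counter(e["action"] for e in events if e["category"] == "Reuse")
```

Keeping the category values consistent across the interface is what makes it possible to later pull all reuse-related events in a single query, separate from ordinary navigation clicks.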

Embedded Objects

Digital asset management software may include an “embed” feature that enables a user to copy a snippet of HTML from a digital object page, and paste it into any external website of their choice. This sort of embedding likely recontextualizes an object (e.g., an image, document, or video) by placing an interactive version of it somewhere outside of its host platform.

“Embed code” markup is typically an <iframe> that references a URL for the embedded view of the object that is different from the main item URL. If possible, practitioners should create a separate web analytics account or property for these URLs to track embed views. This separation can help keep usage data pertaining to the main site more accurate. It can also help in distinguishing use from reuse during analysis.
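If the embed view lives at its own URL, even simple path matching on analytics or server log data can separate embed views from main item pageviews. The `/item/` and `/embed/` route names below are assumptions for illustration; substitute your platform’s actual paths.

```python
import re

# Hypothetical routes: "/item/<id>" is the main item page and
# "/embed/<id>" is the iframe view referenced by the embed code.
EMBED_PATH = re.compile(r"^/embed/")

def is_embed_view(path: str) -> bool:
    """True if the requested path is an embedded (iframe) view."""
    return bool(EMBED_PATH.match(path))

paths = ["/item/abc123", "/embed/abc123", "/embed/xyz789", "/item/def456"]
embed_views = sum(is_embed_view(p) for p in paths)
item_views = len(paths) - embed_views
```

Routing the two kinds of views into separate analytics properties, as suggested above, accomplishes the same separation without post-hoc filtering.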

Web analytics platforms use HTTP referrer information to capture the external website on which an iframe was embedded. Much like with links sending referrals, this data will most often only include the hostname. The individual page where embedding occurred will now rarely be captured by analytics software.

Internet Service Providers

Visitors’ IP addresses are captured by all web analytics platforms, with various configurable options to anonymize or mask bytes in the address to prevent collection of PII (personally identifiable information). Some platforms do not anonymize IPs by default, so practitioners may need to take specific actions to ensure that they are adhering to ethical guidelines for collecting IP and IP-derived statistics.
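Masking typically zeroes the host portion of each address before it is stored. The sketch below uses Python’s ipaddress module; the /24 and /48 prefix lengths mirror common platform defaults but are configurable assumptions, not a standard.

```python
import ipaddress

def anonymize_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an IP address, keeping only a network prefix.
    Dropping the last IPv4 octet (/24) resembles the 'anonymize IP'
    option offered by several analytics platforms."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)

anonymize_ip("192.0.2.155")  # → "192.0.2.0"
```

Masking at collection time, rather than in later reports, ensures the full address is never retained.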

From the IP address, analytics software packages derive values for other fields, with the caveat that the more anonymized the IP addresses are, the less accurate the derived data will be. One common example is users’ geolocation, viewable in tables by country or city, or in interactive map visualizations. Some platforms also use the IP address to derive information about Internet Service Providers (ISPs).

Hughes et al. (2015) conducted an analysis of the top ISPs for users of a digital collection as part of an impact assessment. Researchers were able to use this data to distinguish and quantify the use of the materials from within academic institutions, as well as in local and national government organizations.



Ethical guidelines

Practitioners should follow the practices laid out in the “Ethical considerations and guidelines for the assessment of use and reuse of digital content.” The Guidelines are meant both to inform practitioners in their decision-making, and to model for users what they can expect from those who steward digital collections.

Additional guidelines for responsible practice

Almost all web analytics applications have the capability to capture and collect personal and identifying information, such as IP addresses, locations, query terms, and users’ search patterns. Practitioners must take proactive steps to protect user privacy. More detailed notes can be found in the “additional guidelines for responsible practice” sections of each Web Analytics tool in this toolkit.

Strengths

  • While some time is associated with set-up (such as tagging and identifying interactions for event tracking), data collection is an automated process and requires little time from staff compared to many other methods. 

  • Web analytics are helpful for tracking precise quantitative data on actions that happen within a site (e.g., pageviews, downloads, exports, and clicks on individual features of an interface).

  • Tools needed to perform web analytics are usually free and easily integrated into popular digital asset management software. Some analytics packages have tiered service levels, where basic services are free and advanced features are not.

Limitations

  • The ease of running web analytics tools, and the nature of the default configurations of those tools, can lead practitioners to unintentionally collect data in a manner inconsistent with ethical guidelines for protecting user privacy.

  • Web analytics do not allow practitioners to ask users why they are downloading, exporting, or viewing digital objects. The focus is on raw transactional data.

  • Referral reports have several limitations. For a referral to be captured, two things must be true: 1) the person referencing a digital object must also provide a link to it; 2) someone must actually click the link. Many referrals will come from private sites that cannot be visited for further analysis. Modern browsers also no longer send full referrer paths, making it difficult to identify exact pages within sites that link to or embed resources.

  • Considerable information in web analytics reports is derived from IP addresses, including visitor counts, geolocation, and service providers. Counts may be inaccurate due to VPNs, dynamic IP assignment, device switching, or other factors. Taking user privacy protection measures to anonymize or mask IPs will further reduce the accuracy of the data.

Learn how practitioners have used this method

  • Web analytics to evaluate the impact of digital scholarly resources 
    Google Analytics and Apache web server logs were used to evaluate the use and impact of ‘The Stormont Papers’, a digital collection of the Hansards of the Stormont Northern Irish Parliament from 1921 to 1972. Researchers used several methods from the Toolkit for the Impact of Digitised Scholarly Resources (TIDSR) to assess the context of use, including analysis of referrer sites and Internet Service Providers.

Hughes, L. M., Ell, P. S., Knight, G. A., & Dobreva, M. (2015). Assessing and measuring impact of a digital collection in the humanities: An analysis of the SPHERE (Stormont Parliamentary Hansards: Embedded in Research and Education) Project. Digital Scholarship in the Humanities, 30(2), 183-198.

Additional resources

Aery, S. (2015, June 26). The Elastic Ruler: Measuring Scholarly Use of Digital Collections. Bitstreams: The Digital Collections Blog.

Baughman, S., Roebuck, G., & Arlitsch, K. (2018). Reporting practices of institutional repositories: analysis of responses from two surveys. Journal of Library Administration, 58(1), 65-80.

Chew, B., Rode, J. A., & Sellen, A. (2010, October). Understanding the everyday use of images on the web. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (pp. 102-111).

Ferrini, A., & Mohr, J. J. (2009). Uses, limitations, and trends in web analytics. In Handbook of research on Web log analysis (pp. 124-142). IGI Global.

Fralinger, L., & Bull, J. (2013). Measuring the international usage of US institutional repositories. OCLC Systems & Services: International digital library perspectives.

Hughes, L. M., Ell, P. S., Knight, G. A., & Dobreva, M. (2015). Assessing and measuring impact of a digital collection in the humanities: An analysis of the SPHERE (Stormont Parliamentary Hansards: Embedded in Research and Education) Project. Digital Scholarship in the Humanities, 30(2), 183-198.

Kelly, E. J. (2014). Assessment of digitized library and archives materials: A literature review. Journal of Web Librarianship, 8(4), 384-403.

Meyer, E., Eccles, K., Thelwall, M., & Madsen, C. (2009). Final report to JISC on the usage and impact study of JISC-funded phase 1 digitisation projects and the toolkit for the impact of digitised scholarly resources (TIDSR). Oxford Internet Institute. 

Obrien, P., Arlitsch, K., Sterman, L., Mixter, J., Wheeler, J., & Borda, S. (2016). Undercounting file downloads from institutional repositories. Journal of Library Administration, 56(7), 854-874.

Rieger, O. Y. (2009). Search engine use behavior of students and faculty: User perceptions and implications for future research. First Monday.

Szajewski, M. (2013). Using Google Analytics data to expand discovery and use of digital archival content. Practical Technology for Archives, 1(1), 2.

Tanner, S. (2012). Measuring the impact of digital resources: The balanced value impact model (Great Britain) [Report]. King’s College London. 

Thelwall, M. (2019). Online indicators for non-standard academic outputs. Springer Handbook of Science and Technology Indicators, 835-856.

Waugh, L., Hamner, J., Klein, J., & Brannon, S. (2015). Evaluating the University of North Texas’ digital collections and institutional repository: An exploratory assessment of stakeholder perceptions and use. The Journal of Academic Librarianship, 41(6), 744-750.


Contributors to this page include Ali Shiri, Joyce Chapman, and Sean Aery.

Cite this page

Aery, S., Chapman, J., Shiri, A. (2023). Web Analytics. Digital Content Reuse Assessment Framework Toolkit (D-CRAFT); Council on Library & Information Resources. https://reuse.diglib.org/toolkit/web-analytics/
