Linking evolved: The future of online research

When information spread across a variety of sources is linked together, there are advantages for researchers and publishers alike. Amy Brand and Kristen Fisher explore recent shifts in linking that promise even more benefits.

From its origins in the late 1980s, the essence of the World Wide Web has been its ability to link together seemingly disparate sources of information in intuitive ways. Since then, linking has spread to other Internet-based publishing technologies. The result, of course, has been a profound change, not only in the information and publishing industries, but also in the process of doing research itself. This article looks forward to some of the major shifts in linking that promise new benefits for researchers - as both producers and consumers of knowledge - and important new opportunities for publishers.

The key shift is from direct, static, provider-driven linking to a more intelligent, comprehensive, dynamic, user-centric, scholarly Internet. In this free-flow paradigm, research moves by associations, rather than by following the highly structured hunt-and-gather methods of vertical search engines. Navigation will flow from idea to idea, rather than from search to search, in a closer approximation of how we actually think and communicate ideas. The more intelligence we can build into our technology, the more flexible that technology will allow our information quests to be.

We will increasingly use standard searching merely to kick off the query process, but use dynamic and associative linking technologies to flow from one relevant source to the next. New search and linking technologies will ultimately combine, to allow us to drill up or down to target content by presenting simultaneous views of link/search options, on the one hand, and a range of approximate query matches and conceptually related resources on the other. In what is arguably a better fit with how the human mind works, online research becomes a process of recognising, as opposed to recalling, precisely the resources we seek or otherwise find relevant when we happen upon them.

Linking: Where we are now

Within the realm of scholarly work, the first logical step in linking was to connect cited references to the electronic versions of those articles, whether in abstract or full text form. ISI's science citation linking methods, and the newer CrossRef initiative, have made it clear how powerful this type of linking could be.

The current system of Internet addressing made widespread linking possible several years ago. Yet cross-publisher linking remained a somewhat onerous and error-prone endeavour for scholarly publishers until CrossRef was formed in early 2000. Before CrossRef, publishers seeking to link to one another had to sign numerous bilateral contracts and implement several publisher-specific linking schemes. Any change in the address of a linked item meant that previously published pointers to that content became obsolete.

With CrossRef, publishers gained both a technology and a business infrastructure for persistent linking with other publishers. On the business side, the publisher (or intermediary) signs one agreement with CrossRef and acquires the right to link to all other participating publishers. CrossRef's current membership includes 184 publishers - meaning that close to 17,000 bilateral agreements would have otherwise been needed to create the same interlinking network.
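The figure of close to 17,000 can be checked by counting the distinct pairs among 184 publishers. The short sketch below assumes that each pair would have needed exactly one bilateral agreement; the function name is ours, chosen for illustration.

```python
from math import comb


def bilateral_agreements(publishers: int) -> int:
    """Count distinct publisher pairs, assuming one agreement per pair."""
    return comb(publishers, 2)


print(bilateral_agreements(184))  # 16836
```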

On the technology side, CrossRef prevents broken links via use of the Digital Object Identifier, or DOI. A DOI is an alphanumeric name (e.g., 'doi:10.1101/gr.10.12.1841') for digital content, such as a book, journal article, chapter, or image. The DOI is paired with the object's electronic address, or URL, in an updateable central directory, and is published in place of the URL in order to allow the content to move without the link itself changing. To date, CrossRef has registered seven million content items and covers roughly 7,000 scholarly and professional journals.
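The indirection described above can be pictured with a toy directory. This is only a sketch of the idea, not CrossRef's actual system: the class, its method names, and both URLs are invented for illustration, and only the DOI itself comes from the text.

```python
class DoiDirectory:
    """Toy stand-in for an updateable central DOI directory."""

    def __init__(self):
        self._urls = {}

    def register(self, doi, url):
        # Registering an existing DOI again simply updates its address.
        self._urls[doi] = url

    def resolve(self, doi):
        return self._urls[doi]


directory = DoiDirectory()
doi = "10.1101/gr.10.12.1841"  # the DOI quoted in the text

# Hypothetical addresses, invented for this example.
directory.register(doi, "https://old-host.example/articles/1841")
published_link = doi  # what a citing article stores instead of the URL

# The content moves; only the directory entry changes.
directory.register(doi, "https://new-host.example/content/10/12/1841")
print(directory.resolve(published_link))
```

The citing article never stored the old address, so nothing already published needs to be corrected when the content moves.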

While most of these DOIs are assigned to journal articles, 300,000 proceedings chapters have been registered, and the deposit of books at both the title and chapter level is underway. Researchers are already using the CrossRef system at an impressive rate of two to three million DOI clicks per month.

One shortcoming of the model described above is that it fails to take the researcher's access privileges - or, for that matter, preferences - into account. Because publishers control DOI assignment, DOIs point to publisher-designated resources. Yet for the user working in an institutional context, it may not always be useful to be directed to the publisher's website, even at the article level. For example, the institution may not subscribe to the e-journal itself, but may still be able to offer the user access to the desired article through an aggregated database or through print holdings.

In order to provide users with a unified display integrating all relevant, available resources in relation to a citation, many research libraries are implementing local link servers, such as SFX. A library's own link server can take the user from a specific database record or citation to a menu of linking options appropriate to the institution's holdings and access privileges. One noteworthy development at CrossRef last year was a procedure for integrating the DOI with library link servers. The central DOI directory is now able to redirect a DOI link activated in an institutional context to that institution's own link menu. Furthermore, the library resolver can pull the bibliographic information needed to create locally appropriate OpenURL-based links directly from the CrossRef metadata database.

As technology evolves, several new trends will shape how linking works and advances, and may even alter research methods. More linked content and more intelligent, dynamic linking tools will create a more intuitive research environment that is essentially user-focused.
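The redirection of a DOI link to an institution's own link menu might be sketched as follows. This is an illustration under stated assumptions rather than the CrossRef or SFX implementation: the function name, the link-server address, the query keys, and the metadata values are all placeholders, and real OpenURL links follow the published OpenURL specification.

```python
from urllib.parse import urlencode


def link_for_user(doi, metadata, link_server=None):
    """Return a publisher-bound DOI link, or a link-server URL when the
    user's institution runs its own resolver."""
    if link_server is None:
        # No institutional context: use the public DOI resolver.
        return "https://doi.org/" + doi
    # Institutional context: pass the identifier and the bibliographic
    # metadata to the library's link server as a query string.
    query = {"id": "doi:" + doi, **metadata}
    return link_server + "?" + urlencode(query)


# Placeholder metadata, not verified against the real record.
metadata = {"genre": "article", "volume": "10", "issue": "12", "spage": "1841"}

print(link_for_user("10.1101/gr.10.12.1841", metadata))
print(link_for_user("10.1101/gr.10.12.1841", metadata,
                    link_server="https://resolver.library.example/menu"))
```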

Towards a comprehensive linking network

A truly comprehensive linking network for online research content is within reach. In order to realise this vision, the DOI directory will eventually include identifiers for all types of material that could be linked in a research publication. This will involve the cooperation of DOI registration agencies, of which there are a growing number to cover different regions of the world, different content types, and different applications. As we saw above, extensive linking within the journal literature is well underway, and books and conference proceedings are rapidly being added. Next to come are patents, technical reports, government documents, learning objects, datasets, and images. In addition, new data formats are being linked. Already, Atypon is dynamically adding links to PDFs. Soon it will be possible to dynamically add links to content from many applications and multimedia.

In addition to the content-type dimension, the linking network will continue to expand along three other key dimensions: (1) backwards in time, as publishers and institutions digitise archival material; (2) to more granular levels, as DOIs get assigned to sub-parts of content; and (3) via multiple resolution of the DOI, whereby the user clicking on a single link will be presented with a menu of publisher-supplied options. These include: the option to go to alternative sites for the same content; to view related resources; to drill up or drill down within the publication; to get more information about the author; and to purchase or acquire rights to the content. (For a preview of this functionality, please visit www.crossref.org/mr.)

Example of multiple resolution, which offers the user a set of linking options from a single citation (available soon from CrossRef).

Dynamic and associative linking

Publishers will continue to add value to their content through a variety of linking enhancements. Among these are forward linking, dynamic linking, and associative or conceptual linking. With cross-publisher forward linking, which is under joint development from CrossRef and technology partner Atypon, the user will have access to complete citation pathways, and therefore be able to link to content that cites - as well as is cited by - the current item. With dynamic linking, links to related resources will be automatically generated on the fly by intelligent 'more like this' algorithms. For example, keyword and phrase linking - currently static links sprinkled throughout a full-text resource that point users to relevant reference works, such as dictionaries, glossaries, or encyclopaedias - will be dynamically generated using concept mapping. A key section of a work may warrant a link in the margin, offering the user 'more on these concepts' and the option to spin off a query within a range of sources.
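Forward linking amounts to inverting the reference lists that publishers already hold. The sketch below assumes the references are available as a simple mapping from each citing item to the items it cites. The identifiers are hypothetical, and this is not a description of the CrossRef and Atypon service.

```python
from collections import defaultdict


def cited_by_index(references):
    """Invert {citing: [cited, ...]} into {cited: [citing, ...]}."""
    index = defaultdict(list)
    for citing, cited_items in references.items():
        for cited in cited_items:
            index[cited].append(citing)
    return dict(index)


# Hypothetical identifiers, for illustration only.
references = {
    "paper-C": ["paper-A", "paper-B"],
    "paper-B": ["paper-A"],
}

forward = cited_by_index(references)
print(forward["paper-A"])  # ['paper-C', 'paper-B']
```

With both mappings to hand, a reader of paper-A can follow the citation pathway in either direction.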

Example of a possible future link, operating as a pop-up based on a taxonomic hierarchy, that allows users to drill up and down conceptually, and to use this to spin off a more intelligent query in or across new sources (overlaid onto a Blackwell Synergy article for demonstration purposes).

From dynamic linking technologies, the next step is the use of ontologies to provide context for these more intelligent, conceptually-based linking options, allowing the user to drill up or down a hierarchy from within a link to find a way to related information (see image above). Taking Medline as an example, we are currently seeing a shift from controlled domain-specific vocabularies such as MeSH, to meta-thesauri, such as the Unified Medical Language System (UMLS) that encodes semantic relationships among medical concepts. With tools like UMLS, linking systems could offer researchers a set of linking options that reflects the concepts within a textual work, combined with knowledge of how these concepts fit into a more highly structured understanding of the field. 'More like this' could become 'more in this same conceptual space.' From within the link, the user could choose to query a related resource with a parent term from the hierarchical meta-thesaurus, narrow the focus to a child term, or expand it to include a sister term.
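The parent, child, and sister options can be illustrated with a toy hierarchy. The terms below are made up for the example and are not taken from MeSH or UMLS. A real meta-thesaurus encodes far richer relationships than this single child-to-parent table.

```python
# Toy hierarchy stored as child -> parent (illustrative terms only).
PARENT = {
    "cardiology": "medicine",
    "neurology": "medicine",
    "arrhythmia": "cardiology",
    "heart failure": "cardiology",
}


def parent(term):
    """Drill up: the broader term, or None at the top of the hierarchy."""
    return PARENT.get(term)


def children(term):
    """Drill down: the narrower terms directly below this one."""
    return sorted(t for t, p in PARENT.items() if p == term)


def sisters(term):
    """Expand sideways: other terms that share the same parent."""
    p = parent(term)
    if p is None:
        return []
    return [t for t in children(p) if t != term]


print(parent("cardiology"))    # medicine
print(children("cardiology"))  # ['arrhythmia', 'heart failure']
print(sisters("cardiology"))   # ['neurology']
```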

The shift to user-centric linking

Current linking systems are, as we have seen, managed by content providers or libraries. As technologies advance, we predict a shift from provider- and institution-driven to user-driven contextualisation. The linking options presented to a user will begin to reflect information about that unique individual, as collected in a profile of access rights, personal preferences, and an historical log of the user's behaviour. The more information that can be harvested about the user, or that the user chooses to reveal, the more meaningful and rich his or her linking experience will be. Intermediate resolution pages that give users a list of linking options could be bypassed if the user's full text access rights or preferences are known, giving truly seamless access. Or a link pop-up menu could be enhanced to include further search options based on more complex conceptual algorithms that are customised to fit the user's profile.
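One way to picture the bypassing of an intermediate resolution page is a small decision rule over the user's profile. The profile fields, source names, and URLs below are assumptions made for this sketch, which does not describe any existing system.

```python
def choose_link(options, profile):
    """Go straight to the content when the profile settles the choice,
    otherwise fall back to a menu.

    options: list of {"source": ..., "url": ...} dicts for one citation.
    profile: {"entitled_sources": set of source names,
              "preferred_source": optional source name}.
    Returns ("direct", url) or ("menu", [urls]).
    """
    accessible = [o for o in options
                  if o["source"] in profile["entitled_sources"]]
    preferred = profile.get("preferred_source")
    for option in accessible:
        if option["source"] == preferred:
            return ("direct", option["url"])
    if len(accessible) == 1:
        return ("direct", accessible[0]["url"])
    return ("menu", [o["url"] for o in accessible])


# Hypothetical options and profile.
options = [
    {"source": "publisher", "url": "https://publisher.example/article"},
    {"source": "aggregator", "url": "https://aggregator.example/article"},
    {"source": "print", "url": "https://library.example/holdings"},
]
profile = {"entitled_sources": {"aggregator", "print"},
           "preferred_source": "aggregator"}

print(choose_link(options, profile))
```

With no stated preference and two accessible sources, the same function returns a menu limited to those two.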

As the universe of possible linking options expands exponentially, only through understanding the user's goals and effectively filtering the noise can technology truly facilitate research. The problem will lie not in finding content, but in locating and offering the right content for a specific user. As the network of linking options grows, so will the logged information on user preferences and behaviours, giving a picture of how content is used across millions of user sessions. Data coming from this user-centric network will become increasingly important in determining the true value of content sources to users.

Changing how we research

There are times when technology drives human behaviour, and vice versa. In the case of scholarly research using the Internet, there will be some of each. The ever-widening pool of potential links, combined with user-driven technologies to customise and filter that pool, will shape how researchers use the Internet - not only to find content, but also to generate a flow of ideas, new concepts, and ways to communicate with each other.

A growing problem that scholars face is the multitude of tools and resources available as the starting point for a research quest. Having to repeat the query process across multiple interfaces is time-consuming. Many end up relying on one or two tried-and-true sources and hoping for the best.

With a more free-flow concept-to-concept type of linking in place, users will consistently find themselves within relevant sites that they otherwise would not have known about or queried. In this environment, the most richly linked content will be accessed most often.

The evolved linking on the Internet brings into question the roles of publishers, aggregators, and librarians. From the point of view of researchers aiming to disseminate their work, even the publisher and the library can seem like intermediaries.

While they may be there to filter, package, distribute, and archive, they are not the ultimate producers and consumers of ideas in the research information chain - scholars are. As we have seen, publishers can effectively promote their relevance to, and add value for, the research community through the enhanced linking functionality and collaborative services, such as CrossRef, that make the virtual integration of online resources possible.

With access to content increasingly coming through associative, dynamic linking, there is a clear need for internal filtering based on the credibility of the source. Libraries will continue to serve the crucial functions of creating federated search interfaces for their user-base, and determining at the highest level what sources should be included in a pool of linking options. Offering researchers both tools to manage their own profiles and custom linking options will be important steps that librarians can take to further the creation of a true user-centric information network.

These trends toward virtual, or distributed, integration of research content may leave traditional aggregators in the lurch. With publishers collaborating directly with one another, with some exposing not only their metadata but also their proprietary full-text for search and navigation purposes, and with automated tools for the intelligent classification of content, there is less of an obvious need for manual aggregation of subject-based resources going forward.

Domain expertise is nonetheless needed to refine linking algorithms and prime the ontological engines that ultimately provide the researcher with an optimally meaningful yet free-flow linking experience.

Amy Brand is director of business development at CrossRef; Kristen Fisher is director of product marketing at Atypon Systems.