“Open access”, the ability to access biomedical literature free of charge and with few restrictions, is slowly being adopted by publishers worldwide. This means you no longer need privileges at academic libraries or paid journal subscriptions to access biomedical science journals. While not all journals provide free access, the number that do is steadily increasing, and this means anyone can retrieve past and present biomedical literature. But where does one start? There are two well-known, publicly available tools that provide access to multiple literature archives. The one most widely used by the biomedical community is PubMed (https://pubmed.ncbi.nlm.nih.gov/), which is based on the MEDLINE database. PubMed was created in 1996 by the National Center for Biotechnology Information (NCBI), US National Library of Medicine (NLM) at the National Institutes of Health (NIH). This database contains more than 32 million citations from peer-reviewed life science journals and online books, sourced across several NLM resources. Since 2008, the NIH Public Access Policy has required that manuscripts from NIH-funded research be made freely available to the public within 12 months of publication. These manuscripts are deposited in PubMed Central (PMC), one of the NLM resource archives. The second tool is Google Scholar (https://scholar.google.com/), a more recent tool available since 2004 and based on the University of Michigan archive collection. This database is purported to contain anywhere from 160 to 389 million peer-reviewed and non-peer-reviewed articles, books and documents, making it the world’s largest academic search engine. Google does not disclose the size of the database or the resources used to compile it.
Both engines have similar search capabilities but use slightly different search algorithms. PubMed uses a weighted term frequency algorithm. The algorithm calculates how frequently the search terms appear in the databases and then returns a ranked list based on the weighted frequency. PubMed accepts direct keyword searching and uses the Boolean operators AND, OR and NOT to narrow or broaden search strategies. Keywords are mapped to Medical Subject Headings (MeSH) (https://www.nlm.nih.gov/mesh/meshhome.html), a thesaurus originally created by NLM to organize medical vocabulary for cataloging, indexing and searching in MEDLINE but now applied across NLM databases. Keyword terms are translated to MeSH headings and combined using Boolean operators to significantly improve search results. The default search result list is organized by “Most Recent” and is easily changed to “Best Match” or “Journal”, among other parameters. Other filters allow sorting and search refinement by language, article type, publication type, publication date and, most importantly, text availability. Access to free full text (via PMC or publisher open access) is noted in the main result output, and full-text links are provided when reviewing specific citations. A list of similar or related articles, based on keywords, is included in the result output, increasing the search yield and minimizing the need for repeat searching. Each indexed manuscript is assigned a unique PubMed Identifier (PMID) and Digital Object Identifier (DOI). PMIDs allow any reference to be found within PubMed and are often searchable in non-academic search engines. DOIs are similarly unique and provide a permanent link to the article’s location on the internet.
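As a rough illustration of how Boolean operators and MeSH field tags combine into a single PubMed search, the sketch below assembles a query string and the matching request URL for NCBI's E-utilities `esearch` service. The helper names (`build_pubmed_query`, `build_esearch_url`) and the example search terms are hypothetical choices for this illustration, not part of PubMed itself, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(all_of, any_of=(), none_of=(), free_full_text=False):
    """Combine search terms with Boolean operators into one PubMed query.

    Hypothetical helper: all_of terms are joined with AND, any_of terms
    with OR, and none_of terms are excluded with NOT.
    """
    query = " AND ".join(all_of)
    if any_of:
        query = f"({query}) AND ({' OR '.join(any_of)})"
    if none_of:
        query = f"({query}) NOT ({' OR '.join(none_of)})"
    if free_full_text:
        # PubMed's subset filter for articles with free full text.
        query = f"({query}) AND free full text[sb]"
    return query


def build_esearch_url(query, max_results=20):
    """Return an esearch URL for the query (no network request is made)."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example terms chosen only for illustration.
query = build_pubmed_query(
    all_of=["melanoma[MeSH Terms]", "immunohistochemistry[MeSH Terms]"],
    none_of=["review[Publication Type]"],
    free_full_text=True,
)
print(query)
print(build_esearch_url(query))
```

The printed query string can also be pasted directly into the PubMed search box; the field tags in square brackets tell PubMed to match a MeSH heading or publication type rather than free text.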
In contrast, Google Scholar uses a combined weighted algorithm to rank results. Author(s), full-text keyword(s) and how often the manuscript is cited are all used to weight search results and rank order the output. The algorithm places more weight on citation counts for both keyword and author searches, so the top results tend to be the most cited articles. While many consider this advantageous, Google Scholar has been criticized for this approach as it artificially inflates citation counts. Regardless, users can search for either digital or physical versions of the articles. Links to both published (commercial) and open access (repository) versions are provided, but users cannot filter between “toll” (pay to access) and “open” access for full-text articles. Default search output is sorted by relevance but can be filtered by date or date range. Similar to PubMed, Google Scholar includes related articles in the result list, but Google Scholar’s most powerful feature is the ability to “include citations” in the search output. Since the algorithm weights citation counts, other manuscripts that have cited the article being viewed are linked in the search output, greatly increasing the efficiency of literature review. Unfortunately, Google Scholar does not display DOIs, so identifying the original work can be more difficult. This makes the database more susceptible to the inclusion of non-peer-reviewed and predatory journal articles.
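To make the idea of citation-weighted ranking concrete, here is a toy sketch. Google does not publish its ranking algorithm, so the scoring formula, the `citation_weight` parameter and the sample articles below are all invented for illustration; the only point is to show how a heavy citation weight can push a highly cited article above one that matches the keywords more closely.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    citations: int


def keyword_score(article, keywords):
    """Fraction of query keywords found in the title (case-insensitive)."""
    words = set(article.title.lower().split())
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits / len(keywords)


def combined_score(article, keywords, citation_weight=0.7):
    """Blend keyword match with a log-scaled citation count (toy formula)."""
    text = keyword_score(article, keywords)
    cites = math.log1p(article.citations)
    return (1 - citation_weight) * text + citation_weight * cites


def rank(articles, keywords, citation_weight=0.7):
    """Keep articles matching at least one keyword, best score first."""
    matching = [a for a in articles if keyword_score(a, keywords) > 0]
    return sorted(
        matching,
        key=lambda a: combined_score(a, keywords, citation_weight),
        reverse=True,
    )


# Hypothetical records, not real publications.
articles = [
    Article("Tumor marker expression", citations=5),
    Article("Tumor biology", citations=500),
    Article("Unrelated topic", citations=10000),
]

for article in rank(articles, ["tumor", "marker"]):
    print(article.title, article.citations)
```

With the default weight, the highly cited but less specific article is listed first; setting `citation_weight=0` ranks purely on keyword match and reverses the order.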
Now that you know how to find information in a manuscript and have some tools to find articles, the next step is to begin searching. However, before you can do that, you need to have a good idea of what you are trying to find. The approach used to sort through hundreds of articles depends very much on what you are looking for. For example, looking for a detailed protocol requires more comprehensive, deeper scanning and reading of an article than finding out whether a marker is expressed in a particular tumor.
In the third post in this series, “Cruising the Biomedical Scientific Literature: Screening Articles”, learn how to narrow down searches and use the article structure described previously to select articles in PubMed or Google Scholar in order to build a reference dataset.
Written by: Luis Chiriboga, PhD, HT(ASCP), QIHC