Blog Entry

Digital Data #2: Digital Search Challenges


Digital research is essential for the modern pharmaceutical or healthcare facility, and it depends on the ability to carry out digital searches. These can range from using specialist software, such as regularly updated pipeline databases, to searching specialty journals or querying different online databases to obtain data. The search and research process relies on the ability to use online resources to find, assess, and leverage information to inform a decision. Digital online research methods are related to existing research methodologies but re-invent and re-imagine them in the light of new technologies and conditions associated with the Internet, making use of the advantages that digital data presents.

A typical digital search cycle is:


Figure 1: A digital data search cycle

This article looks at the challenges involved with data searching and data capture, and provides links to several databases of interest to those working in the life sciences field.

Digital searches: The challenges

Undertaking digital research is not without challenges. Challenges include (1):

  • Sampling bias: It can sometimes be hard to assess the representativeness of online samples and, therefore, the generalizability of results.
  • Validity of data: Accurate data are vital for research. The potential anonymity of the online environment raises concerns about invalid data owing to, for example, multiple enrolment or deliberate falsification.
  • Poor quality research: Sometimes research is of poor quality or suspect in nature. The type of journal is important (with some journals being more reputable than others), as well as the number of times a piece of research has been cited.
  • Selecting too narrow or too broad a time frame when assessing published research: The time allocated to the search needs attention, because exploring and selecting data are early steps in the research method, and research conducted as part of academic assessment often has a narrow timeframe.

It is also important to differentiate between the different sources of literature (2):

  • Primary literature: Primary sources are the authentic publication of an expert's new evidence, conclusions and proposals (case reports, clinical trials, and so forth) and are usually published in a peer-reviewed journal. Preliminary reports, congress papers and preprints also constitute primary literature.
  • Secondary literature: Secondary sources are systematic review articles or meta-analyses in which material derived from primary source literature is interpreted and evaluated.
  • Tertiary literature: Tertiary literature consists of collections that compile information from primary or secondary literature (such as reference books).

Such challenges can lead to inaccurate hypotheses being set or research going in an incorrect direction.

Some good practices for undertaking digital searches include:

  • Selecting appropriate key words.
  • Varying key words. Synonyms and alternate terms should be considered to elicit further information, such as barbiturates in place of thiopentone.
  • Spellings should also be taken into account, e.g., anesthesia in place of anaesthesia (American and British).
  • Being specific. For example, searching for ‘clinical trials for drug x’ will reveal lots of data but not necessarily the higher-level data required, whereas searching for ‘randomized controlled clinical trial for drug x’ may prove more rewarding.
  • Reviewing multiple databases.
  • Phrase search. This can be useful in that it returns only pages containing the words typed in the phrase, in that exact order and with no words in between them.
  • Boolean operators, such as AND, OR and NOT, can refine searches. Combining two words using ‘AND’ will fetch articles that mention both words, narrowing the search. Using ‘OR’ will widen the search and fetch articles that mention either subject. Using ‘NOT’ to combine words will fetch articles containing the first word but not the second, thus narrowing the search.
  • Many search engines have filters, which can be used to refine the search by, for example, article type, text availability, language, age, sex and journal category.
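
The keyword practices above (spelling variants, alternate terms, phrase search and Boolean operators) can be combined in a small helper. The following is a minimal Python sketch only: the function names `quote_phrase`, `any_of` and `build_query` are hypothetical, and the exact query syntax accepted differs between databases, so the resulting string should be adapted to the search engine in use.

```python
from typing import List, Optional


def quote_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as an exact phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def any_of(terms: List[str]) -> str:
    """Join synonyms or spelling variants with OR, which widens the search."""
    quoted = [quote_phrase(t) for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts: List[List[str]], exclude: Optional[List[str]] = None) -> str:
    """Combine concept groups with AND (narrows), then remove unwanted terms with NOT."""
    query = " AND ".join(any_of(group) for group in concepts)
    for term in exclude or []:
        query += f" NOT {quote_phrase(term)}"
    return query


query = build_query(
    concepts=[
        ["anaesthesia", "anesthesia"],      # British and American spellings
        ["thiopentone", "barbiturates"],    # alternate terms
        ["randomized controlled trial"],    # specific phrase, searched exactly
    ],
    exclude=["veterinary"],                 # illustrative exclusion term
)
print(query)
```

Grouping each concept's variants with OR before joining the groups with AND keeps the widening and narrowing steps separate, which makes it easier to see which term is responsible when a search returns too many or too few results.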

It is important to cite the collected information carefully, including references to the data itself. The elements of a data/statistics citation include:

  • Author(s)/Creator
  • Title
  • Year of publication: The date when the statistics/dataset was published or released (rather than the collection or coverage date)
  • Publisher: the data center/repository
  • Any applicable identifier (including edition or version)
  • Availability and access: URL or other location information for the data/statistics
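
The citation elements listed above can be captured in a simple record so that every dataset is cited with the same fields. This is a minimal sketch: the `DataCitation` class, its field names and the output layout are illustrative assumptions rather than a prescribed citation style, and the example values are placeholders, not a real dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    creators: str                     # Author(s)/Creator
    title: str                        # Title of the dataset or statistics
    year: int                         # Year of publication or release
    publisher: str                    # Data center or repository
    url: str                          # Availability and access
    identifier: Optional[str] = None  # Edition, version or other identifier

    def format(self) -> str:
        """Return the elements as a single citation string."""
        parts = [f"{self.creators} ({self.year}). {self.title}"]
        if self.identifier:
            parts.append(self.identifier)
        parts.append(self.publisher)
        parts.append(f"Available at: {self.url}")
        return ". ".join(parts) + "."


# Placeholder values for illustration only.
citation = DataCitation(
    creators="Example Research Group",
    title="Example stability dataset",
    year=2021,
    publisher="Example Data Repository",
    url="https://example.org/dataset/123",
    identifier="Version 2",
)
print(citation.format())
```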

Further examples of best practice are provided by Creswell (3):

  1. Identify keywords and use them to search articles from library and Internet resources as described above
  2. Search several databases to find articles related to your topic
  3. Use a thesaurus to identify terms to locate your articles
  4. Find an article that is similar to your topic; then look at the terms used to describe it, and use them for your search
  5. Use databases that provide full-text articles (free through academic libraries, Internet or for a fee) as much as possible so that you can save time searching for your articles
  6. If you are examining a topic for the first time and unaware of the research on it, start with broad syntheses of the literature, such as overviews, summaries of the literature on your topic or review articles
  7. Start with the most recent issues of the journals and look for studies about your topic and then work backward in time. Follow-up on references at the end of the articles for more sources to examine
  8. Refer to books on a single topic by a single author or group of authors, or books that contain chapters written by different authors
  9. Next look for recent conference papers. Often, conference papers report the latest research developments. Contact authors of pertinent studies. Write or phone them, asking if they know of studies related to your area of interest
  10. The easy access and ability to capture entire articles from the web make it attractive. However, check these articles carefully for authenticity and quality and be cautious about whether they represent systematic research.

Digital literature searches give researchers in pharmaceuticals and healthcare an opportunity to learn more about a given topic and provide insight into how the topic was studied by previous analysts. They help to interpret ideas, detect shortcomings and recognize opportunities. It follows that systematic and well-organized searching may help in designing novel research (4).

Digital research sources

When undertaking digital research, the following general sources are useful for those engaged in pharmaceuticals and healthcare:

Academic OneFile

Academic OneFile is a source of peer-reviewed, full-text articles from the world's leading journals, aimed at academic libraries. This comprehensive resource covers the physical and social sciences, technology, medicine, engineering, the arts, literature, and many other subjects. See:


AccessScience

AccessScience is an authoritative online resource that contains reference material covering all major scientific disciplines. It offers links to primary research material, videos and exclusive animations, plus specially designed curriculum maps for teachers. It also encompasses the McGraw-Hill Encyclopedia of Science & Technology and McGraw-Hill Yearbook of Science & Technology. See:

Google Scholar

Google Scholar applies the power of Google searches to research papers and patents. It enables the user to find research papers for all academic disciplines for free, and it often provides links to full-text PDF files. See:

Google Books

Google Books allows web users to browse an index of thousands of books. Once the science-based book is located, you can look through pages, find online reviews and learn where you can get a hard copy. See:

Microsoft Academic

Microsoft Academic takes a different approach from Google Scholar: for each indexed paper it generates an overview page that makes it easy to explore the article's references and its top citing articles. Some 210 million articles are accessible. See:


Infotopia

Infotopia describes itself as a “Google-alternative safe search engine.” The academic search engine pulls from results that have been curated by librarians, teachers and other educational workers. A unique search feature allows users to select a category and then see a list of internal and external resources pertaining to the topic. See:


WorldWideScience

WorldWideScience, which refers to itself as “The Global Science Gateway,” is operated by the Office of Scientific and Technical Information, a branch of the Office of Science within the U.S. Department of Energy. The site utilizes databases from over 70 countries. When users type a query, it hits databases from all over the world and will display both English and translated results from related journals and academic resources. See:

Lexis Web

Lexis Web is useful for researching legal topics and other law-related inquiries. The results are drawn from legal sites, which can be filtered by criteria such as news, blog, government and commercial. Users can also filter results by jurisdiction, practice area, source and file format. See:


Refseek

With its minimalist design, Refseek has behind it an engine that can pull extensively from over one billion web pages, encyclopedias, journals and books. It is similar to Google in its functionality, except that it focuses more on scientific and academic results, meaning more results will come from .edu or .org sites, as well as online encyclopedias. It also has an option to search documents directly, providing easy access to PDFs of academic papers. See:


BASE

BASE is hosted at Bielefeld University in Germany, which is where its name comes from (Bielefeld Academic Search Engine). It covers approximately 136 million articles. See:


CORE

CORE is an academic search engine dedicated to open access research papers. For each search result, a link to the full-text PDF or full-text web page is provided. See:

… bundles and offers free access to search results from more than 15 U.S. federal agencies. See:

Semantic Scholar

Semantic Scholar aims to provide more relevant and impactful search results using AI-powered algorithms that find hidden connections and links between research topics. See:

Baidu Scholar

Although Baidu Scholar's interface is in Chinese, the search engine’s index contains research papers in English as well as Chinese. See:

Where to search for research papers: Open access databases and search platforms

A number of websites collate research papers and associated materials that can be freely and fully accessed. Some examples are outlined below (in some cases the websites contain a mix of open access full texts and abstracts, with links leading to paywalls).


CORE

CORE is a multidisciplinary aggregator of open access research. It allows users to search more than 66 million open access articles. Most of these link to the full-text article on the original publisher's site.

In addition to a straightforward keyword search, CORE offers advanced search options to filter results by publication type, year, language, journal, repository, and author.

CORE can be accessed here:


ScienceOpen

Operating as a research and publishing network, ScienceOpen offers open access to more than 28 million articles in all areas of science. The Berlin- and Boston-based company was founded in 2013 with the goal to "facilitate open and public communications between academics and to allow ideas to be judged on their merit, regardless of where they come from."

The site can be accessed here:

Directory of Open Access Journals

The Directory of Open Access Journals is a multidisciplinary, community-curated directory that gives researchers access to high-quality, peer-reviewed journals. The site was launched in 2003 with the aim of increasing the visibility of open access scholarly journals.

It is available here:

arXiv e-Print Archive

The arXiv e-Print Archive has been around since 1991 and is a well-known resource in the fields of mathematics and computer science. It is run by Cornell University Library and now offers open access to more than one million e-prints.

The site is available here:

Public Library of Science

Public Library of Science (PLOS) is the major site for open access science. Publishing seven open access journals, the non-profit organization is committed to facilitating openness in academic research. According to the site, "all PLOS content is at the highest possible level of open access, meaning that scientific articles are immediately and freely available to anyone, anywhere."

The link for PLOS is:


OpenDOAR

OpenDOAR, the Directory of Open Access Repositories, is a comprehensive resource for finding open access journals and articles. Using Google Custom Search, OpenDOAR combs through open access repositories around the world and returns relevant research in all disciplines.

OpenDOAR can be found here:

Bielefeld Academic Search Engine

The Bielefeld Academic Search Engine (BASE) is operated by the Bielefeld University Library in Germany, and it offers more than 100 million documents from more than 4,000 sources. Sixty percent of its content is open access, and you can filter your search accordingly.

The link for the academic site is:

Digital Library of the Commons Repository

Run by Indiana University, the Digital Library of the Commons (DLC) Repository is a multidisciplinary journal repository that allows users to check thousands of free and open access articles from around the world. Users can browse by document type, date, author, title, and more, or search for keywords relevant to their topic.

The digital library can be found here:

The reference to ‘Commons’ fits in with the ‘creative commons’ approach. Creative Commons (CC) is an internationally active non-profit organization that provides free licenses for creators to use when making their work available to the public. These licenses help the creator to give permission for others to use the work in advance under certain conditions. See:


Paperity

According to the Paperity website, it is the "first multidisciplinary aggregator of open access journals and papers." Its focus is helping you avoid paywalls while connecting you to authoritative research.

Paperity can be found here:

BioMed Central

BioMed Central provides open access research from more than 290 peer-reviewed journals in the fields of biology, clinical medicine, and health. Users can browse these journals by subject or title, or search all articles for a required keyword.

The link is:


JURN

A multidisciplinary search engine, JURN provides links to various scholarly websites, articles, and journals that are all free access or open access. JURN has indexed almost 5,000 repositories.

JURN’s link is:


Dryad

Dryad is a digital repository of curated, open access scientific research. It is run by a not-for-profit membership organization that aims to "promote a world where research data is openly available, integrated with the scholarly literature, and routinely reused to create knowledge."

It is free to access, but there is a charge for publishing data in Dryad.

The link is:


EThOS

Run by the British Library, EThOS allows you to search over 400,000 doctoral theses in a variety of disciplines. Although some full texts are behind paywalls, users can limit their search to items available for immediate download, either directly through EThOS or through an institution's website.

The link is:


PubMed

PubMed, of the U.S. National Center for Biotechnology Information, is a research platform in the fields of science and medicine. It offers access to "more than 26 million citations for biomedical literature from MEDLINE, life science journals, and online books." While many resources are behind paywalls, you can filter your search to view free full texts only, making this an even more valuable resource. The PDA journal is one of the journals indexed.

The link is:

Semantic Scholar

Semantic Scholar harnesses the power of artificial intelligence to efficiently sort through millions of science-related papers based on your search terms. According to the site, although some articles are behind paywalls, "the data [they] have for those articles is limited," so users can expect to receive mostly full-text results. Another feature is the extensive advanced search options, which allow you to search by cell type and brain region, among other things.

This website can be found here:

There are two open access areas where researchers publish their own works (where permission has been given by the publisher): ResearchGate and Mendeley. For reference, the links for the pages of this book’s author are provided:

Of the above, ResearchGate is probably the most powerful. The site is ostensibly a social networking site for scientists and researchers; however, it is also a major resource in that over 11 million researchers have submitted their work, which totals more than 100 million publications, on the site for anyone to access. It is possible to search by publication, data and author, or you can even ask the researchers questions.

In addition to the above web links, digital libraries frequently use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their metadata to other digital libraries, and search engines like Google Scholar, Yahoo! and Scirus also use OAI-PMH to find these deep web resources. OAI-PMH attempts to address the problem of metadata interoperability by requiring that data providers adhere to a minimal standard at the schema level (5).
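
To make the harvesting idea concrete, the sketch below builds an OAI-PMH `ListRecords` request and reads the titles out of a response. The minimal schema-level standard mentioned above corresponds to unqualified Dublin Core, requested with the metadata prefix `oai_dc`. The repository address, the helper names `list_records_url` and `extract_titles`, and the sample response are assumptions made for illustration; a real repository publishes its own base URL and returns far more fields.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: each real repository publishes its own base URL.
BASE_URL = "https://repository.example.org/oai"

# Namespace used by Dublin Core elements in OAI-PMH responses.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build a ListRecords request; 'from' and 'set' are optional protocol arguments."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Return every Dublin Core title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.iterfind(".//dc:title", NS)]


# Trimmed, made-up response showing only the structure that matters here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example article title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url(BASE_URL, from_date="2021-01-01")
titles = extract_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

Because every compliant repository answers the same request with the same basic schema, one harvester can collect metadata from many digital libraries without repository-specific code.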


There are many other resources in addition to those outlined above, although the most commonly used have been presented. Some are also of more use than others when specific topics are being searched for (6). Furthermore, no one database can search all the pharmaceutical or medical literature. Hence, there is a need to search several different databases.
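
Searching several databases tends to return the same article more than once, so the results usually need to be merged. The following is a minimal sketch that assumes each database export can be reduced to records carrying a DOI. The record layout and the names `normalise_doi` and `merge_results` are hypothetical, and records without a DOI would need a different matching key (for example, a normalised title).

```python
def normalise_doi(doi):
    """Reduce a DOI to a bare, lower-case form so different notations compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def merge_results(*result_sets):
    """Merge result lists from several databases, keeping the first record per DOI."""
    merged = {}
    for results in result_sets:
        for record in results:
            merged.setdefault(normalise_doi(record["doi"]), record)
    return list(merged.values())


# Illustrative exports from two databases; the DOIs are those of references 1 and 4.
database_a = [
    {"doi": "10.4103/0019-5049.190618", "source": "database A"},
]
database_b = [
    {"doi": "https://doi.org/10.4103/0019-5049.190618", "source": "database B"},
    {"doi": "doi: 10.1093/swr/34.2.122", "source": "database B"},
]

unique_records = merge_results(database_a, database_b)
print(len(unique_records), "unique records")
```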

The use of such digital resources is especially important in today’s pharmaceutical and healthcare environment, which is evolving rapidly and presents continual challenges. These include the growth of personalized medicine and artificial intelligence. In addition, clinical trials have become smaller and based around niche drugs and smaller patient populations. Gathering sufficiently useful information requires effective digital search skills.


  1. Alessi, E. J. and Martin, J. I. (2010) Conducting an Internet-based survey: benefits, pitfalls, and lessons learned. Social Work Res. 34(2): 122–128. doi: 10.1093/swr/34.2.122
  2. Cronin, P., Ryan, F. and Coughlan, M. (2008) Undertaking a literature review: a step-by-step approach. Br J Nurs. 17: 38–43
  3. Creswell, J. W. (2014) Research Design: Qualitative, Quantitative and Mixed Methods Approaches. 4th ed. Thousand Oaks, CA: Sage
  4. Grewal, A., Kataria, H. and Dhawan, I. (2016) Literature search for research planning and identification of research problem. Indian J Anaesth. 60(9): 635–639. doi: 10.4103/0019-5049.190618
  5. Fegen, N. (2007) What is the OAI Protocol for Metadata Harvesting. In JISC cetis. Retrieved from (on 9th April 2021)
  6. Gusenbauer, M. and Haddaway, N. R. (2020) Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed and 26 other resources. Research Synthesis Methods. 11(2): 181–217
