
Globalization of Science Results with AI

What is the compelling question or challenge?

A critical challenge in science is to generalize from limited observations and data. How globally significant are the processes we observe and measure? Can Artificial Intelligence (AI) help answer this question?

What do we know now about this Big Idea and what are the key research questions we need to address?

This big idea proposes a hybrid machine-human approach for the systematic analysis of scientific discoveries. This challenge includes several important research issues:

  1. Machine reading – machines must be able to read scientific materials accurately and at scale. With the exception of a few specific and limited use cases, this exceeds the current state of the art. Further, machines must assign a believability score to the extracted knowledge, i.e., an estimate of how much each finding should be trusted. Such a score should reflect the strength of the reported experiments, the reliability of the sources (i.e., publishers, authors), and linguistic hints about the believability of findings, such as hedging language.
  2. Machine assembly – machines must be able to assemble individual findings into consistent knowledge. For example, in cancer research, individual biochemical interactions must be assembled into complete protein signaling pathways. In geoscience, findings within specific spatio-temporal contexts must be assembled into a consistent map. This assembly task is made harder by the fact that detecting overlaps between individual findings is itself non-trivial. For example, in biomedical texts, the same protein often has several different names. In geoscience, location names are often ambiguous.
  3. Machine reasoning – once individual findings are assembled into a knowledge map, machines must be able to infer new knowledge from existing facts. For example, in geoscience, machines should be able to detect whether subduction-related magmatism goes through volumetric flare-ups and lulls over time. Such flare-up events have been documented at limited, more accessible local and regional scales, and are often correlated with abundant mineral resources. Can the existing observations, both the esoteric (those that explain what drives magmatism) and the opportunistic (finding metals that are crucial to our society), be expanded beyond the classic regions of research? Machines could guide us to the next place where rare earth metals or other critical resources can be found. Similarly, in the biomedical domain, machines should generalize from the signaling pathways acquired from individual publications to holistically explain diseases and suggest treatments.
  4. Hybrid human-machine analysis – last but most important, the entire process above must be a hybrid initiative between machines and humans, to account for the inherent limitations of machines. For example, machine reading will remain an imperfect process that requires human quality control. While “human-in-the-loop” approaches have been investigated in machine learning for simpler tasks such as text/image classification, “human-in-the-loop” methods for complex tasks such as assembly and reasoning remain beyond the state of the art. Further, to enable the human/machine interaction, the machine learning algorithms deployed in the three steps above must all be interpretable by humans, i.e., they must explain their decisions, and must be able to receive and incorporate corrective feedback from humans.
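
To make the reading, assembly, and human-review steps above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method described in this entry: the `Finding` record, the hedging cue list, the alias table, the multiplicative scoring rule, and the review threshold are all hypothetical choices made for the example. A real system would presumably learn these from annotated data.

```python
from dataclasses import dataclass

# Hypothetical hedging cues; assumed for illustration only.
HEDGE_CUES = {"may", "might", "could", "possibly", "suggests", "appears"}

# Hypothetical alias table mapping surface names to one canonical name.
ALIASES = {"p53": "TP53", "tp53": "TP53", "tumor protein 53": "TP53"}


@dataclass
class Finding:
    subject: str     # entity name as written in the source text
    sentence: str    # sentence the finding was extracted from
    evidence: float  # assumed strength of the reported experiment, 0..1
    source: float    # assumed reliability of publisher/authors, 0..1


def hedge_penalty(sentence: str) -> float:
    """Return a factor in (0, 1]; each hedging cue found lowers it."""
    tokens = (w.strip(".,;:") for w in sentence.lower().split())
    hits = sum(1 for t in tokens if t in HEDGE_CUES)
    return 1.0 / (1.0 + hits)


def believability(f: Finding) -> float:
    """Combine evidence, source reliability, and hedging (assumed rule)."""
    return f.evidence * f.source * hedge_penalty(f.sentence)


def canonical(name: str) -> str:
    """Resolve a surface name to its canonical form, if one is known."""
    return ALIASES.get(name.lower(), name)


def assemble(findings: list[Finding]) -> dict[str, list[Finding]]:
    """Group findings that refer to the same entity under one key."""
    merged: dict[str, list[Finding]] = {}
    for f in findings:
        merged.setdefault(canonical(f.subject), []).append(f)
    return merged


def needs_review(f: Finding, threshold: float = 0.3) -> bool:
    """Flag low-scoring findings for human quality control."""
    return believability(f) < threshold
```

In this sketch, two findings that name the same protein differently ("p53" and "tumor protein 53") end up under a single key after `assemble`, and a heavily hedged finding from a less reliable source falls below the threshold and is routed to a human reviewer. The multiplicative score is only one simple option; it has the property that a weakness in any one factor lowers the overall score.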


National Science Foundation, 2415 Eisenhower Avenue, Alexandria VA 22314, USA

Tel: (703) 292-5111, FIRS: (800) 877-8339 | TDD: (800) 281-8749