1.6: Knowledge
From Data to New Knowledge
Other Information:
From Data to New Knowledge -- Most of the world's information is now "born digital," and legacy texts, images, sounds, videos,
and films are likewise being digitized around the clock. Although statistical estimates vary, they agree that the amount of
digital data generated annually is many orders of magnitude greater than the total amount of information in all the books
ever written, and the total is expected to continue growing exponentially. In the advanced sciences alone, the proliferation
of ultra-powerful and distributed data-collection instruments and experimental facilities has turned the conduct of leading-edge
research into a global-scale, data-intensive enterprise. The Federal agencies in the NITRD Program together generate exabytes
of research data annually. Financial, commercial, communications, and Web-based enterprises likewise continually generate
vast amounts of new digital information.

Where we are now -- Today, our capacity to create electronic data is outpacing advances
in the technologies needed to manage and make effective use of society's data resources. Ultra-large-scale data sets -- what
scientists refer to as "big data" -- are troves of potential new knowledge, but as noted above, the current networking infrastructure
does not provide levels of end-to-end performance that would enable individuals and groups to access and work with big data
on their desktops. While the plummeting cost of mass storage eases the stress of archiving massive data resources, we also
do not yet know how to design scalable technologies, such as semantic frameworks and open ontologies, that would substantially
advance capabilities for rapidly identifying, integrating, refining, analyzing, and visualizing heterogeneous and ultra-scale
information in ways that would help people learn, think, and decide. Nor do we yet have a rationalized, robust information
infrastructure for the long-term preservation, curation, federation, sustainability, accessibility, and survivability of vital
Federal electronic records and data collections, such as those overseen by the National Archives and Records Administration
(NARA). "Harnessing the Power of Digital Data for Science and Society," the 2009 report of the Interagency Working Group on
Digital Data (which includes many NITRD agencies), has proposed an initial framework for developing such an infrastructure.

Research needs -- We need far more powerful and nuanced tools than exist today to mine data troves deeply, and to combine
and visualize diverse forms of data, in order to "see" the significant items, patterns, and relationships that could lead
to new insights. To support complex human, societal, and organizational ideas, analysis, and timely action and decision-making,
multisource forms of large-scale, raw digital information (e.g., sensor data) must be managed, assimilated, and accessible
in formats responsive to the user's needs and expertise. At the extreme scale represented by 21st century scientific and other
data, significant R&D challenges in applying information to enhance discovery and decision-making remain to be addressed,
including:
* Information standards -- data interoperability and integration of distributed data; generalizable ontologies; data
  format description language (DFDL) for electronic records and data; data structure research for complex digital objects;
  interoperability standards for semantically understood ubiquitous health information records; and information services
  for cloud-based systems
* Decision support -- next-generation machine learning, semantic logic, and data mining algorithms; portals and frameworks
  for data and processes; tools for large-scale collaboration; user-oriented and collaborative techniques and tools for thematic
  discovery, synthesis, data provenance, analysis, and visualization for decision making; mobile, distributed information for
  emergency personnel; management of human responses to data; collaborative information triage; portfolio analysis; development
  of data corpora for impact assessment and other metrics of scientific R&D; and multidisciplinary R&D in ways to convert data
  into knowledge and discovery
* Information management -- intelligent rule-based data management; increasing access to and
  cost-effective integration and maintenance of complex collections of heterogeneous data; innovative architectures for data-intensive
  and power-aware computing; scalable technologies; integration of policies (differential sensitivity, security, user authentication)
  with data; integrated data repositories and computing grids; testbeds; sustainability and validation of complex models; and
  grid-enabled visualization for petascale collections
Indicator(s):