Scientific data management in the coming decade

Published: 01 December 2005

Abstract

Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Because these new instruments have extraordinary precision, data quality is also improving rapidly. Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can handle huge datasets while still detecting very subtle effects: finding needles in the haystack, and also finding very small haystacks that went undetected in previous measurements.

Published In

ACM SIGMOD Record, Volume 34, Issue 4
December 2005
86 pages
ISSN:0163-5808
DOI:10.1145/1107499

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
