Computer Science > Databases
[Submitted on 5 May 2016 (v1), last revised 15 May 2016 (this version, v2)]
Title: PipeGen: Data Pipe Generator for Hybrid Analytics
Abstract: We develop a tool called PipeGen for efficient data transfer between database management systems (DBMSs). PipeGen targets data analytics workloads on shared-nothing engines. It supports scenarios where users seek to perform different parts of an analysis in different DBMSs or want to combine and analyze data stored in different systems. The systems may be colocated in the same cluster or may be in different clusters. To achieve high performance, PipeGen leverages the ability of all DBMSs to export, possibly in parallel, data into a common data format, such as CSV or JSON. It automatically extends these import and export functions with efficient binary data transfer capabilities that avoid materializing the transmitted data on the file system. We implement a prototype of PipeGen and evaluate it by automatically generating data pipes between five different DBMSs. Our experiments show that PipeGen delivers speedups up to 3.8x compared with manually exporting and importing data across systems using CSV.
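The core idea in the abstract, sending rows in a binary form over a direct connection rather than writing CSV to disk and reading it back, can be illustrated with a small sketch. This is not PipeGen's implementation: PipeGen generates its pipes automatically from each DBMS's existing import and export code, whereas the sketch below is hand-written. The row layout `ROW` and the functions `export_rows`, `import_rows`, and `transfer` are hypothetical names chosen for illustration, and a local socket pair stands in for the network link between two systems.

```python
import socket
import struct
import threading

# Hypothetical fixed row layout (an assumption for this sketch):
# one 64-bit integer key and one 64-bit float value, network byte order.
ROW = struct.Struct("!qd")


def export_rows(sock, rows):
    """Exporter side: send packed binary rows over the socket instead of writing CSV."""
    with sock:
        for key, value in rows:
            sock.sendall(ROW.pack(key, value))
    # Leaving the with-block closes the socket, which signals end-of-stream.


def import_rows(sock):
    """Importer side: read fixed-size binary rows until the stream ends."""
    rows = []
    with sock, sock.makefile("rb") as stream:
        while True:
            chunk = stream.read(ROW.size)
            if len(chunk) < ROW.size:
                break  # sender closed the connection
            rows.append(ROW.unpack(chunk))
    return rows


def transfer(rows):
    """Move rows between two endpoints without materializing them on the file system."""
    sender, receiver = socket.socketpair()
    worker = threading.Thread(target=export_rows, args=(sender, rows))
    worker.start()
    received = import_rows(receiver)
    worker.join()
    return received
```

Calling `transfer([(1, 2.5), (2, -0.125)])` returns the same rows on the importing side. No text is parsed or formatted and no intermediate file is created, which are the two costs the abstract attributes to the manual CSV approach.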
Submission history
From: Brandon Haynes
[v1] Thu, 5 May 2016 17:43:38 UTC (1,877 KB)
[v2] Sun, 15 May 2016 19:37:55 UTC (1,877 KB)