Allowing several different teams to extract data from a constrained, on-prem source system every time they need it is not an option. The design principle is to build one data pipeline that will extract data from the source system and never read the same data twice. The data is moved to a cloud platform, where it can be consumed multiple times. This is far more efficient and secure.
To minimise the impact on the source system and the site's bandwidth, we will restore the weekly database backups to another non-critical SQL Server on-prem. From there, we will use the SQL Server Command Line Utility, `sqlcmd`, to extract CSV data from the two tables directly. The CSV files will then be converted into Parquet files for uploading to Azure. The blog post with more detail is available here:
`main.py`: Simple Python script to launch `sqlcmd`, extract CSV files and convert them to Parquet format.
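For illustration, a minimal sketch of that flow is shown below. It is not the actual `main.py`: the server, database and table names are hypothetical, it assumes `pyarrow` (or another Parquet engine) is available to pandas, and the naive CSV handling assumes the extracted data contains no embedded commas.

```python
import subprocess
from pathlib import Path

import pandas as pd  # pandas needs a Parquet engine such as pyarrow

# Hypothetical connection details and table names; adjust for your environment.
SERVER = "RESTORE-SQL01"          # the non-critical restore server
DATABASE = "RestoredBackupDB"
TABLES = ["dbo.Orders", "dbo.OrderLines"]
OUT_DIR = Path("extracts")


def extract_table_to_csv(table: str) -> Path:
    """Launch sqlcmd to dump one table as comma-separated values."""
    csv_path = OUT_DIR / f"{table.replace('.', '_')}.csv"
    subprocess.run(
        [
            "sqlcmd",
            "-S", SERVER,
            "-d", DATABASE,
            "-E",                               # Windows integrated authentication
            "-Q", f"SET NOCOUNT ON; SELECT * FROM {table};",
            "-s", ",",                          # comma as the column separator
            "-W",                               # trim trailing padding from columns
            "-o", str(csv_path),
        ],
        check=True,
    )
    return csv_path


def csv_to_parquet(csv_path: Path) -> Path:
    """Convert an extracted CSV file to Parquet, ready for upload to Azure."""
    parquet_path = csv_path.with_suffix(".parquet")
    # sqlcmd prints a row of dashes under the header line; skip it.
    pd.read_csv(csv_path, skiprows=[1]).to_parquet(parquet_path, index=False)
    return parquet_path


if __name__ == "__main__":
    OUT_DIR.mkdir(exist_ok=True)
    for table in TABLES:
        print(csv_to_parquet(extract_table_to_csv(table)))
```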
The dependencies and virtual environment details are located in the `Pipfile`, which can be used with `pipenv`.
Remember to install the SQL Server Command Line Utilities.
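Without them, the extraction step fails with an unhelpful error. A quick pre-flight check (a sketch, not part of the repository) makes the missing dependency explicit:

```python
import shutil
import sys

# Abort early with a clear message if sqlcmd is not on the PATH.
if shutil.which("sqlcmd") is None:
    sys.exit("sqlcmd not found; install the SQL Server Command Line Utilities first.")
```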
GNU GPL v3