Skip to content

Extracting data from SQL Server using Python and sqlcmd

Notifications You must be signed in to change notification settings

h3xagn/extract-data-from-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Extract data from SQL Server

Overview

Extracting data every time from a constrained, on-prem source system by several different teams cannot be allowed. The design principle is to build one data pipeline that will extract data from the source system and to never read the same data twice from the source. This data will be moved to a cloud platform where it can be consumed multiple times. This is much more efficient and secure at the end of the day.

To minimise the impact on the source system and bandwidth of the site, we will restore the weekly database backups to another non-critical SQL Server on-prem. From there, we will use the SQL Server Command Line Utility, sqlcmd, to extract CSV data from the two tables directly. The CSV files will then be converted into parquet files for uploading to Azure. The blog post with more detail is available here:

Repo structure and file descriptions

  • main.py: Simple Python script to launch sqlcmd, extract CSV files and convert them to parquet format.

The dependencies and virtual environment details are located in the Pipfile which can be used with pipenv.

Remember to install SQL Server Command Line Utilities.

License

GNU GPL v3

Author

Coenraad Pretorius

About

Extracting data from SQL Server using Python and sqlcmd

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages