dltHub

Software development

Supporting a new generation of Python users when they create and use data in their organizations

About

Since 2017, the number of Python users has been increasing by millions annually. The vast majority of these people leverage Python as a tool to solve problems at work. Our mission is to make them autonomous when they create and use data in their organizations. To this end, we are building an open source Python library called data load tool (dlt). Our users use dlt in their Python scripts to turn messy, unstructured data into regularly updated datasets. It empowers them to create highly scalable, easy-to-maintain, straightforward-to-deploy data pipelines without having to wait for help from a data engineer. We are dedicated to keeping dlt an open source project surrounded by a vibrant, engaged community. To make this sustainable, dltHub stewards dlt while also offering additional software and services that generate revenue (similar to what GitHub does with Git). dltHub is based in Berlin and New York City. It was founded by data and machine learning veterans. We are backed by Dig Ventures and many technical founders from companies such as Hugging Face, Instana, Matillion, Miro, and Rasa.

Website
https://dlthub.com/
Industry
Software development
Company size
11–50 employees
Headquarters
Berlin
Type
Privately held
Founded
2022

Locations

Employees at dltHub

Updates

  • dltHub · 6,142 followers

    Have you tried the dlt REST API client yet? Check out this video. Docs: https://lnkd.in/emKkDPrR Want to learn more? Join our next edition of ELT with dlt! https://dlthub.com/events

    Adrian Brudaru

    Open source pipelines - dlthub.com

    The REST API client offered by dlt can guess pagination, automatically handle retries, and even apply custom handling for error codes. Check out this short video on how to use it. Want more? Sign up for the September edition of our ELT with dlt workshop! https://dlthub.com/events
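For a sense of what "guessing pagination" means, here is a minimal, self-contained sketch of the idea: inspect the first JSON response for well-known next-link, cursor, or page-counter keys. The key names and decision order below are illustrative assumptions, not dlt's actual detection logic.

```python
# Sketch of "guessing" pagination style from an API's first response.
# The key names are common API conventions, not dlt's implementation.

NEXT_LINK_KEYS = ("next", "next_url", "next_page")
CURSOR_KEYS = ("next_cursor", "cursor")

def guess_paginator(first_page: dict) -> str:
    """Return a best-guess pagination style for a JSON API response."""
    for key in NEXT_LINK_KEYS:
        value = first_page.get(key)
        if isinstance(value, str) and value.startswith("http"):
            return "next_link"        # response carries a full URL to page 2
    for key in CURSOR_KEYS:
        if first_page.get(key):
            return "cursor"           # response carries an opaque cursor token
    if {"page", "total_pages"} <= first_page.keys():
        return "page_number"          # classic page/total_pages counters
    return "single_page"              # nothing recognized: assume no pagination

print(guess_paginator({"data": [], "next_cursor": "abc123"}))  # cursor
```

A real client would also fall back gracefully and probe a second request; this sketch only covers the first-response heuristic.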

  • dltHub reposted this

    Adrian Brudaru

    Open source pipelines - dlthub.com

    Data engineers' pain points: what do you hate most in your job? Data engineers are sharing their frustrations in a recent Reddit thread. The most common gripes revolve around:

    1. Data governance and compliance: the burden of ensuring data quality and adherence to regulations.
    2. Data quality monitoring and troubleshooting: the constant battle against data issues and the time-consuming process of resolving them.
    3. Collaboration with data scientists and analysts: challenges in communicating and aligning expectations with other data professionals.

    More here: https://lnkd.in/e2xHZdcE

    How does dlt help with the above?

    1. dlt offers a compliance workshop and multiple features to help you get it done. Sign up here: https://dlthub.com/events
    2. dlt offers multiple features around data contracts, schema change alerts, data quality checks, and test scaffolding. Learn more about how to use these in our ELT with dlt workshop. Sign up here: https://dlthub.com/events
    3. Collaboration with the rest of the data team: dlt is built to be usable by anyone on the team. Get them started and let them implement what they need. Communicate via PRs instead of vague human language. Does your team not yet speak dlt? That's fine: dlt has such a shallow learning curve that they can probably just use it. If not, sign up for the workshops here: https://dlthub.com/events

    From the dataengineering community on Reddit

    reddit.com
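As a toy illustration of the data-contract idea mentioned above (this mimics the concept, not dlt's schema-contract API): validate each record against an agreed column set and flag schema drift.

```python
# Toy "data contract" check: reject records whose columns drift from an
# agreed schema. Illustrates the concept only; this is not dlt's API.

EXPECTED_COLUMNS = {"id": int, "email": str, "created_at": str}

def check_contract(record: dict) -> list[str]:
    """Return a list of violations for one record (empty = compliant)."""
    violations = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in record:
            violations.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            violations.append(f"bad type for {column}")
    for column in record.keys() - EXPECTED_COLUMNS.keys():
        violations.append(f"unexpected new column: {column}")  # schema drift
    return violations

print(check_contract({"id": 1, "email": "a@b.c", "created_at": "2024-01-01"}))  # []
```

In a real pipeline, the reaction to a violation (raise, discard, or evolve the schema) would be a configurable policy rather than hard-coded.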

  • dltHub · 6,142 followers

    Welcome to another edition of learning Fridays! Let's learn about compliance techniques. First, if you haven't signed up for our compliance webinars, do so here: https://dlthub.com/events Onwards, let's look at common compliance-related implementations.

    1. How does GDPR affect startup innovation? The Fraunhofer Institute looked into this in Germany, with striking conclusions: taking care of compliance upfront enables fast progress. Read more here: https://lnkd.in/ebu7NHWQ
    2. What is pseudonymization and when should you do it? Pseudonymization is the process of replacing private identifiers with fake identifiers, or pseudonyms. Unlike anonymization, pseudonymization allows data to be de-identified while still permitting some form of re-identification under controlled conditions. This is useful for maintaining data utility for tasks that require some form of identity context without exposing actual identifiers. If you need to do it at scale, consider identifying PII columns automatically and handling them based on common policies. Read how to implement it in dlt here: https://lnkd.in/eq83kNKH
    3. Data vault modelling: what is it and why does it fall under compliance? The data vault architecture was designed for easy ingestion and auditing of data, and while it's often sold by agencies without being needed, it is sometimes both needed and useful. Check out this Medium blog post about how the modelling would work: https://lnkd.in/dbQPV7dw

    Do you have challenges around the compliance topic? Tell us about it in the comments!

    Practical Introduction to Data Vault Modeling

    medium.com
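The linked guide covers the dlt-native implementation; as a self-contained sketch of the concept, pseudonymization can replace PII columns with keyed hashes: deterministic, so joins and aggregations still work, and re-linkable only by whoever holds the key. The column names and key handling below are illustrative assumptions.

```python
import hashlib
import hmac

# Pseudonymize PII columns with a keyed hash (HMAC-SHA256).
# Deterministic: the same value always maps to the same pseudonym, so
# joins keep working. Unlike anonymization, a holder of SECRET_KEY can
# still link pseudonyms back to inputs by recomputing the hash.
# The column list and key are illustrative assumptions.

SECRET_KEY = b"rotate-me-and-store-in-a-secrets-vault"
PII_COLUMNS = {"name", "email"}

def pseudonymize(row: dict) -> dict:
    out = dict(row)
    for column in PII_COLUMNS & row.keys():
        digest = hmac.new(SECRET_KEY, str(row[column]).encode(), hashlib.sha256)
        out[column] = digest.hexdigest()[:16]  # shortened for readability
    return out

row = {"name": "Ada", "email": "ada@example.com", "plan": "pro"}
print(pseudonymize(row)["plan"])  # "pro" -- non-PII columns pass through
```

At scale, PII_COLUMNS would come from automatic column detection plus a shared policy, as the post suggests, rather than a hard-coded set.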

  • dltHub · 6,142 followers

    GDPR or HIPAA compliance causes a lot of issues for companies that want to move fast. Research from the Fraunhofer Institute suggests that taking proactive action to get compliant accelerates data usage, because having clarity on the process of how to use data containing PII enables us to take action instead of ending up in decision freeze. For this reason, we decided to organize a series of webinars: 1. GDPR compliance for data teams 2. HIPAA compliance for data teams. Each of these will be held by a data protection officer and lawyer who will give us the simple rundown. Of course, there will be a completion badge to show off your knowledge. Interested? Sign up here:

    dltHub events

    dlthub.com

  • dltHub · 6,142 followers

    Over the last weeks, we held our first run of ELT with dlt, the workshop, with 600 students. Across 4 hours, the students learned:

    Part 1: Focus on pythonic implementations
    - How to build best-practice Python ELT
    - How to leverage dlt to implement these best practices

    Part 2: Focus on advanced topics
    - Performance optimisations and scaling
    - Using the declarative REST API wrapper
    - Reverse ETL
    - Data contracts
    - How to deploy dlt in various environments, from serverless functions to an orchestrator
    - Logging
    - How to start dbt jobs

    Want to take this workshop too? Join our next edition here:

    dltHub events

    dlthub.com

  • dltHub reposted this

    Adrian Brudaru

    Open source pipelines - dlthub.com

    Why are data engineers moving from Airbyte to dlt? The answer might surprise you. The main reason is not marginal improvements or a specific feature; it's the large fundamental differences. Think of the data consumer as a market, and the ETL solutions provider as a vendor. Where's the data engineer in all of this? Human middleware filling in any leftover gaps. What's new about dlt is that dlt is an open core devtool made for the data engineer and the data team. This enables them to self-serve and take paid vendors out of the equation. The data engineer here remains the provider of data, while the vendor (dltHub) can offer things around dlt, such as extra helpers for data platform teams. This fundamentally different paradigm gives rise to a completely different product that enables, empowers, and grows a team's capabilities instead of replacing or limiting them.

    Here are some of the things the community mentions they love about dlt:
    - Enhanced debugging capabilities: dlt allows greater control over data extraction, providing much-needed flexibility to debug complex API behaviors and unexpected data issues.
    - Customization and extensibility: unlike UI builders or rigid frameworks, dlt offers a developer-friendly framework that's highly customizable.
    - Operational simplicity: one of dlt's standout features. It's just a library, for everything from development to running and deploying new sources.
    - Embeddability in your existing workflows: run it on Airflow, Dagster, AWS Lambda, or Google Cloud Functions to deal with transactional loads of any scale, from small batches to massive streaming.

    The move to dlt is more than just a change of tools; it's a strategic upgrade to your data stack's and team's future. Read more on this Reddit thread:

    From the dataengineering community on Reddit: Replace Airbyte with dlt

    reddit.com

  • dltHub · 6,142 followers

    🚀 Blast from the past: data engineering fads

    Hey folks! Remember when we all heard MongoDB was the end-all for databases? Seems like it's just chilling in the background now while we debate over newer toys. And how about that epic Python vs. R showdown? Spoiler: Python's everywhere, but don't tell that to an R enthusiast unless you want an earful. Here are some flavors from the past that aged like cheese 👀

    Colorful visualizations but poor-quality tables: back in the day, vendors were showcasing how they could make dashboards look like a Christmas tree. Turns out, simple bars and lines do the trick just fine. Oh, and thank god Flash and Ajax are out.

    🛠️ Tools: here today, gone tomorrow? Speaking of fads, remember when everyone tried to put all their data in Hadoop? Now we're all about Snowflake, dbt, and asking if these will stand the test of time or join the pile of "remember when" tech.

    🧐 DataVault 2.0: yay or nah? Remember when DataVault 2.0 was gonna be the next big thing? Really though? Feels like warehousing with extra steps and complexity. Might work if you're big enough to run separate teams for extraction and modeling; otherwise, it's a hard pass.

    🎢 What's next? Every few months, there's a new "game-changer" that's supposed to revolutionize our work. From the frenzy around big data to the push for real-time everything, it's a wild ride in data engineering. What's your bet on the next big fad to fizzle?

Similar pages

Funding

dltHub · 1 funding round total

Last round

Pre-Seed

$1,500,000.00

Investors

Dig Ventures
More information on Crunchbase