dltHub

Software Development

Supporting a new generation of Python users when they create and use data in their organizations

About

Since 2017, the number of Python users has been increasing by millions annually. The vast majority of these people leverage Python as a tool to solve problems at work. Our mission is to make them autonomous when they create and use data in their organizations.

To this end, we are building an open source Python library called data load tool (dlt). Our users run dlt in their Python scripts to turn messy, unstructured data into regularly updated datasets. It empowers them to create highly scalable data pipelines that are easy to maintain and straightforward to deploy, without having to wait for help from a data engineer. We are dedicated to keeping dlt an open source project surrounded by a vibrant, engaged community. To make this sustainable, dltHub stewards dlt while also offering additional software and services that generate revenue (similar to what GitHub does with Git).

dltHub is based in Berlin and New York City. It was founded by data and machine learning veterans and is backed by Dig Ventures and many technical founders from companies such as Hugging Face, Instana, Matillion, Miro, and Rasa.

Website
https://dlthub.com/
Industry
Software Development
Company size
11–50 employees
Headquarters
Berlin
Type
Privately held
Founded
2022

Locations

Employees at dltHub

Updates

• dltHub · 6,420 followers

Learning Fridays: let's look at development environments for data people.

A local dev environment for data developers is still an unsolved problem. In software engineering ten years back, devs used to spend hours or days getting their local runtime as similar to production as possible before deploying and hoping nothing blew up. Nowadays, technologies like Docker largely solve that problem. For data people the situation is different: the problem is a bit different and more complex. Here is what the industry currently does (a minimal Ibis sketch follows the list):

1. Over a year ago, we were playing with the offline-online runtime switch in this DuckDB - MotherDuck demo: https://lnkd.in/ehPfX5MK. A local runtime enables local development; deploying or using the same runtime in the cloud would enable local-production parity.
2. YAML engineers! See this post from Christophe Blefari: https://lnkd.in/e3UESY_N
3. One API for portability across backends. Separate your code's business logic from vendor lock-in flavors with Ibis: https://ibis-project.org/
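The switch that items 1 and 3 describe can be sketched with Ibis: the same query runs against a local DuckDB file or, by swapping the connection string, against a hosted runtime such as MotherDuck. This is a minimal sketch, not code from the linked demo; the file paths, the "md:" connection string, and the events table are illustrative assumptions.

```python
import ibis

LOCAL = True  # flip to False to run the same queries against the cloud runtime

# Local DuckDB file for development; the "md:" connection string would point
# the same engine at MotherDuck (both paths are illustrative).
con = (
    ibis.duckdb.connect("dev.duckdb")
    if LOCAL
    else ibis.duckdb.connect("md:prod_db")
)

# Seed a tiny table so the sketch runs end to end.
con.create_table(
    "events",
    ibis.memtable({"event_date": ["2024-06-01", "2024-06-01", "2024-06-02"]}),
    overwrite=True,
)

events = con.table("events")
daily = (
    events.group_by("event_date")
    .aggregate(n=events.count())
    .order_by("event_date")
)
print(daily.to_pandas())  # the business logic never mentions which backend ran it
```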

• dltHub reposted this

LanceDB · 5,492 followers

Search Reddit posts using dlt and LanceDB

This example performs the following tasks on Reddit, Inc. posts using the dltHub framework (a minimal sketch follows):
✅ Generating TL;DR summaries of Reddit posts using LLMs
✅ Storing posts in a structured, readable format in Notion using dltHub
✅ Searching the stored summaries using LanceDB

🔨 Colab walkthrough: https://lnkd.in/gkv2FnZP
🌟 Check out other vector search examples: https://lnkd.in/gjkfMrKa

#dlt #summarization #reddit #notion
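A hedged sketch of the storage-and-search step with LanceDB, assuming the TL;DR summaries have already been generated; embed() is a toy stand-in for a real embedding model, and the table name, records, and URLs are illustrative (the linked Colab is the authoritative walkthrough).

```python
import lancedb

# Toy stand-in for the real embedding model; in the pipeline above an LLM
# writes the TL;DR and a proper embedder produces the vector.
def embed(text: str) -> list[float]:
    return [float(text.lower().count(c)) for c in "abcdefghijklmnop"]

# Illustrative records; the real pipeline collects these from Reddit via dlt.
summaries = [
    {"tldr": "dlt turns nested API responses into typed tables", "url": "https://example.com/1"},
    {"tldr": "LanceDB stores vectors next to their metadata", "url": "https://example.com/2"},
]

db = lancedb.connect("./reddit-lancedb")
table = db.create_table(
    "summaries",
    data=[{"vector": embed(s["tldr"]), **s} for s in summaries],
    mode="overwrite",
)

# Nearest-neighbour search over the stored summaries.
for hit in table.search(embed("typed tables from APIs")).limit(2).to_list():
    print(hit["tldr"], "->", hit["url"])
```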

• dltHub reposted this

Adrian Brudaru · Open source pipelines - dlthub.com

Open Tables and Open Compute. Every day we see more adoption. dlt recently added Delta table support to enable PostHog's Delta + ClickHouse stack. You can leverage it too! Read more about it in the case study here: https://lnkd.in/eAYm5QRZ

Do you want to be part of this wave of innovation? If you are building such open compute lakes and want to do it with dlt, let me know! We are looking for design partners to build with in this direction. (A minimal sketch of the Delta support follows the link preview below.)

PostHog leverages dlt to offer their users an inexpensive and scalable data warehouse
dlthub.com
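A minimal sketch of the Delta support mentioned above, assuming a recent dlt version with the filesystem destination and the deltalake extra installed; this is not PostHog's implementation, and the pipeline name, bucket path, and sample rows are illustrative.

```python
import dlt

# The resource declares Delta as its table format; dlt's filesystem
# destination then writes it out as a Delta table.
@dlt.resource(table_format="delta")
def events():
    yield [
        {"id": 1, "event": "pageview"},
        {"id": 2, "event": "identify"},
    ]

pipeline = dlt.pipeline(
    pipeline_name="delta_demo",
    destination=dlt.destinations.filesystem("file:///tmp/open_lake"),
    dataset_name="analytics",
)

info = pipeline.run(events())
print(info)  # load info lists the Delta table written under /tmp/open_lake
```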

• dltHub · 6,420 followers

In a recent Reddit thread, data engineers discussed how they decide which technologies to keep up with. Here are some key takeaways from the responses:

🧶 Focus on fundamentals: many suggest sticking to core skills like SQL, Python, and data modeling, which are relevant across tools.
🧶 Solve real problems: engineers prioritize tools that directly solve current issues in their work or company, rather than chasing every new tool.
🧶 Keep an eye on job trends: monitoring job postings helps identify which tools (e.g., Databricks, Snowflake, Airflow) are in demand.
🧶 Avoid over-diversification: some feel overwhelmed by trying too many tools and prefer to specialize in a few key ones.
🧶 Balance new tools against established tech: there is a trade-off between learning new, hyped technologies and relying on stable, proven solutions.
🧶 Interest matters: learning tools that excite you personally makes the process more enjoyable and sustainable.

How do you keep up with trends in our space? https://lnkd.in/ez43y_AM

• dltHub reposted this

Adrian Brudaru · Open source pipelines - dlthub.com

Open Tables to Open Compute. Databricks dropped $2B on Tabular (Apache Iceberg). Big move, big market. Implications?

1️⃣ Open standards are winning. Iceberg and Delta Lake break vendor lock-in.
2️⃣ Compute flexibility is next. Mix and match engines based on workload needs.
3️⃣ Vendor dynamics shifting. Databricks aims to lead the open ecosystem while selling compute. Other vendors will react.
4️⃣ Vendor innovation incoming. Expect new compute options within 12 months.
5️⃣ What should data engineers do? Stay sharp. Evaluate emerging tools against project requirements. Share findings louder than the noise.

Key questions:
- [How] are you leveraging open formats now?
- With what compute?
- What pain points in current compute options need solving?

#DataEng #OpenCompute #LakeHouse #Iceberg #DeltaLake

• dltHub · 6,420 followers

The Rise of Open Compute: Shaping the Future of Data Engineering

Open compute initiatives are driving innovation and competition. The battle over compute is intensifying, and the landscape is rapidly shifting. No longer dominated by a few heavyweight vendors, the space is seeing a surge of open compute technologies and a diverse range of players stepping in.

Leading open compute technologies today include Kubernetes for container orchestration, Apache Spark and Flink for big data processing, Trino for distributed SQL querying, Dask and Ray for parallel computing, Apache Hadoop for scalable storage and processing, ClickHouse for real-time analytics, TensorFlow for machine learning, and Apache Arrow for high-performance in-memory data processing. This democratization of compute empowers data engineers to choose the tools and platforms that best fit their specific needs without being locked into proprietary ecosystems.

At dltHub, we recognize the significance of this movement. By embracing open standards, we can integrate the best technologies available, ensuring our workflows remain efficient and scalable. However, with multiple technologies and vendors entering the arena, the landscape remains unpredictable. Data engineers must stay vigilant, continuously evaluating new tools and approaches to stay ahead.

As open compute continues to gain momentum, the next year promises even more advancements and new players in the field. While the future is uncertain, one thing is clear: embracing open compute is essential for building future-proof, resilient data infrastructures.

#OpenCompute #DataEngineering #TechInnovation #DataInfrastructure #FutureOfData #OpenStandards

• dltHub reposted this

Deepyaman Datta · Senior Staff Software Engineer at Voltron Data

The past two years—at Claypot AI (@Voltron Data) and later Voltron Data—have been like a dream. I've had the opportunity to work on a diverse array of technical challenges, from building a streaming feature platform for data scientists and machine learning engineers from scratch to contributing to and growing the Ibis ecosystem, including the first streaming backend, extensions for ML preprocessing, and integrations with Kedro and Pandera. I've also enjoyed the opportunity to speak about my work at Data Council, PyData London, and several other conferences and venues.

Most of all, I've appreciated the opportunity to work with brilliant minds in the community—shout out especially to the Ibis team, former Claypot team, and numerous others I've interacted with through conferences and open-source collaboration.

Sadly, you have to wake up from a dream. I was one of over 50 employees laid off by Voltron Data yesterday, and I'm #OpenToWork. I have a breadth of experience contributing to and maintaining open-source projects in the Python data ecosystem; building tooling for data scientists, machine learning engineers, and data engineers; and developing and productionizing end-to-end ML use cases across a range of industries (largely in a consulting setting during my time at QuantumBlack, AI by McKinsey).

If you think I'd be a good fit for your team or somebody else's, please don't hesitate to reach out. I'm open to a variety of roles, even though my most recent interest and experience has been focused on ML/data platform and tooling. I'm based in Salt Lake City, and it's challenging for me to relocate for the next couple of years; that said, I'm open to remote roles as well as those requiring travel.


• dltHub · 6,420 followers

Learning Fridays: Diving into Data Ponds and Apache Iceberg 🐟❄️

Happy Friday, everyone! Today, I'd like to share three knowledge nuggets that have been on my mind:

1️⃣ What is a data pond?
A data pond is essentially a smaller, more focused version of a data lake. It's designed for specific teams or projects within an organization, offering quick and agile access to relevant datasets without the complexity of managing a full-scale data lake. Data ponds enable rapid prototyping and specialized analytics, but they can lead to data silos if not properly integrated with the organization's broader data infrastructure.
💡 Implication: while data ponds boost team agility and innovation, they require careful governance to maintain data consistency and prevent fragmentation across the organization.
👉 Read more: https://lnkd.in/e6hm87j6

2️⃣ How does Apache Iceberg fit into data ponds (mobility of compute)?
Apache Iceberg is an open table format designed for large analytic datasets in distributed environments. In the context of data ponds, Iceberg provides a powerful way to manage data by enabling mobility of compute: you can use different processing engines, like Spark, Flink, Trino, or Presto, over the same dataset without compatibility issues. Iceberg features such as schema evolution, partitioning, and ACID transactions make it easier to handle complex data transformations within a data pond. (A PyIceberg sketch follows this post.)
💡 Implication: Iceberg enhances data ponds by providing flexibility and interoperability, allowing teams to choose the best processing tools for their needs without being locked into a single vendor or technology stack.
👉 Learn more: Why Apache Iceberg Matters for Data Engineers: https://lnkd.in/etp-yfVs

3️⃣ The potential impact of Databricks acquiring Tabular, and the role of Snowflake
Databricks has acquired Tabular, and this move will have significant implications for the data industry, especially regarding the mobility of compute that Iceberg facilitates.
💡 Implication: such an acquisition could influence the openness and neutrality of Apache Iceberg. It might limit the flexibility that Iceberg offers, potentially steering users toward Databricks' proprietary platform and increasing the risk of vendor lock-in. This development could also impact Snowflake, intensifying the competitive landscape as both companies aim to provide the most comprehensive data solutions.
👉 Thought-provoking read: The Battle Over Open Table Formats: https://lnkd.in/ezuwEX5N
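A small sketch of the mobility-of-compute idea from point 2: one Iceberg table, registered in a catalog, read here with PyIceberg while remaining queryable from Spark, Trino, or Flink. The catalog name, its configuration, and the pond.events table identifier are assumptions for illustration.

```python
from pyiceberg.catalog import load_catalog

# Assumes a catalog named "default" is configured (e.g. in ~/.pyiceberg.yaml)
# and that it contains a hypothetical table pond.events.
catalog = load_catalog("default")
table = catalog.load_table("pond.events")

# Engine 1: PyIceberg itself scans the table into pandas via Arrow.
df = table.scan(row_filter="event_date >= '2024-01-01'").to_pandas()
print(df.head())

# Engine 2: the very same table can be queried from Spark, Trino, or Flink,
# e.g. in Spark SQL:  SELECT * FROM default.pond.events
```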


Similar pages

Funding

dltHub · 1 funding round in total

Last round

Pre-Seed

$1,500,000.00

Investors

Dig Ventures
More information on Crunchbase