PyData London

Data Infrastructure and Analytics

PyData London provides a forum for users and developers of data analysis tools to share ideas!

About us

We are London’s community for developers and users of Python data tools. Run by volunteers, PyData London provides a forum for users and developers of data analysis tools to share ideas and learn from each other. We run monthly events on the first Tuesday of each month, in addition to annual conferences. We promote discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualisation. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData London is the London chapter of the international PyData community. PyData is organised by NumFOCUS.org, a non-profit organisation which supports and promotes world-class, innovative, open-source scientific software.

The PyData Code of Conduct (http://pydata.org/code-of-conduct.html) governs our events. Sign up to our next event on Meetup and follow us on Twitter (@PyDataLondon) for the latest news!

Website
https://london.pydata.org/
Industry
Data Infrastructure and Analytics
Company size
2-10 employees
Headquarters
London
Type
Nonprofit
Founded
2013
Specialties
Python, Data Science, Artificial Intelligence, Data Engineering, Analytics, Machine Learning, Data Visualization, Natural Language Processing, Deep Learning, Business Intelligence, Software Development, Programming, R, Julia, NumPy, TensorFlow, Statistics, Modelling, Algorithms, Workshops, Talks, Time Series, Data Mining, Automation, and Big Data

Updates

  • PyData London reposted this

    Post by Atharva L.

    Meetup #55: PyData London 88th Meetup Tuesday, 3rd Sep 2024 (#proofofnetwork)

    Dan Keeley delivered a talk on "The Power of Connection," highlighting how, in a world where data is often fragmented, solutions that bridge these gaps can unlock immense potential for collaboration and innovation. At his company, Rebura Ltd, Dan developed an entity resolution system designed to unify and clean data. The system involved creating a standardized data model, recording and scoring all possible links between data points using techniques like fuzzy matching, and visualizing those links. Tools like libpostal helped standardize addresses, breaking them down into consistent elements such as house numbers and postcodes. The system also used advanced techniques such as recursion loops to reveal hidden connections and improve data quality. While the process is complex, involving Python scripts and machine learning models, the outcome is a clean, complete dataset that can drive meaningful insights and connections. Dan emphasized the importance of data quality and pointed to open-source solutions like zingg.ai as valuable resources in the field of identity resolution.

    Tanay Mehta addressed the challenges posed by the increasing size and resource demands of Large Language Models (LLMs). As training and fine-tuning new LLMs becomes costly, model merging offers a promising alternative: combining multiple LLMs into a single model that integrates the strengths of each. Simple methods like linear merging and Spherical Linear Interpolation (SLERP) average the weights of parent models and blend task vectors from different models to create a balanced outcome. Although these methods can be effective, they may encounter issues like redundant parameters and conflicting signs across models. TIES merging overcomes these challenges by keeping only the most influential parameters and resolving sign conflicts, leading to more efficient and aligned model combinations. These merging techniques are notable for their ability to run on CPUs, reducing the need for expensive GPU resources.

    Two lightning talks closed the evening. Nicholas Clarke showcased the One Billion Row Challenge (1BRC), which explores how to efficiently process one billion rows of data. Instead of using Java, Nick leveraged ArcticDB’s powerful C++ engine for data processing and compression to complete the challenge in just 11 seconds, demonstrating that Python can achieve comparable results with fewer lines of code. Peter Houghton illustrated that while LLMs like GPT-4 can grasp complex details such as country-specific addresses, they need better prompts to generate accurate payment messages, and results become more consistent when a JSON schema is defined for the output. As the creator of the PYISO20022 module, Pete highlighted how the tool supports generating and parsing ISO 20022 messages, which are used in global payment systems like FedNow and SWIFT.
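The link-scoring step Dan described can be sketched with the standard library's `difflib` as a stand-in for dedicated fuzzy-matching tools; this is a minimal illustration, not his actual pipeline, and the addresses are invented examples:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Score two strings on [0, 1] using difflib's gestalt ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_links(records: list[str], threshold: float = 0.8):
    """Record and score all possible links between pairs of records,
    keeping only those above the threshold."""
    links = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= threshold:
                links.append((records[i], records[j], round(score, 2)))
    return links

# Invented example addresses with inconsistent formatting.
addresses = [
    "12 Baker Street, London NW1",
    "12 Baker St, London NW1",
    "221B Baker Street, London",
]
links = candidate_links(addresses)
```

In a real system the strings would first be normalised (e.g. by libpostal into house numbers, streets, and postcodes) so that the scorer compares like with like.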
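SLERP itself is compact enough to sketch in NumPy. This is a hypothetical, minimal version operating on two flattened weight vectors (real model merging applies it tensor by tensor across the parent models):

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight vectors:
    interpolate along the arc between them rather than the chord."""
    a = w_a / np.linalg.norm(w_a)
    b = w_b / np.linalg.norm(w_b)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two vectors
    if omega < eps:                 # nearly parallel: fall back to linear merge
        return (1 - t) * w_a + t * w_b
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * w_a + (np.sin(t * omega) / so) * w_b
```

For comparison, the linear merge of the same two parents at `t = 0.5` is simply `0.5 * (w_a + w_b)`; SLERP preserves the geometry of the weight space better when the parents point in different directions.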
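The idea behind Pete's point about JSON schema outputs can be illustrated with a small validator that rejects malformed model output before it reaches a payment system. The schema and field names below are invented for illustration; a real ISO 20022 message is far richer:

```python
import json

# Hypothetical, simplified field set for a payment message.
PAYMENT_SCHEMA = {
    "amount": float,
    "currency": str,
    "debtor_name": str,
    "creditor_name": str,
}

def validate_payment(raw: str) -> dict:
    """Parse a model's JSON output and check it against the schema,
    failing loudly on missing or mistyped fields."""
    data = json.loads(raw)
    for field, ftype in PAYMENT_SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise TypeError(f"{field} should be {ftype.__name__}")
    return data

# Invented example of a well-formed model response.
llm_output = '{"amount": 250.0, "currency": "GBP", "debtor_name": "Acme Ltd", "creditor_name": "Widgets plc"}'
payment = validate_payment(llm_output)
```

Telling the model the expected schema in the prompt, then validating its output like this, is what makes the generated messages consistent enough to feed into downstream systems.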

  • PyData London reposted this

    Post by Atharva L.

    Meetup #53: PyData London 87th Meetup Tuesday, 6th Aug 2024 (#proofofnetwork)

    Pearl Prakash delivered an informative talk on building a trip itinerary planner using Retrieval-Augmented Generation (RAG). As a travel enthusiast who dislikes the tedious planning and scheduling aspects, Pearl found traditional methods, such as using ChatGPT, lacking due to their reliance on older data and the expensive, data-heavy process of fine-tuning. RAG instead offered the desired flexibility and plentiful test cases. Her solution leverages the Google Places and Directions APIs and a static database stored in BigQuery. The user-friendly RAG application, hosted on a Streamlit front-end and running locally, takes diverse user inputs and produces a comprehensive, day-specific itinerary for activities in London, including an estimate of the trip cost. Pearl walked through the RAG process and shared the validation metrics she used to tune the tool: response time, completeness, missing keywords, and missing query parts. She concluded with lessons learned from the project, emphasising the importance of limiting project scope initially, expecting multiple iterations, and adopting a hybrid approach of manual checks and validation metrics. Looking ahead, plans include expanding the location range to cover all of the UK, plotting the trip on a map, and letting users share itineraries on social media.

    Suyash J. showed the process of building a gesture-controlled Arduino robot using Python. Highlighting robotics as the next big thing in technology, Suyash explained the relevance of robots in simulating real-world use cases such as crane operation and robot dogs. Acknowledging the high cost of real robots, he used inexpensive, accessible tools with robust community support, like Arduino. The software (mainly Python libraries) included a gesture-recogniser model built on MediaPipe, a motor controller for the Arduino, and InfluxDB for writing values swiftly to a time-series database. Suyash gave a live code demo featuring laser-cut wooden robots. He chose hand gestures because they are a natural user interface and easy to detect with machine learning, even though a game controller would have been a better fit. He also explained the advantages of a time-series database, such as optimisation for managing large amounts of data and open-source accessibility.

    Ian Ozsvald's lightning talk introduced the Abstraction and Reasoning Challenge, a Kaggle competition aimed at creating an algorithm capable of solving abstract reasoning tasks. The data-hungry nature of conventional machine learning techniques limits their adaptability to real-world unpredictability. Ian highlighted the Abstraction and Reasoning Corpus (ARC) as a tool for gauging AI skill acquisition and suggested a future course of action in which LLMs read, analyse, and summarise numerous failed model outputs to devise new strategies.
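The retrieval step at the heart of a RAG pipeline like Pearl's can be sketched without any external services. This toy version uses bag-of-words cosine similarity in place of the embeddings and Google APIs her planner relies on, and the documents are invented:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return ranked[:k]

# Invented knowledge base standing in for the Places/Directions data.
docs = [
    "The British Museum is free to enter and opens at 10am.",
    "A ride on the London Eye takes about 30 minutes.",
    "Borough Market is a popular food market near London Bridge.",
]
context = retrieve("free museum in London", docs, k=1)
prompt = f"Answer using only this context: {context}\nQuestion: plan a morning activity."
```

The retrieved context is then prepended to the user's question, so the language model answers from current, relevant data rather than its stale training set, which is exactly the gap RAG fills over plain ChatGPT prompting.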

