Showing 1–5 of 5 results for author: Dumitru, M

Searching in archive cs.
  1. arXiv:2407.13657  [pdf, ps, other]

    cs.CL

    FuLG: 150B Romanian Corpus for Language Model Pretraining

    Authors: Vlad-Andrei Bădoiu, Mihai-Valentin Dumitru, Alexandru M. Gherghescu, Alexandru Agache, Costin Raiciu

    Abstract: Research in the field of language models is rapidly evolving, with many open models being released to the public. Openly available pretraining corpora usually focus on only a handful of languages, with many others either missing completely or extremely underrepresented. In this report, we introduce FuLG, a hundred-fifty-billion-token Romanian corpus extracted from CommonCrawl. We present our metho…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.12819  [pdf, other]

    cs.DC cs.CL cs.LG cs.NI

    A Look Into Training Large Language Models on Next Generation Datacenters

    Authors: Alexandru M. Gherghescu, Vlad-Andrei Bădoiu, Alexandru Agache, Mihai-Valentin Dumitru, Iuliu Vasilescu, Radu Mantu, Costin Raiciu

    Abstract: Is it still worth doing computer networking research? What are relevant problems in this space given the supremacy of hyperscalers in deployed large networks? We take an unconventional approach to finding relevant research directions, by starting from Microsoft's plans to build a $100 billion datacenter for ML. Our goal is to understand what models could be trained in such a datacenter, as well as…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.13679  [pdf, other]

    cs.NI cs.LG

    Prose-to-P4: Leveraging High Level Languages

    Authors: Mihai-Valentin Dumitru, Vlad-Andrei Bădoiu, Costin Raiciu

    Abstract: Languages such as P4 and NPL have enabled a wide and diverse range of networking applications that take advantage of programmable dataplanes. However, software development in these languages is difficult. To address this issue, high-level languages have been designed to offer programmers powerful abstractions that reduce the time, effort and domain-knowledge required for developing networking appl…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2301.02607  [pdf, ps, other]

    eess.SP cs.LG q-bio.QM

    A Data-Driven Gaussian Process Filter for Electrocardiogram Denoising

    Authors: Mircea Dumitru, Qiao Li, Erick Andres Perez Alday, Ali Bahrami Rad, Gari D. Clifford, Reza Sameni

    Abstract: Objective: Gaussian Processes (GP)-based filters, which have been effectively used for various applications including electrocardiogram (ECG) filtering, can be computationally demanding and the choice of their hyperparameters is typically ad hoc. Methods: We develop a data-driven GP filter to address both issues, using the notion of the ECG phase domain -- a time-warped representation of the ECG be…

    Submitted 9 January, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

  5. arXiv:2208.10087  [pdf]

    cs.CY

    A Trust Framework for Government Use of Artificial Intelligence and Automated Decision Making

    Authors: Pia Andrews, Tim de Sousa, Bruce Haefele, Matt Beard, Marcus Wigan, Abhinav Palia, Kathy Reid, Saket Narayan, Morgan Dumitru, Alex Morrison, Geoff Mason, Aurelie Jacquet

    Abstract: This paper identifies the current challenges of the mechanisation, digitisation and automation of public sector systems and processes, and proposes a modern and practical framework to ensure and assure ethical and high veracity Artificial Intelligence (AI) or Automated Decision Making (ADM) systems in public institutions. This framework is designed for the specific context of the public sector, in…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: Comments were integrated into the paper from all peer reviewers. Am happy to provide a copied history of comments if useful.