@cisnlp

Deep NLP @ CIS - LMU

Deep Natural Language Processing Group at Center for Language and Information Processing, University of Munich (LMU)

Popular repositories

  1. simalign Public

    Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)

    Python 352 48

  2. GlotLID Public

    Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

    Python 108 8

  3. Glot500 Public

    Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023

    Python 99 3

  4. semi-markov-crf Public

    Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"

    Python 17 4

  5. GlotCC Public

    GlotCC Dataset and Pipeline -- NeurIPS 2024

    Jupyter Notebook 17

  6. GlotScript Public

    Resource and Tool for Writing System Identification -- LREC 2024

    Python 14 2

Repositories

Showing 10 of 31 repositories
  • cisnlp.github.io Public

    Homepage of cisnlp

    SCSS 3 MIT 0 0 0 Updated Dec 15, 2024
  • GlotLID Public

    Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

    Python 108 Apache-2.0 8 1 0 Updated Nov 28, 2024
  • GlotWeb Public

    GlotWeb: Web Indexing for Low-Resource Languages -- under construction.

    Python 11 CC0-1.0 0 4 0 Updated Nov 24, 2024
  • GlotCC Public

    GlotCC Dataset and Pipeline -- NeurIPS 2024

    Jupyter Notebook 17 CC0-1.0 0 0 0 Updated Nov 1, 2024
  • oscar-io Public Forked from oscar-project/oscar-io

    Readers/Writers for GlotCC/OSCAR corpus

    Rust 1 Apache-2.0 1 0 0 Updated Oct 23, 2024
  • ungoliant Public Forked from oscar-project/ungoliant

    🕷️ The pipeline for the OSCAR/GlotCC corpus

    Rust 3 Apache-2.0 14 0 0 Updated Oct 23, 2024
  • oscar-tools Public Forked from oscar-project/oscar-tools

    The original tooling for the GlotCC/OSCAR corpus rewritten in Rust

    Rust 1 Apache-2.0 3 0 0 Updated Oct 23, 2024
  • 2024fall-crosslingual-vlm-block-seminar Public

    Materials of 2024 Fall cross-lingual visual language models block seminar at LMU Munich.

    Python 2 0 0 0 Updated Oct 14, 2024
  • MEXA Public

    Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

    Python 8 Apache-2.0 0 0 0 Updated Oct 10, 2024
  • LangSAMP Public

    LangSAMP: Language-Script Aware Multilingual Pretraining

    Python 1 0 0 0 Updated Sep 30, 2024
