🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
A command-line toolkit to extract text content and category data from Wikipedia dump files
Corpus creator for Chinese Wikipedia
Reads the data from OPIEC, an Open Information Extraction corpus
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
Downloads and imports Wikipedia page histories to a git repository
Extracting useful metadata from Wikipedia dumps in any language.
An F/OSS solution combining AI with Wikipedia knowledge via a RAG pipeline
Python package for working with MediaWiki XML content dumps
A simple utility to index Wikipedia dumps using Lucene.
Network Visualizer for the 'Geschichten aus der Geschichte' Podcast
Collects a multimodal dataset of Wikipedia articles and their images
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
Node.js module for parsing the content of Wikipedia articles into JavaScript objects
Convert Wikipedia XML dump files to JSON or Text files
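To illustrate the general shape of such a conversion (not the code of any repository listed here), a minimal Python sketch is shown below; the input filename and the simplified namespace handling are assumptions.

```python
"""Minimal sketch: turn a Wikipedia pages-articles XML dump into JSON lines.

An illustration of the general approach only; the dump filename and the
simplified tag handling are assumptions, not any listed repo's code.
"""
import bz2
import json
import xml.etree.ElementTree as ET

def localname(tag: str) -> str:
    # Drop the MediaWiki export namespace, e.g. "{...}page" -> "page".
    return tag.rsplit("}", 1)[-1]

def dump_to_jsonl(dump_path: str, out_path: str) -> None:
    with bz2.open(dump_path, "rb") as src, open(out_path, "w", encoding="utf-8") as out:
        title, text = None, None
        for _event, elem in ET.iterparse(src, events=("end",)):
            name = localname(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                out.write(json.dumps({"title": title, "text": text},
                                     ensure_ascii=False) + "\n")
                title, text = None, None
                elem.clear()  # free memory for pages already written

if __name__ == "__main__":
    dump_to_jsonl("enwiki-latest-pages-articles.xml.bz2", "articles.jsonl")
```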
Research for a master's degree: operation projizz-I/O
Scripts to download the Wikipedia dumps (available at https://dumps.wikimedia.org/)
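As a hedged illustration of what such a download script does, the Python sketch below streams one dump file to disk; the wiki name and filename pattern are assumptions, so check https://dumps.wikimedia.org/ for the dumps that actually exist.

```python
"""Minimal sketch: fetch one Wikipedia dump file from dumps.wikimedia.org.

The wiki name and dump filename below are illustrative assumptions.
"""
import urllib.request

DUMP_URL = (
    "https://dumps.wikimedia.org/simplewiki/latest/"
    "simplewiki-latest-pages-articles.xml.bz2"  # assumed filename pattern
)

def download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Stream the dump to disk in chunks so large files never sit in memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)

if __name__ == "__main__":
    download(DUMP_URL, "simplewiki-latest-pages-articles.xml.bz2")
```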
Convert Chinese Wikipedia dump XML to human-readable documents in Markdown and txt.
Contains code to build a search engine by creating an index and performing search over Wikipedia data.
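The core idea behind such an index can be sketched with a toy inverted index in Python; this is only an illustration of the index-then-search approach, not the repository's implementation (a real engine adds ranking, tokenization, and persistence).

```python
"""Toy inverted index over article texts, illustrating index-then-search.

A sketch under simplifying assumptions (whitespace tokenization, boolean AND),
not the implementation of any repository listed above.
"""
from collections import defaultdict

class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: list[str] = []

    def add(self, text: str) -> int:
        # Assign the next document id and record which terms it contains.
        doc_id = len(self.docs)
        self.docs.append(text)
        for token in text.lower().split():
            self.postings[token].add(doc_id)
        return doc_id

    def search(self, query: str) -> set[int]:
        # Return the ids of documents containing every query term (boolean AND).
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add("Wikipedia is a free online encyclopedia")
index.add("A dump is a snapshot of Wikipedia content")
print(index.search("wikipedia dump"))  # -> {1}
```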