Wikipedia Article Summarizer a simple Python project based on NLP techniques

emreYbs/Wikipedia-Article-Summarizer

Wikipedia-Article-Summarizer 🧡

(Text Summarization with NLTK)

A simple Python project based on NLP techniques: you provide a Wikipedia article and get back a summary.

Note: I've exported this Jupyter Notebook as a PDF in case you don't have Jupyter installed or don't use Visual Studio Code. You can check the PDF version in the repo for convenience, or use the Python version.

STEPS: When you run Wikipedia Article Summarizer.py, or use the Jupyter Notebook version:

  1. The Python code asks you for the URL of a Wikipedia article (English articles only):
    Which Wikipedia article would you want me to summarize? : (URL)

                     *Provide the Wikipedia URL like this: ( https://    )*
    
  2. The article you provided is summarized using Natural Language Processing techniques.
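
The README does not spell out how the summary is computed. A common approach for this kind of project is to score each sentence by the frequency of its non-stop-words and keep the top-scoring sentences. The sketch below illustrates that idea only; it is not the repository's code. To stay runnable without NLTK downloads it uses the standard library, and `STOP_WORDS`, `summarize`, and `max_sentences` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real run would more likely use NLTK's English stop-word corpus.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "to", "is", "are",
    "was", "it", "for", "with", "as", "by", "that", "this",
}


def content_words(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))
    if not freq:
        return ""

    def score(sentence):
        # A sentence scores the sum of the document-wide counts of its words.
        return sum(freq[w] for w in content_words(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "Cats purr. Cats and dogs play with cats. Birds sing."
    print(summarize(sample, max_sentences=1))
```

Because scores are sums of word counts, longer sentences tend to win; dividing by sentence length is one possible refinement.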

Note: I use my bash scripts to pass in some Wikipedia article links and get short summaries of them. This saves time for students or anyone who reads a lot of Wikipedia articles.

EXAMPLE URL: I have given a short article here because I wanted to add some screenshots, but the code generally performs better on longer articles. https://en.wikipedia.org/wiki/Aalto_University_School_of_Science_and_Technology

https://en.wikipedia.org/wiki/F-Secure (You can see the code in action with short articles)
https://en.wikipedia.org/wiki/Sufism (This article is longer, and generally the longer the article, the better the summary :)

Try many different English Wikipedia articles to test the code. For now I am happy with it; improving the code and making it more complex is beyond my current skills :), but you are free to fork and improve it.

Requirements

Install these requirements if you need them. You may also try "pip3 install beautifulsoup4" if "pip" encounters errors.

pip install beautifulsoup4
pip install lxml
pip install nltk (you may also need to download the NLTK stopwords corpus, e.g. by running nltk.download('stopwords') in Python)

NOTES:

Normally, in a Jupyter Notebook, you might prefer to hard-code a URL and change it when needed rather than ask for user input. But I wanted to see which articles give a better summary and where NLTK does only "so-so" :) That's why I ask for user input and try different English Wikipedia articles. This also makes the code more flexible.

userLink = input("Which Wikipedia article would you want me to summarize: ") #with user input version, a bit more flexible
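
Since the link comes from user input, it can help to check it before fetching anything. The helper below is a minimal sketch of such a check and is not part of the original script; `is_english_wikipedia_article` is a hypothetical name, and the rule it applies (host `en.wikipedia.org`, path starting with `/wiki/`) is an assumption about what counts as a valid English article URL.

```python
from urllib.parse import urlparse


def is_english_wikipedia_article(url):
    """Return True if url looks like an English Wikipedia article link."""
    parts = urlparse(url.strip())
    return (
        parts.scheme in ("http", "https")
        and parts.netloc.lower() == "en.wikipedia.org"
        and parts.path.startswith("/wiki/")
        and len(parts.path) > len("/wiki/")  # reject a bare "/wiki/" with no title
    )


if __name__ == "__main__":
    print(is_english_wikipedia_article("https://en.wikipedia.org/wiki/F-Secure"))
```

One way to use it would be to keep re-prompting until the check passes, so a mistyped link does not surface later as a network or parsing error.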

If you prefer a fixed URL in the code, or if you encounter an error with user input in Jupyter, you can hard-code a URL instead and change it later in the notebook as needed.

Example:
import urllib.request

# with a pre-given Wikipedia URL
raw_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/F-Secure')
document = raw_data.read()
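
Once `document` holds the page HTML, the article text has to be pulled out of it before it can be summarized. The requirements suggest the project does this with BeautifulSoup and lxml, for example via `BeautifulSoup(document, "lxml").find_all("p")`. The sketch below shows the same idea using only the standard library's `html.parser`, so it runs without extra packages. `ParagraphExtractor` and `extract_paragraphs` are hypothetical names, and this is a stand-in, not the repository's code.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the visible text inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._chunks = []
        self.paragraphs = []

    def _flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_paragraph:
                self._flush()  # previous <p> was never closed
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the text of each <p> element in html, in document order."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    page = "<h1>Title</h1><p>First <b>bold</b> paragraph.</p><p>Second one.</p>"
    print(extract_paragraphs(page))
```

`raw_data.read()` returns bytes, so the page would need decoding first, e.g. `extract_paragraphs(document.decode("utf-8"))`.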

Since I like F-Secure and wish to attend their trainings, I searched for them and wrote this simple Wikipedia article summarizer to practise NLP and Python while learning more about F-Secure, its history, culture, etc.
