Web scraping word meaning with Python and Beautifulsoup

Argus Waikhom
4 min read · Mar 3, 2021

What you’ll learn here:

  • A simple explanation of web scraping
  • Writing Python code to extract a word's meaning from a dictionary website using Beautifulsoup

Web scraping:

Web scraping is downloading a web page and extracting data from it. Common uses include collecting data from a particular website and automating a process with a bot or web crawler.

It goes by many names: web scraping, web harvesting, web data extraction. Whatever you call it, they all mean the same thing.

How does it work: Web pages are built using text-based mark-up languages such as HTML. A page is made up of tags, and those tags often carry attributes such as classes. To extract a specific part of the data, we target the tags and classes that contain it. I'll show you how later in this article.

What I’m explaining here is just the basics. There is a lot more you can learn about web scraping.
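To make the idea of targeting tags and classes concrete, here is a minimal sketch using Beautifulsoup on a tiny, made-up HTML snippet. The HTML and the class names in it ("headword", "meaning", "note") are invented for illustration and do not come from any real website.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML page used only for illustration.
html = """
<html><body>
  <h1 class="headword">hello</h1>
  <p class="meaning">used as a greeting</p>
  <p class="note">informal</p>
</body></html>
"""

# Parse the text into a tree we can search.
soup = BeautifulSoup(html, "html.parser")

# Target by tag name only.
headword = soup.find("h1").get_text()

# Target by tag name and class.
meaning = soup.find("p", class_="meaning").get_text()

# Collect the text of every "p" tag on the page.
paragraphs = [p.get_text() for p in soup.find_all("p")]

print(headword)    # hello
print(meaning)     # used as a greeting
print(paragraphs)  # ['used as a greeting', 'informal']
```

find() returns the first match and find_all() returns a list of all matches, which is the same pair of calls used on the real dictionary page later on.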

NOTE: Some websites do not allow web scraping (make sure you check a site's terms before you extract data from it). Here, we’re doing it for educational purposes, so this should be fine. I guess😐.

Beautifulsoup:

To develop our dictionary project, we’re going to use Beautifulsoup. It is a Python package for parsing HTML and XML documents. I’m not going to explain Beautifulsoup in detail. We’re using it here because it is simple and easy to learn. There are other, more powerful tools you can use, such as Scrapy and Selenium.

Let the coding begin:

We are going to write some simple code to extract the meaning of a word from a dictionary website (oxfordlearnersdictionaries).

When we search for the meaning of the word “hello”, this is what we get from the website.

Let’s extract some data from this page, such as word meaning, example sentences, etc.

If you right-click on the part of the result you want to extract and select Inspect, you'll be taken to the part of the HTML that displays that information.

  • First, install the requests and bs4 packages, if you haven’t already
  • Create a Python (.py) file and add these imports
  • Let’s make an HTTP call to search for the word “hello”
  • We’ll create a Beautifulsoup object from the response text
  • We are going to create separate functions to get the origin and the definitions of the search word
  • If we run into any problem extracting attributes, we will treat it as “Word not found”. (This is not a good error handling approach. To handle the case where the search word cannot be found, you can read the "word not found" text from the HTTP response instead. I’ll leave this to you)
  • If you inspect the part of the web page where the word origin is displayed, you'll notice a “span” tag whose “unbox” attribute has the value “wordorigin”
  • We can use this information to extract the origin of the word
  • The two simple lines of code below get the origin part of the response and extract its text
  • Let’s extract the definitions and the example sentences
  • If you inspect the definition part, it is in a “ul” tag with the class “senses_multiple”. Each “li” tag inside this list has a definition and its example sentences
  • In line no. 4 we’re using the “find_all()” method to extract all the “li” HTML tags with class “sense”. It returns a list of items. Each item has a definition and example sentences.
  • Make a loop to go through each definition
  • In the Inspect panel, we can see that the definition sentence is in a “span” tag with class “def”. So, that’s what we’re extracting in line no. 6
  • Similar to what we did for the definition, we can see that all the example sentences are in a “ul” tag with class “examples”. So, let’s get the text of each “li” inside the “ul” tag
  • Now that we have separate functions to get the word origin, definitions, and examples, we can do some testing
  • Below is the whole code. You can assign any word to the “word_to_search” variable to get its definition
  • Run the code in a terminal
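
The steps above can be put together in one small script. This is a minimal sketch, not the article's original code: the URL pattern, the browser-like User-Agent header, the timeout, and the function names (fetch_page, get_origin, get_definitions, parse_entry, search) are all assumptions made for illustration. Only the tag, class, and attribute names ("span" with unbox="wordorigin", "ul" with class "senses_multiple", "li" with class "sense", "span" with class "def", "ul" with class "examples") come from the steps above, and they will stop matching if the site changes its markup. The two packages can be installed with `pip install requests beautifulsoup4`.

```python
import requests
from bs4 import BeautifulSoup

# Assumed URL pattern; the article names the site but not the exact address.
BASE_URL = "https://www.oxfordlearnersdictionaries.com/definition/english/"


def fetch_page(word):
    """Download the dictionary page for `word` and return its HTML text."""
    # Assumption: the site may reject requests without a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(BASE_URL + word, headers=headers, timeout=10)
    response.raise_for_status()  # raise on error statuses such as 404
    return response.text


def get_origin(soup):
    """Return the word-origin text, or None if the page has no such tag."""
    origin = soup.find("span", attrs={"unbox": "wordorigin"})
    return origin.get_text().strip() if origin else None


def get_definitions(soup):
    """Return a list of {"definition": str, "examples": [str, ...]} dicts."""
    results = []
    senses = soup.find("ul", class_="senses_multiple")
    if senses is None:
        return results
    # Each "li" with class "sense" holds one definition and its examples.
    for sense in senses.find_all("li", class_="sense"):
        definition = sense.find("span", class_="def")
        if definition is None:
            continue
        examples = []
        examples_list = sense.find("ul", class_="examples")
        if examples_list is not None:
            examples = [li.get_text().strip() for li in examples_list.find_all("li")]
        results.append(
            {"definition": definition.get_text().strip(), "examples": examples}
        )
    return results


def parse_entry(html):
    """Build a Beautifulsoup object from HTML text and extract the entry."""
    soup = BeautifulSoup(html, "html.parser")
    return {"origin": get_origin(soup), "definitions": get_definitions(soup)}


def search(word):
    """Fetch and parse `word`; return None if nothing could be extracted."""
    try:
        entry = parse_entry(fetch_page(word))
    except requests.RequestException:
        return None  # network problem or an error status from the site
    if entry["origin"] is None and not entry["definitions"]:
        return None
    return entry
```

To try it, set `word_to_search = "hello"` at the bottom of the file, add `print(search(word_to_search) or "Word not found")`, and run the file in a terminal, for example `python dictionary.py` (a hypothetical file name). Keeping the parsing in parse_entry, separate from the download, means the extraction logic can be checked against a saved HTML string without hitting the website.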

Web scraping can do a lot more than what we’re doing here. So, if you’re interested, explore more and build a great product.

Happy coding.

Thank you for reading. Hope you like it. (●’◡’●)
