Making and designing a translator with Jsoup


What Will I Learn?

Greetings, this tutorial covers the use of the Jsoup library in Java. With this library you can pull information from websites and present it in a user-friendly form. Jsoup is useful when you need to compare data across multiple pages, fetch rapidly changing information, analyze the values on a web page, and much more. To use the library, first download its .jar file and add it to your project's classpath (build path) so that your Java class can find it. You can then import the library and call it from your code.

Requirements

  • An IDE to test the code (preferably Eclipse IDE for Java Developers).
  • Basic knowledge of Java.
  • Basic knowledge of the Jsoup library.

Difficulty

This tutorial is intended for individuals who have prior knowledge of Java classes, libraries and programming languages.

  • Intermediate

Tutorial Contents

In this tutorial we will pull our data from the Tureng translation site and process it according to our needs. There are many ways to read a web page in Java, and when the site offers an API, using it is the fastest and most accurate option. This tutorial instead parses the page's HTML with Jsoup. First we go to the page we want to get data from. Then we find the element, identified by its class, that we want to pull. After processing the data we will get the output below:


[Image: sample output of the translator]

We shall begin by importing the libraries that we want to use in this project.

The first import we need is java.io.IOException, the exception type that signals a failed input/output (I/O) operation. We need it because fetching a page with Jsoup can fail, for example when the network is down, and our main method declares that it may throw this exception:

import java.io.IOException;
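
Declaring throws IOException on main means the program stops with a stack trace when the page cannot be fetched. As a hedged alternative, the sketch below catches the exception and returns a short message instead. The PageSource interface and the describe helper are hypothetical names used only for this illustration; in the tutorial's program the call that can fail would be the Jsoup fetch.

```java
import java.io.IOException;

class IoErrorDemo {

    // Hypothetical stand-in for any call that may fail with an I/O error,
    // such as fetching a web page.
    interface PageSource {
        String load() throws IOException;
    }

    // Returns the loaded text, or a readable message if loading failed.
    static String describe(PageSource source) {
        try {
            return source.load();
        } catch (IOException e) {
            return "Could not load the page: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // One source that works and one that simulates a network failure.
        System.out.println(describe(() -> "<html>ok</html>"));
        System.out.println(describe(() -> {
            throw new IOException("connection refused");
        }));
    }
}
```

Whether to catch the exception or let it propagate is a design choice; for a small console tool, a friendly message is usually easier on the user.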

We can then proceed to add the Jsoup classes, which fetch and parse the HTML of the desired site and let us select elements from it:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Now we add one last import, which lets us read the values the user enters:

import java.util.Scanner;

Then we can declare our class:

public class curric

The class name is up to you (curric is the name used in the full code later in this post); it only has to match the name of the .java file in your workspace. Next we define the main method. Here public means the method is visible to all code, static means it belongs to the class rather than to an object, and void means it returns no value.

public static void main(String[] args) throws IOException

Now we should proceed to get the desired input from the user. For this tutorial we will limit the extent of our dictionary to Turkish and English words, but in the following tutorial we will extend it by adding multiple languages. You can perform the same task for other languages.

Before starting, one more minor issue is displaying all the letters. Besides the Latin ones there are six unique Turkish letters (ç, ğ, ı, ö, ş, ü). To display them properly we have to change the text encoding to UTF-8. In Eclipse, right-click your project, choose Properties, and change the text file encoding. You will then be able to see the letters properly.
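
If changing the Eclipse setting is not an option, the output encoding can also be chosen in code. The sketch below wraps an output stream in a PrintStream that always writes UTF-8. The class and method names are illustrative assumptions, and whether the letters then display correctly still depends on the console itself being set to UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8OutputDemo {

    // Wraps any output stream in a PrintStream that encodes text as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java runtime must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The six Turkish letters, written as Unicode escapes so that
        // this source file stays plain ASCII.
        String letters = "\u00e7 \u011f \u0131 \u00f6 \u015f \u00fc";
        PrintStream out = utf8(System.out);
        out.println(letters);
    }
}
```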


Now we can proceed to get the input itself. In order to look up the word that the user enters, we must first determine whether it is Turkish or English, and only then ask for the word. The code below asks for the language first:

    Scanner keyboard2 = new Scanner(System.in);
    System.out.println("EN / TR ? ");
    String kang2 = keyboard2.next();

Depending on the input, one of three branches runs: one for EN, one for TR, and one for everything else.

if (kang2.equals("EN"))
else if(kang2.equals("TR"))
else
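
The comparison above only matches the exact uppercase strings EN and TR. As a sketch of a slightly more forgiving check, the hypothetical helper below trims the input and ignores case before choosing the CSS selector that this tutorial uses for each language, and returns null for anything else. The class and method names are illustrative assumptions.

```java
import java.util.Locale;

class LanguageChoice {

    // Maps the user's answer to the selector for the translated cells,
    // or returns null when the language is not supported.
    static String selectorFor(String answer) {
        // Locale.ROOT keeps the upper-casing independent of the system
        // locale (a Turkish locale upper-cases "i" differently).
        String lang = answer.trim().toUpperCase(Locale.ROOT);
        if (lang.equals("EN")) {
            return "td.tr.ts";   // Turkish translations of an English word
        } else if (lang.equals("TR")) {
            return "td.en.tm";   // English translations of a Turkish word
        } else {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(selectorFor(" en "));
        System.out.println(selectorFor("TR"));
        System.out.println(selectorFor("DE"));
    }
}
```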

In your own design you may try this for any language and get the data from any site. Here the translations are obtained from Tureng, since it is well known and has an extensive dictionary. To get the data, we first define the address of the page that holds it:

String link = "http://tureng.com/tr/turkce-ingilizce/" + string;
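
Appending the word directly works for plain ASCII input, but for a word that contains Turkish letters or spaces it is safer to percent-encode it before it goes into a URL. The sketch below shows one way to do this with the standard URLEncoder. The buildLink helper is a hypothetical name, and the base address is the one used in this tutorial.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class LinkBuilder {

    static final String BASE = "http://tureng.com/tr/turkce-ingilizce/";

    // Percent-encodes the word as UTF-8 and appends it to the base address.
    static String buildLink(String word) {
        try {
            // URLEncoder targets form data and writes spaces as "+",
            // so convert them to "%20" for use inside a URL path.
            String encoded = URLEncoder.encode(word, "UTF-8").replace("+", "%20");
            return BASE + encoded;
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java runtime must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildLink("ev"));
        System.out.println(buildLink("\u00e7ay"));   // the Turkish word for tea
        System.out.println(buildLink("ice cream"));
    }
}
```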

With this line the program points directly at the page for the word that the user entered. Next we fetch the data. We use the connect method rather than parse because the HTML first has to be downloaded from the site; connect(link).get() fetches the page and returns it as a parsed Document. The code below fetches the page, selects the table cells that match td.tr.ts, and prints the text of each cell on its own numbered line.

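    // Fetch the page and parse it into a Document.
    Document doc = Jsoup.connect(link).get();
    // Select the table cells that hold the Turkish translations.
    Elements initialtable = doc.select("td.tr.ts");
    // Drop the first match, which is not listed as a translation here;
    // the check avoids an exception when the page has no matching cells.
    if (!initialtable.isEmpty()) {
        initialtable.remove(0);
    }
    System.out.println("");
    int i = 0;
    for (Element d : initialtable) {
        i++;
        System.out.println(i + ". " + d.text());
    }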

For an English word, this code goes to Tureng, looks the word up, collects its Turkish equivalents and prints them as a list numbered from 1. The same procedure is repeated for a Turkish word, this time using the td.en.tm selector to get the English equivalents:

else if (kang2.equals("TR")) {
    Scanner keyboard = new Scanner(System.in);
    // Prompt (Turkish): "Which Turkish word do you want to translate?"
    System.out.println("Hangi türkçe kelimeyi çevirmek istiyorsunuz?");
    String string = keyboard.next();
    String link = "http://tureng.com/tr/turkce-ingilizce/" + string;
    Document doc = Jsoup.connect(link).get();
    // Select the table cells that hold the English translations.
    Elements initialtable = doc.select("td.en.tm");
    // Drop the first match; the check avoids an exception on an empty result.
    if (!initialtable.isEmpty()) {
        initialtable.remove(0);
    }
    System.out.println("");
    int i = 0;
    for (Element d : initialtable) {
        i++;
        System.out.println(i + ". " + d.text());
    }
}


Lastly, a final branch handles the case where the user enters neither EN nor TR. It displays a "Not supported language" message.

   else 
      {
       System.out.println("Not supported language");
      }

Finally our code is ready to test. Below are the full code and sample outputs. Note that in your own design you can support any language or languages; moreover, you can get data from almost any website with Jsoup and arrange it into user-friendly output. In our next tutorial we will add multiple languages, improve the output format and slowly begin to implement the project as an applet.

The full code:

import java.io.IOException;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class curric {

    public static void main(String[] args) throws IOException {
        // Ask which language the word is in.
        Scanner keyboard2 = new Scanner(System.in);
        System.out.println("EN / TR ? ");
        String kang2 = keyboard2.next();

        if (kang2.equals("EN")) {
            Scanner keyboard = new Scanner(System.in);
            System.out.println("Which word do you want to translate?");
            String string = keyboard.next();
            String link = "http://tureng.com/tr/turkce-ingilizce/" + string;
            // Fetch the page and select the cells with the Turkish translations.
            Document doc = Jsoup.connect(link).get();
            Elements initialtable = doc.select("td.tr.ts");
            // Drop the first match; the check avoids an exception on an empty result.
            if (!initialtable.isEmpty()) {
                initialtable.remove(0);
            }
            System.out.println("");
            int i = 0;
            for (Element d : initialtable) {
                i++;
                System.out.println(i + ". " + d.text());
            }
        } else if (kang2.equals("TR")) {
            Scanner keyboard = new Scanner(System.in);
            // Prompt (Turkish): "Which Turkish word do you want to translate?"
            System.out.println("Hangi türkçe kelimeyi çevirmek istiyorsunuz?");
            String string = keyboard.next();
            String link = "http://tureng.com/tr/turkce-ingilizce/" + string;
            // Fetch the page and select the cells with the English translations.
            Document doc = Jsoup.connect(link).get();
            Elements initialtable = doc.select("td.en.tm");
            // Drop the first match; the check avoids an exception on an empty result.
            if (!initialtable.isEmpty()) {
                initialtable.remove(0);
            }
            System.out.println("");
            int i = 0;
            for (Element d : initialtable) {
                i++;
                System.out.println(i + ". " + d.text());
            }
        } else {
            System.out.println("Not supported language");
        }
    }
}

[Image: output for the sample English word 'naive']

[Image: output for the sample Turkish word 'ev']

[Image: output when an unsupported language is requested]


Posted on Utopian.io - Rewarding Open Source Contributors
