documentation

Previous Home Next

Singlish to Sinhala Text Conversion

The Singlish to Sinhala text conversion component of සිංLingua offers multiple approaches to translate Singlish text to Sinhala. These approaches are designed to suit different requirements and accuracy levels.

Usual tools “oyata den kohomada” -> ඔයට ඩෙන් කොහොමඩ

සිංLingua library “oyata den kohomada” -> ඔයාට දැන් කොහොමද (oyaata dhaen kohomadha)

Table of Contents

1. Rule-Based Translation

The Rule-Based Translation approach involves converting Singlish text to Sinhala using a predefined set of rules. This method is efficient and provides quick translations based on specific patterns. To use this approach, the library provides a class RuleBasedTransliterator, which can be used as follows:

from sinlingua.singlish.rulebased_transliterator import RuleBasedTransliterator

transliterator = RuleBasedTransliterator()

singlish_text = "YOUR_SINGLISH_TEXT"
sinhala_text = transliterator.transliterator(singlish_text)
print(sinhala_text)

2. Machine Translation using FastText Model

The Machine Translation approach using the FastText model involves leveraging word embeddings to enhance translation accuracy. This method uses FastText’s pre-trained Sinhala word vectors to identify similar words for each word in the translated text. The process is as follows:

  1. Convert Singlish text to Sinhala using the Rule-Based Translation approach.
  2. Utilize the FastText Sinhala word vectors to find similar words for each translated word.
  3. Enhance the translated text by replacing each word with its corresponding similar word from the FastText model.

To use this approach, the library provides a class MachineTransliterator. Here’s how to use the MachineTransliterator with the FastText model:

from sinlingua.singlish.machine_transliterator import MachineTransliterator

# initialize the Machine Transliterator with the FastText model
fasttext_model_path = "path/to/model/cc.si.300.vec"
transliterator = MachineTransliterator(model=fasttext_model_path)

# provide Singlish text as input
singlish_text = "YOUR_SINGLISH_TEXT"

# translate using the Machine Transliterator with FastText
translated_text = transliterator.transliterator(text=singlish_text)
print(translated_text)

FastText model can be downloaded by using the given url: FastText Model

3. Hybrid Translation

The Hybrid Translation approach combines the rule-based and machine translation methods to provide accurate and context-aware translations. This approach offers a balance between efficiency and accuracy. The HybridTransliterator class is also used for this approach:

from sinlingua.singlish.hybrid_transliterator import HybridTransliterator

transliterator = HybridTransliterator(api_key="YOUR_OPENAI_API_KEY", org_key="YOUR_ORGANIZATION_KEY")

singlish_text = "YOUR_SINGLISH_TEXT"
sinhala_text = transliterator.transliterator(text=singlish_text)
print(sinhala_text)

3.1. Machine Mask Translation

This function enhances machine-translated text by masking misspelled Sinhala words. It identifies misspelled Sinhala words and replaces them with a mask "<mask>" to improve text flow.

from sinlingua.singlish.hybrid_transliterator import HybridTransliterator

hybrid = HybridTransliterator(api_key="YOUR_OPENAI_API_KEY", org_key="YOUR_ORGANIZATION_KEY")

rule_text = "YOUR_SINGLISH_TEXT"
masked_text = hybrid.machine_mask(text=rule_text)
print(masked_text)

3.2. Machine Suggest Translation

This function further refines machine-translated text by suggesting alternative words for masked words. It can be used in two different approaches through the library.

3.2.1. Approach 1

from sinlingua.singlish.rulebased_transliterator import RuleBasedTransliterator
from sinlingua.singlish.hybrid_transliterator import HybridTransliterator
from sinlingua.singlish.manual_transliterator import ManualTransliterator

rulebased = RuleBasedTransliterator()
hybrid = HybridTransliterator(api_key="YOUR_OPENAI_API_KEY", org_key="YOUR_ORGANIZATION_KEY")
manual = ManualTransliterator()

# using manual_mask with generated coordinates
rule_text = "YOUR_SINGLISH_TEXT"
rule_based_output = rulebased.transliterator(text=rule_text)
coordinates_df = manual.generate_coordinates(text=rule_based_output)
# coordinates of words to be masked can be found by printing coordinates_df.
# ex: [(row, column), (row, column)]
coordinates_list = [(0, 0), (0, 2)]
masked_text, changes = manual.manual_mask(dataframe=coordinates_df, coordinates=coordinates_list)
suggested_text = hybrid.machine_suggest(text=masked_text, changes=changes)
print(suggested_text)

3.2.2. Approach 2

from sinlingua.singlish.rulebased_transliterator import RuleBasedTransliterator
from sinlingua.singlish.hybrid_transliterator import HybridTransliterator
from sinlingua.singlish.manual_transliterator import ManualTransliterator

rulebased = RuleBasedTransliterator()
hybrid = HybridTransliterator(api_key="YOUR_OPENAI_API_KEY", org_key="YOUR_ORGANIZATION_KEY")
manual = ManualTransliterator()

# using machine_mask with generated coordinates
rule_text = "YOUR_SINGLISH_TEXT"
masked_text = hybrid.machine_mask(text=rule_text)
coordinates_df = manual.generate_coordinates(text=masked_text)
# coordinates of masked words can be found by printing coordinates_df.
# ex: [(row, column), (row, column)]
coordinates_list = [(0, 0), (0, 2)]
masked_text, changes = manual.manual_mask(dataframe=coordinates_df, coordinates=coordinates_list)
suggested_text = hybrid.machine_suggest(text=masked_text, changes=changes)
print(suggested_text)

Optional Parameters

  1. prompt_masking: str For change masking prompt from the default prompt
  2. prompt_suggestion: str For change word suggesting prompt from the default prompt

To view the default prompts:

from sinlingua.singlish.hybrid_transliterator import HybridTransliterator

hybrid = HybridTransliterator()

# level van be 0 or 1 for masking prompt and word suggesting prompt respectively
level = 0
hybrid.view_prompt(level=level)

Sure, here is the section for the Manual Translation component, including the steps you provided:

4. Manual Translation

The Manual Translation approach allows you to manually manipulate and modify translations according to your preferences. This can be useful for refining translations and making context-specific adjustments. The ManualTransliterator class provides various methods to aid in this process.

4.1. Generate Coordinates

The first step in manual translation is generating coordinates for each word in the Sinhala text. Coordinates uniquely identify each word and its position in the text. This can be done using the generate_coordinates method.

from sinlingua.singlish.manual_transliterator import ManualTransliterator

manual = ManualTransliterator()
sinhala_text = "YOUR_SINHALA_TEXT"
df = manual.generate_coordinates(text=sinhala_text, max_columns=20)
print(df)

To visualize the coordinates, you can export them to a CSV file using the to_csv method.

manual.to_csv(dataframe=df, file="dataframe.csv")

Optional parameters

  1. max_columns: int You can specify the maximum number of columns in the coordinate plane.

4.2. Replace Cells

This method allows you to manually replace specific cells in the coordinate plane with desired words. You need to generate coordinates using the generate_coordinates method. Then, create a replacement dictionary where keys are the coordinates to be replaced and values are the replacement words. The replace_cells method returns a new dataframe with the changes applied.

from sinlingua.singlish.manual_transliterator import ManualTransliterator

manual = ManualTransliterator()
sinhala_text = "MISSPELLED_SINHALA_TEXT"
df = manual.generate_coordinates(text=sinhala_text)
print(df)

replacement_dictionary = {
    (0, 1): "WORD_TO_BE_REPLACED", # (0, 1) is a coordinate which represents row 0, column 1
    (0, 7): "WORD_TO_BE_REPLACED" # (0, 7) is a coordinate which represents row 0, column 7
}

changed_df, track_changes = manual.replace_cells(dataframe=df, replacement_dict=replacement_dictionary)
print(changed_df)

You can also use the undo_changes method to revert the changes if needed.

original_df = manual.undo_changes(dataframe=changed_df, changes=track_changes)
print(original_df)

4.3. Manual Masking

Manual masking involves masking specific words in the coordinate plane to be replaced later. Similar to the previous steps, you generate coordinates using the generate_coordinates method. Specify the coordinates you want to mask, and then use the manual_mask method. The resulting reconstructed_text can be passed to the machine_suggest method from the Hybrid Translation approach to find the best-matching words for the masked positions.

from sinlingua.singlish.manual_transliterator import ManualTransliterator

manual = ManualTransliterator()
sinhala_text = "MISSPELLED_SINHALA_TEXT"
df = manual.generate_coordinates(text=sinhala_text)
print(df)

coordinates = [(0, 1), (0, 7)]  # Example coordinates to mask
reconstructed_text, changes = manual.manual_mask(dataframe=df, coordinates=coordinates)
print(reconstructed_text)

# You can use machine_suggest to find matches for masked words

4.4. Reconstruct Text

Finally, the reconstruct_text method allows you to convert the modified dataframe back into a text. This step completes the manual translation process.

from sinlingua.singlish.manual_transliterator import ManualTransliterator

manual = ManualTransliterator()
sinhala_text = "YOUR_SINHALA_TEXT"
df = manual.generate_coordinates(text=sinhala_text)
print(df)

constructed_text = manual.reconstruct_text(dataframe=df)
print(constructed_text)

These manual translation methods provide flexibility and control over the translation process, enabling you to refine translations based on specific requirements.


For usage examples and detailed documentation, visit the සිංLingua GitHub repository.

Please remember to replace "YOUR_SINHALA_TEXT" and "MISSPELLED_SINHALA_TEXT" with the actual Sinhala text you are working with. Also, replace "YOUR_USERNAME/REPO" with your actual GitHub repository information.

Getting Started

To use the Singlish to Sinhala text conversion component of සිංLingua, follow these steps:

  1. Install the සිංLingua library:
    pip install sinlingua
    
  2. Import the required classes for the chosen translation approach:
    from sinlingua.singlish.rulebased_transliterator import RuleBasedTransliterator
    from sinlingua.singlish.machine_transliterator import MachineTransliterator
    from sinlingua.singlish.hybrid_transliterator import HybridTransliterator
    from sinlingua.singlish.manual_transliterator import ManualTransliterator
    
  3. Initialize the transliterator class based on your chosen approach:
    rulebased = RuleBasedTransliterator()  # For Rule-Based Translation 
    machine = MachineTransliterator()  # For Machine Translation   
    hybrid = HybridTransliterator(api_key="YOUR_OPENAI_API_KEY", org_key="YOUR_ORGANIZATION_KEY")  # For Hybrid Translation
    manual = ManualTransliterator()  # For Manual Translation
    
  4. Use the functions of the above mentioned classes accordingly

Note

Previous Home Next