Language Identification

Identifies the language of a piece of text.

The Language Identification API analyzes a piece of text that you provide and returns the language of the text.

You can use Language Identification to determine the correct language settings to use for other IDOL OnDemand APIs, such as Sentiment Analysis or Entity Extraction.

Quick Start

You must provide input text. The following example adds the text as plain text:

GET /1/api/[async|sync]/identifylanguage/v1?text=the+quick+brown+fox+jumps+over+the+lazy+dog

The API returns the language and the encoding of the input text, along with the Unicode character ranges that the text includes.

{
  "language": "english",
  "language_iso639_2b": "ENG",
  "encoding": "UTF8",
  "unicode_scripts": [
    "Basic Latin"
  ]
}
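The request and response above can be sketched in Python as follows. This is a minimal illustration using only the standard library: the helper name build_query is hypothetical, the synchronous form of the path is assumed, and the response is the sample shown above rather than live output from the service.

```python
import json
from urllib.parse import urlencode

# Path taken from the example above, using the synchronous form.
PATH = "/1/api/sync/identifylanguage/v1"


def build_query(text: str) -> str:
    """Return the request path with the text URL-encoded (hypothetical helper)."""
    # urlencode encodes spaces as '+', matching the example request above.
    return PATH + "?" + urlencode({"text": text})


# Sample response copied from this page, not fetched from the service.
SAMPLE_RESPONSE = """
{
  "language": "english",
  "language_iso639_2b": "ENG",
  "encoding": "UTF8",
  "unicode_scripts": ["Basic Latin"]
}
"""

request_path = build_query("the quick brown fox jumps over the lazy dog")
result = json.loads(SAMPLE_RESPONSE)

print(request_path)
print(result["language"], result["language_iso639_2b"], result["encoding"])
```

A real request would also need the apikey parameter described under Authentication.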

You can also provide the text in a file. In File mode, IDOL OnDemand uses the Text Extraction API to extract the text from the file and then uses the extracted text in the API.

You must provide a minimum of three words for language identification. However, you can improve the accuracy by providing more text. The amount of text that you must provide for accurate language identification depends on both the language and the type of text. For UTF-8 encoded languages that use a unique script, the Language Identification API might be able to identify the language using only a few characters. For other languages, the API might need a few sentences to accurately identify the language, and it might need a large paragraph to distinguish between two similar languages.

The amount of text required also depends on the type of text. For example, it is difficult to identify the language from a list of places, numbers, and names. If your text contains these things, you might need to provide more text to identify the language. For natural language text, such as a news article, the API can usually detect the language from fewer characters.
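A client can check the three-word minimum before sending a request. The sketch below assumes that a word is a whitespace-separated token. This page does not say how the API counts words, so the check might wrongly reject text in scripts that do not separate words with spaces, such as Chinese or Japanese. The function name is hypothetical.

```python
def has_minimum_words(text: str, minimum: int = 3) -> bool:
    """Return True if text has at least `minimum` whitespace-separated tokens.

    Assumption: a "word" is any run of non-whitespace characters.
    """
    return len(text.split()) >= minimum


print(has_minimum_words("the quick brown fox"))
print(has_minimum_words("hello world"))
```

Passing this check does not guarantee an accurate result. As described above, lists of names or numbers might need considerably more text.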

The full list of supported languages appears in the response model (enum<Language>) below.

Synchronous
https://api.idolondemand.com/1/api/sync/identifylanguage/v1
Asynchronous
https://api.idolondemand.com/1/api/async/identifylanguage/v1
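The two endpoints differ only in the sync or async path segment, so a client can select between them with a small helper. The following is a sketch; the function name endpoint and the mode argument are hypothetical.

```python
BASE_URL = "https://api.idolondemand.com/1/api"


def endpoint(mode: str = "sync") -> str:
    """Return the Language Identification URL for the given mode."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    return f"{BASE_URL}/{mode}/identifylanguage/v1"


print(endpoint())
print(endpoint("async"))
```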
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter   Description
apikey      The API key to use to authenticate the API request.
Input Source

This API accepts a single input source that can be supplied using one of the following parameters:

Parameter   Description
text        The text to process. You must provide a minimum of three words.
file        A file containing the document to process. Multipart POST only.
reference   An IDOL OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
url         A publicly accessible HTTP URL from which the document can be retrieved.
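Because the API accepts a single input source, a client might validate its arguments before building a request. The sketch below checks that exactly one source is supplied and returns the query parameters. The function name build_params is hypothetical, and the file source is rejected here only because this helper builds query parameters, whereas a file must be sent as a multipart POST.

```python
from typing import Dict, Optional

INPUT_SOURCES = ("text", "file", "reference", "url")


def build_params(apikey: str, **sources: Optional[str]) -> Dict[str, str]:
    """Return query parameters for exactly one input source (hypothetical helper)."""
    unknown = set(sources) - set(INPUT_SOURCES)
    if unknown:
        raise ValueError(f"unknown input source(s): {sorted(unknown)}")
    # Ignore sources that were passed explicitly as None.
    supplied = {name: value for name, value in sources.items() if value is not None}
    if len(supplied) != 1:
        raise ValueError("supply exactly one of: " + ", ".join(INPUT_SOURCES))
    if "file" in supplied:
        raise ValueError("a file must be sent as a multipart POST, not as a query parameter")
    return {"apikey": apikey, **supplied}


params = build_params("YOUR_API_KEY", url="http://example.com/document.txt")
print(params)
```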
Parameters

In addition to the above input source, this API accepts the following parameters:

Name                 Type     Description
additional_metadata  boolean  Set to true to return additional metadata about the identified language. Default: false.
All parameters for this API are optional.
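A complete synchronous call might look like the following sketch. Several details are assumptions rather than documented behavior: that the boolean is sent as the lowercase string "true", that the response body is UTF-8 encoded JSON, and the function name identify_language. The opener argument is a hypothetical hook that lets the demonstration run against a stub, so nothing is sent over the network.

```python
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SYNC_URL = "https://api.idolondemand.com/1/api/sync/identifylanguage/v1"


def identify_language(text, apikey, additional_metadata=False, opener=urlopen):
    """Call the synchronous endpoint and return the decoded JSON response."""
    params = {"apikey": apikey, "text": text}
    if additional_metadata:
        # Assumption: the boolean is passed as the lowercase string "true".
        params["additional_metadata"] = "true"
    url = SYNC_URL + "?" + urlencode(params)
    with opener(url) as response:
        # Assumption: the body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration: a stub stands in for urlopen and records the URL.
requested = []


def fake_opener(url):
    requested.append(url)
    body = b'{"language": "english", "language_iso639_2b": "ENG", "encoding": "UTF8"}'
    return io.BytesIO(body)


result = identify_language(
    "the quick brown fox", "YOUR_API_KEY", additional_metadata=True, opener=fake_opener
)
print(requested[0])
print(result["language"])
```

Omitting the opener argument makes the function use urllib.request.urlopen and issue a real HTTP request.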

This API returns a JSON response that is described by the model below. This single model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Language Identification Response {
language (enum<Language>) The identified language of the input text.
language_iso639_2b (enum<Language_iso639_2b>) The ISO 639-2/B code for the identified language of the input text, or "UND" if the language could not be identified.
encoding (enum<Encoding>) The identified encoding of the input text.
unicode_scripts (array[string], optional) The Unicode character ranges that your input text includes.
}
enum<Language> {
'afrikaans', 'albanian', 'amharic', 'arabic', 'armenian', 'azeri', 'basque', 'belarussian', 'bengali', 'berber', 'breton', 'bulgarian', 'burmese', 'catalan', 'cherokee', 'chinese', 'croatian', 'czech', 'danish', 'dutch', 'english', 'esperanto', 'estonian', 'faroese', 'finnish', 'french', 'gaelic', 'georgian', 'german', 'greek', 'greenlandic', 'gujarati', 'hebrew', 'hindi', 'hungarian', 'icelandic', 'indonesian', 'italian', 'japanese', 'kannada', 'kazakh', 'khmer', 'korean', 'kurdish', 'lao', 'latin', 'latvian', 'lithuanian', 'luxembourgish', 'macedonian', 'malayalam', 'maltese', 'maori', 'mongolian', 'nepali', 'norwegian', 'oriya', 'persian', 'polish', 'portuguese', 'pushto', 'romanian', 'russian', 'serbian', 'sindhi', 'singhalese', 'slovak', 'slovenian', 'somali', 'spanish', 'swahili', 'swedish', 'syriac', 'tagalog', 'tajik', 'tamil', 'telugu', 'thai', 'tibetan', 'turkish', 'ukrainian', 'urdu', 'uyghur', 'uzbek', 'vietnamese', 'welsh', 'unknown'
}
enum<Language_iso639_2b> {
'AFR', 'ALB', 'AMH', 'ARA', 'ARM', 'AZE', 'BAQ', 'BEL', 'BEN', 'BER', 'BRE', 'BUL', 'BUR', 'CAT', 'CHR', 'CHI', 'HRV', 'CZE', 'DAN', 'DUT', 'ENG', 'EPO', 'EST', 'FAO', 'FIN', 'FRE', 'GLE', 'GEO', 'GER', 'GRE', 'KAL', 'GUJ', 'HEB', 'HIN', 'HUN', 'ICE', 'IND', 'ITA', 'JPN', 'KAN', 'KAZ', 'KHM', 'KOR', 'KUR', 'LAO', 'LAT', 'LAV', 'LIT', 'LTZ', 'MAC', 'MAL', 'MLT', 'MAO', 'MON', 'NEP', 'NPI', 'NOR', 'ORI', 'PER', 'POL', 'POR', 'PUS', 'RUM', 'RUS', 'SRP', 'SND', 'SIN', 'SLO', 'SLV', 'SOM', 'SPA', 'SWA', 'SWE', 'SYR', 'TGL', 'TGK', 'TAM', 'TEL', 'THA', 'TIB', 'TUR', 'UKR', 'URD', 'UIG', 'UZB', 'VIE', 'WEL', 'UND'
}
enum<Encoding> {
'ARABIC', 'ARABIC_ISO', 'ASCII', 'CHINESESIMPLIFIED', 'CHINESETRADITIONAL', 'CYRILLIC', 'CYRILLIC_ISO', 'CYRILLIC_KOI8', 'EASTERNEUROPEAN', 'EASTERNEUROPEAN_ISO', 'EUC', 'GREEK', 'GREEK_ISO', 'HEBREW', 'HEBREW_ISO', 'JIS', 'KOREAN', 'NORTHERNEUROPEAN', 'NORTHERNEUROPEAN_ISO', 'SHIFTJIS', 'THAI', 'TURKISH', 'UTF8', 'VIETNAMESE'
}
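The abstract definition above maps naturally onto a small typed structure. The following sketch treats unicode_scripts as optional, as the model states, and uses the "UND" code to detect text that could not be identified. The class and function names are hypothetical, and any extra properties returned when additional_metadata is set are simply ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LanguageResult:
    language: str
    language_iso639_2b: str
    encoding: str
    unicode_scripts: List[str] = field(default_factory=list)

    @property
    def identified(self) -> bool:
        """False when the API reports "UND" (language not identified)."""
        return self.language_iso639_2b != "UND"


def parse_result(payload: dict) -> LanguageResult:
    """Build a LanguageResult from a decoded JSON response."""
    return LanguageResult(
        language=payload["language"],
        language_iso639_2b=payload["language_iso639_2b"],
        encoding=payload["encoding"],
        # unicode_scripts is optional in the model, so default to an empty list.
        unicode_scripts=list(payload.get("unicode_scripts", [])),
    )


ok = parse_result({
    "language": "english",
    "language_iso639_2b": "ENG",
    "encoding": "UTF8",
    "unicode_scripts": ["Basic Latin"],
})
und = parse_result({"language": "unknown", "language_iso639_2b": "UND", "encoding": "ASCII"})
print(ok.identified, und.identified)
```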
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "type": "object",
    "properties": {
        "language": {
            "enum": [
                "afrikaans",
                "albanian",
                "amharic",
                "arabic",
                "armenian",
                "azeri",
                "basque",
                "belarussian",
                "bengali",
                "berber",
                "breton",
                "bulgarian",
                "burmese",
                "catalan",
                "cherokee",
                "chinese",
                "croatian",
                "czech",
                "danish",
                "dutch",
                "english",
                "esperanto",
                "estonian",
                "faroese",
                "finnish",
                "french",
                "gaelic",
                "georgian",
                "german",
                "greek",
                "greenlandic",
                "gujarati",
                "hebrew",
                "hindi",
                "hungarian",
                "icelandic",
                "indonesian",
                "italian",
                "japanese",
                "kannada",
                "kazakh",
                "khmer",
                "korean",
                "kurdish",
                "lao",
                "latin",
                "latvian",
                "lithuanian",
                "luxembourgish",
                "macedonian",
                "malayalam",
                "maltese",
                "maori",
                "mongolian",
                "nepali",
                "norwegian",
                "oriya",
                "persian",
                "polish",
                "portuguese",
                "pushto",
                "romanian",
                "russian",
                "serbian",
                "sindhi",
                "singhalese",
                "slovak",
                "slovenian",
                "somali",
                "spanish",
                "swahili",
                "swedish",
                "syriac",
                "tagalog",
                "tajik",
                "tamil",
                "telugu",
                "thai",
                "tibetan",
                "turkish",
                "ukrainian",
                "urdu",
                "uyghur",
                "uzbek",
                "vietnamese",
                "welsh",
                "unknown"
            ]
        },
        "language_iso639_2b": {
            "enum": [
                "AFR",
                "ALB",
                "AMH",
                "ARA",
                "ARM",
                "AZE",
                "BAQ",
                "BEL",
                "BEN",
                "BER",
                "BRE",
                "BUL",
                "BUR",
                "CAT",
                "CHR",
                "CHI",
                "HRV",
                "CZE",
                "DAN",
                "DUT",
                "ENG",
                "EPO",
                "EST",
                "FAO",
                "FIN",
                "FRE",
                "GLE",
                "GEO",
                "GER",
                "GRE",
                "KAL",
                "GUJ",
                "HEB",
                "HIN",
                "HUN",
                "ICE",
                "IND",
                "ITA",
                "JPN",
                "KAN",
                "KAZ",
                "KHM",
                "KOR",
                "KUR",
                "LAO",
                "LAT",
                "LAV",
                "LIT",
                "LTZ",
                "MAC",
                "MAL",
                "MLT",
                "MAO",
                "MON",
                "NEP",
                "NPI",
                "NOR",
                "ORI",
                "PER",
                "POL",
                "POR",
                "PUS",
                "RUM",
                "RUS",
                "SRP",
                "SND",
                "SIN",
                "SLO",
                "SLV",
                "SOM",
                "SPA",
                "SWA",
                "SWE",
                "SYR",
                "TGL",
                "TGK",
                "TAM",
                "TEL",
                "THA",
                "TIB",
                "TUR",
                "UKR",
                "URD",
                "UIG",
                "UZB",
                "VIE",
                "WEL",
                "UND"
            ]
        },
        "encoding": {
            "enum": [
                "ARABIC",
                "ARABIC_ISO",
                "ASCII",
                "CHINESESIMPLIFIED",
                "CHINESETRADITIONAL",
                "CYRILLIC",
                "CYRILLIC_ISO",
                "CYRILLIC_KOI8",
                "EASTERNEUROPEAN",
                "EASTERNEUROPEAN_ISO",
                "EUC",
                "GREEK",
                "GREEK_ISO",
                "HEBREW",
                "HEBREW_ISO",
                "JIS",
                "KOREAN",
                "NORTHERNEUROPEAN",
                "NORTHERNEUROPEAN_ISO",
                "SHIFTJIS",
                "THAI",
                "TURKISH",
                "UTF8",
                "VIETNAMESE"
            ]
        },
        "unicode_scripts": {
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    },
    "required": [
        "language",
        "language_iso639_2b",
        "encoding"
    ]
}
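A response can be checked against this schema before use. The sketch below is a deliberately small hand-written checker, not a full JSON Schema validator: it handles only the required, enum, and array-of-string rules that this schema uses. The name check_against_schema is hypothetical, and SCHEMA_EXCERPT abbreviates the enum lists for brevity; a real check would load the complete schema shown above.

```python
def check_against_schema(instance: dict, schema: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue
        value = instance[name]
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
        if rules.get("type") == "array":
            if not isinstance(value, list):
                problems.append(f"{name}: expected an array")
            elif rules.get("items", {}).get("type") == "string" and not all(
                isinstance(item, str) for item in value
            ):
                problems.append(f"{name}: expected an array of strings")
    return problems


# Abbreviated copy of the schema above; the enum lists are shortened here.
SCHEMA_EXCERPT = {
    "type": "object",
    "properties": {
        "language": {"enum": ["english", "german", "unknown"]},
        "language_iso639_2b": {"enum": ["ENG", "GER", "UND"]},
        "encoding": {"enum": ["ASCII", "UTF8"]},
        "unicode_scripts": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["language", "language_iso639_2b", "encoding"],
}

good = {"language": "english", "language_iso639_2b": "ENG", "encoding": "UTF8"}
bad = {"language": "klingon", "language_iso639_2b": "ENG"}
print(check_against_schema(good, SCHEMA_EXCERPT))
print(check_against_schema(bad, SCHEMA_EXCERPT))
```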
Examples
The following sample texts show typical input in several languages.
Identify English
"New tests on human bones hidden in a Spanish cave for some 400,000 years set a new record for the oldest human DNA sequence ever decoded—and may scramble the scientific picture of our early relatives."
Identify German
"Neue Versuche an menschlichen Knochen in einer spanischen Höhle versteckt für einige 400.000 Jahre einen neuen Rekord für den ältesten menschlichen DNA-Sequenz immer decodiert und kann den wissenschaftlichen Bild unserer frühen Verwandten klettern."
Identify Chinese
"最新的大约40万年前的西班牙洞穴中发现的人类骨头测试,创造了已破译人类最古老DNA序列的新纪录-并且可能改变人类早期亲属的科学图谱。"
Identify Japanese
"約40年間スペインの洞窟の中に隠された人骨の新しいテストにて、これまでに解読された中で最も古い人間のDNA配列に関する新しい記録となりますーそしてこの発見がこれまでの人類早期の科学的な図を混乱させる可能性があります"

To try this API with your own data and use it in your own applications, you need an API key. You can create an API key from the API Keys section of your account page.
