Difference between revisions of "User:Maybeitworksnow/proposal"


Revision as of 13:52, 3 April 2017

Personal details

Name: Anastasia Buianova (Анастасия Буянова)

E-mail address: anastasia.d.buianova@gmail.com

Other information that may be useful to contact you: cell-number +7 926 47 27 444

Location: Moscow (UTC+03:00)


Language knowledge / working languages: Russian (native), English (C1), German (B2-C1).

Programming: Python.

Education: skills and experience

I’m currently in my fourth year studying Computational (Applied) and Fundamental Linguistics at the Higher School of Economics (HSE) in Moscow, Russia. [Full time, BA: September 2013 - June 2017]

Fundamental and Computational Linguistics at the Eberhard Karls University of Tuebingen, Germany. [Exchange: spring semester, March - July 2016]


Selected university courses taken and passed:

Computational linguistics: natural language processing, machine learning, Python 2, Python 3.

Mathematics: Logic, Discrete Mathematics (Combinatorics), Linear Algebra and Mathematical Analysis, Probability Theory and Mathematical Statistics.

Fundamental linguistics: morphology, phonetics, syntax, semantics, typology (Tuebingen and HSE).

Applied linguistics: phylogenetics (Tuebingen), sociolinguistics, psycholinguistics, neurolinguistics.

Linguistic interests

Lexical typology, morphology, syntax;

Slavic languages (morphology and syntax in typological perspective, also lexical typology);

old languages and scripts;

German and its dialects (e.g. Schwaebisch, Badisch, Duetsch; phonetics, morphology, syntax);

Russian (morphology, syntax; non-normative lexicon as a lexical-semantic phenomenon).

Research (past and current)

Term papers - past

Buianova, A. (2015) "Classification of Sardinian Dialects Based on the Swadesh List".

Buianova, A. (2016) “Discrepancies between the Scientific and Naive Taxonomy in the Names of Plants / Animals (Based on Slavic Languages)”.


Bachelor thesis - current

“Constructions with Body Parts in Typological Perspective” for Russian, Czech, German, and English, with the aim of building an electronic dictionary for linguistic and medical use.


Why is it that you are interested in machine translation?

Unfortunately, I have never worked on machine translation during my university years, which is exactly why I’m keen to get to know it and work with it. I would like to learn more closely how MT works, to develop the skills I already have, and to put into practice the knowledge I have gathered over many years of university classes and translation sessions. As a computational linguist I think I should have this kind of experience, which could open new horizons for my future career and research. This internship could give me the practical skills I’m looking for and make me more professional in my field of study and my future job.

Why is it that you are interested in Apertium?

Apertium has MT projects for small, dialectal, and local languages, which is quite rare (I would say that Apertium is the only one). From my experience with Ket and Sardinian, I believe that native speakers really need this and appreciate it. In some cases, Apertium helps to popularize ‘little’ languages among people who have preferred to switch to the formal literary language or to a ‘big dad’ (as in bel-rus), and perhaps thereby to spark interest in their culture. This is easy to see from sociolinguistic research and experiments, e.g. those where native speakers of local languages of Russia were shy to use their language because 'nobody could understand them' and 'that's why it was better to learn Russian'. It can also raise interest in these languages among people who might not know that they exist. Apertium also provides rule-based MT methods, which I would like to study more closely, as I wrote above.

As I've already noticed, the Apertium team is ready to help at any moment and to answer any question. It's a pleasure to work with such people.


Bel-ukr language pair

I would like to propose to Apertium a project to create a new language pair for Belarusian and Ukrainian.

Why Google and Apertium should sponsor it, and how and whom it will benefit in society.

I plan to build a free and open ‘language resource’, which could be the first good one in the field for Bel-Ukr (I would even say the best one: ‘It is easy to be the best in the field when you’re the only one’).

Although many people in Belarus and Ukraine are bilingual (speaking both Russian and Belarusian or Ukrainian) and Russian is more popular in some regions of these neighbouring countries, Belarusian and Ukrainian are widely spoken and have official status, and, as I’ve noticed, not everyone in these countries speaks Russian. Young people in particular prefer Belarusian or Ukrainian to Russian. However, all the existing dictionaries and MT platforms for these languages have well-developed applications only for Russian, or at best for English. What about people who speak neither Russian nor English and want to understand their neighbours? Belarusian and Ukrainian are the official languages of their nations, yet there are no good language resources that let people who speak neither Russian nor English communicate. (Can you imagine that for Spanish - Portuguese, for example?)

For instance, Apertium has projects for the rus-ukr and rus-bel language pairs. You can easily google a Russian - Belarusian dictionary, or a very good Russian - Ukrainian one (e.g. AbbyyLingvo, though it’s still being developed), but none for Bel-Ukr or Ukr-Bel. When I needed such a dictionary, I found one, but it was almost empty: I couldn’t look up even a basic word. That’s why I tend to say that it doesn’t exist. However, the idea of creating such a language resource is already there, which supports my argument that people need it and will use it.