Cristian Cardellino

class Coder extends Human with Geek

Spanish Billion Words Corpus and Embeddings

So, it's been a year and a half since my last post, even though I made the blog the root of my page. Shame on me.

This blog post, however, is not related to what I did in the previous ones. I promise someday I will continue with my Python to Scala tutorials, but for now you'll have to settle for this.

Since I am a PhD Student in Natural Language Processing and a native speaker of the Spanish language, I like to do my research in this language. The problem is that Spanish, unlike English, doesn’t have that many resources.

In the last year I have been working and researching in the fields of deep learning and word embeddings. The problem with word embeddings, especially those generated by neural network methods like word2vec, is that they require a great amount of unannotated data.

Most of the work I have seen on creating Spanish word embeddings uses Wikipedia, which is a big corpus, but not that big. So I decided to contribute to the world of word embeddings, first by releasing a corpus big enough to train some decent word embeddings, and then by releasing some embeddings I created myself.

This is why I am now releasing the Spanish Billion Words Corpus and Embeddings, a resource for the Spanish language that offers a big corpus (of nearly 1.5 billion words) and a set of word vectors (or embeddings) trained on this corpus.

Feel free to use it: it is released under a Creative Commons BY-SA license.

From Python to Scala (VII): Functions (II)

Hello again! Nice to see you decided to come back. If you checked my previous post, you know that functions are quite an important matter in the Scala language.

Last time, talking about recursion, I wasn't able to cover all the topics about functions, so I decided to dedicate yet another post to them. You can call it "advanced functions", but I don't think what I'm going to show here is that "advanced".

You are welcome to read some more on functions in this new blog post.


Default Values

Following the Python Tutorial, I’ll talk a little about this.

Default argument values in Scala are very similar to Python's. The difference lies in static typing: you'll have to explicitly declare the type of each argument:

def foo(x: Int, y: Int = 0, z: Int = 1): Int = (x + y) * z

foo(10) // Returns 10

foo(10, 10) // Returns 20

foo(10, 10, 2) // Returns 40

foo(10, z = 2) // Returns 20

foo(10, z = 2, y = 10) // Returns 40

foo(10, 10, y = 10) // Error! The parameter `y` has already been specified

As you can see, you can pass the arguments in whatever order you like as long as you name them; any argument you don't name explicitly is assigned by position.
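A small sketch of the same rule with strings, using a hypothetical `greet` function written only for this illustration (it is not from the original post). Positional arguments fill the parameters left to right, and named arguments can then appear in any order, even for the required parameter. It runs in the Scala REPL or as a script.

```scala
// Hypothetical example: one required parameter and two with default values
def greet(name: String, greeting: String = "Hello", punctuation: String = "!"): String =
  s"$greeting, $name$punctuation"

// Positional: arguments fill parameters left to right
println(greet("Ana"))                           // Hello, Ana!
println(greet("Ana", "Hi"))                     // Hi, Ana!

// Named: skip `greeting` and set only `punctuation`
println(greet("Ana", punctuation = "?"))        // Hello, Ana?

// Named arguments can be given in any order, even the required one
println(greet(punctuation = ".", name = "Ana")) // Hello, Ana.
```

The Python version, `def greet(name, greeting="Hello", punctuation="!")`, accepts exactly the same calls; only the type annotations are new.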

From Python to Scala (VI): Functions

Welcome to another post in my series of tutorials. As you can see (if you have been following my tutorials since I started them), I changed the environment of my blog to Octopress, which makes writing much easier (it has very nice features such as automatic categories and a blog archive).

This time we will be exploring one of the most powerful things Scala offers as a functional programming language: functions, of course, the core concept of this paradigm.

This concept is quite important, and I'm sure I won't be able to explain the full potential of Scala functions, as I'm not a master of the functional programming paradigm. Yet, I'll do my best. However, it is important that you take a tutorial or course on functional programming in Scala (I strongly recommend Martin Odersky's Functional Programming Principles in Scala).

Functions Basics

Scala functions are declared using the same reserved word that Python uses: def. As with Scala's control flow instructions, the body of the function is either the single expression that immediately follows the = sign or a block enclosed in curly braces: { and }.
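To make the two forms concrete, here is a minimal sketch (the function names are made up for illustration). The first body is a single expression; the second is a block, and the value of a block is the value of its last expression, so, unlike in Python, no `return` is needed.

```scala
// Body is a single expression following `=`
def square(x: Int): Int = x * x

// Body is a block between curly braces; its value is the last expression
def sumOfSquares(x: Int, y: Int): Int = {
  val sx = square(x) // intermediate values live only inside the block
  val sy = square(y)
  sx + sy            // no `return` needed: this is the result
}

println(square(4))          // 16
println(sumOfSquares(3, 4)) // 25
```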


Functions in Scala can be treated as values (just like a val or a var), so naturally they have a type. The type of a function is defined by the types of its parameters and the type of the value it returns (which can be the same or different). In practice, this means that every parameter of a function must have an explicit type (the compiler cannot infer it on its own and will report an error if you don't declare it). The return type, however, can be left out, and the compiler will infer it:

def add(x: Int, y: Int): Int = x + y // All good!

def pow2(x: Int) = x * x // Correct again: the compiler infers the return type as Int

def subtract(x, y) = x - y // Wrong: the compiler doesn't know the types of x and y
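Since functions have types, they can be stored and passed around like any other value. A minimal sketch, reusing `add` and `pow2` from above (redeclared here so the snippet runs on its own); a type such as `(Int, Int) => Int` reads "takes two Ints, returns an Int". The `applyTwice` helper is hypothetical, written just for this example.

```scala
def add(x: Int, y: Int): Int = x + y
def pow2(x: Int) = x * x

// Function values: their type is written as (parameter types) => return type
val addF: (Int, Int) => Int = add // the method is converted to a function value
val pow2F: Int => Int = pow2

// An anonymous function, similar to Python's `lambda x, y: x * y`
val mul: (Int, Int) => Int = (x, y) => x * y

// Hypothetical helper that receives a function as an argument
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

println(addF(2, 3))           // 5
println(mul(2, 3))            // 6
println(applyTwice(pow2F, 3)) // 81
```

One exception to return type inference is worth keeping in mind: recursive functions must declare their return type explicitly.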

From Python to Scala (V): Control Flow Tools

Ok, after a short period of laziness, I'm back for more. I warned you about my activity, but, to be fair, it's been a busy couple of weeks at work.

However, before starting, I wanted you to know that a new session of the course Functional Programming Principles in Scala starts in 25 days (on September 15th). You can find more information about it (or even enroll in it) at Coursera. The course is taught by Martin Odersky, the creator of Scala, so you'll be in good hands.

So, back to business. In this session, let's talk about some more real programming.

Control Flow Tools

The if statement

The most basic and probably the most well known statement in programming, the conditional control flow:

val x: Int = 10

if (x < 0)
  println("x is Negative")
else if (x > 0)
  println("x is Positive")
else
  println("x is Zero")

// Will print: "x is Positive"
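Unlike Python's `if` statement, Scala's `if`/`else` is an expression that produces a value, much like Python's conditional expression `a if cond else b`. A sketch of the same check written that way (the name `sign` is just for illustration):

```scala
val x: Int = 10

// The whole if/else chain evaluates to a String
val sign: String =
  if (x < 0) "Negative"
  else if (x > 0) "Positive"
  else "Zero"

println(s"x is $sign") // x is Positive
```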

Why Did I Choose Scala?

So, in this entry I'll put the series of tutorials I've been writing on hold. Instead, I think it's time to give my personal opinion on why I chose Scala as my new main language.

Before going on, I'll just state that this is entirely my personal opinion on Scala, and it is completely subjective. The reasons why I chose it are mine and don't have to be yours, but maybe you'll find some useful insights into what I think the language's advantages are.

So, a couple of friends and co-workers have asked me "Why Scala over Python? (or any other language, for that matter)", and I guess I've never answered with a full justification of why I switched. Actually, I don't think I have a real or valid justification beyond "because I liked it", but I do want to describe some of the things that ended up turning me from a Python programmer into a Scala programmer.

From Python to Scala (IV): Arrays & ListBuffers

So, now you've learnt about Scala lists. As you saw in the previous examples, Scala lists are very functional, since they are immutable.

If you ever intend to use Scala as a functional programming language, this is the way to go. I really recommend that you at least try to learn this paradigm, as it is what the language was designed for and it has many advantages. But, then again, even now I sometimes go back to imperative programming in Scala myself, because it is more natural to me. Scala as an imperative language is pretty similar to Java, so as a side effect I ended up learning how to read Java code (I knew only the basics of Java, and learning Scala improved my understanding of it a lot).

But, let's say that functional programming is way too much to deal with for now and you want a type more similar to Python lists: the oldie but goodie mutable list. You have a couple of data structure options available in Scala; I'll present two of the most common.

Scala Arrays

Ok, if my university's data structures teacher saw me presenting Scala arrays as an option for a "mutable" list, he would probably take away my degree and make me redo my Computer Science studies all over again.

An array is not a list and never will be. But, for someone who comes from a Python environment, it's probably an easy option for replacing an immutable list with a mutable one.

Arrays are the simplest and one of the oldest (if not the oldest) data structures you'll ever face. In fact, in most high-level programming languages, lists are implemented internally as arrays. If you've ever dealt with a really old imperative programming language (I'm looking at you, C developer), you are familiar with the concept of an array. The thing is that Python doesn't really offer them as a built-in type: you'll have to import a module (such as `array`) to work with them.

Arrays, like every data structure, have their pros and cons. Among the most common pros is their efficiency compared to lists. Arrays store their elements (all of the same type) in a contiguous block of memory, so every element has an index and can be accessed in constant time. Lists, in contrast, can have their elements scattered across memory, which in general makes operations on them slower.
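A short sketch of Scala arrays used the way a Python programmer might use a list: indexed reads and in-place updates. The values are arbitrary. Note that indexing uses parentheses rather than square brackets, and that an array's length is fixed once it is created.

```scala
// An Array[Int]; all elements share the same type
val numbers = Array(1, 4, 9, 16, 25)

// Indexed access and in-place update (Python: numbers[0] = 100)
numbers(0) = 100
println(numbers(0))     // 100
println(numbers.length) // 5

// Create an array of a given size filled with a default value
val zeros = Array.fill(3)(0)
zeros(1) = 7

// Printing an Array directly shows an object reference, so use mkString
println(zeros.mkString("[", ", ", "]")) // [0, 7, 0]

// Growing is not in place: :+ returns a new, longer array
val longer = numbers :+ 36
println(longer.length)  // 6
```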

From Python to Scala (III): Lists

Continuing the series in this crash course from Python to Scala, today I'll introduce one of Scala's most useful data structures and compare it to Python.

Scala Lists

Let's start with one of the most used data structures in Scala (and in functional languages in general), which is also the most common data structure in Python: the list.

A list in Scala is a data structure that represents a collection of values of the same type. Lists are widely used in Python, and the concept is quite similar in Scala, with a couple of exceptions. In Python, lists are written as comma-separated values between square brackets, and the empty list is represented as a pair of empty square brackets:

>>> squares = [1, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]
>>> empty = []
>>> empty
[]

In Scala, a list is built by calling List with the values passed as arguments. The empty list is created by calling it with no arguments:

scala> val squares = List(1, 4, 9, 16, 25)
squares: List[Int] = List(1, 4, 9, 16, 25)

scala> val empty = List()
empty: List[Nothing] = List()
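A few everyday list operations, each next to its Python counterpart, as a sketch. Because Scala lists are immutable, every operation returns a new list instead of modifying `squares`.

```scala
val squares = List(1, 4, 9, 16, 25)

// Indexing uses parentheses (Python: squares[0])
println(squares(0))           // 1

// Slicing (Python: squares[-3:])
println(squares.takeRight(3)) // List(9, 16, 25)

// Concatenation (Python: squares + [36, 49])
val more = squares ++ List(36, 49)
println(more)                 // List(1, 4, 9, 16, 25, 36, 49)

// Prepending an element with :: takes constant time
val withZero = 0 :: squares
println(withZero)             // List(0, 1, 4, 9, 16, 25)

// The original list is untouched
println(squares)              // List(1, 4, 9, 16, 25)
```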

From Python to Scala (II): Types, Variables & Values

Scala Types

Following my series of Scala tutorials for Python programmers, I'll start talking about something most Python programmers don't usually pay attention to, because the language doesn't require them to.

I'm talking about data types. It's not that Python doesn't have types for its variables, but as it is a dynamically typed programming language, you usually don't care about the type of a variable. At least not until you try to add a number and a string: just as you cannot add apples and oranges, you cannot add strings and numbers (at least not without converting one of them first).

In general terms, however, Python doesn't care about the types of your variables: in fact, you can't declare a type for them, because a variable simply takes the type of whatever value is assigned to it. So, this is perfectly normal for a Python program:

string = "This is a string"
print string # Will output "This is a string"

string + 100 # Invalid. Will result in a TypeError exception.

string = 100 # Perfectly valid. `string` type will be int from now on.
print string # Will output 100

string + 100 # Valid. Will result in 200.
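For comparison, a sketch of the same experiment in Scala. The compiler fixes each variable's type when it is declared (written explicitly or inferred), so the reassignment Python accepted is rejected at compile time; that line is left commented out so the rest runs. One surprise for Python programmers: adding a number to a String is valid in Scala, because the number is converted to text and concatenated.

```scala
var string: String = "This is a string"
println(string)          // This is a string

// String + Int compiles: the Int is turned into text and appended
val joined = string + 100
println(joined)          // This is a string100

// string = 100 // Does not compile: type mismatch, found Int, required String

// A var can be reassigned, but only to another String
string = "Another string"

val number = 100         // type inferred as Int
println(number + 100)    // 200
```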

From Python to Scala (I): The Basics

This is the first post in a series in which I'll try to give programmers with a Python background some insight into the Scala language. I chose to write these posts because, at least when I started this series, the "Scala for people coming from Python" tutorial was a work in progress.

First of all, I'll state some of my background (in case you didn't check my about page), as a kind of disclaimer. There are people out there who are experts in Python. I'm not one of them. I only have 4 years of experience with this language, and I've only worked with the 2.X versions (from 2.5 to 2.7). I never even tried to learn Python 3. There are experts on Scala as well, and I'm not one of those either. In fact, my Scala knowledge is far from deep: I learned Scala at the end of last year and have been using it since then (along with Python).


Knowing this, I'll just say I have enough knowledge of both Scala and Python to get by. I've done some projects in Django and some in Play Framework, but nothing really impressive. I'm writing this set of tutorials because, when I started to learn Scala, there wasn't one, and many times I ended up on Stack Overflow looking for how to do in Scala the things I did in Python.

Hello World!

As every programmer does when learning a new language, I think it is fitting to start this professional blog with a Hello World. So, in this first blog post, I say Hello World!

This is the first blog I've created that is related to what I do for a living. The only other surviving blog is a personal one (and it is in Spanish).

The aim of this place is to write about Computer Science problems I come across and the solutions I find to overcome them; most of them, of course, are programming problems.

As I said in my about page, I was a Python programmer for a long time, but now I have turned to the world of Scala, and it has been so interesting that I've practically abandoned my old Python coding habits (of course, with Python, it's never that hard to go back).

I hope my posts will help you somehow.