Pautomac

Download

This page provides the data for the competition.

For each problem, you can download the train and test set available during the competition. The idea was to use the first set (or both if you thought that helped) to train your algorithm and learn a model. Participants could then assign probabilities to each element of the test set and submit their answers on the compete page. More details are given on the participate page.

The solution file (which contains the probabilities that the target machine assigns to the elements of the test set) and the model file (with a description of the target machine) are now available.

Competition data

Artificial data

During this phase, the machine used to generate each data set was not given. In addition, several parameters that were fixed in the previous phase of the competition took different values: the sizes of the samples and of the alphabet vary, the number of states of the target is not the same, and the symbol and transition sparsities take different values. More details are given in the paper containing the results of the competition. All these files can be found in this archive.

Problem number  Files  Problem number  Files  Problem number  Files
Problem 1 Problem 2 Problem 3
Problem 4 Problem 5 Problem 6
Problem 7 Problem 8 Problem 9
Problem 10 Problem 11 Problem 12
Problem 13 Problem 14 Problem 15
Problem 16 Problem 17 Problem 18
Problem 19 Problem 20 Problem 21
Problem 22 Problem 23 Problem 24
Problem 25 Problem 26 Problem 27
Problem 28 Problem 29 Problem 30
Problem 31 Problem 32 Problem 33
Problem 34 Problem 35 Problem 36
Problem 37 Problem 38 Problem 39
Problem 40 Problem 41 Problem 42
Problem 43 Problem 44 Problem 45
Problem 46 Problem 47 Problem 48

Real data

The first real-data problem corresponds to part-of-speech (POS) tagging. The train and test sets are randomly selected sentences in which words have been automatically replaced by their POS tags. The evaluation is done by comparing the submitted probabilities with the ones obtained with the 3-gram baseline trained on the whole corpus (which is 10 times bigger than the available train set).
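As a rough illustration of what a 3-gram baseline of this kind computes, the sketch below counts trigrams over tag sequences and assigns a probability to a whole sentence, end marker included. The add-one smoothing, the padding symbols, the toy tag sequences, and the function names are all assumptions made for the example; this page does not specify how the actual baseline was built.

```python
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical padding symbols, not from the data files


def train_trigram(sequences):
    """Count trigrams and their two-symbol contexts over padded sequences."""
    tri = defaultdict(int)
    ctx = defaultdict(int)
    vocab = {END}
    for seq in sequences:
        padded = [START, START] + list(seq) + [END]
        vocab.update(seq)
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            tri[context + (padded[i],)] += 1
            ctx[context] += 1
    return tri, ctx, vocab


def sequence_probability(seq, tri, ctx, vocab):
    """Probability of a full sequence (end marker included), add-one smoothed."""
    padded = [START, START] + list(seq) + [END]
    prob = 1.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        num = tri.get(context + (padded[i],), 0) + 1   # assumed smoothing
        den = ctx.get(context, 0) + len(vocab)
        prob *= num / den
    return prob


# Toy POS sequences, invented for the example.
train = [["DT", "NN", "VB"], ["DT", "JJ", "NN", "VB"], ["NN", "VB"]]
model = train_trigram(train)
p_seen = sequence_probability(["DT", "NN", "VB"], *model)
p_unseen = sequence_probability(["VB", "VB", "DT"], *model)
```

With these toy counts, the sentence seen in training receives a higher probability than the unseen one, which is the behaviour a submitted model would be compared against.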

The second real-data problem comes from sensor information. Note that all strings are of length 20, as they correspond to sliding windows over a discretized sensor signal. It may thus be worthwhile to include a stop after 20 steps in your models instead of final probabilities. The evaluation is done in the same way as for the first real-data problem.
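The following sketch shows one way to read the remark about fixed-length strings: windows of 20 symbols are cut from a discretized signal, and a model assigns probability only to strings of exactly that length, with no stopping probability spent earlier. The window step, the first-order Markov model, the toy signal, and the parameter values are assumptions for illustration only; they are not taken from the competition data.

```python
WINDOW = 20  # string length stated on this page


def sliding_windows(symbols, size=WINDOW, step=1):
    """Return every window of `size` consecutive symbols (step is an assumption)."""
    return [symbols[i:i + size] for i in range(0, len(symbols) - size + 1, step)]


def fixed_length_probability(string, initial, transition, size=WINDOW):
    """First-order Markov probability with a forced stop after `size` symbols.

    No stopping probability is spent before step `size`; strings of any other
    length get probability 0.
    """
    if len(string) != size:
        return 0.0
    prob = initial.get(string[0], 0.0)
    for prev, cur in zip(string, string[1:]):
        prob *= transition.get(prev, {}).get(cur, 0.0)
    return prob


signal = [0, 0, 1, 1, 0] * 5           # toy discretized signal, 25 symbols
windows = sliding_windows(signal)       # windows of length 20
initial = {0: 0.5, 1: 0.5}              # toy parameters, not learned from data
transition = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
p = fixed_length_probability(windows[0], initial, transition)
```

Because the stop is forced at step 20, the probabilities of all strings of length 20 sum to one under such a model, and no mass is lost on shorter or longer strings that never occur in the data.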

Train set for real problem 1 Test set for real problem 1
Train set for real problem 2 Test set for real problem 2

Train data

These are the data that were used during the training phase of the competition, which ended on May 20th, 2012. The files containing the real probabilities of the test sets are now available. They correspond to the probabilities that the target machine assigns to the elements of the test sets.

Train set  Test set  Solution  Train set  Test set  Solution  Train set  Test set  Solution
Train set 1 Test set 1 Solution 1 Train set 2 Test set 2 Solution 2 Train set 3 Test set 3 Solution 3
Train set 4 Test set 4 Solution 4 Train set 5 Test set 5 Solution 5 Train set 6 Test set 6 Solution 6
Train set 7 Test set 7 Solution 7 Train set 8 Test set 8 Solution 8 Train set 9 Test set 9 Solution 9
Train set 10 Test set 10 Solution 10 Train set 11 Test set 11 Solution 11 Train set 12 Test set 12 Solution 12
Train set 13 Test set 13 Solution 13 Train set 14 Test set 14 Solution 14 Train set 15 Test set 15 Solution 15
Train set 16 Test set 16 Solution 16 Train set 17 Test set 17 Solution 17 Train set 18 Test set 18 Solution 18
Train set 19 Test set 19 Solution 19 Train set 20 Test set 20 Solution 20 Train set 21 Test set 21 Solution 21
Train set 22 Test set 22 Solution 22 Train set 23 Test set 23 Solution 23 Train set 24 Test set 24 Solution 24
Train set 25 Test set 25 Solution 25 Train set 26 Test set 26 Solution 26 Train set 27 Test set 27 Solution 27
Train set 28 Test set 28 Solution 28 Train set 29 Test set 29 Solution 29 Train set 30 Test set 30 Solution 30
Train set 31 Test set 31 Solution 31 Train set 32 Test set 32 Solution 32 Train set 33 Test set 33 Solution 33
Train set 34 Test set 34 Solution 34 Train set 35 Test set 35 Solution 35 Train set 36 Test set 36 Solution 36
Train set 37 Test set 37 Solution 37 Train set 38 Test set 38 Solution 38 Train set 39 Test set 39 Solution 39
Train set 40 Test set 40 Solution 40 Train set 41 Test set 41 Solution 41 Train set 42 Test set 42 Solution 42
Train set 43 Test set 43 Solution 43 Train set 44 Test set 44 Solution 44 Train set 45 Test set 45 Solution 45
Train set 46 Test set 46 Solution 46 Train set 47 Test set 47 Solution 47 Train set 48 Test set 48 Solution 48
Train set 49 Test set 49 Solution 49 Train set 50 Test set 50 Solution 50 Train set 51 Test set 51 Solution 51

Hint: the name of each file (except for the first problem) contains information about the way it was generated.