Rule

Data description

  • The competition consists of two tracks. Participants may enter either track or both.
  • Track 1 is to predict whether users have left the game service, and Track 2 is to predict the survival time of users.
  • There is one training set and two test sets. Each set consists of game logs from a different period of ‘Blade and Soul’.

  • train.tar: a list of accounts with labels
  • traindata_1.tar.gz ~ traindata_5.tar.gz: game logs for training. Each file archives gzip-compressed game log files for each account.
  • test1.tar / test2.tar: a list of accounts without labels
  • testdata1_1.tar.gz ~ testdata1_5.tar.gz / testdata2_1.tar.gz ~ testdata2_5.tar.gz: game logs for testing. Each file archives gzip-compressed game log files for each account.
  • sample.csv: samples from train.csv
  • sample_gamelog.tar.gz: game logs for samples
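
Since each archive is a tar.gz file whose members are themselves gzip-compressed logs, reading it takes two decompression steps. The following is a minimal sketch under stated assumptions, not the organizers' tooling: the function name `iter_account_logs`, the UTF-8 encoding, and the demo member name and log line are all hypothetical, because the text above does not specify the internal layout or the log format.

```python
import gzip
import io
import tarfile


def iter_account_logs(archive):
    """Yield (member_name, text) for each gzip-compressed log in a .tar.gz.

    `archive` is a binary file object, e.g. open("traindata_1.tar.gz", "rb").
    Assumes every regular member is a gzip file holding UTF-8 text.
    """
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            raw = tar.extractfile(member).read()
            yield member.name, gzip.decompress(raw).decode("utf-8")


def build_demo_archive():
    """Build a tiny in-memory archive with one made-up account log."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        payload = gzip.compress(b"2016-05-04 06:00:00,login\n")  # invented row
        info = tarfile.TarInfo(name="account_0001.csv.gz")  # invented name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


demo_logs = list(iter_account_logs(build_demo_archive()))
```

Streaming the members one at a time avoids unpacking a whole archive to disk, which may matter if the real archives are large.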

(To get the download links, please join the Google Group.)


Data structure


(Figure: timeline of the data periods. The circled numbers below refer to this figure.)

  • Definition of Time: Week and Day
    • A week is defined as the period from 6 AM on Wednesday to 6 AM on the following Wednesday. This accounts for the server maintenance performed every Wednesday morning.
      E.g.) 06:00:00.000, May 4th, 2016 – 05:59:59.999, May 11th, 2016
    • The same rule applies to a day, which runs from 6 AM to 6 AM on the following day.
      E.g.) 06:00:00.000, May 10th, 2016 – 05:59:59.999, May 11th, 2016
  • For this competition, 8 weeks of game log data are provided for each player.
  • All players logged in during the last week (⑧) of the 8-week period. The data also shows the accounts that logged in during the last week (⑪) of the No Data period.
  • Retained vs. Churned
    • The last access dates of the players within the last week (⑪) of the No Data period may differ.
    • A player is considered “not churned” if the player logged in during the 5 weeks (⑫) after the No Data period, and “churned” otherwise. (A player who did not log in during period ⑫ but came back afterwards is considered to have already left the game, and is therefore defined as “churned”.)
    • Churn_yn =
  • Survival Time
    • The last access dates of the players may differ in the last week (⑧) of the 8-week game log data.
    • For this reason, survival time is defined as the period between the last access date during the 8 weeks and the final access date in the most recent data, which is not provided. When the survival time of a player cannot be determined from the available data, “+” is marked next to the survival time to distinguish that player from players who have already churned.
    • Predicted survival times must be whole numbers without the censoring sign (+). Because the observation period must end at some point, any whole-number prediction that is larger than our observed value will be considered correct.
    • Survival time = d_E - d_S
      ( d_E: the last access date in the observation period, d_S: the last access date in the 8-week data )
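
The time definitions and the survival-time formula above can be sketched in code. This is an illustration under assumptions the text does not state: timestamps are taken to be naive datetimes in server-local time, survival time is taken to be counted in whole game days, and all function names are hypothetical.

```python
from datetime import date, datetime, timedelta

DAY_START_HOUR = 6       # a game day starts at 06:00
WEEK_START_WEEKDAY = 2   # Wednesday, with Monday == 0


def game_day(ts: datetime) -> date:
    """Calendar date on which the game day containing `ts` began."""
    return (ts - timedelta(hours=DAY_START_HOUR)).date()


def game_week_start(ts: datetime) -> datetime:
    """The Wednesday 06:00 that opens the game week containing `ts`."""
    day = game_day(ts)
    days_since_wednesday = (day.weekday() - WEEK_START_WEEKDAY) % 7
    start = day - timedelta(days=days_since_wednesday)
    return datetime(start.year, start.month, start.day, DAY_START_HOUR)


def survival_time_days(d_s: datetime, d_e: datetime) -> int:
    """d_E - d_S in game days (the unit is an assumption, see above)."""
    return (game_day(d_e) - game_day(d_s)).days


# The week example above: the last instant before 06:00 on May 11th, 2016
# still belongs to the week that started at 06:00 on May 4th, 2016.
last_instant = datetime(2016, 5, 11, 5, 59, 59, 999000)
week_of_last_instant = game_week_start(last_instant)
```

Shifting every timestamp back by six hours before taking the date keeps the day and week boundaries consistent with each other, so a log entry at 05:59 is always attributed to the previous game day.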

Submission

  • Participants must submit results for both test sets. The submission files must have the following names and format.
  • File name :
    • Track1
      • submission_track1_test1.csv : a result file of test set 1 for track1
      • submission_track1_test2.csv : a result file of test set 2 for track1
    • Track2
      • submission_track2_test1.csv : a result file of test set 1 for track2
      • submission_track2_test2.csv : a result file of test set 2 for track2
  • File format : CSV file (with header) with the following schema
    • Track1
    • Track2

      ※ All predicted values must be given without the censoring sign (‘+’).

Evaluation

  • Track 1
    • We will use the F1 score to measure prediction performance for Track 1.
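
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below uses the standard definition; treating “churned” as the positive class and encoding it as 1 is an assumption made here for illustration, since the label encoding is not given in the text.

```python
def f1_score(actual, predicted, positive=1):
    """Standard F1 score for one positive class.

    `positive=1` (churned) is an assumed encoding, not taken from the rules.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for the example: 2 true positives,
# 1 false positive and 1 false negative.
example_f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```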

  • Track 2
    • We will use the Root Mean Squared Logarithmic Error (RMSLE) to measure prediction performance for Track 2.
      RMSLE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} ( \log(p_i + 1) - \log(a_i + 1) )^2 }
      ( n: total number of observations, p_i: prediction, a_i: actual response for i )
    • Censoring in the ground truth must be taken into account when evaluating model performance. To address this, any predicted value that exceeds the observation period is replaced with the maximum value (i.e. the observation time) before the performance is measured.
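
The censoring adjustment described above can be sketched as a clipping step before the usual RMSLE computation. This is one possible reading, not the official scoring script: the text does not say whether the maximum is a single value or differs per account, so the hypothetical `caps` argument takes one maximum observable survival time per account.

```python
import math


def rmsle_with_censoring(predicted, actual, caps):
    """RMSLE after clipping each prediction to its maximum observable value.

    `caps[i]` is the assumed observation-time limit for account i.
    """
    clipped = [min(p, cap) for p, cap in zip(predicted, caps)]
    squared_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2  # log1p(x) == log(x + 1)
        for p, a in zip(clipped, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented numbers: the second prediction (100) exceeds its cap (50),
# so it is scored as 50 and matches the observed value exactly.
example_rmsle = rmsle_with_censoring([10, 100], [10, 50], [60, 50])
```

Under this reading, overshooting a censored value costs nothing, while undershooting it is penalized like any other error.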