Predict Soccer Match Total Goals With Machine Learning

  • Status: Closed
  • Prize: $20
  • Entries received: 13
  • Winner: Gozienkwocha

Contest description

Soccer is the world's most popular sport.
**This contest will test whether you're among the best Machine Learning engineers on Freelancer.com**
Your challenge is to use ML & Deep Learning to build a model that can best classify the TOTAL number of goals scored in a soccer match given publicly available data.

The data provided includes details on a team's recent performance, probability of winning, match location, date, recent performance against the opposing team & other recent info. In all, there are close to 100 input variables provided.
You can find a definition of each input variable here: http://bit.ly/Column_definitions

For each soccer match/fixture:
If the total goals scored by both teams is greater than 2.5, the outcome is recorded as Over.
If the total goals scored is less than 2.5, the outcome is recorded as Under.
This label is recorded in the last column of each dataset, called “outcome”.
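
The labelling rule above can be sketched in a few lines of Python. This is only an illustration: the function name `label_outcome` and its parameters are hypothetical and do not come from the contest material.

```python
def label_outcome(home_goals: int, away_goals: int, line: float = 2.5) -> str:
    """Return "Over" if the combined goals exceed the line, otherwise "Under"."""
    total = home_goals + away_goals
    # Goal totals are whole numbers, so a total can never equal 2.5 exactly.
    return "Over" if total > line else "Under"


print(label_outcome(2, 1))  # 3 goals in total
print(label_outcome(1, 1))  # 2 goals in total
```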

A leaderboard of top 10 performing models will be posted daily on the contest's comments section.
The competition will run for 8 days.
A payout has been guaranteed & will be provided to the winner of the contest.

The data & other material:
There are 3 datasets provided (found in the “Data CSVs.zip” file) and a helper notebook.
1. training_data.csv - This contains 100 000 matches & their outcomes, which you will use to train your model(s).
2. validation_data.csv - This contains 50 000 matches & their outcomes, which you will use to test/validate your model's performance.
3. testing_data.csv - This contains 500 matches (without outcomes) whose outcomes you will need to predict with your model & submit as a list of 0s and 1s as part of your submission.
When predicting, if you predict less than 2.5 total goals, label that outcome as 0; if you predict more than 2.5 total goals, label it as 1.
4. helper_script.ipynb - A Python notebook containing prebuilt functions that will help with data cleaning, encoding, imputing & model training. You may use this script to transform the data & train your model.

Performance Criteria:
- The F1 Score (https://en.wikipedia.org/wiki/F1_score) will be used to determine your model's performance against other contestants.
- This F1 Score will be based on the predictions you make for the data in point 3 above (testing_data.csv).
- For the leaderboard, F1 Scores will be rounded off to 3 decimal places.
- Should there be a tie, all of the top positioned contestants will each get the guaranteed payout.
- ** You may only post 2 submissions per day **
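
For reference, here is a minimal sketch of how a binary F1 Score can be computed from two lists of labels. The contest text does not say which class is treated as positive or how the holder computed the score, so the `positive=1` default and the function name are assumptions made for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, purely to exercise the function.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```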

Programming Language:
1. You are encouraged to use Python for model construction.
2. You may use any classification technique as you see fit (Deep Learning, Machine Learning)

Submission:
Your submission must contain 3 things.
1. A list of your model's predictions for the first 250 matches on the testing_data.csv file. This must be posted as a comment under your submission. The comment must be of the form: First 250 entries: [0,1,0,1,0,0...,0,0]
2. A list of your model's predictions for the second 250 matches on the testing_data.csv file. This must be posted as a comment under your submission. The comment must be of the form: Second 250 entries: [0,1,0,1,0,0...,0,0]
3. A picture of your validation data F1 Score (calculated on 'validation_data.csv').
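
The two prediction comments could be produced along these lines. This is a sketch only: `format_submission` is a hypothetical helper, and the alternating placeholder predictions stand in for real model output.

```python
def format_submission(predictions):
    """Split 500 binary predictions into the two comment strings the contest asks for."""
    if len(predictions) != 500:
        raise ValueError("expected exactly 500 predictions")
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("predictions must be 0 or 1")

    def as_list(values):
        return "[" + ",".join(str(v) for v in values) + "]"

    first = "First 250 entries: " + as_list(predictions[:250])
    second = "Second 250 entries: " + as_list(predictions[250:])
    return first, second


# Placeholder predictions; a real entry would use the model's output here.
preds = [0, 1] * 250
first_comment, second_comment = format_submission(preds)
print(first_comment[:40])
print(second_comment[:40])
```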

You are welcome to post any questions that you have on the contest's chat board.

Are you among the best of the best in Machine Learning?
PROVE IT by winning this contest.

Employer feedback

“Chigozie's solution was cutting edge, easy to understand & showed a deep understanding of the problem. I would highly recommend him for any Data Science/Machine Learning tasks & plan to work with him in future.”

LuyandaD, South Africa.

Public clarification board

  • Gozienkwocha
    • 3 years ago

    Many thanks to the contest holder. It was a really enjoyable time working on this project.

    1. Gitesh98
      • 3 years ago

      Can you please share your file?

    2. Gozienkwocha
      • 3 years ago

      Hi Gitesh, I would have shared my file, but the contest holder hasn't given me permission to do so. I have left a message for him in that regard and have yet to receive a response. I need his permission because during the handover I signed an agreement transferring all code rights to him, so sharing this file might violate that agreement; hence I am seeking his go-ahead first. Once I get his green light, I will share it with you. Thank you for your understanding.

  • LuyandaD
    Contest holder
    • 3 years ago

    I have verified the results in entry #15.

    Without disclosing the contestant's exact methods, their solution involved the following:
    1. Data cleaning & removal of duplicates.
    2. Handling missing values & feature engineering on columns with dates.
    3. Using the median values to fill in missing data.
    4. Using 2 ensemble models to model the outcome & blending the predictions from these to arrive at a final outcome.

    This entry has been awarded the contest's prize.
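
Two of the steps listed above, median imputation and blending, can be sketched as follows. The winner's actual code was not disclosed, so this is not their method: the function names, the equal blending weight, the 0.5 threshold and the hard-coded probabilities (standing in for the outputs of two ensemble models) are all assumptions for illustration.

```python
from statistics import median


def fill_with_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def blend(probs_a, probs_b, weight_a=0.5, threshold=0.5):
    """Take a weighted average of two models' probabilities, then threshold to 0/1."""
    blended = [weight_a * a + (1 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    return [1 if p >= threshold else 0 for p in blended]


# A feature column with one missing value.
filled = fill_with_median([1.0, None, 3.0, 10.0])
print(filled)

# Hypothetical class-1 probabilities from two different models.
model_a = [0.9, 0.4, 0.2]
model_b = [0.7, 0.7, 0.1]
final = blend(model_a, model_b)
print(final)
```

In practice the median would be computed on the training data only and reused for the validation and testing data, so that no information leaks from the data being scored.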

  • LuyandaD
    Contest holder
    • 3 years ago

    The contest has now closed.
    A huge thank you to all contestants for participating.

    I will now evaluate the best performing entries & award the prize.

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 12/03/2021:

    1. Entry #15 . F1_Score: 66.161
    2. Entry #11 . F1_Score: 66.048
    3. Entry #5 . F1_Score: 65.144
    4. Entry #7 . F1_Score: 64.588
    5. Entry #8 . F1_Score: 64.437
    6. Entry #3 . F1_Score: 63.889
    7. Entry #2 . F1_Score: 63.887
    8. Entry #10 . F1_Score: 63.809
    9. Entry #9 . F1_Score: 61.504
    10. Entry #4 . F1_Score: 60.429

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 11/03/2021:

    1. Entry #11 . F1_Score: 66.048
    2. Entry #5 . F1_Score: 65.144
    3. Entry #7 . F1_Score: 64.588
    4. Entry #8 . F1_Score: 64.437
    5. Entry #3 . F1_Score: 63.889
    6. Entry #2 . F1_Score: 63.887
    7. Entry #10 . F1_Score: 63.809
    8. Entry #9 . F1_Score: 61.504
    9. Entry #4 . F1_Score: 60.429
    10.

  • LuyandaD
    Contest holder
    • 3 years ago

    Attention, all contestants

    1. I have now updated the leaderboard in a comment below.
    2. The contest will close in 18 hours; you may still submit entries until then.
    3. Once the contest has closed, new entries will be evaluated & the leaderboard will be updated.
    4. The contestants with the top 3 entries will be asked in private chat to submit the notebooks used to generate their predictions.
    5. These notebooks will be used to reproduce results & verify that a winning entry has not been faked.
    6. The list of correct outcomes will be shared with you so that you may verify the results calculated for your own entry & that of others.

    7. Once a winning entry has been verified, the prize amount will be awarded.

    Thank you for your participation so far :)

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 10/03/2021:

    1. Entry #5 . F1_Score: 65.144
    2. Entry #7 . F1_Score: 64.588
    3. Entry #8 . F1_Score: 64.437
    4. Entry #3 . F1_Score: 63.889
    5. Entry #2 . F1_Score: 63.887
    6. Entry #10 . F1_Score: 63.809
    7. Entry #9 . F1_Score: 61.504
    8. Entry #4 . F1_Score: 60.429
    9.
    10.

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 09/03/2021:

    1. Entry #5 . F1_Score: 65.144
    2. Entry #7 . F1_Score: 64.588
    3. Entry #8 . F1_Score: 64.437
    4. Entry #3 . F1_Score: 63.889
    5. Entry #2 . F1_Score: 63.887
    6. Entry #9 . F1_Score: 61.504
    7. Entry #4 . F1_Score: 60.429
    8.
    9.
    10.

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 08/03/2021:

    1. Entry #5 . F1_Score: 65.144
    2. Entry #3 . F1_Score: 63.889
    3. Entry #2 . F1_Score: 63.887
    4. Entry #4 . F1_Score: 60.429
    5.
    6.
    7.
    8.
    9.
    10.

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 07/03/2021:

    1. Entry #3 . F1_Score: 63.889
    2. Entry #2 . F1_Score: 63.887
    3. Entry #4 . F1_Score: 60.429
    4.
    5.
    6.
    7.
    8.
    9.
    10.

  • rawatpankaj9876
    • 3 years ago

    In the goal_home and goal_away columns, what does the negative sign indicate?

    1. LuyandaD
      Contest holder
      • 3 years ago

      It expresses the maximum number of goals that each team is predicted to score.
      e.g. -3.5 means this team is expected to score 3 goals or fewer.

  • Gozienkwocha
    • 3 years ago

    Hello. I would like to ask the contest holder whether the values in the match winner column have any particular meaning. For example, does '1 N' mean the home team won, 'N 2' mean the away team won, and so on? Or do they signify the score of the match, so that '1 N' means the match ended 1-0 in favour of the home team, 'N 2' means the match ended 0-2 in favour of the away team, and 1 and 2 mean that there was a draw? Thank you.

    1. Gozienkwocha
      • 3 years ago

      Does it also imply that 1 and 2 mean an outright win for the home and away sides, respectively? That is, 1 means the public is predicting that the home side will win, and 2 that the away side will.

    2. LuyandaD
      Contest holder
      • 3 years ago

      Yes, 1 means the home team is predicted to win.
      2 means the away team is expected to win outright.

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 06/03/2021:

    1. Entry #3 . F1_Score: 63.889
    2. Entry #2 . F1_Score: 63.887
    3.
    4.
    5.
    6.
    7.
    8.
    9.
    10.

  • LuyandaD
    Contest holder
    • 3 years ago

    Edit to instructions:
    Please ignore this instruction: "When predicting, if you predict less than 2.5 total goals, you will need to label that outcome as 0, if you predict more than 2.5 total goals, label that as 1."

    If you use encoding on the outcome column, the value of Under will become 1 & the value of Over will become 0.
    Your entries will be graded according to this rule going forward.
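
The revised rule matches what happens when class names are given integer codes in sorted (alphabetical) order, which is how, for example, scikit-learn's LabelEncoder assigns them. The helper notebook itself is not shown here, so treat the sketch below as an assumption about the encoding rather than a copy of it; `encode_labels` is a hypothetical name.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code, assigned in sorted order."""
    classes = sorted(set(labels))
    mapping = {name: index for index, name in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


encoded, mapping = encode_labels(["Over", "Under", "Under", "Over"])
print(mapping)
print(encoded)
```

Because "Over" sorts before "Under", Over becomes 0 and Under becomes 1, the reverse of the original instruction.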

  • dataexpert18
    • 3 years ago

    #increaseprize #increaseprize #increaseprize #increaseprize

    1. LuyandaD
      Contest holder
      • 3 years ago

      Hi, which number is your entry?

  • LuyandaD
    Contest holder
    • 3 years ago

    Leaderboard 05/03/2021:

    1. Entry #2 . F1_Score: 63.89
    2.
    3.
    4.
    5.
    6.
    7.
    8.
    9.
    10.

  • LuyandaD
    Contest holder
    • 3 years ago

    Hi there.
    I am the contest's holder.
    You are welcome to post any questions here.

    I will be updating the scoreboard once every day.
    I will also post the f1 scores of each entry in its comments section.
    Reminder, only 2 entries per contestant per day.
