Grid Search and Stacking (Bronze | Competition Record of MUFG Data Science Champion Ship 2023)
(code available : https://github.com/Mao718/data-science/tree/main/三菱UFJ)
Intro
Since last time ( SIGNATE Student Cup ), one of my regrets was not implementing further improvements such as a stacking framework. So I entered another competition to further optimize machine learning model performance with grid search and stacking. Here is the record of that competition.
The goal of the competition is to build a model that detects credit card fraud. One important thing to know is that the positive and negative samples are significantly imbalanced.
As usual, here are my scores by date, which I would separate into four phases.
- Basic: Following the approach from last time. (Details in SIGNATE Student Cup )
- Overfitting: I tried to include more data, which led to overfitting.
- Grid Search: Implemented a grid search for XGBoost.
- Stacking: Stacked CatBoost, XGBoost, and LightGBM.
Since the basic phase doesn't differ much from last time, I will describe the approach of the other phases.
Overfitting
After I created the baseline, I tried to embed and include the extra features. However, once I included the credit card details, the results showed very serious overfitting. So I checked the features one by one and found that the extra credit card details have extremely high cardinality. For example, acct_open_date has around 300 unique dates across 400 cards, which effectively gives the algorithm a unique ID for every card. The algorithm therefore learns to identify specific cards rather than fraud patterns, which is not what we want. Even though the algorithm could reach 0.6 in cross-validation, it still failed on the testing set.
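The "check the features one by one" step can be sketched as a simple per-column cardinality scan. The column names and the 0.5 threshold below are illustrative assumptions, not the competition's actual schema:

```python
# Sketch of the per-feature cardinality check described above.
# Columns whose unique-value ratio is close to 1 behave like a unique
# ID (e.g. ~300 unique acct_open_date values over ~400 cards), so the
# model memorizes cards instead of learning fraud patterns.

def high_cardinality_columns(rows, threshold=0.5):
    """Return (column, unique_ratio) pairs exceeding `threshold`.

    `rows` is a list of dicts, one per credit card record.
    """
    if not rows:
        return []
    flagged = []
    for col in rows[0]:
        unique_ratio = len({row[col] for row in rows}) / len(rows)
        if unique_ratio > threshold:
            flagged.append((col, unique_ratio))
    return flagged

# Toy example: acct_open_date is unique per card, card_type is not.
cards = [
    {"acct_open_date": f"2020-01-{i:02d}",
     "card_type": "visa" if i % 2 else "master"}
    for i in range(1, 21)
]
print(high_cardinality_columns(cards))  # only acct_open_date is flagged
```

Columns flagged this way are candidates for dropping or for coarser encoding before training.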
Grid Search on XGBoost
Subsequently, I cleaned all the data and did basic target encoding, but still didn't get satisfactory performance. So I tried to find a breakthrough in the model itself. As I said before, the positive and negative samples are significantly imbalanced, so I tried different values of scale_pos_weight, which gave me totally different scores. That raised a question: are the hyperparameters good enough? To find out, grid search seemed like a good idea. So I let the PC work day and night for around two days to find the best combination, and the score rose to the bronze-to-silver range.
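The grid search loop itself is simple to sketch with the standard library. The parameter values below are illustrative, and `cv_score` is a deterministic stand-in for the cross-validated metric each combination received in the competition, not a real model evaluation:

```python
# Minimal grid-search sketch over XGBoost-style hyperparameters,
# including scale_pos_weight for the class imbalance.
import itertools

param_grid = {
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
    "scale_pos_weight": [1, 10, 50],  # key knob for imbalanced fraud data
}

def cv_score(params):
    """Placeholder for the cross-validated metric of one configuration."""
    # Illustrative deterministic stand-in, NOT a real model evaluation.
    return 1.0 / (1 + abs(params["max_depth"] - 6)
                  + abs(params["scale_pos_weight"] - 10) * 0.01
                  + params["learning_rate"])

best_score, best_params = -1.0, None
for combo in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), combo))
    score = cv_score(params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params)
```

With a real model, `cv_score` would train and evaluate one fold split per call, which is why exhausting the grid took about two days of compute.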
Stacking
After grid search I was kind of stuck again; I couldn't find any way to improve the performance. While searching for solutions, a technique called "stacking" caught my eye. First I repeated the grid search for CatBoost and LightGBM. Then I stacked these algorithms and used an XGBoost model to aggregate their outputs. However, in my case the performance didn't improve much. Although I could reach gold-level performance of 0.67 on the validation set, the score dropped on the testing set. I think the main reason is that the models I chose are too similar.
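The core of stacking is building a meta-feature matrix from out-of-fold predictions of the base models, so the meta-learner never sees a prediction made on data the base model trained on. Here is a stdlib sketch of that structure; the trivial `mean_model` stands in for CatBoost / XGBoost / LightGBM, and using three identical stand-ins also illustrates why too-similar base models add little:

```python
# Stdlib sketch of stacking: out-of-fold (OOF) predictions from the
# base models become the feature columns for a meta-model.

def make_oof_features(base_models, X, y, n_folds=5):
    """Return an n x len(base_models) matrix of OOF predictions.

    Each fold is held out in turn; base models are fit on the rest
    and predict the held-out rows, avoiding label leakage.
    """
    n = len(X)
    meta = [[0.0] * len(base_models) for _ in range(n)]
    fold_size = (n + n_folds - 1) // n_folds
    for f in range(n_folds):
        lo, hi = f * fold_size, min((f + 1) * fold_size, n)
        train_idx = [i for i in range(n) if not (lo <= i < hi)]
        for m, model in enumerate(base_models):
            fitted = model([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in range(lo, hi):
                meta[i][m] = fitted(X[i])
    return meta

def mean_model(train_X, train_y):
    """Trivial base-model stand-in: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

# Stand-ins for CatBoost / XGBoost / LightGBM.
base_models = [mean_model, mean_model, mean_model]
X = list(range(10))
y = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
meta_features = make_oof_features(base_models, X, y, n_folds=5)
print(len(meta_features), len(meta_features[0]))  # 10 rows x 3 columns
```

In the real pipeline, `meta_features` is then fed to the XGBoost meta-learner; identical columns (as here) give it nothing extra to combine, which matches my experience with three similar gradient-boosting models.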
My Opinion
Through this competition, I have learned how grid search and stacking affect performance. Another interesting point I want to mention: I found a paper handling similar data, but that paper's additional features are mostly time-based, and this competition didn't provide time data. I'm very curious how good the model could be if we could include time features.