general:

1, I guess that when you split the sample into training and test sets, you first add all the backgrounds together and then split randomly.
This may cause problems in some cases. The more correct way is to split each sample into training and test separately, to make sure that,
for example, 50% of each specific bkg goes to training and 50% to test.

2, The AUC is not the best number to quantify the performance of the MVA optimization. You should use the significance Z=sqrt[ 2( (s+b)log(1+s/b)-s ) ]; even Z=s/sqrt(s+b) is not accurate.

3, Some variables need more discussion.

4, You can also optimize a second "loose" category if the signal efficiency is not good.

6, It would be good to have a table summarizing the event yields of signal and bkg, together with the significance.
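
To make points 1, 2 and 6 concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not your actual setup: the function names (split_per_sample, z_asimov, z_simple), the dictionary layout of the samples and the yields in the example are all hypothetical placeholders. It uses only the standard library.

```python
import math
import random


def split_per_sample(samples, train_fraction=0.5, seed=0):
    """Split each sample separately so every process keeps the same train fraction.

    `samples` maps a (hypothetical) process name to a list of events.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for name, events in samples.items():
        shuffled = list(events)  # copy, so the input list is not modified
        rng.shuffle(shuffled)
        n_train = int(round(train_fraction * len(shuffled)))
        train[name] = shuffled[:n_train]
        test[name] = shuffled[n_train:]
    return train, test


def z_asimov(s, b):
    """Z = sqrt(2 * ((s + b) * log(1 + s/b) - s)), valid for b > 0 and s >= 0."""
    if b <= 0:
        raise ValueError("background yield must be positive")
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def z_simple(s, b):
    """Z = s / sqrt(s + b), shown only for comparison."""
    return s / math.sqrt(s + b)


if __name__ == "__main__":
    # Placeholder yields, for illustration only (not numbers from the analysis).
    s, b = 10.0, 100.0
    print(f"{'s':>8} {'b':>8} {'Z':>8} {'s/sqrt(s+b)':>12}")
    print(f"{s:8.1f} {b:8.1f} {z_asimov(s, b):8.3f} {z_simple(s, b):12.3f}")
```

With weighted MC the split is still done per sample, but s and b in the significance should be the sums of event weights in the test set, rescaled by 1/(1 - train_fraction) to recover the full expected yield.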

correlation removal:

1, When two variables are strongly correlated, should we keep the one with the clearer physics meaning and motivation? This is related to 3) of the previous section.

2, You can try to further reduce the number of variables, since we do not see any loss from Step1 to Step3.
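
One possible way to combine the two points above is to rank the variables by physics motivation and drop a variable only if it is strongly correlated with one already kept. The sketch below is an assumption about how this could be done, not a description of your procedure: prune_correlated, the priority list, the variable names and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation of two equal-length lists.

    Assumes neither list is constant; a zero variance would divide by zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(columns, priority, threshold=0.9):
    """Return the variables to keep, in priority order.

    `columns` maps a variable name to its list of values.
    `priority` lists the names from most to least physics-motivated
    (a hand-made ranking, which is an assumption of this sketch).
    A variable is dropped if |corr| with any already kept variable
    exceeds `threshold`.
    """
    kept = []
    for name in priority:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The reduced list should then be retrained and compared with the full list using the same significance definition, to confirm that nothing is lost.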

iteration removal:

1, This is not high priority, but it is good to try after finishing the baseline optimization. You may also want to optimize the BDT configuration in the future.