> If the input data set is not carefully screened then obviously it will contain biases, which in turn will influence the models. Screening of the input dataset might not be an option because of the sheer volume of data required.
If you intentionally filter the input data to get a particular result, then you'll get the result you wanted; that's a tautology. However, the resulting model will be useless.
The correct procedure is not to screen the input data at all. On the contrary, you should make your data collection procedure as thorough and unbiased as possible, so that it doesn't unintentionally introduce bias. If the volume of data is large enough, random outliers will average out, and your model then has the best chance of matching reality.
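To make the distinction concrete, here is a rough sketch in Python. Everything in it is an assumption made up for illustration: the "population" is a normal distribution with mean 10 and standard deviation 5, and the "screening" rule keeps only values above 8. It compares the mean of an unscreened sample with the mean of the same sample after that filter.

```python
import random
import statistics

TRUE_MEAN = 10.0   # assumed population mean (illustrative only)
TRUE_STDEV = 5.0   # assumed population spread (illustrative only)
CUTOFF = 8.0       # hypothetical screening rule: keep values above this


def simulate(n, seed=0):
    """Return (unscreened_mean, screened_mean) for a sample of size n."""
    rng = random.Random(seed)
    sample = [rng.gauss(TRUE_MEAN, TRUE_STDEV) for _ in range(n)]

    # No screening: individual outliers are diluted as n grows.
    unscreened_mean = statistics.fmean(sample)

    # Intentional screening: the filter itself shifts the result,
    # and collecting more data does not undo that shift.
    kept = [x for x in sample if x > CUTOFF]
    screened_mean = statistics.fmean(kept)

    return unscreened_mean, screened_mean


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        unscreened, screened = simulate(n)
        print(f"n={n:>9}: unscreened={unscreened:.3f}  screened={screened:.3f}")
```

Under these assumptions, the unscreened mean should drift toward 10 as `n` grows, while the screened mean stays well above 10 regardless of sample size. More data shrinks random error, but it does nothing about a bias built into the selection step.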