I have been leading the design and development of my company's healthcare LLM, which is intended to serve a diverse set of customers. When training a machine learning model, it is very important to pay heed to data biases, as they can lead to the model producing unethical and harmful outputs. Safety of the outputs and removal of biases have been at the forefront of my work. This involves approaches such as smart sampling of the training data to mitigate gender and racial biases, filtering out harmful text, and applying ML-based de-identification techniques. It is also important to keep the language appropriate by eliminating harsh phrasing. All of the above can be achieved with state-of-the-art machine learning tools, and I believe this should be built as a core component of any pipeline - something I prioritized even when it meant more work.
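To make the sampling and filtering steps concrete, here is a minimal sketch in Python. The record fields, the `balanced_sample` helper, and the word blocklist are all hypothetical illustrations, not the actual pipeline; a production system would use trained safety classifiers and de-identification models rather than a keyword list.

```python
import random
from collections import defaultdict

def balanced_sample(records, group_key, n_per_group, seed=0):
    """Draw an equal number of records from each demographic group.

    Undersamples over-represented groups so the training mix is balanced.
    `records` are dicts; `group_key` names the demographic field.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for r in records:
        buckets[r[group_key]].append(r)
    sample = []
    for group, items in sorted(buckets.items()):
        k = min(n_per_group, len(items))
        sample.extend(rng.sample(items, k))
    return sample

# Hypothetical blocklist standing in for a harmful/harsh-language filter.
BLOCKLIST = {"idiot", "stupid"}

def is_clean(text):
    """Keep only records whose text contains no blocklisted words."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not (words & BLOCKLIST)

# Toy dataset (invented for illustration).
records = [
    {"text": "Patient reports mild headache.", "gender": "F"},
    {"text": "Follow-up visit scheduled.", "gender": "F"},
    {"text": "Blood pressure within normal range.", "gender": "F"},
    {"text": "Prescribed physical therapy.", "gender": "M"},
    {"text": "That advice was stupid.", "gender": "M"},
]

clean = [r for r in records if is_clean(r["text"])]
balanced = balanced_sample(clean, "gender", n_per_group=1)
print(len(clean), len(balanced))  # prints "4 2"
```

The filter runs first so that harmful text never reaches the sampler, and the sampler then equalizes group counts by undersampling, one simple way to keep a demographic majority in the raw data from dominating training.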