If you have to deal with uploads from real-life users, you run into encoding issues from time to time. This is especially pernicious with CSV files, which Excel loves to export with strange, Windows-specific Latin codecs.
What’s with the garbled characters? What the heck darn kind of outfit are you running here?
^ Feedback from a client, probably
The chardet library can help here. Install it with pip:

    pip install chardet
chardet looks at the data you feed it and makes an educated guess about the codec. Here’s a very basic codec guessing function:
    from chardet.universaldetector import UniversalDetector

    DEFAULT_ENCODING = "utf-8"


    def guess_codec(file_name: str) -> str:
        codec_detector = UniversalDetector()
        # Read raw bytes; the detector works on undecoded data.
        with open(file_name, mode="rb") as file:
            while True:
                line = file.readline()
                if not line:
                    break
                codec_detector.feed(line)
                # Stop early once the detector is confident enough.
                if codec_detector.done:
                    break
        result = codec_detector.close()
        encoding = result.get("encoding")
        # chardet may report None if it cannot decide; fall back to UTF-8.
        return encoding or DEFAULT_ENCODING
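A guess is only a guess, so it can be worth guarding the code that actually reads the file. Here's a minimal sketch of one way to do that, using only the standard library. The helper name `read_csv_rows` and its `fallback` parameter are made up for this example and are not part of chardet. It tries the guessed codec first, and if that codec is unknown to Python or fails to decode the file, it retries with the fallback and replaces undecodable bytes instead of crashing.

```python
import csv
from typing import List


def read_csv_rows(file_name: str, encoding: str, fallback: str = "utf-8") -> List[List[str]]:
    """Read every row of a CSV file using a guessed encoding.

    Hypothetical helper: if the guessed codec is unknown or cannot decode
    the file, retry with `fallback` and replace bad bytes with U+FFFD.
    """
    try:
        # newline="" is what the csv module documentation recommends.
        with open(file_name, mode="r", encoding=encoding, newline="") as file:
            return list(csv.reader(file))
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python does not know the codec name.
        # UnicodeDecodeError: the guess was wrong for this file.
        with open(
            file_name, mode="r", encoding=fallback, errors="replace", newline=""
        ) as file:
            return list(csv.reader(file))
```

Combined with the function above, a call might look like `rows = read_csv_rows(path, guess_codec(path))`. Replacing bad bytes loses information, so whether that beats rejecting the upload outright depends on your outfit.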
I understand that if you’re dealing with user uploads, there’s a pretty tight ceiling on your happiness, but I hope this helps in a marginal way.