10 Python Data Science Libraries for 2024

Matthew Kolakowski
2 min read · Nov 19, 2023

As of November 15, 2023, approximately 137,000 Python libraries are available. Here are 10 Python libraries for data science that you can use in personal projects and that may also be helpful in production. Note: In production data science, absolutes don’t exist, as experiences vary by industry and organization.

1. Nixtla Statsforecast: If you want to add time series forecasting projects to your portfolio, Statsforecast should be one of your first options- https://lnkd.in/gp5qfryT

2. PiML (Python Interpretable Machine Learning): PiML can help you better understand your machine learning models. It is currently in production at a Fortune 50 company- https://lnkd.in/g9DKR7XG

3. Pytimetk: Matt Dancho and the Business Science team created an accessible and functional time series Python package- https://lnkd.in/gZG3cZaF

4. CatBoost: If your work leans heavily on ranking and classification tasks, this gradient boosting package is known for best-in-class performance- https://lnkd.in/g3wderyd

5. MAPIE (Model Agnostic Prediction Interval Estimator): Uncertainty quantification made accessible regardless of model- https://lnkd.in/g8dWv6H7

6. PySpark: PySpark is the Python API for Apache Spark, a widely used engine in data science. If you are not familiar with Spark, start learning- https://lnkd.in/gFSzDkF6

7. Boto3 (Amazon Web Services Software Development Kit for Python): If you want to leverage S3 (object storage) and EC2 (resizable compute instances), this library is a must- https://lnkd.in/gznksyjz

8. Sphinx: Having clear and concise code documentation is crucial for helping team members unfamiliar with the project understand the code. Sphinx helps automate the documentation creation process- https://lnkd.in/gKeetS_Q

9. Airflow: If you need to develop, schedule, and monitor batch-oriented workflows, Airflow is likely the tool you will use- https://lnkd.in/gQi53sBt

10. PyTorch: If you’re interested in deep learning and neural networks, PyTorch is a popular Python tool. It has many use cases and is widely used across the data science production ecosystem- https://lnkd.in/g-aVEjNA
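
To make item 8 a little more concrete, here is a minimal sketch of the kind of docstring Sphinx can turn into documentation. The function `mean_absolute_error` is a hypothetical example, not taken from any of the libraries above; the `:param:`, `:returns:`, and `:raises:` fields follow the reStructuredText field-list style that Sphinx understands.

```python
def mean_absolute_error(actual, predicted):
    """Return the mean absolute error between two equal-length sequences.

    Hypothetical helper, used only to illustrate a Sphinx-friendly docstring.

    :param actual: Observed values.
    :type actual: list[float]
    :param predicted: Forecast values, in the same order as ``actual``.
    :type predicted: list[float]
    :returns: Average of the absolute differences.
    :rtype: float
    :raises ValueError: If the sequences are empty or differ in length.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Sum the absolute errors pairwise, then divide by the number of pairs.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Assuming the function lives in a module named `mymodule` (a placeholder name) and the `sphinx.ext.autodoc` extension is enabled in `conf.py`, a directive such as `.. autofunction:: mymodule.mean_absolute_error` would pull this docstring into the generated pages.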

--

Matthew Kolakowski

I am a data science professional with 15 years of experience in analytics and technical project management in the federal and private sectors.