What is Python
Python is a high-level, general-purpose programming language known for its clean and readable syntax. It was created by Guido van Rossum and first released in 1991, and has since grown into one of the most widely used languages in the world.
Why Python?
- Readable: Python code reads almost like plain English, which makes it easier to learn and maintain.
- Versatile: Python is used across web development, automation, data engineering, data science, machine learning, and more.
- Large Ecosystem: Beyond the standard library, hundreds of thousands of third-party packages on the Python Package Index (PyPI) extend what Python can do out of the box.
- In Demand: Python consistently ranks as one of the most sought-after skills in data and engineering roles.
Python in Data Roles
For data engineers, analysts, and scientists, Python is a core tool. It is used to:
- Read, clean, and transform data
- Automate repetitive tasks and pipelines
- Interact with APIs and databases
- Build and train machine learning models
- Create visualisations and reports
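As a rough illustration of the first item in the list above (reading, cleaning, and transforming data), here is a minimal sketch that uses only the standard library. The sample data, the column names, and the `clean_rows` helper are all invented for this example; real work in this area often relies on third-party libraries instead.

```python
import csv
import io

# Hypothetical raw data; in practice this would come from a file, API, or database.
RAW_CSV = """name,city,amount
 Alice ,London,10.5
Bob,,7
 carol,Paris,not_a_number
"""


def clean_rows(text):
    """Read CSV text, tidy each row, and skip rows with an invalid amount."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed as a number
        cleaned.append({
            "name": row["name"].strip().title(),       # trim spaces, fix casing
            "city": row["city"].strip() or "Unknown",  # fill in missing cities
            "amount": amount,
        })
    return cleaned


rows = clean_rows(RAW_CSV)
total = sum(r["amount"] for r in rows)
print(rows)
print(total)
```

The third row is dropped because its amount is not a number, and the missing city for Bob is replaced with a placeholder. Deciding whether to skip, fix, or flag bad rows is a judgement call that depends on the data.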
Python Versions
Python 2 reached end-of-life on 1 January 2020 and should not be used. This course uses Python 3, specifically 3.10 or higher. You will see both python and python3 used as the command name. In this course both refer to Python 3, and the setup guides explain which one applies on your system.
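If you are unsure which version you are running, one way to check from inside Python is sketched below. The `meets_minimum` helper is a hypothetical name introduced for this example; `sys.version_info` is part of the standard library.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

if not meets_minimum():
    print("This course assumes Python 3.10 or higher.")
```

Comparing tuples works here because Python compares them element by element, so (3, 9) counts as lower than (3, 10).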
How Python Runs
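Python is an interpreted language. Rather than being compiled ahead of time into a standalone binary, your code is executed by the Python interpreter at runtime, statement by statement. (Under the hood, the standard interpreter, CPython, first translates the source into bytecode and then executes that.) You write a .py file, hand it to the Python interpreter, and it runs.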
your_script.py → Python interpreter → output
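As a concrete version of the flow above, a minimal script might look like the sketch below. The file name your_script.py comes from the diagram; the `greet` function is invented for illustration.

```python
# your_script.py


def greet(name):
    """Build a greeting for the given name."""
    return f"Hello, {name}!"


# This block runs only when the file is executed directly,
# not when it is imported from another file.
if __name__ == "__main__":
    print(greet("world"))
```

Saving this file and running `python your_script.py` (or `python3 your_script.py`, depending on your setup) hands it to the interpreter, which prints the greeting. There is no separate build step between editing the file and running it.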
This makes Python quick to write and test, at the cost of being slower than compiled languages like C or Go for heavy computation. In practice, this rarely matters for the work covered in this course.
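To get a feel for this trade-off, the sketch below times an explicit Python loop against the built-in `sum()`, which in CPython runs its loop in C. The `loop_sum` helper and the input size are arbitrary choices for this example, and the timings will vary from machine to machine, so treat the result as indicative only.

```python
import timeit


def loop_sum(values):
    """Add numbers with an explicit Python-level loop."""
    total = 0
    for value in values:
        total += value
    return total


numbers = list(range(100_000))

# Both approaches must give the same answer.
assert loop_sum(numbers) == sum(numbers)

# Time 20 runs of each approach.
loop_seconds = timeit.timeit(lambda: loop_sum(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: sum(numbers), number=20)

print(f"Python loop:  {loop_seconds:.4f} s")
print(f"Built-in sum: {builtin_seconds:.4f} s")
```

This is also part of why the speed difference rarely matters in practice: many of the libraries used in data work do their heavy computation in compiled code, with Python acting as the layer that ties the steps together.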
Practice Exercises
- In your own words, write down one reason why Python is popular in data engineering.
- Look up one Python library used in data work that interests you and note what it does.