Collection of Hands-on Exercises to Get Started with Apache Iceberg

#apacheiceberg #python #sql #database

Don't miss out on several great talks on Apache Iceberg at the Subsurface Conference on May 2nd and 3rd, 2024. Register now for free.

Apache Iceberg is an open data lakehouse table format designed to change how you manage large-scale data across storage layers such as Hadoop (HDFS) and AWS S3. By letting you treat the files in these storage systems as tables in one shared, database-like layer, Apache Iceberg integrates with numerous tools, platforms, and interfaces, improving both flexibility and accessibility.

One of the standout features of Apache Iceberg is its support for ACID transactions, which protect data integrity by guaranteeing atomicity, consistency, isolation, and durability in data processing operations. Apache Iceberg's time travel capability also lets users query historical versions of a table, which helps them understand how the data changed over time. The format supports schema evolution and partition evolution without downtime or rewriting existing data, which simplifies managing data as it grows and changes.
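To make snapshots and time travel concrete, here is a minimal pure-Python sketch. It is a conceptual model only, not Iceberg's implementation or API: `ToyTable`, `commit`, and `scan` are hypothetical names, and a real Iceberg table tracks its data files through metadata files and manifests rather than an in-memory dict. What it does show is the core idea: every commit produces a new immutable snapshot, the "current" pointer moves in a single step, and older snapshots stay readable.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable record of which data files make up the table at one point in time."""
    snapshot_id: int
    parent_id: Optional[int]
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical in-memory table whose history is a chain of snapshots."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None
    next_id: int = 1

    def commit(self, added: Sequence[str] = (), removed: Sequence[str] = ()) -> int:
        """Build a new snapshot from the current one, then move the pointer in one step."""
        base = self.scan()
        dropped = set(removed)
        files = tuple(f for f in base if f not in dropped) + tuple(added)
        snap = Snapshot(self.next_id, self.current_id, files)
        self.snapshots[snap.snapshot_id] = snap
        # The only visible change is this single assignment: readers see the old
        # snapshot or the new one, never a half-written mix of files.
        self.current_id = snap.snapshot_id
        self.next_id += 1
        return snap.snapshot_id

    def scan(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        """Return the live files of the current snapshot, or of an older one (time travel)."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        if sid is None:
            return ()  # empty table, nothing committed yet
        return self.snapshots[sid].data_files


table = ToyTable()
s1 = table.commit(added=["a.parquet"])
s2 = table.commit(added=["b.parquet"])
s3 = table.commit(added=["c.parquet"], removed=["a.parquet"])  # e.g. rows in a.parquet rewritten

print(table.scan())    # ('b.parquet', 'c.parquet')
print(table.scan(s1))  # ('a.parquet',)
```

Because old snapshots are never modified in place, time travel in this model is simply a read against an older snapshot ID.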

Apache Iceberg also includes a host of other advanced features that improve data usability and management. These include snapshot isolation for concurrent reads and writes, row-level upserts and deletes, and efficient handling of large-scale metadata.
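Concurrent writers in Iceberg rely on optimistic concurrency: a writer prepares its changes against the snapshot it read, then commits by atomically swapping the table's metadata pointer, and the swap fails if another writer got there first. The sketch below models only that compare-and-swap step in plain Python. `ToyCatalog`, `swap`, `CommitConflict`, and `commit_with_retry` are hypothetical names, not part of any Iceberg library, and the retry loop skips the conflict validation a real engine would perform.

```python
import threading
from typing import Dict


class CommitConflict(Exception):
    """Raised when another writer moved the table pointer first."""


class ToyCatalog:
    """Hypothetical catalog mapping a table name to its current metadata version."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._lock = threading.Lock()

    def current(self, table: str) -> int:
        with self._lock:
            return self._versions.get(table, 0)

    def swap(self, table: str, expected: int, new: int) -> None:
        """Compare-and-swap: succeed only if the version is still the one the writer read."""
        with self._lock:
            found = self._versions.get(table, 0)
            if found != expected:
                raise CommitConflict(f"{table}: expected v{expected}, found v{found}")
            self._versions[table] = new


def commit_with_retry(catalog: ToyCatalog, table: str, max_attempts: int = 3) -> int:
    """Re-read the latest version and try again on conflict.

    A real engine would also re-check that its changes still apply on top of
    the newer snapshot; this sketch skips that validation.
    """
    for _ in range(max_attempts):
        base = catalog.current(table)
        try:
            catalog.swap(table, base, base + 1)
            return base + 1
        except CommitConflict:
            continue
    raise CommitConflict(f"{table}: gave up after {max_attempts} attempts")


catalog = ToyCatalog()
base_a = catalog.current("db.orders")  # writer A reads version 0
base_b = catalog.current("db.orders")  # writer B also reads version 0
catalog.swap("db.orders", base_a, base_a + 1)  # A commits first -> version 1

try:
    catalog.swap("db.orders", base_b, base_b + 1)  # B's view is stale
except CommitConflict as err:
    print("writer B must retry:", err)  # writer B must retry: db.orders: expected v0, found v1

print(commit_with_retry(catalog, "db.orders"))  # 2
```

Readers are unaffected by this race: each one keeps reading whichever snapshot was current when it started, which is what snapshot isolation means here.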

To help you harness the full potential of Apache Iceberg, our latest blog series gathers a collection of straightforward, hands-on tutorials. These guides are designed to provide you with practical experience in working with Apache Iceberg, from setting up your first table to executing complex data operations. Whether you're a data scientist, engineer, or analyst, these tutorials offer a valuable opportunity to enhance your data handling skills and leverage Apache Iceberg’s powerful features in your projects.

Self-Contained Exercises

These exercises are completely self-contained and can be done from your laptop; all infrastructure is spun up as Docker containers.

Require Online Services

These tutorials require signing up for services such as Upsolver, Dremio, AWS, and others.

Video Playlists:

In summary, Apache Iceberg is more than just a data storage format; it is a comprehensive solution that adapts to the complexities of modern data environments, ensuring robust data integrity, flexibility, and scalability. Whether you are just starting out or looking to enhance your existing data infrastructure, the range of tutorials and exercises we've compiled provides valuable insights and practical experience with Apache Iceberg. Don’t forget to mark your calendar for the Subsurface Conference on May 2nd and 3rd, 2024, where you can dive deeper into Apache Iceberg through engaging talks and expert discussions. Register now for free and take the next step in mastering your data lakehouse capabilities with Apache Iceberg.

DEV Community