AWS Glue upgrades Spark engines, backs Ray framework

AWS Glue, the serverless data integration service from Amazon Web Services, has upgraded its Python and Apache Spark engines in version 4.0, released this week.

The upgrade adds engines for Python 3.10 and Apache Spark 3.3.0. Both engines include performance enhancements and bug fixes, with Spark offering capabilities such as row-level runtime filtering and improved error messages.

New engine plugins in Glue 4.0 support the Ray compute framework, the Cloud Shuffle Service for Spark, and Adaptive Query Execution. The release also supports Pandas, the Python-based data analysis and manipulation library. New data format support covers Apache Hudi, Apache Iceberg, and Delta Lake. Glue 4.0 also includes the Parquet vectorized reader, with support for additional encodings and data types.
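For readers who want to see how the Spark features named above surface in practice, here is a minimal sketch of the corresponding Spark SQL settings. The three keys are standard Apache Spark 3.3 configuration properties; whether a Glue 4.0 job needs, or permits, setting them explicitly is an assumption not covered here. In stock Spark 3.3, Adaptive Query Execution and the Parquet vectorized reader are on by default, while Bloom-filter runtime filtering is off. The `apply_settings` helper is hypothetical and only illustrates the pattern.

```python
# Spark SQL settings related to features mentioned in the article.
# Assumption: these are plain Apache Spark 3.3 properties; a Glue job may
# manage some of them itself.
SPARK_SETTINGS = {
    # Adaptive Query Execution: re-optimizes query plans at runtime.
    "spark.sql.adaptive.enabled": "true",
    # Row-level runtime filtering via Bloom filters (added in Spark 3.3).
    "spark.sql.optimizer.runtime.bloomFilter.enabled": "true",
    # Vectorized (batch) decoding of Parquet columns.
    "spark.sql.parquet.enableVectorizedReader": "true",
}


def apply_settings(builder, settings):
    """Hypothetical helper: pass each key/value to a builder's config()."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, the helper could be used as `apply_settings(SparkSession.builder, SPARK_SETTINGS).getOrCreate()`, since `SparkSession.builder.config(key, value)` returns the builder for chaining.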

AWS Glue provides data discovery, data preparation, data transformation, and data integration capabilities, with autoscaling based on workload size. AWS said Glue also now offers visual transforms that let customers reuse and share business-specific ETL logic across teams.

AWS also announced a preview of AWS Glue for Ray as a new engine option. Data engineers can use AWS Glue for Ray to process large data sets with Python and popular Python libraries, with Ray distributing the Python code across multi-node clusters.

Glue 4.0 is available now in several US regions, including Ohio, Northern Virginia, and Northern California.

Copyright © 2022 IDG Communications, Inc.

Jennifer R. Kelley
