Why Scala?

Scala gives you the feel of an interpreted language, but it is a compiled language: everything you type is compiled to bytecode and runs inside the JVM. It is often chosen for very high-performance systems.

James Gosling, the father of Java, says, “If I were to select a language to use today other than Java, it would be Scala.”

Scala was first introduced in 2003 as an attempt to create a “better Java”: a language that combines the object-oriented nature of Java with a functional programming model. Scala code can also be compiled into an intermediate form for other platforms (for example, JavaScript via Scala.js) and then executed in non-JVM environments. Beyond being a cross-platform language, Scala is also interoperable with Java code.
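To make the mix of object-oriented and functional styles concrete, here is a minimal sketch. The names (ShapesDemo, Shape, totalArea, largerThan) are made up for illustration and do not come from any library.

```scala
object ShapesDemo {
  // Object-oriented side: a small type hierarchy with a shared interface.
  sealed trait Shape { def area: Double }

  final case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }

  final case class Rectangle(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: immutable lists transformed with higher-order functions.
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  def largerThan(shapes: List[Shape], minArea: Double): List[Shape] =
    shapes.filter(_.area > minArea)
}
```

The classes carry the data and behavior, while map and filter express the processing without loops or mutable variables.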

But, for real: why Scala?

The Scala programming language has become very popular, especially in the field of cloud computing. The functional side of Scala makes it well suited to creating domain-specific languages (DSLs). Beyond that, Scala’s interoperability with Java, coupled with the functional and scalable nature of the language, has made Scala a fast-growing programming language.
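As a rough illustration of the DSL point, the sketch below builds a tiny internal DSL for durations. Everything in it (DurationDsl, Minutes, IntUnits, meeting) is a hypothetical name chosen for this example, not part of the standard library.

```scala
object DurationDsl {
  // The value the DSL produces: a duration expressed in minutes.
  final case class Minutes(value: Int) {
    def +(other: Minutes): Minutes = Minutes(value + other.value)
  }

  // An implicit class adds unit-like methods to plain Ints.
  implicit class IntUnits(n: Int) {
    def minutes: Minutes = Minutes(n)
    def hours: Minutes = Minutes(n * 60)
  }

  // Reads close to plain English, yet it is ordinary, type-checked Scala.
  val meeting: Minutes = 1.hours + 30.minutes
}
```

Because operators are just methods and extension methods can be added to existing types, expressions such as 1.hours + 30.minutes need no special parser.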

Scala was designed to improve on the Java language. Scala code is more expressive and concise than Java code; Java is often verbose, which can make the programming process feel overwhelming. Scala has been designed to express common programming patterns in an elegant and type-safe manner. Java’s generics can easily be circumvented (for example, through raw types), whereas Scala’s type system is stricter and its syntax is arguably easier to read and understand.
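A small sketch of what “concise and type-safe” can look like in practice. The names (UserDemo, User, findUser, describe) are invented for this example.

```scala
object UserDemo {
  // One line gives a constructor, accessors, equals, hashCode, toString and copy.
  final case class User(name: String, age: Int)

  // Option puts "may be missing" into the type instead of relying on null.
  def findUser(users: List[User], name: String): Option[User] =
    users.find(_.name == name)

  // Pattern matching handles the found and not-found cases explicitly.
  def describe(result: Option[User]): String = result match {
    case Some(User(name, age)) => s"$name is $age years old"
    case None                  => "no such user"
  }
}
```

An equivalent Java class written by hand would typically need explicit fields, getters, equals, hashCode and toString.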

Frameworks

Akka

Akka is a toolkit that runs on the Java Virtual Machine and is used to build highly concurrent, distributed, and fault-tolerant applications in both Scala and Java. It uses the actor model to manage concurrency in applications.
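The sketch below illustrates the idea behind the actor model using only the Scala and Java standard libraries. It is not Akka’s API: CounterActor, tell, Increment and GetCount are hypothetical names for this toy example. The point is that the state is private to the actor and messages are handled one at a time, so no locks are needed.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object ActorSketch {
  // Messages the toy actor understands.
  sealed trait Message
  case object Increment extends Message
  final case class GetCount(replyTo: Promise[Int]) extends Message

  // Toy actor: its state is only ever touched by its single mailbox thread.
  final class CounterActor {
    private val mailbox = Executors.newSingleThreadExecutor()
    private var count = 0

    // Enqueue a message; messages are processed one at a time, in order.
    def tell(message: Message): Unit =
      mailbox.execute(new Runnable {
        def run(): Unit = message match {
          case Increment =>
            count += 1
          case GetCount(replyTo) =>
            replyTo.success(count)
            ()
        }
      })

    def stop(): Unit = mailbox.shutdown()
  }

  // Send increments from several threads, then ask the actor for the total.
  def run(senders: Int, perSender: Int): Int = {
    val actor = new CounterActor
    val threads = (1 to senders).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perSender).foreach(_ => actor.tell(Increment))
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())

    val reply = Promise[Int]()
    actor.tell(GetCount(reply))
    val total = Await.result(reply.future, 5.seconds)
    actor.stop()
    total
  }
}
```

Akka layers supervision, distribution across machines, and much more on top of this basic message-passing idea.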

Scalatra

Scalatra is a free and open-source web application framework written in Scala. Scalatra is an alternative to the Lift, Play!, and Unfiltered frameworks.

Spark

Spark is written in Scala, and it also provides APIs for Java and Python. However, Spark does not offer an interactive shell (REPL, Read-Evaluate-Print Loop) for Java, and Python is generally slower than Scala. Most features become available in Scala first and only later in the Python port. Spark users are strongly encouraged to pick up some Scala.

Recommended Reading

JustEnoughScalaForSpark: Git Tutorial, YouTube Presentation, Slides

Making ZIO, Akka and Slick play together nicely

Learning Materials

Scala for the Impatient, a book by Cay S. Horstmann

Coursera – Functional Programming in Scala Specialization

Data Engineering Training Resources

This list highlights some training resources for data engineering. These are resources that have been shared with me, and they could be of benefit to people who want to get training and land a job as a data engineer.

(Recommended) Coursera – Data Engineering with Google Cloud
  • This advanced certification program is designed to help you learn the skills that you need to improve your career in data engineering.
  • In this program, you will get additional training to prepare you for the industry-recognized Google Cloud Professional Data Engineer certification.
  • The program includes a combination of presentations, demos, and labs that are designed to help you understand the core concepts more clearly.
  • During the program, you will learn to make data-driven decisions by collecting, transforming, and publishing data, and you will gain real-world experience through a number of hands-on Qwiklabs projects.
  • Get the opportunity to practice essential job skills, such as designing, building, and running data processing systems and operationalizing machine learning models.
(Recommended) Udacity – Data Engineering Nanodegree Certification
  • Teaches you how to engineer data and extract useful information from it.
  • You will learn about the techniques to design a data model, build warehouses, automate the processing and handle various scales of information.
  • Use NoSQL, PostgreSQL and Apache Cassandra to create databases and models.
  • Explore how cloud-based warehouses are built and how they function.
Coursera – Become a Data Engineer
  • Learn how to become a data engineer on Coursera.
  • No experience is required to begin your learning.
  • Follow a step-by-step plan based on the relevant recommendations.
  • Improve the business value of your company by building data models, database systems and using business intelligence tools.
Coursera – Data Engineering, Big Data on Google Cloud Platform
  • This comprehensive specialization offered by Google Cloud is designed to provide you with practical knowledge of data processing systems on GCP.
  • Throughout the classes, you will learn how to design the systems first before going ahead with the development process.
  • Apart from this, you will also analyze both structured and unstructured data, implement autoscaling, and apply ML techniques to extract information.
  • If you are interested in getting better at machine learning, don’t forget to check out some of the Top Machine Learning Courses.
  • BigQuery is used to draw insights from large datasets after they are transformed, cleansed, and validated.
LinkedIn Learning (Lynda) – Become a Data Engineer: Mastering the Concepts
  • In this learning path, you will explore all the essential concepts and gain the skills required to apply them in real-world situations and pursue a career in this field.
  • Begin with the foundational training that will acquaint you with the necessary technical jargon and concepts before moving on to databases that can be used to store and manage any scale of data.
  • Once you are done with these fundamental concepts, you can explore the various tools and open-source software that show you how to architect big data applications, build data pipelines, and handle real-time apps using Hazelcast and Apache Spark, to name a few crucial topics.
  • Understand how to perform core data engineering tasks such as staging, cleansing, and migrating data.
  • Lectures and exercises can be accessed both online and offline.
Coursera – Big Data for Data Engineers Certification
  • If you are interested in jump-starting a career in an in-demand role such as data analyst, data scientist, or data engineer, then this is the program for you.
  • The classes explore topics such as Hadoop, MapReduce, and Spark, accompanied by practical assignments.
  • Once you have built a strong foundation you can move on to data processing in real-time and applying machine learning on a large scale.
  • The curriculum is designed in such a way that by the time you end the specialization you will not only have the theoretical knowledge to take on more advanced classes but also some experience with the relevant tools and software.
  • The specialization consists of four concise courses with increasing levels of difficulty.
edX – Data Engineering Courses
  • This e-learning platform has compiled a series of programs that will make you familiar with this field and guide you in your journey to design analytical solutions.
  • The options are categorized based on the level of difficulty so that you can choose one according to your current experience level.
  • Some of the bestsellers include strategies to transform your business, analytics using Spark, and enterprise data management.
  • Choose from individual courses, MicroMasters programs, and professional certifications.
  • Key tools used in the lectures include Spark, Hadoop, and Azure.
DataQuest – Data Engineer Path
  • If you have prior experience in Python and want to upgrade your knowledge to build a career as a data engineer then this path is worth a look.
  • Understand how to build data pipelines using Python and pandas.
  • Load large datasets into a Postgres database after cleansing, transforming, and validating them.
  • Once you are done with this you can cover the different algorithms and data structures that can make the analysis process faster and give better results.
  • Understand how data is processed in batches and augment pandas with SQLite.
Cognitive Class – Data Science and Cognitive Computing Courses
  • Learn how to query, summarize, and analyze large data sets stored in Hadoop compatible file systems.
  • In this course, you’ll learn how to use Hadoop and other open-source tools to analyze big data, and you’ll learn about IBM’s suite of products.
  • Use these tools and techniques to understand how to search, analyze, and store big data in the Big Data era.
  • This course will help you understand how IBM’s Big Data technology is used to store, manipulate, and retrieve data.