Posts

Showing posts from September, 2020

Azure Data Flow - Cloud based ETL

Data Flow is an Azure feature that integrates with Data Factory to provide GUI-based components for building complex transformation logic. Data Flows are executed as activities within Azure Data Factory pipelines. Their purpose is to transform large volumes of data without writing code. Behind the scenes, a Data Flow runs on a Spark cluster for scaled-out data processing.

One of our clients required a visual, step-by-step approach to ETL on their data, which a stored procedure could not have provided. Data Flow, integrated with Azure Data Factory, let us build such an ETL pipeline while still processing the data quickly.

A few of the available Data Flow transformations are:

Joins – join data from two streams
Conditional Splits – split the data based on a particular condition
Union – functions similar to SQL union
Lookup – look up values from other streams
Derived Columns – create new columns using
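
For readers who think in code, the transformations listed above can be roughly sketched in plain Python. This is only an analogy: Data Flow itself is configured visually and none of this is an Azure API. The function names, the sample rows, and the choice to keep duplicates in the union are all assumptions made for illustration.

```python
# Illustrative analogy only: plain-Python stand-ins for Data Flow transformations.
# Every function name and sample row here is hypothetical, not part of any Azure API.

def join(left, right, key):
    """Inner join: pair rows from two streams that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def conditional_split(rows, predicate):
    """Split one stream into two: rows matching the condition, and the rest."""
    matched = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matched, rest

def union(*streams):
    """Append streams one after another (assumption: duplicates are kept)."""
    return [row for stream in streams for row in stream]

def lookup(rows, reference, key):
    """Add columns from a reference stream when a match exists; keep the row otherwise."""
    index = {r[key]: r for r in reference}
    return [{**index.get(row[key], {}), **row} for row in rows]

def derived_column(rows, name, expression):
    """Create a new column computed from each row's existing values."""
    return [{**row, name: expression(row)} for row in rows]

# Hypothetical sample data.
orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 40},
    {"customer_id": 3, "amount": 90},
]
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]

joined = join(orders, customers, "customer_id")
large, small = conditional_split(joined, lambda r: r["amount"] >= 100)
combined = union(large, small)
enriched = lookup(orders, customers, "customer_id")
with_cents = derived_column(joined, "amount_cents", lambda r: r["amount"] * 100)
```

In a real Data Flow, each of these steps would be a box on the design canvas rather than a function call, and the work would be distributed across the Spark cluster instead of running in a single process.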