Distributed Data Analytics (WT 2017/18) - tele-TASK

Dr. Thorsten Papenbrock

14 Episodes

The free lunch is over! Until around the turn of the century, computer systems became steadily faster without any particular effort, simply because the hardware they ran on increased its clock speed with every new release. This trend has ended: today's CPUs stall at around 3 GHz. The size of modern computer systems in terms of contained processing elements (cores in CPUs/GPUs, CPUs/GPUs in compute nodes, compute nodes in clusters), however, still increases steadily. This has caused a paradigm shift in writing software: instead of optimizing code for a single thread, applications now need to solve their tasks in parallel to achieve noticeable performance gains. Distributed computing, i.e., the distribution of work across (potentially) physically isolated compute nodes, is the most extreme form of parallelization.
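To make the shift from single-threaded to parallel code concrete, here is a minimal sketch (not taken from the lecture) of the basic pattern: split a task into chunks, process the chunks on a pool of workers, and merge the partial results. It uses Python's standard `concurrent.futures` module; the function names `partial_sum` and `parallel_sum_of_squares` are hypothetical. Threads are used only to keep the example simple; in CPython, CPU-bound work would need `ProcessPoolExecutor` (or separate machines, in the distributed case) for an actual speedup because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work unit: sum the squares of one chunk of the input."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Split data into roughly equal chunks, process them concurrently, and merge the results."""
    if not data:
        return 0
    # Ceiling division so that no element is dropped from the last chunk.
    size = -(-len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands one chunk to each worker; sum() merges the partial results.
        total = sum(pool.map(partial_sum, chunks))
    return total


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(parallel_sum_of_squares(numbers))  # prints 338350
```

The same split/process/merge structure underlies distributed systems. There, the chunks are shipped to other compute nodes instead of local threads, which adds the costs of data transfer and failure handling.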

Big Data Analytics is a multi-million dollar market that grows constantly! Data, and the ability to control and use it, is the most valuable asset of today's computer systems. Because data volumes grow so rapidly, and with them the complexity of the questions the data should answer, data analytics, i.e., the ability to extract any kind of information from data, becomes increasingly difficult. As data analytics systems cannot rely on their hardware getting any faster to cope with performance problems, they need to embrace new software trends that let their performance scale with the still increasing number of processing elements.

In this lecture, we take a look at various technologies involved in building distributed, data-intensive systems. We discuss theoretical concepts (data models, encoding, replication, ...) as well as some of their practical implementations (Akka, MapReduce, Spark, ...). Workload distribution is a concept that is useful for many applications; in this lecture, we focus in particular on data analytics.
