Enterprise MLOps Interview-Simon Stiebellehner

Author: Noah Gift September 23, 2022 Duration: 56:16

If you enjoyed this video, here are additional resources to look at:

Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale

Python, Bash, and SQL Essentials for Data Engineering Specialization: https://www.coursera.org/specializations/python-bash-sql-data-engineering-duke

AWS Certified Solutions Architect - Professional (SAP-C01) Cert Prep: 1 Design for Organizational Complexity:
https://www.linkedin.com/learning/aws-certified-solutions-architect-professional-sap-c01-cert-prep-1-design-for-organizational-complexity/design-for-organizational-complexity?autoplay=true

O'Reilly Book: Practical MLOps: https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017

O'Reilly Book: Python for DevOps: https://www.amazon.com/gp/product/B082P97LDW/

O'Reilly Book: Developing on AWS with C#: A Comprehensive Guide on Using C# to Build Solutions on the AWS Platform
https://www.amazon.com/Developing-AWS-Comprehensive-Solutions-Platform/dp/1492095877

Pragmatic AI: An Introduction to Cloud-based Machine Learning: https://www.amazon.com/gp/product/B07FB8F8QP/

Pragmatic AI Labs Book: Python Command-Line Tools: https://www.amazon.com/gp/product/B0855FSFYZ

Pragmatic AI Labs Book: Cloud Computing for Data Analysis: https://www.amazon.com/gp/product/B0992BN7W8

Pragmatic AI Book: Minimal Python: https://www.amazon.com/gp/product/B0855NSRR7

Pragmatic AI Book: Testing in Python: https://www.amazon.com/gp/product/B0855NSRR7

Subscribe to Pragmatic AI Labs YouTube Channel: https://www.youtube.com/channel/UCNDfiL0D1LUeKWAkRE1xO5Q

Subscribe to 52 Weeks of AWS Podcast: https://52-weeks-of-cloud.simplecast.com

View content on noahgift.com: https://noahgift.com/

View content on Pragmatic AI Labs Website: https://paiml.com/

[00:00.000 --> 00:02.260] Hey, three, two, one, there we go, we're live.
[00:02.260 --> 00:07.260] All right, so welcome Simon to Enterprise ML Ops interviews.
[00:09.760 --> 00:13.480] The goal of these interviews is to get people exposed
[00:13.480 --> 00:17.680] to real professionals who are doing work in ML Ops.
[00:17.680 --> 00:20.360] It's such a cutting edge field
[00:20.360 --> 00:22.760] that I think a lot of people are very curious about.
[00:22.760 --> 00:23.600] What is it?
[00:23.600 --> 00:24.960] You know, how do you do it?
[00:24.960 --> 00:27.760] And very honored to have Simon here.
[00:27.760 --> 00:29.200] And do you wanna introduce yourself
[00:29.200 --> 00:31.520] and maybe talk a little bit about your background?
[00:31.520 --> 00:32.360] Sure.
[00:32.360 --> 00:33.960] Yeah, thanks again for inviting me.
[00:34.960 --> 00:38.160] My name is Simon Stiebellehner, or Simon.
[00:38.160 --> 00:40.440] I am originally from Austria,
[00:40.440 --> 00:43.120] but currently working in the Netherlands, in Amsterdam,
[00:43.120 --> 00:46.080] at Transaction Monitoring Netherlands.
[00:46.080 --> 00:48.780] Here I am the lead ML Ops engineer.
[00:49.840 --> 00:51.680] What are we doing at TMNL actually?
[00:51.680 --> 00:55.560] We are a data processing company actually.
[00:55.560 --> 00:59.320] We are owned by the five large banks of Netherlands.
[00:59.320 --> 01:02.080] And our purpose is kind of what the name says.
[01:02.080 --> 01:05.920] We are basically running, specifically, anti-money laundering,
[01:05.920 --> 01:08.040] so anti-money laundering models that run
[01:08.040 --> 01:11.440] on pseudonymized transactions of businesses
[01:11.440 --> 01:13.240] we get from these five banks
[01:13.240 --> 01:15.760] to detect unusual patterns on that transaction graph
[01:15.760 --> 01:19.000] that might indicate money laundering.
[01:19.000 --> 01:20.520] That's in a nutshell what we do.
[01:20.520 --> 01:21.800] So as you can imagine,
[01:21.800 --> 01:24.160] we are really focused on building models
[01:24.160 --> 01:27.280] and obviously ML Ops is a big component there
[01:27.280 --> 01:29.920] because that is really the core of what you do.
[01:29.920 --> 01:32.680] You wanna do it efficiently and effectively as well.
[01:32.680 --> 01:34.760] In my role as lead ML Ops engineer,
[01:34.760 --> 01:36.880] I'm on the one hand the lead engineer
[01:36.880 --> 01:38.680] of the actual ML Ops platform team.
[01:38.680 --> 01:40.200] So this is actually a centralized team
[01:40.200 --> 01:42.680] that builds out lots of the infrastructure
[01:42.680 --> 01:47.320] that's needed to do modeling effectively and efficiently.
[01:47.320 --> 01:50.360] But also I am the craft lead
[01:50.360 --> 01:52.640] for the machine learning engineering craft.
[01:52.640 --> 01:55.120] These are actually in our case, the machine learning engineers,
[01:55.120 --> 01:58.360] the people working within the model development teams
[01:58.360 --> 01:59.360] and cross functional teams
[01:59.360 --> 02:01.680] actually building these models.
[02:01.680 --> 02:03.640] That's what I'm currently doing
[02:03.640 --> 02:05.760] during the evenings and weekends.
[02:05.760 --> 02:09.400] I'm also a lecturer at the University of Applied Sciences, Vienna.
[02:09.400 --> 02:12.080] And there I'm teaching data mining
[02:12.080 --> 02:15.160] and data warehousing to master students, essentially.
[02:16.240 --> 02:19.080] Before TMNL, I was at bold.com,
[02:19.080 --> 02:21.960] which is the largest eCommerce retailer in the Netherlands.
[02:21.960 --> 02:25.040] So people always tend to say it's the Amazon of the Netherlands,
[02:25.040 --> 02:27.560] or of the Benelux, actually.
[02:27.560 --> 02:30.920] It is still the biggest eCommerce retailer in the Netherlands
[02:30.920 --> 02:32.960] even before Amazon actually.
[02:32.960 --> 02:36.160] And there I was an expert machine learning engineer.
[02:36.160 --> 02:39.240] So doing somewhat comparable stuff,
[02:39.240 --> 02:42.440] a bit more still focused on the actual modeling part.
[02:42.440 --> 02:44.800] Now it's really more on the infrastructure end.
[02:45.760 --> 02:46.760] And well, before that,
[02:46.760 --> 02:49.360] I spent some time in consulting, leading a data science team.
[02:49.360 --> 02:50.880] That's actually where I kind of come from.
[02:50.880 --> 02:53.360] I really come from originally the data science end.
[02:54.640 --> 02:57.840] And there I kind of started drifting towards ML Ops
[02:57.840 --> 02:59.200] because we started building out
[02:59.200 --> 03:01.640] a deployment and serving platform
[03:01.640 --> 03:04.440] that would as consulting company would make it easier
[03:04.440 --> 03:07.920] for us to deploy models for our clients
[03:07.920 --> 03:10.840] to serve these models, to also monitor these models.
[03:10.840 --> 03:12.800] And that kind of then made me drift further and further
[03:12.800 --> 03:15.520] down the engineering lane all the way to ML Ops.
[03:17.000 --> 03:19.600] Great, yeah, that's a great background.
[03:19.600 --> 03:23.200] I'm kind of curious in terms of the data science
[03:23.200 --> 03:25.240] to ML Ops journey,
[03:25.240 --> 03:27.720] that I think would be a great discussion
[03:27.720 --> 03:29.080] to dig into a little bit.
[03:30.280 --> 03:34.320] My background is originally more on the software engineering
[03:34.320 --> 03:36.920] side and when I was in the Bay Area,
[03:36.920 --> 03:41.160] I did individual contributor and then ran companies
[03:41.160 --> 03:44.240] at one point and ran multiple teams.
[03:44.240 --> 03:49.240] And then as the data science field exploded,
[03:49.240 --> 03:52.880] I hired multiple data science teams and worked with them.
[03:52.880 --> 03:55.800] But what was interesting is that I found that
[03:56.840 --> 03:59.520] I think the original approach of data science
[03:59.520 --> 04:02.520] from my perspective was lacking
[04:02.520 --> 04:07.240] in that there wasn't really like deliverables.
[04:07.240 --> 04:10.520] And I think when you look at a software engineering team,
[04:10.520 --> 04:12.240] it's very clear there's deliverables.
[04:12.240 --> 04:14.800] Like you have a mobile app and it has to get better
[04:14.800 --> 04:15.880] each week, right?
[04:15.880 --> 04:18.200] Where else, what are you doing?
[04:18.200 --> 04:20.880] And so I would love to hear your story
[04:20.880 --> 04:25.120] about how you went from doing kind of more pure data science
[04:25.120 --> 04:27.960] to now it sounds like ML Ops.
[04:27.960 --> 04:30.240] Yeah, yeah, actually.
[04:30.240 --> 04:33.800] So back then in consulting one of the,
[04:33.800 --> 04:36.200] which was still at least back then in Austria,
[04:36.200 --> 04:39.280] data science and everything around it was still kind of
[04:39.280 --> 04:43.720] in its infancy back then, 2016 and so on.
[04:43.720 --> 04:46.560] It was still really, really new to many organizations,
[04:46.560 --> 04:47.400] at least in Austria.
[04:47.400 --> 04:50.120] We might be some years behind the US and stuff.
[04:50.120 --> 04:52.040] But back then it was still relatively fresh.
[04:52.040 --> 04:55.240] So in consulting, what we very often struggled with was
[04:55.240 --> 04:58.520] on the modeling end, problems could be solved,
[04:58.520 --> 05:02.040] but actually then easy deployment,
[05:02.040 --> 05:05.600] keeping these models in production at client side.
[05:05.600 --> 05:08.880] That was always a bit more of the challenge.
[05:08.880 --> 05:12.400] And so naturally kind of I started thinking
[05:12.400 --> 05:16.200] and focusing more on the actual bigger problem that I saw,
[05:16.200 --> 05:19.440] which was not so much building the models,
[05:19.440 --> 05:23.080] but it was really more, how can we streamline things?
[05:23.080 --> 05:24.800] How can we keep things operating?
[05:24.800 --> 05:27.960] How can we make that move easier from a prototype,
[05:27.960 --> 05:30.680] from a PoC to a productionized model?
[05:30.680 --> 05:33.160] Also how can we keep it there and maintain it there?
[05:33.160 --> 05:35.480] So personally I was really more,
[05:35.480 --> 05:37.680] I saw that this problem was coming up
[05:38.960 --> 05:40.320] and that really fascinated me.
[05:40.320 --> 05:44.120] So I started jumping more on that exciting problem.
[05:44.120 --> 05:45.080] That's how it went for me.
[05:45.080 --> 05:47.000] And back then we then also recognized it
[05:47.000 --> 05:51.560] as a potential product in our case.
[05:51.560 --> 05:54.120] So we started building out that deployment
[05:54.120 --> 05:56.960] and serving and monitoring platform, actually.
[05:56.960 --> 05:59.520] And that then really for me, naturally,
[05:59.520 --> 06:01.840] I fell into that rabbit hole
[06:01.840 --> 06:04.280] and I also never wanted to get out of it again.
[06:05.680 --> 06:09.400] So the system that you built initially,
[06:09.400 --> 06:10.840] what was your stack?
[06:10.840 --> 06:13.760] What were some of the things you were using?
[06:13.760 --> 06:17.000] Yeah, so essentially we had,
[06:17.000 --> 06:19.560] when we talk about the stack on the backend,
[06:19.560 --> 06:20.560] there was a lot of,
[06:20.560 --> 06:23.000] so the full backend was written in Java.
[06:23.000 --> 06:25.560] More from a user perspective,
[06:25.560 --> 06:28.040] the contract that we kind of had,
[06:28.040 --> 06:32.560] our goal was to build a drag and drop platform for models.
[06:32.560 --> 06:35.760] So basically the contract was you package your model
[06:35.760 --> 06:37.960] as an MLflow model,
[06:37.960 --> 06:41.520] and then you basically drag and drop it into a web UI.
[06:41.520 --> 06:43.640] It's gonna be wrapped in containers.
[06:43.640 --> 06:45.040] It's gonna be deployed.
[06:45.040 --> 06:45.880] It's gonna be,
[06:45.880 --> 06:49.680] there will be a monitoring layer in front of it
[06:49.680 --> 06:52.760] based on whatever the dataset is you trained it on.
[06:52.760 --> 06:55.920] You would automatically calculate different metrics,
[06:55.920 --> 06:57.360] different distributional metrics
[06:57.360 --> 06:59.240] around your variables that you are using.
[06:59.240 --> 07:02.080] And so we were layering this approach
[07:02.080 --> 07:06.840] to, so that eventually every incoming request would be,
[07:06.840 --> 07:08.160] you would have a nice dashboard.
[07:08.160 --> 07:10.040] You could monitor all that stuff.
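The automatic distributional checks described here can be sketched in a few lines. Below is a minimal pure-Python example using the Population Stability Index, one common single-variable drift metric; the 0.2 threshold and the toy data are illustrative assumptions, not details of the platform being discussed.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample and live
    values of one variable; a common way to quantify drift. Values
    above roughly 0.2 are often read as significant shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)              # bin index for this value
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range bins
        # floor proportions so the log term below stays defined
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [0.1 * i for i in range(100)]              # training distribution
assert psi(train, train) < 0.01                    # identical traffic: no drift
assert psi(train, [v + 5.0 for v in train]) > 0.2  # shifted traffic: drift
```

A dashboard like the one described would compute such metrics per variable on each batch of incoming requests.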
[07:10.040 --> 07:12.600] So stackwise it was actually MLflow.
[07:12.600 --> 07:15.480] Specifically MLflow models a lot.
[07:15.480 --> 07:17.920] Then it was Java in the backend, Python.
[07:17.920 --> 07:19.760] There was a lot of Python,
[07:19.760 --> 07:22.040] especially PySpark component as well.
[07:23.000 --> 07:25.880] There was, it's been quite a while actually,
[07:25.880 --> 07:29.160] there was quite some part written in Scala.
[07:29.160 --> 07:32.280] Also, because there was a component of this platform
[07:32.280 --> 07:34.800] was also a bit of an auto ML approach,
[07:34.800 --> 07:36.480] but that died then over time.
[07:36.480 --> 07:40.120] And that was also based on PySpark
[07:40.120 --> 07:43.280] and vanilla Spark written in Scala.
[07:43.280 --> 07:45.560] So we could facilitate the auto ML part.
[07:45.560 --> 07:48.600] And then later on we actually added that deployment,
[07:48.600 --> 07:51.480] the easy deployment and serving part.
[07:51.480 --> 07:55.280] So that was kind of, yeah, a lot of custom build stuff.
[07:55.280 --> 07:56.120] Back then, right?
[07:56.120 --> 07:59.720] There wasn't that much MLOps tooling out there yet.
[07:59.720 --> 08:02.920] So you need to build a lot of that stuff custom.
[08:02.920 --> 08:05.280] So it was largely custom built.
[08:05.280 --> 08:09.280] Yeah, the MLflow concept is an interesting concept
[08:09.280 --> 08:13.880] because they provide this package structure
[08:13.880 --> 08:17.520] that at least you have some idea of,
[08:17.520 --> 08:19.920] what is gonna be sent into the model
[08:19.920 --> 08:22.680] and like there's a format for the model.
[08:22.680 --> 08:24.720] And I think that part of MLflow
[08:24.720 --> 08:27.520] seems to be a pretty good idea,
[08:27.520 --> 08:30.080] which is you're creating a standard where,
[08:30.080 --> 08:32.360] you know, if in the case of,
[08:32.360 --> 08:34.720] if you're using scikit-learn or something,
[08:34.720 --> 08:37.960] you don't necessarily want to just throw
[08:37.960 --> 08:40.560] like a pickled model somewhere and just say,
[08:40.560 --> 08:42.720] okay, you know, let's go.
[08:42.720 --> 08:44.760] Yeah, that was also our thinking back then.
[08:44.760 --> 08:48.040] So we thought a lot about what would be a,
[08:48.040 --> 08:51.720] what would be, what could become the standard actually
[08:51.720 --> 08:53.920] for how you package models.
[08:53.920 --> 08:56.200] And back then MLflow was one of the little tools
[08:56.200 --> 08:58.160] that was already there, already existent.
[08:58.160 --> 09:00.360] And of course there was data bricks behind it.
[09:00.360 --> 09:02.680] So we also made a bet on that back then and said,
[09:02.680 --> 09:04.920] all right, let's follow that packaging standard
[09:04.920 --> 09:08.680] and make it the contract how you would as a data scientist,
[09:08.680 --> 09:10.800] then how you would need to package it up
[09:10.800 --> 09:13.640] and submit it to the platform.
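As a rough sketch of that packaging contract: an MLflow model is a directory containing an MLmodel metadata file plus the serialized artifacts, and a drop-zone platform can validate that shape before wrapping the upload in a container. The flavor structure below follows MLflow's MLmodel format, but the concrete values and the `looks_like_mlflow_model` helper are illustrative, not MLflow or platform code.

```python
import pathlib
import tempfile

# Minimal MLmodel metadata describing a scikit-learn model with a
# generic python_function flavor; paths and versions are examples.
MLMODEL = """\
flavors:
  python_function:
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.10.12
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
"""

def looks_like_mlflow_model(path: pathlib.Path) -> bool:
    """Shape check a drag-and-drop platform could run on an upload
    before wrapping it in a serving container (hypothetical helper)."""
    mlmodel = path / "MLmodel"
    return mlmodel.is_file() and "flavors:" in mlmodel.read_text()

model_dir = pathlib.Path(tempfile.mkdtemp())
(model_dir / "MLmodel").write_text(MLMODEL)
(model_dir / "model.pkl").write_bytes(b"placeholder artifact")
assert looks_like_mlflow_model(model_dir)
```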
[09:13.640 --> 09:16.800] Yeah, it's interesting because the,
[09:16.800 --> 09:19.560] one of the, this reminds me of one of the issues
[09:19.560 --> 09:21.800] that's happening right now with cloud computing,
[09:21.800 --> 09:26.800] where in the cloud AWS has dominated for a long time
[09:29.480 --> 09:34.480] and they have 40% market share, I think globally.
[09:34.480 --> 09:38.960] And Azure's now gaining and they have some pretty good traction
[09:38.960 --> 09:43.120] and then GCP's been down for a bit, you know,
[09:43.120 --> 09:45.760] in that maybe the 10% range or something like that.
[09:45.760 --> 09:47.760] But what's interesting is that it seems like
[09:47.760 --> 09:51.480] in the case of all of the cloud providers,
[09:51.480 --> 09:54.360] they haven't necessarily been leading the way
[09:54.360 --> 09:57.840] on things like packaging models, right?
[09:57.840 --> 10:01.480] Or, you know, they have their own proprietary systems
[10:01.480 --> 10:06.480] which have been developed and are continuing to be developed
[10:06.640 --> 10:08.920] like Vertex AI in the case of Google,
[10:09.760 --> 10:13.160] the SageMaker in the case of Amazon.
[10:13.160 --> 10:16.480] But what's interesting is, let's just take SageMaker,
[10:16.480 --> 10:20.920] for example, there isn't really like this, you know,
[10:20.920 --> 10:25.480] industry wide standard of model packaging
[10:25.480 --> 10:28.680] that SageMaker uses, they have their own proprietary stuff
[10:28.680 --> 10:31.040] that kind of builds in and Vertex AI
[10:31.040 --> 10:32.440] has their own proprietary stuff.
[10:32.440 --> 10:34.920] So, you know, I think it is interesting
[10:34.920 --> 10:36.960] to see what's gonna happen
[10:36.960 --> 10:41.120] because I think your original hypothesis which is,
[10:41.120 --> 10:44.960] let's pick, you know, this looks like it's got some traction
[10:44.960 --> 10:48.760] and it wasn't necessarily tied directly to a cloud provider
[10:48.760 --> 10:51.600] because Databricks can work on anything.
[10:51.600 --> 10:53.680] It seems like that in particular,
[10:53.680 --> 10:56.800] that's one of the more sticky problems right now
[10:56.800 --> 11:01.800] with MLOps is, you know, who's the leader?
[11:02.280 --> 11:05.440] Like, who's developing the right, you know,
[11:05.440 --> 11:08.880] kind of a standard for tooling.
[11:08.880 --> 11:12.320] And I don't know, maybe that leads into kind of you talking
[11:12.320 --> 11:13.760] a little bit about what you're doing currently.
[11:13.760 --> 11:15.600] Like, do you have any thoughts about the, you know,
[11:15.600 --> 11:18.720] current tooling and what you're doing at your current company
[11:18.720 --> 11:20.920] and what's going on with that?
[11:20.920 --> 11:21.760] Absolutely.
[11:21.760 --> 11:24.200] So at my current organization,
[11:24.200 --> 11:26.040] Transaction Monitor Netherlands,
[11:26.040 --> 11:27.480] we are fully on AWS.
[11:27.480 --> 11:32.000] So we're really almost cloud native AWS.
[11:32.000 --> 11:34.840] And so that also means everything we do on the modeling side
[11:34.840 --> 11:36.600] really evolves around SageMaker.
[11:37.680 --> 11:40.840] So for us, specifically for us as MLops team,
[11:40.840 --> 11:44.680] we are building the platform around SageMaker capabilities.
[11:45.680 --> 11:48.360] And on that end, at least company internal,
[11:48.360 --> 11:52.880] we have a contract how you must actually deploy models.
[11:52.880 --> 11:56.200] There is only one way, what we call the golden path,
[11:56.200 --> 11:59.800] in that case, this is the streamlined highly automated path
[11:59.800 --> 12:01.360] that is supported by the platform.
[12:01.360 --> 12:04.360] This is the only way how you can actually deploy models.
[12:04.360 --> 12:09.360] And in our case, that is actually a SageMaker pipeline object.
[12:09.640 --> 12:12.680] So in our company, we're doing large scale batch processing.
[12:12.680 --> 12:15.040] So we're actually not doing anything real time at present.
[12:15.040 --> 12:17.040] We are doing post transaction monitoring.
[12:17.040 --> 12:20.960] So that means you need to submit essentially DAGs, right?
[12:20.960 --> 12:23.400] This is what we use for training.
[12:23.400 --> 12:25.680] This is what we also deploy eventually.
[12:25.680 --> 12:27.720] And this is our internal contract.
[12:27.720 --> 12:32.200] You need to, in your model repository,
[12:32.200 --> 12:34.640] you've got to have one place,
[12:34.640 --> 12:37.840] and there must be a function with a specific name
[12:37.840 --> 12:41.440] and that function must return a SageMaker pipeline object.
[12:41.440 --> 12:44.920] So this is our internal contract actually.
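A platform can enforce a contract like this by importing the team's entry-point module and calling the agreed function. The sketch below is hypothetical: the `get_pipeline` name is an assumption, and a plain dict stands in for the `sagemaker.workflow.pipeline.Pipeline` object the real contract would require.

```python
import importlib.util
import pathlib
import tempfile

# Agreed-upon entry-point name (an assumption for illustration).
ENTRY_POINT = "get_pipeline"

def load_pipeline(entry_file: str):
    """Import a model repo's entry-point module and call the agreed
    function, failing loudly if the contract is not met."""
    spec = importlib.util.spec_from_file_location("entry", entry_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, ENTRY_POINT):
        raise ValueError(f"model repo must define {ENTRY_POINT}()")
    return getattr(module, ENTRY_POINT)()

# Simulate a model team's repository with one entry-point file.
repo = pathlib.Path(tempfile.mkdtemp()) / "pipeline.py"
repo.write_text(
    "def get_pipeline():\n"
    "    # stand-in for a SageMaker pipeline definition\n"
    "    return {'name': 'train-and-register',\n"
    "            'steps': ['preprocess', 'train']}\n"
)
assert load_pipeline(str(repo))["name"] == "train-and-register"
```

Keeping the contract this narrow is what makes the "golden path" automatable: the platform never needs to know how a team built its pipeline, only where to find it.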
[12:44.920 --> 12:46.600] Yeah, that's interesting.
[12:46.600 --> 12:51.200] I mean, and I could see like for, I know many people
[12:51.200 --> 12:53.880] that are using SageMaker in production,
[12:53.880 --> 12:58.680] and it does seem like where it has some advantages
[12:58.680 --> 13:02.360] is that AWS generally does a pretty good job
[13:02.360 --> 13:04.240] at building solutions.
[13:04.240 --> 13:06.920] And if you just look at the history of services,
[13:06.920 --> 13:09.080] the odds are pretty high
[13:09.080 --> 13:12.880] that they'll keep getting better, keep improving things.
[13:12.880 --> 13:17.080] And it seems like what I'm hearing from people,
[13:17.080 --> 13:19.080] and it sounds like maybe with your organization as well,
[13:19.080 --> 13:24.080] is that potentially the SDK for SageMaker
[13:24.440 --> 13:29.120] is really the win versus some of the UX tools they have
[13:29.120 --> 13:32.680] and the interface for Canvas and Studio.
[13:32.680 --> 13:36.080] Is that what's happening?
[13:36.080 --> 13:38.720] Yeah, so I think, right,
[13:38.720 --> 13:41.440] what we try to do is we always try to think about our users.
[13:41.440 --> 13:44.880] So how do our users, who are our users?
[13:44.880 --> 13:47.000] What capabilities and skills do they have?
[13:47.000 --> 13:50.080] And what freedom should they have
[13:50.080 --> 13:52.640] and what abilities should they have to develop models?
[13:52.640 --> 13:55.440] In our case, we don't really have use cases
[13:55.440 --> 13:58.640] for stuff like Canvas because our users
[13:58.640 --> 14:02.680] are fairly mature teams that know how to do their,
[14:02.680 --> 14:04.320] on the one hand, the data science stuff, of course,
[14:04.320 --> 14:06.400] but also the engineering stuff.
[14:06.400 --> 14:08.160] So in our case, things like Canvas
[14:08.160 --> 14:10.320] do not really play much of a role
[14:10.320 --> 14:12.960] because obviously due to the high abstraction layer
[14:12.960 --> 14:15.640] of more like graphical user interfaces,
[14:15.640 --> 14:17.360] drag and drop tooling,
[14:17.360 --> 14:20.360] you are also limited in what you can do,
[14:20.360 --> 14:22.480] or what you can do easily.
[14:22.480 --> 14:26.320] So in our case, really, it is the strength of the flexibility
[14:26.320 --> 14:28.320] that the SageMaker SDK gives you.
[14:28.320 --> 14:33.040] And in general, the SDK around most AWS services.
[14:34.080 --> 14:36.760] But also it comes with challenges, of course.
[14:37.720 --> 14:38.960] You give a lot of freedom,
[14:38.960 --> 14:43.400] but also you're creating a certain ask,
[14:43.400 --> 14:47.320] certain requirements for your model development teams,
[14:47.320 --> 14:49.600] which is also why we've also been working
[14:49.600 --> 14:52.600] about abstracting further away from the SDK.
[14:52.600 --> 14:54.600] So our objective is actually
[14:54.600 --> 14:58.760] that you should not be forced to interact with the raw SDK
[14:58.760 --> 15:00.600] when you use SageMaker anymore,
[15:00.600 --> 15:03.520] but you have a thin layer of abstraction
[15:03.520 --> 15:05.480] on top of what you are doing.
[15:05.480 --> 15:07.480] That's actually something we are moving towards
[15:07.480 --> 15:09.320] more and more as well.
[15:09.320 --> 15:11.120] Because yeah, it gives you the flexibility,
[15:11.120 --> 15:12.960] but also flexibility comes at a cost,
[15:12.960 --> 15:15.080] comes often at the cost of speeds,
[15:15.080 --> 15:18.560] specifically when it comes to the 90% default stuff
[15:18.560 --> 15:20.720] that you want to do, yeah.
[15:20.720 --> 15:24.160] And one of the things that I have as a complaint
[15:24.160 --> 15:29.160] against SageMaker is that it only uses virtual machines,
[15:30.000 --> 15:35.000] and it does seem like a strange strategy in some sense.
[15:35.000 --> 15:40.000] Like for example, I guess if you're doing batch only,
[15:40.000 --> 15:42.000] it doesn't matter as much,
[15:42.000 --> 15:45.000] which I think is a good strategy actually
[15:45.000 --> 15:50.000] to get your batch based predictions very, very strong.
[15:50.000 --> 15:53.000] And in that case, maybe the virtual machines
[15:53.000 --> 15:56.000] are a little bit less of a complaint.
[15:56.000 --> 16:00.000] But in the case of the endpoints with SageMaker,
[16:00.000 --> 16:02.000] the fact that you have to spin up
[16:02.000 --> 16:04.000] these really expensive virtual machines
[16:04.000 --> 16:08.000] and let them run 24 seven to do online prediction,
[16:08.000 --> 16:11.000] is that something that your organization evaluated
[16:11.000 --> 16:13.000] and decided not to use?
[16:13.000 --> 16:15.000] Or like, what are your thoughts behind that?
[16:15.000 --> 16:19.000] Yeah, in our case, doing real time
[16:19.000 --> 16:22.000] or near real time inference is currently not really relevant
[16:22.000 --> 16:25.000] for the simple reason that when you think a bit more
[16:25.000 --> 16:28.000] about the money laundering or anti money laundering space,
[16:28.000 --> 16:31.000] typically, right,
[16:31.000 --> 16:34.000] every individual bank must do anti-money laundering
[16:34.000 --> 16:37.000] and they have armies of people doing that.
[16:37.000 --> 16:39.000] But on the other hand,
[16:39.000 --> 16:43.000] the time it actually takes from one of their systems,
[16:43.000 --> 16:46.000] one of their AML systems actually detecting something
[16:46.000 --> 16:49.000] that's unusual that then goes into a review process
[16:49.000 --> 16:54.000] until it eventually hits the governmental institution
[16:54.000 --> 16:56.000] that then takes care of the cases that have been
[16:56.000 --> 16:58.000] at least twice validated that they are indeed,
[16:58.000 --> 17:01.000] they look very unusual.
[17:01.000 --> 17:04.000] So this takes a while, this can take quite some time,
[17:04.000 --> 17:06.000] which is also why it doesn't really matter
[17:06.000 --> 17:09.000] whether you ship your prediction within a second
[17:09.000 --> 17:13.000] or whether it takes you a week or two weeks.
[17:13.000 --> 17:15.000] It doesn't really matter, hence for us,
[17:15.000 --> 17:19.000] that problem so far thinking about real time inference
[17:19.000 --> 17:21.000] has not been there.
[17:21.000 --> 17:25.000] But yeah, indeed, for other use cases,
[17:25.000 --> 17:27.000] for also private projects,
[17:27.000 --> 17:29.000] we've also been considering SageMaker Endpoints
[17:29.000 --> 17:31.000] for a while, but exactly what you said,
[17:31.000 --> 17:33.000] the fact that you need to have a very beefy machine
[17:33.000 --> 17:35.000] running all the time,
[17:35.000 --> 17:39.000] specifically when you have heavy GPU loads, right,
[17:39.000 --> 17:43.000] and you're actually paying for that machine running 24/7,
[17:43.000 --> 17:46.000] although you do have quite fluctuating load.
[17:46.000 --> 17:49.000] Yeah, then that definitely becomes quite a consideration
[17:49.000 --> 17:51.000] of what you go for.
[17:51.000 --> 17:58.000] Yeah, and I actually have been talking to AWS about that,
[17:58.000 --> 18:02.000] because one of the issues that I have is that
[18:02.000 --> 18:07.000] the AWS platform really pushes serverless,
[18:07.000 --> 18:10.000] and then my question for AWS is,
[18:10.000 --> 18:13.000] so why aren't you using it?
[18:13.000 --> 18:16.000] I mean, if you're pushing serverless for everything,
[18:16.000 --> 18:19.000] why is SageMaker not serverless?
[18:19.000 --> 18:21.000] And so maybe they're going to do that, I don't know.
[18:21.000 --> 18:23.000] I don't have any inside information,
[18:23.000 --> 18:29.000] but it is interesting to hear you had some similar concerns.
[18:29.000 --> 18:32.000] I know that there's two questions here.
[18:32.000 --> 18:37.000] One is someone asked about what do you do for data versioning,
[18:37.000 --> 18:41.000] and a second one is how do you do event based MLOps?
[18:41.000 --> 18:43.000] So maybe kind of following up.
[18:43.000 --> 18:46.000] Yeah, what do we do for data versioning?
[18:46.000 --> 18:51.000] On the one hand, we're running a data lakehouse,
[18:51.000 --> 18:54.000] where after data we get from the financial institutions,
[18:54.000 --> 18:57.000] from the banks that runs through massive data pipeline,
[18:57.000 --> 19:01.000] also on AWS, we're using Glue and Step Functions actually for that,
[19:01.000 --> 19:03.000] and then eventually it ends up modeled to some extent,
[19:03.000 --> 19:06.000] sanitized, quality checked in our data lakehouse,
[19:06.000 --> 19:10.000] and there we're actually using Hudi on top of S3.
[19:10.000 --> 19:13.000] And this is also what we use for versioning,
[19:13.000 --> 19:16.000] which we use for time travel and all these things.
[19:16.000 --> 19:19.000] So that is Hudi on top of S3,
[19:19.000 --> 19:21.000] where then pipelines,
[19:21.000 --> 19:24.000] so actually our model pipelines plug in there
[19:24.000 --> 19:27.000] and spit out predictions, alerts,
[19:27.000 --> 19:29.000] what we call alerts eventually.
[19:29.000 --> 19:33.000] That is something that we version based on unique IDs.
[19:33.000 --> 19:36.000] So processing IDs, we track pretty much everything,
[19:36.000 --> 19:39.000] every line of code that touched
[19:39.000 --> 19:43.000] or is related to a specific row in our data.
[19:43.000 --> 19:46.000] So we can exactly track back for every single row
[19:46.000 --> 19:48.000] in our predictions and in our alerts,
[19:48.000 --> 19:50.000] what pipeline ran on it,
[19:50.000 --> 19:52.000] which jobs were in that pipeline,
[19:52.000 --> 19:56.000] which code exactly was running in each job,
[19:56.000 --> 19:58.000] which intermediate results were produced.
[19:58.000 --> 20:01.000] So we're basically adding lineage information
[20:01.000 --> 20:03.000] to everything we output along that line,
[20:03.000 --> 20:05.000] so we can track everything back
[20:05.000 --> 20:09.000] using a few tools we've built.
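The row-level lineage described here can be sketched as follows: every output row is annotated with a processing ID for the run plus, per job, the job name and a hash identifying the code that ran. The field names and the hash-of-a-version-string shortcut are illustrative assumptions; a real system would hash the deployed artifact or record the git SHA of each job.

```python
import hashlib
import uuid

def run_pipeline(rows, jobs):
    """jobs: list of (name, fn, code_rev) applied in order to each row.
    Returns output rows, each annotated with lineage metadata."""
    processing_id = str(uuid.uuid4())
    lineage = [
        {"job": name,
         # short hash identifying the exact code revision that ran
         "code_sha": hashlib.sha256(rev.encode()).hexdigest()[:12]}
        for name, _, rev in jobs
    ]
    out = []
    for row in rows:
        for _, fn, _ in jobs:
            row = fn(row)
        # every row can now be traced back to the run and its jobs
        out.append({**row, "processing_id": processing_id, "lineage": lineage})
    return out

score = lambda row: {**row, "alert": row["amount"] > 10_000}
alerts = run_pipeline([{"amount": 12_500}], [("score", score, "git:abc123")])
assert alerts[0]["alert"] is True
assert alerts[0]["lineage"][0]["job"] == "score"
```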
[20:09.000 --> 20:12.000] So the tool you mentioned,
[20:12.000 --> 20:13.000] I'm not familiar with it.
[20:13.000 --> 20:14.000] What is it called again?
[20:14.000 --> 20:15.000] It's called hoodie?
[20:15.000 --> 20:16.000] Hoodie.
[20:16.000 --> 20:17.000] Hoodie.
[20:17.000 --> 20:18.000] Oh, what is it?
[20:18.000 --> 20:19.000] Maybe you can describe it.
[20:19.000 --> 20:22.000] Yeah, hoodie is essentially,
[20:22.000 --> 20:29.000] it's quite similar to other tools such as
[20:29.000 --> 20:31.000] Databricks, how is it called?
[20:31.000 --> 20:32.000] Databricks?
[20:32.000 --> 20:33.000] Delta Lake maybe?
[20:33.000 --> 20:34.000] Yes, exactly.
[20:34.000 --> 20:35.000] Exactly.
[20:35.000 --> 20:38.000] It's basically, it's equivalent to Delta Lake,
[20:38.000 --> 20:40.000] just back then when we looked into
[20:40.000 --> 20:42.000] what are we going to use.
[20:42.000 --> 20:44.000] Delta Lake was not open sourced yet.
[20:44.000 --> 20:46.000] Databricks open sourced a while ago.
[20:46.000 --> 20:47.000] We went for Hudi.
[20:47.000 --> 20:50.000] It essentially, it is a layer on top of,
[20:50.000 --> 20:53.000] in our case, S3 that allows you
[20:53.000 --> 20:58.000] to more easily keep track of what you,
[20:58.000 --> 21:03.000] of the actions you are performing on your data.
[21:03.000 --> 21:08.000] So it's essentially very similar to Delta Lake,
[21:08.000 --> 21:13.000] just an open-sourced solution already before that.
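The commit-timeline idea behind Hudi (and Delta Lake) can be illustrated with a toy table: each write becomes a commit, and reads can "time travel" to the table as of any earlier commit. Hudi's real API is Spark-based (for example, reads with an as-of-instant option); nothing below is Hudi itself, just a sketch of the concept.

```python
class VersionedTable:
    """Toy versioned table: every upsert creates a new full snapshot
    on a commit timeline, so old states stay queryable."""

    def __init__(self):
        self.commits = []  # list of (commit_id, {key: row}) snapshots

    def upsert(self, rows):
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(rows)  # upsert semantics: insert or overwrite by key
        self.commits.append((len(self.commits), snapshot))
        return len(self.commits) - 1  # commit id

    def read(self, as_of=None):
        """Latest snapshot by default; pass a commit id to time travel."""
        if not self.commits:
            return {}
        commit = self.commits[-1] if as_of is None else self.commits[as_of]
        return commit[1]

table = VersionedTable()
c0 = table.upsert({"tx1": {"amount": 100}})
table.upsert({"tx1": {"amount": 250}})          # correction arrives later
assert table.read()["tx1"]["amount"] == 250     # latest view
assert table.read(as_of=c0)["tx1"]["amount"] == 100  # time travel
```

Real lakehouse formats store deltas and metadata rather than full copies, but the reader-facing behavior is the same: reproducible reads of past table states.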
[21:13.000 --> 21:15.000] Yeah, that's, I didn't know anything about that.
[21:15.000 --> 21:16.000] So now I do.
[21:16.000 --> 21:19.000] So thanks for letting me know.
[21:19.000 --> 21:21.000] I'll have to look into that.
[21:21.000 --> 21:27.000] The other, I guess, interesting stack-related question is,
[21:27.000 --> 21:29.000] what are your thoughts about...
[21:29.000 --> 21:32.000] I think there's two areas that I think
[21:32.000 --> 21:34.000] are interesting and that are emerging.
[21:34.000 --> 21:36.000] Oh, actually, there's multiple.
[21:36.000 --> 21:37.000] Maybe I'll just bring them all up.
[21:37.000 --> 21:39.000] So we'll do them one by one.
[21:39.000 --> 21:42.000] So these are some emerging areas that I'm seeing.
[21:42.000 --> 21:49.000] One is the concept of event driven, you know,
[21:49.000 --> 21:54.000] architecture versus maybe like a static architecture.
[21:54.000 --> 21:57.000] And I think, obviously, you're using Step Functions,
[21:57.000 --> 22:00.000] so you're a fan of event driven architecture.
[22:00.000 --> 22:04.000] Maybe we'll start with that one:
[22:04.000 --> 22:08.000] what are your thoughts on going more event driven in your organization?
[22:08.000 --> 22:09.000] Yeah.
[22:09.000 --> 22:13.000] In our case, essentially everything works event driven.
[22:13.000 --> 22:14.000] Right.
[22:14.000 --> 22:19.000] So since we're on AWS, we're using EventBridge, or CloudWatch Events,
[22:19.000 --> 22:21.000] I think it's now called EventBridge everywhere.
[22:21.000 --> 22:22.000] Right.
[22:22.000 --> 22:24.000] This is how we trigger pretty much everything in our stack.
[22:24.000 --> 22:27.000] This is how we trigger our data pipelines when data comes in.
[22:27.000 --> 22:32.000] This is how we trigger the different Lambdas that parse
[22:32.000 --> 22:35.000] certain information from our logs and store it in different databases.
[22:35.000 --> 22:40.000] This is also how, back in the past,
[22:40.000 --> 22:44.000] we triggered new deployments when new models were approved in
[22:44.000 --> 22:46.000] our model registry.
[22:46.000 --> 22:50.000] So basically everything we've been doing is fully event driven.
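As a sketch of one such event-driven glue point: a minimal Lambda handler reacting to an S3 object-created notification delivered through EventBridge. The event shape follows EventBridge's documented S3 notification format; the handler body is purely illustrative, not the interviewee's actual code:

```python
def handler(event, context):
    """Minimal Lambda handler for an S3 "Object Created" event via EventBridge.

    The `detail` shape follows EventBridge's S3 event notifications; what you
    do with the object (start a pipeline, parse a log, ...) is up to you.
    """
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    # Here a real pipeline would kick off the next step, e.g. start a
    # Step Functions execution or fan out to further Lambdas.
    return {"status": "triggered", "bucket": bucket, "key": key}

# Local smoke test with a hand-built event (no AWS involved):
fake_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "my-data-bucket"},
               "object": {"key": "raw/2022-09-23/batch.json"}},
}
result = handler(fake_event, context=None)
```

The pattern generalizes: data arrival, log lines, and model-approval events all become triggers that route into small handlers like this.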
[22:50.000 --> 22:51.000] Yeah.
[22:51.000 --> 22:56.000] So I think this is a key thing you bring up here.
[22:56.000 --> 23:00.000] I've talked to many people who don't use AWS, who are, you know,
[23:00.000 --> 23:03.000] experts in alternative technology.
[23:03.000 --> 23:06.000] And one of the things that I've heard some people say is like, oh,
[23:06.000 --> 23:13.000] well, AWS isn't as fast as X or Y, like Lambda isn't as fast as X or Y, or,
[23:13.000 --> 23:17.000] you know, Kubernetes. But the point you bring up is exactly the
[23:17.000 --> 23:24.000] way I think about AWS: the true advantage of the AWS platform is the
[23:24.000 --> 23:29.000] tight integration between the services, and you can design event
[23:29.000 --> 23:31.000] driven workflows.
[23:31.000 --> 23:33.000] Would you say that's right? Absolutely.
[23:33.000 --> 23:34.000] Yeah.
[23:34.000 --> 23:35.000] Yeah.
[23:35.000 --> 23:39.000] I think designing event driven workflows on AWS is incredibly easy to do.
[23:39.000 --> 23:40.000] Yeah.
[23:40.000 --> 23:43.000] And it also comes incredibly naturally, and that's extremely powerful.
[23:43.000 --> 23:44.000] Right.
[23:44.000 --> 23:49.000] And simply by having an easy way to trigger Lambdas event driven,
[23:49.000 --> 23:52.000] you can pretty much do everything and glue
[23:52.000 --> 23:54.000] everything together that you want.
[23:54.000 --> 23:56.000] I think that gives you a tremendous flexibility.
[23:56.000 --> 23:57.000] Yeah.
[23:57.000 --> 24:00.000] So I think there's two things that come to mind now.
[24:00.000 --> 24:07.000] One is that if you are developing an MLOps platform, you
[24:07.000 --> 24:09.000] can't ignore Lambda.
[24:09.000 --> 24:12.000] Because I've had some people tell me, oh, well, we can do this and
[24:12.000 --> 24:13.000] this and this better.
[24:13.000 --> 24:17.000] It's like, yeah, but if you're going to be on AWS, you have to understand
[24:17.000 --> 24:18.000] why people use Lambda.
[24:18.000 --> 24:19.000] It isn't speed.
[24:19.000 --> 24:24.000] It's the ease of developing very rich solutions.
[24:24.000 --> 24:25.000] Right.
[24:25.000 --> 24:26.000] Absolutely.
[24:26.000 --> 24:28.000] And the glue between what you are building eventually.
[24:28.000 --> 24:33.000] And almost, the thoughts in your mind turn into Lambdas.
[24:33.000 --> 24:36.000] You know, like you can be thinking and building code so quickly.
[24:36.000 --> 24:37.000] Absolutely.
[24:37.000 --> 24:41.000] Everything turns into: which event do I need to listen to? And then I trigger
[24:41.000 --> 24:43.000] a Lambda, and that Lambda does this and that.
[24:43.000 --> 24:44.000] Yeah.
[24:44.000 --> 24:48.000] And the other part about Lambda that's pretty awesome is that it
[24:48.000 --> 24:52.000] hooks into services that have infinite scale.
[24:52.000 --> 24:56.000] Like SQS, you can't break SQS.
[24:56.000 --> 24:59.000] Like there's nothing you can do to ever take SQS down.
[24:59.000 --> 25:02.000] It handles unlimited requests in and unlimited requests out.
[25:02.000 --> 25:04.000] How many systems are like that?
[25:04.000 --> 25:05.000] Yeah.
[25:05.000 --> 25:06.000] Yeah, absolutely.
[25:06.000 --> 25:07.000] Yeah.
[25:07.000 --> 25:12.000] So then a kind of follow-up would be that maybe data scientists
[25:12.000 --> 25:17.000] should learn Lambda and Step Functions in order to get to
[25:17.000 --> 25:18.000] MLOps.
[25:18.000 --> 25:21.000] I think that's a yes.
[25:21.000 --> 25:25.000] If you want to put a foot into MLOps and you are on AWS,
[25:25.000 --> 25:31.000] then I think there is no way around learning these fundamentals.
[25:31.000 --> 25:32.000] Right.
[25:32.000 --> 25:35.000] There's no way around learning things like: what is a Lambda?
[25:35.000 --> 25:39.000] How do I create a Lambda via Terraform, or whatever tool you're
[25:39.000 --> 25:40.000] using there?
[25:40.000 --> 25:42.000] And how do I hook it up to an event?
[25:42.000 --> 25:47.000] And how do I use the AWS SDK to interact with different
[25:47.000 --> 25:48.000] services?
[25:48.000 --> 25:49.000] So, right.
[25:49.000 --> 25:53.000] I think if you want to take a step into MLOps coming more from
[25:53.000 --> 25:57.000] the data science side, it's extremely important to familiarize yourself
[25:57.000 --> 26:01.000] with at least the fundamentals: how do you architect
[26:01.000 --> 26:03.000] basic solutions on AWS?
[26:03.000 --> 26:05.000] How do you glue services together?
[26:05.000 --> 26:07.000] How do you make them speak to each other?
[26:07.000 --> 26:09.000] So yeah, I think that's quite fundamental.
[26:09.000 --> 26:14.000] Ideally, I think that's what the platform should take away from you
[26:14.000 --> 26:16.000] as a pure data scientist.
[26:16.000 --> 26:19.000] You should not necessarily have to deal with that stuff.
[26:19.000 --> 26:23.000] But if you want to make that move more towards MLOps,
[26:23.000 --> 26:27.000] I think learning about infrastructure, and specifically in the context of AWS
[26:27.000 --> 26:31.000] about the services and how to use them, is really fundamental.
[26:31.000 --> 26:32.000] Yeah, it's good.
[26:32.000 --> 26:33.000] Because this is automation eventually.
[26:33.000 --> 26:37.000] And if you want to automate your complex processes,
[26:37.000 --> 26:39.000] then you need to learn that stuff.
[26:39.000 --> 26:41.000] How else are you going to do it?
[26:41.000 --> 26:42.000] Yeah, I agree.
[26:42.000 --> 26:46.000] I mean, that's really what Lambda and Step Functions are: they're
[26:46.000 --> 26:47.000] automation tools.
[26:47.000 --> 26:49.000] So that's probably the better way to describe it.
[26:49.000 --> 26:52.000] That's a very good point you bring up.
[26:52.000 --> 26:57.000] Another thing that I think is an emerging technology is the
[26:57.000 --> 26:58.000] managed file system.
[26:58.000 --> 27:05.000] And the reason why I think it's interesting is that 20-plus years
[27:05.000 --> 27:11.000] ago, I was using file systems in a university setting when I was at
[27:11.000 --> 27:14.000] Caltech, and then also in the film industry.
[27:14.000 --> 27:22.000] So film has been using managed file servers with parallel processing
[27:22.000 --> 27:24.000] farms for a long time.
[27:24.000 --> 27:27.000] I don't know how many people know this, but in the film industry,
[27:27.000 --> 27:32.000] the architecture, even from like 2000, was: there's a very
[27:32.000 --> 27:38.000] expensive file server, and then there's, let's say, 40,000 machines or 40,000
[27:38.000 --> 27:39.000] cores.
[27:39.000 --> 27:40.000] And that's it.
[27:40.000 --> 27:41.000] That's the architecture.
[27:41.000 --> 27:46.000] And now what's interesting is I see with data science and machine learning
[27:46.000 --> 27:52.000] operations that what could potentially happen in the future is
[27:52.000 --> 27:57.000] actually a managed NFS mount point with maybe Kubernetes or something like
[27:57.000 --> 27:58.000] that.
[27:58.000 --> 28:01.000] Do you see any of that on the horizon?
[28:01.000 --> 28:04.000] Oh, that's a good question.
[28:04.000 --> 28:08.000] I think for what we're currently doing, that's probably a
[28:08.000 --> 28:10.000] bit further away.
[28:10.000 --> 28:15.000] In our use case, not
[28:15.000 --> 28:17.000] quite.
[28:17.000 --> 28:20.000] But in principle, I could very well imagine it, definitely.
[28:20.000 --> 28:26.000] And then maybe a third emerging thing I'm seeing is what's going
[28:26.000 --> 28:29.000] on with OpenAI and Hugging Face.
[28:29.000 --> 28:34.000] And that has the potential, maybe, to change the game a little bit,
[28:34.000 --> 28:38.000] especially with Hugging Face, I think, although both of them.
[28:38.000 --> 28:43.000] I mean, in the case of pre-trained models, here's a
[28:43.000 --> 28:48.000] perfect example: an organization may, you know, maybe they're
[28:48.000 --> 28:53.000] using AWS even for this, be transcribing videos, and they're going
[28:53.000 --> 28:56.000] to do something with them. Maybe they're going to detect, I don't know,
[28:56.000 --> 29:02.000] like, you know, if you recorded customers, and I'm just brainstorming,
[29:02.000 --> 29:05.000] I'm not saying your company did this, I'm just creating a hypothetical
[29:05.000 --> 29:09.000] situation: they recorded, you know, a customer talking, and then they
[29:09.000 --> 29:12.000] transcribe it to text and then run some kind of, you know,
[29:12.000 --> 29:15.000] criminal detection feature or something like that.
[29:15.000 --> 29:19.000] They could build their own models, or they could download the model
[29:19.000 --> 29:23.000] that was released a day or two ago from OpenAI that transcribes
[29:23.000 --> 29:29.000] things, you know, and then feed that transcribed text into
[29:29.000 --> 29:34.000] some other model from Hugging Face that summarizes it, and then you could
[29:34.000 --> 29:38.000] feed that into a system. So what are your
[29:38.000 --> 29:42.000] thoughts around some of these pre-trained models? And are you
[29:42.000 --> 29:48.000] thinking, in terms of your stack, of looking into doing fine tuning?
[29:48.000 --> 29:53.000] Yeah, so I think pre-trained models, and especially the way that Hugging Face,
[29:53.000 --> 29:57.000] I think, really revolutionized the space in terms of
[29:57.000 --> 30:02.000] platformizing the entire business, or the entire market, around
[30:02.000 --> 30:07.000] pre-trained models, I think that is really quite incredible, and
[30:07.000 --> 30:10.000] really a game-changing way for the ecosystem to do things.
[30:10.000 --> 30:16.000] And I believe that looking at the costs of training large models,
[30:16.000 --> 30:19.000] and looking at the fact that many organizations are not able to do it
[30:19.000 --> 30:23.000] because of massive costs or because of lack of data,
[30:23.000 --> 30:29.000] this makes it very clear how important
[30:29.000 --> 30:33.000] such platforms are, how important sharing of pre-trained models actually is.
[30:33.000 --> 30:37.000] I believe we are actually only quite at the beginning of that.
[30:37.000 --> 30:42.000] And I think we're going to see more of that. Nowadays you see it mostly when it
[30:42.000 --> 30:47.000] comes to fairly generalized data formats: images, potentially videos, text,
[30:47.000 --> 30:52.000] speech, these things. But I believe that we're going to see more marketplace
[30:52.000 --> 30:57.000] approaches when it comes to pre-trained models in a lot more industries
[30:57.000 --> 31:01.000] and in a lot more use cases where data is to some degree
[31:01.000 --> 31:05.000] standardized. Also when you think about banking,
[31:05.000 --> 31:10.000] for example, right? When you think about transactions: to some extent,
[31:10.000 --> 31:14.000] transaction data always looks the same, kind of, at least at
[31:14.000 --> 31:17.000] every bank. Of course you might need to do some mapping here and there,
[31:17.000 --> 31:22.000] but there is a lot of power in it. Because, simply, also thinking
[31:22.000 --> 31:28.000] about sharing data: that is always a difficult thing, especially in Europe.
[31:28.000 --> 31:32.000] Sharing data between organizations is incredibly difficult legally.
[31:32.000 --> 31:36.000] It's difficult. Sharing models is a different thing, right?
[31:36.000 --> 31:40.000] Basically, similar to the concept of federated learning, sharing models
[31:40.000 --> 31:44.000] is significantly easier legally than actually sharing data,
[31:44.000 --> 31:48.000] and then applying these models, fine tuning them and so on.
[31:48.000 --> 31:52.000] Yeah, I mean, I could just imagine. I really don't know much about
[31:52.000 --> 31:56.000] banking transactions, but I would imagine there could be several
[31:56.000 --> 32:01.000] kinds of transactions that are very normal. And then there's some
[32:01.000 --> 32:06.000] transactions, like if every single second
[32:06.000 --> 32:11.000] you're transferring a lot of money, and it happens
[32:11.000 --> 32:14.000] very quickly. It's like, wait, why are you doing this? Why are you transferring money
[32:14.000 --> 32:20.000] constantly? What's going on? Or a huge sum of money only
[32:20.000 --> 32:24.000] involves three different points in the network, over and over again,
[32:24.000 --> 32:29.000] just these three points constantly... And so once you've developed
[32:29.000 --> 32:33.000] a model that does anomaly detection, then,
[32:33.000 --> 32:37.000] yeah, why would you need to develop another one? I mean, somebody already did it.
[32:37.000 --> 32:41.000] Exactly. Yes, absolutely, absolutely. And that's
[32:41.000 --> 32:45.000] definitely... That's encoded knowledge, encoded information in terms of the model,
[32:45.000 --> 32:49.000] which is not personally... well, which abstracts away from
[32:49.000 --> 32:53.000] personally identifiable data. And that's really the power. That is something
[32:53.000 --> 32:57.000] that, yeah, as I've said before, you can share significantly more easily and
[32:57.000 --> 33:03.000] apply to your use cases. Kind of related to this, in
[33:03.000 --> 33:09.000] terms of upcoming technologies, is, I think, dealing more with graphs.
[33:09.000 --> 33:13.000] And so is that something, stack-wise, that your
[33:13.000 --> 33:19.000] company's investigated or has the resources to do? Yeah, so when you think about
[33:19.000 --> 33:23.000] transactions, bank transactions, right? And bank customers.
[33:23.000 --> 33:27.000] So in our case, again, we only have pseudonymized
[33:27.000 --> 33:31.000] transaction data, so actually we cannot see anything, right? We cannot see names, we cannot see
[33:31.000 --> 33:35.000] IBANs or whatever. We really can't see much. But
[33:35.000 --> 33:39.000] you can look at transactions moving between
[33:39.000 --> 33:43.000] different entities, between different accounts. You can look at that
[33:43.000 --> 33:47.000] as a network, as a graph. And that's also what we very frequently do.
[33:47.000 --> 33:51.000] You have your nodes in your network, these are your accounts,
[33:51.000 --> 33:55.000] or even your persons. And the actual edges between them,
[33:55.000 --> 33:59.000] that's what your transactions are. So you have this
[33:59.000 --> 34:03.000] massive graph, actually, that we as TMNL, as Transaction Monitoring Netherlands,
[34:03.000 --> 34:07.000] are sitting on. We're actually sitting on a massive transaction graph.
[34:07.000 --> 34:11.000] So yeah, absolutely. For us, doing analysis on top of
[34:11.000 --> 34:15.000] that graph, building models on top of that graph is a quite important
[34:15.000 --> 34:19.000] thing. And like I taught a class
[34:19.000 --> 34:23.000] a few years ago at Berkeley where we had to
[34:23.000 --> 34:27.000] cover graph databases a little bit. And I
[34:27.000 --> 34:31.000] really didn't know that much about graph databases, although I did use one actually
[34:31.000 --> 34:35.000] at one company I was at. But one of the things I learned in teaching that
[34:35.000 --> 34:39.000] class was about the descriptive statistics
[34:39.000 --> 34:43.000] of a graph network. And it
[34:43.000 --> 34:47.000] is actually pretty interesting, because I think most of the time everyone talks about
[34:47.000 --> 34:51.000] the median and max, min, standard deviation and everything.
[34:51.000 --> 34:55.000] But then with a graph, there's things like centrality
[34:55.000 --> 34:59.000] and I forget all the terms off the top of my head, but you can see
[34:59.000 --> 35:03.000] if there's a node in the network that's
[35:03.000 --> 35:07.000] everybody's interacting with. Absolutely. You can identify communities
[35:07.000 --> 35:11.000] of people moving around a lot of money all the time, for example.
[35:11.000 --> 35:15.000] You can derive different metrics, features eventually,
[35:15.000 --> 35:19.000] by doing computations on your graph and then plugging in some model.
[35:19.000 --> 35:23.000] Often it's feature engineering. You're computing betweenness centrality scores
[35:23.000 --> 35:27.000] across your graph for your different entities, and then
[35:27.000 --> 35:31.000] you're actually building your features, and then you're plugging in some
[35:31.000 --> 35:35.000] model in the end, if you do classic machine learning, so to say.
[35:35.000 --> 35:39.000] If you do graph deep learning, of course, that's a bit different.
[35:39.000 --> 35:43.000] So basically, for people that are analyzing
[35:43.000 --> 35:47.000] essentially networks of people,
[35:47.000 --> 35:51.000] a graph database would be step one:
[35:51.000 --> 35:55.000] generate the features, which could be centrality.
[35:55.000 --> 35:59.000] There's a score, and then you go and train
[35:59.000 --> 36:03.000] the model based on that descriptive statistic.
[36:03.000 --> 36:07.000] Exactly. So one way you could think about it:
[36:07.000 --> 36:11.000] whether you need a graph database or not always depends on your specific use case.
[36:11.000 --> 36:15.000] We're actually also running
[36:15.000 --> 36:19.000] that using Spark. You have GraphFrames, you have
[36:19.000 --> 36:23.000] GraphX actually. So really, stuff in Spark built for
[36:23.000 --> 36:27.000] doing analysis on graphs.
[36:27.000 --> 36:31.000] And then what you usually do is exactly what you said. You try
[36:31.000 --> 36:35.000] to build features based on that graph,
[36:35.000 --> 36:39.000] based on the attributes of the nodes and the attributes on the edges and so on.
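Independent of whether the graph lives in Spark GraphFrames or a graph database, the feature-engineering step described here can be sketched in plain Python. This is a toy version using degree-based features over a directed edge list; all account names are illustrative, and a real pipeline would add centrality and community features:

```python
from collections import Counter

def degree_features(edges):
    """Compute simple per-account features from (src, dst, amount) edges.

    A toy version of graph feature engineering: real pipelines would add
    centrality scores (e.g. betweenness), community labels, and so on,
    before feeding the feature table into a model.
    """
    out_deg, in_deg, sent = Counter(), Counter(), Counter()
    for src, dst, amount in edges:
        out_deg[src] += 1      # transfers the account made
        in_deg[dst] += 1       # transfers it received
        sent[src] += amount    # total amount it sent
    accounts = set(out_deg) | set(in_deg)
    return {a: {"out_degree": out_deg[a],
                "in_degree": in_deg[a],
                "total_sent": sent[a]} for a in accounts}

features = degree_features([("acct_a", "acct_b", 10.0),
                            ("acct_a", "acct_c", 5.0),
                            ("acct_b", "acct_c", 7.0)])
```

The resulting per-node feature dictionary is exactly the kind of table you would then hand to a classic machine learning model, as discussed above.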
[36:39.000 --> 36:43.000] And so I guess in terms of graph databases right
[36:43.000 --> 36:47.000] now, it sounds like maybe the three
[36:47.000 --> 36:51.000] main players are: there's Neo4j, which
[36:51.000 --> 36:55.000] has been around for a long time; there's, I guess, Spark;
[36:55.000 --> 36:59.000] and then there's also, I forgot what the one is called for AWS,
[36:59.000 --> 37:03.000] is it Neptune? That's Neptune.
[37:03.000 --> 37:07.000] Have you played with all three of those, and did you
[37:07.000 --> 37:11.000] like Neptune? Neptune was something we... Spark, of course, we're actually currently
[37:11.000 --> 37:15.000] using for exactly that, also because it allows us
[37:15.000 --> 37:19.000] to keep our stack fairly homogeneous. We
[37:19.000 --> 37:23.000] also did a PoC with Neptune a while ago already,
[37:23.000 --> 37:27.000] and well, with Neptune you essentially have two ways
[37:27.000 --> 37:31.000] to query it: either using Gremlin or SPARQL.
[37:31.000 --> 37:35.000] So that means your data scientists
[37:35.000 --> 37:39.000] need to get familiar with that, which is already a bit of a hurdle,
[37:39.000 --> 37:43.000] because usually data scientists are not familiar with either.
[37:43.000 --> 37:47.000] But also what we found with Neptune
[37:47.000 --> 37:51.000] is that it's not necessarily built
[37:51.000 --> 37:55.000] as an analytics graph database. It's not necessarily made for
[37:55.000 --> 37:59.000] that. And then it has sometimes, at least
[37:59.000 --> 38:03.000] for us, become quite complicated to handle different performance considerations
[38:03.000 --> 38:07.000] when you actually do fairly complex queries across that graph.
[38:07.000 --> 38:11.000] Yeah, so you're bringing up a point which
[38:11.000 --> 38:15.000] happens a lot in my experience with
[38:15.000 --> 38:19.000] technology, which is that sometimes
[38:19.000 --> 38:23.000] the purity of the solution becomes the problem.
[38:23.000 --> 38:27.000] Even though Spark isn't necessarily
[38:27.000 --> 38:31.000] designed to be a graph database system, the fact is
[38:31.000 --> 38:35.000] people in your company are already using it. So
[38:35.000 --> 38:39.000] if you just turn on that feature, now you can use it, and it's not like
[38:39.000 --> 38:43.000] this huge technical undertaking and retraining effort.
[38:43.000 --> 38:47.000] So even if it's not as good, if it works, then that's probably
[38:47.000 --> 38:51.000] the solution your company will use. Versus, I agree with you, a lot of times,
[38:51.000 --> 38:55.000] even if a solution, and Neo4j is a pretty good example of this,
[38:55.000 --> 38:59.000] is an interesting product,
[38:59.000 --> 39:03.000] you already have all these other products. Do you really want to introduce yet
[39:03.000 --> 39:07.000] another product into your stack? Yeah, because eventually
[39:07.000 --> 39:11.000] it all comes with an overhead, of course, introducing it. That is one thing:
[39:11.000 --> 39:15.000] it requires someone to maintain it, even if it's a
[39:15.000 --> 39:19.000] managed service. Somebody needs to actually own it and look after it,
[39:19.000 --> 39:23.000] and then, as you said, you need to retrain people to also use it effectively.
[39:23.000 --> 39:27.000] So it comes at significant cost, and that is really
[39:27.000 --> 39:31.000] something that I believe should be quite critically
[39:31.000 --> 39:35.000] assessed. What is really the gain you have? How far can you go with
[39:35.000 --> 39:39.000] your current tooling, and then eventually make
[39:39.000 --> 39:43.000] that decision. At least personally, I'm really
[39:43.000 --> 39:47.000] not a fan of thinking tooling first;
[39:47.000 --> 39:51.000] I really believe in looking at your organization, looking at the people,
[39:51.000 --> 39:55.000] what skills are there, looking at how effectively
[39:55.000 --> 39:59.000] these people are actually performing certain activities and processes,
[39:59.000 --> 40:03.000] and then carefully thinking about what really makes sense.
[40:03.000 --> 40:07.000] Because it's one thing, but people need to
[40:07.000 --> 40:11.000] adopt and use the tooling, and eventually it should really speed them up and improve
[40:11.000 --> 40:15.000] how they develop. Yeah, that's
[40:15.000 --> 40:19.000] great advice, and it's hard to understand how good the advice is,
[40:19.000 --> 40:23.000] because it takes the experience of getting burned
[40:23.000 --> 40:27.000] introducing new technology. I've
[40:27.000 --> 40:31.000] had experiences before where
[40:31.000 --> 40:35.000] one of the mistakes I've made was putting too many different technologies into an organization,
[40:35.000 --> 40:39.000] and the problem is, once you get enough complexity,
[40:39.000 --> 40:43.000] it can really explode. And then,
[40:43.000 --> 40:47.000] this is the part that really gets scary,
[40:47.000 --> 40:51.000] let's take Spark, for example. How hard is it to hire somebody that knows Spark? Pretty easy.
[40:51.000 --> 40:55.000] How hard is it going to be to hire somebody that knows
[40:55.000 --> 40:59.000] Spark, and then hire another person that knows the Gremlin query
[40:59.000 --> 41:03.000] language for Neptune, then hire another person that knows Kubernetes,
[41:03.000 --> 41:07.000] then hire another... After a while, if you have so many different kinds of tools,
[41:07.000 --> 41:11.000] you have to hire so many different kinds of people that all
[41:11.000 --> 41:15.000] productivity comes to a stop. So it's the hiring as well.
[41:15.000 --> 41:19.000] Absolutely. I mean, it's virtually impossible
[41:19.000 --> 41:23.000] to find someone who is really well versed with Gremlin, for example.
[41:23.000 --> 41:27.000] It's incredibly hard, and I think tech hiring is hard
[41:27.000 --> 41:31.000] by itself already.
[41:31.000 --> 41:35.000] So you really need to think about: what can I hire for, as well?
[41:35.000 --> 41:39.000] What expertise can I realistically build up?
[41:39.000 --> 41:43.000] So that's why I think with AWS,
[41:43.000 --> 41:47.000] even with some of the limitations of the ML platform,
[41:47.000 --> 41:51.000] the advantage of using AWS is that
[41:51.000 --> 41:55.000] you have a huge audience of people to hire from. And then the same thing with
[41:55.000 --> 41:59.000] Spark: there's a lot of things I don't like about Spark, but a lot of people
[41:59.000 --> 42:03.000] use Spark. And so if you use AWS and you use Spark,
[42:03.000 --> 42:07.000] let's say those two, which you are, then you're going to have a much easier time
[42:07.000 --> 42:11.000] hiring people, you're going to have a much easier time training people,
[42:11.000 --> 42:15.000] there's tons of documentation about it. So I think you're
[42:15.000 --> 42:19.000] very wise to be thinking that way, but a lot of people don't think about that.
[42:19.000 --> 42:23.000] They're like, oh, I've got to use the latest, greatest stuff and this and this and this,
[42:23.000 --> 42:27.000] and then their company starts to get into trouble because they can't hire
[42:27.000 --> 42:31.000] people, they can't maintain systems, and then productivity starts
[42:31.000 --> 42:35.000] to decrease. Also, something
[42:35.000 --> 42:39.000] not to ignore is the cognitive load you put on a team
[42:39.000 --> 42:43.000] that needs to manage a broad range of very different
[42:43.000 --> 42:47.000] tools or services. It puts incredible
[42:47.000 --> 42:51.000] cognitive load on that team, and you suddenly also need an incredible breadth
[42:51.000 --> 42:55.000] of expertise in that team, and that means you're also going
[42:55.000 --> 42:59.000] to create single points of failure if you don't really
[42:59.000 --> 43:03.000] scale up your team.
[43:03.000 --> 43:07.000] So it's something to really... I think when you go for
[43:07.000 --> 43:11.000] new tooling, you should really look at it from a holistic perspective,
[43:11.000 --> 43:15.000] not only about "this is the latest and greatest."
[43:15.000 --> 43:19.000] In terms of Europe versus
[43:19.000 --> 43:23.000] US, have you spent much time in the US at all?
[43:23.000 --> 43:27.000] Not at all, actually. I'm flying to the US on Monday, but no, not at all.
[43:27.000 --> 43:31.000] That also would be kind of an interesting
[43:31.000 --> 43:35.000] comparison, in that the culture of the United States
[43:35.000 --> 43:39.000] is really this culture of,
[43:39.000 --> 43:43.000] I would say, more like survival of the fittest, where you work
[43:43.000 --> 43:47.000] seven days a week and you're constantly... like, you don't go on vacation
[43:47.000 --> 43:51.000] and you're proud of it. And I think it's not
[43:51.000 --> 43:55.000] a good culture. I'm not saying that's a good thing; I think it's a bad
[43:55.000 --> 43:59.000] thing. And a lot of times the critique people have
[43:59.000 --> 44:03.000] about Europe is like, oh, people take vacation all the time and all this.
[44:03.000 --> 44:07.000] And as someone who has spent time in both, I would say
[44:07.000 --> 44:11.000] yes, that's a better approach. A better approach is that people
[44:11.000 --> 44:15.000] should feel relaxed, because,
[44:15.000 --> 44:19.000] especially with the kind of work you do in MLOps,
[44:19.000 --> 44:23.000] you need people to feel comfortable and happy.
[44:23.000 --> 44:27.000] And more, the question
[44:27.000 --> 44:31.000] I was getting at is that
[44:31.000 --> 44:35.000] I wonder if there is a more productive culture
[44:35.000 --> 44:39.000] for MLOps in Europe
[44:39.000 --> 44:43.000] versus the US in terms of maintaining
[44:43.000 --> 44:47.000] systems and building software, where what the US
[44:47.000 --> 44:51.000] has really been good at, I guess, is kind of coming up with new
[44:51.000 --> 44:55.000] ideas, and there's lots of new services that get generated, but
[44:55.000 --> 44:59.000] the quality and longevity
[44:59.000 --> 45:03.000] is not necessarily the same. Where I could see,
[45:03.000 --> 45:07.000] in the stuff we just talked about, which is if you're trying to build a team
[45:07.000 --> 45:11.000] where there's low turnover and
[45:11.000 --> 45:15.000] you have very high quality output,
[45:15.000 --> 45:19.000] it seems like maybe organizations
[45:19.000 --> 45:23.000] could learn from the European approach to building
[45:23.000 --> 45:27.000] and maintaining systems for MLOps.
[45:27.000 --> 45:31.000] I think there's definitely some truth in it, especially when you look at the median
[45:31.000 --> 45:35.000] tenure of a tech person in an organization.
[45:35.000 --> 45:39.000] I think that is actually still significantly lower in the US.
[45:39.000 --> 45:43.000] I'm not sure, but I think in the Bay Area it's somewhere around one year or so,
[45:43.000 --> 45:47.000] compared to Europe, where I believe it's
[45:47.000 --> 45:51.000] still fairly low. Here, of course, in tech, people also like to switch companies more often,
[45:51.000 --> 45:55.000] but I would say the average is still more around
[45:55.000 --> 45:59.000] two years, something around that, staying with the same company,
[45:59.000 --> 46:03.000] also in tech, which I think is a bit longer
[46:03.000 --> 46:07.000] than you would typically have in the US.
[46:07.000 --> 46:11.000] I think from my perspective where I've also built up most of the
[46:11.000 --> 46:15.000] current team I think it's
[46:15.000 --> 46:19.000] super important to hire good people
[46:19.000 --> 46:23.000] and people that fit the team and fit the company culture-wise
[46:23.000 --> 46:27.000] but also give them room
[46:27.000 --> 46:31.000] to not be in a sprint all the time
[46:31.000 --> 46:35.000] it's about having a sustainable way of working in my opinion
[46:35.000 --> 46:39.000] and that sustainable way means you should definitely take your vacation
[46:39.000 --> 46:43.000] and I think usually in Europe we have quite generous
[46:43.000 --> 46:47.000] vacation, even by law. I mean in the Netherlands by law you get 20 days a year
[46:47.000 --> 46:51.000] but most companies give you 25, and many IT companies
[46:51.000 --> 46:55.000] 30 per year so that's quite nice
[46:55.000 --> 46:59.000] and people do take that, so culture-wise it's really that everyone
[46:59.000 --> 47:03.000] likes to take vacations, whether that's C-level or an engineer on a team
[47:03.000 --> 47:07.000] and that's in many companies that's also really encouraged
[47:07.000 --> 47:11.000] to have a healthy work life balance
[47:11.000 --> 47:15.000] and of course it's not only about vacations also but growth opportunities
[47:15.000 --> 47:19.000] letting people explore develop themselves
[47:19.000 --> 47:23.000] and not always pushing on max performance
[47:23.000 --> 47:27.000] so really at least I always see like a partnership
[47:27.000 --> 47:31.000] the organization wants to get something from an
[47:31.000 --> 47:35.000] employee but the employee should also be encouraged and developed
[47:35.000 --> 47:39.000] in that organization and I think that is something
[47:39.000 --> 47:43.000] there is big awareness of in many parts of Europe
[47:43.000 --> 47:47.000] so my hypothesis is that
[47:47.000 --> 47:51.000] it's possible that Europe becomes
[47:51.000 --> 47:55.000] the new hub of technology
[47:55.000 --> 47:59.000] and I'll tell you why here's my hypothesis the reason why is that
[47:59.000 --> 48:03.000] in terms of machine learning operations
[48:03.000 --> 48:07.000] I've already talked to multiple people who know the
[48:07.000 --> 48:11.000] data around it like big companies and they've told me that
[48:11.000 --> 48:15.000] it's going to be close to impossible to hire people soon
[48:15.000 --> 48:19.000] because essentially there's too many job openings
[48:19.000 --> 48:23.000] and there's not enough people that know machine learning, machine learning operations, cloud computing
[48:23.000 --> 48:27.000] and so the American culture unfortunately I think
[48:27.000 --> 48:31.000] is so cutthroat that they don't encourage
[48:31.000 --> 48:35.000] people to be loyal to their company
[48:35.000 --> 48:39.000] and in addition to that because there is no universal healthcare system
[48:39.000 --> 48:43.000] in the US
[48:43.000 --> 48:47.000] it's kind of a prisoner's dilemma where nobody
[48:47.000 --> 48:51.000] sees each other and so they're constantly optimizing
[48:51.000 --> 48:55.000] but in the case of machine learning it's a different
[48:55.000 --> 48:59.000] industry where you do really need to have
[48:59.000 --> 49:03.000] some longevity for employees because the systems are very complex
[49:03.000 --> 49:07.000] systems to develop and so the culture of Europe
[49:07.000 --> 49:11.000] which is much more friendly to the worker I think
[49:11.000 --> 49:15.000] could lead to Europe having
[49:15.000 --> 49:19.000] a better outcome for machine learning operations
[49:19.000 --> 49:23.000] so that's one part of it and then the second part of it is
[49:23.000 --> 49:27.000] how the US has done compared to Europe
[49:27.000 --> 49:31.000] in terms of
[49:31.000 --> 49:35.000] data privacy where I think the US has dropped the ball
[49:35.000 --> 49:39.000] and hasn't done a good job at it but Europe has actually
[49:39.000 --> 49:43.000] done much much better at holding tech companies accountable
[49:43.000 --> 49:47.000] and I think if you asked
[49:47.000 --> 49:51.000] well informed people if they would like some of the
[49:51.000 --> 49:55.000] practices of the United States tech companies to change I think most
[49:55.000 --> 49:59.000] well informed people would say we don't want you to recommend
[49:59.000 --> 50:03.000] bad data like extremist video content
[50:03.000 --> 50:07.000] I mean there's people that are extremists that love it
[50:07.000 --> 50:11.000] or we don't want you to sell our personal information without our consent
[50:11.000 --> 50:15.000] so it could also lead to a better
[50:15.000 --> 50:19.000] outcome for the people
[50:19.000 --> 50:23.000] that are using machine learning and AI in Europe
[50:23.000 --> 50:27.000] so I actually suspect and this is my hypothesis
[50:27.000 --> 50:31.000] who knows if I'm right or not is that I think Europe could be
[50:31.000 --> 50:35.000] the leader from let's say 2022 to
[50:35.000 --> 50:39.000] 2040 in AI and ML because of
[50:39.000 --> 50:43.000] the culture but I don't know that's just one hypothesis I have
[50:43.000 --> 50:47.000] yeah I think around what you mentioned before
[50:47.000 --> 50:51.000] around the fact that perhaps turnover in tech companies here in Europe
[50:51.000 --> 50:55.000] is lower, I think that definitely helps you build systems that survive the test of time as well
[50:55.000 --> 50:59.000] right I mean everyone had the case when a key engineer
[50:59.000 --> 51:03.000] off boards from a team leaves the company and then you need to
[51:03.000 --> 51:07.000] hire another person right that's a long time of not being super productive
[51:07.000 --> 51:11.000] a long time of not being super effective so you continuously
[51:11.000 --> 51:15.000] lose the traction that you need
[51:15.000 --> 51:19.000] so I think you could be right there that in the
[51:19.000 --> 51:23.000] longer run when systems really need to be matured and developed over
[51:23.000 --> 51:27.000] longer time Europe might have an edge there
[51:27.000 --> 51:31.000] might be a bit better suited to do that
[51:31.000 --> 51:37.000] the salaries are still higher in the US and also I think many US companies are entering more
[51:37.000 --> 51:41.000] from a people perspective, with remote work and everything they're starting to also
[51:41.000 --> 51:45.000] poach more and more engineers from Europe because
[51:45.000 --> 51:49.000] of course vacation and everything and having a healthy work life balance
[51:49.000 --> 51:53.000] is one thing but for many people if someone
[51:53.000 --> 51:57.000] gives you a 50% higher paycheck that's also a strong argument
[51:57.000 --> 52:01.000] so it's actually difficult for Europe to
[52:01.000 --> 52:05.000] keep the engineers here as well
[52:05.000 --> 52:09.000] no I will say this though if you work remote from
[52:09.000 --> 52:13.000] Europe that's a very different scenario than living
[52:13.000 --> 52:17.000] in the US because you'll see that
[52:17.000 --> 52:21.000] unfortunately the United States since about 1980
[52:21.000 --> 52:25.000] has declined and
[52:25.000 --> 52:29.000] the data around the US is pretty dire
[52:29.000 --> 52:33.000] actually the life expectancy is one of the
[52:33.000 --> 52:37.000] lowest in the world for a G20 country
[52:37.000 --> 52:41.000] so then if you walk through the major
[52:41.000 --> 52:45.000] cities of the US there's just poverty
[52:45.000 --> 52:49.000] everywhere like people are living in very low
[52:49.000 --> 52:53.000] quality conditions where every time I go to Europe
[52:53.000 --> 52:57.000] I go to Munich, I go to London, I go to wherever
[52:57.000 --> 53:01.000] that basically the cities are beautiful
[53:01.000 --> 53:05.000] and well maintained so I think in the case that a US company
[53:05.000 --> 53:09.000] lets a European live in Europe and work
[53:09.000 --> 53:13.000] remote yeah that could work out because an EU
[53:13.000 --> 53:17.000] citizen has amazing
[53:17.000 --> 53:21.000] healthcare they have the
[53:21.000 --> 53:25.000] safety net their cities aren't basically
[53:25.000 --> 53:29.000] highly unequal but I think given the
[53:29.000 --> 53:33.000] situation of the US in its current form
[53:33.000 --> 53:37.000] I personally wouldn't recommend
[53:37.000 --> 53:41.000] someone from Europe moving to the US because
[53:41.000 --> 53:45.000] unfortunately I don't think it's a
[53:45.000 --> 53:49.000] great place to live just to be totally honest
[53:49.000 --> 53:53.000] if you're already in Europe and on the flip side I think that
[53:53.000 --> 53:57.000] there's a lot of Americans actually who are very interested in
[53:57.000 --> 54:01.000] universal healthcare, which in particular is not even
[54:01.000 --> 54:05.000] possible in the US because of the politics in the US
[54:05.000 --> 54:09.000] and a lot of medical bankruptcies occur
[54:09.000 --> 54:13.000] and so from a start up perspective as well
[54:13.000 --> 54:17.000] this is something that people don't talk about in America it's like yeah we're all about
[54:17.000 --> 54:21.000] startups well think about how many more people would be able to
[54:21.000 --> 54:25.000] create a company if you didn't have to worry about going bankrupt
[54:25.000 --> 54:29.000] if you broke your arm or you have some kind of
[54:29.000 --> 54:33.000] sickness or whatever so
[54:33.000 --> 54:37.000] I think it's an interesting trade off
[54:37.000 --> 54:41.000] situation and I would say that the sweet spot might be
[54:41.000 --> 54:45.000] you work for an American company and get the higher salary but you still live in Europe
[54:45.000 --> 54:49.000] that would be the dream scenario I think that's why many people are actually doing it
[54:49.000 --> 54:53.000] I think especially since covid started you can really see it
[54:53.000 --> 54:57.000] before that it wasn't really a thing working for a US company
[54:57.000 --> 55:01.000] that really sits in the US while you're fully remote but I think now for the last two, two and a half years
[55:01.000 --> 55:05.000] it's really becoming reality actually
[55:05.000 --> 55:09.000] interesting yeah well
[55:09.000 --> 55:13.000] hearing a lot of your ideas around
[55:13.000 --> 55:17.000] startups and what you're doing and
[55:17.000 --> 55:21.000] also about how you're using SageMaker
[55:21.000 --> 55:25.000] is there any place that someone can get a hold of you
[55:25.000 --> 55:29.000] if they listen to this on the O'Reilly platform or
[55:29.000 --> 55:33.000] the content that you're developing yourself or any other information you want to share
[55:33.000 --> 55:37.000] yeah definitely so I think best place to reach out to me and I'm always
[55:37.000 --> 55:41.000] happy to receive a few messages and have a good chat or a virtual coffee
[55:41.000 --> 55:45.000] is via LinkedIn my name is here that's how you can find me on LinkedIn
[55:45.000 --> 55:49.000] I'm also at conferences here and there well in Europe mostly
[55:49.000 --> 55:53.000] typically when there is an MLOps conference you're probably going to see me there
[55:53.000 --> 55:57.000] in one way or another that is something as well
[55:57.000 --> 56:01.000] cool yeah well I'm glad we had a chance to talk
[56:01.000 --> 56:05.000] you taught me a few things that I'm definitely going to follow up on
[56:05.000 --> 56:09.000] and I really appreciate it and hopefully we can talk again soon
[56:09.000 --> 56:13.000] thanks a lot for the chat okay all right
