Hi everyone, my name is Alex. Welcome, everyone who woke up so early to visit this room: this is the Software Performance devroom, and I'm the devroom organizer.

Today I want to talk about one topic that is interesting to me: software performance, but not actual benchmarks or state-of-the-art performance numbers in some domain. I want to talk about how accessible software performance is for regular people.

A few words about me. I'm a regular C++ engineer, right now a Rust engineer. I'm interested in different compiler stuff, especially LLVM; I prefer LLVM over GCC for several reasons. I spent several years on the C++ standardization committee, I'm the author of the awesome-pgo project, if you have heard about it, I'm the devroom organizer, and I actually like rewrites of software.

So, what do we usually hear about software performance? Usually, at first, different benchmarks: good benchmark reports and not-so-good benchmark reports. Then more serious stuff: different engineering blogs, something like "we rewrote something, usually in Rust, and it became blazingly fast". Then hardcore, low-level optimization material, possibly from the FFmpeg folks: rewriting something in assembly and achieving great performance for conversions. And academic papers, for example simdjson, a pretty novel technique that was invented only a few years ago.

But one point is frequently missed: how easily can all of these software performance achievements be used in everyday life? For example, check this one. These are two titles from Phoronix, both about actual performance improvements for the Linux kernel, and both pretty significant. But can you say, right now, which improvement is ready to use today, and which one is not achievable even after a year or two with the current tooling? That is the whole idea of my talk.

So, let's start with compilers: compilation models and profile-driven optimizations. There are two major compilation models, ahead-of-time (AOT) and just-in-time (JIT), and there are pretty big differences between them. For example, the ahead-of-time model is not nearly as limited in time as just-in-time regarding how much time we can spend performing optimizations. We can spend several hours, in extreme cases even days, optimizing one binary, if that buys us, say, one additional percent of performance.
In the just-in-time model that is impossible: it usually runs on the target machine and has a limited time frame to perform optimizations, et cetera. However, the just-in-time model has one advantage. Running on the target machine, it can collect the profile of the workload: which paths of the code are executed, how frequently, and so on. This information can be used during the compiler's optimizations, for example for more precise inlining (inlining is one of the most important optimizations) or for hot/cold splitting to better utilize the CPU instruction cache.

In the ahead-of-time world this is not available, right? It's not available because on the target machine we don't have a virtual machine that can collect these metrics; we just have our binary. So for ahead-of-time compilers, people implemented a technology called profile-guided optimization (PGO). The idea is practically the same: collect a profile on the target machine, pass it to the compiler (possibly converting it along the way), and that's it.

Is PGO that important, do we need to care about it? I collect benchmarks as part of the awesome-pgo project, and as you can see, the performance improvements are pretty huge. Sometimes even 2x; that number is for MongoDB, not the best database from this perspective, I would say. For different libraries, compilers, static analyzers, databases, et cetera, the improvements are really huge. So yes, we need to care about it.

In theory, enabling PGO is a simple process: just recompile your project with a few special compiler switches, run a target workload, collect the profiles, and rebuild. However, if you actually try to do it, you get this: a lot of additional problems that almost don't exist in the just-in-time world. At first, a double or triple compilation model; in some extreme PGO scenarios you need to compile your ahead-of-time binary four times, which is a pretty huge overhead on CI. You need to think about profile skew between different workloads. You need to think about merging profiles for the same binary collected from different workloads. You need to think about profile storage if you want to be able to reproduce your binary, et cetera, et cetera. There are a lot of additional problems, and you need to solve them.

There are approaches that can help you with that. For example, there is a dedicated way to do PGO called sampling PGO. It eliminates the instrumentation overhead of instrumented PGO, and that overhead is real: instrumented PGO can slow a binary down several times at runtime.
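To make the two flavors concrete, here is a minimal sketch with Clang/LLVM. The flags are real; the program name and workload are placeholders, and create_llvm_prof comes from Google's separate AutoFDO tooling:

    # Instrumented PGO: build, run a representative workload, merge, rebuild.
    clang++ -O2 -fprofile-generate=./pgo-data main.cpp -o app
    ./app --representative-workload           # writes *.profraw into ./pgo-data
    llvm-profdata merge -o app.profdata ./pgo-data/*.profraw
    clang++ -O2 -fprofile-use=app.profdata main.cpp -o app

    # Sampling PGO: profile the regular binary with perf instead of instrumenting.
    perf record -b -- ./app --representative-workload
    create_llvm_prof --binary=./app --profile=perf.data --out=app.prof
    clang++ -O2 -fprofile-sample-use=app.prof main.cpp -o app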
Sampling PGO, by contrast, can cost around one percent at runtime. Unfortunately, to use sampling PGO you need a bunch of additional tooling, one more thing to install, update, et cetera. For example, you can use Parca, or Perforator, a system-wide profiler open-sourced by Yandex. It supports profiling your whole server fleet of binaries and using these profiles during the profile-guided optimization phase, and it can be integrated into CI, et cetera. This is, by the way, how Yandex and Google do it internally; Google simply didn't open source their own system, Google-Wide Profiling, and many other big companies do the same inside. However, as you can see, you need to maintain an additional amount of infrastructure: usually a PostgreSQL cluster, a ClickHouse cluster, some extra storage. It's not that friendly. So you need to do a lot, really a lot, of additional work, just to mimic an optimization the just-in-time world gets for free.

Usually we think that just-in-time compilers optimize worse than ahead-of-time ones. However, for workload-specific optimizations, just-in-time is simply easier to use: compared to PGO, just-in-time really just works. For example, it works in every browser with V8, the JavaScript engine. That's it.

So, can we eliminate at least part of this complexity with proper documentation? I thought so; however, there are a lot of traps on this path too. Performance guidance can take several forms. It can be really good books, like Brendan Gregg's Systems Performance, the famous Hacker's Delight, or Intel's architecture-specific optimization manuals. Or any other form: an official performance book for a project, project-specific optimization guidelines, Reddit and Stack Overflow threads, YouTube coding influencers with a lot of different videos. Even this talk. But there is one issue with all of that: unfortunately, people just don't read it, don't watch it, and don't listen to it. And here we go once again: we try to document something, like all of this PGO stuff and how to avoid the problems, and people still will not do it. Just because we don't care, we don't want to read, we don't have time, we have work-life balance, et cetera. Just-in-time is simply better from this perspective.

Still, I tried to attack this for several frameworks: I tried to push a bit more of these optimizations upstream, into the documentation of the libraries themselves. Ratatui, the most popular terminal user interface library in Rust, now has a dedicated optimization guideline.
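To give a flavor of what such a guideline contains, the advice usually boils down to a few release-profile settings like the ones below. This is a sketch of the typical recommendations, not Ratatui's exact text:

    # Append typical guideline settings to a project's Cargo.toml.
    cat >> Cargo.toml <<'EOF'
    [profile.release]
    lto = true           # cross-crate link-time optimization
    codegen-units = 1    # slower build, better-optimized code
    strip = true         # drop debug info from the shipped binary
    EOF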
Dioxus, a similar framework in Rust, has a dedicated optimization guideline too, and Tauri almost the same.

However, I wanted to check how effective this solution is: just putting up documentation and crossing your fingers. So I checked a lot of GitHub projects, and I found that the authors of all these projects, when they wrote their applications, did not enable the optimizations from the documentation, even though that guidance already existed at the moment. And when I created PRs enabling all of these optimizations, they happily accepted them, almost all of them. The conversion rate was pretty high, something like 90%. That was with the Ratatui documentation already in place.

And even when you contribute such changes to the documentation, as I did for Ratatui (that was my contribution), there is one more problem: it will, probably, work only for newly created applications. All applications already existing in the ecosystem will highly likely never be updated, simply because the developers of these applications will never notice an update in the documentation of their framework. They don't care; they wrote the application once, and that's it.

And even if people read the documentation carefully, in many cases the documentation omits really important details. That is actually why the awesome-pgo project was created: I tried to apply PGO to several databases, let's say PostgreSQL, SQLite, et cetera, and unfortunately I found so many issues in the current documentation across the PGO ecosystem. I have already spent three years discovering all the issues, hidden gems, and traps along the way, and I'm still not finished.

In many cases the documentation is outdated, and, even worse, outdated in such a way that you cannot tell that it's outdated. For example, there was a manual in Clang on how to use PGO, a simple PGO manual. It described two PGO modes: frontend PGO (FE PGO) and IR PGO (intermediate representation PGO, as in LLVM IR). By default, Clang recommended using frontend PGO.
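The two modes are easy to confuse, because from the user's side they differ by a single flag spelling (both flags are real Clang flags):

    # Frontend PGO: the mode the old manual recommended first (since deprecated).
    clang -O2 -fprofile-instr-generate main.c -o app
    # IR-level PGO: the mode that was buried in the closing note.
    clang -O2 -fprofile-generate main.c -o app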
That was the official guideline from the compiler, and only at the very end of the instructions was there a small note: as an alternative way to use PGO, there is IR PGO. However, I randomly found an issue saying that frontend PGO is actually deprecated. And no one had written a note about that anywhere, not in the documentation, not anywhere else. It was internal knowledge at Google, because Google implemented most of the PGO stuff in LLVM, and they simply didn't put up a note, and that's it. The issue was created in 2020, and I discovered this change only three years later. Three years! And I additionally had to ask the PGO developers from Google on the LLVM Discourse forum before my question was answered.

So, unfortunately, there are open source projects, pretty huge open source projects, that had already integrated PGO, and integrated it the wrong way: they integrated frontend PGO and used it for many, many years. Was it that critical, did they actually lose performance? Yes, unfortunately, they did. When the Clang documentation was finally changed, it pushed them to change a lot of things in their PGO guidelines. After some time the YugabyteDB developers somehow found this out, probably because I reported an issue upstream, and they decided to implement PGO for their database, and they got an additional 10% improvement. So for many years, I guess at least two or three, YugabyteDB was missing an extra 10% of performance for themselves and their users, just because the documentation was, I wouldn't say lying, it was updated; it was just updated in a pretty dirty way.

And sometimes the documentation simply lies. I had that case with SQLite. When I started investigating PGO for different databases, I of course started with PostgreSQL and MySQL, and someday I came to SQLite. SQLite is a fast database, so I wanted to optimize it even more. And the SQLite documentation had a dedicated note saying that PGO does not help to optimize SQLite. A dedicated note. Of course, I tested it on my own hardware, and I got a 10% improvement. Of course, I reported it upstream, and they didn't believe me. I needed to perform a bunch of additional benchmarks with different compilers.
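For reference, reproducing such an experiment with GCC is the analogous two-phase build. This is a sketch using the SQLite amalgamation; the workload file is a placeholder:

    gcc -O2 -fprofile-generate shell.c sqlite3.c -o sqlite3 -lpthread -ldl -lm
    ./sqlite3 test.db < workload.sql    # leaves .gcda profile files behind
    gcc -O2 -fprofile-use shell.c sqlite3.c -o sqlite3 -lpthread -ldl -lm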
At first I did it with Clang. They asked: please use GCC. Okay, I reproduced it on GCC. Later they questioned my benchmark suite: I had used ClickBench, so OLAP workloads. Please use our own benchmark, they said, we have our speed test. Okay, I did it once again, and I reproduced the PGO improvement once again, 10%. After that, they just deleted the note from the documentation, and they never added a note saying that PGO does help to improve SQLite performance. And that topic on their forum is still unanswered.

So, if documentation doesn't work, maybe tooling can help us? Okay, let's try. How can tooling help us, in theory? Of course, we can try to automate some best practices from optimization guidelines, from frameworks, et cetera. We can make benchmarking convenient, because measuring performance without a good benchmark is impossible (I have actually seen several projects that were measuring performance improvements by eye, I'm not kidding). We can try to automate optimization routines, for example at least semi-automatic PGO. And of course profilers with good visualization: imagine, from a newbie's perspective, Intel VTune versus Linux perf. Which one is friendlier? From my perspective, and I was a newbie in profiling once, Intel VTune is much, much friendlier.

So, let's start with an example: creating a new application from a template. We can try to integrate into ready-to-use templates all our recommendations regarding link-time optimization, codegen-units, whatever you want, all our favorite compiler switches. And there are already a lot of template generators and ready-to-use templates; the same frameworks, Ratatui, Tauri, and Dioxus, have them. I integrated these recommendations into the Ratatui templates; they are already merged and ready to use. For Tauri and Dioxus I created issues, and, again, they are not that interested: the issues are still unanswered.
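Consuming such a template is a couple of commands. The template path below is the one the Ratatui documentation gives, as far as I recall; check upstream:

    cargo install cargo-generate
    cargo generate ratatui/templates    # new app with the tuned settings baked in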
So if a Ratatui user right now takes the cargo-generate tool (it's more or less the standard tool for this in the Rust ecosystem) and creates a new Ratatui application from the templates, the way the documentation recommends, these optimizations will be enabled from day one. Not at some later point, when I create yet another PR on GitHub saying "please enable them", but from day one.

However, this way still has issues. People don't use project generators, for various reasons: they just copy-paste from their previous project, they don't know about generators, or they simply vibe-code the whole thing, using the model as a template engine, let's say. And those vibe-coded templates are not poisoned enough with optimization guidelines yet, let's say. And already-written applications will not be covered anyway. It's the same situation as with the documentation: when you change a template, the change doesn't apply to already-created applications, because people use generators only at the start of a project.

One more example: let's compare benchmarking between Rust and C++, how different the ecosystems are, and which one is more accessible to a developer. Benchmarking, as I said, is needed to measure performance. A good example is how it's done in Rust. Rust has a de facto ecosystem default, and it's called cargo bench. I don't think you can find a benchmark in the Rust ecosystem that is not run with the cargo bench command; that's the standard. I have done it hundreds of times for many, many projects, and it is always done the same way.

Now let's compare it to C++. We can start from a simple question: which benchmark library should we use? Most people will say Google Benchmark. Okay, fine. How do you add Google Benchmark to your project? There are a lot of different ways: language-specific package managers like Conan and vcpkg, or system dependencies via your favorite package manager, maybe Nix, maybe whatever. And the next question: how do you run the benchmark? In Rust, it's one standard command.
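The same command, for any crate in the ecosystem:

    cargo bench              # builds and runs every benchmark target
    cargo bench -- parse     # passes a name filter through to the harness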
In C++, there is no standard command to run a benchmark at all. In the case of Google Benchmark, you need to build a binary and then run that binary. If it's your own project, that's fine. But if you are running benchmarks for some third-party project, you will need to figure out how to run the benchmark: read the README, and usually also the CMake files, Makefiles, whatever. It's a zoo. It's C++.

Now, a good example of genuinely helpful tooling: the cargo-wizard tool. It's a tool that tries to apply helpful optimizations to your project automatically. How is it helpful? With this tool, you don't need to read the whole documentation of the rustc compiler. You don't need to be an expert in different compiler optimizations, in what the different modes mean and how they differ. You can just run the wizard, and it will apply pretty good defaults for you, for several profiles. Unfortunately, there is nothing comparable in the C++ world at all.

Another example: post-link optimization (PLO). Post-link optimization is yet another step on top of profile-guided optimization. The most important thing this technique does is reorder the functions in your binary to make the CPU instruction cache happy: it reduces instruction cache misses, and that improves performance a lot.

There are two tools at the moment. The most popular one is LLVM BOLT, developed by Facebook/Meta; that's the default way to do PLO right now. And there is Google Propeller, which is similar but works a bit differently. BOLT disassembles your binary, reshuffles the functions, and reassembles the binary again. The Propeller people instead modify the compiler itself and reorder functions during the linking stage, so they don't need to disassemble the produced binary. There are pros and cons to each approach, and the Meta developers and the Google developers don't agree with each other.
But that's what we have. We also had Intel TLO, the Thin Layout Optimizer, but after it was open-sourced it got several commits and is now archived; and since we have a lot of layoffs at Intel, it's probably resting in peace forever. You can look at the code, but please don't use it.

Should we care about PLO? Yes, we should, because it gives one more really great leap in performance, on top of PGO. There is a talk about it from Amir Ayupov, one of the LLVM BOLT developers; they implemented the tool for internal use at Meta and then decided to open source it. They like to show the LLVM BOLT performance improvement on Clang, but I have more benchmarks, for example on databases. Even on a database, under a real workload, I can get an extra 5% of performance just by applying this tool. And let's appreciate which baseline we are talking about: these are already LTO-plus-PGO-optimized binaries.

But how easy is it to use PLO in practice? Let's start with the good example, and it's done in Rust. Rust has yet another cargo plugin, cargo-pgo. cargo-pgo lets you run the PGO optimization routines in a semi-automatic way, and, additionally, the BOLT optimization routines. And it is implemented so well that with just a couple of commands you get a PGO-plus-PLO optimization pipeline. That is a really, really powerful optimization pipeline. It's used for Clang itself (not in distributions by default, but there are build scripts for it), and this pipeline is used by default for rustc right now. So if you're using Rust, the rustc compiler you use is already optimized this way.
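As a user, the whole routine collapses to a handful of commands. These are from cargo-pgo's documented workflow; the binary path and workload are placeholders (cargo-pgo prints the real target path, which includes the target triple):

    cargo install cargo-pgo
    cargo pgo build                          # PGO-instrumented build
    ./target/<triple>/release/app workload   # run it to collect PGO profiles
    cargo pgo optimize                       # rebuild using the collected profiles

    # And the PGO + BOLT pipeline:
    cargo pgo bolt build --with-pgo
    ./target/<triple>/release/app workload
    cargo pgo bolt optimize --with-pgo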
Now let's compare that to the C++ world. How can you try to mimic this? The C++ world doesn't have such a tool; you need to implement everything on your own. You need to start with the PGO stuff: you need to figure out the proper compiler switches. GCC and Clang have similar switches for PGO, but they are not the same, so in extreme cases you need to support both. You need to build the instrumented binary and run it, manually of course, and collect the PGO profiles. The profiles are collected differently in the GCC and Clang ecosystems. You need to convert these profiles into a format recognizable by the compiler; that's done by different tools for GCC and Clang, and they are not compatible with each other. You need to recompile your application once again, manually of course, passing the profiles with an additional bunch of compiler switches.

Then you need to figure out how to run LLVM BOLT. LLVM BOLT internally has 289 switches, and only about 150 of them are regularly visible: you get those with --help; for the rest you need to pass something like --help-hidden. And even if you look through all of them, you will not understand what they mean and what they do. So, suppose you have figured out how to run BOLT. You need to instrument your binary once again, this time with BOLT. You need to run your binary once again. You need to collect the BOLT profiles, and they are not the same as the PGO profiles. And finally, you optimize your binary with BOLT. All of these steps are done manually, only by you and your scripts, compared to the Rust flow, where it is all already automated by clever engineers in the Rust community. I would say Rust here is much, much simpler and more accessible to a regular developer. I have done both. And if you remember the table with the PGO results: most of those applications, except for the libraries, are written in Rust. Not because I'm a Rust zealot (although we do have a dedicated Rust devroom here), but simply because so much of this is so much easier in Rust that I just don't want to waste my time on C++ during the optimization routines. That's it.
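Condensed, the manual Clang-plus-BOLT part of that pipeline looks roughly like this. The flags are real but the sketch is heavily simplified, and the PGO steps from earlier would come first:

    # The binary must keep relocations so BOLT can rewrite it.
    clang++ -O2 -Wl,--emit-relocs main.cpp -o app
    llvm-bolt app -instrument -o app.inst
    ./app.inst --representative-workload    # profile goes to /tmp/prof.fdata by default
    llvm-bolt app -data=/tmp/prof.fdata \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort \
        -split-functions -o app.bolt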
One more issue with extra tooling: people just don't know about the PGO tools. It's the same issue as with the documentation: you would need to study the documentation, the official books or whatever, and we don't have time to read them. And the tools are not easy to install, for various reasons. For example: there is no package for the tool in the distribution repositories. That was the case one year ago for LLVM BOLT. I was pitching an idea to different distributions, a simple idea: please optimize Clang with LLVM BOLT, because it has already been shown at a lot of conferences that it really works. And I got practically the same answer everywhere, as I remember it: we don't have a package for LLVM BOLT, and we don't want to create one. At that time, LLVM BOLT wasn't part of LLVM; it was an external tool, so they would have needed to compile it separately, create a package, et cetera, et cetera. It's maintainer overhead. Luckily, LLVM BOLT is now part of LLVM and can be built as a part of LLVM, and many distributions already have a ready-to-use LLVM; enabling BOLT is just another switch.

Sometimes there is also no easy way to build a tool on your own. That is a really big problem for Google's AutoFDO and Propeller stuff, because they need a specific LLVM version, not upstream, and if you try to build them against another LLVM version, of course there will be a lot of compilation errors. You would need to fix them, and no one wants to waste time on that.

So, if even tools cannot help us in a good way, maybe we should try to change the defaults in our ecosystems. Okay, let's try. If we are able to change the defaults, there is no need to read documentation, install extra tools, or whatever: all recommended optimizations will be done by default. For example, you just enable the release profile, and instead of just opt-level 3 or 2, LTO will also be enabled by default, and you get a more optimized binary. It's great. However, there is Hyrum's Law: if you have a lot of users, whatever you try to change, it will break some workloads, some scenarios. And here we have exactly the same situation.

Once again, the Rust ecosystem is a good example here, because it tries to change defaults in a good way. It is trying to push a faster linker to improve the developer experience (there are benchmarks on this), and there are even faster linkers on the market. Those are not ready yet, but there are already discussions; the Rust team is waiting for the wild or mold linkers to stabilize.
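Today that still means opting in by hand. These are the commonly documented invocations, lld via rustflags and mold via its wrapper:

    RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release
    mold -run cargo build --release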
Once they stabilize, the plan is to try to switch from LLD to them, because it means more speed for everyone. rustc and rust-analyzer and this class of tools are already optimized, with PGO, with LLD by default, and so on. More optimized memory allocators are used for such tools. And the default release profile is being changed very, very carefully; for example, debug info is stripped so binaries are smaller, because Rust binaries are famous for being huge by default.

However, the Rust ecosystem also has some interesting, if not questionable, defaults. One of the ways regular Rust tools are installed is cargo install. cargo install is, let's say, a Gentoo-style installation: it checks out the sources onto your machine and compiles them on your machine. That's it: no prebuilt binary packages, everything from sources. There are pros to a Gentoo-style installation: you could, in principle, optimize for your exact hardware, but almost no one actually does that with cargo install. And there are cons: you have much, much stronger limitations on enabling expensive optimizations. And here is the problem with LTO. LTO in its most aggressive form, full (fat) LTO, is really expensive. It at least doubles your compilation time (okay, 1.5x to 2x, depending on the flags), and it requires much more memory. So if you try to enable LTO by default for the release profile, all cargo install'ed applications will take about twice as long to install as before. And that's a problem.

There are solutions to mitigate it, like using cargo-binstall, which fetches prebuilt packages. But unfortunately, this solution is not that popular in the Rust ecosystem, and it is not considered as a default. There are reasons for that; for example, cargo-binstall is not maintained by the main Rust team, so there are credibility questions, et cetera, et cetera. But it's a limitation.
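The contrast in one place: the same tool, three installation paths (ripgrep here is just a convenient example):

    cargo install ripgrep                                     # Gentoo-style: compiled locally
    RUSTFLAGS="-C target-cpu=native" cargo install ripgrep    # possible, but almost nobody does this
    cargo install cargo-binstall
    cargo binstall ripgrep                                    # prebuilt release artifact instead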
Another example about defaults. There is a dedicated tool for preparing a binary for distribution, cargo-dist; right now it's called simply dist. It's a tool that tries to pick good cargo switches for optimizing your binary. However, this tool enables not the most aggressive form of LTO, and I was wondering what the reason was. Thin LTO compiles much faster, but the trade-off is that thin LTO cannot perform the most aggressive optimizations, and if we are preparing a binary for delivery to target machines, we usually do want the aggressive stuff. And there was a reason, actually, from the maintainers' point of view: a lot of the binaries built for distribution will never actually be shipped anywhere, so they are not worth optimizing harder. Unfortunately, this detail is not written in the documentation. It's written only in a random comment in the GitHub tracker, and that's it. Regular users don't know about this detail; they simply and blindly apply the dist profile, and that's it.

So, I just decided to test it once again: would dist users be okay with changing thin LTO to fat LTO, full LTO, in their profiles? And as you see, once again I got a pretty good conversion rate here. I guess six or seven projects accepted my changes, and four or five of them just didn't answer. No rejections at all. So these defaults are probably not what users actually want; we just don't know what we are enabling, and we simply follow the defaults.
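The change those PRs proposed is one line in the profile that dist generates. The profile shape below is from memory, so check what dist actually emits for you:

    cat >> Cargo.toml <<'EOF'
    [profile.dist]
    inherits = "release"
    lto = "fat"          # dist generates lto = "thin" here by default
    EOF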
So then I tried to push an idea to the Rust community: enable LTO, in some form, in the default release profile, even knowing everything about cargo install, about roughly doubled compilation times, about more costs on CI, and whatever else. Luckily, LTO can be enabled in a much easier way in the Rust ecosystem than in C++. In C++, enabling LTO usually uncovers a lot of hidden undefined behavior: you meet a lot of interesting segfaults at runtime after enabling LTO, and then you disable it again. In Rust we don't have such a problem. You can enable LTO with one line in your Cargo.toml, and this optimization is really safe in Rust compared to C++.

How dangerous is LTO in C++? There is a great repository from the Gentoo folks dedicated to LTO, and you can just see how many issues there are in that repository. Believe me, almost all of them are about some C or C++ code blowing up after enabling LTO. Yes, they are C++ victims. In Rust we don't have such a big issue; in C++ it's a huge one.

However, I met a bunch of additional issues. Some people use the cargo release profile during their development phase, on their local machines. Even though the cargo documentation says that the release profile is meant for production use, they don't care. They just don't care. They use the release profile during development, and when you try to point it out ("please don't do it"), the answer is: "we don't care, it's okay for me". That's it.

And one week ago, there was a thread on a subreddit about some madman, and the madman is me. This person had created almost five hundred issues on GitHub, manually, about enabling LTO in Rust projects. So the Rust community was wondering: what is going on here?
That was my attempt to convince the Rust ecosystem and the Rust dev team that their defaults are not that optimal, and that the Rust community is ready to enable LTO for the release profile. We actually had a really great conversation, but in the end I was banned, and not just from the Rust subreddit: I was banned from Reddit. I don't have a Reddit account anymore. Still, it really was a good conversation, and some good points were raised in it. Unfortunately, I wasn't able to answer all of them, because I was banned. Sorry: if someone didn't get an answer from me, it was not by my wish.

So, I created more than 500 issues (and that's only for Rust projects, trust me), and more than 350 of them are closed. Unfortunately, GitHub has no filter to differentiate between closed-as-accepted and closed-as-not-planned, but the statistics are public and you can check them: I would say 95% of the closed issues accepted LTO into the default release profile. So, a message to the Rust team: please reconsider this decision once again; probably we need to spend some more time investigating enabling LTO by default.

And by the way, I proposed only full LTO, the most aggressive form, the most time- and resource-consuming one, because I wanted to test the border of acceptance among the project authors. And they accepted the trade-off pretty well. (There were exceptions: there was some LLM-related project where, when I tried to enable it, I was harshly turned away; the build there is not even that heavy with full LTO, but the Microsoft engineers maintaining it said "go away", and that's it.) The benefits from LTO are simply great. Performance, of course, although you cannot measure performance for every project, because that would require per-project benchmarks. But you can easily measure the binary size improvement, and full LTO gives around 20% of binary size improvement, just from one simple line in the Cargo.toml file.
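You can check the size claim on any binary crate without even touching its Cargo.toml, since cargo's --config flag can override the profile from the command line. The binary name is a placeholder, and the numbers will vary:

    cargo build --release
    du -b target/release/myapp       # baseline size
    cargo build --release --config 'profile.release.lto="fat"'
    du -b target/release/myapp       # typically noticeably smaller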
And I actually have many more ideas I would like to try to implement: a semi-automatic way to propose more efficient tools (like ripgrep instead of grep, that kind of stuff), open leaderboards of performance challenges for different projects, et cetera, et cetera. Things that could actually be useful for anyone. And I would say that I want to collaborate with other FOSDEM devrooms in different directions: compilers, optimizations in different domains. I want to hear your opinions about software performance stuff, so we can collaborate here, exchange opinions, and try to optimize things in different ways.

And actually, that's it. These are all the reasons why I started the awesome-pgo project (and its spin-off), and why I actually created the Software Performance devroom: to try to push as many optimizations as possible into software, and to make them more accessible to regular developers. Thank you.

[Applause]

If you have any questions, there is a microphone, or you can just speak and I will repeat the question.

Q: Hello. It seems to me that one way to make these optimizations more accessible for C++ or C code would be to integrate the settings into a build system, something like CMake. How difficult would it be to add an option in CMake to make life easier?

A: Yes. I proposed all of these ideas to CMake; there are open issues for that. Of course, no one has implemented them.

Q: Because it's difficult?

A: No. Well, I would say it would be more difficult than for Rust, because there are multiple build systems, multiple compilers, multiple dependency managers.
If you want to optimize the whole dependency tree with PGO, for example, you need to pass the flags through everywhere, et cetera. So it will be more difficult compared to Rust, but it's achievable, and I know some people who have at least tried to implement it for CMake. But we have CMake, we have Bazel, we have Meson, we have a lot of stuff in C++. And unfortunately, there is no single major build system in C++: if you look at the polls from the official C++ committee, CMake covers something like one third of the ecosystem. Only one third.

Q: Okay, thank you.

Q: How does this apply to other languages? We have talked now only about truly compiled languages, without any garbage collection or anything. Does this apply to them? I'm talking about PGO.

A: Yes, PGO, and LTO actually. Let's say: if you have an LLVM-based compiler, all of this stuff is already available, or easily implementable, I would say. If you are talking about custom compilers, you need to check the implementation.

Q: Yeah, I'm talking about, let's say, Go or Java.

A: Go already supports PGO. They support it in the sampling mode, and they knew what they were doing. The JVM world has just-in-time compilers as its main model, but if you want an ahead-of-time model, there is GraalVM. GraalVM supports PGO, but unfortunately it is insanely under-documented, and at some point they stopped answering my questions upstream.

Q: Okay, thank you.