
Update Tiny Llama example #161

Open
marikurz-amd wants to merge 5 commits into main from update.tiny.transformer


Conversation

@marikurz-amd

Collaborator

WIP

…the memory advantages of flash attention and its linear scaling. Documentation needs to be updated to correctly show peak memory usage results.
…some current reference results, discussion of results and memory evolution fused vs. unfused.
@marikurz-amd marikurz-amd self-assigned this May 19, 2026
@marikurz-amd marikurz-amd marked this pull request as draft May 19, 2026 10:13
@marikurz-amd marikurz-amd marked this pull request as ready for review June 12, 2026 10:11
@marikurz-amd

Collaborator Author

@conde-amd, feel free to review before merging. Thanks!

--enable-pytorch-profiler # Enable PyTorch profiler
--profile-dir ./profiles # Directory for profile output
--profile-memory # Include memory profiling
--enable-memory-profiling # CUDA alloc records in profiler trace (TensorBoard memory views)

Contributor

Why CUDA?
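
For readers following the thread, here is a minimal sketch of how the four flags quoted above could be declared with `argparse`. The flag names and help strings come from the quoted diff. The `build_parser` helper, the boolean `store_true` behavior, and the `./profiles` default are assumptions made for illustration, not the example's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: only the flag names and help text come from the diff."""
    parser = argparse.ArgumentParser(description="Profiling options (illustrative sketch)")
    # Assumed to be simple on/off switches.
    parser.add_argument("--enable-pytorch-profiler", action="store_true",
                        help="Enable PyTorch profiler")
    # Assumed default; the diff only shows ./profiles as an example value.
    parser.add_argument("--profile-dir", default="./profiles",
                        help="Directory for profile output")
    parser.add_argument("--profile-memory", action="store_true",
                        help="Include memory profiling")
    parser.add_argument("--enable-memory-profiling", action="store_true",
                        help="CUDA alloc records in profiler trace (TensorBoard memory views)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--enable-pytorch-profiler", "--profile-dir", "./profiles", "--profile-memory"]
)
print(args)
```

`argparse` converts the dashes in each flag name to underscores, so the parsed values are available as `args.enable_pytorch_profiler`, `args.profile_dir`, and so on.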

@conde-amd

Contributor

@marikurz-amd LGTM. Thanks for the update.


2 participants