John Heinlein, Ph.D.

San Jose, California, United States
4K followers · 500+ connections

About

My background and experience offer a balance of technology and business expertise…

Articles by John

  • Even the failures are success!

    Thomas Edison was famous for having made 1,000 unsuccessful attempts at inventing the light bulb. When pressed by a…

Experience

  • Sonatus

    Sunnyvale, California, United States

  • -

    San Jose, California, United States

  • -

    San Jose, California, United States

  • -

    Shin Yokohama, Japan

  • -

    San Jose, California, United States

  • -

    San Jose, CA

  • -

    San Jose, CA

  • -

    San Jose, CA

  • -

    San Jose, CA

  • -

    San Jose, California, United States

  • -

    San Jose, California, United States


Education

  • Stanford University

    -

    Activities and Societies: Specialization in computer architecture. Core member of the Stanford FLASH Multiprocessor project team (www-flash.stanford.edu). Dissertation focused on embedded programmability to optimize high-performance data transfer and synchronization in large-scale multiprocessors. Dissertation: http://i.stanford.edu/pub/cstr/reports/csl/tr/98/759/CSL-TR-98-759.pdf

    Honors: Air Force Laboratory Graduate Fellowship (1991-1994), Intel Foundation Fellowship (1995-1996), National Science Foundation Fellowship (funding declined)

  • -

  • -

    Activities and Societies: Eta Kappa Nu (EE Honor Society, Treasurer), Tau Beta Pi (Engineering Honor Society), Carnegie Mellon Computer Club (President)

    Honors: University Honors, ECE Department Everard M. Williams Award, National Engineering Consortium William L. Everitt Student Award of Excellence

Volunteer Experience

  • Chairman, Tech Challenge Executive Committee

    The Tech Interactive

    - Present 10 years 10 months

    Education

    2019-present: I serve as Chairman of the Executive Committee for The Tech Challenge, a STEM-oriented science and technology competition for students in grades 4-12 and one of the signature programs of The Tech Interactive. In prior years I was a committee member. On the committee, we support fundraising, marketing, and overall steering of the challenge. Arm is a corporate sponsor, and I am one of two people representing our sponsorship on the committee.

  • ECE Advisory Council for the Department of Electrical and Computer Engineering

    Carnegie Mellon University

    - Present 5 years 6 months

    Education

    The ECE Alumni Advisory Council exists to advise on strategic goals for the department in the areas of undergraduate programs, graduate programs, research, and corporate relations.

  • Parent Development Council

    The Harker School

    - Present 7 years 4 months

    Education

    I am one of the parent volunteers who drive fundraising and development for The Harker School, supporting annual giving and capital campaign programs.

  • Alumni Admissions Council

    Carnegie Mellon University

    - 6 years 10 months

    Education

    I supported Carnegie Mellon University through its Carnegie Mellon Alumni Council (CMAC) program, which provides local outreach to prospective students, conducts applicant interviews, and helps raise awareness of CMU nationwide.

Publications

  • Optimized Communication and Synchronization Using a Programmable Protocol Engine (Doctoral Dissertation)

    Stanford University

    My doctoral dissertation examined different ways to use communication across large-scale multiprocessor designs. Using an infrastructure originally conceived for implementing cache coherence in software, I studied using that same infrastructure for synchronization and high-speed block transfer, demonstrating that specially optimized protocols can vastly outperform implementations built on shared memory alone. (A minimal sketch of the shared-memory synchronization baseline this work improves on appears after this publications list.)

    For full abstract, please see: http://infolab.stanford.edu/TR/CSL-TR-98-759.html

    Excerpted abstract:

    [...]
    Our study focuses in detail on two classes of communication that are important for large scale multiprocessors: block transfer and synchronization using locks and barriers. In particular, we attempt to improve the performance of these classes of communication as compared to implementations using only software on top of shared memory.
    [...]
    We find that embedding advanced communication and synchronization features in a programmable controller has a number of advantages. For example, the block transfer protocol improves transfer performance in some cases, enables the processor to perform other work in parallel, and reduces processor cache pollution caused by the transfer. The synchronization protocols reduce overhead and eliminate bottlenecks associated with synchronization primitives implemented using software on top of shared memory. Simulations of scientific applications running on FLASH show that, in many cases, synchronization support improves performance and increases the range of machine sizes over which the applications scale. Our study shows that embedded programmability is a convenient approach for supporting block transfer and synchronization, and that the FLASH system design effectively supports this approach.

  • Coherent Block Transfer in the FLASH Multiprocessor

    Proceedings of the 11th International Parallel Processing Symposium (IPPS97)

    Abstract:
    A key goal of the Stanford FLASH project is to explore the integration of multiple communication protocols in a single multiprocessor architecture. To achieve this goal, FLASH includes a programmable node controller called MAGIC, which contains an embedded protocol processor capable of implementing multiple protocols. In this paper we present a specialized protocol for block data transfer integrated with a conventional cache coherence protocol. Block transfer forms the basis for message passing implementations on top of shared memory, occurs in important workloads such as databases, and is frequently used by the operating system. We discuss the issues that arise in designing a fully integrated protocol and its interactions with cache coherence. Using microbenchmarks, MPI communication primitives, and an application running on the operating system, we compare our protocol with standard bcopy and bcopy augmented with prefetches. Our results show that integrated block transfer can accelerate communication between nodes while off-loading the task from the main processor, utilizing the network more efficiently, and reducing the associated cache pollution. Given the aggressive support for prefetching in FLASH, prefetched bcopy is able to achieve competitive performance in many cases but lacks the other three advantages of our protocol.

    (A sketch of the prefetched-bcopy baseline referenced here appears after this publications list.)

  • Integration of Message Passing and Shared Memory in the Stanford FLASH Multiprocessor

    Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)

    Abstract: The advantages of using message passing over shared memory for certain types of communication and synchronization have provided an incentive to integrate both models within a single architecture. A key goal of the FLASH (FLexible Architecture for SHared memory) project at Stanford is to achieve this integration while maintaining a simple and efficient design. This paper presents the hardware and software mechanisms in FLASH to support various message passing protocols. We achieve low overhead message passing by delegating protocol functionality to the programmable node controllers in FLASH and by providing direct user-level access to this messaging subsystem. In contrast to most earlier work, we provide an integrated solution that handles the interaction of the messaging protocols with virtual memory, protected multiprogramming, and cache coherence. Detailed simulation studies indicate that this system can sustain message-transfer rates of several hundred megabytes per second, effectively utilizing projected network bandwidths for next generation multiprocessors.

  • The Performance Impact of Flexibility in the Stanford FLASH Multiprocessor

    Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)

    Abstract: A flexible communication mechanism is a desirable feature in multiprocessors because it allows support for multiple communication protocols, expands performance monitoring capabilities, and leads to a simpler design and debug process. In the Stanford FLASH multiprocessor, flexibility is obtained by requiring all transactions in a node to pass through a programmable node controller, called MAGIC. In this paper, we evaluate the performance costs of flexibility by comparing the performance of FLASH to that of an idealized hardwired machine on representative parallel applications and a multiprogramming workload. To measure the performance of FLASH, we use a detailed simulator of the FLASH and MAGIC designs, together with the code sequences that implement the cache-coherence protocol. We find that for a range of optimized parallel applications the performance differences between the idealized machine and FLASH are small. For these programs, either the miss rates are small or the latency of the programmable protocol can be hidden behind the memory access time. For applications that incur a large number of remote misses or exhibit substantial hot-spotting, performance is poor for both machines, though the increased remote access latencies or the occupancy of MAGIC lead to lower performance for the flexible design. In most cases, however, FLASH is only 2%-12% slower than the idealized machine.

  • Mable: A Technique for Efficient Machine Simulation

    Stanford University Computer Systems Laboratory

    Abstract: We present a framework for an efficient instruction-level machine simulator which can be used with existing software tools to develop and analyze programs for a proposed processor architecture. The simulator exploits similarities between the instruction sets of the emulated machine and the host machine to provide fast simulation. Furthermore, existing program development tools on the host machine such as debuggers and profilers can be used without modification on the emulated program running under the simulator. The simulator can therefore be used to debug and tune application code for the new processor without building a whole new set of program development tools. The technique has applicability to a diverse set of simulation problems. We show how the framework has been used to build simulators for a shared-memory multiprocessor, a superscalar processor with support for speculative execution, and a dual-issue embedded processor.

    Other authors
    • Philippe Lacroute
    • Peter Davies
  • The Stanford FLASH Multiprocessor

    Proceedings of the 21st International Symposium on Computer Architecture (ISCA21)

    Abstract:
    The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine’s global memory, a port to the interconnection network, an I/O interface, and a custom node controller called MAGIC. The MAGIC chip handles all communication both within the node and among nodes, using hardwired data paths for efficient data movement and a programmable processor optimized for executing protocol operations. The use of the protocol processor makes FLASH very flexible — it can support a variety of different communication mechanisms — and simplifies the design and implementation.
    This paper presents the architecture of FLASH and MAGIC, and discusses the base cache-coherence and message-passing protocols. Latency and occupancy numbers, which are derived from our system-level simulator and our Verilog code, are given for several common protocol operations. The paper also describes our software strategy and FLASH’s current status.

  • Instruction Level Profiling and Evaluation of the IBM RS/6000

    ISCA'91: Proceedings of the 18th Annual International Symposium on Computer Architecture

    Abstract: This paper reports preliminary results from using goblin, a new instruction level profiling system, to evaluate the IBM RISC System/6000 architecture. The evaluation presented is based on the SPEC benchmark suite. Each SPEC program (except gcc) is processed by goblin to produce an instrumented version. During execution of the instrumented program, profiling routines are invoked which trace the execution of the program. These routines also collect statistics on dynamic instruction mix, branching behavior, and resource utilization. Based on these statistics, the actual performance and the architectural efficiency of the RS/6000 are evaluated. In order to provide a context for this evaluation, a comparison to the DECStation 3100 is also presented. The entire profiling and evaluation experiment on nine of the ten SPEC programs involves tracing and analyzing over 32 billion instructions on the RS/6000. The evaluation indicates that for the SPEC benchmark suite the architecture of the RS/6000 is well balanced and exhibits impressive performance, especially on the floating-point intensive applications.

    (A minimal instrumentation sketch illustrating this style of profiling hook appears after this publications list.)

    Other authors
    • Chriss Stephens
    • Bryce Cogswell
    • Gregory Palmer
    • John P Shen
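
The dissertation and FLASH synchronization papers above repeatedly compare protocol-engine support against "synchronization primitives implemented using software on top of shared memory." As a point of reference, here is a minimal sketch of that baseline: a sense-reversing barrier built only from shared variables and C11 atomics. It is a generic textbook-style illustration, not code from the publications, and every type and function name here is invented for the example.

/*
 * Minimal sense-reversing barrier using only shared memory and C11
 * atomics -- a generic sketch of the "software on top of shared memory"
 * synchronization baseline the dissertation and FLASH papers compare
 * against. Illustrative names only; not code from the publications.
 */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int  count;    /* arrivals still outstanding in this episode */
    atomic_bool sense;    /* global sense, flipped by the last arrival  */
    int         nthreads; /* number of participating threads            */
} sw_barrier_t;

void sw_barrier_init(sw_barrier_t *b, int nthreads)
{
    atomic_init(&b->count, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread passes its own local_sense flag, initialized to false,
 * and keeps it between calls (e.g. as a local variable). */
void sw_barrier_wait(sw_barrier_t *b, bool *local_sense)
{
    *local_sense = !*local_sense;

    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arrival: reset the counter, then release all waiters
         * by flipping the shared sense flag. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Spin on the shared flag. On a large machine every waiter
         * repeatedly re-fetches this cache line -- the hot-spotting
         * and overhead the protocol-engine approach offloads. */
        while (atomic_load(&b->sense) != *local_sense)
            ;
    }
}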
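
The block-transfer paper above measures its MAGIC-based protocol against "standard bcopy and bcopy augmented with prefetches." The sketch below suggests roughly what a prefetch-augmented copy loop looks like. The cache-line size, prefetch distance, and the GCC/Clang __builtin_prefetch intrinsic are assumptions of this illustration, not details taken from the paper, and unlike a real bcopy it does not handle overlapping buffers.

/*
 * Sketch of a "bcopy augmented with prefetches" style copy loop, the
 * software baseline the block-transfer paper measures against. The
 * cache-line size, prefetch distance, and __builtin_prefetch intrinsic
 * are assumptions of this illustration, not details from the paper.
 */
#include <stddef.h>
#include <string.h>

#define CACHE_LINE  64                   /* assumed line size           */
#define PF_DISTANCE (8 * CACHE_LINE)     /* assumed prefetch look-ahead */

void prefetched_bcopy(void *dst, const void *src, size_t len)
{
    const char *s = src;
    char *d = dst;
    size_t off;

    for (off = 0; off + CACHE_LINE <= len; off += CACHE_LINE) {
        if (off + PF_DISTANCE < len) {
            /* Issue prefetches ahead of the copy so cache misses
             * overlap with the copying work. The CPU still executes
             * every load and store and pollutes its cache -- the cost
             * an offloaded block-transfer protocol avoids. */
            __builtin_prefetch(s + off + PF_DISTANCE, 0 /* read  */);
            __builtin_prefetch(d + off + PF_DISTANCE, 1 /* write */);
        }
        memcpy(d + off, s + off, CACHE_LINE);
    }
    memcpy(d + off, s + off, len - off);  /* copy any sub-line tail */
}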
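
The RS/6000 paper above relies on goblin to instrument each benchmark so that profiling routines run alongside it and tally dynamic statistics. As a loose and much coarser analogue of that idea, the hooks below use GCC/Clang's -finstrument-functions feature to count dynamic function entries. This is only an illustration of compiler-inserted profiling callbacks, not goblin itself; it is single-threaded, and the file names in the build comment are hypothetical.

/*
 * Rough illustration of compiler-inserted profiling callbacks, in the
 * spirit of (but far coarser than) goblin's instruction-level
 * instrumentation: building with -finstrument-functions inserts calls
 * to the hooks below around every function in the instrumented program.
 * Single-threaded sketch; file names in the build line are hypothetical.
 *
 *   cc -O2 -finstrument-functions app.c profile_hooks.c -o app
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t calls;  /* dynamic count of instrumented function entries */

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn; (void)call_site;
    calls++;   /* a real profiler would bucket statistics by this_fn */
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn; (void)call_site;
}

/* Print the tally when the program exits. */
__attribute__((no_instrument_function, destructor))
static void report(void)
{
    fprintf(stderr, "dynamic function entries: %llu\n",
            (unsigned long long)calls);
}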

Honors & Awards

  • 2022 Outstanding Volunteer Fundraiser

    Association for Fundraising Professionals, Silicon Valley Chapter

    Recognition for achievement in volunteer fundraising on behalf of The Tech Challenge, nominated by The Tech Interactive

Languages

  • French

    Professional working proficiency

  • Spanish

    Limited working proficiency

  • Japanese

    Limited working proficiency

  • English

    Native or bilingual proficiency

Recommendations received

16 people have recommended John
