diff --git a/content/collective_intro.tex b/content/collective_intro.tex index 823164ab..4996b178 100644 --- a/content/collective_intro.tex +++ b/content/collective_intro.tex @@ -1,21 +1,24 @@ \emph{Collective routines} are defined as coordinated communication or synchronization operations performed by a group of \acp{PE}. -\openshmem provides three types of collective routines: +\openshmem provides four types of collective routines: \begin{enumerate} -\item Collective routines that operate on teams use a team handle parameter to determine - which \acp{PE} will participate in the routine, and use resources encapsulated by the team object - to perform operations. See Section~\ref{subsec:team} for details on team management. + \item Collective routines that operate on teams use a team handle parameter to determine + which \acp{PE} will participate in the routine, and use resources encapsulated by the team object + to perform operations. See Section~\ref{subsec:team} for details on team management. -\begin{DeprecateBlock} -\item Collective routines that operate on active sets use a set of parameters to determine - which \acp{PE} will participate and what resources are used to perform operations. -\end{DeprecateBlock} + \begin{DeprecateBlock} + \item Collective routines that operate on active sets use a set of parameters to determine + which \acp{PE} will participate and what resources are used to perform operations. + + \item Collective routines that do not accept active set + parameters, which implicitly operate on the world team and, as required, the default context. + \end{DeprecateBlock} -\item Collective routines that accept neither team nor active set - parameters, which implicitly operate on the world team and, as - required, the default context. + \item Collective routines that do not accept team + parameters, which implicitly operate on the world team and, as + required, the default context.
\end{enumerate} Concurrent accesses to symmetric memory by an \openshmem collective diff --git a/content/programming_model_overview.tex b/content/programming_model_overview.tex index a76c99de..5daac540 100644 --- a/content/programming_model_overview.tex +++ b/content/programming_model_overview.tex @@ -144,7 +144,7 @@ data object on another symmetric data object. \item \OPR{All-to-All}: All \acp{PE} participating in the routine exchange a fixed amount of contiguous or strided data with all other \acp{PE} - in the active set. + in the team. \end{enumerate} \item \textbf{Mutual Exclusion} diff --git a/content/shmem_alltoall.tex b/content/shmem_alltoall.tex index 188e2875..4e145c26 100644 --- a/content/shmem_alltoall.tex +++ b/content/shmem_alltoall.tex @@ -35,17 +35,17 @@ \apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive the combined total of \VAR{nelems} elements from each \ac{PE} in the - active set. + set of participating \acp{PE}. The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems} - elements of data for each \ac{PE} in the active set, ordered according to + elements of data for each participating \ac{PE}, ordered according to destination \ac{PE}. The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{nelems}{ - The number of elements to exchange for each \ac{PE}. - For \FUNC{shmem\_alltoallmem}, elements are bytes; - for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes, - respectively. + The number of elements to exchange for each \ac{PE}. + For \FUNC{shmem\_alltoallmem}, elements are bytes; + for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes, + respectively. } \begin{DeprecateBlock} @@ -100,6 +100,21 @@ If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined.
+ Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine, + the following conditions must be ensured: + \begin{itemize} + \item The \VAR{dest} data object on all \acp{PE} in the team is + ready to accept the \FUNC{shmem\_alltoall} data. + \end{itemize} + + Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for + the local \ac{PE}: + \begin{itemize} + \item Its \VAR{dest} symmetric data object is completely updated and the + data has been copied out of the \VAR{source} data object. + \end{itemize} + +\begin{DeprecateBlock} Active-set-based collective routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. @@ -116,23 +131,26 @@ Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine, the following conditions must be ensured: + \begin{itemize} - \item The \VAR{dest} data object on all \acp{PE} in the active set is - ready to accept the \FUNC{shmem\_alltoall} data. - \item For active-set-based routines, the \VAR{pSync} array - on all \acp{PE} in the active set is not still in use from a prior call - to a \FUNC{shmem\_alltoall} routine. + \item The \VAR{dest} data object on all \acp{PE} in the active set is + ready to accept the \FUNC{shmem\_alltoall} data. + \item For active-set-based routines, the \VAR{pSync} array + on all \acp{PE} in the active set is not still in use from a prior call + to a \FUNC{shmem\_alltoall} routine. \end{itemize} + Otherwise, the behavior is undefined. Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for the local PE: \begin{itemize} - \item Its \VAR{dest} symmetric data object is completely updated and - the data has been copied out of the \VAR{source} data object. - \item For active-set-based routines, - the values in the \VAR{pSync} array are restored to the original values. + \item Its \VAR{dest} symmetric data object is completely updated and the + data has been copied out of the \VAR{source} data object.
+ \item For active-set-based routines, + the values in the \VAR{pSync} array are restored to the original values. \end{itemize} +\end{DeprecateBlock} } \apireturnvalues{ diff --git a/content/shmem_alltoalls.tex b/content/shmem_alltoalls.tex index e371b8cf..d1bd7d1f 100644 --- a/content/shmem_alltoalls.tex +++ b/content/shmem_alltoalls.tex @@ -35,10 +35,10 @@ \apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive the combined total of \VAR{nelems} elements from each \ac{PE} in the - active set. + set of participating \acp{PE}. The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems} - elements of data for each \ac{PE} in the active set, ordered according to + elements of data for each participating \ac{PE}, ordered according to destination \ac{PE}. The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{dst}{The stride between consecutive elements of the \dest{} diff --git a/content/shmem_broadcast.tex b/content/shmem_broadcast.tex index a172a12e..49abd50b 100644 --- a/content/shmem_broadcast.tex +++ b/content/shmem_broadcast.tex @@ -45,7 +45,7 @@ respectively. } \apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to - the team or active set, from which the data is copied.} + the set of participating \acp{PE}, from which the data is copied.} \begin{DeprecateBlock} @@ -61,8 +61,7 @@ \end{apiarguments} \apidescription{ - \openshmem broadcast routines are collective routines over an active set or - valid \openshmem team. + \openshmem team-based broadcast routines are collective routines over a valid \openshmem team. They copy the \source{} data object on the \ac{PE} specified by \VAR{PE\_root} to the \dest{} data object on the \acp{PE} participating in the collective operation. @@ -75,6 +74,9 @@ \item The \dest{} object is updated on all \acp{PE}.
\item All \acp{PE} in the \VAR{team} argument must participate in the operation. + \item Only \acp{PE} in the team may call the routine. If a + \ac{PE} not in the team calls a team-based + collective routine, the behavior is undefined. \item If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. \item \ac{PE} numbering is relative to the team. The specified @@ -82,59 +84,79 @@ between \CONST{0} and \VAR{N$-$1}, where \VAR{N} is the size of the team. \end{itemize} + + Before any \ac{PE} calls a team-based broadcast routine, the following + conditions must be ensured: + \begin{itemize} + \item The \dest{} array on all \acp{PE} participating in the broadcast + is ready to accept the broadcast data. + \end{itemize} + Otherwise, the behavior is undefined. + + Upon return from a team-based broadcast routine, the following are true for the local + \ac{PE}: + \begin{itemize} + \item The \dest{} data object is updated. + \item The \source{} data object may be safely reused. + \end{itemize} +\begin{DeprecateBlock} + \openshmem active-set-based broadcast routines are collective routines over an active set. + They copy the \source{} data object on the \ac{PE} specified by + \VAR{PE\_root} to the \dest{} data object on the \acp{PE} + participating in the collective operation. + The same \dest{} and \source{} data objects and the same value of + \VAR{PE\_root} must be passed by all \acp{PE} participating in the + collective operation. + For active-set-based broadcasts: \begin{itemize} - \item The \dest{} object is updated on all \acp{PE} other than the - root \ac{PE}. - \item All \acp{PE} in the active set defined by the - \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet - must participate in the operation. - \item Only \acp{PE} in the active set may call the routine. If a - \ac{PE} not in the active set calls an active-set-based + \item The \dest{} object is updated on all \acp{PE} other than the root \ac{PE}.
+ \item All \acp{PE} in the active set defined by the + \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet + must participate in the operation. + \item Only \acp{PE} in the active set may call the routine. If a + \ac{PE} not in the active set calls an active-set-based collective routine, the behavior is undefined. - \item The values of arguments \VAR{PE\_root}, \VAR{PE\_start}, + \item The values of arguments \VAR{PE\_root}, \VAR{PE\_start}, \VAR{logPE\_stride}, and \VAR{PE\_size} must be the same value on all \acp{PE} in the active set. - \item The value of \VAR{PE\_root} must be between \CONST{0} and + \item The value of \VAR{PE\_root} must be between \CONST{0} and \VAR{PE\_size $-$ 1}. - \item The same \VAR{pSync} work array must be passed by all \acp{PE} + \item The same \VAR{pSync} work array must be passed by all \acp{PE} in the active set. \end{itemize} - Before any \ac{PE} calls a broadcast routine, the following + Before any \ac{PE} calls an active-set-based broadcast routine, the following conditions must be ensured: \begin{itemize} - \item The \dest{} array on all \acp{PE} participating in the broadcast - is ready to accept the broadcast data. - \item For active-set-based broadcasts, the - \VAR{pSync} array on all \acp{PE} in the - active set is not still in use from a prior call to an \openshmem - collective routine. + \item The \dest{} array on all \acp{PE} participating in the broadcast + is ready to accept the broadcast data. + \item The \VAR{pSync} array on all \acp{PE} in the + active set is not still in use from a prior call to an \openshmem + collective routine. \end{itemize} Otherwise, the behavior is undefined. - Upon return from a broadcast routine, the following are true for the local + Upon return from an active-set-based broadcast routine, the following are true for the local \ac{PE}: \begin{itemize} - \item For team-based broadcasts, the \dest{} data object is - updated.
- \item For active-set-based broadcasts: - \begin{itemize} - \item If the current \ac{PE} is not the root \ac{PE}, the - \dest{} data object is updated. + \item If the current \ac{PE} is not the root \ac{PE}, the \dest{} data object is updated. + \item The \source{} data object may be safely reused. \item The values in the \VAR{pSync} array are restored to the original values. - \end{itemize} - \item The \source{} data object may be safely reused. \end{itemize} +\end{DeprecateBlock} } \apireturnvalues{ For team-based broadcasts, zero on successful local completion; otherwise, nonzero. +\begin{DeprecateBlock} For active-set-based broadcasts, none. +\end{DeprecateBlock} + } \apinotes{ diff --git a/content/shmem_collect.tex b/content/shmem_collect.tex index 5430abcf..d14d8f17 100644 --- a/content/shmem_collect.tex +++ b/content/shmem_collect.tex @@ -66,15 +66,13 @@ \openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective operation to concatenate \VAR{nelems} data items from the \source{} array into the - \dest{} array, over an \openshmem team or active set - in processor number order. The resultant \dest{} array contains the contribution from + \dest{} array, over an \openshmem team in processor number order. + The resultant \dest{} array contains the contribution from \acp{PE} as follows: \begin{itemize} - \item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the - contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on. - \item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the - contribution from \ac{PE} \CONST{1} in the team, and so on. + \item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the + contribution from \ac{PE} \CONST{1} in the team, and so on.
\end{itemize} The collected result is written to the \dest{} array for all \acp{PE} @@ -90,6 +88,26 @@ If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. +\begin{DeprecateBlock} + \openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective + operation to concatenate \VAR{nelems} + data items from the \source{} array into the + \dest{} array, over an \openshmem active set + in processor number order. The resultant \dest{} array contains the contribution from + \acp{PE} as follows: + \begin{itemize} + \item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the + contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on. + \end{itemize} + + The collected result is written to the \dest{} array for all \acp{PE} + that participate in the operation. The same \dest{} and \source{} + arrays must be passed by all \acp{PE} that participate in the operation. + + The \FUNC{fcollect} routines require that \VAR{nelems} be the same value in all + participating \acp{PE}, while the \FUNC{collect} routines allow \VAR{nelems} to + vary from \ac{PE} to \ac{PE}. + Active-set-based collective routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. As with all active-set-based collective routines, @@ -108,6 +126,7 @@ \item For active-set-based collective routines, the values in the \VAR{pSync} array are restored to the original values. \end{itemize} +\end{DeprecateBlock} } \apireturnvalues{ @@ -115,9 +134,15 @@ } \apinotes{ +\begin{DeprecateBlock} The collective routines operate on active \ac{PE} sets that have a non-power-of-two \VAR{PE\_size} with some performance degradation. They operate with no performance degradation when \VAR{nelems} is a non-power-of-two value. 
+\end{DeprecateBlock} + The collective routines that operate on teams containing a + non-power-of-two number of \acp{PE} do so with some performance degradation. They operate + with no performance degradation when \VAR{nelems} is a non-power-of-two value. + } \begin{apiexamples} diff --git a/content/shmem_reductions.tex b/content/shmem_reductions.tex index ff933b35..79f0b42a 100644 --- a/content/shmem_reductions.tex +++ b/content/shmem_reductions.tex @@ -257,6 +257,8 @@ \subsubsubsection{PROD} \VAR{nreduce} must be of type integer.} \begin{DeprecateBlock} +\apiargument{IN}{nreduce}{In active-set-based \ac{API} calls, + \VAR{nreduce} must be of type integer.} \apiargument{IN}{PE\_start}{The lowest \ac{PE} number of the active set of \acp{PE}.} \apiargument{IN}{logPE\_stride}{The log (base 2) of the stride between consecutive @@ -273,7 +275,7 @@ \subsubsubsection{PROD} \end{apiarguments} \apidescription{ - \openshmem reduction routines are collective routines over an active set or + \openshmem reduction routines are collective routines over an existing \openshmem team that compute one or more reductions across symmetric arrays on multiple \acp{PE}. A reduction performs an associative binary routine across a set of values. @@ -295,6 +297,37 @@ \subsubsubsection{PROD} If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. + Before any \ac{PE} calls a reduction routine, the following conditions must be ensured: + \begin{itemize} + \item The \dest{} array on all \acp{PE} participating in the reduction + is ready to accept the results of the \OPR{reduction}. + \end{itemize} + Otherwise, the behavior is undefined. + + Upon return from a reduction routine, the following are true for the local + \ac{PE}: + \begin{itemize} + \item The \dest{} array is updated and the \source{} array may be safely reused.
+ \end{itemize} + +\begin{DeprecateBlock} + \openshmem reduction routines are collective routines over an active set + that compute one or more reductions across symmetric + arrays on multiple \acp{PE}. A reduction performs an associative binary routine + across a set of values. + + The \VAR{nreduce} argument determines the number of separate reductions to + perform. The \source{} array on all \acp{PE} participating in the reduction + provides one element for each reduction. The results of the reductions are placed in the + \dest{} array on all \acp{PE} participating in the reduction. + + The same \source{} and \dest{} arrays must be passed by all \acp{PE} that + participate in the collective. + The \source{} and \dest{} arguments must either be the same symmetric + address, or two different symmetric addresses corresponding to buffers that + do not overlap in memory. That is, they must be completely overlapping (sometimes referred to as an ``in place'' reduction) or + completely disjoint. + Active-set-based sync routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. @@ -327,6 +360,7 @@ \subsubsubsection{PROD} \item If using active-set-based routines, the values in the \VAR{pSync} array are restored to the original values. \end{itemize} +\end{DeprecateBlock} The complex-typed interfaces are only provided for sum and product reductions. When the \Cstd translation environment does not support complex types diff --git a/content/shmem_sync.tex b/content/shmem_sync.tex index 6e41ee82..91a2ce61 100644 --- a/content/shmem_sync.tex +++ b/content/shmem_sync.tex @@ -1,7 +1,11 @@ \apisummary{ Registers the arrival of a \ac{PE} at a synchronization point. This routine does not return until all other \acp{PE} in a given OpenSHMEM team - or active set arrive at this synchronization point. + arrive at this synchronization point.
+\begin{DeprecateBlock} + Registers the arrival of a \ac{PE} at a synchronization point. + This routine does not return until all other \acp{PE} in a given OpenSHMEM active set arrive at this synchronization point. +\end{DeprecateBlock} } \begin{apidefinition} @@ -38,12 +42,12 @@ \apidescription{ \FUNC{shmem\_sync} is a collective synchronization routine over an - existing \openshmem team or active set. + existing \openshmem team. The routine registers the arrival of a \ac{PE} at a synchronization point in the program. This is a fast mechanism for synchronizing all \acp{PE} that participate in this collective call. The routine blocks the calling \ac{PE} until all \acp{PE} in the - specified team or active set have called \FUNC{shmem\_sync}. In a multithreaded \openshmem + specified team have called \FUNC{shmem\_sync}. In a multithreaded \openshmem program, only the calling thread is blocked. Team-based sync routines operate over all \acp{PE} in the provided team argument. All @@ -51,6 +55,19 @@ If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. + In contrast with the \FUNC{shmem\_barrier} routine, \FUNC{shmem\_sync} only + ensures completion and visibility of previously issued memory stores and does not ensure + completion of remote memory updates issued via \openshmem routines. + +\begin{DeprecateBlock} + \FUNC{shmem\_sync} is a collective synchronization routine over an active set. + + The routine registers the arrival of a \ac{PE} at a synchronization point in the program. + This is a fast mechanism for synchronizing all \acp{PE} that participate in this + collective call. The routine blocks the calling \ac{PE} until all \acp{PE} in the + active set have called \FUNC{shmem\_sync}. In a multithreaded \openshmem + program, only the calling thread is blocked. 
+ Active-set-based sync routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. @@ -64,12 +81,11 @@ \VAR{PE\_size} must be equal on all \acp{PE} in the active set. The same work array must be passed in \VAR{pSync} to all \acp{PE} in the active set. - In contrast with the \FUNC{shmem\_barrier} routine, \FUNC{shmem\_sync} only - ensures completion and visibility of previously issued memory stores and does not ensure - completion of remote memory updates issued via \openshmem routines. - The same \VAR{pSync} array may be reused on consecutive calls to \FUNC{shmem\_sync} if the same active set is used. +\end{DeprecateBlock} + + } \apireturnvalues{ diff --git a/content/shmem_team_split_strided.tex b/content/shmem_team_split_strided.tex index 08969792..59decede 100644 --- a/content/shmem_team_split_strided.tex +++ b/content/shmem_team_split_strided.tex @@ -101,11 +101,8 @@ } \apinotes{ - The \FUNC{shmem\_team\_split\_strided} operation uses an arbitrary - \VAR{stride} argument, whereas the \VAR{logPE\_stride} argument to the - active set collective operations only permits strides that are a power of two. - Arbitrary strides allow a greater number of PE subsets to be expressed - and can support a broader range of usage models. + The \VAR{stride} argument to \FUNC{shmem\_team\_split\_strided} may be + any positive integer. See the description of team handles and predefined teams in Section~\ref{subsec:team} for more information about team handle semantics and usage. diff --git a/utils/defs.tex b/utils/defs.tex index 771ba8a7..9d2bdb64 100644 --- a/utils/defs.tex +++ b/utils/defs.tex @@ -362,8 +362,7 @@ \hfill \item[Return Values] \hfill \\ #1 -\\ -\hfill +\hfill \\ } \newcommand{\apitablerow}[2]{