clarkson_woodruff_transform#
- scipy.linalg.clarkson_woodruff_transform(input_matrix, sketch_size, rng=None)[source]#
- Applies a Clarkson-Woodruff Transform/sketch to the input matrix. - Given an input_matrix - Aof size- (n, d), compute a matrix- A'of size (sketch_size, d) so that\[\|Ax\| \approx \|A'x\|\]- with high probability via the Clarkson-Woodruff Transform, otherwise known as the CountSketch matrix. - Parameters:
- input_matrixarray_like, shape (…, n, d)
- Input matrix. 
- sketch_sizeint
- Number of rows for the sketch. 
- rng{None, int, numpy.random.Generator}, optional
- If rng is passed by keyword, types other than - numpy.random.Generatorare passed to- numpy.random.default_rngto instantiate a- Generator. If rng is already a- Generatorinstance, then the provided instance is used. Specify rng for repeatable function behavior.- If this argument is passed by position or seed is passed by keyword, legacy behavior for the argument seed applies: - If seed is None (or - numpy.random), the- numpy.random.RandomStatesingleton is used.
- If seed is an int, a new - RandomStateinstance is used, seeded with seed.
- If seed is already a - Generatoror- RandomStateinstance then that instance is used.
 - Changed in version 1.15.0: As part of the SPEC-007 transition from use of - numpy.random.RandomStateto- numpy.random.Generator, this keyword was changed from seed to rng. For an interim period, both keywords will continue to work, although only one may be specified at a time. After the interim period, function calls using the seed keyword will emit warnings. The behavior of both seed and rng are outlined above, but only the rng keyword should be used in new code.
 
- Returns:
- A’array_like
- Sketch of the input matrix - A, of size- (sketch_size, d).
 
 - Notes - To make the statement \[\|Ax\| \approx \|A'x\|\]- precise, observe the following result which is adapted from the proof of Theorem 14 of [2] via Markov’s Inequality. If we have a sketch size - sketch_size=kwhich is at least\[k \geq \frac{2}{\epsilon^2\delta}\]- Then for any fixed vector - x,\[\|Ax\| = (1\pm\epsilon)\|A'x\|\]- with probability at least one minus delta. - This implementation takes advantage of sparsity: computing a sketch takes time proportional to - A.nnz. Data- Awhich is in- scipy.sparse.csc_matrixformat gives the quickest computation time for sparse input.- >>> import numpy as np >>> from scipy import linalg >>> from scipy import sparse >>> rng = np.random.default_rng() >>> n_rows, n_columns, density, sketch_n_rows = 15000, 100, 0.01, 200 >>> A = sparse.rand(n_rows, n_columns, density=density, format='csc') >>> B = sparse.rand(n_rows, n_columns, density=density, format='csr') >>> C = sparse.rand(n_rows, n_columns, density=density, format='coo') >>> D = rng.standard_normal((n_rows, n_columns)) >>> SA = linalg.clarkson_woodruff_transform(A, sketch_n_rows) # fastest >>> SB = linalg.clarkson_woodruff_transform(B, sketch_n_rows) # fast >>> SC = linalg.clarkson_woodruff_transform(C, sketch_n_rows) # slower >>> SD = linalg.clarkson_woodruff_transform(D, sketch_n_rows) # slowest - That said, this method does perform well on dense inputs, just slower on a relative scale. - References [1]- Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013. [2]- David P. Woodruff. Sketching as a tool for numerical linear algebra. In Foundations and Trends in Theoretical Computer Science, 2014. - Examples - Create a big dense matrix - Afor the example:- >>> import numpy as np >>> from scipy import linalg >>> n_rows, n_columns = 15000, 100 >>> rng = np.random.default_rng() >>> A = rng.standard_normal((n_rows, n_columns)) - Apply the transform to create a new matrix with 200 rows: - >>> sketch_n_rows = 200 >>> sketch = linalg.clarkson_woodruff_transform(A, sketch_n_rows, seed=rng) >>> sketch.shape (200, 100) - Now with high probability, the true norm is close to the sketched norm in absolute value. - >>> linalg.norm(A) 1224.2812927123198 >>> linalg.norm(sketch) 1226.518328407333 - Similarly, applying our sketch preserves the solution to a linear regression of \(\min \|Ax - b\|\). - >>> b = rng.standard_normal(n_rows) >>> x = linalg.lstsq(A, b)[0] >>> Ab = np.hstack((A, b.reshape(-1, 1))) >>> SAb = linalg.clarkson_woodruff_transform(Ab, sketch_n_rows, seed=rng) >>> SA, Sb = SAb[:, :-1], SAb[:, -1] >>> x_sketched = linalg.lstsq(SA, Sb)[0] - As with the matrix norm example, - linalg.norm(A @ x - b)is close to- linalg.norm(A @ x_sketched - b)with high probability.- >>> linalg.norm(A @ x - b) 122.83242365433877 >>> linalg.norm(A @ x_sketched - b) 166.58473879945151