  1. """Joblib is a set of tools to provide **lightweight pipelining in
  2. Python**. In particular:
  3. 1. transparent disk-caching of functions and lazy re-evaluation
  4. (memoize pattern)
  5. 2. easy simple parallel computing
  6. Joblib is optimized to be **fast** and **robust** on large
  7. data in particular and has specific optimizations for `numpy` arrays. It is
  8. **BSD-licensed**.
  9. ==================== ===============================================
  10. **Documentation:** https://joblib.readthedocs.io
  11. **Download:** https://pypi.python.org/pypi/joblib#downloads
  12. **Source code:** https://github.com/joblib/joblib
  13. **Report issues:** https://github.com/joblib/joblib/issues
  14. ==================== ===============================================
  15. Vision
  16. --------
  17. The vision is to provide tools to easily achieve better performance and
  18. reproducibility when working with long running jobs.
  19. * **Avoid computing the same thing twice**: code is often rerun again and
  20. again, for instance when prototyping computational-heavy jobs (as in
  21. scientific development), but hand-crafted solutions to alleviate this
  22. issue are error-prone and often lead to unreproducible results.
  23. * **Persist to disk transparently**: efficiently persisting
  24. arbitrary objects containing large data is hard. Using
  25. joblib's caching mechanism avoids hand-written persistence and
  26. implicitly links the file on disk to the execution context of
  27. the original Python object. As a result, joblib's persistence is
  28. good for resuming an application status or computational job, eg
  29. after a crash.
  30. Joblib addresses these problems while **leaving your code and your flow
  31. control as unmodified as possible** (no framework, no new paradigms).
  32. Main features
  33. ------------------
  34. 1) **Transparent and fast disk-caching of output value:** a memoize or
  35. make-like functionality for Python functions that works well for
  36. arbitrary Python objects, including very large numpy arrays. Separate
  37. persistence and flow-execution logic from domain logic or algorithmic
  38. code by writing the operations as a set of steps with well-defined
  39. inputs and outputs: Python functions. Joblib can save their
  40. computation to disk and rerun it only if necessary::
  41. >>> from joblib import Memory
  42. >>> cachedir = 'your_cache_dir_goes_here'
  43. >>> mem = Memory(cachedir)
  44. >>> import numpy as np
  45. >>> a = np.vander(np.arange(3)).astype(np.float)
  46. >>> square = mem.cache(np.square)
  47. >>> b = square(a) # doctest: +ELLIPSIS
  48. ________________________________________________________________________________
  49. [Memory] Calling square...
  50. square(array([[0., 0., 1.],
  51. [1., 1., 1.],
  52. [4., 2., 1.]]))
  53. ___________________________________________________________square - 0...s, 0.0min
  54. >>> c = square(a)
  55. >>> # The above call did not trigger an evaluation
  56. 2) **Embarrassingly parallel helper:** to make it easy to write readable
  57. parallel code and debug it quickly::
  58. >>> from joblib import Parallel, delayed
  59. >>> from math import sqrt
  60. >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
  61. [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
  62. 3) **Fast compressed Persistence**: a replacement for pickle to work
  63. efficiently on Python objects containing large data (
  64. *joblib.dump* & *joblib.load* ).
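   For instance (a minimal sketch; the filename ``data.joblib`` is
   illustrative, and the calls are skipped in doctests because they
   write to disk)::

      >>> from joblib import dump, load
      >>> import numpy as np
      >>> data = np.arange(10)
      >>> dump(data, 'data.joblib')  # doctest: +SKIP
      ['data.joblib']
      >>> load('data.joblib')  # doctest: +SKIP
      array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])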
  65. ..
  66. >>> import shutil ; shutil.rmtree(cachedir)
  67. """
# PEP0440 compatible formatted version, see:
# https://www.python.org/dev/peps/pep-0440/
#
# Generic release markers:
#   X.Y
#   X.Y.Z   # For bugfix releases
#
# Admissible pre-release markers:
#   X.YaN   # Alpha release
#   X.YbN   # Beta release
#   X.YrcN  # Release Candidate
#   X.Y     # Final release
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
#
__version__ = '0.16.0'


import os

from .memory import Memory, MemorizedResult, register_store_backend
from .logger import PrintTime
from .logger import Logger
from .hashing import hash
from .numpy_pickle import dump
from .numpy_pickle import load
from .compressor import register_compressor
from .parallel import Parallel
from .parallel import delayed
from .parallel import cpu_count
from .parallel import register_parallel_backend
from .parallel import parallel_backend
from .parallel import effective_n_jobs

from .externals.loky import wrap_non_picklable_objects


__all__ = ['Memory', 'MemorizedResult', 'PrintTime', 'Logger', 'hash', 'dump',
           'load', 'Parallel', 'delayed', 'cpu_count', 'effective_n_jobs',
           'register_parallel_backend', 'parallel_backend',
           'register_store_backend', 'register_compressor',
           'wrap_non_picklable_objects']


# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")