A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search
Linux wheels (Python >= 3.6) are available on PyPI:
pip install graphgrove
Building from source:
conda create -n gg python=3.8
conda activate gg
pip install numpy
make
To build your own wheel:
conda create -n gg python=3.8
conda activate gg
pip install numpy
make
pip install build
python -m build --wheel
# which can then be installed with (the exact filename depends on the Python version and platform):
# pip install --force-reinstall dist/graphgrove-0.0.1-cp38-cp38-linux_x86_64.whl
Toy examples of clustering, DAG-structured clustering, and nearest neighbor search are available.
At a high level, incremental clustering can be done as:
import numpy as np
import graphgrove as gg

k = 5
num_rounds = 50
cores = 4
# one threshold per round, decreasing geometrically from 1.0 to 0.001
thresholds = np.geomspace(1.0, 0.001, num_rounds).astype(np.float32)
scc = gg.vec_scc.Cosine_SCC(k=k, num_rounds=num_rounds, thresholds=thresholds, index_name='cosine_sgtree', cores=cores, verbosity=0)
# data_batches - generator of numpy matrices mini-batch-size by dim
for batch in data_batches:
    scc.partial_fit(batch)
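The snippet above assumes a `data_batches` generator already exists. A minimal sketch of one is below, using random data; the helper name `make_data_batches`, its parameters, and the choice of `float32` rows are illustrative assumptions, not part of graphgrove. It also reproduces the per-round `thresholds` array so its shape and range are easy to inspect.

```python
import numpy as np

def make_data_batches(n_points=1000, dim=16, batch_size=100, seed=0):
    """Hypothetical helper: yield float32 mini-batches of shape (batch_size, dim).

    The last batch is smaller if n_points is not a multiple of batch_size.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_points, dim)).astype(np.float32)
    for start in range(0, n_points, batch_size):
        yield data[start:start + batch_size]

# Same thresholds as the clustering example: 50 values decreasing
# geometrically from 1.0 down to 0.001, one per round.
num_rounds = 50
thresholds = np.geomspace(1.0, 0.001, num_rounds).astype(np.float32)

# Materialize the generator here only to inspect it; in practice you
# would loop over it and call partial_fit on each batch.
batches = list(make_data_batches(n_points=1050, dim=16, batch_size=100))
```

Because each batch is a plain NumPy matrix of shape mini-batch-size by dim, the same generator can feed either the clustering loop or the nearest neighbor loop.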
Incremental nearest neighbor search can be done as:
import graphgrove as gg

k = 5
cores = 4
tree = gg.graph_builder.Cosine_SGTree(k=k, cores=cores)
# data_batches - generator of numpy matrices mini-batch-size by dim
for batch in data_batches:
    tree.insert(batch)  # or tree.insert_and_knn(batch)
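When trying the tree on small data, an exact brute-force cosine k-NN can serve as a reference for sanity-checking neighbors. The sketch below is plain NumPy and is not graphgrove's implementation; the function name `brute_force_cosine_knn` is hypothetical, and the format in which `Cosine_SGTree` returns its neighbors is not assumed here.

```python
import numpy as np

def brute_force_cosine_knn(queries, data, k=5):
    """Hypothetical reference: exact top-k cosine neighbors of each query row.

    Returns (indices, similarities), each of shape (n_queries, k), sorted by
    decreasing similarity. Assumes no all-zero rows (they cannot be normalized).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = data / np.linalg.norm(data, axis=1, keepdims=True)
    sims = q @ d.T  # (n_queries, n_data) cosine similarities
    k = min(k, d.shape[0])
    # argpartition puts the k largest similarities (in arbitrary order) first...
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    top_sims = np.take_along_axis(sims, top, axis=1)
    # ...then sort just those k per row, largest first.
    order = np.argsort(-top_sims, axis=1)
    return np.take_along_axis(top, order, axis=1), np.take_along_axis(top_sims, order, axis=1)

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 8))
idx, sims = brute_force_cosine_knn(data[:3], data, k=5)
```

Querying with points that are already in the data, each point should appear as its own nearest neighbor with similarity 1.0. The cost is O(n_queries × n_data), which is exactly what a tree index is meant to avoid on large, growing datasets.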