Analyze a sharded dataset

import lamindb as ln
import lnschema_bionty as lb

ln.track()
💡 loaded instance: testuser1/test-facs (lamindb 0.54.4)
💡 notebook imports: lamindb==0.54.4 lnschema_bionty==0.31.2 scanpy==1.9.5
💡 Transform(id='zzJzdgJ763Dyz8', name='Analyze a sharded dataset', short_name='facs3', version='0', type=notebook, updated_at=2023-10-01 16:44:53, created_by_id='DzTjkKse')
💡 Run(id='WFBzw6EMZ3lBgpeLPYfC', run_at=2023-10-01 16:44:53, transform_id='zzJzdgJ763Dyz8', created_by_id='DzTjkKse')
ln.Dataset.filter().df()
name description version hash reference reference_type transform_id run_id file_id initial_version_id updated_at created_by_id
id
8zlLWz5kwz4eVoYxBRwf My versioned FACS dataset None 1 Piw2n0vdnoNoAV7ZxgsW-g None None OWuTtS4SAponz8 TJweM0VKGkTQHyy8CZci 8zlLWz5kwz4eVoYxBRwf None 2023-10-01 16:44:34 DzTjkKse
8zlLWz5kwz4eVoYxBREt My versioned FACS dataset None 2 dmrCH-OEK94Zbh7i51wn None None SmQmhrhigFPLz8 qh5Vw8DryjLToUWD3lqo None 8zlLWz5kwz4eVoYxBRwf 2023-10-01 16:44:43 DzTjkKse
dataset = ln.Dataset.filter(name="My versioned FACS dataset", version="2").one()
adata = dataset.load(join="inner")
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/anndata/_core/anndata.py:1838: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
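The warning above says that observation names repeat in the loaded object and points to `.obs_names_make_unique` as one remedy. As an alternative, here is a minimal sketch that prefixes each observation name with the ID of the file it came from. It assumes the duplicates arise because each shard reuses the same names; the toy table, the file IDs `fileA` and `fileB`, and the naming scheme are all hypothetical and stand in for `adata.obs`.

```python
import pandas as pd

# Toy stand-in for adata.obs: the names "0" and "1" repeat across two shards.
# "fileA" and "fileB" are hypothetical file IDs used only for illustration.
obs = pd.DataFrame(
    {"file_id": ["fileA", "fileA", "fileB", "fileB"]},
    index=["0", "1", "0", "1"],
)
was_unique = obs.index.is_unique  # False: names collide across shards

# Build new names of the form "<file_id>_<old name>" so that every
# observation is identifiable and still traceable to its source file.
obs.index = [f"{fid}_{name}" for fid, name in zip(obs["file_id"], obs.index)]
is_unique = obs.index.is_unique  # True after prefixing

print(was_unique, is_unique)
print(list(obs.index))
```

Prefixing with the file ID keeps the provenance visible in the name itself, whereas numeric suffixes only guarantee uniqueness.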

The AnnData object references the individual files in its .obs annotations, in the file_id column:

adata.obs.file_id.cat.categories
Index(['8zlLWz5kwz4eVoYxBRwf', 'rYTrXas2KdzpqLDUILvH'], dtype='object')
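Because each observation carries a file ID, the loaded data can be summarized or subset per shard. The following is a minimal sketch on a toy table that stands in for `adata.obs`: the two IDs are copied from the output above, while the number of rows per file is made up for illustration.

```python
import pandas as pd

# Toy stand-in for adata.obs. The IDs come from the output above;
# the row counts (3 and 2) are invented for this example.
obs = pd.DataFrame(
    {
        "file_id": pd.Categorical(
            ["8zlLWz5kwz4eVoYxBRwf"] * 3 + ["rYTrXas2KdzpqLDUILvH"] * 2
        )
    }
)

# Count how many observations each shard contributed.
counts = obs["file_id"].value_counts().to_dict()
print(counts)

# Select the observations of a single shard with a boolean mask.
shard = obs[obs["file_id"] == "rYTrXas2KdzpqLDUILvH"]
print(len(shard))
```

The same boolean mask could be applied to the AnnData object itself to recover one shard from the concatenated data.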

By default, the intersection of features across the files is used (here also requested explicitly with join="inner"):

adata.var.index
Index(['CD57', 'Cd19', 'Cd4', 'CD8', 'CD3', 'CD27', 'Cd14', 'Ccr7', 'CD127',
       'CD28'],
      dtype='object')
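To illustrate what an inner join over features means, here is a small sketch using plain pandas tables in place of the shards. It assumes the load step behaves like a concatenation along observations; the shard contents and marker values are made up, and this is not the library's own implementation.

```python
import pandas as pd

# Two toy shards with partly overlapping marker columns (values are invented).
shard_a = pd.DataFrame({"CD3": [0.1, 0.2], "CD8": [1.0, 1.1], "CD20": [5.0, 5.1]})
shard_b = pd.DataFrame({"CD3": [0.3], "CD8": [1.2], "CD57": [7.0]})

# Inner join: keep only the markers measured in every shard.
inner = pd.concat([shard_a, shard_b], join="inner", ignore_index=True)

# Outer join: keep the union of markers and fill the gaps with NaN.
outer = pd.concat([shard_a, shard_b], join="outer", ignore_index=True)

print(sorted(inner.columns))
print(sorted(outer.columns))
print(int(outer["CD57"].isna().sum()))
```

The inner join avoids missing values at the cost of dropping markers that are absent from any shard; the outer join keeps every marker but leaves gaps that downstream steps such as PCA would need to handle.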

Let us create a PCA plot colored by CD14 expression:

markers = lb.CellMarker.lookup()
import scanpy as sc

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name, save="_cd14")
filepath = "figures/pca_cd14.pdf"
WARNING: saving figure to file figures/pca_cd14.pdf
https://d33wubrfki0l68.cloudfront.net/36fef80fddffc6f39858cd5511079c67855f9ae9/38481/_images/70ecded9e776e84e1aaf246c62b93f70350fd2dd839d6b377ef3ec00b8a00c28.png
file = ln.File(filepath, description="My result on CD14")
file.save()
file.view_flow()
https://d33wubrfki0l68.cloudfront.net/8177bf8636aceffcbc57eceaede2809be64d1c7e/c8315/_images/0291c4eace790c25e454df9e5f9e9b82d84f1a5d0beb31b3649576d95cd6daa5.svg
# clean up test instance
!lamin delete --force test-facs
!rm -r test-facs
💡 deleting instance testuser1/test-facs
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-facs.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-facs