Paper - Pangyoalto Blog

Paper

A collection of 17 posts

[Paper review] A Vector Compression Method for HNSW - Flash

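This post reviews the paper Accelerating Graph Indexing for ANNS on Modern CPUs. The paper is easier to follow if you already know HNSW and Product Quantization. Both have been covered on this blog, so if you are unfamiliar with them, I recommend reading the posts below first. Exploring Vector Search Algorithms (1): Similarity Search and HNSW, Exploring Vector Search Algorithms (2)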

[Paper review] Databases in the Era of Memory-Centric Computing

Compared to the rapid advancement of CPU performance, the growth rate of memory capacity has been steadily slowing down. Stemming from this issue, Databases in the Era of Memory-Centric Computing proposes an architecture aimed at efficiently utilizing memory, which has become increasingly expensive relative to other system components. The paper

[Paper review] Databases in the Era of Memory-Centric Computing (Korean version)

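While CPU performance keeps advancing, growth in memory capacity continues to slow. Starting from this problem, Databases in the Era of Memory-Centric Computing proposes an architecture for making efficient use of memory, which is becoming relatively more expensive, and argues that this architecture is especially effective for databases. The paper was published in 2025, with the relatively new CXL technology as the key to implementing memory-centric computing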

[Paper review] A New Inverted Index Structure for Efficient Sparse Vector Search

Information Retrieval

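The paper covered in this post, Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations, was one of the papers selected as a Best Paper at SIGIR 2024. It proposes an algorithm that performs vector search with an inverted index, the approach common in traditional search engines, rather than with the graph-based approach common in vector search. Introduction. Search is about retrieving the documents best suited to a user's query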

When to stop using only bloom filters: Ribbon filter

This article is an English translation of a paper review that I originally published on Medium. What kind of data structure should we use to determine which keys are in which sets? The simplest way to do this is to use a hash map that holds all

[Google Series] #3. Chubby (feat. ZooKeeper)

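Google, which now dominates not only the search market but many other fields, was founded as recently as 1998. It grew very quickly and built infrastructure systems to support that growth. Those systems have greatly influenced other companies and developers. The Google Series picks out and introduces some of the papers behind them. This post covers not only Google's Chubby but also ZooKeeper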

[Google Series] #2. Bigtable: A Distributed Storage System for Structured Data (2006)

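Google, which now dominates not only the search market but many other fields, was founded as recently as 1998. It grew very quickly and built infrastructure systems to support that growth. Those systems have greatly influenced other companies and developers. The Google Series picks out and introduces some of the papers behind them. 1. The Google File System (2003) 2. BigTable: A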

[Google Series] #1. The Google File System (2003)

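Google, which now dominates not only the search market but many other fields, was founded as recently as 1998. It grew very quickly and built infrastructure systems to support that growth. Those systems have greatly influenced other companies and developers. The Google Series picks out and introduces some of the papers behind them. 1. The Google File System (2003) 2. BigTable: A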

eBPF/XDP: Secure and Fast Networking You Might Not Know About

[Summary] * eBPF: Technology enabling application developers to run programs within the kernel space, allowing them to utilize OS functions at runtime. * XDP: Technology based on eBPF that supports packet processing. * XDP consists of XDP driver hooks and the eBPF virtual machine, enabling secure packet processing in the kernel space with

A Quick Look at the Short History of RAG (From the First Paper to Recent Trends)

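[Summary] * RAG first appeared in Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020), where it was used to train the model and the retriever. * Recently, RAG has been used mainly to boost a model's inference performance rather than to train the model. * What matters in RAG is how well documents are retrieved and handed to the LLM. * Google's Vertex AI, which supports RAG, uses not only vector search for document retrieval but also traditional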

[Paper review] Efficient Memory Management for Large Language Model Serving with PagedAttention

[Summary] 1. Efficient Memory Management for Large Language Model Serving with PagedAttention is the paper on which vLLM is based. 2. Improving throughput in LLM serving requires batching. The bottleneck for batching is the memory occupied by the KV cache, which determines the maximum batch size. 3. Like an OS's virtual memory, vLLM divides the memory that holds the KV cache into blocks and

Exploring the Use of Paxos and TrueTime in Google Spanner

[Summary] * Paxos is a consensus algorithm used in distributed systems, often seen as an extension of the two-phase commit protocol (2PC). * Spanner utilizes a leader-based Paxos with leases. * TrueTime in Spanner is a globally synchronized clock with bounded uncertainty, aiding in determining the order of events. * Through TrueTime, Spanner provides

Exploring the Use of Paxos and TrueTime in Google Spanner (Korean version)

[Summary] * Paxos is a consensus algorithm used in distributed systems and can be seen as an extension of 2PC. * Spanner uses leader-based Paxos with leases. * Spanner's TrueTime is a globally synchronized clock with bounded error that helps determine the order of events. * Through TrueTime, Spanner provides external consistency and MVCC. [Contents] * 2PC * Paxos * Paxos in Spanner * TrueTime in Spanner and

[Paper review] On-demand Container Loading in AWS Lambda (Korean version)

[Summary] 1. On-demand Container Loading in AWS Lambda describes how to improve scalability and reduce cold-start latency by introducing block-level loading, deduplication, and a tiered cache. 2. Block-level loading flattens an image into a single file system, splits it into chunks, and loads the chunks into the cache. 3. When images are loaded into the cache, images with identical content are encrypted with the same key and

[Paper review] On-demand Container Loading in AWS Lambda

[Summary] 1. On-demand Container Loading in AWS Lambda describes how to improve scalability and reduce cold-start latency by introducing block-level loading, deduplication, and a tiered cache. 2. Block-level loading turns an image into a single file system through a flattening process, then divides it into chunks and uploads it to the

[Paper review] ColBERT, ColBERTv2

[Summary] 1. BM25 is a widely used ranking method in IR, but since 2016, ranking methods based on neural networks have started to appear. 2. Neural IR generally achieved high MRR but was computationally expensive. BERT in particular showed excellent MRR but was too slow to be applied to services. 3. ColBERT

[Paper review] ColBERT, ColBERTv2 (Korean version)

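[Summary] 1. BM25 is widely used for ranking in IR, but since 2016, ranking methods based on neural networks have begun to appear. 2. Neural IR generally achieved high MRR but had the drawback of expensive computation. BERT in particular shows outstanding MRR but is too slow to apply to services. 3. ColBERT showed slightly lower MRR than BERT but was much faster. 4. ColBERTv2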