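Actuator

Beyond developing business features, you also need to monitor whether the service is running without problems and instrument it with metrics so it can be watched.
Spring Boot Actuator provides these monitoring features.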

Getting started

The list of available endpoints can be checked at the following URL.
- http://localhost:8080/actuator

{
	"_links": {
		"self": {
			"href": "http://localhost:8080/actuator",
			"templated": false
		},
		"health": {
			"href": "http://localhost:8080/actuator/health",
			"templated": false
		},
		"health-path": {
			"href": "http://localhost:8080/actuator/health/{*path}",
			"templated": true
		}
	}
}

Following the health link
- returns a response like the one below, which confirms that the application is up.

{"status": "UP"}

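Endpoint configuration

Enabling an endpoint
- Turns the endpoint's functionality itself on or off.
Exposing an endpoint
- Makes an enabled endpoint reachable from outside, over the web (HTTP) or JMX.

Exposing all endpoints

Web exposure
- include: "*" exposes every endpoint, and exclude then removes env from that set.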

management:  
  endpoints:  
    web:  
      exposure:  
        include: "*"
        exclude: "env"

JMX exposure

management:  
  endpoints:
    jmx:
      exposure:
        include: "health,info"

Enabling an endpoint

management:  
  endpoint:  
    shutdown:  
      enabled: true  
  endpoints:  
    web:  
      exposure:  
        include: "*"

Available endpoints

beans
- http://localhost:8080/actuator/beans
conditions
- http://localhost:8080/actuator/conditions
- Shows which auto-configuration conditions matched and which did not.
configprops
- http://localhost:8080/actuator/configprops
- Shows the @ConfigurationProperties beans.
env
- http://localhost:8080/actuator/env
- Shows the property values of the Environment.
health
- http://localhost:8080/actuator/health
- Application health information.
httpexchanges
- //TODO
info
- http://localhost:8080/actuator/info
loggers
- http://localhost:8080/actuator/loggers
- Shows the logging levels of the current application.
metrics
- http://localhost:8080/actuator/metrics
- Shows the available metrics.
mappings
- http://localhost:8080/actuator/mappings
- Shows the request mapping information.
threaddump
- http://localhost:8080/actuator/threaddump
- Takes a thread dump and shows the result.

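Health information

The health endpoint lets you notice quickly when something goes wrong in the application.
Detailed information is enabled in the yml file.

management:
  endpoint:
    health:
      #show-details: always
      show-components: always # if show-details is too verbose, this shows only each component's status

output
- If even one component is DOWN, the overall status is DOWN.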

{
	"status": "UP",
	"components": {
		"db": {
			"status": "UP" // db details omitted
		},
		"diskSpace": {
			"status": "UP" // diskSpace details omitted
		},
		"ping": {
			"status": "UP"
		}
	}
}
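
The rule that one DOWN component makes the whole result DOWN can be sketched in plain Java as below. This is a simplified illustration, not Actuator's own code: the class name HealthAggregationSketch and the method aggregate are made up for this note, and the priority order is an assumption modeled on the documented default ordering.

```java
import java.util.List;
import java.util.Map;

public class HealthAggregationSketch {

    // Assumed priority order: the first status in this list that any component reports wins.
    private static final List<String> PRIORITY = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    // Returns the overall status for a map of component name -> component status.
    public static String aggregate(Map<String, String> componentStatuses) {
        for (String candidate : PRIORITY) {
            if (componentStatuses.containsValue(candidate)) {
                return candidate;
            }
        }
        return "UNKNOWN"; // no component reported a known status
    }

    public static void main(String[] args) {
        Map<String, String> components = Map.of("db", "DOWN", "diskSpace", "UP", "ping", "UP");
        System.out.println(aggregate(components)); // prints DOWN
    }
}
```

With every component UP the same method returns UP, which matches the output shown above.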

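Info

The info endpoint exposes basic information about the application.

Built-in information

java: Java runtime information
- disabled by default
os: OS information
- disabled by default
env: properties from the Environment whose names start with info.
- disabled by default
build: build information
- requires a META-INF/build-info.properties file
git: git information
- requires a git.properties file

How to enable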

management:  
  info:  
    java:  
      enabled: true
    os:  
      enabled: true
    env:
      enabled: true
  endpoint:  
    health:  
      show-components: always  
  endpoints:  
    web:  
      exposure:  
        include: "*"
        
info: # properties starting with info. are picked up from the Environment and shown by the info endpoint
  app:
    name: hello-actuator
    company: hc

To add build information, add the following to build.gradle.

springBoot {  
    buildInfo()  
}

Adding git information (the plugin below generates git.properties)

plugins {
	...
	id "com.gorylenko.gradle-git-properties" version "2.4.1" //git info
}

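Loggers

The loggers endpoint shows the logging levels of the current application.
A single logger can be inspected by appending its name to the path.
- /actuator/loggers/hello.controller

Changing the logging level at runtime

POST /actuator/loggers/hello.controller

{
	"configuredLevel": "TRACE"
}

The change is held in memory only, so the level returns to the configured setting when the server restarts.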
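
The POST above can also be built from Java with the JDK's own HTTP client. The sketch below only constructs the request and prints it; the class name LoggerLevelRequestSketch and the helper buildRequest are made up for this note, and it assumes the loggers endpoint is exposed on localhost:8080 and accepts a JSON body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class LoggerLevelRequestSketch {

    // Builds (but does not send) the POST request that sets a logger's level.
    public static HttpRequest buildRequest(String baseUrl, String loggerName, String level) {
        String body = "{\"configuredLevel\": \"" + level + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/loggers/" + loggerName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:8080", "hello.controller", "TRACE");
        // To apply the change against a running application:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Keeping the request construction separate from sending makes it easy to check the URL and headers without a running server.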

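HTTP request and response recording

The httpexchanges endpoint needs an HttpExchangeRepository bean. The in-memory implementation keeps up to 100 recent exchanges by default (the limit can be changed).

@Bean
public InMemoryHttpExchangeRepository httpExchangeRepository() {
	// Keeps recent HTTP exchanges in memory
	return new InMemoryHttpExchangeRepository();
}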

Actuator and security

Actuator endpoints should be blocked from the public internet and served only over an internal network.

Actuator can be served on its own separate port.

management:  
  info:  
    java:  
      enabled: true  
    os:  
      enabled: true  
    env:  
      enabled: true  
  server:  
    port: 9092

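Micrometer, Prometheus, Grafana

Micrometer

Micrometer collects application metrics through a standard API that it defines and provides them to monitoring systems.
Spring uses Micrometer, which is already well established, rather than building its own.
Spring Boot Actuator registers the metric collection that Micrometer provides automatically through @AutoConfiguration.

Checking metrics

Metric list

http://localhost:8080/actuator/metrics

Metric details

http://localhost:8080/actuator/metrics/{name}

Tag filter

Narrows a metric down by tag.
The query parameter must have the form tag=KEY:VALUE.
http://localhost:8080/actuator/metrics/jvm.memory.used?tag=area:heap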
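
As a small illustration of the tag=KEY:VALUE form, the sketch below assembles such a URL from a metric name and a map of tags. The class MetricUrlSketch and the method metricUrl are hypothetical helpers written for this note, and values are not URL-encoded, so it only suits simple tag values like the one above.

```java
import java.util.Map;
import java.util.stream.Collectors;

public class MetricUrlSketch {

    // Builds /actuator/metrics/{name}?tag=KEY:VALUE&tag=KEY:VALUE...
    public static String metricUrl(String baseUrl, String metricName, Map<String, String> tags) {
        String url = baseUrl + "/actuator/metrics/" + metricName;
        if (tags.isEmpty()) {
            return url; // no filter: plain metric detail URL
        }
        String query = tags.entrySet().stream()
                .map(e -> "tag=" + e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining("&"));
        return url + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(metricUrl("http://localhost:8080", "jvm.memory.used", Map.of("area", "heap")));
        // prints http://localhost:8080/actuator/metrics/jvm.memory.used?tag=area:heap
    }
}
```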

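Log metrics

logback.events: metrics about Logback log events.
The number of log events is available per level: trace, debug, info, warn, error.
For example, a sharp rise in the error count can be treated as a warning sign.

Tomcat metrics

Useful metrics
- tomcat.threads.busy
  - number of threads currently in use
- tomcat.threads.config.max
  - maximum number of threads, which bounds how many requests can be handled at the same time

These metrics require the Tomcat MBean registry to be enabled.

server:
  tomcat:
    mbeanregistry:
      enabled: true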

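Custom metrics

Custom metrics can be created to monitor core business logic.

Prometheus and Grafana

Prometheus
- collects metrics continuously and stores them in its database
- plays the database role
Grafana
- dashboard

Flow

Actuator and Micrometer produce a large number of metrics.
Prometheus collects the data continuously.
Prometheus stores the metrics in its database.
Grafana queries Prometheus so the metrics can be browsed conveniently.

Prometheus

Running with docker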

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    restart: unless-stopped

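Application configuration

For Prometheus to pull metrics from the application, the metrics must be produced in the format Prometheus uses.
Adding the Micrometer Prometheus registry library (io.micrometer:micrometer-registry-prometheus) automatically adds an endpoint for Prometheus to scrape.
- /actuator/prometheus

prometheus.yml configuration

job_name
- name of the scrape job
metrics_path
- path to scrape
scrape_interval
- scrape interval
static_configs[targets]
- IP and port of the servers to scrape

Concepts

Tags, labels
- Used to tell the individual series of a metric apart. Micrometer calls them tags and Prometheus calls them labels.
- The number at the end of the line is the value.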

http_server_requests_seconds_count{error="none", exception="none", instance="host.docker.internal:8080", job="spring-boot", method="GET", outcome="SUCCESS", status="200", uri="/actuator/prometheus"}     12

Filters
- Curly braces filter by label.
- =~ and !~ match a label against a regular expression.
- topk(3, http...) shows only the top 3 series.
- Range vector selector
  - (returns every sample within the given time range)
  - http_server_requests_seconds_count[1m]
- Typical filtering examples
  - http_server_requests_seconds_count{uri="/actuator/prometheus"}
  - http_server_requests_seconds_count{method=~"GET|POST"}
  - http_server_requests_seconds_count{uri!~"/actuator.*"}
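
The =~ and !~ matchers compare the whole label value against the pattern, not just a part of it. A rough Java equivalent is sketched below; LabelMatcherSketch and its methods are made up for this note, and it uses java.util.regex, whose syntax differs from Prometheus's RE2 in some advanced cases but agrees for simple patterns like the ones above.

```java
import java.util.regex.Pattern;

public class LabelMatcherSketch {

    // Like label=~"regex": true only when the whole label value matches the pattern.
    public static boolean regexMatch(String labelValue, String regex) {
        return Pattern.matches(regex, labelValue);
    }

    // Like label!~"regex": the negation of the full match.
    public static boolean regexNotMatch(String labelValue, String regex) {
        return !regexMatch(labelValue, regex);
    }

    public static void main(String[] args) {
        System.out.println(regexMatch("GET", "GET|POST"));                        // true
        System.out.println(regexMatch("DELETE", "GET|POST"));                     // false
        System.out.println(regexNotMatch("/actuator/prometheus", "/actuator.*")); // false
        System.out.println(regexNotMatch("/order", "/actuator.*"));               // true
    }
}
```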
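Gauge
- a value that can go up and down arbitrarily
- examples
  - CPU usage, memory usage
Counter
- a single cumulative value that only increases
- A graph that only ever rises is hard to read anything useful from.
  - The increase() function shows how much the counter grew over a time range.
- examples
  - number of HTTP requests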
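
To see why a growth figure is more readable than the raw cumulative value, the idea behind increase() can be sketched as below. This is a simplified model, not Prometheus's algorithm: the real function also extrapolates to the edges of the time range. The class CounterIncreaseSketch and the sample values are made up for this note.

```java
public class CounterIncreaseSketch {

    // Sums the growth between consecutive samples of a cumulative counter.
    // A drop is treated as a counter reset (for example an application restart),
    // in which case the new value itself counts as growth since the reset.
    public static double increase(double[] samples) {
        double total = 0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            total += (delta >= 0) ? delta : samples[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical request counts scraped within one minute
        System.out.println(increase(new double[] {100, 104, 109, 115})); // prints 15.0
        // The counter restarted between the second and third sample
        System.out.println(increase(new double[] {100, 110, 3, 8}));     // prints 18.0
    }
}
```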

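Drawback

It is hard to build a dashboard that shows everything at a glance with Prometheus alone.
- Grafana addresses this.

Grafana

The following flow has to be working.
- application ⇒ Prometheus ⇒ Grafana

Panel

If the dashboard is the overall frame, a panel is a component that plugs into it like a module.

Common causes of failures in production

CPU usage exceeded
JVM memory usage exceeded
Connection pool exhausted
Surge in error logs

Registering metrics

MeterRegistry

The core component that provides Micrometer's functionality.
It can be injected by Spring and used directly.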

Counter

Direct registration

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
 
// The registry is injected.
private final MeterRegistry registry;
 
Counter.builder("my.order")
	.tag("class", this.getClass().getName())
	.tag("method", "order")
	.description("order")
	.register(registry).increment();

PromQL example

The counter my.order is exported to Prometheus as my_order_total.

increase(my_order_total{method="order"}[1m])

Using `@Counted`

Using it in a service
- The method and class names are applied as tags automatically.

import io.micrometer.core.annotation.Counted;
 
@Counted("my.order")
@Override
public void order() {
	log.info("order");
	stock.decrementAndGet();
}

Config
- To use the @Counted annotation, a CountedAspect must be registered.

import io.micrometer.core.aop.CountedAspect;
import io.micrometer.core.instrument.MeterRegistry;

@Configuration
public class OrderConfigV2 {
	@Bean
	public CountedAspect countedAspect(MeterRegistry registry) {
		return new CountedAspect(registry);
	}
}

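Timer

Used to measure how long something takes. A timer is exported as the following values.
seconds_count
- cumulative number of executions
seconds_sum
- total execution time
seconds_max
- maximum execution time
- The maximum is recomputed over a recent time window, roughly every 1 to 3 minutes.
seconds_sum / seconds_count
- gives the average execution time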
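
The average calculation can be sketched as below, both over the whole lifetime of the timer and over a recent window (the difference of two scrapes, which is the idea behind dividing increase() of the sum by increase() of the count). The class TimerAverageSketch and the sample numbers are made up for this note.

```java
public class TimerAverageSketch {

    // Lifetime average: seconds_sum / seconds_count.
    public static double averageSeconds(double secondsSum, long secondsCount) {
        if (secondsCount == 0) {
            return 0.0; // nothing recorded yet, avoid dividing by zero
        }
        return secondsSum / secondsCount;
    }

    // Average over a window, from the sum and count observed at its start and end.
    public static double windowAverageSeconds(double sumStart, double sumEnd, long countStart, long countEnd) {
        return averageSeconds(sumEnd - sumStart, countEnd - countStart);
    }

    public static void main(String[] args) {
        System.out.println(averageSeconds(12.5, 25));                  // prints 0.5
        System.out.println(windowAverageSeconds(10.0, 12.5, 20, 25));  // prints 0.5
    }
}
```

The window version is usually the more useful one on a dashboard, because the lifetime average reacts very slowly once the count is large.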

Direct registration

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
 
// The registry is injected.
private final MeterRegistry registry;
 
Timer timer = Timer.builder("my.order")
	.tag("class", this.getClass().getName())
	.tag("method", "order")
	.description("order")
	.register(registry);
 
// The timer records how long the lambda takes to run.
// sleep(...) and randomInt are assumed to be defined elsewhere in this class.
timer.record(() -> {
	log.info("order");
	stock.decrementAndGet();
	sleep(500 + randomInt);
});

PromQL example

increase(my_order_seconds_count{method="order"}[1m])

`@Timed`

Can be applied to a class (method level is also possible).

@Timed("my.order")
@Slf4j
public class OrderServiceV4 implements OrderService {
 
	private AtomicInteger stock = new AtomicInteger(100);
	@Override
	public void order() {
		log.info("order");
		stock.decrementAndGet();
		sleep(500);
	}
}

A TimedAspect must be registered in the configuration.

@Bean
public TimedAspect timedAspect(MeterRegistry registry) {
	return new TimedAspect(registry);
}

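Gauge

When deciding between a counter and a gauge, it helps to ask whether the value can decrease.
The function registered with the gauge is called each time the metric is read from outside, for example on every Prometheus scrape.
- The value this function returns is the measured gauge value.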
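
The "function is called on every read" behavior can be illustrated without Micrometer. The sketch below is a stand-in written for this note, not the Micrometer Gauge class: GaugeSketch simply holds a source object and a function, and evaluates the function whenever value() is called.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToDoubleFunction;

public class GaugeSketch<T> {

    private final T source;
    private final ToDoubleFunction<T> valueFunction;

    public GaugeSketch(T source, ToDoubleFunction<T> valueFunction) {
        this.source = source;
        this.valueFunction = valueFunction;
    }

    // Nothing is stored: the function runs again on every read.
    public double value() {
        return valueFunction.applyAsDouble(source);
    }

    public static void main(String[] args) {
        AtomicInteger stock = new AtomicInteger(100);
        GaugeSketch<AtomicInteger> gauge = new GaugeSketch<>(stock, s -> s.get());

        System.out.println(gauge.value()); // prints 100.0
        stock.decrementAndGet();           // an order reduces the stock
        System.out.println(gauge.value()); // prints 99.0
    }
}
```

Because the value is computed on read, a gauge needs no increment or decrement calls in the business code.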

Registering through a configuration class directly

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import jakarta.annotation.PostConstruct; // javax.annotation on Spring Boot 2
 
@Configuration
public class StockConfigV1 {
 
	@Bean
	public MyStockMetric myStockMetric(OrderService orderService, MeterRegistry registry) {
		return new MyStockMetric(orderService, registry);
	}
 
	@Slf4j
	static class MyStockMetric {
		private final OrderService orderService;
		private final MeterRegistry registry;
		
		public MyStockMetric(OrderService orderService, MeterRegistry registry) {
			this.orderService = orderService;
			this.registry = registry;
		}
		
		@PostConstruct
		public void init() {
			// The lambda runs each time the gauge value is read
			Gauge.builder("my.stock", orderService, service -> {
				log.info("stock gauge call");
				return service.getStock().get(); // returns the service's current stock value
			}).register(registry);
		}
	}
}

Registering a gauge more simply

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.binder.MeterBinder;
 
@Slf4j
@Configuration
public class StockConfigV2 {
	@Bean
	public MeterBinder stockSize(OrderService orderService) {
		// Spring Boot binds MeterBinder beans to the registry automatically
		return registry -> Gauge.builder("my.stock", orderService, service -> {
			log.info("stock gauge call");
			return service.getStock().get();
		}).register(registry);
	}
}

Tags should be used only for low-cardinality values, that is, units by which metrics can be grouped.

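Three levels of monitoring

Dashboard
Application tracing - Pinpoint
Logs

Dashboard

A view that shows the whole system at a glance.
System metrics
- CPU, memory
Application metrics
- Tomcat thread pool, DB connection pool
Business metrics
- number of orders, number of cancellations

Application tracing

Traces each individual HTTP request from a customer.
Requests can be followed across services in a microservice (MSA) environment.

Logs

The most detailed level of tracing.
Can be customized as needed.
Helps find the cause of errors that the levels above do not catch.
It is important to be able to group the logs of a single HTTP request together (apply MDC).

When writing logs to files

General logs and error logs should be written to separate files.

Summary

Each tool serves a different purpose, so use each where it fits.
When investigating, start from the whole and narrow down step by step.
Connect the monitoring tool to Slack, SMS, and so on, so that it alerts when a value passes a threshold.
- Split alerts into two levels
  - critical
  - warning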
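
The two-level split can be sketched as a simple threshold check. Everything here is hypothetical: the class AlertLevelSketch, the method classify, and the threshold values in main are made up for this note, and in practice the rule would normally live in the monitoring tool's alert configuration rather than in application code.

```java
public class AlertLevelSketch {

    public enum Level { OK, WARNING, CRITICAL }

    // Compares a measured value against two thresholds, checking the more severe one first.
    public static Level classify(double value, double warningThreshold, double criticalThreshold) {
        if (value >= criticalThreshold) {
            return Level.CRITICAL;
        }
        if (value >= warningThreshold) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    public static void main(String[] args) {
        // Hypothetical CPU usage thresholds: warn at 70%, critical at 90%
        System.out.println(classify(0.75, 0.70, 0.90)); // prints WARNING
        System.out.println(classify(0.92, 0.70, 0.90)); // prints CRITICAL
    }
}
```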

💻 shurona's documentation
