Deploying PaddlePaddle + PaddleOCR as a Service with PaddleServing (k8s)

Official documentation

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/deploy/pdserving/README_CN.md

https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/Latest_Packages_CN.md

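PaddlePaddle is a deep learning framework that provides models and tools for building, training, and deploying deep learning models. It offers Python and C++ APIs, along with resources such as pretrained models and model optimization tools. It is used for computer vision, natural language processing, recommender systems, and other deep learning applications.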

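PaddleServing is an open-source serving framework for deploying deep learning models. It converts a trained model into a servable format and exposes it to callers in production, for example as an HTTP API, and it can run inside Docker containers. It is built to serve PaddlePaddle models, so a model trained in another framework such as TensorFlow or PyTorch has to be converted to Paddle format first. It makes it straightforward to put a model into production and to scale the deployment.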

Deployment

# Dockerfile: image for the PaddleOCR pipeline service
FROM registry.baidubce.com/paddlepaddle/paddle:2.4.2
COPY PaddleOCR-2.6.0.zip /home/
# Unpack and build from /home, where the archive was copied
WORKDIR /home
# Install the PaddleOCR requirements and the Paddle Serving 0.8.3 wheels, download the
# PP-OCRv3 detection and recognition models, then convert both to the serving format
RUN unzip PaddleOCR-2.6.0.zip \
    && cd PaddleOCR-2.6.0 \
    && pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple \
    && cd deploy/pdserving \
    && wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl \
    && pip3 install paddle_serving_server-0.8.3-py3-none-any.whl -i https://pypi.tuna.tsinghua.edu.cn/simple \
    && wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp37-none-any.whl \
    && pip3 install paddle_serving_client-0.8.3-cp37-none-any.whl -i https://pypi.tuna.tsinghua.edu.cn/simple \
    && wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.8.3-py3-none-any.whl \
    && pip3 install paddle_serving_app-0.8.3-py3-none-any.whl -i https://pypi.tuna.tsinghua.edu.cn/simple \
    && wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar -O ch_PP-OCRv3_det_infer.tar \
    && tar -xf ch_PP-OCRv3_det_infer.tar \
    && wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar -O ch_PP-OCRv3_rec_infer.tar \
    && tar -xf ch_PP-OCRv3_rec_infer.tar \
    && python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv3_det_infer/ \
        --model_filename inference.pdmodel \
        --params_filename inference.pdiparams \
        --serving_server ./ppocr_det_v3_serving/ \
        --serving_client ./ppocrv3_det_client/ \
    && python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv3_rec_infer/ \
        --model_filename inference.pdmodel \
        --params_filename inference.pdiparams \
        --serving_server ./ppocr_rec_v3_serving/ \
        --serving_client ./ppocrv3_rec_client/
# web_service.py and config.yml live here, so the container starts in this directory
WORKDIR /home/PaddleOCR-2.6.0/deploy/pdserving
# Deployment and Service for the PaddleOCR pipeline service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: paddlepaddle-ocr
  namespace: paddlepaddle-ocr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: paddlepaddle-ocr
  template:
    metadata:
      labels:
        app: paddlepaddle-ocr
    spec:
      containers:
        - name: paddlepaddle-ocr
          image: devops/paddlepaddle-ocr:v1
          command:
            - python3
          # The image's WORKDIR is deploy/pdserving, so both relative paths resolve there
          args:
            - 'web_service.py'
            - '--config=config.yml'
          ports:
            # Port names must be IANA service names (they need at least one letter)
            - name: http
              containerPort: 9998
              protocol: TCP
            - name: rpc
              containerPort: 18091
              protocol: TCP
          resources:
            limits:
              memory: 5Gi
            requests:
              memory: 5Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: Recreate
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
---
apiVersion: v1
kind: Service
metadata:
  name: paddlepaddle-ocr
  namespace: paddlepaddle-ocr
spec:
  ports:
    # Expose the pipeline HTTP port that the relay calls (paddlepaddle-ocr:9998)
    - name: http
      protocol: TCP
      port: 9998
      targetPort: 9998
  selector:
    app: paddlepaddle-ocr
  type: NodePort
  sessionAffinity: None
  externalTrafficPolicy: Cluster
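After applying the manifests, a quick way to confirm that the service actually came up is to check that the two container ports accept connections. The following is a minimal sketch using only the standard library, meant to be run inside the container (for example through `kubectl exec`), which is why it probes localhost. `port_open` is a helper name of mine, not part of PaddleServing.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # The two containerPort values declared in the Deployment
    for port in (9998, 18091):
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"port {port}: {state}")
```

An open port only shows that the process is listening; it does not prove that the models loaded correctly, so a real request is still the better end-to-end check.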

Calling the service

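After deployment, our Java backend called the endpoint and found that the result comes back as a string, and that the string could not be parsed normally.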

The official example parses it with Python's eval function, but Java has no equivalent.
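To make the problem concrete: the `value` field of the response holds the text of a Python literal (single-quoted strings, tuples), so a JSON parser rejects it. Below is a minimal sketch of parsing it without `eval`, using `ast.literal_eval` from the standard library. The sample string is made up to mimic the shape that the relay code in this post indexes into (`item[0][0]` is the recognized text), and `extract_texts` and `is_valid_json` are names of mine.

```python
import ast
import json

# Made-up sample in the shape [[(text, score), box], ...]; real output will differ
SAMPLE_VALUE = (
    "[[('hello', 0.98), [[1.0, 2.0], [3.0, 4.0]]], "
    "[('world', 0.91), [[5.0, 6.0], [7.0, 8.0]]]]"
)


def is_valid_json(value: str) -> bool:
    """Return True if the string parses as JSON."""
    try:
        json.loads(value)
        return True
    except json.JSONDecodeError:
        return False


def extract_texts(value: str) -> list:
    """Parse the Python-literal string and keep only the recognized text."""
    items = ast.literal_eval(value)  # accepts literals only, never runs code
    return [item[0][0] for item in items]


print(is_valid_json(SAMPLE_VALUE))  # False
print(json.dumps({"name": extract_texts(SAMPLE_VALUE)}))  # {"name": ["hello", "world"]}
```

Unlike `eval`, `ast.literal_eval` raises an error on anything that is not a plain literal, which matters when the string arrives over the network.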

So I ended up putting a small Python relay in between.

# app.py: relay that turns the pipeline's Python-literal result into plain JSON
import ast
import json

import requests
from flask import Flask
from flask import request

app = Flask(__name__)

# In-cluster address of the PaddleOCR pipeline service
OCR_URL = "http://paddlepaddle-ocr:9998/ocr/prediction"


@app.route("/ocr/prediction", methods=["POST"])
def ocr_prediction():
    img_b64 = request.json['img_b64']
    payload = json.dumps({
        "key": ["image"],
        "value": [img_b64]
    })
    headers = {
        'Content-Type': 'application/json'
    }
    r = requests.post(OCR_URL, headers=headers, data=payload)
    result = r.json()
    # value[0] is the text of a Python list, not JSON
    ocr_result = result["value"][0]
    ret = []
    # literal_eval parses Python literals without executing code, unlike eval
    for item in ast.literal_eval(ocr_result):
        ret.append(item[0][0])  # keep only the recognized text
    data = {
        "name": ret
    }
    print(ret)
    response = json.dumps(data)  # serialize the Python dict to a JSON string
    return response, 200, {"Content-Type": "application/json"}
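For reference, a caller of this relay only has to send the image as base64 in an `img_b64` field and read the `name` list from the reply. The sketch below uses only the standard library. The relay URL is an assumption (this post does not show a Service for the relay), and the helper names are mine.

```python
import base64
import json
import urllib.request

# Assumed address of the relay; adjust to however the relay is exposed
RELAY_URL = "http://paddlepaddle-ocr-api/ocr/prediction"


def build_relay_body(image_bytes: bytes) -> bytes:
    """Wrap the raw image bytes in the JSON body the relay reads."""
    img_b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"img_b64": img_b64}).encode("utf-8")


def parse_relay_response(raw: bytes) -> list:
    """Return the list of recognized text lines from the relay's reply."""
    return json.loads(raw.decode("utf-8"))["name"]


def recognize(image_bytes: bytes, url: str = RELAY_URL, timeout: float = 30.0) -> list:
    """POST an image to the relay and return the recognized text lines."""
    req = urllib.request.Request(
        url,
        data=build_relay_body(image_bytes),
        # The relay reads request.json, which requires this content type
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_relay_response(resp.read())
```

The Java backend does the equivalent: base64-encode the image, POST it as JSON, and read `name` with any ordinary JSON library.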

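# api.ini
[uwsgi]
# Address and port uWSGI listens on (HTTP protocol)
http=0.0.0.0:80
# Application directory
chdir=/root
# Python file that contains the application
wsgi-file=app.py
# Name of the application variable inside that file
callable=app
# Number of worker processes
processes=4
# Number of threads per process
threads=2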
# Deployment for the Python relay
apiVersion: apps/v1
kind: Deployment
metadata:
  name: paddlepaddle-ocr-api
  namespace: paddlepaddle-ocr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: paddlepaddle-ocr-api
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: paddlepaddle-ocr-api
    spec:
      containers:
        - name: paddlepaddle-ocr-api
          image: devops/paddlepaddle-ocr-api:v1
          command:
            - uwsgi
          args:
            - '--ini'
            - /root/api.ini
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always

# Dockerfile: image for the Python relay (Flask app served by uWSGI)
FROM devops/python:3.7.16
COPY app.py /root/app.py
COPY api.ini /root/api.ini
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple \
    certifi==2022.12.7 \
    charset-normalizer==3.1.0 \
    click==8.1.3 \
    colorama==0.4.6 \
    Flask==2.2.5 \
    idna==3.4 \
    importlib-metadata==6.6.0 \
    itsdangerous==2.1.2 \
    Jinja2==3.1.2 \
    MarkupSafe==2.1.2 \
    requests==2.28.2 \
    typing-extensions==4.5.0 \
    urllib3==1.26.15 \
    Werkzeug==2.2.3 \
    zipp==3.15.0 \
    uwsgi==2.0.21